Category: Presto

Convert CSVs to ORC Faster

I compare the ORC file construction times of Spark 2.4.0, Hive 2.3.4 and Presto 0.214.


1.1 Billion Taxi Rides: Spark 2.4.0 versus Presto 0.214

I investigate how fast Spark and Presto can query 1.1 Billion Taxi Journeys using a 21-node EMR cluster.


Using SQL to query Kafka, MongoDB, MySQL, PostgreSQL and Redis with Presto

A guide to connecting to five different data stores using Presto.


1.1 Billion Taxi Rides: EC2 versus EMR

I investigate how fast Spark and Presto can query 1.1 Billion Taxi Journeys using an i3.8xlarge EC2 instance with 1.7 TB of NVMe storage versus a 21-node EMR cluster.


A Billion Taxi Rides: AWS S3 versus HDFS

I investigate the speed differences between S3 and HDFS when querying over a billion records using Presto on AWS EMR.


50-node Presto Cluster on Amazon EMR

I investigate how fast a 50-node AWS EMR cluster can query over a billion records using Presto.


A Billion Taxi Rides on Amazon EMR running Presto

I investigate how fast a small AWS EMR cluster can query over a billion records using Presto.


A Billion Taxi Rides in Hive & Presto

Import the metadata of over a billion Yellow and Green Taxi and Uber rides in New York City into ORC-formatted, columnar-based files on HDFS and query them using Hive & Presto.


Presto, Parquet & Airpal

Using Airpal to execute queries on Parquet-formatted data via Presto.

Copyright © 2014 - 2017 Mark Litwintschik.