
Posted on Mon 27 June 2016

1.1 Billion Taxi Rides with MapD & 8 Nvidia Tesla K80s

For most of 2016 I've been using a dataset of 1.1 billion taxi journeys made in New York City over the course of six years to benchmark various Big Data solutions. Though these aren't apples-to-apples comparisons, I've benchmarked BigQuery, Elasticsearch, Presto on EMR and Dataproc, PostgreSQL and Redshift using the same dataset.

So far all of the benchmarks have had a common theme: they ran on Intel CPUs. Probably the most dramatic difference between a GPU and a run-of-the-mill CPU is memory bandwidth: thanks to a much wider bus, a GPU's memory read speeds can hit 300 GB/s, whereas with a CPU you're looking at closer to 20 GB/s. This, coupled with a number of other architectural differences, has resulted in some of my benchmark queries running 55x quicker than the fastest benchmarks I've performed in the past.
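To put those bandwidth figures in perspective, here's a back-of-envelope sketch (my own arithmetic, not MapD's) of how long it takes just to stream a single 4-byte column of 1.1 billion rows through memory at each speed:

```python
# Back-of-envelope scan-time estimate for one 4-byte column of 1.1B rows.
rows = 1_100_000_000
bytes_per_value = 4
column_bytes = rows * bytes_per_value          # ~4.4 GB

gpu_bandwidth = 300e9   # ~300 GB/s aggregate GPU memory bandwidth
cpu_bandwidth = 20e9    # ~20 GB/s typical CPU memory bandwidth

gpu_seconds = column_bytes / gpu_bandwidth     # ~0.015 s
cpu_seconds = column_bytes / cpu_bandwidth     # ~0.22 s

print(f"GPU: {gpu_seconds * 1000:.1f} ms, CPU: {cpu_seconds * 1000:.1f} ms")
```

Real queries touch multiple columns and do more than scan, but the raw bandwidth gap alone accounts for an order of magnitude.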

A Supercomputer Up & Running

MapD have been kind enough to grant me access to a machine that I'll use to benchmark their GPU-based database software with the 1.1 billion taxi trips dataset. The machine I'll be using would have ranked among the world's 20 fastest computers 10 years ago. It has 8 x Nvidia Tesla K80 cards, each containing two GPUs. Each K80 delivers 2.91 teraflops of double-precision and 8.73 teraflops of single-precision performance, giving me 23.28 and 69.84 teraflops in total respectively.

The following is the output from Nvidia's system management interface showing various diagnostics of the GPU cluster.

$ nvidia-smi
+------------------------------------------------------+
| NVIDIA-SMI 352.93     Driver Version: 352.93         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 0000:06:00.0     Off |                  Off |
| N/A   48C    P0    67W / 149W |   2748MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 0000:07:00.0     Off |                  Off |
| N/A   36C    P0    78W / 149W |   2714MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           On   | 0000:0A:00.0     Off |                  Off |
| N/A   47C    P0    66W / 149W |   2714MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           On   | 0000:0B:00.0     Off |                  Off |
| N/A   35C    P0    77W / 149W |   2714MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           On   | 0000:10:00.0     Off |                  Off |
| N/A   44C    P0    64W / 149W |   2714MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           On   | 0000:11:00.0     Off |                  Off |
| N/A   33C    P0    76W / 149W |   2714MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           On   | 0000:14:00.0     Off |                  Off |
| N/A   47C    P0    64W / 149W |   2714MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           On   | 0000:15:00.0     Off |                  Off |
| N/A   36C    P0    74W / 149W |   2714MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   8  Tesla K80           On   | 0000:86:00.0     Off |                  Off |
| N/A   43C    P0    61W / 149W |   2714MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   9  Tesla K80           On   | 0000:87:00.0     Off |                  Off |
| N/A   34C    P0    77W / 149W |   2714MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|  10  Tesla K80           On   | 0000:8A:00.0     Off |                  Off |
| N/A   47C    P0    66W / 149W |   2714MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|  11  Tesla K80           On   | 0000:8B:00.0     Off |                  Off |
| N/A   34C    P0    78W / 149W |   2714MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|  12  Tesla K80           On   | 0000:90:00.0     Off |                  Off |
| N/A   44C    P0    64W / 149W |   2714MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|  13  Tesla K80           On   | 0000:91:00.0     Off |                  Off |
| N/A   36C    P0    77W / 149W |   2714MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|  14  Tesla K80           On   | 0000:94:00.0     Off |                  Off |
| N/A   45C    P0    65W / 149W |   2714MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|  15  Tesla K80           On   | 0000:95:00.0     Off |                  Off |
| N/A   36C    P0    76W / 149W |   2714MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      7542  C+G   /raidStorage/prod/mapd/bin/mapd_server        2666MiB |
|    0     10718    G   /usr/bin/Xorg                                   16MiB |
|    1      7542  C+G   /raidStorage/prod/mapd/bin/mapd_server        2632MiB |
|    1     10718    G   /usr/bin/Xorg                                   16MiB |
|    2      7542  C+G   /raidStorage/prod/mapd/bin/mapd_server        2632MiB |
|    2     10718    G   /usr/bin/Xorg                                   16MiB |
|    3      7542  C+G   /raidStorage/prod/mapd/bin/mapd_server        2632MiB |
|    3     10718    G   /usr/bin/Xorg                                   16MiB |
|    4      7542  C+G   /raidStorage/prod/mapd/bin/mapd_server        2632MiB |
|    4     10718    G   /usr/bin/Xorg                                   16MiB |
|    5      7542  C+G   /raidStorage/prod/mapd/bin/mapd_server        2632MiB |
|    5     10718    G   /usr/bin/Xorg                                   16MiB |
|    6      7542  C+G   /raidStorage/prod/mapd/bin/mapd_server        2632MiB |
|    6     10718    G   /usr/bin/Xorg                                   16MiB |
|    7      7542  C+G   /raidStorage/prod/mapd/bin/mapd_server        2632MiB |
|    7     10718    G   /usr/bin/Xorg                                   16MiB |
|    8      7542  C+G   /raidStorage/prod/mapd/bin/mapd_server        2632MiB |
|    8     10718    G   /usr/bin/Xorg                                   16MiB |
|    9      7542  C+G   /raidStorage/prod/mapd/bin/mapd_server        2632MiB |
|    9     10718    G   /usr/bin/Xorg                                   16MiB |
|   10      7542  C+G   /raidStorage/prod/mapd/bin/mapd_server        2632MiB |
|   10     10718    G   /usr/bin/Xorg                                   16MiB |
|   11      7542  C+G   /raidStorage/prod/mapd/bin/mapd_server        2632MiB |
|   11     10718    G   /usr/bin/Xorg                                   16MiB |
|   12      7542  C+G   /raidStorage/prod/mapd/bin/mapd_server        2632MiB |
|   12     10718    G   /usr/bin/Xorg                                   16MiB |
|   13      7542  C+G   /raidStorage/prod/mapd/bin/mapd_server        2632MiB |
|   13     10718    G   /usr/bin/Xorg                                   16MiB |
|   14      7542  C+G   /raidStorage/prod/mapd/bin/mapd_server        2632MiB |
|   14     10718    G   /usr/bin/Xorg                                   16MiB |
|   15      7542  C+G   /raidStorage/prod/mapd/bin/mapd_server        2632MiB |
|   15     10718    G   /usr/bin/Xorg                                   16MiB |
+-----------------------------------------------------------------------------+

The machine is running CentOS 7.2.1511 with two Intel Xeon E5-2667 v3 CPUs clocked at 3.2 GHz (16 cores and 32 threads in total) and 792 GB of RAM.

I'll be using a RAID array to store both the raw CSV files and the internal, columnar-based files MapD uses to represent the dataset. The array is made up of 4 x Samsung EVO 2 TB SSDs in a RAID 10 configuration behind an LSI Logic / Symbios Logic MegaRAID SAS2108 RAID bus controller, making for a total of 4 TB of usable storage. This configuration should see sequential read speeds of up to 500 MB/s.
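As a sanity check on that capacity figure (my own arithmetic): RAID 10 stripes across mirrored pairs, so usable space is half the raw total.

```python
drives = 4
drive_tb = 2
raw_tb = drives * drive_tb   # 8 TB of raw flash
usable_tb = raw_tb // 2      # RAID 10 mirrors every drive, halving capacity
print(usable_tb)             # 4
```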

Loading 1.1 Billion Trips into MapD

To start, I'll download the 104 GB of CSV data I created in my Billion Taxi Rides in Redshift blog post. This data sits in 56 GZIP files and decompresses into around 500 GB of raw CSV data.

$ cd /raidStorage/mark/
$ vi urls.txt
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xaa.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xab.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xac.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xad.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xae.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xaf.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xag.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xah.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xai.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xaj.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xak.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xal.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xam.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xan.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xao.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xap.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xaq.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xar.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xas.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xat.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xau.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xav.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xaw.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xax.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xay.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xaz.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xba.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbb.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbc.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbd.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbe.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbf.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbg.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbh.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbi.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbj.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbk.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbl.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbm.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbn.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbo.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbp.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbq.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbr.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbs.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbt.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbu.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbv.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbw.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbx.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xby.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbz.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xca.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xcb.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xcc.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xcd.csv.gz
$ cat urls.txt | \
    xargs -n 1 -P 6 \
        wget
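The 56 file suffixes follow the default naming scheme of the Unix split utility (xaa, xab, ..., xcd), so the URL list can also be generated rather than typed out by hand. A quick sketch; the <s3_bucket> placeholder stands in for the real bucket name, as above:

```python
from itertools import product
from string import ascii_lowercase

# Generate the 56 split-style suffixes: xaa, xab, ..., xcd.
suffixes = ['x' + a + b
            for a, b in product(ascii_lowercase, repeat=2)][:56]

base = 'https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_%s.csv.gz'

with open('urls.txt', 'w') as f:
    for suffix in suffixes:
        f.write(base % suffix + '\n')
```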

I'll then create a schema for my trips table. The SQL looks very much like what you'd use with most RDBMS offerings. Two items that will stand out are the dictionary encoding on low-cardinality string columns and the fragment size parameter. The fragment size helps split the 1.1 billion records amongst all 8 of the K80 cards so they can work in parallel on roughly equal workloads.

$ vi create_trips_table.sql
CREATE TABLE trips (
    trip_id                 INTEGER,
    vendor_id               VARCHAR(3) ENCODING DICT,

    pickup_datetime         TIMESTAMP,

    dropoff_datetime        TIMESTAMP,
    store_and_fwd_flag      VARCHAR(1) ENCODING DICT,
    rate_code_id            SMALLINT,
    pickup_longitude        DECIMAL(14,2),
    pickup_latitude         DECIMAL(14,2),
    dropoff_longitude       DECIMAL(14,2),
    dropoff_latitude        DECIMAL(14,2),
    passenger_count         SMALLINT,
    trip_distance           DECIMAL(14,2),
    fare_amount             DECIMAL(14,2),
    extra                   DECIMAL(14,2),
    mta_tax                 DECIMAL(14,2),
    tip_amount              DECIMAL(14,2),
    tolls_amount            DECIMAL(14,2),
    ehail_fee               DECIMAL(14,2),
    improvement_surcharge   DECIMAL(14,2),
    total_amount            DECIMAL(14,2),
    payment_type            VARCHAR(3) ENCODING DICT,
    trip_type               SMALLINT,
    pickup                  VARCHAR(50) ENCODING DICT,
    dropoff                 VARCHAR(50) ENCODING DICT,

    cab_type                VARCHAR(6) ENCODING DICT,

    precipitation           SMALLINT,
    snow_depth              SMALLINT,
    snowfall                SMALLINT,
    max_temperature         SMALLINT,
    min_temperature         SMALLINT,
    average_wind_speed      SMALLINT,

    pickup_nyct2010_gid     SMALLINT,
    pickup_ctlabel          VARCHAR(10) ENCODING DICT,
    pickup_borocode         SMALLINT,
    pickup_boroname         VARCHAR(13) ENCODING DICT,
    pickup_ct2010           VARCHAR(6) ENCODING DICT,
    pickup_boroct2010       VARCHAR(7) ENCODING DICT,
    pickup_cdeligibil       VARCHAR(1) ENCODING DICT,
    pickup_ntacode          VARCHAR(4) ENCODING DICT,
    pickup_ntaname          VARCHAR(56) ENCODING DICT,
    pickup_puma             VARCHAR(4) ENCODING DICT,

    dropoff_nyct2010_gid    SMALLINT,
    dropoff_ctlabel         VARCHAR(10) ENCODING DICT,
    dropoff_borocode        SMALLINT,
    dropoff_boroname        VARCHAR(13) ENCODING DICT,
    dropoff_ct2010          VARCHAR(6)  ENCODING DICT,
    dropoff_boroct2010      VARCHAR(7)  ENCODING DICT,
    dropoff_cdeligibil      VARCHAR(1)  ENCODING DICT,
    dropoff_ntacode         VARCHAR(4)  ENCODING DICT,
    dropoff_ntaname         VARCHAR(56) ENCODING DICT,
    dropoff_puma            VARCHAR(4)  ENCODING DICT
) with (fragment_size=35000000);
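As a rough check of what that fragment_size implies (my own arithmetic, not from MapD's documentation): 1.1 billion rows in fragments of 35 million works out to about two fragments per GPU across the 16 GPUs on the 8 cards.

```python
import math

rows = 1_100_000_000
fragment_size = 35_000_000
gpus = 16   # 8 K80 cards, 2 GPUs per card

fragments = math.ceil(rows / fragment_size)
print(fragments)         # 32
print(fragments / gpus)  # 2.0 fragments per GPU
```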

I'll create environment variables with my credentials for MapD.

$ read MAPD_USERNAME
$ read MAPD_PASSWORD
$ export MAPD_USERNAME
$ export MAPD_PASSWORD

The following will create the table schema using the mapdql CLI tool.

$ mapdql mapd \
    -u $MAPD_USERNAME \
    -p $MAPD_PASSWORD \
    < create_trips_table.sql

I'll then check that the table has been created:

$ echo "\t" | mapdql mapd -u $MAPD_USERNAME -p $MAPD_PASSWORD
User mapd connected to database mapd
trips
User mapd disconnected from database mapd

MapD doesn't support loading CSV data from GZIP files at this time, so I'll decompress the CSV files before loading them.

$ gunzip trips_x*.csv.gz

With the table and files in place I'll load the 500 GB of CSV data into MapD.

$ for filename in *.csv; do
      echo "COPY trips
            FROM '/raidStorage/mark/$filename'
            WITH (header='false');" | \
          mapdql \
              mapd \
              -u $MAPD_USERNAME \
              -p $MAPD_PASSWORD
  done

The above completed in 51 minutes and 47 seconds.
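For reference, that works out to roughly 160 MB/s of raw CSV ingested end to end (a rough figure, assuming the decompressed data is 500 GB on the nose):

```python
csv_gb = 500
seconds = 51 * 60 + 47              # 51 minutes, 47 seconds

mb_per_second = csv_gb * 1000 / seconds
print(f"{mb_per_second:.0f} MB/s")  # ~161 MB/s
```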

Benchmarking MapD

I ran each of the queries below multiple times. In interactive analytics you often run more or less the same query with changing filter conditions and/or GROUP BY clauses. With this in mind, I've discarded the first query time for each query: if a column of data hasn't yet been loaded onto the GPUs then that has to happen during the first run. This is in line with my other benchmarks.

The times quoted below are the lowest query times seen during a series of runs.

$ mapdql \
    mapd \
    -u $MAPD_USERNAME \
    -p $MAPD_PASSWORD
\timing on

The following completed in 0.027 seconds.

SELECT cab_type,
       count(*)
FROM trips
GROUP BY 1;

The following completed in 0.083 seconds.

SELECT passenger_count,
       avg(total_amount)
FROM trips
GROUP BY 1;

The following completed in 0.163 seconds.

SELECT passenger_count,
       extract(year from pickup_datetime),
       count(*)
FROM trips
GROUP BY 1,
         2;

The following completed in 0.891 seconds.

SELECT passenger_count,
       extract(year from pickup_datetime),
       cast(trip_distance as int),
       count(*)
FROM trips
GROUP BY 1,
         2,
         3
ORDER BY 2,
         4 desc;

For me personally, the future of BI reporting is GPU-based. The cards these benchmarks ran on are based on an architecture that's two generations old, yet in some cases the query times are 55x faster than any I've seen elsewhere - including on large clustered CPU solutions.

MapD are a young company and it's early days for their offering, but the future looks extremely bright both for them and for the world of BI.

Thank you for taking the time to read this post. I offer consulting, architecture and hands-on development services to clients in Europe. If you'd like to discuss how my offerings can help your business please contact me via LinkedIn.

Copyright © 2014 - 2017 Mark Litwintschik. This site's template is based on a template by Giulio Fidente.