Update: MapD rebranded as OmniSci in 2018.
On August 2nd, Nvidia released a new Pascal-based Titan X graphics card. The new card brings an architectural upgrade over the previous Maxwell-based Titan X, raises the CUDA core count from 3,072 to 3,584 and increases the base clock speed from 1 GHz to 1.417 GHz.
In my previous benchmark, where I used MapD and 4 x Maxwell-based Titan Xs, I saw some incredibly fast query times on what is ultimately consumer-grade hardware. In this blog post I'm going to see how much of an upgrade the new Pascal-based cards offer MapD when querying 1.1 billion taxi trips made in New York City over the course of six years.
The Hardware
For this benchmark I'll be using a CentOS 7.2.1511 machine with two Intel Xeon E5-2667 v4 CPUs clocked at 3.20 GHz. Each chip has 8 cores and 16 threads.
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
Stepping: 1
CPU MHz: 1200.375
BogoMIPS: 6406.18
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15,24-31
This machine has 512 GB of RAM.
$ cat /proc/meminfo
MemTotal: 528086776 kB
...
For storage there are two Samsung 850 EVO 1TB SSDs set up in a RAID 1 mirror.
$ cat /proc/mdstat
Personalities : [raid1]
md126 : active raid1 sda2[0] sdb2[1]
1024960 blocks super 1.0 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk
md127 : active raid1 sdb3[1] sda3[0]
908492800 blocks super 1.2 [2/2] [UU]
bitmap: 0/7 pages [0KB], 65536KB chunk
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 931.5G 0 disk
├─sda1 8:1 0 64G 0 part
│ └─centos_titan-swap 253:0 0 128G 0 lvm [SWAP]
├─sda2 8:2 0 1001M 0 part
│ └─md126 9:126 0 1001M 0 raid1 /boot
└─sda3 8:3 0 866.5G 0 part
└─md127 9:127 0 866.4G 0 raid1 /
sdb 8:16 0 931.5G 0 disk
├─sdb1 8:17 0 64G 0 part
│ └─centos_titan-swap 253:0 0 128G 0 lvm [SWAP]
├─sdb2 8:18 0 1001M 0 part
│ └─md126 9:126 0 1001M 0 raid1 /boot
└─sdb3 8:19 0 866.5G 0 part
└─md127 9:127 0 866.4G 0 raid1 /
After RAID 1 mirroring, the root filesystem exposes ~930 GB of formatted capacity.
$ df -H
Filesystem Size Used Avail Use% Mounted on
/dev/md127 930G 401G 530G 44% /
...
The Maxwell-based Titan Xs would report a truncated model name when running Nvidia's system management interface:
$ nvidia-smi
...
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX TIT... Off | 0000:02:00.0 Off | N/A |
| 28% 59C P8 17W / 250W | 12199MiB / 12287MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
...
Thankfully the product name displayed has been shortened and now reads simply "TITAN X".
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.35 Driver Version: 367.35 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN X On | 0000:04:00.0 Off | N/A |
| 23% 25C P8 14W / 250W | 2782MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 TITAN X On | 0000:05:00.0 Off | N/A |
| 23% 25C P8 14W / 250W | 2748MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 TITAN X On | 0000:08:00.0 Off | N/A |
| 23% 28C P8 14W / 250W | 2748MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 TITAN X On | 0000:09:00.0 Off | N/A |
| 23% 27C P8 15W / 250W | 2748MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 TITAN X On | 0000:84:00.0 Off | N/A |
| 23% 29C P8 15W / 250W | 2748MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 TITAN X On | 0000:85:00.0 Off | N/A |
| 23% 26C P8 15W / 250W | 2748MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 TITAN X On | 0000:88:00.0 Off | N/A |
| 23% 22C P8 13W / 250W | 2748MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 TITAN X On | 0000:89:00.0 Off | N/A |
| 23% 27C P8 15W / 250W | 2748MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
I'll be using Nvidia's 367.35 driver, which is the lowest version you'll want to be running with these cards.
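If you just want to confirm which driver version a machine is running without scanning the full nvidia-smi table, the query interface will report it directly, printing one line per GPU:
$ nvidia-smi --query-gpu=driver_version --format=csv,noheader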
If you're planning on building one of these systems and running it in a data centre, you'll need a 5U chassis: these consumer-grade cards have their power sockets facing upwards and the extra rack unit gives the power cables enough clearance.
Loading 1.1 Billion Trips into MapD
To start, I'll download the 104 GB of CSV data I created in my Billion Taxi Rides in Redshift blog post. This data sits in 56 GZIP files and decompresses into around 500 GB of raw CSV data. I'll be storing this data on an NFS mount.
$ cd /theHoard/trips/
$ vi urls.txt
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xaa.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xab.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xac.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xad.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xae.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xaf.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xag.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xah.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xai.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xaj.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xak.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xal.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xam.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xan.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xao.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xap.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xaq.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xar.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xas.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xat.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xau.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xav.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xaw.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xax.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xay.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xaz.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xba.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbb.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbc.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbd.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbe.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbf.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbg.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbh.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbi.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbj.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbk.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbl.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbm.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbn.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbo.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbp.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbq.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbr.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbs.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbt.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbu.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbv.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbw.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbx.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xby.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xbz.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xca.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xcb.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xcc.csv.gz
https://s3-eu-west-1.amazonaws.com/<s3_bucket>/csv/trips_xcd.csv.gz
$ cat urls.txt | \
xargs -n 1 -P 6 \
wget
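Once the downloads have finished, a quick count confirms all 56 archives arrived:
$ ls trips_x*.csv.gz | wc -l
56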
I'll then create a schema for my trips table. The fragment size for this exercise will be 85 million rows. The fragment size sets how many records go into each fragment, which is the unit MapD uses when spreading data across the GPUs; I'll sanity-check the figure after the schema below.
$ vi create_trips_table.sql
CREATE TABLE trips (
trip_id INTEGER,
vendor_id VARCHAR(3) ENCODING DICT,
pickup_datetime TIMESTAMP,
dropoff_datetime TIMESTAMP,
store_and_fwd_flag VARCHAR(1) ENCODING DICT,
rate_code_id SMALLINT,
pickup_longitude DECIMAL(14,2),
pickup_latitude DECIMAL(14,2),
dropoff_longitude DECIMAL(14,2),
dropoff_latitude DECIMAL(14,2),
passenger_count SMALLINT,
trip_distance DECIMAL(14,2),
fare_amount DECIMAL(14,2),
extra DECIMAL(14,2),
mta_tax DECIMAL(14,2),
tip_amount DECIMAL(14,2),
tolls_amount DECIMAL(14,2),
ehail_fee DECIMAL(14,2),
improvement_surcharge DECIMAL(14,2),
total_amount DECIMAL(14,2),
payment_type VARCHAR(3) ENCODING DICT,
trip_type SMALLINT,
pickup VARCHAR(50) ENCODING DICT,
dropoff VARCHAR(50) ENCODING DICT,
cab_type VARCHAR(6) ENCODING DICT,
precipitation SMALLINT,
snow_depth SMALLINT,
snowfall SMALLINT,
max_temperature SMALLINT,
min_temperature SMALLINT,
average_wind_speed SMALLINT,
pickup_nyct2010_gid SMALLINT,
pickup_ctlabel VARCHAR(10) ENCODING DICT,
pickup_borocode SMALLINT,
pickup_boroname VARCHAR(13) ENCODING DICT,
pickup_ct2010 VARCHAR(6) ENCODING DICT,
pickup_boroct2010 VARCHAR(7) ENCODING DICT,
pickup_cdeligibil VARCHAR(1) ENCODING DICT,
pickup_ntacode VARCHAR(4) ENCODING DICT,
pickup_ntaname VARCHAR(56) ENCODING DICT,
pickup_puma VARCHAR(4) ENCODING DICT,
dropoff_nyct2010_gid SMALLINT,
dropoff_ctlabel VARCHAR(10) ENCODING DICT,
dropoff_borocode SMALLINT,
dropoff_boroname VARCHAR(13) ENCODING DICT,
dropoff_ct2010 VARCHAR(6) ENCODING DICT,
dropoff_boroct2010 VARCHAR(7) ENCODING DICT,
dropoff_cdeligibil VARCHAR(1) ENCODING DICT,
dropoff_ntacode VARCHAR(4) ENCODING DICT,
dropoff_ntaname VARCHAR(56) ENCODING DICT,
dropoff_puma VARCHAR(4) ENCODING DICT
) WITH (FRAGMENT_SIZE=85000000);
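As a rough sanity check on that fragment size (my own back-of-the-envelope arithmetic rather than anything from MapD's documentation), 1.1 billion rows split into 85-million-row fragments works out to roughly 13 fragments across the whole table:
$ python -c "print(1.1e9 / 85e6)" # roughly 12.9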
I'll create two environment variables with my credentials for MapD.
$ read MAPD_USERNAME
$ read MAPD_PASSWORD
$ export MAPD_USERNAME
$ export MAPD_PASSWORD
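As an aside, if you'd rather the password not be echoed to the terminal as you type it, bash's read has a silent flag that works the same way:
$ read -s MAPD_PASSWORD
$ export MAPD_PASSWORD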
The following will create the table schema using the mapdql CLI tool.
$ mapdql mapd \
-u $MAPD_USERNAME \
-p $MAPD_PASSWORD \
< create_trips_table.sql
I'll then check that the table has been created:
$ echo "\t" | mapdql mapd -u $MAPD_USERNAME -p $MAPD_PASSWORD
User mapd connected to database mapd
trips
User mapd disconnected from database mapd
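mapdql also has a \d command for describing a table (at least in the builds I've used), so if you want to eyeball the column definitions it can be piped in the same way as \t above:
$ echo "\d trips" | mapdql mapd -u $MAPD_USERNAME -p $MAPD_PASSWORD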
MapD doesn't support loading CSV data from GZIP files at this time, so I'll decompress the CSV files before loading them.
$ find *.gz | \
xargs -n 1 -P 16 \
gunzip
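If you want to check the decompressed footprint roughly matches the ~500 GB figure mentioned earlier, du can total it up; the last line it prints is the grand total:
$ du -ch trips_x*.csv | tail -n 1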
With the table and files in place I'll load the 500 GB of CSV data into MapD.
$ for filename in *.csv; do
echo "COPY trips
FROM '/theHoard/trips/$filename'
WITH (header='false');" | \
mapdql \
mapd \
-u $MAPD_USERNAME \
-p $MAPD_PASSWORD
done
The above completed in 43 minutes.
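Before moving on to the benchmarks, a quick row count is a good sanity check that every file made it in; the figure should come back at around 1.1 billion:
$ echo "SELECT COUNT(*) FROM trips;" | \
    mapdql mapd -u $MAPD_USERNAME -p $MAPD_PASSWORD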
Benchmarking MapD
The times quoted below are the lowest query times seen during a series of runs. As with all my benchmarks, I use lowest query time as a way of indicating "top speed".
$ mapdql \
mapd \
-u $MAPD_USERNAME \
-p $MAPD_PASSWORD
\timing on
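Each of the queries below was run several times this way. If you'd rather script the repeats than re-type them at the prompt, commands can also be piped into mapdql in a stream. Here's a minimal sketch of that approach; it assumes backslash commands such as \timing behave the same when piped in as they do interactively, which the \t example earlier suggests they do.
$ { echo "\timing on"
    for run in 1 2 3 4 5; do
        echo "SELECT cab_type, count(*) FROM trips GROUP BY cab_type;"
    done; } | \
    mapdql mapd -u $MAPD_USERNAME -p $MAPD_PASSWORD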
The following completed in 0.021 seconds.
SELECT cab_type,
count(*)
FROM trips
GROUP BY cab_type;
The following completed in 0.053 seconds.
SELECT passenger_count,
avg(total_amount)
FROM trips
GROUP BY passenger_count;
The following completed in 0.165 seconds.
SELECT passenger_count,
extract(year from pickup_datetime) AS pickup_year,
count(*)
FROM trips
GROUP BY passenger_count,
pickup_year;
The following completed in 0.51 seconds.
SELECT passenger_count,
extract(year from pickup_datetime) AS pickup_year,
cast(trip_distance as int) AS distance,
count(*) AS the_count
FROM trips
GROUP BY passenger_count,
pickup_year,
distance
ORDER BY pickup_year,
the_count desc;
It's incredible to see how much of an improvement Nvidia's Pascal-based Titan X offers. 21 milliseconds for the first query is the fastest I've ever seen this dataset queried on any platform with any piece of software. Such a well-tuned architecture, combined with the better data locality of PCIe slots rather than a network, has given me query times 74x faster than the fastest CPU-based systems I've ever come across.
GPUs for non-realtime, OLAP-related workloads might seem like a luxury, but almost every client and prospect I've spoken with this summer in the US and here in Europe has expressed a desire to integrate Deep Learning into their data pipelines. Given that those workloads need GPUs anyway, it makes sense to accelerate OLAP and visualisation workloads at the same time and take maximum advantage of the hardware.
I've begun asking people in my network which pieces of software they're planning on removing from their stacks in the next two years. While the answers varied, it's clear that GPU-powered applications are on the rise: these clients anticipate investing in GPU technologies while divesting from older, legacy compute solutions.