During the summer I benchmarked BrytlytDB, a GPU-powered database. The software performed well and since then version 1.0 has sat in seventh position on my benchmarks recap board. This week I've had the chance to benchmark version 2.0, this time on two p2.16xlarge AWS EC2 instances. Previously a number of manual steps were needed to get the software up and running, but those steps have largely been removed thanks to an AWS Marketplace page where you can spin up a BrytlytDB cluster on the instance sizes and counts of your choice with a few clicks.
For those unfamiliar with BrytlytDB, it's a GPU-powered database built on top of PostgreSQL. Joins, stored procedures, ODBC (important for Tableau users) and the PostgreSQL wire protocol all work as expected. When queries are run, their execution plans are put together by PostgreSQL's regular codebase before being handed to BrytlytDB for execution on one or more of Nvidia's CUDA-powered GPUs. The GPUs don't need to live on the same machine either; BrytlytDB, as this benchmark will show, can work with a cluster of machines, each hosting its own discrete GPUs, connected via a regular network.
In this benchmark I'll see how well the new version 2.0 of BrytlytDB runs on 32 Tesla K80 GPUs spread across two EC2 instances when querying 1.1 billion taxi trips. I'll be using the same dataset I've used to benchmark Amazon Athena, BigQuery, ClickHouse, Elasticsearch, EMR, kdb+/q, MapD, PostgreSQL, Redshift and Vertica. I have a single-page summary of all these benchmarks for comparison.
A GPU-Powered AWS EC2 Cluster
For this benchmark I'll be using two p2.16xlarge EC2 instances running Ubuntu 16.04.2 LTS in Amazon Web Services' us-east-1 region (availability zone us-east-1a) in Northern Virginia. Each machine has 8 Nvidia K80 cards, each carrying 2 GPUs, along with 64 virtual CPUs and 732 GB of memory. There's also 20 Gbit/s of networking capacity available to each instance.
Below are the specifications of the compute capabilities available on each of the EC2 instances.
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping: 1
CPU MHz: 1230.230
CPU max MHz: 3000.0000
CPU min MHz: 1200.0000
BogoMIPS: 4600.08
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-15,32-47
NUMA node1 CPU(s): 16-31,48-63
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq monitor est ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt ida
This is the layout of the GPUs available on one of the two EC2 instances.
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.98 Driver Version: 384.98 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:0F.0 Off | 0 |
| N/A 65C P0 57W / 149W | 11240MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 00000000:00:10.0 Off | 0 |
| N/A 54C P0 73W / 149W | 11240MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | 00000000:00:11.0 Off | 0 |
| N/A 73C P0 61W / 149W | 11240MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | 00000000:00:12.0 Off | 0 |
| N/A 55C P0 71W / 149W | 11240MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla K80 Off | 00000000:00:13.0 Off | 0 |
| N/A 71C P0 60W / 149W | 11240MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla K80 Off | 00000000:00:14.0 Off | 0 |
| N/A 56C P0 74W / 149W | 11240MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla K80 Off | 00000000:00:15.0 Off | 0 |
| N/A 74C P0 61W / 149W | 11240MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla K80 Off | 00000000:00:16.0 Off | 0 |
| N/A 57C P0 71W / 149W | 11240MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 8 Tesla K80 Off | 00000000:00:17.0 Off | 0 |
| N/A 65C P0 59W / 149W | 11240MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 9 Tesla K80 Off | 00000000:00:18.0 Off | 0 |
| N/A 53C P0 72W / 149W | 11240MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 10 Tesla K80 Off | 00000000:00:19.0 Off | 0 |
| N/A 72C P0 61W / 149W | 11240MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 11 Tesla K80 Off | 00000000:00:1A.0 Off | 0 |
| N/A 57C P0 71W / 149W | 11240MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 12 Tesla K80 Off | 00000000:00:1B.0 Off | 0 |
| N/A 70C P0 59W / 149W | 11240MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 13 Tesla K80 Off | 00000000:00:1C.0 Off | 0 |
| N/A 57C P0 74W / 149W | 11240MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 14 Tesla K80 Off | 00000000:00:1D.0 Off | 0 |
| N/A 74C P0 58W / 149W | 11240MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 15 Tesla K80 Off | 00000000:00:1E.0 Off | 0 |
| N/A 57C P0 71W / 149W | 11240MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
The first machine has an IP address of 52.91.174.189 and acts as the Coordinator and host to the first 16 data nodes.
The second machine has an IP address of 34.235.138.233 and hosts the second set of 16 data nodes.
TCP port 5432 is open between the two machines for communication via PostgreSQL's wire protocol, TCP port 7777 is open for global transaction manager communication and TCP ports 15432 through 20000 are open for data node communication.
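If you're replicating this setup by hand rather than via the Marketplace, the firewall rules might look something like the following AWS CLI sketch. The security group ID is a placeholder and I'm assuming both instances share a single security group so traffic can be allowed group-to-group; the rules the Marketplace actually sets up may differ.

```shell
# Placeholder security group shared by both instances.
SG=sg-0123456789abcdef0

# PostgreSQL wire protocol between the machines.
aws ec2 authorize-security-group-ingress \
    --group-id "$SG" --protocol tcp --port 5432 --source-group "$SG"

# Global transaction manager communication.
aws ec2 authorize-security-group-ingress \
    --group-id "$SG" --protocol tcp --port 7777 --source-group "$SG"

# Data node communication.
aws ec2 authorize-security-group-ingress \
    --group-id "$SG" --protocol tcp --port 15432-20000 --source-group "$SG"
```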
In addition to the 32 GB EBS boot volume on each EC2 instance there are six 500 GB General Purpose SSDs attached, which offer a baseline of 1,500 IOPS each and can burst to 3,000 IOPS if need be. These drives are set up in a RAID 0 configuration on each instance, giving each instance around 2.9 TB of formatted storage capacity on top of the boot drive. The Marketplace AMI sets up the RAID array automatically.
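For reference, a RAID 0 array like this one could be assembled by hand with something along these lines. The device names and mount point are taken from the lsblk output below; treat this as a sketch of the general approach rather than the AMI's exact commands.

```shell
# Stripe the six 500 GB volumes into a single array (run as root).
mdadm --create /dev/md0 --level=0 --raid-devices=6 \
      /dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde /dev/xvdf /dev/xvdg

# Format the array and mount it where BrytlytDB keeps its data.
mkfs.ext4 /dev/md0
mkdir -p /usr/local/data
mount /dev/md0 /usr/local/data
```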
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 32G 0 disk
└─xvda1 202:1 0 32G 0 part /
xvdb 202:16 0 500G 0 disk
└─md0 9:0 0 3T 0 raid0 /usr/local/data
xvdc 202:32 0 500G 0 disk
└─md0 9:0 0 3T 0 raid0 /usr/local/data
xvdd 202:48 0 500G 0 disk
└─md0 9:0 0 3T 0 raid0 /usr/local/data
xvde 202:64 0 500G 0 disk
└─md0 9:0 0 3T 0 raid0 /usr/local/data
xvdf 202:80 0 500G 0 disk
└─md0 9:0 0 3T 0 raid0 /usr/local/data
xvdg 202:96 0 500G 0 disk
└─md0 9:0 0 3T 0 raid0 /usr/local/data
BrytlytDB Up & Running
The BrytlytDB software is installed and running when the EC2 instances launch, but since I'll be using the two machines as a cluster I'll register the second instance's data nodes on the first instance using the brytlyt PostgreSQL user account.
$ vi register.sql
CREATE NODE d16 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15432);
CREATE NODE d17 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15433);
CREATE NODE d18 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15434);
CREATE NODE d19 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15435);
CREATE NODE d20 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15436);
CREATE NODE d21 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15437);
CREATE NODE d22 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15438);
CREATE NODE d23 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15439);
CREATE NODE d24 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15440);
CREATE NODE d25 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15441);
CREATE NODE d26 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15442);
CREATE NODE d27 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15443);
CREATE NODE d28 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15444);
CREATE NODE d29 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15445);
CREATE NODE d30 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15446);
CREATE NODE d31 WITH (HOST='34.235.138.233', TYPE='datanode', PORT=15447);
SELECT pgxc_pool_reload();
$ psql -U brytlyt -f register.sql
Once that's run I can check that the data nodes have all registered.
SELECT * FROM pgxc_node;
node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id
----------+-----------+-----------+----------------+----------------+------------------+-------------
c0 | C | 5432 | localhost | f | f | 546080270
d0 | D | 15432 | 127.0.0.1 | f | f | -2088056588
d1 | D | 15433 | 127.0.0.1 | f | f | 342786568
d2 | D | 15434 | 127.0.0.1 | f | f | 823431008
d3 | D | 15435 | 127.0.0.1 | f | f | 907287964
d4 | D | 15436 | 127.0.0.1 | f | f | -1005442993
d5 | D | 15437 | 127.0.0.1 | f | f | -2088175618
d6 | D | 15438 | 127.0.0.1 | f | f | 653419373
d7 | D | 15439 | 127.0.0.1 | f | f | -2114583294
d8 | D | 15440 | 127.0.0.1 | f | f | -1181003975
d9 | D | 15441 | 127.0.0.1 | f | f | 1701757799
d10 | D | 15442 | 127.0.0.1 | f | f | 2938326
d11 | D | 15443 | 127.0.0.1 | f | f | -340513757
d12 | D | 15444 | 127.0.0.1 | f | f | 899853678
d13 | D | 15445 | 127.0.0.1 | f | f | -1529184863
d14 | D | 15446 | 127.0.0.1 | f | f | 297400288
d15 | D | 15447 | 127.0.0.1 | f | f | -2050394442
d16 | D | 15432 | 34.235.138.233 | f | f | -951284347
d17 | D | 15433 | 34.235.138.233 | f | f | 940304836
d18 | D | 15434 | 34.235.138.233 | f | f | 1871723707
d19 | D | 15435 | 34.235.138.233 | f | f | 1351481950
d20 | D | 15436 | 34.235.138.233 | f | f | -958755150
d21 | D | 15437 | 34.235.138.233 | f | f | 1823411759
d22 | D | 15438 | 34.235.138.233 | f | f | 1397545445
d23 | D | 15439 | 34.235.138.233 | f | f | 68184193
d24 | D | 15440 | 34.235.138.233 | f | f | -965766358
d25 | D | 15441 | 34.235.138.233 | f | f | 1469379654
d26 | D | 15442 | 34.235.138.233 | f | f | -2015650531
d27 | D | 15443 | 34.235.138.233 | f | f | 14563042
d28 | D | 15444 | 34.235.138.233 | f | f | 1604721461
d29 | D | 15445 | 34.235.138.233 | f | f | -557897158
d30 | D | 15446 | 34.235.138.233 | f | f | 1910249780
d31 | D | 15447 | 34.235.138.233 | f | f | -1701917292
Loading 1.1 Billion Trips Into BrytlytDB
I'll first define the table schema for the 1.1 billion taxi trip records.
$ psql -U brytlyt
CREATE FOREIGN TABLE trips (
trip_id INTEGER,
vendor_id VARCHAR(3),
pickup_datetime DATE,
dropoff_datetime DATE,
store_and_fwd_flag VARCHAR(1),
rate_code_id INTEGER,
pickup_longitude DOUBLE PRECISION,
pickup_latitude DOUBLE PRECISION,
dropoff_longitude DOUBLE PRECISION,
dropoff_latitude DOUBLE PRECISION,
passenger_count INTEGER,
trip_distance DOUBLE PRECISION,
fare_amount DOUBLE PRECISION,
extra DOUBLE PRECISION,
mta_tax DOUBLE PRECISION,
tip_amount DOUBLE PRECISION,
tolls_amount DOUBLE PRECISION,
ehail_fee DOUBLE PRECISION,
improvement_surcharge DOUBLE PRECISION,
total_amount DOUBLE PRECISION,
payment_type VARCHAR(10),
trip_type INTEGER,
pickup VARCHAR(50),
dropoff VARCHAR(50),
cab_type VARCHAR(6),
precipitation INTEGER,
snow_depth INTEGER,
snowfall INTEGER,
max_temperature INTEGER,
min_temperature INTEGER,
average_wind_speed INTEGER,
pickup_nyct2010_gid INTEGER,
pickup_ctlabel VARCHAR(10),
pickup_borocode INTEGER,
pickup_boroname VARCHAR(13),
pickup_ct2010 VARCHAR(6),
pickup_boroct2010 VARCHAR(7),
pickup_cdeligibil VARCHAR(1),
pickup_ntacode VARCHAR(4),
pickup_ntaname VARCHAR(56),
pickup_puma VARCHAR(4),
dropoff_nyct2010_gid INTEGER,
dropoff_ctlabel VARCHAR(10),
dropoff_borocode INTEGER,
dropoff_boroname VARCHAR(13),
dropoff_ct2010 VARCHAR(6),
dropoff_boroct2010 VARCHAR(7),
dropoff_cdeligibil VARCHAR(1),
dropoff_ntacode VARCHAR(4),
dropoff_ntaname VARCHAR(56),
dropoff_puma VARCHAR(4)
) SERVER gm_fdw_server
OPTIONS (max_size '35000000',
is_random 'false',
index '10,24')
DISTRIBUTE BY roundrobin;
Note that in the above there are indices on the passenger count and cab type columns (columns 10 and 24, zero-indexed, in the index option). Also, the timestamp columns use the DATE column type; I've been told DATETIME fields will be supported in a future version of the software. This is important to take into consideration as it means a loss of precision: only the dates, and not the times, of the journeys can be queried.
Also note that for several columns I had to use the larger INTEGER type instead of the SMALLINT type I normally use, due to the current level of column type support in the software.
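To make the DATE limitation concrete: calendar-level queries like the ones in this benchmark still work as usual, but anything needing the time of day can't be recovered. A hypothetical illustration:

```sql
-- Fine: the calendar date survives in a DATE column.
SELECT extract(year from pickup_datetime),
       count(*)
FROM trips
GROUP BY 1;

-- Not possible with this schema: the time component was
-- dropped at load time, so an hour-of-day breakdown such as
--   extract(hour from pickup_datetime)
-- has nothing to work with.
```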
For this benchmark I've downloaded and decompressed the 500 GB of CSV data I created in my Billion Taxi Rides in Redshift blog post into the /usr/local/data folder on the first EC2 instance. The following command will load the data onto the two EC2 instances. It took around 50 minutes to complete.
count=0
for filename in /usr/local/data/xa*; do
psql -c "EXECUTE DIRECT ON ( d$count )
'SELECT load_data(''trips'',
''$filename'',
'','',
1000000,
''0'',
'''',
''overwrite'')'" 2>&1 &
psql -c "EXECUTE DIRECT ON ( d$(($count+16)) )
'SELECT load_data(''trips'',
''$filename'',
'','',
1000000,
''0'',
'''',
''overwrite'')'" 2>&1 &
(( count++ ))
done

wait # Block until all of the backgrounded load jobs have finished.
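Once the loaders finish it's worth sanity-checking that the data arrived; something like the following, where the cluster-wide count should match the 1.1 billion trips in the dataset if every file loaded cleanly:

```sql
-- Total rows across all 32 data nodes.
SELECT count(*) FROM trips;

-- Rows held by an individual data node, e.g. the first one.
EXECUTE DIRECT ON (d0) 'SELECT count(*) FROM trips';
```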
Benchmarking BrytlytDB 2.0
The times quoted below are the lowest query times seen during a series of runs. As with all my benchmarks, I use the lowest query time as a way of indicating "top speed".
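The timing method can be sketched as a small shell helper that runs the same query repeatedly and keeps the fastest wall-clock duration. The helper name fastest_run is my own; in practice I read the figures off psql's \timing output, so this is just an illustration of the "top speed" idea.

```shell
# Run a command a given number of times and print the fastest
# wall-clock duration seen, in milliseconds.
fastest_run () {
    runs=$1; shift
    best=""
    for _ in $(seq "$runs"); do
        start=$(date +%s%N)
        "$@" > /dev/null 2>&1
        end=$(date +%s%N)
        elapsed=$(( (end - start) / 1000000 ))
        if [ -z "$best" ] || [ "$elapsed" -lt "$best" ]; then
            best=$elapsed
        fi
    done
    echo "$best"
}

# Example: fastest of five runs of query 1.
# fastest_run 5 psql -U brytlyt \
#     -c "SELECT cab_type, count(*) FROM trips GROUP BY cab_type;"
```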
$ psql -U brytlyt
\timing on
The following completed in 0.009 seconds.
SELECT cab_type,
count(*)
FROM trips
GROUP BY cab_type;
The following completed in 0.027 seconds.
SELECT passenger_count,
avg(total_amount)
FROM trips
GROUP BY passenger_count;
The following completed in 0.287 seconds.
SELECT passenger_count,
extract(year from pickup_datetime) AS pickup_year,
count(*)
FROM trips
GROUP BY passenger_count,
pickup_year;
The following completed in 0.428 seconds.
SELECT passenger_count,
extract(year from pickup_datetime) AS pickup_year,
cast(trip_distance as int) AS distance,
count(*) AS the_count
FROM trips
GROUP BY passenger_count,
pickup_year,
distance
ORDER BY pickup_year,
the_count desc;
BrytlytDB is now the fastest database I've benchmarked. Query 1 can run 111 times in a second; no other database I've benchmarked has been able to perform at that rate on any hardware I've been given access to. The 84x speed-up in query 1's performance compared to version 1.0 of the software is absolutely incredible.
BrytlytDB is built on top of PostgreSQL and supports all of its features. I thought this would somehow make it difficult to optimise, but I was wrong. It's incredible to think that the fastest OLAP setup I've come across is effectively PostgreSQL running on a few GPUs.
I'm genuinely impressed to see the installation process more or less disappear as well. This does a lot to lower the barrier to getting people using GPU-powered databases. If spot instances are used on AWS, the cost of querying billions of rows of data could be no more than a few dollars an hour.