
Posted on Sat 26 November 2016 under Databases

Alenka: A GPU-Driven, Open Source Database

At the beginning of October I began looking at an open source, GPU-driven database called Alenka. Its primary developer, Anton, has been working on it for about four years. Over the following eight weeks, Anton was kind enough to provide guidance on using the software as well as fixes for various bugs I uncovered during my testing.

Alenka uses Nvidia's Thrust library's stable_sort_by_key for sorting, copy_if for filtering, copy_if and transform for grouping and, until recently, used ModernGPU's RelationalJoin for joining records.

The software runs on CentOS and Ubuntu, and some users have reported getting it to run on Mac OS X. As of this writing it doesn't yet run on Windows since ModernGPU, a library Alenka relies on, has yet to be ported to the latest version of Visual Studio and Nvidia's CUDA 8.

Installing Dependencies

The following was run on a fresh Ubuntu 16.04.1 LTS installation. The machine I'm using has an Nvidia GeForce GTX 1080 graphics card which comes with 8 GB of GDDR5X memory, an Intel Core i5 4670K clocked at 3.4 GHz, 32 GB of system RAM, a 960 GB SSD and a second, 3 TB mechanical drive which is used to store the 1.1-billion-record taxi trips dataset I use in my benchmarks.

I'll first install a few dependencies to support Alenka and the GPU capabilities of my system.

$ sudo apt update
$ sudo apt install \
    freeglut3-dev \
    g++-4.9 \
    gcc-4.9 \
    libglu1-mesa-dev \
    libx11-dev \
    libxi-dev \
    libxmu-dev \
    nvidia-modprobe \
    bison \
    flex

When I started looking at Alenka in October, the 367 driver from Nvidia seemed to work the best with my GTX 1080 card and Ubuntu 16.04.

$ sudo apt purge nvidia-*
$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt update
$ sudo apt install nvidia-367

Throughout these past eight weeks Nvidia has continued to release newer drivers but I've kept to 367 as it seems stable. When I last checked, 367.57 was the latest sub-revision of the 367 driver.

With the driver and its dependencies installed I'll reboot the system.

$ sudo reboot

I've set GCC 4.9 to be the default compiler version used on this system.

$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9 20
$ sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.9 20

Next I'll download the 64-bit version of the CUDA 8 platform distribution for Ubuntu 16.04.

$ curl -o cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64.deb \
    https://developer.nvidia.com/compute/cuda/8.0/prod/local_installers/cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64-deb
$ sudo dpkg -i cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64.deb
$ sudo apt update
$ sudo apt install cuda

I'll then add the environment variables for the CUDA platform to my .bashrc file.

$ echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
$ echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
$ source ~/.bashrc

I can now run the CUDA compiler that's been installed.

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44

Compiling Alenka

I'll first clone the Alenka repository from GitHub.

$ cd ~
$ git clone https://github.com/antonmks/Alenka.git

As of this writing I'm using the d05ecd revision. I suggest always using the latest master revision of this software as each commit brings a lot of improvements but, for the sake of reproducibility, I'm including the revision I used while working on this blog post.

$ cd Alenka
$ git rev-parse HEAD
d05ecdf7f9d2e77b16d48b6bb7d5bc7da948789d

Alenka's Makefile includes configuration for various levels of CUDA compute capability. If you're using a Maxwell-series card from Nvidia, compute_50, compute_52 and compute_53 should work. I'm using the GTX 1080, a Pascal-series card that supports compute capability 6.1. Below are the modifications I've made to Alenka's Makefile.

$ vi Makefile
# GENCODE_SM30  := -gencode arch=compute_30,code=sm_30
# GENCODE_SM35  := -gencode arch=compute_35,code=sm_35
# GENCODE_SM50  := -gencode arch=compute_50,code=sm_50
GENCODE_SM61    := -gencode arch=compute_61,code=sm_61
GENCODE_FLAGS   := $(GENCODE_SM30) $(GENCODE_SM35) $(GENCODE_SM50) $(GENCODE_SM61)

Alenka relies on ModernGPU for some of its functionality so I'll clone it inside Alenka's code repository. This way the ModernGPU library path flag in Alenka's Makefile doesn't need to be pointed elsewhere.

$ git clone https://github.com/moderngpu/moderngpu.git
$ cd moderngpu

Again, for the sake of reproducibility, I'm using the d78a5f commit of ModernGPU.

$ git rev-parse HEAD
d78a5f9495f055c8eef3199fef8950e54b631088

Building ModernGPU is as straightforward as calling make.

$ make

I'll create a small piece of code to test that ModernGPU is working properly with my card.

$ vi hello.cu
#include <moderngpu/transform.hxx>

using namespace mgpu;

int main(int argc, char** argv) {
  // The context encapsulates things like an allocator and a stream.
  // By default it prints device info to the console.
  standard_context_t context;

  // Launch five threads to greet us.
  transform([]MGPU_DEVICE(int index) {
    printf("Hello GPU from thread %d\n", index);
  }, 5, context);

  // Synchronize on the context's stream to send the output to the console.
  context.synchronize();

  return 0;
}
$ nvcc \
      -std=c++11 \
      --expt-extended-lambda \
      -gencode arch=compute_61,code=compute_61 \
      -I ./src/ \
      -o hello \
      hello.cu
$ ./hello
GeForce GTX 1080 : 1835.000 Mhz   (Ordinal 0)
20 SMs enabled. Compute Capability sm_61
FreeMem:   6678MB   TotalMem:   8110MB   64-bit pointers.
Mem Clock: 5005.000 Mhz x 256 bits   (320.3 GB/s)
ECC Disabled


Hello GPU from thread 0
Hello GPU from thread 1
Hello GPU from thread 2
Hello GPU from thread 3
Hello GPU from thread 4

With that working I'll change directory up one level and compile Alenka.

$ cd ..
$ make -j$(nproc)

The above completed in 43 minutes.

Importing 1.1 Billion Taxi Trips

I'll be importing the 104 GB of CSV data I created in my Billion Taxi Rides in Redshift blog post. This data sits in 56 gzip files and decompresses into around 500 GB of raw CSV data. The files live on a 3 TB mechanical drive that is mounted at /media/mark/Archive2/ on my system.

Alenka doesn't support importing data from gzip files so I'll create a loop that takes each gzip file, decompresses it into a file called data.csv, imports that file into a table called 'trips' in Alenka and repeats the process for the remaining gzip files. The data will live on my 960 GB SSD drive once it's in Alenka's internal storage format.

$ mkdir -p ~/taxis && cd ~/taxis
$ vi load.sql
A  :=  LOAD 'data.csv' USING (',') AS (
    trip_id{1}:int,
    vendor_id{2}:varchar(3) NO ENCODING,

    pickup_datetime{3}:int,
    dropoff_datetime{4}:int,
    store_and_fwd_flag{5}:varchar(1) NO ENCODING,
    rate_code_id{6}:int,
    pickup_longitude{7}:decimal(14,2),
    pickup_latitude{8}:decimal(14,2),
    dropoff_longitude{9}:decimal(14,2),
    dropoff_latitude{10}:decimal(14,2),
    passenger_count{11}:int,
    trip_distance{12}:decimal(14,2),
    fare_amount{13}:decimal(14,2),
    extra{14}:decimal(14,2),
    mta_tax{15}:decimal(14,2),
    tip_amount{16}:decimal(14,2),
    tolls_amount{17}:decimal(14,2),
    ehail_fee{18}:decimal(14,2),
    improvement_surcharge{19}:decimal(14,2),
    total_amount{20}:decimal(14,2),
    payment_type{21}:varchar(3) NO ENCODING,
    trip_type{22}:int,
    pickup{23}:varchar(50) NO ENCODING,
    dropoff{24}:varchar(50) NO ENCODING,

    cab_type{25}:varchar(6) NO ENCODING,

    precipitation{26}:int,
    snow_depth{27}:int,
    snowfall{28}:int,
    max_temperature{29}:int,
    min_temperature{30}:int,
    average_wind_speed{31}:int,

    pickup_nyct2010_gid{32}:int,
    pickup_ctlabel{33}:varchar(10) NO ENCODING,
    pickup_borocode{34}:int,
    pickup_boroname{35}:varchar(13) NO ENCODING,
    pickup_ct2010{36}:varchar(6) NO ENCODING,
    pickup_boroct2010{37}:varchar(7) NO ENCODING,
    pickup_cdeligibil{38}:varchar(1) NO ENCODING,
    pickup_ntacode{39}:varchar(4) NO ENCODING,
    pickup_ntaname{40}:varchar(56) NO ENCODING,
    pickup_puma{41}:varchar(4) NO ENCODING,

    dropoff_nyct2010_gid{42}:int,
    dropoff_ctlabel{43}:varchar(10) NO ENCODING,
    dropoff_borocode{44}:int,
    dropoff_boroname{45}:varchar(13) NO ENCODING,
    dropoff_ct2010{46}:varchar(6) NO ENCODING,
    dropoff_boroct2010{47}:varchar(7) NO ENCODING,
    dropoff_cdeligibil{48}:varchar(1) NO ENCODING,
    dropoff_ntacode{49}:varchar(4) NO ENCODING,
    dropoff_ntaname{50}:varchar(56) NO ENCODING,
    dropoff_puma{51}:varchar(4) NO ENCODING
);
STORE A INTO 'trips' APPEND BINARY;
$ for filename in /media/mark/Archive2/Taxi\ Data/20M\ blocks/trips_x*.csv.gz; do
      gunzip -c "$filename" > data.csv
      ~/Alenka/alenka -l 200 load.sql
  done

During the import I could see multiple gigabytes of memory being used.

top - 21:16:57 up  1:20,  1 user,  load average: 3,22, 2,27, 2,01
Tasks: 220 total,   3 running, 217 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0,9 us,  2,2 sy,  0,0 ni,  7,0 id, 90,0 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu1  :  1,3 us,  1,3 sy,  0,0 ni, 54,8 id, 42,5 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu2  :  0,9 us,  2,2 sy,  0,0 ni,  0,4 id, 96,5 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu3  : 51,1 us,  6,9 sy,  0,0 ni,  0,0 id, 40,7 wa,  0,0 hi,  1,3 si,  0,0 st
KiB Mem : 32824752 total,   235308 free,  2173876 used, 30415568 buff/cache
KiB Swap: 33430524 total, 33225596 free,   204928 used. 29118716 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 7418 mark      20   0 15,968g 1,999g 992,6m R  58,9  6,4   0:56.58 alenka

And the nvidia-smi tool showed 1,533 MB of GPU memory being used by Alenka.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:02:00.0      On |                  N/A |
| 27%   53C    P2    57W / 200W |   2655MiB /  8110MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0       932    G   /usr/lib/xorg/Xorg                             727MiB |
|    0      1685    G   compiz                                         390MiB |
|    0      8047    C   /home/mark/Alenka/alenka                      1533MiB |
+-----------------------------------------------------------------------------+

The import completed after 3 hours and 9 minutes.

Benchmarking Alenka

I am keen to see how fast Alenka performs with the four benchmark queries I've run on various big data systems this year. As of this writing there are some issues to iron out in order for all the queries to execute properly. I'll describe how far Anton and I have come with each of them.

Query 1:

A := SELECT cab_type AS type_of_cab,
            COUNT(cab_type) AS cnt
     FROM trips
     GROUP BY cab_type;

DISPLAY A USING ('|');

This query runs for a few minutes with high CPU and memory consumption. It should be finishing in seconds at most so something is going astray. Work is being carried out to fix this issue and, fingers crossed, at some point in the future I'll be able to provide a benchmark time for it.

top - 07:43:49 up 11:47,  1 user,  load average: 0,95, 0,48, 0,19
Tasks: 219 total,   2 running, 217 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0,2 us,  0,2 sy,  0,0 ni, 99,6 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu1  :  0,2 us,  0,4 sy,  0,0 ni, 99,4 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu2  :  0,2 us,  0,1 sy,  0,0 ni, 99,7 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu3  :  0,1 us, 99,9 sy,  0,0 ni,  0,0 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
KiB Mem : 32824752 total,   640440 free,  2443244 used, 29741068 buff/cache
KiB Swap: 33430524 total, 32682616 free,   747908 used. 27141564 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1419 mark      20   0 16,731g 3,764g 2,639g R 100,0 12,0   2:50.50 alenka

Query 2:

A := SELECT passenger_count AS pac,
            AVG(total_amount) AS ta,
            COUNT(passenger_count) AS cnt
     FROM trips
     GROUP BY passenger_count;

DISPLAY A USING ('|');

This query completes in 15.43 seconds. I've yet to audit the results but, for the record, here they are:

|37 |14.1900    |1  |
|208 |7.0416    |1508   |
|19 |5.0000 |1  |
|137 |59.6400   |1  |
|38 |7.2900 |1  |
|158 |14.4400   |1  |
|255 |17.9890   |10 |
|249 |9.5000    |1  |
|58 |25.9350    |2  |
|223 |9.5000    |1  |
|33 |8.5950 |2  |
|25 |7.5900 |1  |
|2 |13.7109 |161755340  |
|49 |3.2457 |26 |
|70 |10.6900    |1  |
|155 |90.2300   |1  |
|113 |13.3000   |1  |
|125 |16.6000   |1  |
|0 |10.6676 |3902029    |
|34 |16.8000    |1  |
|250 |12.5666   |3  |
|163 |15.5300   |1  |
|97 |9.9000 |1  |
|177 |17.0000   |1  |
|6 |14.3061 |23796601   |
|211 |7.0000    |1  |
|254 |6.5000    |1  |
|8 |25.4042 |876    |
|7 |25.7082 |913    |
|5 |13.1016 |77761602   |
|129 |8.7857    |7  |
|165 |12.1400   |1  |
|53 |7.2900 |1  |
|134 |55.1400   |1  |
|133 |10.3000   |1  |
|66 |19.3000    |1  |
|84 |43.8400    |1  |
|3 |13.3259 |48313914   |
|225 |16.0000   |1  |
|141 |18.9400   |1  |
|69 |5.7900 |1  |
|4 |13.4157 |23325370   |
|36 |61.5400    |1  |
|61 |31.3400    |1  |
|1 |13.1786 |772743590  |
|13 |31.5000    |1  |
|213 |2.5000    |4  |
|65 |23.3600    |3  |
|17 |39.9500    |1  |
|247 |19.4400   |1  |
|47 |9.0000 |1  |
|9 |41.7145 |422    |
|10 |42.4800    |16 |
|164 |62.1400   |1  |
|160 |15.3400   |1  |
|15 |12.0500    |2  |
|193 |7.5000    |1  |

Query 3:

A := SELECT passenger_count AS pac,
            YEAR(pickup_datetime) AS pickup_year,
            COUNT(passenger_count) AS pc
     FROM trips
     GROUP BY passenger_count,
              pickup_year;

DISPLAY A USING ('|');

This query crashes Alenka after 11.77 seconds with the following complaint:

terminate called after throwing an instance of 'thrust::system::detail::bad_alloc'
  what():  std::bad_alloc: out of memory

The above is an interesting issue. During execution Alenka allocates all of the remaining free memory on the GPU before terminating. I suspect the data is being loaded onto the GPU in one go and there isn't enough memory for the columns of data being worked with. Streaming the data in chunks and combining the partial results could help this query finish properly.

Query 4:

A := SELECT passenger_count AS pac,
            YEAR(pickup_datetime) AS pickup_year,
            CAST_TO_INT(trip_distance) AS distance,
            COUNT(passenger_count) AS the_count
     FROM trips
     GROUP BY passenger_count,
              pickup_year,
              distance;

B := ORDER A BY pickup_year ASC,
                the_count desc;

DISPLAY B USING ('|');

This query crashes Alenka after 19.7 seconds with the same "out of memory" complaint from the Thrust library.

Closing Thoughts

I've read of others seeing good execution times with their datasets on Alenka so I don't think the fact that I haven't yet managed to complete my 1.1-billion-record taxi trips benchmark should put people off trying this software out.

When I first started looking at Alenka I wanted to dig into the architecture of a GPU-driven database. I've seen these last eight weeks as a real learning experience and now I'm hoping for a few things to come of this blog post.

The first objective is that I want to promote GPUs as a data platform.

The second objective is to encourage people to build data tools that run on GPUs as I think there is a lot of room for innovation in this space.

The third objective is to repay Anton with a shout-out for all his help and hard work over the past eight weeks. He was coding up patches at all hours on the weekends and was responsive to every one of the emails I flooded his inbox with. Anton works as a freelancer in Minsk, Belarus, doing various pieces of work for clients around the world. If you like the look of his work I suggest you get in contact with him to see if he can help you.

Thank you for taking the time to read this post. I offer both consulting and hands-on development services to clients in North America and Europe. If you'd like to discuss how my offerings can help your business please contact me via LinkedIn.

Copyright © 2014 - 2024 Mark Litwintschik. This site's template is based off a template by Giulio Fidente.