Update: MapD rebranded as OmniSci in 2018.
I was blown away when I recently heard MapD was going to make the source code for their GPU-powered database freely available on GitHub. MapD has always dominated the top of my benchmarks recap board but, up until now, if you wanted to use it you needed to buy a commercial license or run MapD's AMI on AWS. Now anyone can compile the database from source and run it on any machine with as many GPUs as they'd like, or take the compiled binaries and run them on any GPU-backed AWS, Google Cloud or Azure instance.
MapD can easily run workloads two orders of magnitude quicker than many other popular analytics engines I've worked with and it comes with a web-based charting and query interface so I suspect this news is going to cause an earthquake in the data world. Now that the cost barrier has been removed more developers can explore MapD and I expect its deployment numbers to grow like never before. Anyone running an Nvidia GPU on Linux can now compile, run and analyse the source code of the most advanced GPU-driven database I've worked with to date.
This should also be a big win for Nvidia as MapD uses their CUDA platform and GPU hardware to achieve its performance. That said, although MapD relies on Nvidia GPUs for its speed, the software will run without a GPU present. On a GPU-less machine the Nvidia driver will complain that no devices were found and MapD will fall back to CPU mode. I haven't benchmarked CPU mode so I can't comment on the performance penalty, but MapD otherwise functions without issue.
In this blog post I'll walk through compiling and running MapD from source. As a heads up, if you're following along and you run into any issues please do head over to the MapD community forum to try and get your questions answered.
My Hardware & OS Setup
I'm using a machine with an Intel Core i5 4670K clocked at 3.4 GHz, 8 GB of DDR3 RAM, a SanDisk SDSSDHII960G 960 GB SSD drive and an Nvidia GTX 1080 running on a fresh install of Ubuntu 16.04.2 Server LTS. I've picked this version of Ubuntu as it will be supported until April 2021.
Installing MapD's Dependencies
I'll start by enabling the source code repositories in apt's sources list.
$ sudo sed -i -- \
    's/# deb-src/deb-src/g' \
    /etc/apt/sources.list
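If you'd rather see what that substitution will do before touching the real file, the sketch below counts how many deb-src entries a sources list would have afterwards. The `count_deb_src` helper is a name I've made up for illustration. It runs sed without `-i`, so nothing is modified.

```shell
# Hypothetical helper: count the deb-src entries a file would contain after
# the substitution, without modifying the file itself.
count_deb_src() {
    sed 's/# deb-src/deb-src/g' "$1" | grep -c '^deb-src'
}

# Preview against the real sources list when it is readable.
if [ -r /etc/apt/sources.list ]; then
    echo "deb-src entries after edit: $(count_deb_src /etc/apt/sources.list)"
fi
```

A count of zero would suggest the file has no commented-out deb-src lines in the form the sed expression expects.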
I'll then refresh apt's sources lists and install 39 packages.
$ sudo apt update
$ sudo apt install \
    autoconf \
    autoconf-archive \
    binutils-dev \
    bison++ \
    bisonc++ \
    build-essential \
    clang-3.8 \
    clang-format-3.8 \
    cmake \
    cmake-curses-gui \
    default-jdk \
    default-jdk-headless \
    default-jre \
    default-jre-headless \
    flex \
    git-core \
    golang \
    google-perftools \
    libboost-all-dev \
    libcurl4-openssl-dev \
    libdouble-conversion-dev \
    libevent-dev \
    libgdal-dev \
    libgflags-dev \
    libgoogle-glog-dev \
    libgoogle-perftools-dev \
    libiberty-dev \
    libjemalloc-dev \
    libldap2-dev \
    liblz4-dev \
    liblzma-dev \
    libncurses5-dev \
    libpng-dev \
    libsnappy-dev \
    libssl-dev \
    llvm-3.8 \
    llvm-3.8-dev \
    maven \
    zlib1g-dev
I'll then download and install version 8.0 of Nvidia's CUDA Toolkit. Among other things, this toolkit installs graphics card drivers, which will replace any drivers already installed.
$ curl -L -O https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb
$ sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb
$ sudo apt update
$ sudo apt install cuda
With the new drivers in place I'll reboot the system.
$ sudo reboot
Once the system is back up, Nvidia's System Management Interface should display diagnostics for the installed driver and GPU(s).
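The System Management Interface ships as the `nvidia-smi` command. Here is a small sketch that runs it when it's on the PATH and prints a hint otherwise. The `SMI_FOUND` variable is just an illustrative name so later steps can branch on the result.

```shell
# Print driver and GPU diagnostics if Nvidia's tool is on the PATH.
if command -v nvidia-smi >/dev/null 2>&1; then
    # nvidia-smi exits non-zero when it cannot talk to the driver.
    nvidia-smi || echo "nvidia-smi ran but reported a problem"
    SMI_FOUND=yes
else
    echo "nvidia-smi not found; the driver may not be installed"
    SMI_FOUND=no
fi
```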
MapD uses Thrift to communicate between its clients and server. I'll install it from source as the 0.10.0 release of Thrift is known to work well with MapD.
$ sudo apt build-dep thrift-compiler
$ curl -O http://apache.claz.org/thrift/0.10.0/thrift-0.10.0.tar.gz
$ tar xvf thrift-0.10.0.tar.gz
$ pushd thrift-0.10.0
$ ./configure \
    --with-lua=no \
    --with-python=no \
    --with-php=no \
    --with-ruby=no \
    --prefix=/usr/local/mapd-deps
$ make -j $(nproc)
$ sudo make install
$ popd
Folly is a library of C++11 components published by Facebook and is also used by MapD throughout its source code. Below are the steps I ran to compile and build the library from source.
$ curl -O -L https://github.com/facebook/folly/archive/v2017.04.10.00.tar.gz
$ tar xvf v2017.04.10.00.tar.gz
$ pushd folly-2017.04.10.00/folly
$ autoreconf -ivf
$ ./configure \
    --prefix=/usr/local/mapd-deps
$ make -j $(nproc)
$ sudo make install
$ popd
Bison is one of the two libraries used by MapD for generating its SQL parser. Below are the steps I ran to compile and build the library from source.
$ curl -O -L https://github.com/jarro2783/bisonpp/archive/1.21-45.tar.gz
$ tar xvf 1.21-45.tar.gz
$ pushd bisonpp-1.21-45
$ ./configure
$ make -j $(nproc)
$ sudo make install
$ popd
Below I'll make sure we're using the intended version of LLVM's binaries prior to MapD's compilation.
$ for BIN in llvm-config llc clang clang++ clang-format
  do
      sudo update-alternatives \
          --install \
          /usr/bin/$BIN \
          $BIN \
          /usr/lib/llvm-3.8/bin/$BIN \
          1
  done
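To confirm the alternatives took effect, a quick sketch like the one below reports where each of those five binaries resolves on the PATH. The `report_bin` helper is a made-up name for this walk-through, not something MapD or LLVM ships.

```shell
# Hypothetical helper: print where a tool resolves on the PATH, or flag it
# as missing.
report_bin() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 -> $(command -v "$1")"
    else
        echo "$1 -> missing"
    fi
}

# Check the same five binaries registered with update-alternatives.
for BIN in llvm-config llc clang clang++ clang-format; do
    report_bin "$BIN"
done
```

Any line ending in "missing" means that binary isn't reachable yet and the update-alternatives step is worth re-running.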
I'll set up the executable and library path environment variables with the following script.
$ sudo vi /etc/profile.d/mapd-deps.sh
LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
LD_LIBRARY_PATH=/usr/lib/jvm/default-java/jre/lib/amd64/server:$LD_LIBRARY_PATH
LD_LIBRARY_PATH=/usr/local/mapd-deps/lib:$LD_LIBRARY_PATH
LD_LIBRARY_PATH=/usr/local/mapd-deps/lib64:$LD_LIBRARY_PATH

PATH=/usr/local/cuda/bin:$PATH
PATH=/usr/local/mapd-deps/bin:$PATH

export LD_LIBRARY_PATH PATH
$ sudo chmod +x /etc/profile.d/mapd-deps.sh
$ source /etc/profile.d/mapd-deps.sh
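After sourcing the script it's worth checking that the two new PATH entries are actually present in the current shell. The sketch below does that with a small helper, `in_path_list`, which is a name I've invented for illustration.

```shell
# Hypothetical helper: succeed when directory $2 is an entry in the
# colon-separated list $1.
in_path_list() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the two directories the profile script prepends to PATH.
for DIR in /usr/local/cuda/bin /usr/local/mapd-deps/bin; do
    if in_path_list "$PATH" "$DIR"; then
        echo "PATH has $DIR"
    else
        echo "PATH is missing $DIR"
    fi
done
```

The same helper works for LD_LIBRARY_PATH by passing `"$LD_LIBRARY_PATH"` as the first argument.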
I'll clone MapD's core source code repository and check out the 21fc39 commit. It's a good idea to stick to known-good releases and/or the master branch, but for the sake of these instructions working consistently I've pinned this walk-through to that specific commit.
$ git clone https://github.com/mapd/mapd-core.git
$ cd mapd-core
$ git checkout 21fc39
I'll create a build folder for MapD and compile the source code with debugging enabled.
$ mkdir -p ~/mapd-core/build
$ cd ~/mapd-core/build
$ cmake -DCMAKE_BUILD_TYPE=debug ..
$ make -j $(nproc)
MapD Up & Running
With MapD's binaries compiled I'll create a data folder, initialise it and then launch both MapD's database server and its Immerse web server.
$ mkdir ~/mapd-data
$ bin/initdb --data ~/mapd-data
$ bin/mapd_server --data ~/mapd-data &
$ bin/mapd_web_server &
Keep in mind these services are bound to all network interfaces, so make sure TCP ports 9090, 9091 and 9092 are firewalled off from systems you do not want accessing them.
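To see which of those three ports are exposed on every interface, the sketch below filters the output of `ss -ltn` for listeners on a wildcard address. The `wildcard_mapd_ports` helper is a made-up name, and it assumes the usual `ss -ltn` column layout of State, Recv-Q, Send-Q, Local Address:Port and Peer Address:Port.

```shell
# Hypothetical helper: read `ss -ltn`-style lines on stdin and print each of
# the ports 9090, 9091 and 9092 that is listening on a wildcard address.
wildcard_mapd_ports() {
    awk '$1 == "LISTEN" {
        # The port is whatever follows the last colon of the local address.
        n = split($4, parts, ":")
        port = parts[n]
        addr = substr($4, 1, length($4) - length(port) - 1)
        if ((port == "9090" || port == "9091" || port == "9092") &&
            (addr == "0.0.0.0" || addr == "*" || addr == "[::]" || addr == "::"))
            print port
    }' | sort -u
}

# Inspect the live listeners when ss is available.
if command -v ss >/dev/null 2>&1; then
    ss -ltn | wildcard_mapd_ports
fi
```

Any port printed here is reachable from other machines unless a firewall rule blocks it.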
The Immerse web server should now be available on TCP port 9092.
$ open http://127.0.0.1:9092/
There is a link to the SQL editor at the top of the Immerse UI. In there you can execute SQL against MapD. Keep in mind only the first SQL command in the query textbox will be executed, so the following three queries will each need to be run one at a time.
CREATE TABLE testing ( pk INTEGER );
INSERT INTO testing (pk) VALUES (123);
SELECT * FROM testing LIMIT 1;
If you're keen to interact with MapD from the command line, the following will launch the mapdql CLI and connect to the MapD server using the default credentials and database.
$ bin/mapdql -p HyperInteractive
To learn more about setting up databases and users, have a look at MapD's concise guide.