1.1 Billion Taxi Rides with MapD & 8 Nvidia Pascal Titan Xs

I investigate how fast MapD can query 1.1 billion taxi journeys using 8 Nvidia Pascal-based Titan X cards.


TensorFlow on a GTX 1080

I walk through setting up TensorFlow, a Deep Learning Framework, on Ubuntu 16 with an Nvidia GTX 1080 and use it to build "Deep Fizz buzz".


Building a Data Pipeline with Airflow

I walk through setting up a data pipeline for currency exchange rates using Airflow, PostgreSQL and Redis.


1.1 Billion Taxi Rides with MapD & AWS EC2

I investigate how fast MapD can query 1.1 billion taxi journeys using 4 g2.8xlarge EC2 instances.


1.1 Billion Taxi Rides with MapD & 4 Nvidia Titan Xs

I investigate how fast MapD can query 1.1 billion taxi journeys using 4 Nvidia Titan X cards.


1.1 Billion Taxi Rides with MapD & 8 Nvidia Tesla K80s

I investigate how fast MapD can query 1.1 billion taxi journeys using 8 Nvidia Tesla K80 GPU cards.


1.1 Billion Taxi Rides on AWS RDS running PostgreSQL

I investigate how fast a series of graphs can be generated using R across 4 different types of AWS RDS instances.


1.1 Billion Taxi Rides on a Large Redshift Cluster

I investigate how fast a 6-node ds2.8xlarge Redshift Cluster can query over a billion records.


All 1.1 Billion Taxi Rides on Redshift

I investigate how fast a single Redshift ds2.xlarge instance can query over a billion records.


All 1.1 Billion Taxi Rides in Elasticsearch

I look at ways of fitting every column of the 1.1 billion taxi rides into Elasticsearch on a single, 850 GB SSD.


50-node Presto Cluster on Google Cloud's Dataproc

I investigate how fast a 50-node Dataproc cluster queries the metadata of 1.1 billion taxi trips.


Performance Impact of File Sizes on Presto Query Times

I investigate the performance impact of ORC file sizes on Presto query times using Google Cloud's Dataproc service.


Faster IPv4 WHOIS Crawling

I examine the performance and reliability increases from using Redis across a 51-node IPv4 WHOIS crawling cluster.


33x Faster Queries on Google Cloud's Dataproc

I look at speeding up Presto queries on 1.1 billion records run on a 10-node Dataproc cluster.


Mass IP Address WHOIS Collection with Django & Kafka

I investigate how fast a cluster of EC2 instances can collect WHOIS records of IPv4 addresses.


A Billion Taxi Rides: AWS S3 versus HDFS

I investigate the speed differences between S3 and HDFS when querying over a billion records using Presto on AWS EMR.


A Billion Taxi Rides on Google's Dataproc running Presto

I investigate how fast a small Dataproc cluster can query over a billion records using Presto.


50-node Presto Cluster on Amazon EMR

I investigate how fast a 50-node AWS EMR cluster can query over a billion records using Presto.


A Billion Taxi Rides on Google's BigQuery

I investigate how fast BigQuery can query the metadata of 1.1 billion NYC taxi journeys.


Bulk IP Address WHOIS Collection with Python and Hadoop

I investigate how fast a 40-node Hadoop cluster on AWS EMR can collect WHOIS records of IPv4 addresses.


A Billion Taxi Rides in PostgreSQL

I look at query speeds on 1.1 billion records on a single PostgreSQL installation running on an SSD.


A Billion Taxi Rides in Elasticsearch

I investigate how fast a single instance of Elasticsearch can query over a billion records.


A Billion Taxi Rides on Amazon EMR running Spark

I investigate how fast a small AWS EMR cluster can query over a billion records using Spark.


A Billion Taxi Rides on Amazon EMR running Presto

I investigate how fast a small AWS EMR cluster can query over a billion records using Presto.


Kafka Producer Latency with Large Topic Counts

I look at the relationship between topic counts and producer latency with Kafka.


A Billion Taxi Rides in Hive & Presto

Import the metadata of over a billion Yellow and Green Taxi and Uber rides in New York City into ORC-formatted, columnar-based files on HDFS and query them using Hive & Presto.


A Billion Taxi Rides in Redshift

Import the metadata of over a billion Yellow and Green Taxi and Uber rides in New York City into a columnar-based Data Warehouse.


Presto, Parquet & Airpal

Using Airpal to execute queries on Parquet-formatted data via Presto.


A Million Songs on AWS Redshift

Parallel imports of CSV data from AWS S3 into Redshift.


Hadoop Up and Running

I explore three ways to get Hadoop installed and running.


Faster Testing with RAM Drives

Reduce the I/O overhead of running tests in Django.


Popular Airline Passenger Routes

Scraping 29K Wikipedia pages to find the most popular commercial airline passenger routes.


Recommendation Engine built using Spark and Python

An end-to-end guide to building a film recommendation engine.


Tightening Django Admin Logins

A strategy for blocking dictionary attacks and restricting access to a white list of IP addresses.


Linting UK Postcodes

Parsing and linting UK postcodes is rife with edge cases.


Passwords in Django

A review of Django auth's password storage format and password storage upgrading capabilities.


Faster Python

Six tips for speeding up Python code.


Crushing, caching and CDN deployment in Django

A strategy for crushing, caching and deploying front-end-optimised Django sites.


Better Python Package Management

Python's most popular package management tool is pip. I explore some tools to increase its functionality.


Load balancing Django

Set up a load-balanced, two-node Django cluster with a minimal Ansible footprint.


Faster Django Testing

Run Django tests concurrently with pytest-xdist.


Django exception archaeology

How to capture, monitor and analyse exceptions raised from a Django project.


Python's killer apps for blogging: Pelican and S3cmd

I look into the steps of creating a blog using Pelican and hosting it with low-cost CDN services from Amazon with the help of S3cmd.


Collecting all IPv4 WHOIS records in Python

An exploratory effort to see how hard it is to collect all IPv4 WHOIS records.


Former PHP developer

I stopped coding in PHP in 2011. Here are the thoughts that led me to that decision.


File uploads to Amazon S3 in Django

How to upload files to Amazon S3 from a form in Django as well as (very important) how to test the upload process.


IP Address lookups using Python

A comparison of four methods used to find the country of an IP address.


Django speaking JSON

django-jsonview offers a decorator which causes all responses (including exceptions) to be returned in an API-friendly JSON format.


Querying Elasticsearch from Google App Engine

GAE strips HTTP body payloads if sent via HTTP GET. Elasticsearch accepts request bodies sent via HTTP GET. Re-writing the HTTP verb fixes the communications problem.

© Copyright 2014 - 2016 Mark Litwintschik. This site's template is based off a template by Giulio Fidente.