Home | Benchmarks | Categories | Atom Feed

Posted on Thu 23 February 2023 under GIS

European Route Planning

I recently came across a transit route feed aggregator called Transitland. They list feeds from 2,500 operators in 55+ countries around the world. Among these feeds is one for FlixBus, a 12-year-old coach service provider. Below is a route map of their European destinations.

FlixBus Routes in QGIS

In this post, I'll import their feed into PostgreSQL, build visualisations of their routes and plan a bus trip from Vienna to Oslo.

Installing Prerequisites

I've installed Homebrew on my 2020 MacBook Pro and I'll use it to install Python and pgRouting. If PostgreSQL and PostGIS aren't yet installed, pgRouting will install them.

$ brew install \
    gdal \
    jq \
    pgrouting \
    virtualenv

I'll then install csvkit which will be used to import the files from the feed into their respective PostgreSQL tables.

$ virtualenv ~/.flixbus
$ source ~/.flixbus/bin/activate
$ pip install \
    csvkit \
    psycopg2-binary

The FlixBus Dataset

I'll download and unzip the link provided on FlixBus feed page.

$ wget -c http://gtfs.gis.flix.tech/gtfs_generic_eu.zip
$ unzip gtfs_generic_eu.zip

The ZIP file is 40 MB in size and contains the following comma-separated files.

$ (echo "file,bytes,lines,columns";
   ls -S *.txt \
      | xargs -I= \
              bash -c "echo = ',' \`stat -f%z =\` ',' \`wc -l = | grep -o '[0-9]*'\` ',' \`head -n1 = | sed 's/,/\\n/g' | wc -l\`") \
      | csvlook
| file                |       bytes |     lines | columns |
| ------------------- | ----------- | --------- | ------- |
| shapes.txt          | 219,735,424 | 3,943,060 |       4 |
| stop_times.txt      |  12,894,031 |   136,209 |      11 |
| trips.txt           |   1,968,584 |    18,489 |      14 |
| calendar.txt        |   1,081,910 |    18,489 |      10 |
| calendar_dates.txt  |     309,736 |     8,279 |       3 |
| stops.txt           |     177,701 |     1,663 |      13 |
| transfers.txt       |     168,642 |     1,852 |       8 |
| routes.txt          |      54,476 |       810 |      11 |
| agency.txt          |         406 |         4 |       8 |

Populating PostgreSQL

I'll create a database and populate it with the FlixBus feed.

$ createdb flixbus

$ for FILENAME in *.txt; do
      csvsql --db postgresql:///flixbus \
             --insert \
             $FILENAME
  done

Data Enrichment

The geometry describing each bus route is spread across multiple records instead of as single Well-Known Binary (WKB) values for each route. In the example below, there are 2,214 records for a single route.

$ grep -c 16d20d742417722026392c1c4b7d7f3b shapes.txt
2214

I'll enable the pgRouting extension and build a new shapes table with one record per bus route that includes a single column of its geometry.

$ psql flixbus
CREATE EXTENSION pgrouting CASCADE;

CREATE TABLE shapes_geom AS
    WITH shapes_ordered AS (
        SELECT shape_id,
               shape_pt_lon,
               shape_pt_lat
        FROM shapes
        ORDER BY shape_id,
                 shape_pt_sequence
    )
    SELECT
            shape_id,
            GeomFromEWKT(
                CONCAT('SRID=4326;LINESTRING(',
                       string_agg(CONCAT(shape_pt_lon,
                                         ' ',
                                         shape_pt_lat), ','),
                       ')')
            ) geom
    FROM shapes_ordered
    GROUP BY shape_id
    ORDER BY shape_id;

I'll then add constraints to the tables so their relationship can be mapped out with DBeaver.

ALTER TABLE agency      ADD CONSTRAINT uniq_agency_id  UNIQUE (agency_id);
ALTER TABLE routes      ADD CONSTRAINT uniq_routes_id  UNIQUE (route_id);
ALTER TABLE shapes_geom ADD CONSTRAINT uniq_shape_id   UNIQUE (shape_id);
ALTER TABLE stops       ADD CONSTRAINT uniq_stops_id   UNIQUE (stop_id);
ALTER TABLE trips       ADD CONSTRAINT uniq_service_id UNIQUE (service_id);
ALTER TABLE trips       ADD CONSTRAINT uniq_trip_id    UNIQUE (trip_id);

ALTER TABLE calendar
    ADD CONSTRAINT fk_service_id
    FOREIGN KEY (service_id)
    REFERENCES trips (service_id);

ALTER TABLE calendar_dates
    ADD CONSTRAINT fk_calendar_dates_service_id
    FOREIGN KEY (service_id)
    REFERENCES trips (service_id);

ALTER TABLE routes
    ADD CONSTRAINT fk_routes_agency_id
    FOREIGN KEY (agency_id)
    REFERENCES agency (agency_id);

ALTER TABLE stop_times
    ADD CONSTRAINT fk_trip_id
    FOREIGN KEY (trip_id)
    REFERENCES trips (trip_id);

ALTER TABLE stop_times
    ADD CONSTRAINT fk_stop_times_trip_id
    FOREIGN KEY (trip_id)
    REFERENCES trips (trip_id);

ALTER TABLE stop_times
    ADD CONSTRAINT fk_stop_times_stop_id
    FOREIGN KEY (stop_id)
    REFERENCES stops (stop_id);

ALTER TABLE transfers
    ADD CONSTRAINT fk_transfers_from_stop_id
    FOREIGN KEY (from_stop_id)
    REFERENCES stops (stop_id);

ALTER TABLE transfers
    ADD CONSTRAINT fk_transfers_to_stop_id
    FOREIGN KEY (to_stop_id)
    REFERENCES stops (stop_id);

ALTER TABLE transfers
    ADD CONSTRAINT fk_transfers_from_route_id
    FOREIGN KEY (from_route_id)
    REFERENCES routes (route_id);

ALTER TABLE transfers
    ADD CONSTRAINT fk_transfers_to_route_id
    FOREIGN KEY (to_route_id)
    REFERENCES routes (route_id);

ALTER TABLE trips
    ADD CONSTRAINT fk_trips_route_id
    FOREIGN KEY (route_id)
    REFERENCES routes (route_id);

ALTER TABLE trips
    ADD CONSTRAINT fk_trips_shape_id
    FOREIGN KEY (shape_id)
    REFERENCES shapes_geom (shape_id);

The diagram below was produced with DBeaver and helps explain the relationships between each of the tables in the database.

FlixBus Table Relationships

pgRouting wants to work with integer-based primary keys but FlixBus uses string-based primary keys. Below I'll generate corresponding integer primary keys for the trips and stops tables.

DROP TABLE IF EXISTS trip_ids_1;

CREATE TABLE trip_ids_1 AS
    SELECT trip_id
    FROM trips;

DROP TABLE IF EXISTS trip_ids;

CREATE TABLE trip_ids AS
    SELECT row_number() over() AS trip_pk,
           trip_id
    FROM trip_ids_1;

DROP TABLE IF EXISTS stops_pk;

CREATE TABLE stops_pk AS
    SELECT row_number() over() AS stop_pk,
           *
    FROM stops;

I'll then build a table that lists each route's first and last stops. I meant for this to be a series of WITH statements but my time box for this post was relatively small.

DROP TABLE IF EXISTS first_stop;

CREATE TABLE first_stop AS
    SELECT trips.trip_id,
           stop_times.stop_id
    FROM trips
    LEFT JOIN stop_times ON trips.trip_id = stop_times.trip_id
    WHERE stop_times.stop_sequence = 1;

DROP TABLE IF EXISTS last_stops;

CREATE TABLE last_stops AS
    SELECT stop_times.stop_id,
           trips.trip_id,
           stop_times.stop_sequence
    FROM trips
    LEFT JOIN stop_times ON trips.trip_id = stop_times.trip_id
    ORDER BY trips.trip_id,
             stop_times.stop_sequence DESC;

DROP TABLE IF EXISTS last_stop;

CREATE TABLE last_stop AS
    SELECT stop_id,
           trip_id,
           MAX(stop_sequence)
    FROM last_stops
    GROUP BY trip_id,
             stop_id;

I'll combine the above to produce a trip_routes table which will be fed into pgRouting.

DROP TABLE IF EXISTS trip_routes;

CREATE TABLE trip_routes AS
    SELECT trip_ids.trip_pk,
           shapes_geom.geom,
           ST_Length(geom)    AS distance,
           stops_pk1.stop_pk  AS source,
           stops_pk2.stop_pk  AS target
    FROM trips
    LEFT JOIN shapes_geom        ON trips.shape_id     = shapes_geom.shape_id
    LEFT JOIN trip_ids           ON trips.trip_id      = trip_ids.trip_id
    LEFT JOIN first_stop         ON trips.trip_id      = first_stop.trip_id
    LEFT JOIN last_stop          ON trips.trip_id      = last_stop.trip_id
    LEFT JOIN stops_pk stops_pk1 ON first_stop.stop_id = stops_pk1.stop_id
    LEFT JOIN stops_pk stops_pk2 ON last_stop.stop_id  = stops_pk2.stop_id;

The following will create a vertices table with pgRouting.

DROP TABLE IF EXISTS trip_routes_vertices_pgr;

SELECT pgr_createVerticesTable(
            'trip_routes',
            'geom',
            'source',
            'target');

Visualising the Routes

I'll export the bus route geometry to a 254 MB GeoJSON file. QGIS supports GeoJSON but GeoPackage files load much faster.

$ vi geojson_export.sql
SELECT json_build_object(
            'type',
            'FeatureCollection',
            'features',
            json_agg(
                ST_AsGeoJSON(
                    shapes_geom.*)::json))
FROM shapes_geom
$ psql -t flixbus \
    < geojson_export.sql \
    | jq -S . \
    > flixbus.geojson
$ ogr2ogr -f GPKG \
          flixbus.gpkg \
          flixbus.geojson

The above GeoPackage file is 63 MB in size. Below is a rendering of its contents in QGIS on top of Bing's Virtual Earth Basemap.

FlixBus Routes in QGIS

If you use Esri's ArcGIS Pro then the following will produce a Geodatabase of the bus routes.

$ ogr2ogr -f OpenFileGDB \
          flixbus.gdb \
          flixbus.geojson
$ zip flixbus.gdb.zip flixbus.gdb/*

Finding the Shortest Route

Below is a SQL statement that will produce a list of stops between the shortest route between Vienna Central Station and the bus station in Oslo that FlixBus uses.

SELECT DISTINCT
    stops_pk.stop_name,
    shortest_route.seq                     AS stop_num,
    shortest_route.node                    AS stop_pk,
    ROUND(shortest_route.cost::numeric, 2) AS cost
FROM
    pgr_dijkstra(
        'SELECT trip_pk as id,
                source,
                target,
                distance AS cost
         FROM trip_routes
         WHERE distance IS NOT NULL',
        (SELECT stop_pk
         FROM stops_pk
         WHERE stop_name = 'Vienna Central Station'
         LIMIT 1),
        (SELECT stop_pk
         FROM stops_pk
         WHERE stop_name = 'Oslo (Busterminalen Galleriet)'
         LIMIT 1),
        FALSE
    ) AS shortest_route
LEFT JOIN trip_routes ON shortest_route.edge = trip_routes.trip_pk
LEFT JOIN stops_pk    ON shortest_route.node = stops_pk.stop_pk
ORDER BY shortest_route.seq;
                     stop_name                     | stop_num | stop_pk | cost
---------------------------------------------------+----------+---------+------
 Vienna Central Station                            |        1 |     753 | 1.09
 Vienna Erdberg (Busterminal VIB)                  |        2 |    1137 | 3.37
 Prague (ÚAN Florenc bus station)                  |        3 |    1070 | 3.72
 Berlin central bus station                        |        4 |     930 | 4.80
 Copenhagen central train station (Ingerslevsgade) |        5 |    1112 | 4.96
 Oslo (Busterminalen Galleriet)                    |        6 |    1349 | 0.00

The interesting thing about the cost field is that it doesn't need to be the distance of the journey, it could also be the price or any other number of measurable aspects. Also, the fact this is all in SQL means it's easy to add criteria like avoiding certain countries or demanding routes that allow bikes.

Thank you for taking the time to read this post. I offer both consulting and hands-on development services to clients in North America and Europe. If you'd like to discuss how my offerings can help your business please contact me via LinkedIn.

Copyright © 2014 - 2024 Mark Litwintschik. This site's template is based off a template by Giulio Fidente.