Home | Benchmarks | Categories | Atom Feed

Posted on Fri 01 August 2025 under GIS

Alberta's Pipelines

Alberta produces 85% of Canada's oil helping it rank 5th among the world's top oil exporters.

There are 173K active pipelines in Alberta, 2,884 of which ship crude oil. Below is a map of Alberta's active pipelines.

Alberta's Pipelines

For context, I shifted Alberta's borders 140 degrees East to compare its landmass to Europe's. Alberta's Northern border sits at the same latitude as Helsinki.

Alberta's Pipelines

Alberta's Energy Regulator (AER) hosts a large number of datasets describing the province's pipelines and other energy-related infrastructure.

In this post, I'll explore a few of their open datasets.

My Workstation

I'm using a 5.7 GHz AMD Ryzen 9 9950X CPU. It has 16 cores and 32 threads and 1.2 MB of L1, 16 MB of L2 and 64 MB of L3 cache. It has a liquid cooler attached and is housed in a spacious, full-sized Cooler Master HAF 700 computer case.

The system has 96 GB of DDR5 RAM clocked at 4,800 MT/s and a 5th-generation, Crucial T700 4 TB NVMe M.2 SSD which can read at speeds up to 12,400 MB/s. There is a heatsink on the SSD to help keep its temperature down. This is my system's C drive.

The system is powered by a 1,200-watt, fully modular Corsair Power Supply and is sat on an ASRock X870E Nova 90 Motherboard.

I'm running Ubuntu 24 LTS via Microsoft's Ubuntu for Windows on Windows 11 Pro. In case you're wondering why I don't run a Linux-based desktop as my primary work environment, I'm still using an Nvidia GTX 1080 GPU which has better driver support on Windows and ArcGIS Pro only supports Windows natively.

Installing Prerequisites

I'll use GDAL 3.9.3, Python 3.12.3 and a few other tools to help analyse the data in this post.

$ sudo add-apt-repository ppa:ubuntugis/ubuntugis-unstable
$ sudo apt update
$ sudo apt install \
    gdal-bin \
    jq

I'll use DuckDB, along with its H3, JSON, Lindel, Parquet and Spatial extensions, in this post.

$ cd ~
$ wget -c https://github.com/duckdb/duckdb/releases/download/v1.3.0/duckdb_cli-linux-amd64.zip
$ unzip -j duckdb_cli-linux-amd64.zip
$ chmod +x duckdb
$ ~/duckdb
INSTALL h3 FROM community;
INSTALL lindel FROM community;
INSTALL json;
INSTALL parquet;
INSTALL spatial;

I'll set up DuckDB to load every installed extension each time it launches.

$ vi ~/.duckdbrc
.timer on
.width 180
LOAD h3;
LOAD lindel;
LOAD json;
LOAD parquet;
LOAD spatial;

The maps in this post were rendered with QGIS version 3.44. QGIS is a desktop application that runs on Windows, macOS and Linux. The application has grown in popularity in recent years and has ~15M application launches from users all around the world each month.

I used QGIS' Tile+ plugin to add basemaps from Esri and CARTO to the maps throughout this post.

Analysis-Ready Data

I'll download AER's Pipelines and Facilities datasets. I'll convert their projections to EPSG:4326 and store them in spatially-sorted, ZStandard-compressed Parquet format. This format will load without issue in QGIS 3.44 and ArcGIS Pro 3.5.

$ wget https://www.aer.ca/data/pipeline/Pipelines_SHP.zip
$ unzip Pipelines_SHP.zip
$ ~/duckdb pipes.duckdb
COPY (
    SELECT   * EXCLUDE(geom),
             ST_ASWKB(
                ST_TRANSFORM(geom,
                             'EPSG:4269',
                             'EPSG:4326')) geom,
             {'xmin': ST_XMIN(ST_EXTENT(geom)),
              'ymin': ST_YMIN(ST_EXTENT(geom)),
              'xmax': ST_XMAX(ST_EXTENT(geom)),
              'ymax': ST_YMAX(ST_EXTENT(geom))} AS bbox
    FROM     ST_READ('Pipelines_SHP/Pipelines_GCS_NAD83.shx')
    ORDER BY HILBERT_ENCODE([ST_Y(ST_CENTROID(geom)),
                             ST_X(ST_CENTROID(geom))]::double[2])
) TO 'pipelines.parquet' (
    FORMAT            'PARQUET',
    CODEC             'ZSTD',
    COMPRESSION_LEVEL 22,
    ROW_GROUP_SIZE    15000);

The above produced a 323,259-record, 47 MB Parquet file. Below is an example record.

$ echo "FROM READ_PARQUET('pipelines.parquet')
        LIMIT 1" \
    | ~/duckdb -json \
    | jq -S .
[
  {
    "BA_CODE": "A5D4",
    "BIDIRE_IND": null,
    "CLASS_LOC": null,
    "COMP_NAME": "Cenovus Energy Inc.",
    "EXT_COAT": null,
    "FLD_CTR_NM": "Bonnyville",
    "FROM_FAC": "Blind End",
    "FROM_LOC": "10-07-071-02W4",
    "GEOM_SRCE": "Mapping",
    "H2S_CONT": "0",
    "H2S_R_LEVL": null,
    "H2S_R_VOLM": null,
    "HDD_IND": null,
    "INT_COAT": "Uncoated",
    "IS_NEB": "N",
    "JOINTMETHD": "Welded",
    "LAST_OCCYR": "2007",
    "LICAPPDATE": "2007-01-04",
    "LICENCE_NO": "26186",
    "LIC_LI_NO": "26186-477",
    "LINER_GRD": null,
    "LINER_TYPE": null,
    "LINE_NO": "477",
    "OBJECTID": 42504337,
    "ORG_ISSUED": "2006-01-04",
    "ORIGLIN_NO": "477",
    "ORIGPSPPID": null,
    "ORIG_LICNO": "26186",
    "OUT_DIAMET": "168.3",
    "PERMT_APPR": "2013-06-04",
    "PERMT_EXPI": "2007-01-04",
    "PIPE_ENVIR": "Creek Crossing",
    "PIPE_GRADE": "3592",
    "PIPE_MAOP": "0",
    "PIPE_TYPE": "Z245.1",
    "PIPTECHSTD": null,
    "PIP_MATERL": "Steel",
    "PLLICSEGID": "245633",
    "PL_SPEC_ID": "19",
    "SEG_LENGTH": "1.76",
    "SEG_STATUS": "Abandoned",
    "SHAPE_LEN": "1873.2395712004013",
    "STRESSLEVL": "0",
    "SUBSTANCE1": "Natural Gas",
    "SUBSTANCE2": null,
    "SUBSTANCE3": null,
    "TEMPSURFPL": null,
    "TO_FAC": "Blind End",
    "TO_LOC": "07-12-071-03W4",
    "WALL_THICK": "3.2",
    "bbox": "{'xmin': -110.32211225441641, 'ymin': 55.132266882976076, 'xmax': -110.30557263240699, 'ymax': 55.13400688790718}",
    "geom": "LINESTRING (-110.30557263240699 55.13400688790718, -110.31362641340809 55.13333017544028, -110.32211225441641 55.132266882976076)"
  }

I'll generate a heatmap of the pipelines.

$ ~/duckdb pipes.duckdb
CREATE OR REPLACE TABLE heatmap AS
    SELECT UNNEST(
               H3_POLYGON_WKT_TO_CELLS_EXPERIMENTAL_STRING(
                    ST_BUFFER(geom, 0.1),
                    'CONTAINMENT_OVERLAPPING',
                    5)) AS h3_5
    FROM   READ_PARQUET('pipelines.parquet');

COPY (
    SELECT   ST_ASWKB(h3_cell_to_boundary_wkt(h3_5)::GEOMETRY) geom,
             COUNT(*) AS num_recs
    FROM     heatmap
    GROUP BY 1
    ORDER BY 2 DESC
) TO 'pipelines.h3_5.parquet' (
    FORMAT            'PARQUET',
    CODEC             'ZSTD',
    COMPRESSION_LEVEL 22,
    ROW_GROUP_SIZE    15000);
Alberta's Pipelines

I'll do the same for the facilities dataset below.

$ https://www.aer.ca/data/codes/ST102-SHP.zip
$ unzip ST102-SHP.zip
$ ~/duckdb
COPY (
    SELECT   * EXCLUDE(geom),
             ST_ASWKB(
                ST_TRANSFORM(geom,
                             'EPSG:4269',
                             'EPSG:4326')) geom,
             {'xmin': ST_XMIN(ST_EXTENT(geom)),
              'ymin': ST_YMIN(ST_EXTENT(geom)),
              'xmax': ST_XMAX(ST_EXTENT(geom)),
              'ymax': ST_YMAX(ST_EXTENT(geom))} AS bbox
    FROM     ST_READ('ST102_Facility_GCS_NAD83.shx')
    ORDER BY HILBERT_ENCODE([ST_Y(ST_CENTROID(geom)),
                             ST_X(ST_CENTROID(geom))]::double[2])
) TO 'facilities.parquet' (
    FORMAT            'PARQUET',
    CODEC             'ZSTD',
    COMPRESSION_LEVEL 22,
    ROW_GROUP_SIZE    15000);

The above produced a 123,532-record, 4.8 MB Parquet file. Below is an example record.

$ echo "FROM READ_PARQUET('facilities.parquet')
        LIMIT 1" \
    | ~/duckdb -json \
    | jq -S .
[
  {
    "EDCT_DESCR": "Compressor Stn <0.01",
    "EDCT_TYPE": "040",
    "FAC_ID": "ABCS0030747",
    "FAC_NAME": "Encana Corporation",
    "FAC_STATUS": "Abandoned",
    "FAC_SUB_TY": "Compressor Station",
    "LICENSEE": "Cenovus Energy Inc.",
    "LIC_BA_ID": "A5D4",
    "LIC_NUMBER": "30747",
    "LIC_TYPE": "F",
    "LOC_SOURCE": "Facility Licence",
    "OPERATOR": "Cenovus Energy Inc.",
    "OP_BA_ID": "A5D4",
    "SUB_CODE": "601",
    "bbox": "{'xmin': -110.32407899999998, 'ymin': 55.13219800000007, 'xmax': -110.32407899999998, 'ymax': 55.13219800000007}",
    "geom": "POINT (-110.32407899999998 55.13219800000007)"
  }
]

Below is a rendering of the two datasets overlapping one another.

Alberta's Pipelines

This is another rendering showing pipelines connected to various batteries.

Alberta's Pipelines

Some clear patterns emerge in this dataset, like pipelines largely avoiding major population centres.

Alberta's Pipelines

The facilities heatmap closely matches the footprint of energy resources in the province.

$ ~/duckdb
COPY (
    SELECT   ST_ASWKB(h3_cell_to_boundary_wkt(
                  h3_latlng_to_cell(ST_Y(geom),
                                    ST_X(geom),
                                    6))::GEOMETRY) geom,
             COUNT(*) as num_recs
    FROM     READ_PARQUET('facilities.parquet')
    GROUP BY 1
) TO 'facilities.h3_6.parquet' (
    FORMAT            'PARQUET',
    CODEC             'ZSTD',
    COMPRESSION_LEVEL 22,
    ROW_GROUP_SIZE    15000);
Alberta's Pipelines

Pipelines

The metadata for these datasets is pretty rich and the values are well-normalised.

Below are the most common materials used to construct the pipelines.

$ ~/duckdb
SELECT   COUNT(*),
         PIP_MATERL
FROM     READ_PARQUET('pipelines.parquet')
GROUP BY 2
ORDER BY 1 DESC
LIMIT    10
┌──────────────┬──────────────────────┐
│ count_star() │      PIP_MATERL      │
│    int64     │       varchar        │
├──────────────┼──────────────────────┤
│       258867 │ Steel                │
│        36175 │ Polyethylene         │
│        13052 │ Fibreglass           │
│        12302 │ Composite            │
│         1983 │ Aluminum             │
│          364 │ Polyvinyl Chloride   │
│          194 │ Asbestos Cement      │
│          155 │ Unknown              │
│          105 │ Stainless Steel      │
│           45 │ Poly (not certified) │
├──────────────┴──────────────────────┤
│ 10 rows                   2 columns │
└─────────────────────────────────────┘

As mentioned earlier, there are 173K pipelines currently operating.

SELECT   COUNT(*),
         SEG_STATUS
FROM     READ_PARQUET('pipelines.parquet')
GROUP BY 2
ORDER BY 1 DESC
LIMIT    10
┌──────────────┬──────────────┐
│ count_star() │  SEG_STATUS  │
│    int64     │   varchar    │
├──────────────┼──────────────┤
│       173188 │ Operating    │
│        97653 │ Abandoned    │
│        48234 │ Discontinued │
│         2147 │ Removed      │
│         2037 │ Permitted    │
└──────────────┴──────────────┘

Pipelines can transport more than one substance in some cases. Below is the most common primary substances being transported.

SELECT   COUNT(*),
         SUBSTANCE1
FROM     READ_PARQUET('pipelines.parquet')
GROUP BY 2
ORDER BY 1 DESC
LIMIT    10
┌──────────────┬───────────────────┐
│ count_star() │    SUBSTANCE1     │
│    int64     │      varchar      │
├──────────────┼───────────────────┤
│       184727 │ Natural Gas       │
│        76939 │ Oil-Well Effluent │
│        18950 │ Salt Water        │
│        13547 │ Fuel Gas          │
│        11285 │ Sour Natural Gas  │
│         5498 │ Crude Oil         │
│         4842 │ Fresh Water       │
│         1819 │ HVP Products      │
│         1573 │ LVP Products      │
│         1178 │ Steam             │
├──────────────┴───────────────────┤
│ 10 rows                2 columns │
└──────────────────────────────────┘

Below are the top firm and primary substance pairs.

SELECT   COUNT(*),
         COMP_NAME,
         SUBSTANCE1
FROM     READ_PARQUET('pipelines.parquet')
GROUP BY 2, 3
ORDER BY 1 DESC
LIMIT    20;
┌──────────────┬───────────────────────────────────────┬───────────────────┐
│ count_star() │               COMP_NAME               │    SUBSTANCE1     │
│    int64     │                varchar                │      varchar      │
├──────────────┼───────────────────────────────────────┼───────────────────┤
│        38126 │ Canadian Natural Resources Limited    │ Natural Gas       │
│        11280 │ Ipc Canada Ltd.                       │ Natural Gas       │
│        11266 │ Canadian Natural Resources Limited    │ Oil-Well Effluent │
│        11121 │ Ember Resources Inc.                  │ Natural Gas       │
│        10317 │ Torxen Energy Ltd.                    │ Natural Gas       │
│         7736 │ Cenovus Energy Inc.                   │ Natural Gas       │
│         6744 │ Canlin Energy Corporation             │ Natural Gas       │
│         6028 │ Pine Cliff Energy Ltd.                │ Natural Gas       │
│         5618 │ ATCO Gas And Pipelines Ltd.           │ Natural Gas       │
│         5202 │ Lynx Energy ULC                       │ Natural Gas       │
│         4815 │ Tourmaline Oil Corp.                  │ Natural Gas       │
│         3683 │ Canadian Natural Resources Limited    │ Fuel Gas          │
│         3278 │ TAQA North Ltd.                       │ Natural Gas       │
│         3260 │ Canadian Natural Resources Limited    │ Salt Water        │
│         2961 │ Cardinal Energy Ltd.                  │ Oil-Well Effluent │
│         2896 │ Conifer Energy Inc.                   │ Oil-Well Effluent │
│         2709 │ Cenovus Energy Inc.                   │ Oil-Well Effluent │
│         2687 │ InPlay Oil Corp.                      │ Oil-Well Effluent │
│         2686 │ Sequoia Resources Corp.               │ Natural Gas       │
│         2658 │ Peyto Exploration & Development Corp. │ Natural Gas       │
├──────────────┴───────────────────────────────────────┴───────────────────┤
│ 20 rows                                                        3 columns │
└──────────────────────────────────────────────────────────────────────────┘

Below are the top 20 firms ranked by number of primary substances transported.

SELECT   COMP_NAME,
         COUNT(DISTINCT SUBSTANCE1)
FROM     READ_PARQUET('pipelines.parquet')
GROUP BY 1
ORDER BY 2 DESC
LIMIT    20;
┌───────────────────────────────────────┬────────────────────────────┐
│               COMP_NAME               │ count(DISTINCT SUBSTANCE1) │
│                varchar                │           int64            │
├───────────────────────────────────────┼────────────────────────────┤
│ Canadian Natural Resources Limited    │                         21 │
│ Keyera Energy Ltd.                    │                         17 │
│ Suncor Energy Inc.                    │                         17 │
│ Cenovus Energy Inc.                   │                         16 │
│ Strathcona Resources Ltd.             │                         14 │
│ Tourmaline Oil Corp.                  │                         14 │
│ Shell Canada Limited                  │                         14 │
│ Imperial Oil Resources Limited        │                         14 │
│ Pembina Pipeline Corporation          │                         13 │
│ ConocoPhillips Canada Resources Corp. │                         13 │
│ PGI Processing ULC                    │                         13 │
│ Conifer Energy Inc.                   │                         13 │
│ Canadian Natural Upgrading Limited    │                         12 │
│ Plains Midstream Canada ULC           │                         12 │
│ ARC Resources Ltd.                    │                         12 │
│ Canlin Energy Corporation             │                         12 │
│ Baytex Energy Ltd.                    │                         11 │
│ Secure Waste Infrastructure Corp.     │                         11 │
│ Ovintiv Canada ULC                    │                         11 │
│ Loyal Energy (canada) Operating Ltd.  │                         11 │
├───────────────────────────────────────┴────────────────────────────┤
│ 20 rows                                                  2 columns │
└────────────────────────────────────────────────────────────────────┘
Thank you for taking the time to read this post. I offer both consulting and hands-on development services to clients in North America and Europe. If you'd like to discuss how my offerings can help your business please contact me via LinkedIn.

Copyright © 2014 - 2025 Mark Litwintschik. This site's template is based off a template by Giulio Fidente.