
Posted on Fri 03 October 2025 under GIS

Canada's 14M Buildings

In April, Statistics Canada released a refreshed version of their Open Database of Buildings (ODB) dataset for Canada. This is one of Canada's most comprehensive building datasets. Below is a heatmap of its building footprints.

Open Database of Buildings

Statscan pulled buildings data from 530 datasets across 107 government sources. When I initially began to spot-check the data, even newly constructed neighbourhoods in South East Calgary were covered. Below is Overture's September release in purple and ODB's buildings in yellow.

Open Database of Buildings

In this post, I'll explore Statistics Canada's ODB dataset.

My Workstation

I'm using a 5.7 GHz AMD Ryzen 9 9950X CPU. It has 16 cores and 32 threads and 1.2 MB of L1, 16 MB of L2 and 64 MB of L3 cache. It has a liquid cooler attached and is housed in a spacious, full-sized Cooler Master HAF 700 computer case.

The system has 96 GB of DDR5 RAM clocked at 4,800 MT/s and a 5th-generation, Crucial T700 4 TB NVMe M.2 SSD which can read at speeds up to 12,400 MB/s. There is a heatsink on the SSD to help keep its temperature down. This is my system's C drive.

The system is powered by a 1,200-watt, fully modular Corsair Power Supply and is sat on an ASRock X870E Nova 90 Motherboard.

I'm running Ubuntu 24 LTS via Microsoft's Ubuntu for Windows on Windows 11 Pro. In case you're wondering why I don't run a Linux-based desktop as my primary work environment, I'm still using an Nvidia GTX 1080 GPU which has better driver support on Windows and ArcGIS Pro only supports Windows natively.

Installing Prerequisites

I'll use GDAL 3.9.3 and a few other tools to help analyse the data in this post.

$ sudo add-apt-repository ppa:ubuntugis/ubuntugis-unstable
$ sudo apt update
$ sudo apt install \
    gdal-bin \
    jq

I'll use DuckDB v1.3.0, along with its H3, JSON, Lindel, Parquet and Spatial extensions, in this post. Normally I try to use the latest release of DuckDB but v1.4.0 has an issue where its Parquet files aren't readable by many of the tools I use at the moment.

$ cd ~
$ wget -c https://github.com/duckdb/duckdb/releases/download/v1.3.0/duckdb_cli-linux-amd64.zip
$ unzip -j duckdb_cli-linux-amd64.zip
$ chmod +x duckdb
$ ~/duckdb
INSTALL h3 FROM community;
INSTALL lindel FROM community;
INSTALL json;
INSTALL parquet;
INSTALL spatial;

I'll set up DuckDB to load every installed extension each time it launches.

$ vi ~/.duckdbrc
.timer on
.width 180
LOAD h3;
LOAD lindel;
LOAD json;
LOAD parquet;
LOAD spatial;

The maps in this post were mostly rendered with QGIS version 3.44. QGIS is a desktop application that runs on Windows, macOS and Linux. The application has grown in popularity in recent years and has ~15M application launches from users all around the world each month.

I used QGIS' Tile+ plugin to add basemaps from Google and OpenStreetMap (OSM) to the maps throughout this post.

Analysis-Ready Data

Statscan broke the dataset up into zipped GeoPackage (GPKG) files by province / territory, with some of the larger provinces split across multiple files. Below I'll build a manifest of these URLs and download them with four concurrent processes.

$ mkdir -p ~/odb
$ cd ~/odb

$ vi urls.txt
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_NL.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_PE.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_NS.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_NB.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_QC_1.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_QC_2.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_ON_1.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_ON_2.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_ON_3.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_MB.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_SK.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_AB.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_BC.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_YT.zip
https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip/ODB_v3_NT.zip
$ cat urls.txt \
    | xargs -P4 \
            -I% \
            wget -c "%"
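The manifest above follows a regular naming pattern, so it could also be generated rather than typed out. Below is a minimal Python sketch of that idea. The `PARTS` mapping is my own reading of the URL list above and `build_manifest` is a hypothetical helper, not something Statscan publishes.

```python
# Base URL copied from the manifest above.
BASE = "https://www150.statcan.gc.ca/pub/34-26-0001/2018001/zip"

# Province / territory code -> number of ZIP parts, as read off the
# manifest above. 1 means a single file without a numeric suffix.
PARTS = {
    "NL": 1, "PE": 1, "NS": 1, "NB": 1,
    "QC": 2, "ON": 3,
    "MB": 1, "SK": 1, "AB": 1, "BC": 1,
    "YT": 1, "NT": 1,
}


def build_manifest(parts=PARTS, base=BASE):
    """Return the list of ZIP URLs, one entry per file."""
    urls = []
    for code, count in parts.items():
        if count == 1:
            urls.append(f"{base}/ODB_v3_{code}.zip")
        else:
            for part in range(1, count + 1):
                urls.append(f"{base}/ODB_v3_{code}_{part}.zip")
    return urls


print("\n".join(build_manifest()))
```

Redirecting the output into urls.txt would give the same 15 lines as the hand-written manifest.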

I'll extract the GPKG files from each of the ZIPs.

$ find . \
    -name "*.zip" \
    -type f \
    -exec unzip {} "*.gpkg" \;

Below is an example record from one of the GPKG files. Note ".." is being used for NULL values.

$ echo "FROM  ST_READ('ODB_v3_AB.gpkg')
        LIMIT 1" \
    | ~/duckdb -json \
    | jq -S .
[
  {
    "address": "..",
    "csdname": "Clearwater County",
    "csduid": "4809002",
    "dataset": "Building Footprints",
    "floors": "..",
    "geom": "MULTIPOLYGON (((4615587.032158795 2032592.4950364884, 4615575.2249902915 2032585.4147511278, 4615570.227409255 2032593.7528553815, 4615582.04455081 2032600.8323487062, 4615587.032158795 2032592.4950364884)))",
    "height": "..",
    "id": "c9ffb5f954f942b3a98bad6b1932360c",
    "name": "Residence 2",
    "prov_terr": "AB",
    "source": "Government of Canada",
    "source_id": "1.0",
    "sq_ft": "..",
    "type": "..",
    "units": "..",
    "year_built": ".."
  }
]
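The ".." placeholder is easy to trip over in downstream tooling since it is a valid string. As a rough illustration of the clean-up involved, here is a small Python sketch that swaps the placeholder for a real missing value. The `clean` helper is hypothetical and the dictionary only carries a few fields from the record above.

```python
PLACEHOLDER = ".."  # Statscan's marker for a missing value


def clean(record):
    """Return a copy of record with '..' placeholders replaced by None."""
    return {key: (None if value == PLACEHOLDER else value)
            for key, value in record.items()}


# A few fields from the example record above.
example = {
    "address": "..",
    "csdname": "Clearwater County",
    "floors": "..",
    "name": "Residence 2",
    "prov_terr": "AB",
}

print(clean(example))
```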

I'll extract the projection Statscan used. This proj4 string will be used below to re-project the data into EPSG:4326.

$ gdalsrsinfo \
    -o proj4 \
    ODB_v3_AB.gpkg
+proj=lcc +lat_0=63.390675 +lon_0=-91.8666666666667 +lat_1=49 +lat_2=77 +x_0=6200000 +y_0=3000000 +datum=NAD83 +units=m +no_defs
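For anyone curious what that string encodes, below is a sketch of the Lambert Conformal Conic forward and inverse formulas in plain Python, using the parameters above. It assumes the GRS80 ellipsoid that NAD83 is based on and treats NAD83 and WGS84 as interchangeable, which ignores the small datum shift between them. The function names are my own. This is for illustration only and isn't a substitute for PROJ.

```python
import math

# Parameters copied from the proj4 string above.
LAT_0, LON_0 = 63.390675, -91.8666666666667
LAT_1, LAT_2 = 49.0, 77.0
X_0, Y_0 = 6_200_000.0, 3_000_000.0

# GRS80 ellipsoid (assumption: NAD83 treated as equal to WGS84).
A = 6_378_137.0
F = 1 / 298.257222101
E = math.sqrt(2 * F - F * F)  # first eccentricity


def _m(phi):
    return math.cos(phi) / math.sqrt(1 - (E * math.sin(phi)) ** 2)


def _t(phi):
    es = E * math.sin(phi)
    return math.tan(math.pi / 4 - phi / 2) / ((1 - es) / (1 + es)) ** (E / 2)


# Cone constant, scaling factor and radius at the latitude of origin.
_N = ((math.log(_m(math.radians(LAT_1))) - math.log(_m(math.radians(LAT_2))))
      / (math.log(_t(math.radians(LAT_1))) - math.log(_t(math.radians(LAT_2)))))
_BIG_F = _m(math.radians(LAT_1)) / (_N * _t(math.radians(LAT_1)) ** _N)
_RHO_0 = A * _BIG_F * _t(math.radians(LAT_0)) ** _N


def forward(lon, lat):
    """Longitude / latitude in degrees -> projected x, y in metres."""
    rho = A * _BIG_F * _t(math.radians(lat)) ** _N
    theta = _N * math.radians(lon - LON_0)
    return X_0 + rho * math.sin(theta), Y_0 + _RHO_0 - rho * math.cos(theta)


def inverse(x, y):
    """Projected x, y in metres -> longitude / latitude in degrees."""
    dx, dy = x - X_0, _RHO_0 - (y - Y_0)
    rho = math.hypot(dx, dy)
    theta = math.atan2(dx, dy)
    t = (rho / (A * _BIG_F)) ** (1 / _N)
    phi = math.pi / 2 - 2 * math.atan(t)  # initial guess
    for _ in range(10):  # fixed-point iteration for the latitude
        es = E * math.sin(phi)
        phi = math.pi / 2 - 2 * math.atan(t * ((1 - es) / (1 + es)) ** (E / 2))
    return LON_0 + math.degrees(theta / _N), math.degrees(phi)


# First vertex of the example record's geometry.
print(inverse(4615587.032158795, 2032592.4950364884))
```

The vertex from the example record should come back as a longitude / latitude pair inside Alberta, which agrees with the record's prov_terr value.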

I'll convert the GPKG files into spatially-sorted, ZStandard-compressed Parquet files with an EPSG:4326 projection. I'll also turn the ".." placeholders into NULLs and add a bounding box for each piece of geometry.

This format will load without issue in QGIS 3.44 and ArcGIS Pro 3.5. The bounding boxes will help optimise bandwidth usage when querying this data on remote servers, like AWS S3.

$ for FILENAME in *.gpkg; do
     echo $FILENAME

     BASENAME=`basename $FILENAME | cut -d. -f1`

     echo "COPY (
               WITH a AS (
                   SELECT * EXCLUDE(geom),
                          ST_FLIPCOORDINATES(
                              ST_TRANSFORM(
                                  geom,
                                  '+proj=lcc +lat_0=63.390675 +lon_0=-91.8666666666667 +lat_1=49 +lat_2=77 +x_0=6200000 +y_0=3000000 +datum=NAD83 +units=m +no_defs',
                                  'EPSG:4326')) geometry
                   FROM ST_READ('$FILENAME')
               )
               SELECT   * EXCLUDE (address,
                                   floors,
                                   height,
                                   name,
                                   sq_ft,
                                   type,
                                   units,
                                   year_built,
                                   geometry),
                        {'xmin': ST_XMIN(ST_EXTENT(geometry)),
                         'ymin': ST_YMIN(ST_EXTENT(geometry)),
                         'xmax': ST_XMAX(ST_EXTENT(geometry)),
                         'ymax': ST_YMAX(ST_EXTENT(geometry))} AS bbox,
                         ST_ASWKB(geometry) geometry,
                         CASE WHEN address = '..'    THEN NULL ELSE address    END AS address,
                         CASE WHEN floors = '..'     THEN NULL ELSE floors     END AS floors,
                         CASE WHEN height = '..'     THEN NULL ELSE height     END AS height,
                         CASE WHEN name = '..'       THEN NULL ELSE name       END AS name,
                         CASE WHEN sq_ft = '..'      THEN NULL ELSE sq_ft      END AS sq_ft,
                         CASE WHEN type = '..'       THEN NULL ELSE type       END AS type,
                         CASE WHEN units = '..'      THEN NULL ELSE units      END AS units,
                         CASE WHEN year_built = '..' THEN NULL ELSE year_built END AS year_built
               FROM     a
               ORDER BY HILBERT_ENCODE([ST_Y(ST_CENTROID(geometry)),
                                        ST_X(ST_CENTROID(geometry))]::double[2])
           ) TO '$BASENAME.parquet' (
               FORMAT            'PARQUET',
               CODEC             'ZSTD',
               COMPRESSION_LEVEL 22,
               ROW_GROUP_SIZE    15000);
           " | ~/duckdb
 done

The above turned 2.5 GB of ZIP files containing 6.2 GB of GPKG files into 1.8 GB of Parquet.

$ du -hsc *.parquet
164M    ODB_v3_AB.parquet
188M    ODB_v3_BC.parquet
90M     ODB_v3_MB.parquet
75M     ODB_v3_NB.parquet
21M     ODB_v3_NL.parquet
62M     ODB_v3_NS.parquet
1.4M    ODB_v3_NT.parquet
261M    ODB_v3_ON_1.parquet
259M    ODB_v3_ON_2.parquet
211M    ODB_v3_ON_3.parquet
11M     ODB_v3_PE.parquet
249M    ODB_v3_QC_1.parquet
223M    ODB_v3_QC_2.parquet
35M     ODB_v3_SK.parquet
1.5M    ODB_v3_YT.parquet
1.8G    total
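The ORDER BY in the conversion script sorts buildings along a Hilbert curve so that records close together on the map end up close together in the file. Below is a sketch of the classic integer-grid version of that curve in Python. Lindel's HILBERT_ENCODE works differently under the hood, so treat this as an illustration of the idea rather than a re-implementation. The grid size, the `to_cell` helper and the sample centroids are my own assumptions.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) on an n-by-n grid (n a power of two) to its
    position along a Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate / flip the quadrant so the curve stays continuous.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def to_cell(lon, lat, n=1024):
    """Quantise a lon/lat pair onto the n-by-n grid."""
    x = min(n - 1, int((lon + 180) / 360 * n))
    y = min(n - 1, int((lat + 90) / 180 * n))
    return x, y


# Hypothetical building centroids (lon, lat), not taken from the dataset.
centroids = [(-114.07, 51.05), (-79.38, 43.65),
             (-113.49, 53.55), (-79.64, 43.59)]

ordered = sorted(centroids, key=lambda c: hilbert_index(1024, *to_cell(*c)))
print(ordered)
```

Consecutive positions along the curve are always neighbouring grid cells, which is why each Parquet row group ends up covering a compact area.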

Heatmap

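Below I'll count the buildings in each resolution-4 H3 hexagon and export the result as a GPKG file to produce a heatmap of this dataset.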

CREATE OR REPLACE TABLE h3_4_stats AS
    SELECT   H3_LATLNG_TO_CELL(
                bbox.ymin,
                bbox.xmin, 4) AS h3_4,
             COUNT(*) num_buildings
    FROM     READ_PARQUET('ODB_v3*.parquet')
    WHERE    bbox.xmin BETWEEN -178.5 AND 178.5
    GROUP BY 1;

COPY (
    SELECT ST_ASWKB(H3_CELL_TO_BOUNDARY_WKT(h3_4)::geometry) geometry,
           num_buildings
    FROM   h3_4_stats
) TO 'h3_4_stats.gpkg'
  WITH (FORMAT GDAL,
        DRIVER 'GPKG',
        LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES');

The following was needed for ArcGIS Pro to recognise the projection of the above hexagons properly.

$ ogr2ogr \
    -f GPKG \
    -a_srs EPSG:4326 \
    h3_4_stats.4326.gpkg \
    h3_4_stats.gpkg

Open Database of Buildings

Data Fluency

Below are the field names, data types, percentages of NULLs per column, number of unique values and minimum and maximum values for each column.

$ ~/duckdb
SELECT   column_name,
         column_type,
         null_percentage,
         approx_unique,
         min,
         max
FROM     (SUMMARIZE
          FROM READ_PARQUET('ODB_v3*.parquet'))
WHERE    column_name != 'geometry'
AND      column_name != 'bbox'
ORDER BY 1;
┌─────────────┬─────────────┬─────────────────┬───────────────┬──────────────────────────────────────────┬────────────────────────────────────────┐
│ column_name │ column_type │ null_percentage │ approx_unique │                   min                    │                  max                   │
│   varchar   │   varchar   │  decimal(9,2)   │     int64     │                 varchar                  │                varchar                 │
├─────────────┼─────────────┼─────────────────┼───────────────┼──────────────────────────────────────────┼────────────────────────────────────────┤
│ address     │ VARCHAR     │           69.19 │       4840881 │                                          │ à définir                              │
│ csdname     │ VARCHAR     │            0.00 │          3099 │ Abbotsford                               │ qathet E                               │
│ csduid      │ VARCHAR     │            0.00 │          3465 │ 1001124                                  │ 6106097                                │
│ dataset     │ VARCHAR     │            0.00 │           966 │ 2022 Voting Location Building Footprint  │ Édifices municipaux; Lieux Publics     │
│ floors      │ VARCHAR     │           97.34 │            35 │ 1.0                                      │ 9.0                                    │
│ height      │ VARCHAR     │           91.59 │        139106 │ -0.10836829                              │ 99.99                                  │
│ id          │ VARCHAR     │            0.00 │      14730146 │ 0000014f86fe39839c5f1c118e058bfb         │ ffffff6e9491f9ecb4f4444f7457e84b       │
│ name        │ VARCHAR     │           98.73 │         29024 │                                          │ �cole secondaire de Par-en-Bas         │
│ prov_terr   │ VARCHAR     │            0.00 │            11 │ AB                                       │ YT                                     │
│ source      │ VARCHAR     │            0.00 │            99 │ Cape Breton Regional Municipality (CBRM) │ Ville de Sherbrooke                    │
│ source_id   │ VARCHAR     │            0.00 │       8539298 │                                          │ {FFFFFB51-F6CE-4DF6-93AA-FBFDDF22FEEB} │
│ sq_ft       │ VARCHAR     │           99.05 │         41985 │ 10.60419022                              │ 9996.0                                 │
│ type        │ VARCHAR     │           85.22 │           989 │ 117                                      │ Église                                 │
│ units       │ VARCHAR     │           98.42 │           176 │ 1.0                                      │ 99.0                                   │
│ year_built  │ VARCHAR     │           98.39 │           235 │ 1750                                     │ c                                      │
├─────────────┴─────────────┴─────────────────┴───────────────┴──────────────────────────────────────────┴────────────────────────────────────────┤
│ 15 rows                                                                                                                               6 columns │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

The NULL percentages on the metadata fields seem very high. Having an address for each building is a unique feature, as other datasets like Overture don't link buildings and addresses to one another. It's a shame there's only ~30% coverage here.
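To put those percentages into rough building counts, here is a quick Python sketch. The NULL percentages are copied from the table above and the total is the COUNT(*) figure reported in the Footprint Coverage section. The row counts are estimates because the percentages are rounded to two decimal places.

```python
TOTAL_BUILDINGS = 14_417_429  # COUNT(*) from the Footprint Coverage section

# null_percentage values copied from the SUMMARIZE output above.
null_pct = {
    "address": 69.19,
    "floors": 97.34,
    "height": 91.59,
    "name": 98.73,
    "sq_ft": 99.05,
    "type": 85.22,
    "units": 98.42,
    "year_built": 98.39,
}

# Print the fields from best to worst coverage.
for field, pct in sorted(null_pct.items(), key=lambda kv: kv[1]):
    coverage = 100 - pct
    approx_rows = round(TOTAL_BUILDINGS * coverage / 100)
    print(f"{field:<10} {coverage:5.2f}% populated (~{approx_rows:,} rows)")
```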

Some buildings have an address made up of several concatenated addresses. The example below looks to list every unit within a building with its full first line address.

$ ~/duckdb
SELECT   address
FROM     'ODB_v3_*.parquet'
ORDER BY LENGTH(address) DESC
LIMIT    1;
S1001-330 Phillip St; S1002-330 Phillip St; S1003-330 Phillip St; S1004-330 Phillip St; S1005-330 Phillip St; S1006-330 Phillip St; S1007-330...
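Going by the output above, the individual addresses look to be separated by semicolons. Assuming that holds across the dataset, which I haven't verified, a field like this could be split back into one entry per unit. The `split_addresses` helper below is hypothetical and the sample only uses the first three complete entries shown above.

```python
def split_addresses(address):
    """Split a semicolon-delimited address field into individual entries."""
    if address is None:
        return []
    return [part.strip() for part in address.split(";") if part.strip()]


# The first three complete entries from the output above.
sample = "S1001-330 Phillip St; S1002-330 Phillip St; S1003-330 Phillip St"

units = split_addresses(sample)
print(len(units), units[0])
```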

Sources

Below is a breakdown of the sources in this dataset.

$ ~/duckdb
SELECT   source,
         COUNT(*)
FROM     'ODB_v3_*.parquet'
GROUP BY 1
ORDER BY 2 DESC;
┌───────────────────────────────────────┬──────────────┐
│                source                 │ count_star() │
│                varchar                │    int64     │
├───────────────────────────────────────┼──────────────┤
│ Government of Canada                  │      3797173 │
│ Government of Québec                  │      2448716 │
│ Government of New Brunswick           │       572033 │
│ City of Toronto                       │       531571 │
│ City of Calgary                       │       488028 │
│ City of Ottawa                        │       385184 │
│ City of Edmonton                      │       365567 │
│ Regional Municipality of York         │       320664 │
│ Ville de Québec                       │       279793 │
│ Regional Municipality of Durham       │       263487 │
│ City of Mississauga                   │       225262 │
│ Ville de Montréal                     │       220311 │
│ City of Brampton                      │       215959 │
│ City of Hamilton                      │       200234 │
│ City of London                        │       191992 │
│ County of Simcoe                      │       191123 │
│ Niagara Region                        │       168182 │
│ Halifax Regional Municipality         │       163638 │
│ City of Vancouver                     │       153988 │
│ City of Surrey                        │       137180 │
│       ·                               │           ·  │
│       ·                               │           ·  │
│       ·                               │           ·  │
│ Regional District of Central Okanagan │         9248 │
│ City of Waterloo                      │         9017 │
│ District of Summerland                │         8899 │
│ Town of Orangeville                   │         8528 │
│ City of Welland                       │         8273 │
│ City of Oshawa                        │         7722 │
│ District of Squamish                  │         7031 │
│ Resort Municipality of Whistler       │         6500 │
│ Town of Canmore                       │         6365 │
│ Town of Truro                         │         5893 │
│ City of White Rock                    │         5347 │
│ City of Grand Forks                   │         3000 │
│ Town of Gibsons                       │         2581 │
│ Town of Banff                         │         2045 │
│ City of Pickering                     │         1416 │
│ City of Port Moody                    │         1304 │
│ Ville de Sherbrooke                   │          590 │
│ Ville de Shawinigan                   │          424 │
│ City of Markham                       │           81 │
│ City of Maple Ridge                   │           10 │
├───────────────────────────────────────┴──────────────┤
│ 107 rows (40 shown)                        2 columns │
└──────────────────────────────────────────────────────┘

Building Use

12.3M buildings don't have a classification for their usage. Oddly, Calgary seems to have good coverage of this, so it might be hit-or-miss depending on the municipality supplying the data.

SELECT   type,
         COUNT(*)
FROM     'ODB_v3_*.parquet'
GROUP BY 1
ORDER BY 2 DESC;
┌─────────────────────────────────────────┬──────────────┐
│                  type                   │ count_star() │
│                 varchar                 │    int64     │
├─────────────────────────────────────────┼──────────────┤
│ NULL                                    │     12286872 │
│ Residential                             │       616944 │
│ General                                 │       218243 │
│ Résidence                               │       142671 │
│ Single Family Dwelling                  │       134333 │
│ Garage - Annexe - Remise                │       127242 │
│ Residential Garage                      │       119061 │
│ Shed                                    │        75565 │
│ Residence                               │        75498 │
│ Résidentielle                           │        74947 │
│ Single / Semi / Duplex                  │        55742 │
│ Detached House                          │        45730 │
│ Single-Family Home                      │        42663 │
│ Detached                                │        36366 │
│ Résidentiel                             │        31142 │
│ Commercial                              │        27585 │
│ Résidentiel et commercial               │        24130 │
│ Garage                                  │        16461 │
│ General/Residential                     │        15758 │
│ Accessory                               │        15452 │
│     ·                                   │            · │
│     ·                                   │            · │
│     ·                                   │            · │
│ Arts Program Facility                   │            1 │
│ Ontario Provincial Police               │            1 │
│ Hospital//Hôpital                       │            1 │
│ Private High School                     │            1 │
│ Education / Fitness / Recreation        │            1 │
│ Auditorium / Concert Hall               │            1 │
│ Multi Recreation Facility               │            1 │
│ Performance Space - Outdoor Venue       │            1 │
│ Cafe / Bakery / Restaurant              │            1 │
│ Farmers Market                          │            1 │
│ Hanger                                  │            1 │
│ Grocery                                 │            1 │
│ Education / Museum / Historic Sites     │            1 │
│ Youth Services                          │            1 │
│ Golf Club                               │            1 │
│ ECOLE ACADIENNE                         │            1 │
│ Lighthouse property//Propriété du phare │            1 │
│ Public Secondary School                 │            1 │
│ LEASE                                   │            1 │
│ Bowling Alley                           │            1 │
├─────────────────────────────────────────┴──────────────┤
│ 968 rows (40 shown)                          2 columns │
└────────────────────────────────────────────────────────┘
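Many of the labels above are English and French spellings of the same idea. Below is a sketch of how they could be folded into a handful of canonical classes. The counts are copied from the table above but the `CANONICAL` mapping is entirely my own grouping and isn't part of ODB.

```python
from collections import Counter

# Counts copied from the table above.
type_counts = {
    "Residential": 616_944,
    "Résidence": 142_671,
    "Residence": 75_498,
    "Résidentielle": 74_947,
    "Résidentiel": 31_142,
    "Commercial": 27_585,
}

# Hypothetical mapping from case-folded labels to a canonical class.
CANONICAL = {
    "residential": "residential",
    "résidence": "residential",
    "residence": "residential",
    "résidentielle": "residential",
    "résidentiel": "residential",
    "commercial": "commercial",
}


def normalise(label):
    """Return the canonical class for a label, or 'other' if unmapped."""
    return CANONICAL.get(label.strip().casefold(), "other")


totals = Counter()
for label, count in type_counts.items():
    totals[normalise(label)] += count

print(totals)
```

With just these five residential spellings merged, the residential class grows from 616,944 to 941,202 buildings.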

Below is South Calgary.

Open Database of Buildings

Building Heights

13.2M buildings don't have any height data and there are a lot of values which seem implausible. There are other sources for building heights in Canada that, with some processing, could probably do a good job in this area.

SELECT   height,
         COUNT(*)
FROM     'ODB_v3_*.parquet'
GROUP BY 1
ORDER BY 1;
┌─────────────┬──────────────┐
│   height    │ count_star() │
│   varchar   │    int64     │
├─────────────┼──────────────┤
│ -0.10836829 │            1 │
│ -0.12246253 │            1 │
│ -0.16561381 │            1 │
│ -0.21919132 │            1 │
│ -0.377      │            2 │
│ -0.50739307 │            1 │
│ 0.00344336  │            1 │
│ 0.0106756   │            1 │
│ 0.01068541  │            1 │
│ 0.01268867  │            1 │
│ 0.01376321  │            1 │
│ 0.01511616  │            1 │
│ 0.016097    │            1 │
│ 0.02035425  │            1 │
│ 0.02102936  │            1 │
│ 0.02107398  │            1 │
│ 0.02117555  │            1 │
│ 0.02222852  │            1 │
│ 0.02618704  │            1 │
│ 0.02664106  │            1 │
│   ·         │            · │
│   ·         │            · │
│   ·         │            · │
│ 99.79       │            2 │
│ 99.8        │            6 │
│ 99.81       │            2 │
│ 99.82       │            1 │
│ 99.84       │            2 │
│ 99.85       │            1 │
│ 99.86       │            8 │
│ 99.87       │            2 │
│ 99.88       │            2 │
│ 99.9        │            9 │
│ 99.91       │            5 │
│ 99.92       │            5 │
│ 99.93       │            3 │
│ 99.94       │            4 │
│ 99.95       │            2 │
│ 99.96       │            3 │
│ 99.97       │            3 │
│ 99.98       │            5 │
│ 99.99       │            2 │
│ NULL        │     13204713 │
├─────────────┴──────────────┤
│   139513 rows (40 shown)   │
└────────────────────────────┘
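One thing to keep in mind with the output above is that height is stored as a VARCHAR, so the ordering is alphabetical rather than numerical. The Python sketch below shows the difference using a few values from the table. The plausibility window at the end is a made-up threshold of mine and the dataset's units are an assumption.

```python
# A handful of height strings copied from the output above.
heights = ["-0.10836829", "-0.50739307", "0.00344336", "99.99", "99.8"]

lexical = sorted(heights)             # string ordering, as in the table above
numeric = sorted(heights, key=float)  # ordering after a cast to a number

print(lexical[0], numeric[0])

# Hypothetical plausibility window, assuming the values are in metres.
MIN_HEIGHT, MAX_HEIGHT = 2.0, 600.0
plausible = [h for h in heights if MIN_HEIGHT <= float(h) <= MAX_HEIGHT]
print(plausible)
```

The same caveat likely applies to the min and max columns in the earlier summary, since those are also computed on strings.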

Year of Construction

It would be nice to see integers in this field but I can understand parts of Canada were settled at a time when record-keeping wasn't amazing or good records might have been lost at some point.

SELECT   year_built,
         COUNT(*)
FROM     'ODB_v3_*.parquet'
GROUP BY 1
ORDER BY 1;
┌──────────────────┬──────────────┐
│    year_built    │ count_star() │
│     varchar      │    int64     │
├──────────────────┼──────────────┤
│ 1750             │            1 │
│ 1758             │            1 │
│ 1784             │            2 │
│ 1785             │            1 │
│ 1786             │            2 │
│ 1787             │            1 │
│ 1791             │            1 │
│ 1797             │            1 │
│ 1800             │           41 │
│ 1807             │            2 │
│ 1810             │            1 │
│ 1812             │            1 │
│ 1814             │            2 │
│ 1816             │            2 │
│ 1818             │            1 │
│ 1819             │            1 │
│ 1820             │            4 │
│ 1824             │            3 │
│ 1825             │            3 │
│ 1826             │            1 │
│  ·               │            · │
│  ·               │            · │
│  ·               │            · │
│ C 1880           │            2 │
│ C 1885           │            1 │
│ C 1890           │            2 │
│ C 1900           │            1 │
│ C 1910           │            4 │
│ C 1920           │            1 │
│ C 1930           │            1 │
│ C 1940           │            1 │
│ CIRCA 1870       │            1 │
│ CIRCA 1900       │            1 │
│ CIRCA 1930       │            1 │
│ Mid-19th century │            2 │
│ PRIOR 1956       │         5437 │
│ PRIOR 1962       │           31 │
│ PRIOR 1966       │            1 │
│ PRIOR 1969       │           20 │
│ PRIOR 1978       │            3 │
│ PRIOR 1989       │            6 │
│ c                │            1 │
│ NULL             │     14186018 │
├──────────────────┴──────────────┤
│ 247 rows (40 shown)   2 columns │
└─────────────────────────────────┘
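Most of the free-text values above still contain a usable four-digit year. Here is a sketch of a parser that pulls out the year along with a qualifier where one is present. The `parse_year_built` helper and the qualifier names are my own and the patterns only cover the formats visible in the output above.

```python
import re

# Four-digit years between 1600 and 2099.
YEAR = re.compile(r"\b(1[6-9]\d{2}|20\d{2})\b")


def parse_year_built(value):
    """Return (year, qualifier). Both are None if no year is found."""
    if value is None:
        return None, None
    text = value.strip().upper()
    match = YEAR.search(text)
    if match is None:
        return None, None
    year = int(match.group(1))
    if text.startswith("PRIOR"):
        return year, "prior"
    if text.startswith("C"):  # covers "C 1880" and "CIRCA 1870"
        return year, "circa"
    return year, None


for raw in ["1750", "C 1880", "CIRCA 1870", "PRIOR 1956",
            "Mid-19th century", "c"]:
    print(raw, "->", parse_year_built(raw))
```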

Calgary's years of construction are entirely unknown.

SELECT   year_built,
         COUNT(*)
FROM     'ODB_v3_*.parquet'
WHERE    ST_X(ST_CENTROID(geometry)) BETWEEN -114.3461 AND -113.8326
AND      ST_Y(ST_CENTROID(geometry)) BETWEEN   50.8334 AND   51.2422
GROUP BY 1
ORDER BY 1;
┌────────────┬──────────────┐
│ year_built │ count_star() │
│  varchar   │    int64     │
├────────────┼──────────────┤
│ NULL       │    496793    │
└────────────┴──────────────┘
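The window filter above computes a centroid for every building. A cheaper alternative would be to test the bbox column that was added during the Parquet conversion. The sketch below shows the idea in Python. The two sample bounding boxes are hypothetical and a bounding box's centre isn't always the same point as the geometry's centroid, so counts near the edge of the window could differ slightly.

```python
# Calgary window copied from the query above: (xmin, ymin, xmax, ymax).
CALGARY = (-114.3461, 50.8334, -113.8326, 51.2422)


def bbox_centre_in_window(bbox, window=CALGARY):
    """True if the centre of a feature's bbox falls inside the window."""
    cx = (bbox["xmin"] + bbox["xmax"]) / 2
    cy = (bbox["ymin"] + bbox["ymax"]) / 2
    xmin, ymin, xmax, ymax = window
    return xmin <= cx <= xmax and ymin <= cy <= ymax


# Hypothetical bboxes shaped like the Parquet files' bbox column.
in_calgary = {"xmin": -114.0720, "ymin": 51.0440,
              "xmax": -114.0715, "ymax": 51.0443}
in_edmonton = {"xmin": -113.4940, "ymin": 53.5460,
               "xmax": -113.4935, "ymax": 53.5463}

print(bbox_centre_in_window(in_calgary), bbox_centre_in_window(in_edmonton))
```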

Building Floor Counts

382K buildings have a floor count.

SELECT   floors,
         COUNT(*)
FROM     'ODB_v3_*.parquet'
GROUP BY 1
ORDER BY 1;
┌─────────┬──────────────┐
│ floors  │ count_star() │
│ varchar │    int64     │
├─────────┼──────────────┤
│ 1.0     │       212594 │
│ 1.5     │          877 │
│ 10.0    │           44 │
│ 11.0    │           38 │
│ 12.0    │           47 │
│ 13.0    │           15 │
│ 14.0    │           24 │
│ 15.0    │           16 │
│ 16.0    │           13 │
│ 17.0    │           13 │
│ 18.0    │           15 │
│ 19.0    │           16 │
│ 2.0     │       161415 │
│ 20.0    │            8 │
│ 21.0    │            8 │
│ 22.0    │            5 │
│ 23.0    │            2 │
│ 24.0    │            3 │
│ 25.0    │            7 │
│ 26.0    │            3 │
│ 27.0    │            1 │
│ 28.0    │            2 │
│ 3.0     │         6402 │
│ 31.0    │            1 │
│ 32.0    │            1 │
│ 34.0    │            1 │
│ 39.0    │            1 │
│ 4.0     │          642 │
│ 45.0    │            1 │
│ 5.0     │          156 │
│ 6.0     │          240 │
│ 7.0     │           78 │
│ 8.0     │           79 │
│ 9.0     │           60 │
│ NULL    │     14034601 │
├─────────┴──────────────┤
│ 35 rows      2 columns │
└────────────────────────┘

Footprint Coverage

This dataset contains 14,417,429 buildings.

SELECT COUNT(*)
FROM   'ODB_v3_*.parquet';
14,417,429

I'll download OSM's buildings from September 23rd. This file was produced by the Layercake project.

$ wget -c https://data.openstreetmap.us/layercake/buildings.parquet

I'll download a rough outline of Canada's provinces in GeoJSON format. I'll then convert it into GPKG as GPKG files require less syntax to work with in DuckDB.

$ wget -c https://gist.github.com/Thiago4breu/6ba01976161aa0be65e0a289412dc54c/raw/8ec57d8317a2abe5bae18e5fd86f777fab649f84/canada-provinces.geojson
$ ogr2ogr \
    -f GPKG \
    canada-provinces.gpkg \
    canada-provinces.geojson

I'll turn the provinces dataset into a DuckDB table and count how many buildings OSM has that are covered by any of the provinces.

$ ~/duckdb
CREATE OR REPLACE TABLE canada AS
    FROM ST_READ('canada-provinces.gpkg');

SELECT    COUNT(*)
FROM      'buildings.parquet' b
LEFT JOIN canada c ON ST_CoveredBy(b.geometry, c.geom)
WHERE     c.name IS NOT NULL;

The above returned a count of 7,849,223 buildings.

The PSC dataset I reviewed the other week contains 13.7M buildings which is ~305K more than the TUM dataset I also reviewed a few weeks ago.

So far, ODB has the largest building count but I noticed Dease Lake, a remote community in Northern BC, is absent from this dataset. The community has been mapped out in OSM since at least February 2023.

It looks like no one has the perfect dataset. The highest coverage will come from mixing and matching these datasets.

Provincial Boundaries

I looked along Alberta's borders with BC and Saskatchewan and the provincial attributions to each of the buildings look very accurate.

The following settlement at 110° W, 50.956° N sits along the Alberta-Saskatchewan border. The buildings in Alberta are in red and the ones in Saskatchewan are in yellow.

Open Database of Buildings

Copyright © 2014 - 2025 Mark Litwintschik. This site's template is based off a template by Giulio Fidente.