Home | Benchmarks | Categories | Atom Feed

Posted on Thu 19 December 2024 under GIS

Overture Maps' Refreshed Global Geospatial Datasets

The Overture Maps Foundation is a joint development project that publishes global, open map data.

Finding global datasets, outside of OpenStreetMap (OSM), is challenging and when the datasets need to be free, the choices are even fewer and further between.

To add to that, OSM won't accept bulk submissions produced by AI; instead, they want every change to be confirmed by a human. Finding volunteers willing to validate 30K building height measurements from LiDAR scans can be far beyond the reach of charity.

But the world of GIS isn't about using a single data provider. GIS is an ecosystem. Using several datasets from a diverse and unaffiliated set of providers is the path to innovation. To add to this, maps are outdated the moment they're published and AI being fed data from an ever-increasing number of sensors will be key to shorter refresh intervals.

Overture aims to take data from a broad range of providers and blend them into Parquet files hosted on Cloud storage and refreshed every month.

The firms partnering with Overture include Amazon, Esri, Grab, Hyundai, Meta, Microsoft, Precisely, Tripadvisor and TomTom.

These firms have committed to providing funding, engineering and data. When providers are unwilling to publish proprietary datasets, there are cases where they'll use AI to produce derivative datasets.

Google aren't a part of the Overture Foundation but their open buildings dataset helps fill gaps in the map where OSM's and Microsoft's ML-generated buildings have failed to identify building footprints.

Overture is led by Marc Prioleau, who previously spent four years as the head of business development for Mapping and Location at Meta, five years at Mapbox, including sitting on their Board of Directors and a year at Uber doing business development for their location-based services.

With this week's latest release, Overture now has over 7 TB of GIS data hosted on S3.

In this post, I'll review some of the updates they've made in the past few months.

My Workstation

I'm using a 6 GHz Intel Core i9-14900K CPU. It has 8 performance cores and 16 efficiency cores with a total of 32 threads and 32 MB of L2 cache. It has a liquid cooler attached and is housed in a spacious, full-sized, Cooler Master HAF 700 computer case. I've come across videos on YouTube where people have managed to overclock the i9-14900KF to 9.1 GHz.

The system has 96 GB of DDR5 RAM clocked at 6,000 MT/s and a 5th-generation, Crucial T700 4 TB NVMe M.2 SSD which can read at speeds up to 12,400 MB/s. There is a heatsink on the SSD to help keep its temperature down. This is my system's C drive.

The system is powered by a 1,200-watt, fully modular, Corsair Power Supply and is sat on an ASRock Z790 Pro RS Motherboard.

I'm running Ubuntu 22 LTS via Microsoft's Ubuntu for Windows on Windows 11 Pro. In case you're wondering why I don't run a Linux-based desktop as my primary work environment, I'm still using an Nvidia GTX 1080 GPU which has better driver support on Windows and I use ArcGIS Pro from time to time which only supports Windows natively.

Installing Prerequisites

I'll use a few tools to help prepare and visualise the data in this post.

$ sudo apt update
$ sudo apt install \
    awscli \
    jq \
    python3-pip \
    python3-virtualenv

I'll set up a Python Virtual Environment and install some dependencies.

$ virtualenv ~/.overture
$ source ~/.overture/bin/activate

I've been working on a Parquet debugging tool. I'll use it to examine the results of Parquet compression in this post.

$ git clone https://github.com/marklit/pqview \
    ~/pqview
$ python3 -m pip install \
          -r ~/pqview/requirements.txt

I'll use DuckDB, along with its H3, JSON, Lindel, Parquet and Spatial extensions, in this post.

$ cd ~
$ wget -c https://github.com/duckdb/duckdb/releases/download/v1.1.3/duckdb_cli-linux-amd64.zip
$ unzip -j duckdb_cli-linux-amd64.zip
$ chmod +x duckdb
$ ~/duckdb
INSTALL h3 FROM community;
INSTALL lindel FROM community;
INSTALL json;
INSTALL parquet;
INSTALL spatial;

I'll set up DuckDB to load every installed extension each time it launches.

$ vi ~/.duckdbrc
.timer on
.width 180
LOAD h3;
LOAD lindel;
LOAD json;
LOAD parquet;
LOAD spatial;

The maps in this post were rendered with QGIS version 3.40. QGIS is a desktop application that runs on Windows, macOS and Linux. The application has grown in popularity in recent years and has ~15M application launches from users all around the world each month.

I used QGIS' Tile+ plugin to add geospatial context with Esri's World Imagery Basemap. The dark, non-satellite imagery maps are mostly made up of vector data from Natural Earth and Overture.

Political Boundaries

The Database of Global Administrative Areas (GADM) aims to map the administrative areas of all countries, at all levels of sub-division. Its records can be joined to Overture's and allow you to group them by continent, country and/or other political boundaries.

I'll download and import it into DuckDB. The following ZIP file is 1.4 GB and contains a GeoPackage (GPKG) file that is 2.6 GB. There are a total of 356,508 records.

$ wget -c https://geodata.ucdavis.edu/gadm/gadm4.1/gadm_410-gpkg.zip
$ unzip gadm_410-gpkg.zip

$ ~/duckdb ~/overture.duckdb
CREATE OR REPLACE TABLE gadm AS
    FROM   ST_READ('gadm_410.gpkg');

The DuckDB table has a disk footprint of 5 GB. Below is an example record.

$ echo "FROM  gadm
        WHERE NAME_2 = 'Tallinn'
        LIMIT 1" \
    | ~/duckdb -json ~/overture.duckdb \
    | jq -S .
[
  {
    "CC_1": "",
    "CC_2": "",
    "CC_3": "",
    "CC_4": "",
    "CC_5": "",
    "CONTINENT": "Europe",
    "COUNTRY": "Estonia",
    "DISPUTEDBY": "",
    "ENGTYPE_1": "County",
    "ENGTYPE_2": "Town",
    "ENGTYPE_3": "Administrative District",
    "ENGTYPE_4": "",
    "ENGTYPE_5": "",
    "GID_0": "EST",
    "GID_1": "EST.1_1",
    "GID_2": "EST.1.20_1",
    "GID_3": "EST.1.20.1_1",
    "GID_4": "",
    "GID_5": "",
    "GOVERNEDBY": "",
    "HASC_1": "EE.HA",
    "HASC_2": "EE.HA.TA",
    "HASC_3": "",
    "ISO_1": "",
    "NAME_0": "Estonia",
    "NAME_1": "Harju",
    "NAME_2": "Tallinn",
    "NAME_3": "Haabersti",
    "NAME_4": "",
    "NAME_5": "",
    "NL_NAME_1": "",
    "NL_NAME_2": "",
    "NL_NAME_3": "",
    "REGION": "",
    "SOVEREIGN": "Estonia",
    "SUBCONT": "",
    "TYPE_1": "Maakond",
    "TYPE_2": "Linn",
    "TYPE_3": "Linnaosa",
    "TYPE_4": "",
    "TYPE_5": "",
    "UID": 50179,
    "VALIDFR_1": "Unknown",
    "VALIDFR_2": "2002",
    "VALIDFR_3": "Unknown",
    "VALIDFR_4": "",
    "VARNAME_0": "",
    "VARNAME_1": "Harjumaa|Harju maakond",
    "VARNAME_2": "",
    "VARNAME_3": "",
    "VARNAME_4": "",
    "VARREGION": "",
    "geom": "MULTIPOLYGON (((24.62020123500008 59.38849636600014, ...)))"
  }
]

Natural Earth's Country Codes

GADM doesn't have two-letter ISO codes for identifying countries that line up well with what Overture use. For joins needing these, I'll use Natural Earth's dataset instead.

$ aws s3 cp \
        --no-sign-request \
        s3://naturalearth/10m_cultural/ne_10m_admin_0_countries_deu.zip \
        ~/
$ unzip ~/ne_10m_admin_0_countries_deu.zip \
        -d ~/ne_countries

Overture's List of Files

I'll download a JSON-formatted listing of every object in Overture's S3 bucket.

$ aws --no-sign-request \
      --output json \
      s3api \
      list-objects \
      --bucket overturemaps-us-west-2 \
      --max-items=1000000 \
    | jq -c '.Contents[]' \
    > overturemaps.s3.json
$ ~/duckdb ~/overture.duckdb
CREATE OR REPLACE TABLE s3 AS
    SELECT *,
           SPLIT(Key, '/')[1]                        AS section,
           SPLIT(Key, '/')[2]                        AS release,
           REPLACE(SPLIT(Key, '/')[3], 'theme=', '') AS theme,
           REPLACE(SPLIT(Key, '/')[4], 'type=', '')  AS theme_type
    FROM   READ_JSON('overturemaps.s3.json')
    WHERE  Key NOT LIKE '%planet%'
    AND    Key NOT LIKE '%_SUCCESS%'
    AND    Key NOT LIKE '%building-extracts%'
    AND    section = 'release';

Below are the release IDs, their weekday of release and size in GB. Releases are published in the middle of each month but there is a somewhat even distribution between them landing on Tuesdays, Wednesdays or Thursdays.

Overture's December release is 1.4x larger than April's release.

SELECT   release,
         strftime(release[:10]::DATE, '%a') week_day,
         ROUND(SUM(Size) / 1024 ** 3)::INT  sum_size
FROM     s3
GROUP BY 1
ORDER BY 1;
┌────────────────────┬──────────┬──────────┐
│      release       │ week_day │ sum_size │
│      varchar       │ varchar  │  int32   │
├────────────────────┼──────────┼──────────┤
│ 2023-07-26-alpha.0 │ Wed      │      201 │
│ 2023-10-19-alpha.0 │ Thu      │      326 │
│ 2023-11-14-alpha.0 │ Tue      │      296 │
│ 2023-12-14-alpha.0 │ Thu      │      353 │
│ 2024-01-17-alpha.0 │ Wed      │      356 │
│ 2024-02-15-alpha.0 │ Thu      │      355 │
│ 2024-03-12-alpha.0 │ Tue      │      356 │
│ 2024-04-16-beta.0  │ Tue      │      315 │
│ 2024-05-16-beta.0  │ Thu      │      413 │
│ 2024-06-13-beta.0  │ Thu      │      423 │
│ 2024-06-13-beta.1  │ Thu      │      423 │
│ 2024-07-22.0       │ Mon      │      427 │
│ 2024-08-20.0       │ Tue      │      430 │
│ 2024-09-18.0       │ Wed      │      434 │
│ 2024-10-23.0       │ Wed      │      434 │
│ 2024-11-13.0       │ Wed      │      447 │
│ 2024-12-18.0       │ Wed      │      450 │
├────────────────────┴──────────┴──────────┤
│ 17 rows                        3 columns │
└──────────────────────────────────────────┘

These are the themes and theme types list for this month's release.

SELECT   DISTINCT theme,
                  theme_type
FROM     s3
WHERE    release = '2024-12-18.0'
ORDER BY 1, 2;
┌────────────────┬───────────────────┐
│     theme      │    theme_type     │
│    varchar     │      varchar      │
├────────────────┼───────────────────┤
│ addresses      │ address           │
│ base           │ bathymetry        │
│ base           │ infrastructure    │
│ base           │ land              │
│ base           │ land_cover        │
│ base           │ land_use          │
│ base           │ water             │
│ buildings      │ building          │
│ buildings      │ building_part     │
│ divisions      │ division          │
│ divisions      │ division_area     │
│ divisions      │ division_boundary │
│ places         │ place             │
│ transportation │ connector         │
│ transportation │ segment           │
├────────────────┴───────────────────┤
│ 15 rows                  2 columns │
└────────────────────────────────────┘

Both the base and buildings themes have seen net growth since last month's release.

WITH a AS (
    SELECT   theme,
             release,
             ROUND(SUM(Size) / 1024 ** 3)::INT sum_size
    FROM     s3
    GROUP BY 1, 2
)
PIVOT    a
ON       theme
USING    SUM(sum_size)
ORDER BY release;
┌────────────────────┬───────────┬────────┬────────┬───────────┬───────────┬────────┬────────────────┐
│      release       │ addresses │ admins │  base  │ buildings │ divisions │ places │ transportation │
│      varchar       │  int128   │ int128 │ int128 │  int128   │  int128   │ int128 │     int128     │
├────────────────────┼───────────┼────────┼────────┼───────────┼───────────┼────────┼────────────────┤
│ 2023-07-26-alpha.0 │           │      1 │        │       110 │           │      8 │             82 │
│ 2023-10-19-alpha.0 │           │      2 │     61 │       176 │           │      7 │             81 │
│ 2023-11-14-alpha.0 │           │      2 │     59 │       154 │           │      6 │             76 │
│ 2023-12-14-alpha.0 │           │      2 │     52 │       231 │           │      5 │             63 │
│ 2024-01-17-alpha.0 │           │      2 │     53 │       232 │           │      6 │             63 │
│ 2024-02-15-alpha.0 │           │      3 │     50 │       232 │           │      5 │             65 │
│ 2024-03-12-alpha.0 │           │      3 │     50 │       232 │           │      5 │             65 │
│ 2024-04-16-beta.0  │           │      3 │     51 │       196 │         4 │      4 │             56 │
│ 2024-05-16-beta.0  │           │      3 │    147 │       197 │         4 │      4 │             58 │
│ 2024-06-13-beta.0  │           │      3 │    141 │       212 │         4 │      5 │             58 │
│ 2024-06-13-beta.1  │           │      3 │    141 │       212 │         3 │      5 │             58 │
│ 2024-07-22.0       │         6 │        │    142 │       213 │         4 │      5 │             58 │
│ 2024-08-20.0       │         7 │        │    142 │       208 │         4 │      5 │             65 │
│ 2024-09-18.0       │         7 │        │    143 │       208 │         4 │      5 │             68 │
│ 2024-10-23.0       │        12 │        │    143 │       209 │         4 │      5 │             62 │
│ 2024-11-13.0       │        12 │        │    155 │       209 │         4 │      5 │             63 │
│ 2024-12-18.0       │        12 │        │    156 │       210 │         4 │      5 │             63 │
├────────────────────┴───────────┴────────┴────────┴───────────┴───────────┴────────┴────────────────┤
│ 17 rows                                                                                  8 columns │
└────────────────────────────────────────────────────────────────────────────────────────────────────┘

Within the base theme, the water, along with a new bathymetry theme type are helped contribute to the net growth.

WITH a AS (
    SELECT   theme_type,
             release,
             CEIL(SUM(Size) / 1024 ** 3)::INT sum_size
    FROM     s3
    WHERE    theme = 'base'
    GROUP BY 1, 2
)
PIVOT    a
ON       theme_type
USING    SUM(sum_size)
ORDER BY release;
┌────────────────────┬────────────┬────────────────┬────────┬─────────┬────────────┬──────────┬───────────┬────────┐
│      release       │ bathymetry │ infrastructure │  land  │ landUse │ land_cover │ land_use │ landuse_1 │ water  │
│      varchar       │   int128   │     int128     │ int128 │ int128  │   int128   │  int128  │  int128   │ int128 │
├────────────────────┼────────────┼────────────────┼────────┼─────────┼────────────┼──────────┼───────────┼────────┤
│ 2023-10-19-alpha.0 │            │                │     26 │         │            │          │        17 │     20 │
│ 2023-11-14-alpha.0 │            │                │     25 │      16 │            │          │           │     19 │
│ 2023-12-14-alpha.0 │            │                │     22 │      14 │            │          │           │     18 │
│ 2024-01-17-alpha.0 │            │                │     22 │      14 │            │          │           │     18 │
│ 2024-02-15-alpha.0 │            │                │     22 │      10 │            │          │           │     19 │
│ 2024-03-12-alpha.0 │            │                │     22 │         │            │       11 │           │     19 │
│ 2024-04-16-beta.0  │            │              3 │     20 │         │            │       11 │           │     19 │
│ 2024-05-16-beta.0  │            │              7 │     20 │         │         91 │       13 │           │     19 │
│ 2024-06-13-beta.0  │            │              7 │     20 │         │         84 │       13 │           │     19 │
│ 2024-06-13-beta.1  │            │              7 │     20 │         │         84 │       13 │           │     19 │
│ 2024-07-22.0       │            │              7 │     21 │         │         84 │       13 │           │     19 │
│ 2024-08-20.0       │            │              7 │     21 │         │         84 │       13 │           │     20 │
│ 2024-09-18.0       │            │              7 │     21 │         │         84 │       13 │           │     20 │
│ 2024-10-23.0       │            │              7 │     21 │         │         84 │       13 │           │     20 │
│ 2024-11-13.0       │            │              7 │     21 │         │         96 │       13 │           │     20 │
│ 2024-12-18.0       │          1 │              7 │     22 │         │         96 │       14 │           │     21 │
├────────────────────┴────────────┴────────────────┴────────┴─────────┴────────────┴──────────┴───────────┴────────┤
│ 16 rows                                                                                                9 columns │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Addresses for 34 Countries

I'll first import Natural Earth's political metadata into DuckDB. I've added corrections for France, Norway and Taiwan so they line up with Overture's two-letter identifiers.

$ ~/duckdb ~/overture.duckdb
CREATE OR REPLACE TABLE iso2_meta AS
    SELECT REPLACE(ISO_A2, 'CN-', '') AS ISO_A2,
           ADMIN,
           CONTINENT,
           SUBREGION
    FROM   ST_READ('/home/mark/ne_countries/ne_10m_admin_0_countries_deu.shx');

INSERT INTO iso2_meta
    SELECT ISO_A2_EH AS ISO_A2,
           ADMIN,
           CONTINENT,
           SUBREGION
    FROM   ST_READ('/home/mark/ne_countries/ne_10m_admin_0_countries_deu.shx')
    WHERE  ADMIN = 'France'
    OR     SOVEREIGNT ILIKE '%Norway%';

Below are the number of addresses Overture have broken down by country and release to the next nearest million.

CREATE OR REPLACE TABLE country_release AS
    SELECT   SPLIT(filename, '/')[5] AS release,
             country                 AS overture_iso2,
             COUNT(*)                AS num_addrs
    FROM     READ_PARQUET('s3://overturemaps-us-west-2/release/*/theme\=addresses/type\=address/*.parquet',
                          hive_partitioning=1,
                          filename=True)
    GROUP BY 1, 2;

CREATE OR REPLACE TABLE address_stats AS (
    WITH a AS (
        SELECT    SPLIT_PART(a.release, '-', 2) AS release,
                  b.CONTINENT                   AS continent,
                  b.ADMIN                       AS country,
                  a.overture_iso2               AS overture_iso2,
                  CEIL(ROUND(a.num_addrs / 1_000_000, 3))::INT
                                                AS num_addrs
        FROM      country_release a
        LEFT JOIN iso2_meta b ON a.overture_iso2 = b.ISO_A2
    )
    PIVOT    a
    ON       release
    USING    SUM(num_addrs)
    ORDER BY continent,
             country,
             overture_iso2);

SELECT * EXCLUDE(overture_iso2)
FROM   address_stats;
┌───────────────┬──────────────────────────┬────────┬────────┬────────┬────────┬────────┬────────┐
│   continent   │         country          │   07   │   08   │   09   │   10   │   11   │   12   │
│    varchar    │         varchar          │ int128 │ int128 │ int128 │ int128 │ int128 │ int128 │
├───────────────┼──────────────────────────┼────────┼────────┼────────┼────────┼────────┼────────┤
│ Asia          │ Japan                    │        │        │        │     20 │     20 │     20 │
│ Asia          │ Taiwan                   │        │        │        │        │      9 │      9 │
│ Europe        │ Austria                  │      3 │      3 │      3 │      3 │      3 │      3 │
│ Europe        │ Belgium                  │        │        │        │        │      7 │      7 │
│ Europe        │ Croatia                  │        │        │        │      2 │      2 │      2 │
│ Europe        │ Czechia                  │        │        │        │        │        │      3 │
│ Europe        │ Denmark                  │      4 │      4 │      4 │      4 │      4 │      4 │
│ Europe        │ Estonia                  │      3 │      3 │      3 │      3 │      3 │      3 │
│ Europe        │ Finland                  │        │        │      4 │      4 │      4 │      4 │
│ Europe        │ France                   │     27 │     27 │     27 │     27 │     27 │     27 │
│ Europe        │ Germany                  │        │        │        │      6 │     16 │     16 │
│ Europe        │ Iceland                  │        │        │        │        │      1 │      1 │
│ Europe        │ Italy                    │        │        │        │        │     10 │     10 │
│ Europe        │ Latvia                   │        │        │        │        │      1 │      1 │
│ Europe        │ Liechtenstein            │        │        │        │        │        │      1 │
│ Europe        │ Lithuania                │      2 │      2 │      2 │      2 │      2 │      2 │
│ Europe        │ Luxembourg               │      1 │      1 │      1 │      1 │      1 │      1 │
│ Europe        │ Netherlands              │     10 │     10 │     10 │     10 │     10 │     10 │
│ Europe        │ Norway                   │        │        │        │      3 │      3 │      3 │
│ Europe        │ Poland                   │        │        │        │      8 │      8 │      8 │
│ Europe        │ Portugal                 │      6 │      6 │      6 │      6 │      6 │      6 │
│ Europe        │ Slovakia                 │        │        │        │      2 │      2 │      2 │
│ Europe        │ Slovenia                 │        │        │        │        │      1 │      1 │
│ Europe        │ Spain                    │        │        │        │        │     14 │     14 │
│ Europe        │ Switzerland              │        │      4 │      4 │      4 │      4 │      4 │
│ North America │ Canada                   │     17 │     17 │     17 │     17 │     17 │     17 │
│ North America │ Mexico                   │     31 │     31 │     31 │     31 │     31 │     31 │
│ North America │ United States of America │     79 │     79 │     79 │     81 │     81 │     85 │
│ Oceania       │ Australia                │     16 │     16 │     16 │     16 │     16 │     16 │
│ Oceania       │ New Zealand              │      3 │      3 │      3 │      3 │      3 │      3 │
│ South America │ Brazil                   │        │        │        │     90 │     90 │     90 │
│ South America │ Chile                    │        │        │      5 │      5 │      5 │      5 │
│ South America │ Colombia                 │      8 │      8 │      8 │      8 │      8 │      8 │
│ South America │ Uruguay                  │        │        │        │        │        │      2 │
├───────────────┴──────────────────────────┴────────┴────────┴────────┴────────┴────────┴────────┤
│ 34 rows                                                                              8 columns │
└────────────────────────────────────────────────────────────────────────────────────────────────┘

These are the new countries added to this month's release.

SELECT continent,
       country
FROM   address_stats a
WHERE  a."11" IS NULL
AND    a."12" IS NOT NULL;
┌───────────────┬───────────────┐
│   continent   │    country    │
│    varchar    │    varchar    │
├───────────────┼───────────────┤
│ Europe        │ Czechia       │
│ Europe        │ Liechtenstein │
│ South America │ Uruguay       │
└───────────────┴───────────────┘

US Address Coverage

Much of the US is now represented in the US addresses dataset. There are gaps in the map but this dataset has been growing in the past few months. Hopefully it'll be filled in in a future release.

$ ~/duckdb ~/overture.duckdb
COPY (
    SELECT   h3_cell_to_boundary_wkt(
                  h3_latlng_to_cell(bbox.ymax,
                                    bbox.xmax,
                                    5))::GEOMETRY geom,
             COUNT(*) num_addresses
    FROM     READ_PARQUET('s3://overturemaps-us-west-2/release/2024-12-18.0/theme=addresses/type=address/*.parquet')
    WHERE    country = 'US'
    GROUP BY 1
) TO 'addresses.2024.12.US.gpkg'
    WITH (FORMAT GDAL,
          DRIVER 'GPKG',
          LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES');
Overture US Addresses

Federating Data

The one thing I'd like to see in future releases is a table that maps addresses and buildings to points of interest (POIs). This is likely a complex problem but would go a long way to both making this dataset more valuable and validating the POIs contained within Overture.

The New Bathymetry Dataset

The bathymetry theme type is made up of a single, 38 MB Parquet file containing 60,060 records.

Below is an example record. Note the sort key in the cartography dictionary. This will be helpful in rendering this dataset.

$ echo "SELECT * EXCLUDE (bbox,
                          cartography,
                          sources),
               cartography::JSON cartography,
               bbox::JSON        bbox,
               sources::JSON     sources
        FROM   READ_PARQUET('s3://overturemaps-us-west-2/release/2024-12-18.0/theme=base/type=bathymetry/*.parquet')
        LIMIT  1" \
      | ~/duckdb -json \
      | jq -S .
[
  {
    "bbox": {
      "xmax": 180.00869750976562,
      "xmin": 180.0012664794922,
      "ymax": 60.62028503417969,
      "ymin": 60.58367919921875
    },
    "cartography": {
      "max_zoom": null,
      "min_zoom": null,
      "sort_key": 12
    },
    "depth": 500,
    "geometry": "MULTIPOLYGON (((180.005302251 60.620278721, 180.003020948 60.617479661, 180.001292917 60.603416146, 180.001846188 60.59215169, 180.004680763 60.583686294, 180.006962066 60.586485354, 180.008690097 60.600548869, 180.008136825 60.611813325, 180.005302251 60.620278721)))",
    "id": "08b1749b29305fff0006d78df1434687",
    "sources": [
      {
        "confidence": null,
        "dataset": "ETOPO/GLOBathy",
        "property": "",
        "record_id": null,
        "update_time": "2024-12-11T00:00:00.000Z"
      }
    ],
    "theme": "base",
    "type": "bathymetry",
    "version": 0
  }
]

If you import the Parquet file into QGIS, under the blending mode settings for its layer, there will be a checkbox for "Control feature rendering order". In the settings, sort the "cartography.sort_key" field in descending order.

Below is a rendering of this dataset colour-coded by depth.

Overture Bathymetry

These are all sourced from ETOPO/GLOBathy.

$ ~/duckdb
SELECT   JSON_EXTRACT_STRING(sources::JSON->0, 'dataset') source,
         COUNT(*) num_records
FROM     READ_PARQUET('s3://overturemaps-us-west-2/release/2024-12-18.0/theme=base/type=bathymetry/*.parquet')
GROUP BY 1;
┌────────────────┬─────────────┐
│     source     │ num_records │
│    varchar     │    int64    │
├────────────────┼─────────────┤
│ ETOPO/GLOBathy │       60060 │
└────────────────┴─────────────┘

The depths only go to 3000 meters and are somewhat course in their rounding.

SELECT   depth,
         COUNT(*) num_records
FROM     READ_PARQUET('s3://overturemaps-us-west-2/release/2024-12-18.0/theme=base/type=bathymetry/*.parquet')
GROUP BY 1;
┌───────┬─────────────┐
│ depth │ num_records │
│ int32 │    int64    │
├───────┼─────────────┤
│     0 │        4285 │
│    10 │        4228 │
│    50 │        4622 │
│   100 │        3878 │
│   500 │        2748 │
│  1000 │        3308 │
│  1500 │        4722 │
│  2000 │        6750 │
│  2500 │       10309 │
│  3000 │       15210 │
├───────┴─────────────┤
│ 10 rows   2 columns │
└─────────────────────┘

The shorelines have been smoothed out. This is the Baltic Sea around Estonia's coasts overlaid on Esri's World Imagery.

Overture Bathymetry

I wrote a blog post about vessel monitoring in the Baltic Sea earlier this year that used HELCOM's Baltic Sea Bathymetry Database. Overture's Bathymetry dataset is great for a convenient, global dataset in that's already in Parquet format but if you need more detailed and refined data it pays to look around.

Below is HELCOM's raster-based Bathymetry dataset with Overture's overlaid off the coast of Tallinn.

HELCOM vs Overture Bathymetry

Overture's Buildings

The buildings dataset can see a lot of changes between each release. Below is the net change in buildings by country between November and this month's release.

$ ~/duckdb ~/overture.duckdb
CREATE OR REPLACE TABLE building_footprints AS
    SELECT   SPLIT(filename, '/')[5]                AS release,
             SPLIT(SPLIT(filename, '/')[8], '-')[2] AS pq_num,
             ROUND(bbox.xmin, 1)                    AS lon,
             ROUND(bbox.ymin, 1)                    AS lat,
             COUNT(*)                               AS num_buildings
    FROM     READ_PARQUET('s3://overturemaps-us-west-2/release/2024-1*/theme=buildings/type=building/*.parquet',
                            hive_partitioning=1,
                            filename=True)
    GROUP BY 1, 2, 3, 4;

CREATE OR REPLACE TABLE buildings_country_gadm AS
    SELECT   b.CONTINENT          AS continent,
             b.COUNTRY            AS country,
             a.release            AS release,
             a.pq_num             AS pq_num,
             SUM(a.num_buildings) AS num_buildings
    FROM     building_footprints a
    JOIN     gadm b ON ST_CONTAINS(b.geom,
                                   ST_POINT(a.lon, a.lat))
    GROUP BY 1, 2, 3, 4;

SELECT   country,
         SUM(num_buildings) FILTER (release='2024-12-18.0')::int128 -
         SUM(num_buildings) FILTER (release='2024-11-13.0')::int128 delta
FROM     buildings_country_gadm
GROUP BY 1
HAVING   delta != 0
ORDER BY 2 DESC;
┌──────────────────────────────────┬─────────┐
│             country              │  delta  │
│             varchar              │ int128  │
├──────────────────────────────────┼─────────┤
│ Sudan                            │ 1136940 │
│ Ethiopia                         │  714763 │
│ South Sudan                      │  528923 │
│ United States                    │  395941 │
│ Libya                            │  298453 │
│ Japan                            │  225632 │
│ Saudi Arabia                     │  179462 │
│ Chad                             │  157358 │
│ United Kingdom                   │  150885 │
│ India                            │  146677 │
│ China                            │  125931 │
│ Russia                           │  125544 │
│ Germany                          │  106389 │
│ Australia                        │   96214 │
│ Egypt                            │   91198 │
│ Italy                            │   84860 │
│ Spain                            │   81822 │
│ United Arab Emirates             │   78415 │
│ Democratic Republic of the Congo │   72817 │
│ Nigeria                          │   64946 │
│    ·                             │      ·  │
│    ·                             │      ·  │
│    ·                             │      ·  │
│ São Tomé and Príncipe            │     -11 │
│ Brunei                           │     -14 │
│ Martinique                       │     -28 │
│ Guadeloupe                       │     -33 │
│ Côte d'Ivoire                    │     -87 │
│ Trinidad and Tobago              │    -109 │
│ Angola                           │    -170 │
│ Namibia                          │    -198 │
│ Suriname                         │    -319 │
│ Réunion                          │    -501 │
│ Eritrea                          │    -547 │
│ French Guiana                    │    -838 │
│ Burkina Faso                     │    -911 │
│ Puerto Rico                      │    -932 │
│ Guatemala                        │    -935 │
│ Paraguay                         │    -984 │
│ Sri Lanka                        │   -1073 │
│ Zimbabwe                         │   -1467 │
│ Swaziland                        │   -1506 │
│ El Salvador                      │   -7592 │
├──────────────────────────────────┴─────────┤
│ 222 rows (40 shown)              2 columns │
└────────────────────────────────────────────┘

Below I'll put together a heatmap of the changes seen this month.

CREATE OR REPLACE TABLE h3_4_stats AS
    SELECT   H3_LATLNG_TO_CELL(lat, lon, 4) as h3_4,
             SUM(num_buildings) FILTER (release='2024-12-18.0')::int32 -
             SUM(num_buildings) FILTER (release='2024-11-13.0')::int32 delta
    FROM     building_footprints
    WHERE    lon < 172
    AND      lon > -172
    GROUP BY 1;

COPY (
    SELECT H3_CELL_TO_BOUNDARY_WKT(h3_4)::geometry geom,
           delta
    FROM   h3_4_stats
) TO 'building_changes.h3_4.gpkg'
    WITH (FORMAT GDAL,
          DRIVER 'GPKG',
          LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES');

The brightest areas saw the greatest net increase in building footprints over the past month while the darkest saw either no change or a slight net decrease.

Overture Building Footprint changes.

Overture's Roads

Road data is broken up into connector and segment theme types. Measuring by disk usage in GB, both datasets have been fairly stable in the past few months.

$ ~/duckdb ~/overture.duckdb
WITH a AS (
    SELECT   theme_type,
             release,
             ROUND(SUM(Size) / 1024 ** 3)::INT sum_size
    FROM     s3
    WHERE    theme = 'transportation'
    GROUP BY 1, 2
)
PIVOT    a
ON       theme_type
USING    SUM(sum_size)
ORDER BY release;
┌────────────────────┬───────────┬─────────┐
│      release       │ connector │ segment │
│      varchar       │  int128   │ int128  │
├────────────────────┼───────────┼─────────┤
│ 2023-07-26-alpha.0 │        20 │      62 │
│ 2023-10-19-alpha.0 │        20 │      61 │
│ 2023-11-14-alpha.0 │        18 │      58 │
│ 2023-12-14-alpha.0 │        17 │      46 │
│ 2024-01-17-alpha.0 │        17 │      46 │
│ 2024-02-15-alpha.0 │        17 │      47 │
│ 2024-03-12-alpha.0 │        18 │      48 │
│ 2024-04-16-beta.0  │        12 │      44 │
│ 2024-05-16-beta.0  │        14 │      44 │
│ 2024-06-13-beta.0  │        13 │      45 │
│ 2024-06-13-beta.1  │        13 │      45 │
│ 2024-07-22.0       │        13 │      46 │
│ 2024-08-20.0       │        13 │      52 │
│ 2024-09-18.0       │        13 │      55 │
│ 2024-10-23.0       │        13 │      49 │
│ 2024-11-13.0       │        13 │      50 │
│ 2024-12-18.0       │        13 │      50 │
├────────────────────┴───────────┴─────────┤
│ 17 rows                        3 columns │
└──────────────────────────────────────────┘

Below is the net change by country between October and this month's release.

CREATE OR REPLACE TABLE roads AS
    SELECT   SPLIT(filename, '/')[5] AS release,
             ROUND(bbox.xmin, 1) AS lon,
             ROUND(bbox.ymin, 1) AS lat,
             COUNT(*) AS num_roads
    FROM     READ_PARQUET('s3://overturemaps-us-west-2/release/2024-1*/theme=transportation/type=segment/*.parquet',
                          hive_partitioning=1,
                          filename=True)
    GROUP BY 1, 2, 3;

CREATE OR REPLACE TABLE roads_country AS
    SELECT   b.COUNTRY AS country,
             a.release AS release,
             SUM(a.num_roads) AS num_roads
    FROM     roads a
    JOIN     gadm b ON ST_CONTAINS(b.geom,
                                   ST_POINT(a.lon, a.lat))
    GROUP BY 1, 2;

SELECT   country,
         SUM(num_roads) FILTER (release='2024-12-18.0')::int128 -
         SUM(num_roads) FILTER (release='2024-10-23.0')::int128 delta
FROM     roads_country
GROUP BY 1
HAVING   delta != 0
ORDER BY 2 DESC;
┌──────────────────────────────────┬────────┐
│             country              │ delta  │
│             varchar              │ int128 │
├──────────────────────────────────┼────────┤
│ United States                    │ 486493 │
│ India                            │ 287099 │
│ Myanmar                          │ 217286 │
│ China                            │ 207824 │
│ France                           │ 122398 │
│ Germany                          │  77386 │
│ Spain                            │  72846 │
│ Poland                           │  71973 │
│ Guatemala                        │  68542 │
│ Canada                           │  63234 │
│ United Kingdom                   │  60958 │
│ Japan                            │  56353 │
│ Brazil                           │  56134 │
│ Italy                            │  53793 │
│ Russia                           │  50848 │
│ Pakistan                         │  46721 │
│ Australia                        │  38043 │
│ Uzbekistan                       │  32904 │
│ South Korea                      │  31099 │
│ Iran                             │  30110 │
│  ·                               │      · │
│  ·                               │      · │
│  ·                               │      · │
│ Jersey                           │     12 │
│ Cabo Verde                       │     12 │
│ Seychelles                       │     10 │
│ Solomon Islands                  │      7 │
│ Cayman Islands                   │      5 │
│ Saint Pierre and Miquelon        │      5 │
│ Niue                             │      5 │
│ Samoa                            │      5 │
│ Antarctica                       │      5 │
│ Antigua and Barbuda              │      4 │
│ Falkland Islands                 │      3 │
│ South Georgia and the South Sand │      2 │
│ Dominica                         │      2 │
│ São Tomé and Príncipe            │      2 │
│ Virgin Islands, U.S.             │      1 │
│ Faroe Islands                    │      1 │
│ Christmas Island                 │      1 │
│ British Indian Ocean Territory   │     -1 │
│ Saint Kitts and Nevis            │     -6 │
│ Burundi                          │   -230 │
├──────────────────────────────────┴────────┤
│ 219 rows (40 shown)             2 columns │
└───────────────────────────────────────────┘

Extracting a City's Buildings

Up until last week, if I wanted to extract a city-sized area from the buildings dataset, I'd first find out which Parquet files contained data for the country that city is in. Below are the Parquet file numbers for Iran's data in the November release.

SELECT   pq_num,
         SUM(num_buildings)
FROM     buildings_country_gadm
WHERE    release = '2024-11-13.0'
AND      country = 'Iran'
GROUP BY 1
ORDER BY 2 DESC;
┌─────────┬────────────────────┐
│ pq_num  │ sum(num_buildings) │
│ varchar │       int128       │
├─────────┼────────────────────┤
│ 00129   │            6494017 │
│ 00128   │            3928784 │
│ 00127   │             572505 │
│ 00130   │             292240 │
│ 00110   │             261544 │
│ 00111   │              18776 │
└─────────┴────────────────────┘

I'd then download those Parquet files first and then extract out any buildings within a bounding box for the city of interest. Working with Parquet files on a local SSD was often 5x faster than working with them remotely on S3.

Below is the code to extract Tehran's building footprints from the November release.

COPY (
    SELECT a.* EXCLUDE (names,
                        bbox,
                        sources,
                        geometry),
           JSON(a.names)                        AS names,
           JSON(a.bbox)                         AS bbox,
           JSON(a.sources)                      AS sources,
           ST_ASWKB(a.geometry) AS geom
    FROM   READ_PARQUET([
            'part-00129*.parquet',
            'part-00128*.parquet',
            'part-00127*.parquet',
            'part-00130*.parquet',
            'part-00110*.parquet',
            'part-00111*.parquet',
        ]) a
    WHERE  bbox.xmin > 50.7993
    AND    bbox.xmax < 51.8616
    AND    bbox.ymin > 35.3870
    AND    bbox.ymax < 36.0667
ORDER BY   HILBERT_ENCODE([ST_Y(ST_CENTROID(a.geometry)),
                           ST_X(ST_CENTROID(a.geometry))]::double[2])
) TO 'iran.2024.11.buildings.pq' (
    FORMAT            'PARQUET',
    CODEC             'ZSTD',
    COMPRESSION_LEVEL 22,
    ROW_GROUP_SIZE    15000);

The result would be a file that was produced from gigabytes of source data but ultimately only contained tens of megabytes of data of interest.

$ python ~/pqview/main.py \
    sizes \
    iran.2024.11.buildings.pq
 70.2 MB geom
 16.4 MB bbox
  9.8 MB sources
  7.6 MB id
147.0 kB names
 37.9 kB subtype
 36.3 kB class
 18.8 kB height
 11.2 kB num_floors
  4.9 kB level
  3.8 kB roof_shape
  3.4 kB roof_color
  3.0 kB has_parts
  3.0 kB version
  2.9 kB is_underground
  2.7 kB facade_color
  2.5 kB facade_material
  2.4 kB num_floors_underground
  2.4 kB roof_orientation
  2.4 kB min_floor
  2.3 kB min_height
  2.3 kB roof_height
  2.3 kB roof_material
  2.2 kB roof_direction

This is the above Parquet file rendered in QGIS. The basemap is Esri's World Imagery basemap. It has been tinted red to contrast it with the colour-coded building footprints covering it.

Overture December 2024

But last week, I found a faster and simpler way to extract city-level data from Overture.

QGIS Integration

Chris Holmes, who is a Fellow and Vice President at Planet Labs, recently published a QGIS GeoParquet Downloader plugin. This plugin will take the bounding box of your QGIS workspace and only download the data for that area. Many data sources are supported including Overture, Source Cooperative and Hugging Face.

Before installing the plugin, go to the Plugins Menu in QGIS and click "Python Console". Then type:

import pip; pip.main(['install', '--upgrade', 'duckdb'])

Next, click on the Plugins Menu and then "Manage and Install Plugins".

Select "All" from the top left and search for "GeoParquet Downloader". Click the "Install Plugin" button on the bottom right of the UI.

Then, in the middle of the bottom toolbar, type "world" into the coordinates text box and hit enter. A vector map of the world should appear.

Overture December 2024

Zoom into a city of interest somewhere in the world and then click the "Download Overture Maps Data" button in one of the top toolbars. The button's position can vary depending on how you've configured QGIS' layout.

Overture December 2024

Select "Buildings" from the UI that pops up and save the results to a GeoParquet file. The data should appear in the QGIS workspace shortly after it has been downloaded into that GeoParquet file.

Overture's buildings all have numerous metadata fields that can be used to aid visualisation and analysis.

Overture December 2024

The GeoParquet file for the above scene is less than 8 MB in size. When uncompressed and in JSON format the data is over 57 MB in size.

$ echo "FROM 'geoparquet_download_20241211_113210.parquet'" \
    | ~/duckdb -json \
    | wc -c
60200446

Below is an example record.

$ echo "FROM  'geoparquet_download_20241211_113210.parquet'
        WHERE roof_color    IS NOT NULL
        AND   roof_material IS NOT NULL
        LIMIT 1" \
    | ~/duckdb -json \
    | jq -S .
[
  {
    "bbox": "{'xmin': 24.72516, 'xmax': 24.725378, 'ymin': 59.430557, 'ymax': 59.430717}",
    "class": "apartments",
    "facade_color": null,
    "facade_material": null,
    "geometry": "POLYGON ((24.725212 59.430564, 24.725378 59.43058, 24.7253552 59.430648, 24.725332 59.430717, 24.725164 59.430702, 24.725212 59.430564))",
    "has_parts": false,
    "height": null,
    "id": "08b089b1a6082fff0200faf2a07a0508",
    "is_underground": false,
    "level": null,
    "min_floor": null,
    "min_height": null,
    "names": null,
    "num_floors": 2,
    "num_floors_underground": null,
    "roof_color": "#C0C0C0",
    "roof_direction": null,
    "roof_height": null,
    "roof_material": "metal",
    "roof_orientation": null,
    "roof_shape": "hipped",
    "sources": "[{'property': , 'dataset': OpenStreetMap, 'record_id': w26897528@6, 'update_time': 2023-07-12T17:34:53.000Z, 'confidence': NULL}]",
    "subtype": "residential",
    "theme": "buildings",
    "type": "building",
    "version": 0
  }
]

The Map Explorer

Overture's Map Explorer was released a few months ago. It's intended as an x-ray viewer of their datasets meaning it helps inspect individual records in a 2D and 3D map interface.

Overture hosts the map's data in 370+ GB worth of PMTiles on AWS Cloudfront. The viewer selectively downloads data within the field of view from these files, saving a huge amount of bandwidth and loading areas of interest in real-time as you zoom and pan the map.

They've recently added bookmarks to the UI that introduce places and map functionality of interest. Included in these are 3D buildings. Below is a screenshot of Hong Kong.

Overture December 2024

They also have a bookmark for the shipping lanes leaving the Port of Piraeus just outside of Athens.

Overture December 2024

Below is Dubai's Jumeirah Beach Residence (JBR) area with the Palm Jumeirah archipelago in the background.

Overture December 2024

Not all notable structures have representative 3D models. For example, Ain Dubai, the world's highest observation wheel appears as a thin rectangle.

Overture December 2024

This is the Ain Dubai in real life.

Overture December 2024

It would be nice to see support for Digital Elevation Models (DEMs) added to this viewer. Cities like Athens can be difficult to navigate around without any topography to aid orientation.

Overture December 2024

Below is the same area in real life. The topography tells the story of the city's layout.

Overture December 2024

Qiusheng Wu's Python-based leafmap project supports both 3D Terrain and 3D PMTiles so at least one web-based, open source solution does exist.

Extracting 3D Building Parts

Selecting any 3D building in the Explorer will bring up its properties.

Overture December 2024

The building_id field (not to be confused with the id field) can be used to search the building_part dataset.

$ echo "FROM  READ_PARQUET('s3://overturemaps-us-west-2/release/2024-12-18.0/theme=buildings/type=building_part/*.parquet',
                           hive_partitioning=1,
                           filename=True)
        WHERE building_id = '08b43a132dc91fff020088d7020f24e2'" \
    | ~/duckdb -json \
    | jq -S 'del(..|nulls)' \
    > cloud_22.json

The above returned 31 records, each with its own properties and polygons.

$ jq length cloud_22.json # 31

Below is an example record.

$ jq -S '.[0]' cloud_22.json
{
  "bbox": "{'xmin': 55.125946, 'xmax': 55.126446, 'ymin': 25.137445, 'ymax': 25.137896}",
  "building_id": "08b43a132dc91fff020088d7020f24e2",
  "facade_material": "glass",
  "filename": "s3://overturemaps-us-west-2/release/2024-12-18.0/theme=buildings/type=building_part/part-00000-00ec12bf-9729-48e6-b5e9-4ecfd7a91363-c000.zstd.parquet",
  "geometry": "POLYGON ((55.1259462 25.1375893, 55.1260403 25.1375121, 55.1261211 25.1374458, 55.1262212 25.1375423, 55.1263344 25.1376442, 55.1264449 25.1377377, 55.1262972 25.1378926, 55.1261991 25.1378181, 55.1260941 25.1377271, 55.1259462 25.1375893))",
  "height": 137.0,
  "id": "08b43a132c348fff02013812c88f362f",
  "is_underground": false,
  "min_height": 124.0,
  "names": "{'primary': NULL, 'common': NULL, 'rules': NULL}",
  "sources": "[{'property': , 'dataset': OpenStreetMap, 'record_id': w964652257, 'update_time': 2024-05-27T11:05:56.000Z, 'confidence': NULL}]",
  "theme": "buildings",
  "type": "building_part",
  "version": 0
}

Below are a few properties for each of the 31 polygons.

$ jq -c '.[]|[.min_height,.height,.facade_material]' cloud_22.json
[124.0,137.0,"glass"]
[184.0,193.0,"glass"]
[24.0,41.0,"glass"]
[60.0,73.0,"glass"]
[92.0,105.0,"glass"]
[156.0,169.0,"glass"]
[null,189.0,"glass"]
[172.0,181.0,"glass"]
[null,22.0,null]
[76.0,89.0,"glass"]
[140.0,153.0,"glass"]
[44.0,57.0,"glass"]
[109.0,121.0,"glass"]
[null,145.0,"glass"]
[124.0,137.0,"glass"]
[24.0,41.0,"glass"]
[60.0,73.0,"glass"]
[92.0,105.0,"glass"]
[44.0,57.0,"glass"]
[90.0,92.0,null]
[null,20.0,null]
[76.0,90.0,"glass"]
[24.0,41.0,"glass"]
[60.0,73.0,"glass"]
[92.0,105.0,"glass"]
[76.0,89.0,"glass"]
[44.0,57.0,"glass"]
[null,21.0,"glass"]
[140.0,149.0,"glass"]
[null,157.0,"glass"]
[92.0,105.0,"glass"]
Thank you for taking the time to read this post. I offer both consulting and hands-on development services to clients in North America and Europe. If you'd like to discuss how my offerings can help your business please contact me via LinkedIn.

Copyright © 2014 - 2024 Mark Litwintschik. This site's template is based off a template by Giulio Fidente.