
Posted on Mon 02 February 2026 under GIS

72M Points of Interest

Overture Maps publishes a Places dataset along with each of their monthly releases. Places can be thought of as points of interest (POIs). As of January, they have over 72 million of these POIs covering the world.

Below is an example from Viru Street in Tallinn, Estonia. Each point is labelled with its primary name and basic category in brackets. The point colour is based on the POI's basic category.

Overture's Places

Overture has made several improvements to the Places dataset in recent months. In September, they added operating status and confidence properties. In October, they added a basic category property and, in December, a taxonomy property.

In this post, I'll examine their latest Places release.

My Workstation

I'm using a 5.7 GHz AMD Ryzen 9 9950X CPU. It has 16 cores and 32 threads and 1.2 MB of L1, 16 MB of L2 and 64 MB of L3 cache. It has a liquid cooler attached and is housed in a spacious, full-sized Cooler Master HAF 700 computer case.

The system has 96 GB of DDR5 RAM clocked at 4,800 MT/s and a 5th-generation, Crucial T700 4 TB NVMe M.2 SSD which can read at speeds up to 12,400 MB/s. There is a heatsink on the SSD to help keep its temperature down. This is my system's C drive.

The system is powered by a 1,200-watt, fully modular Corsair Power Supply and is sat on an ASRock X870E Nova 90 Motherboard.

I'm running Ubuntu 24 LTS via Microsoft's Ubuntu for Windows on Windows 11 Pro. In case you're wondering why I don't run a Linux-based desktop as my primary work environment, I'm still using an Nvidia GTX 1080 GPU which has better driver support on Windows and ArcGIS Pro only supports Windows natively.

Installing Prerequisites

I'll use Python 3.12.3 and a few other tools to help analyse the data in this post.

$ sudo add-apt-repository ppa:deadsnakes/ppa
$ sudo apt update
$ sudo apt install \
    jq \
    python3-pip \
    python3.12-venv

Below, I'll set up a Python Virtual Environment.

$ python3 -m venv ~/.pois
$ source ~/.pois/bin/activate

I'll use a Parquet debugging tool I've been working on to see how much space each column takes up in one of the Parquet files.

$ git clone https://github.com/marklit/pqview \
    ~/pqview

$ python3 -m pip install \
          -r ~/pqview/requirements.txt

I'll also install the latest AWS CLI.

$ python3 -m pip install awscli

I'll use DuckDB v1.4.3, along with its H3, JSON, Lindel, Parquet and Spatial extensions, in this post.

$ cd ~
$ wget -c https://github.com/duckdb/duckdb/releases/download/v1.4.3/duckdb_cli-linux-amd64.zip
$ unzip -j duckdb_cli-linux-amd64.zip
$ chmod +x duckdb
$ ~/duckdb
INSTALL h3 FROM community;
INSTALL lindel FROM community;
INSTALL json;
INSTALL parquet;
INSTALL spatial;

I'll set up DuckDB to load every installed extension each time it launches.

$ vi ~/.duckdbrc
.timer on
.width 180
LOAD h3;
LOAD lindel;
LOAD json;
LOAD parquet;
LOAD spatial;

The maps in this post were rendered using QGIS version 3.44. QGIS is a desktop application that runs on Windows, macOS and Linux. The application has grown in popularity in recent years and has ~15M application launches from users all around the world each month.

I used QGIS' Tile+ and HCMGIS plugins to add basemaps from Bing and Esri to the maps in this post.

Downloading Overture's Places

Since last September, Overture has removed releases after 60 days. The Places URL used in this post will stop working sometime in March.
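As a rough check on that March estimate, the sketch below adds the 60-day window to the date embedded in the release name. It assumes the window is counted from that date, which Overture may not do exactly, and `expiry_date` is a hypothetical helper name.

```python
from datetime import date, timedelta

RETENTION_DAYS = 60  # retention window mentioned above


def expiry_date(release: str, retention_days: int = RETENTION_DAYS) -> date:
    """Parse a release string like '2026-01-21.0' and add the retention window."""
    # Assumption: everything before the first '.' is the release date.
    published = date.fromisoformat(release.split(".")[0])
    return published + timedelta(days=retention_days)


print(expiry_date("2026-01-21.0"))
```

Under that assumption, the January release would lapse in the second half of March.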

The following lists Overture's available releases. Use the latest release version in the S3 URL below to download this dataset.

$ curl https://labs.overturemaps.org/data/releases.json
{
    "latest": "2026-01-21.0",
    "releases": [
        "2026-01-21.0",
        "2025-12-17.0"
    ]
}
$ aws s3 --no-sign-request sync \
    s3://overturemaps-us-west-2/release/2026-01-21.0/theme=places/type=place/ \
    ~/places
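If you'd rather not copy the release version by hand, the S3 prefix can be built from the `latest` field. This is a minimal sketch: the JSON is pasted in from the output above rather than fetched, and `places_url` is a hypothetical helper name.

```python
import json

# Payload copied from the curl output above; a real script would fetch it
# over HTTP instead.
RELEASES_JSON = """
{
    "latest": "2026-01-21.0",
    "releases": [
        "2026-01-21.0",
        "2025-12-17.0"
    ]
}
"""

# Bucket and prefix layout taken from the aws s3 sync command above.
S3_TEMPLATE = (
    "s3://overturemaps-us-west-2/release/{release}"
    "/theme=places/type=place/"
)


def places_url(releases_json: str) -> str:
    """Return the S3 prefix for the Places theme of the latest release."""
    latest = json.loads(releases_json)["latest"]
    return S3_TEMPLATE.format(release=latest)


print(places_url(RELEASES_JSON))
```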

The above downloaded 8 Parquet files with a total disk footprint of 7.2 GB.

Data Fluency

This dataset contains 72,444,739 records.

$ ~/duckdb
SELECT COUNT(*)
FROM   'places/part*.parquet';
┌─────────────────┐
│  count_star()   │
│      int64      │
├─────────────────┤
│    72444739     │
│ (72.44 million) │
└─────────────────┘

Below is a heatmap showing where the POIs are most heavily concentrated.

CREATE OR REPLACE TABLE h3_3_stats AS
    SELECT   h3_3: H3_LATLNG_TO_CELL(
                       bbox.ymin,
                       bbox.xmin,
                       3),
             num_pois: COUNT(*)
    FROM     'places/part*.parquet'
    GROUP BY 1;

COPY (
    SELECT geometry: ST_ASWKB(H3_CELL_TO_BOUNDARY_WKT(h3_3)::geometry),
           num_pois
    FROM   h3_3_stats
    WHERE  ST_XMIN(geometry::geometry) BETWEEN -179 AND 179
    AND    ST_XMAX(geometry::geometry) BETWEEN -179 AND 179
) TO 'num_pois.h3_3_stats.parquet' (
        FORMAT 'PARQUET',
        CODEC  'ZSTD',
        COMPRESSION_LEVEL 22,
        ROW_GROUP_SIZE 15000);
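One plausible reading of the two longitude filters in the COPY statement is that they drop hexagons sitting on the antimeridian, since a polygon with vertices on both sides of ±180° can render as a stripe across the whole map. The post doesn't state this, so treat it as an assumption. The sketch below mirrors the filter on two made-up rings. `within_lon_bounds` is a hypothetical helper.

```python
def within_lon_bounds(ring, lo=-179.0, hi=179.0):
    """Mirror of the two WHERE clauses: keep a polygon only if both its
    minimum and maximum longitude fall inside [lo, hi]."""
    lons = [lon for lon, _lat in ring]
    return lo <= min(lons) <= hi and lo <= max(lons) <= hi


# Made-up rings of (lon, lat) pairs, not real H3 cell boundaries.
inland = [(24.0, 59.0), (25.0, 59.0), (25.0, 60.0), (24.0, 60.0)]
antimeridian = [(179.6, -17.0), (-179.7, -17.0), (-179.7, -16.0), (179.6, -16.0)]

print(within_lon_bounds(inland))
print(within_lon_bounds(antimeridian))
```

The first ring passes the filter and the second does not, because its minimum longitude falls below -179.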

Below is an example record from this dataset.

$ echo "SELECT * EXCLUDE(addresses,
                         bbox,
                         brand,
                         categories,
                         names,
                         taxonomy,
                         sources),
               addresses:  addresses::JSON,
               bbox:       bbox::JSON,
               brand:      brand::JSON,
               categories: categories::JSON,
               names:      names::JSON,
               taxonomy:   taxonomy::JSON,
               sources:    sources::JSON
        FROM   'places/part*.parquet'
        WHERE  brand.names.primary = 'Starbucks'
        LIMIT  1" \
       | ~/duckdb -json \
       | jq -S .
[
  {
    "addresses": [
      {
        "country": "CL",
        "freeform": "Juan Soler Manfredini 131, local 302 y 302B, Mall Paseo Costanera Puerto Montt",
        "locality": "Puerto Montt",
        "postcode": "5504750",
        "region": null
      }
    ],
    "basic_category": "cafe",
    "bbox": {
      "xmax": -72.9016342163086,
      "xmin": -72.90164184570312,
      "ymax": -41.48642349243164,
      "ymin": -41.48643112182617
    },
    "brand": {
      "names": {
        "common": null,
        "primary": "Starbucks",
        "rules": null
      },
      "wikidata": null
    },
    "categories": {
      "alternate": null,
      "primary": "cafe"
    },
    "confidence": 0.32347911067676516,
    "emails": null,
    "geometry": "POINT (-72.90164 -41.48643)",
    "id": "72da3584-2530-4bac-b062-124393ecdb0c",
    "names": {
      "common": null,
      "primary": "Starbucks Chile",
      "rules": null
    },
    "operating_status": "open",
    "phones": "[+56232621864]",
    "socials": "['https://www.facebook.com/299687346558972']",
    "sources": [
      {
        "between": null,
        "confidence": 0.3234791106767652,
        "dataset": "meta",
        "license": "CDLA-Permissive-2.0",
        "property": "",
        "record_id": "299687346558972",
        "update_time": "2025-12-01T08:00:00.000Z"
      },
      {
        "between": null,
        "confidence": null,
        "dataset": "Overture",
        "license": "CDLA-Permissive-2.0",
        "property": "/properties/confidence",
        "record_id": null,
        "update_time": "2026-01-13T21:57:48Z"
      }
    ],
    "taxonomy": {
      "alternates": null,
      "hierarchy": [
        "food_and_drink",
        "casual_eatery",
        "cafe"
      ],
      "primary": "cafe"
    },
    "theme": "places",
    "type": "place",
    "version": 5,
    "websites": null
  }
]

Below is a breakdown of how much space each column takes up relative to the others. This dataset only uses point geometry so its geometry field footprint is much smaller than what you'd find in datasets with linestrings and polygons.

The columns using up more space tend to have fewer NULL values and overall better coverage. The exception to this would be something like the basic category column which has ~400 unique values and compresses very well.

$ python3 ~/pqview/main.py \
    types \
    --html \
    places/part-00000-faac29ab-3031-41b7-836a-26e1193347c0-c000.zstd.parquet \
    > places.types.html
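To illustrate why a column with only ~400 distinct values can take up so little space, here is a toy dictionary encoder. Parquet writers commonly dictionary-encode low-cardinality columns, but whether these particular files do so for the basic category column is an assumption. The byte counts are a simplification that ignores bit-packing and ZSTD.

```python
def dictionary_encode(values):
    """Replace each string with an index into a list of distinct strings."""
    dictionary = []
    index_of = {}
    codes = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index_of[value])
    return dictionary, codes


# Toy column built from category names that appear in this post.
column = ["restaurant", "cafe", "restaurant", "hotel", "restaurant", "cafe"] * 1000

dictionary, codes = dictionary_encode(column)

raw_bytes = sum(len(v.encode("utf-8")) for v in column)
# With ~400 distinct values, every code would fit in two bytes.
encoded_bytes = sum(len(v.encode("utf-8")) for v in dictionary) + 2 * len(codes)

print(len(dictionary), raw_bytes, encoded_bytes)
```

The strings are stored once each and every row only carries a small integer, so the footprint grows with the row count rather than the string lengths.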

Below are the column types, NULL-value ratios, number of unique values and the lowest and highest value from each field.

$ ~/duckdb
SELECT   column_name,
         column_type[:30],
         null_percentage,
         approx_unique,
         min[:30],
         max[:30]
FROM     (SUMMARIZE
          FROM 'places/part*.parquet')
ORDER BY 1;
┌──────────────────┬────────────────────────────────┬─────────────────┬───────────────┬───────────────────────────────────┬────────────────────────────────────────┐
│   column_name    │        column_type[:30]        │ null_percentage │ approx_unique │             min[:30]              │                max[:30]                │
│     varchar      │            varchar             │  decimal(9,2)   │     int64     │              varchar              │                varchar                 │
├──────────────────┼────────────────────────────────┼─────────────────┼───────────────┼───────────────────────────────────┼────────────────────────────────────────┤
│ addresses        │ STRUCT(freeform VARCHAR, local │            0.00 │      45527236 │ [{'freeform': '', 'locality':     │ [{'freeform': NULL, 'locality'         │
│ basic_category   │ VARCHAR                        │            0.00 │           405 │ accommodation                     │ zoo                                    │
│ bbox             │ STRUCT(xmin FLOAT, xmax FLOAT, │            0.00 │      68675209 │ {'xmin': -180.0, 'xmax': -180.    │ {'xmin': 180.0, 'xmax': 180.0,         │
│ brand            │ STRUCT(wikidata VARCHAR, "name │           93.56 │         89854 │ {'wikidata': Q100146251, 'name    │ {'wikidata': NULL, 'names': {'         │
│ categories       │ STRUCT("primary" VARCHAR, alte │            0.00 │       2625563 │ {'primary': 3d_printing_servic    │ {'primary': zoo, 'alternate':          │
│ confidence       │ DOUBLE                         │            0.00 │        222625 │ 0.0                               │ 1.0                                    │
│ emails           │ VARCHAR[]                      │           97.32 │       1494892 │ ['']                              │ [合羽090-6252-0042kakaricho@char       │
│ geometry         │ GEOMETRY                       │            0.00 │      65426272 │ POINT (2 28)                      │ POINT (-6.240234374999999 53.3         │
│ id               │ VARCHAR                        │            0.00 │      67893339 │ 00000057-00a2-439b-9581-cf9e24    │ ffffffe2-6947-4705-ad9b-936fab         │
│ names            │ STRUCT("primary" VARCHAR, comm │            0.00 │      64313486 │ {'primary': \b음성 A, 'common': N │ {'primary': 𩻸魚堀溪觀魚步道, 'common' │
│ operating_status │ VARCHAR                        │            0.00 │             3 │ closed                            │ temporarily closed                     │
│ phones           │ VARCHAR[]                      │           16.58 │      54450308 │ ['']                              │ [‎+34 951 51 92 20]                     │
│ socials          │ VARCHAR[]                      │           15.59 │      63756430 │ [' https://www.facebook.com/Jo    │ [www.motrio.com]                       │
│ sources          │ STRUCT(property VARCHAR, datas │            0.00 │      65629291 │ [{'property': '', 'dataset': A    │ [{'property': '', 'dataset': m         │
│ taxonomy         │ STRUCT("primary" VARCHAR, hier │            0.00 │          1718 │ {'primary': 3d_printing_servic    │ {'primary': zoo, 'hierarchy':          │
│ version          │ INTEGER                        │            0.00 │             9 │ 1                                 │ 8                                      │
│ websites         │ VARCHAR[]                      │           39.07 │      39341765 │ ['']                              │ ['้http://www.krungsri.com']            │
├──────────────────┴────────────────────────────────┴─────────────────┴───────────────┴───────────────────────────────────┴────────────────────────────────────────┤
│ 17 rows                                                                                                                                                6 columns │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Sources

Overture has 8 outside sources they collect data from and combine to build this dataset.

$ ~/duckdb
SELECT COUNT(*),
       a.dataset
FROM (
    SELECT UNNEST(sources) a
    FROM   'places/part*.parquet'
)
GROUP BY 2
ORDER BY 1 DESC;
┌──────────────┬──────────────┐
│ count_star() │   dataset    │
│    int64     │   varchar    │
├──────────────┼──────────────┤
│     72444739 │ Overture     │
│     58958274 │ meta         │
│      6003562 │ Foursquare   │
│      5644707 │ Microsoft    │
│      1554249 │ AllThePlaces │
│       148042 │ DAC          │
│       127862 │ PinMeTo      │
│         4844 │ RenderSEO    │
│         3199 │ Krick        │
└──────────────┴──────────────┘
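The counts above add up to more than the number of records because UNNEST produces one row per source entry and a record can list several sources. The sketch below shows the same flattening on two toy records. The first mirrors the source datasets of the Starbucks example earlier and the second is made up.

```python
from collections import Counter

# Two toy records. The first mirrors the Starbucks example, which lists
# both 'meta' and 'Overture' as sources; the second is made up.
records = [
    {"id": "a", "sources": [{"dataset": "meta"}, {"dataset": "Overture"}]},
    {"id": "b", "sources": [{"dataset": "Foursquare"}, {"dataset": "Overture"}]},
]

# Equivalent of UNNEST(sources): one output row per (record, source) pair.
flattened = [src["dataset"] for rec in records for src in rec["sources"]]

counts = Counter(flattened)
for dataset, n in counts.most_common():
    print(n, dataset)
```

In the real data, the Overture count matches the total record count.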

In researching the team behind Places, I came across this LinkedIn post from Dana Bauer at Overture where she was looking for someone to join TomTom, one of the Overture Foundation's partner firms, to help her team work on this dataset.

Overture's Places

Categories

There are 398 unique basic categories in this dataset.

$ ~/duckdb
SELECT COUNT(DISTINCT basic_category)
FROM   'places/part*.parquet';
398

Below are the top 100 basic categories.

.maxrows 200

SELECT   COUNT(*),
         basic_category
FROM     'places/part*.parquet'
GROUP BY 2
ORDER BY 1 DESC
LIMIT    100;
┌──────────────┬────────────────────────────────┐
│ count_star() │         basic_category         │
│    int64     │            varchar             │
├──────────────┼────────────────────────────────┤
│      4809112 │ restaurant                     │
│      2941235 │ specialty_store                │
│      1816052 │ beauty_salon                   │
│      1695081 │ clothing_store                 │
│      1428417 │ place_of_learning              │
│      1339376 │ service_location               │
│      1302723 │ hotel                          │
│      1227198 │ christian_place_of_worshop     │
│      1180049 │ retail_location                │
│      1155375 │ auto_repair_service            │
│      1154927 │ grocery_store                  │
│      1117086 │ financial_service              │
│      1116707 │ healthcare_location            │
│      1062436 │ social_or_community_service    │
│      1019714 │ historic_site                  │
│       919867 │ bar                            │
│       856064 │ event_or_party_service         │
│       828329 │ b2b_supplier_distributor       │
│       793373 │ electronics_store              │
│       784525 │ real_estate_service            │
│       735823 │ cafe                           │
│       686114 │ coffee_shop                    │
│       665004 │ general_dentistry              │
│       659636 │ gas_station                    │
│       647944 │ civic_organization_office      │
│       636551 │ casual_eatery                  │
│       631680 │ real_estate_agency             │
│       628708 │ home_service                   │
│       589904 │ gym                            │
│       587477 │ accommodation                  │
│       570939 │ hair_salon                     │
│       558413 │ hospital                       │
│       554701 │ car_dealer                     │
│       546472 │ pharmacy                       │
│       545813 │ nature_outdoors                │
│       542841 │ specialty_school               │
│       536649 │ bakery                         │
│       528118 │ park                           │
│       525704 │ convenience_store              │
│       498561 │ business_advertising_marketing │
│       486786 │ fast_food_restaurant           │
│       478687 │ elementary_school              │
│       475380 │ building_contractor_service    │
│       459196 │ flower_shop                    │
│       458414 │ pizzaria                       │
│       443957 │ travel_service                 │
│       434095 │ insurance_agency               │
│       430175 │ vehicle_service                │
│       423339 │ religious_organization         │
│       413390 │ college_university             │
│       411364 │ doctors_office                 │
│       405059 │ b2b_service                    │
│       396944 │ entertainment_location         │
│       393056 │ atm                            │
│       388623 │ transportation_location        │
│       387077 │ barber_shop                    │
│       383324 │ manufacturer                   │
│       380192 │ government_office              │
│       374121 │ attorney_or_law_firm           │
│       357145 │ sporting_goods_store           │
│       347795 │ printing_service               │
│       325030 │ building_construction_service  │
│       316876 │ alternative_medicine           │
│       311356 │ shipping_delivery_service      │
│       310737 │ sport_fitness_facility         │
│       308497 │ fashion_or_apparel_store       │
│       305517 │ spa                            │
│       304559 │ bank                           │
│       295161 │ jewelry_store                  │
│       281592 │ farm                           │
│       276654 │ preschool                      │
│       275556 │ hardware_store                 │
│       270869 │ shoe_store                     │
│       261106 │ art_craft_hobby_store          │
│       258718 │ post_office                    │
│       255981 │ sport_recreation_club          │
│       236171 │ technical_service              │
│       228379 │ music_venue                    │
│       217210 │ nail_salon                     │
│       206289 │ physical_therapy               │
│       203111 │ private_lodging                │
│       203106 │ veterinarian                   │
│       201486 │ shopping_mall                  │
│       199644 │ personal_service               │
│       198236 │ eyewear_store                  │
│       194648 │ recreational_location          │
│       194297 │ eating_drinking_location       │
│       190095 │ pet_store                      │
│       187244 │ tattoo_or_piercing_salon       │
│       186509 │ media_service                  │
│       182239 │ pub                            │
│       179089 │ bookstore                      │
│       178037 │ corporate_or_business_office   │
│       176928 │ high_school                    │
│       176335 │ beach                          │
│       173075 │ mental_health                  │
│       170891 │ landscaping_gardening_service  │
│       170403 │ bed_and_breakfast              │
│       170397 │ legal_service                  │
│       166948 │ liquor_store                   │
├──────────────┴────────────────────────────────┤
│ 100 rows                            2 columns │
└───────────────────────────────────────────────┘

Below, I've built a map showing the most common category in every H3 zoom-level 3 hexagon across the planet.

CREATE OR REPLACE TABLE h3_3s AS
    WITH b AS (
        WITH a AS (
            SELECT   H3_LATLNG_TO_CELL(bbox.ymin,
                                       bbox.xmin,
                                       3) h3_3,
                     basic_category,
                     COUNT(*) num_recs
            FROM     'places/part*.parquet'
            GROUP BY 1, 2
        )
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY h3_3
                                  ORDER BY     num_recs DESC) AS rn
        FROM   a
    )
    FROM     b
    WHERE    rn = 1
    ORDER BY num_recs DESC;

COPY (
    SELECT geometry: H3_CELL_TO_BOUNDARY_WKT(h3_3)::GEOMETRY,
           basic_category
    FROM   h3_3s
    WHERE  ST_XMIN(geometry::geometry) BETWEEN -179 AND 179
    AND    ST_XMAX(geometry::geometry) BETWEEN -179 AND 179
) TO 'basic_category.h3_3_stats.parquet' (
    FORMAT 'PARQUET',
    CODEC  'ZSTD',
    COMPRESSION_LEVEL 22,
    ROW_GROUP_SIZE 15000);
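The nested CTEs above follow a "top one per group" pattern: count rows per hexagon and category, rank the counts within each hexagon, then keep the first-ranked row. A rough Python equivalent on made-up rows is sketched below. The hexagon labels are placeholders rather than real H3 cell IDs.

```python
from collections import Counter, defaultdict

# Made-up (hexagon, basic_category) pairs standing in for the Parquet rows.
rows = [
    ("hex_a", "restaurant"), ("hex_a", "restaurant"), ("hex_a", "cafe"),
    ("hex_b", "hotel"), ("hex_b", "beach"), ("hex_b", "beach"),
]

# Inner CTE: COUNT(*) grouped by hexagon and category.
per_hexagon = defaultdict(Counter)
for hexagon, category in rows:
    per_hexagon[hexagon][category] += 1

# Outer CTE plus WHERE rn = 1: keep the highest count in each hexagon.
top_category = {
    hexagon: counts.most_common(1)[0]
    for hexagon, counts in per_hexagon.items()
}

print(top_category)
```

When two categories tie within a hexagon, ROW_NUMBER() picks one of them without any guaranteed order, so tied hexagons could show either category.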

QGIS rendered strange things when I tried to change the projection to a US-centric one so I've had to stick with EPSG:4326 below. I've added Natural Earth's Admin-0 boundaries to help outline country borders.

Overture's Places

QGIS' Globe View wouldn't render labels but does show the contiguous nature of the most common categories over land.

This dataset has a lot of POIs over open oceans and seas as well. The most common categories for those POIs are much more diverse than for those over land.

Overture's Places

Below are the most common categories across Europe.

Overture's Places

Below are the most common categories across North Africa, the Middle East, India and South East Asia.

Overture's Places

Operating Status

I suspect the operating status data is in its early stages. A lot of businesses collapse in the first year and I expect in future releases that the closed count will grow from where it is now.

$ ~/duckdb
SELECT   COUNT(*),
         operating_status
FROM     'places/part*.parquet'
GROUP BY 2
ORDER BY 1 DESC;
┌──────────────┬────────────────────┐
│ count_star() │  operating_status  │
│    int64     │      varchar       │
├──────────────┼────────────────────┤
│     72443934 │ open               │
│          785 │ closed             │
│           20 │ temporarily closed │
└──────────────┴────────────────────┘

Version

Each record has a version number. Going by the most common version in each H3 zoom-level 5 hexagon, version 5 is the most widespread in this dataset at the moment.

$ ~/duckdb
CREATE OR REPLACE TABLE h3_5s AS
    WITH b AS (
        WITH a AS (
            SELECT   H3_LATLNG_TO_CELL(bbox.ymin,
                                       bbox.xmin,
                                       5) h3_5,
                     version,
                     COUNT(*) num_recs
            FROM     'places/part*.parquet'
            GROUP BY 1, 2
        )
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY h3_5
                                  ORDER BY     num_recs DESC) AS rn
        FROM   a
    )
    FROM     b
    WHERE    rn = 1
    ORDER BY num_recs DESC;

COPY (
    SELECT geometry: H3_CELL_TO_BOUNDARY_WKT(h3_5)::GEOMETRY,
           version
    FROM   h3_5s
    WHERE  ST_XMIN(geometry::geometry) BETWEEN -179 AND 179
    AND    ST_XMAX(geometry::geometry) BETWEEN -179 AND 179
) TO 'version.h3_5_stats.parquet' (
    FORMAT 'PARQUET',
    CODEC  'ZSTD',
    COMPRESSION_LEVEL 22,
    ROW_GROUP_SIZE 15000);
SELECT   version,
         COUNT(*)
FROM     'version.h3_5_stats.parquet'
GROUP BY 1
ORDER BY 1;
┌─────────┬──────────────┐
│ version │ count_star() │
│  int32  │    int64     │
├─────────┼──────────────┤
│       1 │          645 │
│       2 │        12272 │
│       3 │        28228 │
│       4 │        31962 │
│       5 │       137824 │
│       6 │        80749 │
│       7 │         7962 │
│       8 │          412 │
└─────────┴──────────────┘

Brand Names

Of the 72M+ records, ~4.4M contain a primary brand name. Below are the most common ones.

$ ~/duckdb
SELECT   COUNT(*),
         brand.names.primary
FROM     'places/part*.parquet'
GROUP BY 2
ORDER BY 1 DESC
LIMIT    25;
┌──────────────┬─────────────────────────────┐
│ count_star() │           primary           │
│    int64     │           varchar           │
├──────────────┼─────────────────────────────┤
│     68001883 │ NULL                        │
│        57163 │ Citibank                    │
│        56977 │ Wildberries                 │
│        45430 │ Western Union               │
│        42517 │ Shell                       │
│        42036 │ McDonald's                  │
│        33750 │ Amazon Locker               │
│        33117 │ サントリー                  │
│        31789 │ Subway                      │
│        30119 │ Starbucks                   │
│        26846 │ 7-Eleven                    │
│        26509 │ LibertyX Bitcoin ATM        │
│        20088 │ KFC                         │
│        19844 │ Dollar General              │
│        18152 │ Burger King                 │
│        16320 │ Pizza Hut                   │
│        15583 │ InPost                      │
│        15035 │ Domino's Pizza              │
│        14169 │ Indian Oil Corporation Ltd. │
│        13487 │ Enterprise                  │
│        13422 │ OXXO                        │
│        12769 │ Lidl                        │
│        12719 │ bp                          │
│        12366 │ Honda                       │
│        12251 │ TotalEnergies               │
├──────────────┴─────────────────────────────┤
│ 25 rows                          2 columns │
└────────────────────────────────────────────┘

There are 44K McDonald's locations in the world so to have 42K is pretty good coverage.

Taxonomy & Hierarchy

Below are the most common values for the new taxonomy.hierarchy field that was added late last year.

$ ~/duckdb
SELECT   COUNT(*),
         taxonomy.hierarchy
FROM     'places/part*.parquet'
GROUP BY 2
ORDER BY 1 DESC
LIMIT    25;
┌──────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ count_star() │                                                          hierarchy                                                          │
│    int64     │                                                          varchar[]                                                          │
├──────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│      2011362 │ [food_and_drink, restaurant]                                                                                                │
│      1393271 │ [lifestyle_services, beauty_service, beauty_salon]                                                                          │
│      1339177 │ [services_and_business, professional_service]                                                                               │
│      1302723 │ [lodging, hotel]                                                                                                            │
│      1141151 │ [shopping]                                                                                                                  │
│      1027893 │ [shopping, food_and_beverage_store, grocery_store]                                                                          │
│      1023589 │ [shopping, fashion_and_apparel_store, clothing_store]                                                                       │
│      1015254 │ [cultural_and_historic, architectural_landmark]                                                                             │
│       926424 │ [cultural_and_historic, religious_organization, place_of_worship, christian_place_of_worshop]                               │
│       895567 │ [travel_and_transportation, automotive_and_ground_transport, automotive, automotive_services_and_repair, automotive_repair] │
│       808932 │ [services_and_business, professional_service, event_planning]                                                               │
│       730509 │ [food_and_drink, bar]                                                                                                       │
│       715677 │ [education, school]                                                                                                         │
│       711235 │ [food_and_drink, casual_eatery, cafe]                                                                                       │
│       706174 │ [services_and_business, real_estate]                                                                                        │
│       703978 │ [community_and_government, community_service]                                                                               │
│       686114 │ [food_and_drink, beverage_shop, coffee_shop]                                                                                │
│       652453 │ [travel_and_transportation, automotive_and_ground_transport, fueling_station, gas_station]                                  │
│       631680 │ [services_and_business, real_estate, real_estate_agent]                                                                     │
│       589904 │ [sports_and_recreation, sports_and_recreation_venue, gym]                                                                   │
│       578091 │ [health_care, dentist]                                                                                                      │
│       558413 │ [health_care, hospital]                                                                                                     │
│       553915 │ [lifestyle_services, beauty_service, hair_salon]                                                                            │
│       546472 │ [shopping, specialty_store, pharmacy]                                                                                       │
│       535707 │ [food_and_drink, casual_eatery, bakery]                                                                                     │
├──────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 25 rows                                                                                                                          2 columns │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Maritime POIs

There are a large number of POIs that are over open ocean and seas in this dataset. I'm not sure how many of these have been placed in error.

Marine Regions has GeoPackage files delineating maritime boundaries. Seven of the nine files contain polygons that I'll use to determine the waters a given POI is in.

Below are the GeoPackage file sizes.

$ ls -lh ~/marineregions_org/*.gpkg
..  69M .. eez_12nm_v4.gpkg
..  60M .. eez_24nm_v4.gpkg
..  77M .. eez_internal_waters_v4.gpkg
..  16M .. eez_boundaries_v12.gpkg
.. 157M .. eez_v12.gpkg
..  39M .. eez_archipelagic_waters_v4.gpkg
.. 2.1M .. ecs_boundaries_v01.gpkg
.. 7.0M .. ecs_v01.gpkg
.. 8.8M .. High_Seas_v1.gpkg

Below is a rendering of this dataset's geometry and names for the Baltic Sea.

Overture's Places

I'll import the GeoPackage files into a DuckDB table.

$ echo "CREATE OR REPLACE TABLE marineregions (
            name VARCHAR,
            geom GEOMETRY);" \
    | ~/duckdb places.duckdb

$ for FILENAME in ~/marineregions_org/High_Seas_v1.gpkg; do
    echo "INSERT INTO marineregions
            SELECT name,
                   geom
            FROM   ST_READ('$FILENAME');" \
        | ~/duckdb places.duckdb
  done

$ for FILENAME in ~/marineregions_org/{ecs_v01,eez_12nm_v4,eez_24nm_v4,eez_archipelagic_waters_v4,eez_internal_waters_v4,eez_v12}*.gpkg; do
    echo $FILENAME
    echo "INSERT INTO marineregions
            SELECT name: GEONAME,
                   geom
            FROM   ST_READ('$FILENAME');" \
        | ~/duckdb places.duckdb
  done

There are 1,006 records in the Marine Regions table in DuckDB. I'll group Overture's 72M+ records into ~914K hexagons. This will give me a quick and rough estimate of how many POIs could be over open oceans and seas.

$ ~/duckdb places.duckdb
CREATE OR REPLACE TABLE h3_stats AS
    SELECT   hexagon:
                H3_LATLNG_TO_CELL(
                    bbox.ymin,
                    bbox.xmin,
                    6) ,
             num_pois: COUNT(*)
    FROM     'places/part*.parquet'
    GROUP BY 1;
SELECT SUM(num_pois)
FROM   h3_stats a
JOIN   marineregions b
    ON ST_Overlaps(H3_CELL_TO_BOUNDARY_WKT(a.hexagon)::GEOMETRY,
                   b.geom)
WHERE  b.name IS NOT NULL;
┌─────────────────┐
│  sum(num_pois)  │
│     int128      │
├─────────────────┤
│    29590753     │
│ (29.59 million) │
└─────────────────┘

The above suggests over 29M POIs sit in hexagons that overlap a Marine Regions polygon. This number is an estimate, not an exact count. Some hexagons cover both a body of water and a crowded coastline, and a hexagon that overlaps more than one polygon contributes its POIs once per matching polygon. Also, a cafe on a beach might have its position located in the neighbouring body of water due to inaccuracies in GPS. Nonetheless, 29M should serve as a rough, order-of-magnitude figure.
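One thing to keep in mind with a join-based count like the one above is that a hexagon matching more than one polygon is summed once per match. Below is a minimal sketch of that effect using Python's built-in sqlite3 module and made-up toy data. The `matches` table is hypothetical and stands in for the result of the spatial predicate. It says nothing about how large the effect is on the real data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE h3_stats (hexagon TEXT, num_pois INTEGER);
    CREATE TABLE matches  (hexagon TEXT, region TEXT);

    -- Toy data: hexagon 'a' overlaps two regions, 'b' overlaps one
    -- and 'c' overlaps none.
    INSERT INTO h3_stats VALUES ('a', 100), ('b', 10), ('c', 1);
    INSERT INTO matches  VALUES ('a', 'region 1'),
                                ('a', 'region 2'),
                                ('b', 'region 1');
""")

# A plain join repeats hexagon 'a' once per matching region.
joined_sum = con.execute("""
    SELECT SUM(s.num_pois)
    FROM   h3_stats s
    JOIN   matches m ON m.hexagon = s.hexagon
""").fetchone()[0]

# EXISTS counts each matching hexagon once, however many regions it hits.
deduped_sum = con.execute("""
    SELECT SUM(s.num_pois)
    FROM   h3_stats s
    WHERE  EXISTS (SELECT 1 FROM matches m WHERE m.hexagon = s.hexagon)
""").fetchone()[0]

print(joined_sum)   # 210
print(deduped_sum)  # 110
```

A semi-join of this kind is one possible way to count each hexagon once. Whether it is worth the extra step depends on how much the polygons from the different GeoPackage files overlap one another.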

For reference, this is what the 1,006 Marine Regions polygons look like rendered on a map. They cover just about every major body of water but there is also a wide buffer around Antarctica.

COPY (
    SELECT geom,
           name
    FROM   marineregions
) TO 'marineregions.parquet' (
    FORMAT 'PARQUET',
    CODEC  'ZSTD',
    COMPRESSION_LEVEL 22,
    ROW_GROUP_SIZE 15000);

Overture's Places

Confidence

Below are the number of POIs for each confidence value, rounded to one decimal place. On the maps further down, the brighter the hexagon, the higher the confidence.

$ ~/duckdb
SELECT   COUNT(*),
         confidence: (confidence*10)::INT / 10
FROM     'places/part*.parquet'
GROUP BY 2
ORDER BY 2;
┌──────────────┬────────────┐
│ count_star() │ confidence │
│    int64     │   double   │
├──────────────┼────────────┤
│         4083 │        0.0 │
│         4149 │        0.1 │
│         3624 │        0.2 │
│     11248354 │        0.3 │
│      1492960 │        0.4 │
│         7243 │        0.5 │
│     11360452 │        0.6 │
│         7405 │        0.7 │
│     10562751 │        0.8 │
│      1522626 │        0.9 │
│     36231092 │        1.0 │
├──────────────┴────────────┤
│ 11 rows         2 columns │
└───────────────────────────┘
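The bucketing expression in the query above, `(confidence*10)::INT / 10`, can be sketched in plain Python. This assumes the `::INT` cast rounds to the nearest integer instead of truncating, which is my understanding of DuckDB's float-to-integer cast. The handling of exact ties may differ from Python's `round()`. The scores below are made up for illustration.

```python
from collections import Counter


def confidence_bucket(confidence: float) -> float:
    """Round a 0-1 confidence score to the nearest tenth."""
    return round(confidence * 10) / 10


# Made-up scores for illustration only.
scores = [0.02, 0.04, 0.06, 0.31, 0.34, 0.58, 0.77, 0.96, 0.99, 1.0]

counts = Counter(confidence_bucket(s) for s in scores)

for bucket in sorted(counts):
    print(f"{bucket:.1f}  {counts[bucket]}")
```

One consequence of rounding to the nearest tenth is that the 0.0 and 1.0 buckets are half as wide as the others: they cover roughly 0 to 0.05 and 0.95 to 1.0, while a bucket such as 0.3 covers roughly 0.25 to 0.35.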

Below, I'll build a map of the most common confidence value for each H3 resolution 5 hexagon.

CREATE OR REPLACE TABLE h3_5s AS
    WITH b AS (
        WITH a AS (
            SELECT   H3_LATLNG_TO_CELL(bbox.ymin,
                                       bbox.xmin,
                                       5) h3_5,
                     confidence: (confidence*10)::INT / 10,
                     COUNT(*) num_recs
            FROM     'places/part*.parquet'
            GROUP BY 1, 2
        )
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY h3_5
                                  ORDER BY     num_recs DESC) AS rn
        FROM   a
    )
    FROM     b
    WHERE    rn = 1
    ORDER BY num_recs DESC;

COPY (
    SELECT geometry: H3_CELL_TO_BOUNDARY_WKT(h3_5)::GEOMETRY,
           confidence
    FROM   h3_5s
    WHERE  ST_XMIN(geometry::geometry) BETWEEN -179 AND 179
    AND    ST_XMAX(geometry::geometry) BETWEEN -179 AND 179
) TO 'confidence.h3_5_stats.parquet' (
    FORMAT 'PARQUET',
    CODEC  'ZSTD',
    COMPRESSION_LEVEL 22,
    ROW_GROUP_SIZE 15000);

Overture's Places
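The "most common value per hexagon" logic above, a GROUP BY followed by ROW_NUMBER() and a filter on rn = 1, can be sketched in plain Python as follows. The cell IDs and scores are made-up placeholders, not real H3 indexes or Overture data.

```python
from collections import Counter, defaultdict

# Made-up (cell, rounded confidence) pairs; the cell IDs are placeholders,
# not real H3 indexes.
records = [
    ("cell_a", 0.6), ("cell_a", 0.6), ("cell_a", 1.0),
    ("cell_b", 0.3), ("cell_b", 1.0), ("cell_b", 1.0), ("cell_b", 1.0),
    ("cell_c", 0.8),
]

# Step 1: count records per (cell, confidence), like the inner GROUP BY.
per_cell = defaultdict(Counter)
for cell, confidence in records:
    per_cell[cell][confidence] += 1

# Step 2: keep the top-ranked confidence per cell, like rn = 1.
most_common = {
    cell: counts.most_common(1)[0]  # (confidence, num_recs)
    for cell, counts in per_cell.items()
}

for cell, (confidence, num_recs) in sorted(most_common.items()):
    print(cell, confidence, num_recs)
```

When two confidence values tie within a cell, this sketch keeps whichever was seen first. In the SQL version, the ordering is on num_recs alone, so which of the tied rows receives rn = 1 is not specified.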

Below, I'll use larger hexagons to make generalising the confidence scores over open oceans easier.

CREATE OR REPLACE TABLE h3_3s AS
    WITH b AS (
        WITH a AS (
            SELECT   H3_LATLNG_TO_CELL(bbox.ymin,
                                       bbox.xmin,
                                       3) h3_3,
                     confidence: (confidence*10)::INT / 10,
                     COUNT(*) num_recs
            FROM     'places/part*.parquet'
            GROUP BY 1, 2
        )
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY h3_3
                                  ORDER BY     num_recs DESC) AS rn
        FROM   a
    )
    FROM     b
    WHERE    rn = 1
    ORDER BY num_recs DESC;

COPY (
    SELECT geometry: H3_CELL_TO_BOUNDARY_WKT(h3_3)::GEOMETRY,
           confidence
    FROM   h3_3s
    WHERE  ST_XMIN(geometry::geometry) BETWEEN -179 AND 179
    AND    ST_XMAX(geometry::geometry) BETWEEN -179 AND 179
) TO 'confidence.h3_3_stats.parquet' (
    FORMAT 'PARQUET',
    CODEC  'ZSTD',
    COMPRESSION_LEVEL 22,
    ROW_GROUP_SIZE 15000);

Overture's Places
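The two BETWEEN conditions in the COPY statements above appear to drop any hexagon whose outline reaches within a degree of the ±180° meridian. Polygons that cross that line are often drawn as bands stretching across the whole map. Below is a minimal sketch of the same test in plain Python. The `keep_hexagon` helper and both outlines are hypothetical and only illustrate the check.

```python
def keep_hexagon(boundary, limit=179.0):
    """Mirror the WHERE clause: keep a cell only if its minimum and
    maximum longitudes both fall within [-limit, limit]."""
    lngs = [lng for lng, _lat in boundary]
    return min(lngs) >= -limit and max(lngs) <= limit


# Hypothetical cell outlines as (longitude, latitude) pairs. These are
# hand-written shapes, not real H3 cell boundaries.
near_tallinn = [(24.70, 59.40), (24.80, 59.45), (24.90, 59.40),
                (24.90, 59.30), (24.80, 59.25), (24.70, 59.30)]
across_antimeridian = [(179.60, -17.0), (179.90, -16.9), (-179.80, -17.0),
                       (-179.80, -17.2), (179.90, -17.3), (179.60, -17.2)]

print(keep_hexagon(near_tallinn))         # True
print(keep_hexagon(across_antimeridian))  # False
```

The trade-off is that cells close to the antimeridian are left off the map entirely, even when they contain POIs.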

The most common confidence value over the oceans and seas is 0.6. This matches most of Africa's landmass as well as those of the Middle East and China.

Thank you for taking the time to read this post. I offer both consulting and hands-on development services to clients in North America and Europe. If you'd like to discuss how my offerings can help your business please contact me via LinkedIn.

Copyright © 2014 - 2026 Mark Litwintschik. This site's template is based off a template by Giulio Fidente.