In this post, I'll look at several level 0 administrative boundary datasets.
In some cases, their geometry is very accurate and takes a good deal of effort to find errors. In other cases, they've used course outlines and I'll share examples of these.
These datasets also vary in how many countries they outline in the world as well as the amount of metadata they store on each of them. I'll collect these values and show an example record from each of these datasets.
My Workstation
I'm using a 5.7 GHz AMD Ryzen 9 9950X CPU. It has 16 cores and 32 threads and 1.2 MB of L1, 16 MB of L2 and 64 MB of L3 cache. It has a liquid cooler attached and is housed in a spacious, full-sized Cooler Master HAF 700 computer case.
The system has 96 GB of DDR5 RAM clocked at 4,800 MT/s and a 5th-generation, Crucial T700 4 TB NVMe M.2 SSD which can read at speeds up to 12,400 MB/s. There is a heatsink on the SSD to help keep its temperature down. This is my system's C drive.
The system is powered by a 1,200-watt, fully modular Corsair Power Supply and is sat on an ASRock X870E Nova 90 Motherboard.
I'm running Ubuntu 24 LTS via Microsoft's Ubuntu for Windows on Windows 11 Pro. In case you're wondering why I don't run a Linux-based desktop as my primary work environment, I'm still using an Nvidia GTX 1080 GPU which has better driver support on Windows and ArcGIS Pro only supports Windows natively.
Installing Prerequisites
I'll use Python 3.12.3 and a few other tools to help analyse the data in this post.
$ sudo add-apt-repository ppa:deadsnakes/ppa
$ sudo apt update
$ sudo apt install \
jq \
python3-pip \
python3.12-venv
I'll set up a Python Virtual Environment and a JavaScript interpreter for Python.
$ python3 -m venv ~/.boundaries
$ source ~/.boundaries/bin/activate
$ python -m pip install \
awscli
I'll use DuckDB v1.4.1, along with its H3, JSON, Lindel, Parquet and Spatial extensions, in this post.
$ cd ~
$ wget -c https://github.com/duckdb/duckdb/releases/download/v1.4.1/duckdb_cli-linux-amd64.zip
$ unzip -j duckdb_cli-linux-amd64.zip
$ chmod +x duckdb
$ ~/duckdb
INSTALL h3 FROM community;
INSTALL lindel FROM community;
INSTALL json;
INSTALL parquet;
INSTALL spatial;
I'll set up DuckDB to load every installed extension each time it launches.
$ vi ~/.duckdbrc
.timer on
.width 180
LOAD h3;
LOAD lindel;
LOAD json;
LOAD parquet;
LOAD spatial;
The maps in this post were rendered using QGIS version 3.44. QGIS is a desktop application that runs on Windows, macOS and Linux. The application has grown in popularity in recent years and has ~15M application launches from users all around the world each month.
I used QGIS' Tile+ plugin to add basemaps from Bing to the maps in this post.
OpenStreetMap
OpenStreetMap (OSM) now has a Parquet dump of its dataset refreshed weekly and hosted via Cloudflare.
$ mkdir ~/osm
$ cd ~/osm
Below is the boundaries Parquet file. It's 1.8 GB and contains 796,070 records.
$ wget -c https://data.openstreetmap.us/layercake/boundaries.parquet
Below is an example record.
$ echo "SELECT * EXCLUDE(geometry,
bbox,
tags),
bbox::JSON bbox,
tags::JSON tags
FROM 'boundaries.parquet'
LIMIT 1" \
| ~/duckdb -json \
| jq -S .
[
{
"bbox": {
"xmax": -175.5414276123047,
"xmin": -177.24468994140625,
"ymax": -43.3108024597168,
"ymin": -44.63499450683594
},
"id": 2647558,
"tags": {
"ISO3166-1:alpha2": null,
"ISO3166-1:alpha3": null,
"ISO3166-2": "NZ-CIT",
"admin_level": "4",
"alt_name": null,
"alt_names": {},
"border_type": null,
"boundary": "administrative",
"claimed_by": null,
"controlled_by": null,
"disputed_by": null,
"int_name": null,
"name": [
"Chatham Islands"
],
"names": {
"cs": [
"Chathamské ostrovy"
],
"en": [
"Chatham Islands"
],
"ja": [
"チャタム諸島"
],
"mi": [
"Wharekauri"
],
"mk": [
"Четемски Острови"
],
"rrm": [
"Rēkohu"
],
"ru": [
"Архипелаг Чатем"
]
},
"official_name": null,
"official_names": {},
"place": null,
"recognized_by": null,
"type": null,
"wikidata": "Q115459",
"wikipedia": "en:Chatham Islands"
},
"type": "relation"
}
]
Below are the counts of the different admin levels contained within this dataset. OSM has had 20M volunteers over the years so normalisation of its attributes is an ongoing challenge.
$ ~/duckdb
SELECT COUNT(*),
tags.admin_level
FROM 'boundaries.parquet'
GROUP BY 2
ORDER BY 2;
┌──────────────┬─────────────────────────┐
│ count_star() │ admin_level │
│ int64 │ varchar │
├──────────────┼─────────────────────────┤
│ 12 │ 1 │
│ 234135 │ 10 │
│ 1 │ 10- │
│ 8883 │ 11 │
│ 352 │ 12 │
│ 1 │ 13 │
│ 14 │ 14 │
│ 1 │ 17 │
│ 1 │ 18 │
│ 1 │ 1o │
│ 816 │ 2 │
│ 201 │ 3 │
│ 4519 │ 4 │
│ 6538 │ 5 │
│ 50190 │ 6 │
│ 35348 │ 7 │
│ 252801 │ 8 │
│ 175427 │ 9 │
│ 1 │ Bairro │
│ 1 │ Cabecera Municipal │
│ 1 │ Cadastramento de Bairro │
│ 1 │ Desa │
│ 1 │ Eclesiastico │
│ 1 │ El Tinto │
│ 1 │ Municipalité │
│ 1 │ Neighborhood │
│ 1 │ RW/8 │
│ 5 │ Village │
│ 4 │ neighborhood │
│ 1 │ nieghborhood │
│ 1 │ suburb │
│ 1 │ هجرة الجيبان │
│ 26808 │ NULL │
├──────────────┴─────────────────────────┤
│ 33 rows 2 columns │
└────────────────────────────────────────┘
Below are the border type counts.
SELECT COUNT(*),
tags.border_type
FROM 'boundaries.parquet'
GROUP BY 2
ORDER BY 1 DESC;
┌──────────────┬──────────────────────────┐
│ count_star() │ border_type │
│ int64 │ varchar │
├──────────────┼──────────────────────────┤
│ 749336 │ NULL │
│ 10072 │ city │
│ 7517 │ town │
│ 6304 │ township │
│ 4252 │ village │
│ 3827 │ district │
│ 3312 │ county │
│ 3262 │ freguesia │
│ 1472 │ municipi │
│ 1220 │ borough │
│ 1014 │ municipality │
│ 886 │ barangay │
│ 487 │ região_autónoma │
│ 383 │ territorial │
│ 309 │ município │
│ 277 │ suburb │
│ 238 │ neighbourhood │
│ 194 │ arrondissement │
│ 158 │ baseline │
│ 126 │ region │
│ · │ · │
│ · │ · │
│ · │ · │
│ 1 │ hamlet.CDP │
│ 1 │ aboriginal_lands │
│ 1 │ rural_municipality │
│ 1 │ muncipality │
│ 1 │ Village │
│ 1 │ reserve │
│ 1 │ mrc │
│ 1 │ unincorporated_community │
│ 1 │ commune │
│ 1 │ homeowners_association │
│ 1 │ indian_reservation │
│ 1 │ county;town │
│ 1 │ adminstrative │
│ 1 │ m │
│ 1 │ International │
│ 1 │ 禁止水域 │
│ 1 │ Consejo Comunal │
│ 1 │ 10 │
│ 1 │ first_nation │
│ 1 │ municipio │
├──────────────┴──────────────────────────┤
│ 113 rows (40 shown) 2 columns │
└─────────────────────────────────────────┘
Below are the place counts.
SELECT COUNT(*),
tags.place
FROM 'boundaries.parquet'
GROUP BY 2
ORDER BY 1 DESC;
┌──────────────┬─────────────────────────┐
│ count_star() │ place │
│ int64 │ varchar │
├──────────────┼─────────────────────────┤
│ 617500 │ NULL │
│ 60313 │ locality │
│ 29191 │ hamlet │
│ 20311 │ village │
│ 15782 │ neighbourhood │
│ 13065 │ municipality │
│ 9763 │ suburb │
│ 6429 │ town │
│ 4363 │ quarter │
│ 3786 │ civil_parish │
│ 3191 │ city │
│ 2262 │ district │
│ 1666 │ county │
│ 1553 │ islet │
│ 1001 │ isolated_dwelling │
│ 990 │ subdistrict │
│ 939 │ plot │
│ 441 │ subdivision │
│ 409 │ island │
│ 388 │ province │
│ · │ · │
│ · │ · │
│ · │ · │
│ 3 │ payam │
│ 3 │ neighborhood │
│ 3 │ municipal_federation │
│ 2 │ country │
│ 2 │ * │
│ 2 │ sub │
│ 2 │ league │
│ 2 │ township │
│ 1 │ census │
│ 1 │ qah22 │
│ 1 │ districts │
│ 1 │ parish │
│ 1 │ governorate │
│ 1 │ historic_urban_district │
│ 1 │ place │
│ 1 │ Quarter │
│ 1 │ industrial │
│ 1 │ border_type │
│ 1 │ natural │
│ 1 │ block_number │
├──────────────┴─────────────────────────┤
│ 66 rows (40 shown) 2 columns │
└────────────────────────────────────────┘
This is Iceland's data, coloured by the boundary field.
This is the place field's data around Reykjavík.
There are 247 unique country codes in this dataset.
$ ~/duckdb
SELECT COUNT(DISTINCT tags."ISO3166-1:alpha3")
FROM 'boundaries.parquet'; -- 247
The geometry is a little over 8 GB in WKT format.
SELECT SUM(LENGTH(geometry::TEXT)) / 1024 ** 3
FROM 'boundaries.parquet'; -- 8.111921741627157
Overture
Overture release a new dataset monthly. They also remove old releases after 60 days so the URL below will likely need to be changed by the time you read this.
The following will list which releases are available for download.
$ curl https://labs.overturemaps.org/data/releases.json
{
"latest": "2025-11-19.0",
"releases": [
"2025-11-19.0",
"2025-10-22.0"
]
}
I'll download October's release. It's kept in four Parquet files totalling 3.2 GB and containing 1,052,542 records.
$ mkdir ~/overture
$ cd ~/overture
$ aws s3 \
--no-sign-request \
sync \
s3://overturemaps-us-west-2/release/2025-11-19.0/theme=divisions/type=division_area/ \
./
Below is an example record.
$ echo "SELECT * EXCLUDE(geometry,
bbox,
names,
sources),
bbox: bbox::JSON,
names: names::JSON,
sources: sources::JSON,
FROM 'part-*-c000.zstd.parquet'
WHERE country = 'EE'
LIMIT 1" \
| ~/duckdb -json \
| jq -S .
[
{
"bbox": {
"xmax": 27.38501739501953,
"xmin": 27.367538452148438,
"ymax": 57.655853271484375,
"ymin": 57.64270782470703
},
"class": "land",
"country": "EE",
"division_id": "9a1adadd-7f92-4cf9-8deb-5004a3b0c9ac",
"id": "45cafb8b-3a64-493c-8ad5-d038d2642a9e",
"is_land": true,
"is_territorial": true,
"names": {
"common": {
"ru": "Коорла"
},
"primary": "Koorla küla",
"rules": [
{
"between": null,
"language": null,
"perspectives": null,
"side": null,
"value": "Koorla",
"variant": "alternate"
}
]
},
"region": "EE-87",
"sources": [
{
"between": null,
"confidence": null,
"dataset": "OpenStreetMap",
"license": "ODbL-1.0",
"property": "",
"record_id": "r353065@4",
"update_time": "2024-02-20T18:32:20Z"
},
{
"between": null,
"confidence": null,
"dataset": "OpenStreetMap",
"license": "ODbL-1.0",
"property": "/properties/names/common/ru",
"record_id": "n1973426847@6",
"update_time": "2023-08-22T03:53:36Z"
}
],
"subtype": "locality",
"version": 1
}
]
Below are the number of records per class type.
SELECT COUNT(*),
class
FROM 'part-*-c000.zstd.parquet'
GROUP BY 2
ORDER BY 1 DESC;
┌──────────────┬──────────┐
│ count_star() │ class │
│ int64 │ varchar │
├──────────────┼──────────┤
│ 1050537 │ land │
│ 2005 │ maritime │
└──────────────┴──────────┘
Below are the number of records by is_land's value.
SELECT COUNT(*),
is_land
FROM 'part-*-c000.zstd.parquet'
GROUP BY 2
ORDER BY 1 DESC;
┌──────────────┬─────────┐
│ count_star() │ is_land │
│ int64 │ boolean │
├──────────────┼─────────┤
│ 1050537 │ true │
│ 2005 │ false │
└──────────────┴─────────┘
Below are the number of records by is_territorial's value.
SELECT COUNT(*),
is_territorial
FROM 'part-*-c000.zstd.parquet'
GROUP BY 2
ORDER BY 1 DESC;
┌──────────────┬────────────────┐
│ count_star() │ is_territorial │
│ int64 │ boolean │
├──────────────┼────────────────┤
│ 1050537 │ true │
│ 2005 │ false │
└──────────────┴────────────────┘
Below are the number of records by subtype.
SELECT COUNT(*),
subtype
FROM 'part-*-c000.zstd.parquet'
GROUP BY 2
ORDER BY 1 DESC;
┌──────────────┬──────────────┐
│ count_star() │ subtype │
│ int64 │ varchar │
├──────────────┼──────────────┤
│ 560682 │ locality │
│ 303353 │ neighborhood │
│ 78566 │ microhood │
│ 43856 │ macrohood │
│ 39786 │ county │
│ 21109 │ localadmin │
│ 4707 │ region │
│ 378 │ country │
│ 105 │ dependency │
└──────────────┴──────────────┘
There are 272 unique country codes in this dataset.
$ ~/duckdb
SELECT COUNT(DISTINCT country)
FROM 'part-*-c000.zstd.parquet'; -- 272
The geometry is a little over 8 GB in WKT format.
SELECT SUM(LENGTH(geometry::TEXT)) / 1024 ** 3
FROM 'part-*-c000.zstd.parquet'; -- 8.173572240397334
UN COD
The UN Common Operational Datasets (COD) is 9 GB across four Parquet files and contains 541,172 records.
$ mkdir ~/un_cod
$ cd ~/un_cod
$ wget -c https://data.fieldmaps.io/edge-matched/humanitarian/intl/adm1_polygons.parquet
$ wget -c https://data.fieldmaps.io/edge-matched/humanitarian/intl/adm2_polygons.parquet
$ wget -c https://data.fieldmaps.io/edge-matched/humanitarian/intl/adm3_polygons.parquet
$ wget -c https://data.fieldmaps.io/edge-matched/humanitarian/intl/adm4_polygons.parquet
Below is an example record.
$ echo "SELECT * EXCLUDE(geometry,
geometry_bbox),
bbox: geometry_bbox::JSON
FROM 'adm*_polygons.parquet'
WHERE iso_2 = 'EE'
LIMIT 1" \
| ~/duckdb -json \
| jq -S .
[
{
"adm0_id": "EST-20250729",
"adm0_name": "Estonia",
"adm0_name1": "Estonia",
"adm0_name2": null,
"adm0_src": "EST",
"adm1_id": "EST-20230119-01",
"adm1_name": "Harju maakond",
"adm1_name1": null,
"adm1_name2": null,
"adm1_src": "21999144B19319879871883",
"bbox": {
"xmax": 25.955127716064453,
"xmin": 23.730022430419922,
"ymax": 59.700199127197266,
"ymin": 58.989349365234375
},
"fid": 937,
"iso_2": "EE",
"iso_3": "EST",
"iso_3_grp": "EST",
"iso_cd": 233,
"region1_cd": 150,
"region1_nm": "Europe",
"region2_cd": 154,
"region2_nm": "Northern Europe",
"region3_cd": 154,
"region3_nm": "Northern Europe",
"src_date": "2020-01-01",
"src_grp": "geoBoundaries",
"src_lang": "en",
"src_lang1": null,
"src_lang2": null,
"src_lic": "Open Data Commons Open Database License 1.0",
"src_lvl": 3,
"src_name": "OpenStreetMap, OSM Boundaries",
"src_name1": null,
"src_update": "2023-01-19",
"src_url": "https://www.geoboundaries.org/api/current/gbOpen/EST/ALL/",
"status_cd": 1,
"status_nm": "State",
"wld_date": "2025-02-24",
"wld_notes": null,
"wld_update": "2025-07-29",
"wld_view": "intl"
}
]
There are 285 unique country codes in this dataset.
$ ~/duckdb
SELECT COUNT(DISTINCT iso_3)
FROM 'adm*_polygons.parquet'; -- 285
The geometry is a little over 16 GB in WKT format.
SELECT SUM(LENGTH(geometry::TEXT)) / 1024 ** 3
FROM 'adm*_polygons.parquet'; -- 16.433973343111575
Who's On First
Who's On First (WOF) stores its data as GeoJSON and partitions by country into 272 separate repositories on GitHub. Below, I'll convert their data for Bahrain into a single Parquet file.
$ mkdir ~/wof
$ cd ~/wof
$ git clone https://github.com/whosonfirst-data/whosonfirst-data-admin-bh/
$ cd whosonfirst-data-admin-bh/data
$ ~/duckdb wof.duckdb
CREATE OR REPLACE TABLE wof (
id BIGINT,
"type" VARCHAR,
properties JSON,
bbox double[],
geometry geometry);
$ for FILENAME in `find . | grep geojson`; do
echo $FILENAME
echo "INSERT INTO wof
SELECT id,
type,
properties,
bbox,
ST_GeomFromGeoJSON(geometry)
FROM READ_JSON('$FILENAME');" \
| ~/duckdb wof.duckdb
done
The following produced a 112 KB Parquet file.
$ ~/duckdb wof.duckdb
COPY (
FROM wof
ORDER BY HILBERT_ENCODE([ST_Y(ST_CENTROID(geometry)),
ST_X(ST_CENTROID(geometry))]::double[2])
) TO 'wof.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
I haven't studied their data structure in depth but the above produced records with either a linestring or point geometry.
In the North of Bahrain, there are several areas of land that are outside of their boundaries.
The Estonian repository contains 12,120 GeoJSON files and suffers from a small file problem. Converting these into a single Parquet file took a long time for a resulting 11 MB file.
USGS
The US Geological Survey (USGS) has a 1.1 GB Parquet file containing 318 records on level 0 administrative boundaries.
$ mkdir ~/usgs
$ cd ~/usgs
$ wget -c https://data.fieldmaps.io/adm0/usgs/intl/adm0_polygons.parquet
Below is an example record.
$ echo "SELECT * EXCLUDE(geometry,
geometry_bbox),
bbox: geometry_bbox::JSON
FROM 'adm0_polygons.parquet'
WHERE iso_2 = 'EE'
LIMIT 1" \
| ~/duckdb -json \
| jq -S .
[
{
"adm0_id": "EST-20250729",
"adm0_name": "Estonia",
"adm0_name1": "Estonia",
"adm0_name2": null,
"adm0_src": "EST",
"bbox": {
"xmax": 28.20932388305664,
"xmin": 21.76363754272461,
"ymax": 59.821800231933594,
"ymin": 57.50931167602539
},
"iso_2": "EE",
"iso_3": "EST",
"iso_3_grp": "EST",
"iso_cd": 233,
"region1_cd": 150,
"region1_nm": "Europe",
"region2_cd": 154,
"region2_nm": "Northern Europe",
"region3_cd": 154,
"region3_nm": "Northern Europe",
"status_cd": 1,
"status_nm": "State",
"wld_date": "2025-02-24",
"wld_land": "usgs",
"wld_notes": null,
"wld_update": "2025-07-29",
"wld_view": "intl"
}
]
There are 318 unique country codes in this dataset.
$ ~/duckdb
SELECT COUNT(DISTINCT adm0_src)
FROM 'adm0_polygons.parquet'; -- 318
The geometry is a little over 1.7 GB in WKT format.
SELECT SUM(LENGTH(geometry::TEXT)) / 1024 ** 3
FROM 'adm0_polygons.parquet'; -- 1.7173228040337563
The border around Northern Bahrain cuts corners and appears to be offset somewhat.
This contrasts OSM which draws the external boundaries over their maritime borders. This ensures any point along their coastlines will be seen as inside the country when reverse geocoded.
World Bank
The World Bank has a GeoPackage (GPKG) file that is 60 MB and contains 264 records. I needed to use their JS-based UI to download the GPKG file.
$ mkdir ~/world_bank
$ cd ~/world_bank
$ ~/duckdb world_bank.duckdb
CREATE OR REPLACE TABLE wb AS
FROM ST_READ('World Bank Official Boundaries - Admin 0.gpkg');
Below is an example record.
$ echo "SELECT * EXCLUDE(geom),
FROM wb
WHERE ISO_A2 = 'EE'
LIMIT 1" \
| ~/duckdb -json world_bank.duckdb \
| jq -S .
[
{
"GAUL_0": 78,
"HASC_0": "EE",
"ISO_A2": "EE",
"ISO_A3": "EST",
"NAM_0": "Estonia",
"SOVEREIGN": "EST",
"WB_A3": "EST",
"WB_REGION": "ECA",
"WB_STATUS": "Member State"
}
]
There are 244 unique country codes in this dataset.
$ ~/duckdb -json world_bank.duckdb
SELECT COUNT(DISTINCT ISO_A3)
FROM wb; -- 244
The geometry is a little over 142 MB in WKT format.
SELECT SUM(LENGTH(geom::TEXT)) / 1024 ** 2
FROM wb; -- 142.03707027435303
Their Northern Border for Bahrain cuts a lot of corners.
GADM
GADM version 4.1 is distributed as a 1.4 GB ZIP containing a 2.6 GB GPKG file with 356,508 records.
$ mkdir ~/gadm
$ cd ~/gadm
$ wget -c https://geodata.ucdavis.edu/gadm/gadm4.1/gadm_410-gdb.zip
$ unzip -j gadm_410-gdb.zip
Below is an example record.
$ ~/duckdb gadm.duckdb
CREATE OR REPLACE TABLE gadm AS
FROM ST_READ('gadm_410.gpkg');
$ echo "SELECT * EXCLUDE(geom),
FROM gadm
WHERE COUNTRY = 'Estonia'
LIMIT 1" \
| ~/duckdb -json gadm.duckdb \
| jq -S .
[
{
"CC_1": "",
"CC_2": "",
"CC_3": "",
"CC_4": "",
"CC_5": "",
"CONTINENT": "Europe",
"COUNTRY": "Estonia",
"DISPUTEDBY": "",
"ENGTYPE_1": "County",
"ENGTYPE_2": "Parish",
"ENGTYPE_3": "Town",
"ENGTYPE_4": "",
"ENGTYPE_5": "",
"GID_0": "EST",
"GID_1": "EST.1_1",
"GID_2": "EST.1.1_1",
"GID_3": "EST.1.1.1_1",
"GID_4": "",
"GID_5": "",
"GOVERNEDBY": "",
"HASC_1": "EE.HA",
"HASC_2": "EE.HA.AE",
"HASC_3": "",
"ISO_1": "",
"NAME_0": "Estonia",
"NAME_1": "Harju",
"NAME_2": "Aegviidu",
"NAME_3": "Aegviidu",
"NAME_4": "",
"NAME_5": "",
"NL_NAME_1": "",
"NL_NAME_2": "",
"NL_NAME_3": "",
"REGION": "",
"SOVEREIGN": "Estonia",
"SUBCONT": "",
"TYPE_1": "Maakond",
"TYPE_2": "Vald",
"TYPE_3": "Alev",
"TYPE_4": "",
"TYPE_5": "",
"UID": 49772,
"VALIDFR_1": "Unknown",
"VALIDFR_2": "2002",
"VALIDFR_3": "Unknown",
"VALIDFR_4": "",
"VARNAME_0": "",
"VARNAME_1": "Harjumaa|Harju maakond",
"VARNAME_2": "",
"VARNAME_3": "",
"VARNAME_4": "",
"VARREGION": ""
}
There are 263 unique country codes in this dataset.
$ ~/duckdb -json gadm.duckdb
SELECT COUNT(DISTINCT GID_0)
FROM gadm; -- 263
The geometry is a little over 5.4 GB in WKT format.
SELECT SUM(LENGTH(geom::TEXT)) / 1024 ** 3
FROM gadm; -- 5.4283982161432505
Their Northern Border for Bahrain cuts a lot of corners.
GAUL
The Food and Agriculture Organization (FAO) of the United Nations has published FAO GAUL 2024, a 310 MB ZIP file containing 433 MB in Shapefiles across 3,110 records.
$ mkdir ~/gaul
$ cd ~/gaul
$ wget -c https://storage.googleapis.com/fao-maps-catalog-data/boundaries/GAUL_2024_L1.zip
$ unzip -j GAUL_2024_L1.zip
Below is an example record.
$ ~/duckdb gaul.duckdb
CREATE OR REPLACE TABLE gaul AS
FROM ST_READ('GAUL_2024_L1.shx');
$ echo "SELECT * EXCLUDE(geom),
FROM gaul
WHERE iso3_code = 'EST'
LIMIT 1" \
| ~/duckdb -json gaul.duckdb \
| jq -S .
[
{
"continent": "Europe",
"disp_en": "Tartu maakond, Estonia",
"gaul0_code": 298,
"gaul0_name": "Estonia",
"gaul1_code": 3419,
"gaul1_name": "Tartu maakond",
"iso3_code": "EST",
"map_code": "EST"
}
]
There are 265 unique country codes in this dataset.
$ ~/duckdb gaul.duckdb
SELECT COUNT(DISTINCT iso3_code)
FROM gaul; -- 265
There is a little over one GB of geometry in WKT format in this dataset.
SELECT SUM(LENGTH(geom::TEXT)) / 1024 ** 2
FROM gaul; -- 1030.7613172531128
Their Northern Border for Bahrain cuts a lot of corners.
geoBoundaries
The geoBoundaries project has a 155 MB GPKG file containing 218 records.
$ mkdir ~/geoboundaries
$ cd ~/geoboundaries
$ wget -c https://github.com/wmgeolab/geoBoundaries/raw/main/releaseData/CGAZ/geoBoundariesCGAZ_ADM0.gpkg
Below is an example record.
$ ~/duckdb geoBoundaries.duckdb
CREATE OR REPLACE TABLE gb AS
FROM ST_READ('geoBoundariesCGAZ_ADM0.gpkg');
$ echo "SELECT * EXCLUDE(geom),
FROM gb
WHERE shapeGroup = 'EST'
LIMIT 1" \
| ~/duckdb -json geoBoundaries.duckdb \
| jq -S .
[
{
"id": null,
"shapeGroup": "EST",
"shapeName": "Estonia",
"shapeType": "ADM0"
}
]
There are 218 unique country records in this dataset.
$ ~/duckdb geoBoundaries.duckdb
SELECT COUNT(DISTINCT shapeGroup)
FROM gb; -- 218
The geometry is a little over 367 MB in WKT format.
SELECT SUM(LENGTH(geom::TEXT)) / 1024 ** 2
FROM gb; -- 367.0248956680298
Their Northern Border for Bahrain cuts a few corners.
GeoNames
For GeoNames, I needed to download two ZIP files totalling 398 MB. These contained 1.7 GB of CSV data. I then joined these together to create a 249-record dataset of the world's level 0 administrative boundaries.
$ mkdir ~/geonames
$ cd ~/geonames
$ wget -c https://download.geonames.org/export/dump/allCountries.zip
$ unzip -j allCountries.zip
$ ~/duckdb geonames.duckdb
CREATE OR REPLACE TABLE geonames (
geonameid BIGINT,
name VARCHAR,
asciiname VARCHAR,
alternatenames VARCHAR,
latitude DOUBLE,
longitude DOUBLE,
feature_class VARCHAR,
feature_code VARCHAR,
country_code VARCHAR,
cc2 VARCHAR,
admin1_code VARCHAR,
admin2_code VARCHAR,
admin3_code VARCHAR,
admin4_code VARCHAR,
population BIGINT,
elevation BIGINT,
dem BIGINT,
timezone VARCHAR,
modification_date DATE);
INSERT INTO geonames
FROM READ_CSV('allCountries.txt');
$ wget https://download.geonames.org/export/dump/shapes_all_low.zip
$ unzip -j shapes_all_low.zip
$ ~/duckdb geonames.duckdb
CREATE OR REPLACE TABLE geonames_boundaries AS
SELECT a.geoNameId,
b.*,
geometry: ST_GeomFromGeoJSON(geoJSON)
FROM READ_CSV('shapes_all_low.txt') a
LEFT JOIN geonames b ON a.geoNameId = b.geonameid;
Below is an example record.
$ echo "SELECT * EXCLUDE(geometry),
FROM geonames_boundaries
WHERE country_code = 'EE'
LIMIT 1" \
| ~/duckdb -json geonames.duckdb \
| jq -S .
[
{
"admin1_code": "00",
"admin2_code": null,
"admin3_code": null,
"admin4_code": null,
"alternatenames": "'Esitonia,An Eastoin,E-xto-ni-a,E-xto-ni-a (Estonia),E-xtô-ni-a,E-xtô-ni-a (Estonia),EE,Eastland,Eesti,Eesti Noukogude Sotsialistlik Vabariik,Eesti Nõukogude Sotsialistlik Vabariik,Eesti Vabariik,Ehstoni,Ehstonija,Eistland,Esetoni,Esitoni,Esitonia,Esitoniya,Esland,Esthonia,Estija,Estiska,Estlan,Estland,Estlanda,Estlandi,Estlánda,Estlân,Estoni,Estonia,Estonia nutome,Estonian Soviet Socialist Republic,Estonie,Estonii,Estonija,Estonio,Estoniya,Estoniýa,Estonja,Estonnie,Estonskaya Sovetskaya Sotsialisticheskaya Respublika,Estonsko,Estonujo,Estonya,Estonïi,Estooniya,Estounii,Estteeana,Estònia,Estónia,Estónsko,Estóńskô,Estônia,Esztorszag,Igaunija,Istunya,Lestiyaen,Lestiyän,Republic of Estonia,Viro,ai sha ni ya,an Eastoin,an Eastóin,astwny,astwnya,aysٹwniya,aysٽwnya,eseutonia,estoni'a,estoniya,estʼoneti,esutonia,esutonia gong he guo,gugrxesti,i-Estonia,isatoni'a,istoniya,l-Estonja,pra thes xe s to neiy,xe s to neiy,yېstwnyyە,Èstonie,Éstonia,Észtország,Ēastland,Ɛstonia,ʻEsitōnia,ʻstlʼand,ʼstwnyh,ʿesitoniya,Εσθονία,Естония,Естонија,Естонія,Эстони,Эстония,Эстонія,Էստոնիա,אסטוניה,עסטלאַנד,أستونيا,إستونيا,ئېستونىيە,استونی,استونیا,اسٹونیا,ايسٽونيا,ایسٹونِیا,اېسټونيا,ܐܣܬܘܢܝܐ,इस्टोनिया,एस्टोनिया,ইষ্টোনিয়া,এস্তোনিয়া,এস্তোনিয়া,ਇਸਟੋਨੀਆ,એસ્ટોનિયા,ଏସ୍ତୋନିଆ,எஸ்டோனியா,ఎస్టోనియా,ಎಸ್ಟೋನಿಯಾ,എസ്റ്റോണിയ,එස්තෝනියාව,ประเทศเอสโตเนีย,เอสโตเนีย,ເອສໂຕເນຍ,ཨས་ཊ་ན་ཡ,ཨིསྟོ་ནི་ཡ།,ཨེས་ཊོ་ནི་ཡ,အက်စတိုးနီးယား,ესტონეთი,ኤስቶኒያ,អេសតន,អេស្តូនី,エストニア,エストニア共和国,愛沙尼亞,爱沙尼亚,에스토니아",
"asciiname": "Republic of Estonia",
"cc2": null,
"country_code": "EE",
"dem": 99,
"elevation": null,
"feature_class": "A",
"feature_code": "PCLI",
"geoNameId": 453733,
"geonameid_1": 453733,
"latitude": 59.0,
"longitude": 26.0,
"modification_date": "2024-09-05",
"name": "Republic of Estonia",
"population": 1320884,
"timezone": "Europe/Tallinn"
}
]
There are 249 unique country codes in this dataset.
$ ~/duckdb geonames.duckdb
SELECT COUNT(DISTINCT country_code)
FROM geonames_boundaries; -- 249
The geometry is a little over 4.2 GB in WKT format.
SELECT SUM(LENGTH(geometry::TEXT)) / 1024 ** 2
FROM geonames_boundaries; -- 4.253528594970703
I'll convert this dataset into Parquet.
COPY (
FROM geonames_boundaries
ORDER BY HILBERT_ENCODE([ST_Y(ST_CENTROID(geometry)),
ST_X(ST_CENTROID(geometry))]::double[2])
) TO 'geonames_boundaries.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
Their Northern Border for Bahrain cuts a lot of corners.
GISCO
The European Commission has a GISCO dataset with level 0 administrative boundaries. There are several resolutions, years, formats and projections available.
$ mkdir ~/gisco
$ cd ~/gisco
I used their form to download the 1M resolution GPKG file for 2024. It was 23 MB and contained 263 records.
$ ~/duckdb eu.duckdb
CREATE OR REPLACE TABLE eu AS
FROM ST_READ('CNTR_RG_01M_2024_4326.gpkg');
Below is an example record.
$ echo "SELECT * EXCLUDE(geom),
FROM eu
WHERE ISO3_CODE = 'EST'
LIMIT 1" \
| ~/duckdb -json eu.duckdb \
| jq -S .
[
{
"CAPT": "Tallinn",
"CC_STAT": "F",
"CNTR_ID": "EE",
"CNTR_NAME": "Eesti",
"EFTA_STAT": "F",
"EU_STAT": "T",
"ISO3_CODE": "EST",
"NAME_ENGL": "Estonia",
"NAME_FREN": "Estonie",
"NAME_GERM": "Estland",
"SVRG_UN": "UN Member State"
}
]
There are 260 unique country codes in this dataset.
$ ~/duckdb eu.duckdb
SELECT COUNT(DISTINCT CNTR_ID)
FROM eu; -- 260
There is a little over 1.5 GB of geometry in WKT format in this dataset.
SELECT SUM(LENGTH(geom::TEXT)) / 1024 ** 2
FROM eu; -- 1.5088157653808594
Their Northern Border for Bahrain cuts a lot of corners.
Natural Earth
Natural Earth v5.1.2 comes in 10M, 50M and 110M resolutions. I'll use the 10M resolution for this post.
$ mkdir ~/natural_earth
$ cd ~/natural_earth
There are 33 localisations of the level-0 administrative boundaries dataset.
$ aws s3 \
ls \
--no-sign-request \
s3://naturalearth/5.1.2/10m_cultural/ \
| grep 'ne\_10m\_admin\_0\_countries\_[a-z][a-z][a-z]\.zip' \
| cut -c32-
ne_10m_admin_0_countries_arg.zip
ne_10m_admin_0_countries_bdg.zip
ne_10m_admin_0_countries_bra.zip
ne_10m_admin_0_countries_chn.zip
ne_10m_admin_0_countries_deu.zip
ne_10m_admin_0_countries_egy.zip
ne_10m_admin_0_countries_esp.zip
ne_10m_admin_0_countries_fra.zip
ne_10m_admin_0_countries_gbr.zip
ne_10m_admin_0_countries_grc.zip
ne_10m_admin_0_countries_idn.zip
ne_10m_admin_0_countries_ind.zip
ne_10m_admin_0_countries_iso.zip
ne_10m_admin_0_countries_isr.zip
ne_10m_admin_0_countries_ita.zip
ne_10m_admin_0_countries_jpn.zip
ne_10m_admin_0_countries_kor.zip
ne_10m_admin_0_countries_mar.zip
ne_10m_admin_0_countries_nep.zip
ne_10m_admin_0_countries_nld.zip
ne_10m_admin_0_countries_pak.zip
ne_10m_admin_0_countries_pol.zip
ne_10m_admin_0_countries_prt.zip
ne_10m_admin_0_countries_pse.zip
ne_10m_admin_0_countries_rus.zip
ne_10m_admin_0_countries_sau.zip
ne_10m_admin_0_countries_swe.zip
ne_10m_admin_0_countries_tlc.zip
ne_10m_admin_0_countries_tur.zip
ne_10m_admin_0_countries_twn.zip
ne_10m_admin_0_countries_ukr.zip
ne_10m_admin_0_countries_usa.zip
ne_10m_admin_0_countries_vnm.zip
I'll use the German localisation. The following downloaded a 4.7 MB ZIP file containing 9.3 MB worth of shapefiles.
$ aws s3 \
--no-sign-request \
cp \
s3://naturalearth/5.1.2/10m_cultural/ne_10m_admin_0_countries_deu.zip \
./
$ unzip -j ne_10m_admin_0_countries_deu.zip
$ ~/duckdb ne.duckdb
CREATE OR REPLACE TABLE ne AS
FROM ST_READ('ne_10m_admin_0_countries_deu.shx');
Below is an example record.
$ echo "SELECT * EXCLUDE(geom),
FROM ne
WHERE ADM0_A3 = 'EST'
LIMIT 1" \
| ~/duckdb -json ne.duckdb \
| jq -S .
[
{
"ABBREV": "Est.",
"ABBREV_LEN": 4,
"ADM0_A3": "EST",
"ADM0_A3_AR": "EST",
"ADM0_A3_BD": "EST",
"ADM0_A3_BR": "EST",
"ADM0_A3_CN": "EST",
"ADM0_A3_DE": "EST",
"ADM0_A3_EG": "EST",
"ADM0_A3_ES": "EST",
"ADM0_A3_FR": "EST",
"ADM0_A3_GB": "EST",
"ADM0_A3_GR": "EST",
"ADM0_A3_ID": "EST",
"ADM0_A3_IL": "EST",
"ADM0_A3_IN": "EST",
"ADM0_A3_IT": "EST",
"ADM0_A3_JP": "EST",
"ADM0_A3_KO": "EST",
"ADM0_A3_MA": "EST",
"ADM0_A3_NL": "EST",
"ADM0_A3_NP": "EST",
"ADM0_A3_PK": "EST",
"ADM0_A3_PL": "EST",
"ADM0_A3_PS": "EST",
"ADM0_A3_PT": "EST",
"ADM0_A3_RU": "EST",
"ADM0_A3_SA": "EST",
"ADM0_A3_SE": "EST",
"ADM0_A3_TR": "EST",
"ADM0_A3_TW": "EST",
"ADM0_A3_UA": "EST",
"ADM0_A3_UN": -99,
"ADM0_A3_US": "EST",
"ADM0_A3_VN": "EST",
"ADM0_A3_WB": -99,
"ADM0_DIF": 0,
"ADM0_DIFF": null,
"ADM0_ISO": "EST",
"ADM0_TLC": "EST",
"ADMIN": "Estonia",
"BRK_A3": "EST",
"BRK_DIFF": 0,
"BRK_GROUP": null,
"BRK_NAME": "Estonia",
"CONTINENT": "Europe",
"ECONOMY": "2. Developed region: nonG7",
"FCLASS_AR": null,
"FCLASS_BD": null,
"FCLASS_BR": null,
"FCLASS_CN": null,
"FCLASS_DE": null,
"FCLASS_EG": null,
"FCLASS_ES": null,
"FCLASS_FR": null,
"FCLASS_GB": null,
"FCLASS_GR": null,
"FCLASS_ID": null,
"FCLASS_IL": null,
"FCLASS_IN": null,
"FCLASS_ISO": "Admin-0 country",
"FCLASS_IT": null,
"FCLASS_JP": null,
"FCLASS_KO": null,
"FCLASS_MA": null,
"FCLASS_NL": null,
"FCLASS_NP": null,
"FCLASS_PK": null,
"FCLASS_PL": null,
"FCLASS_PS": null,
"FCLASS_PT": null,
"FCLASS_RU": null,
"FCLASS_SA": null,
"FCLASS_SE": null,
"FCLASS_TLC": "Admin-0 country",
"FCLASS_TR": null,
"FCLASS_TW": null,
"FCLASS_UA": null,
"FCLASS_US": null,
"FCLASS_VN": null,
"FIPS_10": "EN",
"FORMAL_EN": "Republic of Estonia",
"FORMAL_FR": null,
"GDP_MD": 31471,
"GDP_YEAR": 2019,
"GEOUNIT": "Estonia",
"GEOU_DIF": 0,
"GU_A3": "EST",
"HOMEPART": 1,
"INCOME_GRP": "1. High income: OECD",
"ISO_A2": "EE",
"ISO_A2_EH": "EE",
"ISO_A3": "EST",
"ISO_A3_EH": "EST",
"ISO_N3": "233",
"ISO_N3_EH": "233",
"LABELRANK": 6,
"LABEL_X": 25.867126,
"LABEL_Y": 58.724865,
"LEVEL": 2,
"LONG_LEN": 7,
"MAPCOLOR13": 10,
"MAPCOLOR7": 3,
"MAPCOLOR8": 2,
"MAPCOLOR9": 1,
"MAX_LABEL": 8.0,
"MIN_LABEL": 3.0,
"MIN_ZOOM": 0.0,
"NAME": "Estonia",
"NAME_ALT": null,
"NAME_AR": "إستونيا",
"NAME_BN": "এস্তোনিয়া",
"NAME_CIAWF": "Estonia",
"NAME_DE": "Estland",
"NAME_EL": "Εσθονία",
"NAME_EN": "Estonia",
"NAME_ES": "Estonia",
"NAME_FA": "استونی",
"NAME_FR": "Estonie",
"NAME_HE": "אסטוניה",
"NAME_HI": "एस्टोनिया",
"NAME_HU": "Észtország",
"NAME_ID": "Estonia",
"NAME_IT": "Estonia",
"NAME_JA": "エストニア",
"NAME_KO": "에스토니아",
"NAME_LEN": 7,
"NAME_LONG": "Estonia",
"NAME_NL": "Estland",
"NAME_PL": "Estonia",
"NAME_PT": "Estónia",
"NAME_RU": "Эстония",
"NAME_SORT": "Estonia",
"NAME_SV": "Estland",
"NAME_TR": "Estonya",
"NAME_UK": "Естонія",
"NAME_UR": "استونیا",
"NAME_VI": "Estonia",
"NAME_ZH": "爱沙尼亚",
"NAME_ZHT": "愛沙尼亞",
"NE_ID": 1159320615,
"NOTE_ADM0": null,
"NOTE_BRK": null,
"POP_EST": 1326590.0,
"POP_RANK": 12,
"POP_YEAR": 2019,
"POSTAL": "EST",
"REGION_UN": "Europe",
"REGION_WB": "Europe & Central Asia",
"SOVEREIGNT": "Estonia",
"SOV_A3": "EST",
"SUBREGION": "Northern Europe",
"SUBUNIT": "Estonia",
"SU_A3": "EST",
"SU_DIF": 0,
"TINY": -99,
"TLC": "1",
"TLC_DIFF": null,
"TYPE": "Sovereign country",
"UN_A3": "233",
"WB_A2": "EE",
"WB_A3": "EST",
"WIKIDATAID": "Q191",
"WOE_ID": 23424805,
"WOE_ID_EH": 23424805,
"WOE_NOTE": "Exact WOE match as country",
"featurecla": "Admin-0 country",
"scalerank": 0
}
]
Their Northern Border for Bahrain cuts a lot of corners.
Scoreboard
During this review, I also checked the borders around Monaco and Estonia. Below is a summary of the accuracy of the borders around these three countries, the size of their geometry in WKT format and the number of unique country codes.
The following shows the overlap between the datasets that cut corners around Bahrain's Northern coast. I excluded WOF as it made the map too complex.