
Posted on Thu 29 August 2024 under GIS

AI on Street View

Global Streetscapes is a ten-million-row dataset labelling over 300 characteristics of community-sourced street view imagery from 688 cities around the world. These characteristics include how beautiful or boring a street is, what objects are visible and, more generally, what kind of setting the image was taken in (on a bridge, in a residential neighbourhood, on a forest road, etc.).

The source imagery was provided by the Mapillary and KartaView projects.

Below is an example image that was used in this dataset.

Global Streetscapes

Collection of imagery happened at all hours of the day, and time of year with a diverse set of camera equipment and vehicles used. Below is a panoramic shot taken in Tallinn's Old Town.

Global Streetscapes

The code for the models that were run on the imagery, the URLs of each of the 10M images used to produce this dataset and all of the model inference results have been made publicly available.

In this post, I'll download and examine Global Streetscapes' dataset.

My Workstation

I'm using a 6 GHz Intel Core i9-14900K CPU. It has 8 performance cores and 16 efficiency cores with a total of 32 threads and 32 MB of L2 cache. It has a liquid cooler attached and is housed in a spacious, full-sized, Cooler Master HAF 700 computer case. I've come across videos on YouTube where people have managed to overclock the i9-14900KF to 9.1 GHz.

The system has 96 GB of DDR5 RAM clocked at 6,000 MT/s and a 5th-generation, Crucial T700 4 TB NVMe M.2 SSD which can read at speeds up to 12,400 MB/s. There is a heatsink on the SSD to help keep its temperature down. This is my system's C drive.

The system is powered by a 1,200-watt, fully modular, Corsair Power Supply and is sat on an ASRock Z790 Pro RS Motherboard.

I'm running Ubuntu 22 LTS via Microsoft's Ubuntu for Windows on Windows 11 Pro. In case you're wondering why I don't run a Linux-based desktop as my primary work environment, I'm still using an Nvidia GTX 1080 GPU which has better driver support on Windows and I use ArcGIS Pro from time to time which only supports Windows natively.

Installing Prerequisites

I'll be using Python and a few other tools to help analyse the data in this post.

$ sudo apt update
$ sudo apt install \
    git \
    git-lfs \
    jq \
    python3-pip \
    python3-virtualenv

I'll set up a Python Virtual Environment and install some dependencies.

$ virtualenv ~/.streets
$ source ~/.streets/bin/activate

$ python -m pip install \
    duckdb

I'll also use DuckDB, along with its H3, JSON, Parquet and Spatial extensions, in this post.

$ cd ~
$ wget -c https://github.com/duckdb/duckdb/releases/download/v1.0.0/duckdb_cli-linux-amd64.zip
$ unzip -j duckdb_cli-linux-amd64.zip
$ chmod +x duckdb
$ ~/duckdb
INSTALL h3 FROM community;
INSTALL json;
INSTALL parquet;
INSTALL spatial;

I'll set up DuckDB to load every installed extension each time it launches.

$ vi ~/.duckdbrc
.timer on
.width 180
LOAD h3;
LOAD json;
LOAD parquet;
LOAD spatial;

The maps in this post were rendered with QGIS version 3.38.0. QGIS is a desktop application that runs on Windows, macOS and Linux. The application has grown in popularity in recent years and has ~15M application launches from users around the world each month.

I used QGIS' Tile+ plugin to add geospatial context with Esri's World Imagery and CARTO's Basemaps to the maps.

Downloading Global Streetscapes

The following will download 37 GB of CSV data. In addition to this, the .git folder will take up another 62 GB of disk space.

$ git lfs install
Git LFS initialized.
$ git lfs clone https://huggingface.co/datasets/NUS-UAL/global-streetscapes/ \
        ~/streets
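The clone needs roughly 37 GB + 62 GB = 99 GB, so it can be worth checking free disk space first. Below is a minimal Python sketch using only the standard library. The has_room helper is hypothetical and I'm assuming the sizes above are decimal gigabytes.

```python
import os
import shutil

# Sizes quoted above; treating them as decimal gigabytes is an assumption.
CSV_GB = 37
GIT_GB = 62
REQUIRED_BYTES = (CSV_GB + GIT_GB) * 10**9


def has_room(free_bytes, required_bytes=REQUIRED_BYTES):
    """Return True if the free space covers the CSVs plus the .git folder."""
    return free_bytes >= required_bytes


# Check the volume that holds the home folder, where ~/streets will live.
free = shutil.disk_usage(os.path.expanduser("~")).free
print(f"Need ~{REQUIRED_BYTES / 10**9:.0f} GB, have {free / 10**9:.0f} GB free")
print("OK to clone" if has_room(free) else "Not enough space")
```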

Data Fluency

In the data/ folder, there are 21 uncompressed CSV files. Below is a common metadata file that lists every image across Mapillary and KartaView's image collections.

Filename                        |   Row Count | Uncompressed Size
--------------------------------|-------------|------------------
metadata_common_attributes.csv  |  10,004,552 | 2.3 GB

Below are five files specific to Mapillary's imagery.

Filename                        |   Row Count | Uncompressed Size
--------------------------------|-------------|------------------
metadata_mly1.csv               |   8,891,021 | 1.6 GB
metadata_mly2.csv               |   8,891,021 | 1.6 GB
metadata_mly3.csv               |   8,891,021 | 913 MB
metadata_mly4.csv               |   8,891,021 | 3.0 GB
metadata_mly5.csv               |   8,891,021 | 2.8 GB

Below is KartaView's imagery metadata.

Filename                        |   Row Count | Uncompressed Size
--------------------------------|-------------|------------------
metadata_kv.csv                 |   1,113,532 | 1.3 GB

Below are 14 CSVs containing metadata and the inference results of the various models the Global Streetscapes project ran against Mapillary and KartaView's imagery.

Filename                        |   Row Count | Uncompressed Size
--------------------------------|-------------|------------------
climate.csv                     |  10,004,552 | 947 MB
contextual.csv                  |  10,004,552 | 1.1 GB
ephem.csv                       |  10,004,552 | 924 MB
gadm.csv                        |  10,004,552 | 2.2 GB
ghsl.csv                        |  10,004,552 | 761 MB
h3.csv                          |  10,004,552 | 3.0 GB
instances.csv                   |  10,004,458 | 3.1 GB
osm.csv                         |  10,004,552 | 2.0 GB
perception.csv                  |  10,004,552 | 897 MB
places365.csv                   |  10,004,552 | 699 MB
season.csv                      |  10,004,552 | 669 MB
segmentation.csv                |  10,004,540 | 3.7 GB
simplemaps.csv                  |  10,004,552 | 1.5 GB
speed.csv                       |  10,004,552 | 2.7 GB

There is a good amount of geographical diversity in this dataset.

$ cd ~/streets/data
$ ~/duckdb
COPY (
    SELECT   h3_cell_to_boundary_wkt(
                    h3_latlng_to_cell(lat, lon, 2))::geometry geom,
             COUNT(*) AS num_recs
    FROM     READ_CSV('metadata_common_attributes.csv')
    WHERE    lon > -175
    AND      lon < 175
    GROUP BY 1
) TO 'h3_2.gpkg'
    WITH (FORMAT GDAL,
          DRIVER 'GPKG',
          LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES');

Global Streetscapes

When zooming into Europe though, you can see almost all the imagery is from dense, urban areas. These urban areas also tend to be places people live rather than visit. Only one of the Canary Islands is represented while none of the Balearic Islands are to be found in any of the source imagery.

COPY (
    SELECT   h3_cell_to_boundary_wkt(
                  h3_latlng_to_cell(lat, lon, 5))::geometry geom,
             COUNT(*) AS num_recs
    FROM     READ_CSV('metadata_common_attributes.csv')
    WHERE    lon > -175
    AND      lon < 175
    GROUP BY 1
) TO 'h3.gpkg'
    WITH (FORMAT GDAL,
          DRIVER 'GPKG',
          LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES');

Global Streetscapes

Estonian Imagery

Almost all Estonian imagery is from the Mapillary project.

$ ~/duckdb
WITH a AS (
    SELECT   source,
             year,
             COUNT(*) records
    FROM     READ_CSV('metadata_common_attributes.csv')
    WHERE    lon > 21.3314 AND lon < 28.2156
    AND      lat > 57.7385 AND lat < 59.6237
    GROUP BY 1, 2
)
PIVOT    a
ON       source
USING    SUM(records)
GROUP BY year
ORDER BY year;
┌───────┬───────────┬───────────┐
│ year  │ KartaView │ Mapillary │
│ int64 │  int128   │  int128   │
├───────┼───────────┼───────────┤
│  2010 │           │         1 │
│  2011 │           │         1 │
│  2012 │           │         1 │
│  2013 │           │        14 │
│  2014 │           │       374 │
│  2015 │           │      3085 │
│  2016 │       169 │       276 │
│  2017 │       110 │       288 │
│  2018 │      2219 │      4737 │
│  2019 │           │     23222 │
│  2020 │           │     14928 │
│  2021 │       255 │     14162 │
│  2022 │           │     37458 │
│  2023 │           │      1874 │
├───────┴───────────┴───────────┤
│ 14 rows             3 columns │
└───────────────────────────────┘
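The bounding-box filter and pivot above can also be sketched in plain Python, which helps show what the SQL is doing. This is only an illustration: pivot_by_year is a hypothetical helper and the sample rows are made up rather than taken from the dataset. The column names (source, year, lat, lon) are the ones used in the query. Keep in mind that a rectangle is only a rough approximation of a country's outline.

```python
from collections import Counter

# Bounding box used in the query above.
LON_MIN, LON_MAX = 21.3314, 28.2156
LAT_MIN, LAT_MAX = 57.7385, 59.6237


def in_bbox(lat, lon):
    """Same strict comparisons as the WHERE clause."""
    return LON_MIN < lon < LON_MAX and LAT_MIN < lat < LAT_MAX


def pivot_by_year(rows):
    """Count rows per (year, source) and return {year: {source: count}}."""
    counts = Counter()
    for row in rows:
        if in_bbox(float(row["lat"]), float(row["lon"])):
            counts[(int(row["year"]), row["source"])] += 1

    years = sorted({year for year, _ in counts})
    sources = sorted({source for _, source in counts})

    # Missing combinations become None, like the empty cells in the table.
    return {y: {s: counts.get((y, s)) for s in sources} for y in years}


# Made-up rows for illustration only.
sample_rows = [
    {"source": "Mapillary", "year": "2019", "lat": "59.437", "lon": "24.745"},
    {"source": "KartaView", "year": "2021", "lat": "59.443", "lon": "24.750"},
    {"source": "Mapillary", "year": "2019", "lat": "48.857", "lon": "2.352"},
]

for year, by_source in pivot_by_year(sample_rows).items():
    print(year, by_source)
```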

I'll build a GeoPackage (GPKG) file with the Estonian Imagery and the metadata and model inference results bundled together. I'll first isolate the metadata records for Estonia from the other 10M records in this dataset.

$ echo "CREATE OR REPLACE TABLE meta_ca AS
            SELECT *
            FROM   READ_CSV('metadata_common_attributes.csv')
            WHERE  lon > 21.3314 AND lon < 28.2156
            AND    lat > 57.7385 AND lat < 59.6237" \
    | ~/duckdb streets.duckdb

Below I'll load the inference and other metadata CSVs into DuckDB. I needed a multi-million-line sample size to get around some type inference issues in some of the CSV files but the 10M-line sample size causes a segmentation fault with the segmentation CSV so, for that file, I'll use a 100K-line sample size.

$ for TABLE_NAME in climate \
                    contextual \
                    ephem \
                    gadm \
                    ghsl \
                    instances \
                    osm \
                    perception \
                    places365 \
                    season \
                    segmentation \
                    simplemaps \
                    speed \
                    metadata_kv; do
      echo $TABLE_NAME

      if [ "$TABLE_NAME" == "segmentation" ]; then
         SAMPLE_SIZE=100_000
      else
         SAMPLE_SIZE=10_000_000
      fi

      echo "CREATE OR REPLACE TABLE $TABLE_NAME AS
              SELECT a.*
              FROM   READ_CSV('$TABLE_NAME.csv',
                              sample_size=$SAMPLE_SIZE) a
              JOIN   meta_ca b ON (a.source = b.source AND
                                   a.orig_id = b.orig_id);" \
          | ~/duckdb streets.duckdb
  done
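The per-table sample size logic in the loop above could also be expressed in Python. Below is a minimal sketch that only builds the SQL strings. The build_load_sql and sample_size_for helpers are hypothetical names of mine; the table names, join keys and sample sizes come from the shell loop.

```python
# Table names from the shell loop above.
TABLES = [
    "climate", "contextual", "ephem", "gadm", "ghsl", "instances", "osm",
    "perception", "places365", "season", "segmentation", "simplemaps",
    "speed", "metadata_kv",
]


def sample_size_for(table_name):
    """Only segmentation.csv gets the smaller 100K-line sample."""
    return 100_000 if table_name == "segmentation" else 10_000_000


def build_load_sql(table_name):
    """Build the CREATE TABLE statement that keeps only the Estonian rows."""
    return (
        f"CREATE OR REPLACE TABLE {table_name} AS\n"
        f"    SELECT a.*\n"
        f"    FROM   READ_CSV('{table_name}.csv',\n"
        f"                    sample_size={sample_size_for(table_name)}) a\n"
        f"    JOIN   meta_ca b ON (a.source = b.source AND\n"
        f"                         a.orig_id = b.orig_id);"
    )


print(build_load_sql("segmentation"))
```

Each statement could then be piped to the DuckDB CLI as above or run through the duckdb Python package installed earlier, for example with duckdb.connect('streets.duckdb').execute(sql).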

KartaView's image and thumbnail URLs are accessible without an API key so I'll include them as the image and thumbnail fields. Mapillary URLs need to be signed with an API key. If this post proves popular, I'll revisit this section and look to add support for their imagery URLs as well.

$ ~/duckdb streets.duckdb
COPY (
    SELECT COLUMNS(a.* EXCLUDE (lat, lon))              AS 'meta_\0',
           ST_POINT(a.lon, a.lat)                       AS geom,
           COLUMNS(b.* EXCLUDE (uuid, source, orig_id)) AS 'climate_\0',
           COLUMNS(c.* EXCLUDE (uuid, source, orig_id)) AS 'context_\0',
           COLUMNS(d.* EXCLUDE (uuid, source, orig_id)) AS 'ephem_\0',
           COLUMNS(e.* EXCLUDE (uuid, source, orig_id)) AS 'gadm_\0',
           COLUMNS(f.* EXCLUDE (uuid, source, orig_id)) AS 'ghsl_\0',
           COLUMNS(g.* EXCLUDE (uuid, source, orig_id)) AS 'inst_\0',
           COLUMNS(h.* EXCLUDE (uuid, source, orig_id)) AS 'osm_\0',
           COLUMNS(i.* EXCLUDE (uuid, source, orig_id)) AS 'inst_\0',
           COLUMNS(j.* EXCLUDE (uuid, source, orig_id)) AS 'perception_\0',
           COLUMNS(k.* EXCLUDE (uuid, source, orig_id)) AS 'season_\0',
           COLUMNS(l.* EXCLUDE (uuid, source, orig_id)) AS 'seg_\0',
           COLUMNS(m.* EXCLUDE (uuid, source, orig_id)) AS 'simplemaps_\0',
           COLUMNS(n.* EXCLUDE (uuid, source, orig_id)) AS 'speed_\0',
           o.kv_fileurlProc                             AS image,
           o.kv_fileurlTh                               AS thumbnail
    FROM meta_ca           a
    JOIN climate           b ON (a.source = b.source AND a.orig_id = b.orig_id)
    JOIN contextual        c ON (a.source = c.source AND a.orig_id = c.orig_id)
    JOIN ephem             d ON (a.source = d.source AND a.orig_id = d.orig_id)
    JOIN gadm              e ON (a.source = e.source AND a.orig_id = e.orig_id)
    JOIN ghsl              f ON (a.source = f.source AND a.orig_id = f.orig_id)
    JOIN instances         g ON (a.source = g.source AND a.orig_id = g.orig_id)
    JOIN osm               h ON (a.source = h.source AND a.orig_id = h.orig_id)
    JOIN perception        i ON (a.source = i.source AND a.orig_id = i.orig_id)
    JOIN places365         j ON (a.source = j.source AND a.orig_id = j.orig_id)
    JOIN season            k ON (a.source = k.source AND a.orig_id = k.orig_id)
    JOIN segmentation      l ON (a.source = l.source AND a.orig_id = l.orig_id)
    JOIN simplemaps        m ON (a.source = m.source AND a.orig_id = m.orig_id)
    JOIN speed             n ON (a.source = n.source AND a.orig_id = n.orig_id)
    LEFT JOIN metadata_kv  o ON (a.source = o.source AND a.orig_id = o.orig_id)
) TO 'estonia.gpkg'
  WITH (FORMAT GDAL,
        DRIVER 'GPKG',
        LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES');

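The above produced a 138 MB GPKG file with 101,461 records and 264 fields. Note that two of the prefixes in the query above don't match their source tables: the perception table's scores (Beautiful, Boring, etc.) are exported with an inst_ prefix and the places365 table's place field is exported as perception_place. Below is an example record.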

$ echo "SELECT *
        FROM   ST_READ('estonia.gpkg')
        WHERE  meta_orig_id = 1352139489
        LIMIT  1" \
    | ~/duckdb -json \
    | grep '\[' \
    | jq -S .
[
  {
    "climate_koppen_geiger_zone": "Dfb",
    "climate_zone_description": "Humid continental, no dry season, warm summer",
    "context_glare": "no",
    "context_lighting_condition": "day",
    "context_pano_status": false,
    "context_platform": "walking surface",
    "context_quality": "good",
    "context_reflection": "no",
    "context_view_direction": "front/back",
    "context_weather": "clear",
    "ephem_calculated_day_night": "day",
    "ephem_hrs_aft_sunrise": null,
    "ephem_hrs_aft_sunset": -6.81,
    "ephem_timezone": "Europe/Tallinn",
    "ephem_utc_offset_s": 10800,
    "gadm_CC_1": null,
    "gadm_CC_2": null,
    "gadm_CC_3": null,
    "gadm_CC_4": null,
    "gadm_CC_5": null,
    "gadm_COUNTRY": "Estonia",
    "gadm_ENGTYPE_1": "County",
    "gadm_ENGTYPE_2": "Town",
    "gadm_ENGTYPE_3": "Administrative District",
    "gadm_ENGTYPE_4": null,
    "gadm_ENGTYPE_5": null,
    "gadm_GID_0": "EST",
    "gadm_GID_1": "EST.1_1",
    "gadm_GID_2": "EST.1.20_1",
    "gadm_GID_3": "EST.1.20.3_1",
    "gadm_GID_4": null,
    "gadm_GID_5": null,
    "gadm_HASC_1": "EE.HA",
    "gadm_HASC_2": "EE.HA.TA",
    "gadm_HASC_3": null,
    "gadm_ISO_1": null,
    "gadm_NAME_1": "Harju",
    "gadm_NAME_2": "Tallinn",
    "gadm_NAME_3": "Kesklinna",
    "gadm_NAME_4": null,
    "gadm_NAME_5": null,
    "gadm_NL_NAME_1": null,
    "gadm_NL_NAME_2": null,
    "gadm_NL_NAME_3": null,
    "gadm_TYPE_1": "Maakond",
    "gadm_TYPE_2": "Linn",
    "gadm_TYPE_3": "Linnaosa",
    "gadm_TYPE_4": null,
    "gadm_TYPE_5": null,
    "gadm_VARNAME_1": "Harjumaa|Harju maakond",
    "gadm_VARNAME_2": null,
    "gadm_VARNAME_3": null,
    "gadm_VARNAME_4": null,
    "geom": "POINT (24.750618 59.443103)",
    "ghsl_urban_code": 30,
    "ghsl_urban_term": "urban centre",
    "image": "https://storage13.openstreetcam.org/files/photo/2021/8/1/proc/3754065_7b4fc0ea36f23b810adcc38e2559295c.jpg",
    "inst_Banner": 1,
    "inst_Barrier": 0,
    "inst_Beautiful": 8.48,
    "inst_Bench": 0,
    "inst_Bicycle": 0,
    "inst_Bicyclist": 0,
    "inst_Bike-Lane": 0,
    "inst_Bike-Rack": 0,
    "inst_Billboard": 3,
    "inst_Bird": 0,
    "inst_Boat": 0,
    "inst_Boring": 8.34,
    "inst_Bridge": 0,
    "inst_Building": 1,
    "inst_Bus": 0,
    "inst_CCTV-Camera": 0,
    "inst_Car": 5,
    "inst_Car-Mount": 0,
    "inst_Caravan": 0,
    "inst_Catch-Basin": 0,
    "inst_Crosswalk---Plain": 0,
    "inst_Curb": 0,
    "inst_Curb-Cut": 0,
    "inst_Depressing": 3.26,
    "inst_Ego-Vehicle": 0,
    "inst_Fence": 0,
    "inst_Fire-Hydrant": 0,
    "inst_Ground-Animal": 0,
    "inst_Guard-Rail": 0,
    "inst_Junction-Box": 0,
    "inst_Lane-Marking---Crosswalk": 0,
    "inst_Lane-Marking---General": 0,
    "inst_Lively": 8.61,
    "inst_Mailbox": 0,
    "inst_Manhole": 0,
    "inst_Motorcycle": 0,
    "inst_Motorcyclist": 0,
    "inst_Mountain": 0,
    "inst_On-Rails": 0,
    "inst_Other-Rider": 0,
    "inst_Other-Vehicle": 0,
    "inst_Parking": 0,
    "inst_Pedestrian-Area": 0,
    "inst_Person": 1,
    "inst_Phone-Booth": 0,
    "inst_Pole": 8,
    "inst_Pothole": 0,
    "inst_Rail-Track": 0,
    "inst_Road": 1,
    "inst_Safe": 8.45,
    "inst_Sand": 0,
    "inst_Service-Lane": 0,
    "inst_Sidewalk": 1,
    "inst_Sky": 1,
    "inst_Snow": 0,
    "inst_Street-Light": 0,
    "inst_Terrain": 1,
    "inst_Traffic-Light": 0,
    "inst_Traffic-Sign-(Back)": 0,
    "inst_Traffic-Sign-(Front)": 0,
    "inst_Traffic-Sign-Frame": 0,
    "inst_Trailer": 0,
    "inst_Trash-Can": 0,
    "inst_Truck": 0,
    "inst_Tunnel": 0,
    "inst_Utility-Pole": 0,
    "inst_Vegetation": 1,
    "inst_Wall": 1,
    "inst_Water": 0,
    "inst_Wealthy": 8.93,
    "inst_Wheeled-Slow": 0,
    "meta_datetime_local": "2021-08-01 14:58:52+03",
    "meta_day": 1,
    "meta_hFoV": 66.487946,
    "meta_heading": 301.18,
    "meta_height": 1936,
    "meta_hour": 14,
    "meta_month": 8,
    "meta_orig_id": 1352139489,
    "meta_projection_type": "perspective",
    "meta_sequence_id": "3754065",
    "meta_sequence_img_count": 10,
    "meta_sequence_index": 9,
    "meta_source": "KartaView",
    "meta_uuid": "c09f4824-6cd9-44a2-a9b3-995ddd4940ea",
    "meta_vFoV": 52.171415353124,
    "meta_width": 2592,
    "meta_year": 2021,
    "osm_access": null,
    "osm_area": null,
    "osm_bridge": null,
    "osm_est_width": null,
    "osm_from": 599735029,
    "osm_highway": "footway",
    "osm_junction": null,
    "osm_key": 0,
    "osm_lanes": null,
    "osm_length": 24.061,
    "osm_maxspeed": null,
    "osm_name": null,
    "osm_oneway": false,
    "osm_osmid": "47008731",
    "osm_ref": null,
    "osm_reversed": "True",
    "osm_road_width": null,
    "osm_service": null,
    "osm_snap_dist": 1.8750683746177468,
    "osm_to": 599735028,
    "osm_tunnel": null,
    "osm_type_highway": "walk",
    "osm_u": 599735028,
    "osm_v": 599735029,
    "perception_place": "driveway",
    "season_season": "Summer",
    "seg_Banner": 0,
    "seg_Barrier": 0,
    "seg_Bench": 0,
    "seg_Bicycle": 0,
    "seg_Bicyclist": 0,
    "seg_Bike-Lane": 0,
    "seg_Bike-Rack": 0,
    "seg_Billboard": 7532,
    "seg_Bird": 0,
    "seg_Boat": 0,
    "seg_Bridge": 0,
    "seg_Building": 137443,
    "seg_Bus": 0,
    "seg_CCTV-Camera": 0,
    "seg_Car": 21603,
    "seg_Car-Mount": 0,
    "seg_Caravan": 0,
    "seg_Catch-Basin": 0,
    "seg_Crosswalk---Plain": 0,
    "seg_Curb": 0,
    "seg_Curb-Cut": 0,
    "seg_Ego-Vehicle": 0,
    "seg_Fence": 0,
    "seg_Fire-Hydrant": 0,
    "seg_Ground-Animal": 0,
    "seg_Guard-Rail": 0,
    "seg_Junction-Box": 0,
    "seg_Lane-Marking---Crosswalk": 0,
    "seg_Lane-Marking---General": 0,
    "seg_Mailbox": 0,
    "seg_Manhole": 0,
    "seg_Motorcycle": 0,
    "seg_Motorcyclist": 0,
    "seg_Mountain": 0,
    "seg_On-Rails": 0,
    "seg_Other-Rider": 0,
    "seg_Other-Vehicle": 0,
    "seg_Parking": 0,
    "seg_Pedestrian-Area": 0,
    "seg_Person": 6203,
    "seg_Phone-Booth": 0,
    "seg_Pole": 51701,
    "seg_Pothole": 0,
    "seg_Rail-Track": 0,
    "seg_Road": 49655,
    "seg_Sand": 0,
    "seg_Service-Lane": 0,
    "seg_Sidewalk": 2607613,
    "seg_Sky": 470246,
    "seg_Snow": 0,
    "seg_Street-Light": 0,
    "seg_Terrain": 740808,
    "seg_Total": 4861260,
    "seg_Traffic-Light": 0,
    "seg_Traffic-Sign-(Back)": 0,
    "seg_Traffic-Sign-(Front)": 0,
    "seg_Traffic-Sign-Frame": 0,
    "seg_Trailer": 0,
    "seg_Trash-Can": 0,
    "seg_Truck": 0,
    "seg_Tunnel": 0,
    "seg_Utility-Pole": 0,
    "seg_Vegetation": 753096,
    "seg_Wall": 15360,
    "seg_Water": 0,
    "seg_Wheeled-Slow": 0,
    "simplemaps_admin_name": "Harjumaa",
    "simplemaps_capital": "primary",
    "simplemaps_city": "Tallinn",
    "simplemaps_city_ascii": "Tallinn",
    "simplemaps_city_id": 1233260021,
    "simplemaps_city_lat": 59.4372,
    "simplemaps_city_lon": 24.745,
    "simplemaps_continent": "Europe",
    "simplemaps_country": "Estonia",
    "simplemaps_iso2": "EE",
    "simplemaps_iso3": "EST",
    "simplemaps_population": 438341,
    "speed_avg_speed_from_prev_kph": 6.203131451725688,
    "speed_distance_from_prev_km": 0.0051692762097714,
    "speed_distance_from_prev_m": 5.169276209771407,
    "speed_segmt_speed_max5_mean_kph": null,
    "speed_segmt_speed_max_kph": null,
    "speed_segmt_speed_mean_kph": null,
    "speed_segmt_speed_var_kph2": null,
    "speed_seq_dist_km": 0.0521873813209414,
    "speed_seq_dist_m": 52.18738132094145,
    "speed_seq_img_count": 10,
    "speed_seq_speed_kph": 5.525722728099683,
    "speed_seq_time_hr": 0.0094444444444444,
    "speed_time_from_prev_hr": 0.0008333333333333,
    "speed_time_from_prev_s": 3,
    "thumbnail": "https://storage13.openstreetcam.org/files/photo/2021/8/1/th/3754065_7b4fc0ea36f23b810adcc38e2559295c.jpg"
  }
]
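Some of the fields in the record above can be sanity-checked with simple arithmetic. Below is a minimal Python sketch using values copied from that record. The pixel_share and kph helpers are hypothetical, and reading seg_Total as the sum of the per-class pixel counts is my assumption, though it does hold for this record.

```python
# Non-zero segmentation pixel counts copied from the example record above.
seg = {
    "Billboard": 7532, "Building": 137443, "Car": 21603, "Person": 6203,
    "Pole": 51701, "Road": 49655, "Sidewalk": 2607613, "Sky": 470246,
    "Terrain": 740808, "Vegetation": 753096, "Wall": 15360,
}
SEG_TOTAL = 4861260  # seg_Total from the same record

# Speed-related fields copied from the same record.
DISTANCE_FROM_PREV_M = 5.169276209771407
TIME_FROM_PREV_S = 3
AVG_SPEED_FROM_PREV_KPH = 6.203131451725688


def pixel_share(label):
    """Fraction of the segmented pixels assigned to one class."""
    return seg[label] / SEG_TOTAL


def kph(distance_m, seconds):
    """Convert metres travelled in a number of seconds to km/h."""
    return distance_m / seconds * 3.6


print("Class counts add up to seg_Total:", sum(seg.values()) == SEG_TOTAL)
print(f"Sidewalk share of pixels: {pixel_share('Sidewalk'):.1%}")
print(f"Speed from previous image: "
      f"{kph(DISTANCE_FROM_PREV_M, TIME_FROM_PREV_S):.3f} kph")
```

The recomputed speed agrees with the speed_avg_speed_from_prev_kph field, which suggests that field is simply distance over time between consecutive images in a sequence.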

Thumbnails in QGIS

If you drop the estonia.gpkg file onto a new QGIS project you'll see ~100K points load onto the scene. Open that layer's properties and select the Display tab. Set the HTML Map Tip to the following:

<img src=[% "thumbnail" %] ><br/>

Beautiful: [% "inst_Beautiful" %]<br/>
Boring: [% "inst_Boring" %]<br/>
Place: [% "perception_place" %]<br/>

Global Streetscapes

In the View Menu, make sure "Show Map Tips" is selected.

Global Streetscapes

With the layer selected and no points selected, hovering your mouse pointer over a KartaView-sourced point should bring up a thumbnail image.

Global Streetscapes

Estonian Observations

The imagery only covers a section of central Tallinn and part of central Pärnu. This is the area covered in Tallinn:

Global Streetscapes

This is the area covered in Pärnu:

Global Streetscapes

Below is the number of distinct place types found in each city for each year.

$ ~/duckdb
WITH a AS (
    SELECT   COUNT(DISTINCT perception_place) AS places,
             simplemaps_city,
             meta_year
    FROM     ST_READ('estonia.gpkg')
    GROUP BY 2, 3
)
PIVOT    a
ON       simplemaps_city
USING    SUM(places)
GROUP BY meta_year
ORDER BY meta_year;
┌───────────┬────────┬─────────┐
│ meta_year │ Pärnu  │ Tallinn │
│   int64   │ int128 │ int128  │
├───────────┼────────┼─────────┤
│      2010 │      1 │         │
│      2011 │        │       1 │
│      2012 │        │       1 │
│      2013 │        │      10 │
│      2014 │        │      58 │
│      2015 │      9 │      99 │
│      2016 │      9 │      47 │
│      2017 │     26 │      25 │
│      2018 │     71 │     145 │
│      2019 │     58 │     150 │
│      2020 │     89 │     151 │
│      2021 │     19 │     112 │
│      2022 │    111 │     160 │
│      2023 │     49 │         │
├───────────┴────────┴─────────┤
│ 14 rows            3 columns │
└──────────────────────────────┘

Streets can be re-captured over several years. Below is a view of central Pärnu showing the year in which each image was taken.

Global Streetscapes

If I filter to imagery taken in 2023, you can see one of the models' inferences as to the type of area each image captured.

Global Streetscapes

Here are the 2023 model inference results for central Tallinn.

Global Streetscapes

The season attributes align well with the months of the year. If this was produced by AI then that is very impressive.

$ ~/duckdb
WITH a AS (
    SELECT   meta_month,
             season_season,
             COUNT(*) records
    FROM     ST_READ('estonia.gpkg')
    GROUP BY 1, 2
)
PIVOT    a
ON       season_season
USING    SUM(records)
GROUP BY meta_month
ORDER BY meta_month;
┌────────────┬────────┬────────┬────────┬────────┐
│ meta_month │ Autumn │ Spring │ Summer │ Winter │
│   int64    │ int128 │ int128 │ int128 │ int128 │
├────────────┼────────┼────────┼────────┼────────┤
│          1 │        │        │        │  11175 │
│          2 │        │        │        │   9939 │
│          3 │        │   4153 │        │        │
│          4 │        │   8547 │        │        │
│          5 │        │  14140 │        │        │
│          6 │        │        │   6928 │        │
│          7 │        │        │  13186 │        │
│          8 │        │        │   8925 │        │
│          9 │   6553 │        │        │        │
│         10 │   7006 │        │        │        │
│         11 │   9074 │        │        │        │
│         12 │        │        │        │   1835 │
├────────────┴────────┴────────┴────────┴────────┤
│ 12 rows                              5 columns │
└────────────────────────────────────────────────┘
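The month-to-season split in the table above matches the usual Northern Hemisphere meteorological seasons. Below is a minimal Python sketch of that rule. Whether the dataset's authors derived the season field this way is an assumption on my part and meteorological_season is a hypothetical helper.

```python
def meteorological_season(month):
    """Map a month number (1-12) to a Northern Hemisphere season name."""
    if month not in range(1, 13):
        raise ValueError(f"month must be between 1 and 12, got {month}")
    if month in (12, 1, 2):
        return "Winter"
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    return "Autumn"


# The single season reported for each month in the table above.
observed = {
    1: "Winter", 2: "Winter", 3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer", 9: "Autumn", 10: "Autumn",
    11: "Autumn", 12: "Winter",
}

mismatches = [m for m, s in observed.items() if meteorological_season(m) != s]
print("Months that disagree with the rule:", mismatches)
```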

Below are the weather conditions each image was captured in.

$ ~/duckdb
WITH a AS (
    SELECT   meta_month,
             context_weather,
             COUNT(*) records
    FROM     ST_READ('estonia.gpkg')
    GROUP BY 1, 2
)
PIVOT    a
ON       context_weather
USING    SUM(records)
GROUP BY meta_month
ORDER BY meta_month;
┌────────────┬────────┬────────┬────────┬────────┬────────┐
│ meta_month │ clear  │ cloudy │ foggy  │ rainy  │ snowy  │
│   int64    │ int128 │ int128 │ int128 │ int128 │ int128 │
├────────────┼────────┼────────┼────────┼────────┼────────┤
│          1 │   2155 │     92 │        │    102 │   8826 │
│          2 │   1790 │     40 │        │     98 │   8011 │
│          3 │   1550 │    745 │     16 │    139 │   1703 │
│          4 │   4635 │   1777 │      4 │     12 │   2119 │
│          5 │  10660 │   2696 │        │    188 │    596 │
│          6 │   6559 │    284 │      1 │     61 │     23 │
│          7 │   9725 │   2536 │        │     19 │    906 │
│          8 │   7597 │   1205 │        │     41 │     82 │
│          9 │   3814 │   2540 │      1 │    124 │     74 │
│         10 │   3335 │   2876 │        │    445 │    350 │
│         11 │   4413 │   1321 │        │    340 │   3000 │
│         12 │    875 │    117 │        │     73 │    770 │
├────────────┴────────┴────────┴────────┴────────┴────────┤
│ 12 rows                                       6 columns │
└─────────────────────────────────────────────────────────┘

Estonia is so far north that it never gets fully dark at night around midsummer. Very few images report being taken at night in this dataset between the beginning of May and the end of August.

$ ~/duckdb
WITH a AS (
    SELECT   meta_hour,
             context_lighting_condition,
             COUNT(*) records
    FROM     ST_READ('estonia.gpkg')
    WHERE    meta_month > 4 AND meta_month < 9
    GROUP BY 1, 2
)
PIVOT    a
ON       context_lighting_condition
USING    SUM(records)
GROUP BY meta_hour
ORDER BY meta_hour;
┌───────────┬────────┬───────────┬────────┐
│ meta_hour │  day   │ dusk/dawn │ night  │
│   int64   │ int128 │  int128   │ int128 │
├───────────┼────────┼───────────┼────────┤
│         0 │    282 │        29 │     36 │
│         1 │    729 │       183 │    113 │
│         2 │   1246 │       146 │     46 │
│         3 │      1 │           │        │
│         4 │     54 │       171 │        │
│         5 │    183 │        53 │        │
│         7 │    281 │        13 │        │
│         8 │     43 │         1 │        │
│         9 │   1881 │         2 │        │
│        10 │   5519 │         6 │     15 │
│        11 │   3186 │        45 │      2 │
│        12 │   3028 │           │      1 │
│        13 │   4292 │         4 │      4 │
│        14 │   2102 │         1 │        │
│        15 │   2448 │        27 │     20 │
│        16 │   4192 │        30 │        │
│        17 │   1780 │        15 │        │
│        18 │   3264 │        33 │        │
│        19 │   2154 │       347 │        │
│        20 │    890 │        60 │        │
│        21 │    891 │        41 │        │
│        22 │   1026 │         5 │        │
│        23 │   1761 │       125 │    372 │
├───────────┴────────┴───────────┴────────┤
│ 23 rows                       4 columns │
└─────────────────────────────────────────┘
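To put a number on "very few", the columns of the table above can be totalled. Below is a minimal Python sketch with the counts copied from that table, where empty cells are simply left out. The variable names are my own.

```python
# Counts copied from the May to August table above, top to bottom.
day = [282, 729, 1246, 1, 54, 183, 281, 43, 1881, 5519, 3186, 3028,
       4292, 2102, 2448, 4192, 1780, 3264, 2154, 890, 891, 1026, 1761]
dusk_dawn = [29, 183, 146, 171, 53, 13, 1, 2, 6, 45, 4, 1, 27, 30, 15,
             33, 347, 60, 41, 5, 125]
night = [36, 113, 46, 15, 2, 1, 4, 20, 372]

total = sum(day) + sum(dusk_dawn) + sum(night)
night_share = sum(night) / total

print(f"Images between May and August: {total}")
print(f"Labelled as night: {sum(night)} ({night_share:.1%})")
```

The total also matches the sum of the May, June, July and August rows in the season table, which is a handy cross-check that no rows were dropped by the filter.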

Further Research

As I uncover more insights into this dataset, I'll update this post with my findings.


Copyright © 2014 - 2024 Mark Litwintschik. This site's template is based off a template by Giulio Fidente.