Home | Benchmarks | Categories | Atom Feed

Posted on Thu 06 November 2025 under Energy & Transport

American Wind Farms

In May, the US Geological Survey (USGS) published version 8.1 of the U.S. Wind Turbine Database (USWTDB). The first version was published back in 2018 and it documents the location 76K+ wind turbines, their capabilities and associated project data across the US and its territories.

Below is a heatmap of their locations.

USGS Wind Farm Dataset

In this post, I'll walk through converting this dataset into Parquet and examining some of its features.

My Workstation

I'm using a 5.7 GHz AMD Ryzen 9 9950X CPU. It has 16 cores and 32 threads and 1.2 MB of L1, 16 MB of L2 and 64 MB of L3 cache. It has a liquid cooler attached and is housed in a spacious, full-sized Cooler Master HAF 700 computer case.

The system has 96 GB of DDR5 RAM clocked at 4,800 MT/s and a 5th-generation, Crucial T700 4 TB NVMe M.2 SSD which can read at speeds up to 12,400 MB/s. There is a heatsink on the SSD to help keep its temperature down. This is my system's C drive.

The system is powered by a 1,200-watt, fully modular Corsair Power Supply and is sat on an ASRock X870E Nova 90 Motherboard.

I'm running Ubuntu 24 LTS via Microsoft's Ubuntu for Windows on Windows 11 Pro. In case you're wondering why I don't run a Linux-based desktop as my primary work environment, I'm still using an Nvidia GTX 1080 GPU which has better driver support on Windows and ArcGIS Pro only supports Windows natively.

Installing Prerequisites

I'll use GDAL 3.9.3 and a few other tools to help analyse the data in this post.

$ sudo add-apt-repository ppa:ubuntugis/ubuntugis-unstable
$ sudo apt update
$ sudo apt install \
    gdal-bin \
    jq

I'll use DuckDB, along with its H3, JSON, Lindel, Parquet and Spatial extensions in this post.

$ cd ~
$ wget -c https://github.com/duckdb/duckdb/releases/download/v1.4.1/duckdb_cli-linux-amd64.zip
$ unzip -j duckdb_cli-linux-amd64.zip
$ chmod +x duckdb
$ ~/duckdb
INSTALL h3 FROM community;
INSTALL lindel FROM community;
INSTALL json;
INSTALL parquet;
INSTALL spatial;

I'll set up DuckDB to load every installed extension each time it launches.

$ vi ~/.duckdbrc
.timer on
.width 180
LOAD h3;
LOAD lindel;
LOAD json;
LOAD parquet;
LOAD spatial;

Most of the maps in this post were rendered with QGIS version 3.44. QGIS is a desktop application that runs on Windows, macOS and Linux. The application has grown in popularity in recent years and has ~15M application launches from users all around the world each month.

I used QGIS' Tile+ plugin to add basemaps from NASA's Blue Marble and Bing for spatial context.

I also used the following GeoJSON files for counties and states.

Downloading the Dataset

I'll download the dataset and convert it into Parquet.

The original field names contained a lot of acronyms and truncated names. I've renamed and restructured these for easier human consumption. I've taken all the values representing missing values and converted them into NULLs. I've also converted any magic numbers into English strings.

$ mkdir -p ~/usgs_wind
$ cd ~/usgs_wind

$ wget https://energy.usgs.gov/uswtdb/assets/data/uswtdbSHP.zip
$ unzip uswtdbSHP.zip
$ ~/duckdb
COPY (
    SELECT
        case_id: case_id::INT,
        eia_id:  IF(eia_id::TEXT='-9999', NULL, eia_id::INT),
        faa: {
            asn: IF(faa_asn='missing', NULL, faa_asn),
            ors: IF(faa_ors='missing', NULL, faa_ors),
        },
        project: {
            avg_turbine_capacity: IF(p_cap::TEXT='-9999.0', NULL, p_cap::DOUBLE),
            name:                 p_name,
            num_turbines:         p_tnum::INT,
            online_year:          IF(p_year::TEXT='-9999', NULL, p_year::INT),
        },
        geometry: geom,
        bbox:     {'xmin': ST_XMIN(ST_EXTENT(geom)),
                   'ymin': ST_YMIN(ST_EXTENT(geom)),
                   'xmax': ST_XMAX(ST_EXTENT(geom)),
                   'ymax': ST_YMAX(ST_EXTENT(geom))},
        location: {
            country: t_county,
            fips:    IF(t_fips::VARCHAR='NA', NULL, t_fips::VARCHAR),
            state:   t_state,
        },
        turbine: {
            capacity:          IF(t_cap::TEXT='-9999', NULL, t_cap::DOUBLE),
            attribute_confidence:
                CASE WHEN t_conf_atr = 1 THEN 'Low'
                     WHEN t_conf_atr = 2 THEN 'Medium'
                     WHEN t_conf_atr = 3 THEN 'High'
                END,
            location_confidence:
                CASE WHEN t_conf_loc = 1 THEN 'Low'
                     WHEN t_conf_loc = 2 THEN 'Medium'
                     WHEN t_conf_loc = 3 THEN 'High'
                END,
            hub_height:        IF(t_hh::TEXT='-9999.0', NULL, t_hh::DOUBLE),
            imagery: {
                captured_at:   t_img_date,
                provider:      t_img_src,
            },
            manufacturer:      IF(t_manu='missing', NULL, t_manu),
            model:             IF(t_model='missing', NULL, t_model),
            is_offshore:       t_offshore=1,
            rated_capacity_mw: IF(t_rd::TEXT='-9999.0', NULL, t_rd::DOUBLE),
            retrofitted_year:  IF(t_retro_yr::TEXT='-9999', NULL, t_retro_yr::INT),
            is_retrofitted:    t_retrofit=1,
            rsa:               IF(t_rsa::TEXT='-9999.0', NULL, t_rsa::DOUBLE),
            ttlh:              IF(t_ttlh::TEXT='-9999.0', NULL, t_ttlh::DOUBLE),
        },
        usgs_pr_id:            IF(usgs_pr_id::TEXT='-9999', NULL, usgs_pr_id::INT),
    FROM ST_READ('uswtdb_V8_1_20250522.shx')
    ORDER BY HILBERT_ENCODE([ST_Y(ST_CENTROID(geom)),
                             ST_X(ST_CENTROID(geom))]::double[2])
) TO 'uswtdb_V8_1_20250522.parquet' (
        FORMAT 'PARQUET',
        CODEC  'ZSTD',
        COMPRESSION_LEVEL 22,
        ROW_GROUP_SIZE 15000);

Data Fluency

The following is an example record from this dataset.

$ echo "SELECT * EXCLUDE(bbox, faa, project, location, turbine),
               bbox::JSON     AS bbox,
               faa::JSON      AS faa,
               project::JSON  AS project,
               location::JSON AS location,
               turbine::JSON  AS turbine
        FROM   'uswtdb_V8_1_20250522.parquet'
        LIMIT  1" \
    | ~/duckdb -json \
    | jq -S .
[
  {
    "bbox": {
      "xmax": 144.72265600000003,
      "xmin": 144.72265600000003,
      "ymax": 13.389381000000071,
      "ymin": 13.389381000000071
    },
    "case_id": 3063607,
    "eia_id": null,
    "faa": {
      "asn": "2013-WTW-2712-OE",
      "ors": null
    },
    "geometry": "POINT (144.72265600000003 13.389381000000071)",
    "location": {
      "country": "Guam",
      "fips": "66010",
      "state": "GU"
    },
    "project": {
      "avg_turbine_capacity": 0.275,
      "name": "Guam Power Authority Wind Turbine",
      "num_turbines": 1,
      "online_year": 2016
    },
    "turbine": {
      "attribute_confidence": "High",
      "capacity": 275.0,
      "hub_height": 55.0,
      "imagery": {
        "captured_at": "2017-08-10",
        "provider": "Maxar"
      },
      "is_offshore": false,
      "is_retrofitted": false,
      "location_confidence": "High",
      "manufacturer": "Vergnet",
      "model": "GEV MP-C",
      "rated_capacity_mw": 32.0,
      "retrofitted_year": null,
      "rsa": 804.25,
      "ttlh": 71.0
    },
    "usgs_pr_id": null
  }
]

Below are the field names, data types, percentages of NULLs per column, number of unique values and minimum and maximum values for each column.

$ ~/duckdb
SELECT   column_name,
         column_type,
         null_percentage,
         approx_unique,
         min,
         max
FROM     (SUMMARIZE
          SELECT case_id,
                 eia_id,
                 faa.*,
                 location.*,
                 project.*,
                 turbine.* EXCLUDE(imagery),
                 turbine.imagery.captured_at,
                 turbine.imagery.provider,
                 usgs_pr_id,
          FROM   'uswtdb_V8_1_20250522.parquet')
ORDER BY 1;
┌──────────────────────┬─────────────┬─────────────────┬───────────────┬──────────────────┬─────────────────────┐
│     column_name      │ column_type │ null_percentage │ approx_unique │       min        │         max         │
│       varchar        │   varchar   │  decimal(9,2)   │     int64     │     varchar      │       varchar       │
├──────────────────────┼─────────────┼─────────────────┼───────────────┼──────────────────┼─────────────────────┤
│ asn                  │ VARCHAR     │            5.37 │         50703 │ 1987-AGL-900-OE  │ 2025-WTW-669-OE     │
│ attribute_confidence │ VARCHAR     │            0.00 │             3 │ High             │ Medium              │
│ avg_turbine_capacity │ DOUBLE      │            3.86 │           762 │ 0.05             │ 1055.6              │
│ capacity             │ DOUBLE      │            5.25 │           111 │ 50.0             │ 13000.0             │
│ captured_at          │ DATE        │            0.00 │          1669 │ 1970-01-01       │ 2025-03-25          │
│ case_id              │ INTEGER     │            0.00 │         78762 │ 3000001          │ 3140627             │
│ country              │ VARCHAR     │            0.00 │           566 │ Adair County     │ Zapata County       │
│ eia_id               │ INTEGER     │            5.17 │          1359 │ 90               │ 68318               │
│ fips                 │ VARCHAR     │            0.08 │           637 │ 02013            │ 72133               │
│ hub_height           │ DOUBLE      │            5.99 │           111 │ 19.0             │ 136.0               │
│ is_offshore          │ BOOLEAN     │            0.00 │             2 │ false            │ true                │
│ is_retrofitted       │ BOOLEAN     │            0.00 │             2 │ false            │ true                │
│ location_confidence  │ VARCHAR     │            0.00 │             3 │ High             │ Medium              │
│ manufacturer         │ VARCHAR     │            5.48 │            60 │ AAER             │ Zond                │
│ model                │ VARCHAR     │            5.65 │           318 │ 1.5-70.5         │ Zond                │
│ name                 │ VARCHAR     │            0.00 │          1806 │ 25 Mile Creek    │ unknown Yuma County │
│ num_turbines         │ INTEGER     │            0.00 │           192 │ 1                │ 713                 │
│ online_year          │ INTEGER     │            1.20 │            42 │ 1982             │ 2025                │
│ ors                  │ VARCHAR     │            5.91 │         72659 │ 02-000669        │ 56-064307           │
│ provider             │ VARCHAR     │            0.00 │             5 │ Bing Maps Aerial │ Planet              │
│ rated_capacity_mw    │ DOUBLE      │            5.88 │           113 │ 13.4             │ 200.0               │
│ retrofitted_year     │ INTEGER     │           89.39 │             9 │ 2015             │ 2023                │
│ rsa                  │ DOUBLE      │            5.88 │           121 │ 141.03           │ 31415.93            │
│ state                │ VARCHAR     │            0.00 │            55 │ AK               │ WY                  │
│ ttlh                 │ DOUBLE      │            5.99 │           235 │ 31.0             │ 210.9               │
│ usgs_pr_id           │ INTEGER     │           51.44 │         36701 │ 1                │ 49135               │
├──────────────────────┴─────────────┴─────────────────┴───────────────┴──────────────────┴─────────────────────┤
│ 26 rows                                                                                             6 columns │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Turbines

Below I'll generate a heatmap of turbine installation locations.

$ ~/duckdb
CREATE OR REPLACE TABLE h3_4_stats AS
    SELECT   H3_LATLNG_TO_CELL(
                bbox.ymin,
                bbox.xmin, 4) AS h3_4,
             COUNT(*) num_turbines
    FROM     'uswtdb_V8_1_20250522.parquet'
    GROUP BY 1;

COPY (
    SELECT ST_ASWKB(H3_CELL_TO_BOUNDARY_WKT(h3_4)::geometry) geometry,
           num_turbines
    FROM   h3_4_stats
    WHERE  ST_XMIN(geometry::geometry) BETWEEN -179 AND 179
    AND    ST_XMAX(geometry::geometry) BETWEEN -179 AND 179
) TO 'turbines.h3_4.gpkg'
  WITH (FORMAT GDAL,
        DRIVER 'GPKG',
        LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES');

The following was needed for ArcGIS Pro to recognise the projection of the above hexagons properly.

$ ogr2ogr \
    -f GPKG \
    -a_srs EPSG:4326 \
    turbines.h3_4.4326.gpkg \
    turbines.h3_4.gpkg
USGS Wind Farm Dataset

There are 63 unique manufactures listed in this dataset. These are the ones with the most and least number of turbines installed.

$ ~/duckdb
SELECT   num_turbines: COUNT(*),
         turbine.manufacturer
FROM     'uswtdb_V8_1_20250522.parquet'
GROUP BY 2
ORDER BY 1 DESC;
┌──────────────┬──────────────────────────────────────────────────────────────────────────┐
│ num_turbines │                               manufacturer                               │
│    int64     │                                 varchar                                  │
├──────────────┼──────────────────────────────────────────────────────────────────────────┤
│        34613 │ GE Wind                                                                  │
│        17649 │ Vestas                                                                   │
│         4689 │ Siemens                                                                  │
│         4167 │ NULL                                                                     │
│         2999 │ Gamesa                                                                   │
│         2765 │ Siemens Gamesa Renewable Energy                                          │
│         2214 │ Mitsubishi                                                               │
│         1917 │ Nordex                                                                   │
│         1190 │ Suzlon                                                                   │
│          758 │ Acciona                                                                  │
│            · │    ·                                                                     │
│            · │    ·                                                                     │
│            · │    ·                                                                     │
│            1 │ Wind Energy Solutions                                                    │
│            1 │ Bergey Energy                                                            │
│            1 │ Leitner Poma                                                             │
│            1 │ Norwin                                                                   │
│            1 │ DWT                                                                      │
│            1 │ Unison                                                                   │
│            1 │ Silver Eagle                                                             │
│            1 │ Changzhou Railcar Propulsion Engineering Research and Development Center │
│            1 │ Wincon                                                                   │
│            1 │ Renewtech LLC                                                            │
├──────────────┴──────────────────────────────────────────────────────────────────────────┤
│ 64 rows (20 shown)                                                            2 columns │
└─────────────────────────────────────────────────────────────────────────────────────────┘

Below are the number of turbines by manufacturer where at least 40 were installed in any one decade.

.maxrows 40

WITH a AS (
    SELECT   num_turbines: COUNT(*),
             decade: (project.online_year/10)::INT * 10,
             manufacturer: turbine.manufacturer
    FROM     'uswtdb_V8_1_20250522.parquet'
    WHERE    turbine.manufacturer IS NOT NULL
    GROUP BY 2, 3
    HAVING   num_turbines > 40
    ORDER BY 1 DESC
) PIVOT a
ON decade
USING SUM(num_turbines)
GROUP BY manufacturer
ORDER BY manufacturer;
┌─────────────────────────────────┬────────┬────────┬────────┬────────┬────────┐
│          manufacturer           │  1980  │  1990  │  2000  │  2010  │  2020  │
│             varchar             │ int128 │ int128 │ int128 │ int128 │ int128 │
├─────────────────────────────────┼────────┼────────┼────────┼────────┼────────┤
│ Acciona                         │   NULL │   NULL │   NULL │    603 │    155 │
│ Bonus                           │    152 │     55 │     97 │   NULL │   NULL │
│ Clipper                         │   NULL │   NULL │   NULL │    198 │   NULL │
│ Enron                           │   NULL │   NULL │    395 │   NULL │   NULL │
│ GE Wind                         │   NULL │   NULL │   1397 │  15234 │  17950 │
│ Gamesa                          │   NULL │   NULL │    386 │   1842 │    771 │
│ Goldwind                        │   NULL │   NULL │   NULL │    106 │     88 │
│ Goldwind Americas               │   NULL │   NULL │   NULL │   NULL │     48 │
│ Micon                           │   NULL │    198 │   NULL │   NULL │   NULL │
│ Mitsubishi                      │   NULL │   NULL │    396 │   1818 │   NULL │
│ NEG Micon                       │   NULL │   NULL │    440 │   NULL │   NULL │
│ Nordex                          │   NULL │   NULL │     41 │    265 │   1611 │
│ Nordtank                        │    207 │   NULL │   NULL │   NULL │   NULL │
│ Northern Power Systems          │   NULL │   NULL │   NULL │     92 │   NULL │
│ REpower                         │   NULL │   NULL │   NULL │    559 │   NULL │
│ Siemens                         │   NULL │   NULL │   NULL │   3300 │   1389 │
│ Siemens Gamesa Renewable Energy │   NULL │   NULL │    314 │    450 │   2001 │
│ Suzlon                          │   NULL │   NULL │   NULL │   1172 │   NULL │
│ Vestas                          │     48 │   NULL │   1532 │   4787 │  11265 │
│ Zond                            │   NULL │   NULL │     83 │   NULL │   NULL │
├─────────────────────────────────┴────────┴────────┴────────┴────────┴────────┤
│ 20 rows                                                            6 columns │
└──────────────────────────────────────────────────────────────────────────────┘

Below I'll plot out the top manufacturer per hexagon in the lower 48 states among the overall top ten.

CREATE OR REPLACE TABLE top_manufacturers AS
    SELECT   num_turbines: COUNT(*),
             manufacturer: turbine.manufacturer
    FROM     'uswtdb_V8_1_20250522.parquet'
    WHERE    turbine.manufacturer IS NOT NULL
    GROUP BY 2
    ORDER BY 1 DESC
    LIMIT 10;

CREATE OR REPLACE TABLE h3_3s AS
    WITH b AS (
        WITH a AS (
            SELECT   H3_LATLNG_TO_CELL(bbox.ymin,
                                       bbox.xmin,
                                       3) h3_3,
                     manufacturer: turbine.manufacturer,
                     COUNT(*) num_recs
            FROM     'uswtdb_V8_1_20250522.parquet'
            WHERE    turbine.manufacturer IN (
                SELECT DISTINCT manufacturer
                FROM   top_manufacturers
            )
            GROUP BY 1, 2
        )
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY h3_3
                                  ORDER BY     num_recs DESC) AS rn
        FROM   a
    )
    FROM     b
    WHERE    rn = 1
    ORDER BY num_recs DESC;

COPY (
    SELECT geom: H3_CELL_TO_BOUNDARY_WKT(h3_3)::GEOMETRY,
           manufacturer
    FROM   h3_3s
    WHERE  ST_XMIN(geom::GEOMETRY) BETWEEN -179 AND 179
    AND    ST_XMAX(geom::GEOMETRY) BETWEEN -179 AND 179
) TO 'turbine.top_manufacturer.h3_3.parquet' (
    FORMAT 'PARQUET',
    CODEC  'ZSTD',
    COMPRESSION_LEVEL 22,
    ROW_GROUP_SIZE 15000);
USGS Wind Farm Dataset

Only 67 turbines in this dataset are listed as being offshore.

SELECT   COUNT(*),
         turbine.is_offshore
FROM     'uswtdb_V8_1_20250522.parquet'
GROUP BY 2;
┌──────────────┬─────────────┐
│ count_star() │ is_offshore │
│    int64     │   boolean   │
├──────────────┼─────────────┤
│        75984 │ false       │
│           67 │ true        │
└──────────────┴─────────────┘

Below are the top 20 states by their total GW of installed capacity, number of turbines and distinct manufacturers represented.

SELECT   location.state,
         gw: ROUND(SUM(turbine.rated_capacity_mw) / 1000)::INT,
         num_turbines: COUNT(*),
         num_manufacturers: COUNT(DISTINCT turbine.manufacturer)
FROM     'uswtdb_V8_1_20250522.parquet'
GROUP BY 1
ORDER BY 2 DESC
LIMIT    20;
┌─────────┬───────┬──────────────┬───────────────────┐
│  state  │  gw   │ num_turbines │ num_manufacturers │
│ varchar │ int32 │    int64     │       int64       │
├─────────┼───────┼──────────────┼───────────────────┤
│ TX      │  1998 │        19415 │                22 │
│ IA      │   653 │         6472 │                22 │
│ OK      │   591 │         5597 │                13 │
│ KS      │   439 │         4415 │                11 │
│ IL      │   370 │         3837 │                16 │
│ CO      │   284 │         2908 │                12 │
│ CA      │   258 │         5510 │                18 │
│ MN      │   246 │         2736 │                19 │
│ ND      │   215 │         2169 │                 9 │
│ NM      │   208 │         2305 │                 7 │
│ OR      │   193 │         2173 │                 9 │
│ MI      │   186 │         1715 │                 9 │
│ IN      │   169 │         1651 │                12 │
│ NE      │   162 │         1504 │                 6 │
│ SD      │   162 │         1503 │                 9 │
│ WY      │   159 │         1713 │                 6 │
│ WA      │   157 │         1825 │                 9 │
│ NY      │   124 │         1429 │                13 │
│ MO      │   123 │         1108 │                 5 │
│ MT      │    92 │         1039 │                 7 │
├─────────┴───────┴──────────────┴───────────────────┤
│ 20 rows                                  4 columns │
└────────────────────────────────────────────────────┘

Of the 76K turbines in this dataset, 67K have a high confidence rating for both their location and attributes.

WITH a AS (
    SELECT   attribute:    turbine.attribute_confidence,
             location:     turbine.location_confidence,
             num_turbines: COUNT(*)
    FROM     'uswtdb_V8_1_20250522.parquet'
    GROUP BY 1, 2
    ORDER BY 3 DESC
)
PIVOT    a
ON       attribute,
USING    SUM(num_turbines)
GROUP BY location
ORDER BY location;
┌──────────┬────────┬────────┬────────┐
│ location │  High  │  Low   │ Medium │
│ varchar  │ int128 │ int128 │ int128 │
├──────────┼────────┼────────┼────────┤
│ High     │  67060 │   2924 │   4472 │
│ Low      │    137 │   1256 │    141 │
│ Medium   │      1 │     60 │   NULL │
└──────────┴────────┴────────┴────────┘

Project Footprints

Each turbine record contains denormalised project information. Below I'll generate the footprints for each project.

I'll group the projects by level-3 hexagons and name to generate their geographical footprints. This helps avoid country-wide overlaps that would result from grouping by name alone.

$ ~/duckdb
COPY (
    SELECT   h3_3: H3_LATLNG_TO_CELL(bbox.ymin,
                                     bbox.xmin,
                                     3),
             project_name: project.name,
             geometry: {
                'min_x': MIN(ST_X(geometry)),
                'min_y': MIN(ST_Y(geometry)),
                'max_x': MAX(ST_X(geometry)),
                'max_y': MAX(ST_Y(geometry))}::BOX_2D::GEOMETRY
    FROM     READ_PARQUET('uswtdb_V8_1_20250522.parquet')
    GROUP BY 1, 2
) TO 'projects.parquet' (
        FORMAT 'PARQUET',
        CODEC  'ZSTD',
        COMPRESSION_LEVEL 22,
        ROW_GROUP_SIZE 15000);

A total of 2,008 distinct projects were returned. Below are a few of these projects in Texas.

USGS Wind Farm Dataset
Thank you for taking the time to read this post. I offer both consulting and hands-on development services to clients in North America and Europe. If you'd like to discuss how my offerings can help your business please contact me via LinkedIn.

Copyright © 2014 - 2025 Mark Litwintschik. This site's template is based off a template by Giulio Fidente.