The General Catalog of Artificial Space Objects (GCAT) is a dataset collection of space launch sites, vehicles, satellites and other related items. It's compiled and maintained by Jonathan C. McDowell, who uses its data for his space reports. 854 such reports have been published to date.
In this post, I'll convert a handful of the GCAT datasets to Parquet and do some analysis on them.
My Workstation
I'm using a 5.7 GHz AMD Ryzen 9 9950X CPU. It has 16 cores and 32 threads and 1.2 MB of L1, 16 MB of L2 and 64 MB of L3 cache. It has a liquid cooler attached and is housed in a spacious, full-sized Cooler Master HAF 700 computer case.
The system has 96 GB of DDR5 RAM clocked at 4,800 MT/s and a 5th-generation, Crucial T700 4 TB NVMe M.2 SSD which can read at speeds up to 12,400 MB/s. There is a heatsink on the SSD to help keep its temperature down. This is my system's C drive.
The system is powered by a 1,200-watt, fully modular Corsair Power Supply and is sat on an ASRock X870E Nova 90 Motherboard.
I'm running Ubuntu 24 LTS via Microsoft's Ubuntu for Windows on Windows 11 Pro. In case you're wondering why I don't run a Linux-based desktop as my primary work environment, I'm still using an Nvidia GTX 1080 GPU which has better driver support on Windows and ArcGIS Pro only supports Windows natively.
Installing Prerequisites
I'll use jq to help analyse the data in this post.
$ sudo add-apt-repository ppa:ubuntugis/ubuntugis-unstable
$ sudo apt update
$ sudo apt install \
jq
I'll use DuckDB, along with its H3, JSON, Lindel, Parquet and Spatial extensions in this post.
$ cd ~
$ wget -c https://github.com/duckdb/duckdb/releases/download/v1.5.1/duckdb_cli-linux-amd64.zip
$ unzip -j duckdb_cli-linux-amd64.zip
$ chmod +x duckdb
$ ~/duckdb
INSTALL h3 FROM community;
INSTALL lindel FROM community;
INSTALL json;
INSTALL parquet;
INSTALL spatial;
I'll set up DuckDB to load every installed extension each time it launches.
$ vi ~/.duckdbrc
.timer on
.width 180
LOAD h3;
LOAD lindel;
LOAD json;
LOAD parquet;
LOAD spatial;
The maps in this post were rendered with QGIS version 4.0.1. QGIS is a desktop application that runs on Windows, macOS and Linux. The application has grown in popularity in recent years and has ~15M application launches from users all around the world each month.
I used QGIS' HCMGIS plugin to add basemaps from Esri to this post.
Organizations
I'll start by downloading the organisations dataset.
$ wget https://planet4589.org/space/gcat/tsv/tables/orgs.tsv
The first row contains the field names but starts with a hash. The second row is a timestamp of when this file was last updated.
$ head -n5 orgs.tsv | cut -c-30
#Code UCode StateCode Type Cla
# Updated 2026 Apr 24 1739:51
EARTH EARTH EARTH AP C - - Ear
LUNA LUNA LUNA AP C - - Luna L
SSYS SSYS SSYS AP C - - Solar
I'll exclude the header row as well as the timestamp. This will create a file that's easier for DuckDB to import.
$ head -n1 orgs.tsv | tail -c+2 > working.csv
$ tail -n+3 orgs.tsv >> working.csv
I'll clean up the values, casting them to stronger types where I can and producing a parquet file with their contents.
$ ~/duckdb
COPY(
SELECT Class,
Code,
EName: IF(LENGTH(TRIM(EName)) < 2, NULL, EName),
Error,
Location: IF(Location = '-', NULL, Location),
Name,
Parent: IF(LENGTH(TRIM(Parent)) < 2, NULL, Parent),
ShortEName: IF(LENGTH(TRIM(ShortEName)) < 2, NULL, ShortEName),
ShortName: IF(LENGTH(TRIM(ShortName)) < 4, NULL, ShortName),
StateCode,
TStart: IF(LENGTH(TRIM(TStart)) < 4, NULL, TStart[:4]::INT),
TStop: IF(LENGTH(TRIM(TStop)) < 4, NULL, TStop[:4]::INT),
Type: IF(LENGTH(TRIM(Type)) < 4, NULL, Type),
UCode,
UName,
geometry: IF(LENGTH(Latitude)>1,
ST_POINT(Longitude::FLOAT, Latitude::FLOAT),
NULL),
bbox: {'xmin': ST_XMIN(ST_EXTENT(geometry)),
'ymin': ST_YMIN(ST_EXTENT(geometry)),
'xmax': ST_XMAX(ST_EXTENT(geometry)),
'ymax': ST_YMAX(ST_EXTENT(geometry))},
FROM 'working.csv'
) TO 'orgs.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
There are 4,013 records in this dataset.
SELECT COUNT(*)
FROM 'orgs.parquet'; -- 4013
Below is a breakdown of unique values and NULL coverage across each column.
SELECT column_name,
column_type[:30],
null_percentage,
approx_unique,
min[:30],
max[:30]
FROM (SUMMARIZE
FROM 'orgs.parquet')
ORDER BY LOWER(column_name);
┌─────────────┬────────────────────────────────┬─────────────────┬───────────────┬────────────────────────────────┬───────────────────────────────────────────────┐
│ column_name │ column_type[:30] │ null_percentage │ approx_unique │ min[:30] │ max[:30] │
│ varchar │ varchar │ decimal(9,2) │ int64 │ varchar │ varchar │
├─────────────┼────────────────────────────────┼─────────────────┼───────────────┼────────────────────────────────┼───────────────────────────────────────────────┤
│ bbox │ STRUCT(xmin DOUBLE, ymin DOUBL │ 0.00 │ 1455 │ {'xmin': -213.25, 'ymin': 49.2 │ {'xmin': NULL, 'ymin': NULL, ' │
│ Class │ VARCHAR │ 0.00 │ 4 │ A │ D │
│ Code │ VARCHAR │ 0.00 │ 3675 │ 10ADS │ ZZWYZ │
│ EName │ VARCHAR │ 66.43 │ 1468 │ 2nd Academy of 7th Machine Bui │ [Same as CALT?] │
│ Error │ VARCHAR │ 0.00 │ 13 │ 0.0000 │ 0.5000 │
│ geometry │ GEOMETRY('OGC:CRS84') │ 0.07 │ 1716 │ POINT (2.25 48.900001525878906 │ POINT (-122.02880096435547 37. │
│ Location │ VARCHAR │ 0.17 │ 1657 │ 's-Gravenhage │ ar-Riyad, Al Arabiyah │
│ Name │ VARCHAR │ 0.00 │ 4553 │ - - │ zhongguo zairen hangtian gongc │
│ Parent │ VARCHAR │ 67.95 │ 429 │ 21AT │ ZZB │
│ ShortEName │ VARCHAR │ 89.48 │ 393 │ 5th Academy │ iSpace │
│ ShortName │ VARCHAR │ 11.06 │ 2707 │ 061 Jidi │ ti Space │
│ StateCode │ VARCHAR │ 0.00 │ 182 │ AAT │ ZW │
│ TStart │ INTEGER │ 7.43 │ 303 │ 700 │ 2025 │
│ TStop │ INTEGER │ 66.58 │ 86 │ 1660 │ 2025 │
│ Type │ VARCHAR │ 66.98 │ 89 │ E/LS │ PL/S │
│ UCode │ VARCHAR │ 0.00 │ 3003 │ 10ADS │ ZZWYZ │
│ UName │ VARCHAR │ 0.00 │ 4238 │ - - │ NEC東芝スペースシステム株式会社 府中事業所 │
└─────────────┴────────────────────────────────┴─────────────────┴───────────────┴────────────────────────────────┴───────────────────────────────────────────────┘
CREATE OR REPLACE TABLE h3_2_stats AS
SELECT h3_2: H3_LATLNG_TO_CELL(
bbox.ymin,
bbox.xmin,
2),
COUNT(*) num_recs
FROM 'orgs.parquet'
GROUP BY 1;
COPY (
SELECT geometry: ST_ASWKB(H3_CELL_TO_BOUNDARY_WKT(h3_2)::geometry),
num_recs
FROM h3_2_stats
WHERE ST_XMIN(H3_CELL_TO_BOUNDARY_WKT(h3_2)::geometry) BETWEEN -179 AND 179
AND ST_XMAX(H3_CELL_TO_BOUNDARY_WKT(h3_2)::geometry) BETWEEN -179 AND 179
AND ST_YMIN(H3_CELL_TO_BOUNDARY_WKT(h3_2)::geometry) BETWEEN -88 AND 88
) TO 'orgs.h3_2.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
Below is a heatmap of the 4K organisation's locations.
Below are a few of the organisations in Florida.
Below are some more in and around Moscow.
Launch Platforms
Below, I'll download and prepare the platforms dataset for DuckDB.
$ wget https://planet4589.org/space/gcat/tsv/tables/platforms.tsv
$ head -n1 platforms.tsv | tail -c+2 > working.csv
$ tail -n+3 platforms.tsv >> working.csv
I'll clean up the values, casting them to stronger types where I can and producing a parquet file with their contents.
$ ~/duckdb
COPY(
SELECT Class,
Code,
"Group": IF("Group" = '-', NULL, "Group"),
Location: IF(Location = '-', NULL, Location),
Name,
Parent,
ShortEName: IF(ShortEName = '-', NULL, ShortEName),
ShortName,
StateCode,
TStart: IF(TStart = '-', NULL, TStart[:4]::INT),
TStop: IF(TStop = '-', NULL, TStop[:4]::INT),
Type,
UCode,
UName,
VClass: IF(VClass = '-', NULL, VClass),
VClassID: IF(VClassID = '-', NULL, VClassID),
VID: IF(VID = '-', NULL, VID)
FROM 'working.csv'
) TO 'platforms.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
There are 384 records in this dataset.
SELECT COUNT(*)
FROM 'platforms.parquet'; -- 384
Below is a breakdown of unique values and NULL coverage across each column.
SELECT column_name,
column_type[:30],
null_percentage,
approx_unique,
min[:30],
max[:30]
FROM (SUMMARIZE
FROM 'platforms.parquet')
ORDER BY LOWER(column_name);
┌─────────────┬──────────────────┬─────────────────┬───────────────┬────────────────────────────────┬────────────────────┐
│ column_name │ column_type[:30] │ null_percentage │ approx_unique │ min[:30] │ max[:30] │
│ varchar │ varchar │ decimal(9,2) │ int64 │ varchar │ varchar │
├─────────────┼──────────────────┼─────────────────┼───────────────┼────────────────────────────────┼────────────────────┤
│ Class │ VARCHAR │ 0.00 │ 3 │ B │ D │
│ Code │ VARCHAR │ 0.00 │ 388 │ 824YONG │ ZULUV │
│ Group │ VARCHAR │ 2.34 │ 35 │ AEGIS │ USSHIPS │
│ Location │ VARCHAR │ 79.95 │ 27 │ Beale AFB │ White Sands │
│ Name │ VARCHAR │ 0.00 │ 364 │ 8.24 Yongung (Gorae class sub) │ Yu. M. Shokal'skiy │
│ Parent │ VARCHAR │ 0.00 │ 51 │ AANII │ VORB │
│ ShortEName │ VARCHAR │ 100.00 │ 0 │ NULL │ NULL │
│ ShortName │ VARCHAR │ 0.00 │ 363 │ AIR B-1B │ White Knight 1 │
│ StateCode │ VARCHAR │ 0.00 │ 11 │ CN │ US │
│ TStart │ INTEGER │ 46.61 │ 68 │ 1944 │ 2024 │
│ TStop │ INTEGER │ 72.92 │ 42 │ 1958 │ 2020 │
│ Type │ VARCHAR │ 0.00 │ 4 │ AIR │ SUB │
│ UCode │ VARCHAR │ 0.00 │ 408 │ 824YOG │ ZULUV │
│ UName │ VARCHAR │ 0.00 │ 375 │ 629 │ 长城-406 │
│ VClass │ VARCHAR │ 7.29 │ 83 │ 224 A.I. Voeykov │ Wind │
│ VClassID │ VARCHAR │ 14.58 │ 307 │ 07-2 │ WYG-279 │
│ VID │ VARCHAR │ 94.79 │ 18 │ IMO5010244 │ NASA 218 │
└─────────────┴──────────────────┴─────────────────┴───────────────┴────────────────────────────────┴────────────────────┘
Below is an example record for the McDonnell Douglas F-15 Eagle.
$ echo "FROM 'platforms.parquet'
WHERE Code = 'F-15A-08x'" \
| ~/duckdb -json \
| jq -S .
[
{
"Class": "D",
"Code": "F-15A-08x",
"Group": "USAIRF",
"Location": "Edwards AFB",
"Name": "F-15 Eagle, ASAT program (084 or 086)",
"Parent": "USAF",
"ShortEName": null,
"ShortName": "AIR F-15",
"StateCode": "US",
"TStart": null,
"TStop": null,
"Type": "AIR",
"UCode": "F-15A-08x",
"UName": "F-15 Eagle, ASAT program (084 or 086)",
"VClass": "F-15A",
"VClassID": "76-0084/0086",
"VID": null
}
]
This is a breakdown of the 23 groups in this dataset.
$ ~/duckdb
SELECT COUNT(*),
"group": SPLIT_PART("Group", '.', 1)
FROM 'platforms.parquet'
GROUP BY 2
ORDER BY 1 DESC;
┌──────────────┬────────────┐
│ count_star() │ group │
│ int64 │ varchar │
├──────────────┼────────────┤
│ 134 │ SUSUB │
│ 41 │ POLARIS │
│ 34 │ AEGIS │
│ 26 │ USAIRB │
│ 26 │ USAIRF │
│ 18 │ TRIDENT │
│ 14 │ USSHIPS │
│ 13 │ SOVSHIPS │
│ 11 │ SNLE │
│ 10 │ OTHERSHIPS │
│ 9 │ NULL │
│ 8 │ USMSHIPS │
│ 8 │ UKSUB │
│ 7 │ USAIRC │
│ 4 │ USAIRT │
│ 4 │ ASDS │
│ 3 │ KRSUB │
│ 3 │ ROCKOON │
│ 3 │ INAV │
│ 3 │ CNSUB │
│ 2 │ KPSUB │
│ 2 │ INSUB │
│ 1 │ SUAIR │
└──────────────┴────────────┘
Below is a list of the Trident-class submarines found in this dataset.
SELECT Code,
Name,
TStart,
TStop,
VClass
FROM 'platforms.parquet'
WHERE "Group" = 'TRIDENT'
ORDER BY TStart;
┌──────────┬──────────────────────┬────────┬───────┬─────────┐
│ Code │ Name │ TStart │ TStop │ VClass │
│ varchar │ varchar │ int32 │ int32 │ varchar │
├──────────┼──────────────────────┼────────┼───────┼─────────┤
│ SSBN-726 │ USS Ohio │ 1979 │ 2003 │ Ohio │
│ SSBN-727 │ USS Michigan │ 1980 │ 2007 │ Ohio │
│ SSBN-728 │ USS Florida │ 1981 │ 2003 │ Ohio │
│ SSBN-729 │ USS Georgia │ 1982 │ 2003 │ Ohio │
│ SSBN-730 │ USS Henry M. Jackson │ 1983 │ NULL │ Ohio │
│ SSBN-731 │ USS Alabama │ 1984 │ NULL │ Ohio │
│ SSBN-732 │ USS Alaska │ 1985 │ NULL │ Ohio │
│ SSBN-733 │ USS Nevada │ 1985 │ NULL │ Ohio │
│ SSBN-734 │ USS Tennessee │ 1986 │ NULL │ Ohio │
│ SSBN-735 │ USS Pennsylvania │ 1988 │ NULL │ Ohio │
│ SSBN-736 │ USS West Virginia │ 1989 │ NULL │ Ohio │
│ SSBN-737 │ USS Kentucky │ 1990 │ NULL │ Ohio │
│ SSBN-738 │ USS Maryland │ 1991 │ NULL │ Ohio │
│ SSBN-739 │ USS Nebraska │ 1992 │ NULL │ Ohio │
│ SSBN-740 │ USS Rhode Island │ 1993 │ NULL │ Ohio │
│ SSBN-741 │ USS Maine │ 1994 │ NULL │ Ohio │
│ SSBN-742 │ USS Wyoming │ 1995 │ NULL │ Ohio │
│ SSBN-743 │ USS Louisiana │ 1996 │ NULL │ Ohio │
└──────────┴──────────────────────┴────────┴───────┴─────────┘
Launch Sites
Below, I'll download and prepare the launch sites dataset for DuckDB.
$ wget https://planet4589.org/space/gcat/tsv/tables/sites.tsv
$ head -n1 sites.tsv | tail -c+2 > working.csv
$ tail -n+3 sites.tsv >> working.csv
I'll clean up the values, casting them to stronger types where I can and producing a parquet file with their contents.
$ ~/duckdb
COPY(
SELECT EName: IF(LENGTH(TRIM(EName)) < 2, NULL, EName),
Error: Error::FLOAT,
Location: IF(Location = '-', NULL, Location),
Name,
Parent: IF(LENGTH(TRIM(Parent)) < 2, NULL, Parent),
ShortEName: IF(LENGTH(TRIM(ShortEName)) < 2, NULL, ShortEName),
ShortName: IF(LENGTH(TRIM(ShortName)) < 4, NULL, ShortName),
Site,
TStart: IF(LENGTH(TRIM(TStart)) < 4, NULL, TStart[:4]::INT),
TStop: IF(LENGTH(TRIM(TStop)) < 4, NULL, TStop[:4]::INT),
UCode,
UName: IF(LENGTH(TRIM(UName)) < 2, NULL, UName),
geometry: IF(LENGTH(Latitude)>1,
ST_POINT(Longitude::FLOAT, Latitude::FLOAT),
NULL),
bbox: {'xmin': ST_XMIN(ST_EXTENT(geometry)),
'ymin': ST_YMIN(ST_EXTENT(geometry)),
'xmax': ST_XMAX(ST_EXTENT(geometry)),
'ymax': ST_YMAX(ST_EXTENT(geometry))},
FROM 'working.csv'
) TO 'sites.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
There are 684 records in this dataset.
SELECT COUNT(*)
FROM 'sites.parquet'; -- 684
Below is a breakdown of unique values and NULL coverage across each column.
SELECT column_name,
column_type[:30],
null_percentage,
approx_unique,
min[:30],
max[:30]
FROM (SUMMARIZE
FROM 'sites.parquet')
ORDER BY LOWER(column_name);
┌─────────────┬────────────────────────────────┬─────────────────┬───────────────┬────────────────────────────────┬────────────────────────────────┐
│ column_name │ column_type[:30] │ null_percentage │ approx_unique │ min[:30] │ max[:30] │
│ varchar │ varchar │ decimal(9,2) │ int64 │ varchar │ varchar │
├─────────────┼────────────────────────────────┼─────────────────┼───────────────┼────────────────────────────────┼────────────────────────────────┤
│ bbox │ STRUCT(xmin DOUBLE, ymin DOUBL │ 0.00 │ 445 │ {'xmin': -177.5, 'ymin': 24.79 │ {'xmin': NULL, 'ymin': NULL, ' │
│ EName │ VARCHAR │ 99.29 │ 2 │ - - │ Taiyuan Space Center │
│ Error │ FLOAT │ 0.00 │ 30 │ 0.0 │ 9.0 │
│ geometry │ GEOMETRY('OGC:CRS84') │ 3.25 │ 618 │ POINT (0 68) │ POINT (-3.0829999446868896 30. │
│ Location │ VARCHAR │ 52.33 │ 135 │ Abbot Point, Queensland │ t'Harde, Nederland │
│ Name │ VARCHAR │ 0.00 │ 772 │ 1-y Gosudarstvenniy Ispitateln │ Zubov Expedition 9,1985 │
│ Parent │ VARCHAR │ 2.40 │ 111 │ AADC │ ZWO │
│ ShortEName │ VARCHAR │ 98.02 │ 9 │ Baykonur │ Vandenberg │
│ ShortName │ VARCHAR │ 3.54 │ 386 │ AGB-3 │ t'Harde │
│ Site │ VARCHAR │ 0.00 │ 694 │ A51 │ ZINGST │
│ TStart │ INTEGER │ 9.19 │ 86 │ 1878 │ 2026 │
│ TStop │ INTEGER │ 37.34 │ 75 │ 1944 │ 2024 │
│ UCode │ VARCHAR │ 0.00 │ 694 │ A51 │ ZINGST │
│ UName │ VARCHAR │ 0.57 │ 794 │ 1-y Gosudarstvenniy Ispitateln │ 海口 │
└─────────────┴────────────────────────────────┴─────────────────┴───────────────┴────────────────────────────────┴────────────────────────────────┘
Below is an example record for Vandenberg Space Force Base.
$ echo "FROM 'sites.parquet'
WHERE Site = 'VSFB'" \
| ~/duckdb -json \
| jq -S .
[
{
"EName": null,
"Error": 0.019999999552965164,
"Location": null,
"Name": "Vandenberg Space Force Base, California",
"Parent": "USSF",
"ShortEName": "Vandenberg",
"ShortName": "Vandenberg",
"Site": "VSFB",
"TStart": 2021,
"TStop": null,
"UCode": "V",
"UName": "Vandenberg Space Force Base, California",
"bbox": {
"xmax": -120.62000274658203,
"xmin": -120.62000274658203,
"ymax": 34.75,
"ymin": 34.75
},
"geometry": "POINT (-120.62000274658203 34.75)"
}
]
Below is a heatmap of the 684 launch sites.
$ ~/duckdb
CREATE OR REPLACE TABLE h3_2_stats AS
SELECT h3_2: H3_LATLNG_TO_CELL(
bbox.ymin,
bbox.xmin,
2),
COUNT(*) num_recs
FROM 'sites.parquet'
GROUP BY 1;
COPY (
SELECT geometry: ST_ASWKB(H3_CELL_TO_BOUNDARY_WKT(h3_2)::geometry),
num_recs
FROM h3_2_stats
WHERE ST_XMIN(H3_CELL_TO_BOUNDARY_WKT(h3_2)::geometry) BETWEEN -179 AND 179
AND ST_XMAX(H3_CELL_TO_BOUNDARY_WKT(h3_2)::geometry) BETWEEN -179 AND 179
AND ST_YMIN(H3_CELL_TO_BOUNDARY_WKT(h3_2)::geometry) BETWEEN -88 AND 88
) TO 'sites.h3_2.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
These are some of the sites in Southern California and Nevada.
Launch Vehicles
Below, I'll download and prepare the launch vehicles dataset for DuckDB.
$ wget https://planet4589.org/space/gcat/tsv/tables/lv.tsv
$ head -n1 lv.tsv | tail -c+2 > working.csv
$ tail -n+3 lv.tsv >> working.csv
I'll clean up the values, casting them to stronger types where I can and producing a parquet file with their contents.
$ ~/duckdb
COPY(
SELECT Apogee: IF(Apogee='-', NULL, Apogee::INT),
Class,
Diameter: IF(Diameter='-', NULL, Diameter::FLOAT),
GTO_Capacity: IF(GTO_Capacity='-', NULL, GTO_Capacity::INT),
Launch_Mass: IF(Launch_Mass='-', NULL, Launch_Mass::FLOAT),
Length: IF(Length='-', NULL, Length::FLOAT),
LEO_Capacity: IF(LEO_Capacity='-', NULL, LEO_Capacity::INT),
LV_Alias: IF(LV_Alias='-', NULL, TRIM(LV_Alias)),
LV_Family: TRIM(LV_Family),
LV_Manufacturer: IF(LV_Manufacturer='-', NULL, TRIM(LV_Manufacturer)),
LV_Max_Stage: LV_Max_Stage::INT,
LV_Min_Stage: LV_Min_Stage::INT,
LV_Name: TRIM(LV_Name),
LV_Variant: TRIM(LV_Variant),
Range: IF(Range='-', NULL, Range::INT),
TO_Thrust: IF(TO_Thrust='-', NULL, TO_Thrust::INT),
FROM 'working.csv'
) TO 'lv.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
There are 1,820 records in this dataset.
SELECT COUNT(*)
FROM 'lv.parquet'; -- 1820
Below is a breakdown of unique values and NULL coverage across each column.
SELECT column_name,
column_type[:30],
null_percentage,
approx_unique,
min[:30],
max[:30]
FROM (SUMMARIZE
FROM 'lv.parquet')
ORDER BY LOWER(column_name);
┌─────────────────┬──────────────────┬─────────────────┬───────────────┬──────────────────────┬──────────┐
│ column_name │ column_type[:30] │ null_percentage │ approx_unique │ min[:30] │ max[:30] │
│ varchar │ varchar │ decimal(9,2) │ int64 │ varchar │ varchar │
├─────────────────┼──────────────────┼─────────────────┼───────────────┼──────────────────────┼──────────┤
│ Apogee │ INTEGER │ 0.33 │ 169 │ 0 │ 400000 │
│ Class │ VARCHAR │ 0.00 │ 15 │ 0 │ Y │
│ Diameter │ FLOAT │ 0.16 │ 160 │ 0.0 │ 14.0 │
│ GTO_Capacity │ INTEGER │ 2.47 │ 100 │ 0 │ 22200 │
│ Launch_Mass │ FLOAT │ 3.13 │ 393 │ 0.0 │ 7000.0 │
│ Length │ FLOAT │ 0.99 │ 304 │ 0.0 │ 124.4 │
│ LEO_Capacity │ INTEGER │ 7.69 │ 152 │ 0 │ 200000 │
│ LV_Alias │ VARCHAR │ 98.74 │ 18 │ ARPA Taurus │ Zenit-3F │
│ LV_Family │ VARCHAR │ 0.00 │ 468 │ 10KS2500 │ Zulfiqar │
│ LV_Manufacturer │ VARCHAR │ 0.16 │ 320 │ ABL │ ZKYT │
│ LV_Max_Stage │ INTEGER │ 0.00 │ 6 │ 1 │ 6 │
│ LV_Min_Stage │ INTEGER │ 0.00 │ 3 │ -1 │ 1 │
│ LV_Name │ VARCHAR │ 0.00 │ 1369 │ 1Ya2TA │ eMRBM-T1 │
│ LV_Variant │ VARCHAR │ 0.00 │ 117 │ (2) │ d │
│ Range │ INTEGER │ 85.82 │ 66 │ 20 │ 16000 │
│ TO_Thrust │ INTEGER │ 25.82 │ 356 │ 0 │ 103000 │
└─────────────────┴──────────────────┴─────────────────┴───────────────┴──────────────────────┴──────────┘
Below is an example record for Starship V4.
$ echo "FROM 'lv.parquet'
WHERE LV_Name = 'Starship V4'" \
| ~/duckdb -json \
| jq -S .
[
{
"Apogee": 200,
"Class": "O",
"Diameter": 9.0,
"GTO_Capacity": 0,
"LEO_Capacity": 200000,
"LV_Alias": null,
"LV_Family": "Starship",
"LV_Manufacturer": "SPX",
"LV_Max_Stage": 2,
"LV_Min_Stage": 1,
"LV_Name": "Starship V4",
"LV_Variant": "-",
"Launch_Mass": 7000.0,
"Length": 124.4000015258789,
"Range": null,
"TO_Thrust": 103000
}
]
Below are the top 20 launch vehicles by launch mass.
$ ~/duckdb
SELECT Launch_Mass,
Length,
TO_Thrust,
LV_Name
FROM 'lv.parquet'
ORDER BY Launch_Mass DESC
LIMIT 20;
┌─────────────┬────────┬───────────┬─────────────────────┐
│ Launch_Mass │ Length │ TO_Thrust │ LV_Name │
│ float │ float │ int32 │ varchar │
├─────────────┼────────┼───────────┼─────────────────────┤
│ 7000.0 │ 124.4 │ 103000 │ Starship V4 │
│ 5500.0 │ 123.0 │ 80800 │ Starship V3 │
│ 5200.0 │ 121.4 │ 80800 │ Starship V2 │
│ 5020.0 │ 121.3 │ 72500 │ Starship V1.0 │
│ 2913.0 │ 110.6 │ 34030 │ Saturn V │
│ 2822.0 │ 104.8 │ 34030 │ Saturn V │
│ 2788.0 │ 105.3 │ 45300 │ N-1 11A52 │
│ 2608.0 │ 98.1 │ 39000 │ SLS Block 1 │
│ 2524.0 │ 58.7 │ 35139 │ Energiya │
│ 2524.0 │ 58.7 │ 35139 │ Energiya/Buran │
│ 2040.0 │ 38.1 │ 28200 │ Space Shuttle │
│ 1470.0 │ 50.0 │ 12000 │ Starship V1.0 Ship │
│ 1400.0 │ 69.2 │ 17000 │ Falcon Heavy │
│ 1200.0 │ 122.0 │ 25622 │ New Glenn 9x4 │
│ 1050.0 │ 98.0 │ 17219 │ New Glenn 7x2 │
│ 1050.0 │ 98.0 │ 19928 │ New Glenn 7x2 │
│ 939.0 │ 62.2 │ 15000 │ Titan 401B/Centaur │
│ 925.0 │ 54.0 │ 15000 │ Titan 402B/IUS │
│ 910.8 │ 67.0 │ 17688 │ Vulcan Centaur VC6S │
│ 910.0 │ 54.0 │ 14200 │ Titan 402A/IUS │
└─────────────┴────────┴───────────┴─────────────────────┘
Launches
Below, I'll download and prepare the launches dataset for DuckDB.
$ wget https://planet4589.org/space/gcat/tsv/launch/launch.tsv
$ head -n1 launch.tsv | tail -c+2 > working.csv
$ tail -n+3 launch.tsv >> working.csv
I'll clean up the values, casting them to stronger types where I can and producing a parquet file with their contents.
$ ~/duckdb
COPY(
SELECT Agency,
Apogee: IF(TRIM(Apogee)='-', NULL, Apogee::INT),
Ascent_Pad: IF(TRIM(Ascent_Pad)='-', NULL, TRIM(Ascent_Pad)),
Ascent_Site: IF(TRIM(Ascent_Site)='-', NULL, TRIM(Ascent_Site)),
Category: IF(TRIM(Category)='-', NULL, TRIM(Category)),
Cite: IF(TRIM(Cite)='-', NULL, TRIM(Cite)),
Dest: IF(TRIM(Dest)='-', NULL, TRIM(Dest)),
FailCode: IF(TRIM(FailCode)='-', NULL, TRIM(FailCode)),
Flight,
Flight_ID,
FlightCode,
"Group": IF(TRIM("Group")='-', NULL, TRIM("Group")),
Launch_Year: Launch_Date[:4]::INT,
Launch_Date,
Launch_JD,
Launch_Pad: IF(TRIM(Launch_Pad)='-', NULL, TRIM(Launch_Pad)),
Launch_Site,
Launch_Tag,
LaunchCode,
LTCite: IF(TRIM(LTCite)='-', NULL, TRIM(LTCite)),
LV_Type,
Mission: IF(TRIM(Mission)='-', NULL, TRIM(Mission)),
Notes: IF(TRIM(Notes)='-', NULL, TRIM(Notes)),
OrbPay: OrbPay::FLOAT,
Platform: IF(TRIM(Platform)='-', NULL, TRIM(Platform)),
Range: IF(TRIM(Range)='-', NULL, Range::INT),
Variant: IF(TRIM(Variant)='-', NULL, TRIM(Variant)),
FROM 'working.csv'
) TO 'launch.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
There are 75,811 records in this dataset.
SELECT COUNT(*)
FROM 'launch.parquet'; -- 75811
Below is a breakdown of unique values and NULL coverage across each column.
SELECT column_name,
column_type[:30],
null_percentage,
approx_unique,
min[:30],
max[:30]
FROM (SUMMARIZE
FROM 'launch.parquet')
ORDER BY LOWER(column_name);
┌─────────────┬──────────────────┬─────────────────┬───────────────┬──────────────────────┬──────────────────────────┐
│ column_name │ column_type[:30] │ null_percentage │ approx_unique │ min[:30] │ max[:30] │
│ varchar │ varchar │ decimal(9,2) │ int64 │ varchar │ varchar │
├─────────────┼──────────────────┼─────────────────┼───────────────┼──────────────────────┼──────────────────────────┤
│ Agency │ VARCHAR │ 0.00 │ 745 │ 10ADS │ ZSU │
│ Apogee │ INTEGER │ 1.28 │ 2929 │ 0 │ 225000 │
│ Ascent_Pad │ VARCHAR │ 99.01 │ 66 │ A │ Yellow Sea │
│ Ascent_Site │ VARCHAR │ 98.99 │ 37 │ A51 │ WTR │
│ Category │ VARCHAR │ 0.04 │ 955 │ - /0:30 │ usaAeron │
│ Cite │ VARCHAR │ 86.87 │ 1448 │ (MexicoImpact) │ zenite.nu │
│ Dest │ VARCHAR │ 91.68 │ 319 │ (ASC BOA) │ Zhytomyr │
│ FailCode │ VARCHAR │ 96.36 │ 88 │ AD │ WX │
│ Flight │ VARCHAR │ 0.00 │ 22029 │ "Scores of missiles" │ vs MKO camps │
│ Flight_ID │ VARCHAR │ 0.00 │ 25955 │ #117 R-112 │ xxx/B1103.1 F625 │
│ FlightCode │ VARCHAR │ 0.00 │ 2417 │ (100th) │ silo │
│ Group │ VARCHAR │ 64.08 │ 2348 │ 321SMW/ │ ZARM/Guerlebeck │
│ Launch_Date │ VARCHAR │ 0.00 │ 66313 │ 1942 Aug 16 1115 │ 2026 Mar 31 │
│ Launch_JD │ VARCHAR │ 0.00 │ 67539 │ 2430523.95 │ 2461154.77 │
│ Launch_Pad │ VARCHAR │ 66.96 │ 1038 │ 200NM │ lC9A │
│ Launch_Site │ VARCHAR │ 0.00 │ 589 │ A51 │ ZINGST │
│ Launch_Tag │ VARCHAR │ 0.00 │ 79286 │ 1942-A01 │ 2026-S99 │
│ Launch_Year │ INTEGER │ 0.00 │ 89 │ 1942 │ 2026 │
│ LaunchCode │ VARCHAR │ 0.00 │ 52 │ ! │ YS │
│ LTCite │ VARCHAR │ 0.01 │ 3415 │ 1STRAD │ zenite.nu/orbita │
│ LV_Type │ VARCHAR │ 0.00 │ 1340 │ 1Ya2TA │ eMRBM-T1 │
│ Mission │ VARCHAR │ 82.35 │ 8502 │ 'Ofeq-1 │ ziyuan yihao 02C weixing │
│ Notes │ VARCHAR │ 98.37 │ 860 │ (ex-U2) │ www.wsmr.army.mil │
│ OrbPay │ FLOAT │ 0.00 │ 2774 │ 0.0 │ 132.025 │
│ Platform │ VARCHAR │ 91.80 │ 359 │ 824YONG │ ZULUV? │
│ Range │ INTEGER │ 97.65 │ 463 │ 0 │ 14560 │
│ Variant │ VARCHAR │ 95.04 │ 113 │ (2) │ d │
└─────────────┴──────────────────┴─────────────────┴───────────────┴──────────────────────┴──────────────────────────┘
Below is the record for the Starlink Group 6-88 mission.
$ echo "FROM 'launch.parquet'
WHERE Mission = 'Starlink Group 6-88'" \
| ~/duckdb -json \
| jq -S .
[
{
"Agency": "SPX",
"Apogee": null,
"Ascent_Pad": null,
"Ascent_Site": null,
"Category": "Sat LEO D",
"Cite": null,
"Dest": null,
"FailCode": null,
"Flight": "Starlink-345",
"FlightCode": "V2MO 6-88",
"Flight_ID": "585/B1101.1 F585",
"Group": "C",
"LTCite": "nasaspaceflight.com",
"LV_Type": "Falcon 9",
"LaunchCode": "OS",
"Launch_Date": "2026 Jan 4 0648:10",
"Launch_JD": "2461044.78 ",
"Launch_Pad": "LC40",
"Launch_Site": "CC",
"Launch_Tag": "2026-002 ",
"Launch_Year": 2026,
"Mission": "Starlink Group 6-88",
"Notes": null,
"OrbPay": 16.100000381469727,
"Platform": null,
"Range": null,
"Variant": "FT5"
}
]
Below are the category counts for launches with a 2026 tag prefix.
$ ~/duckdb
SELECT Category,
COUNT(*)
FROM 'launch.parquet'
WHERE Launch_Tag LIKE '2026%'
GROUP BY 1
ORDER BY 2 DESC;
┌────────────────────┬──────────────┐
│ Category │ count_star() │
│ varchar │ int64 │
├────────────────────┼──────────────┤
│ Weapon │ 215 │
│ Sat LEO D │ 50 │
│ Sat LEO O │ 12 │
│ Test │ 10 │
│ Sat SSO O │ 5 │
│ Auroral │ 4 │
│ Sat SSO L │ 4 │
│ Sat SSO D │ 4 │
│ Sat LEO OO │ 4 │
│ Sat SSO OO │ 3 │
│ Sat GTO D │ 3 │
│ Tech │ 2 │
│ Interceptor Weapon │ 2 │
│ Hypersonic │ 2 │
│ Sat ISS D │ 2 │
│ Target? │ 2 │
│ Sat DSO E │ 1 │
│ Sat GEO SO │ 1 │
│ Sat SSO SO │ 1 │
│ Op Test │ 1 │
│ Sat GEO O │ 1 │
│ Sat ISS O │ 1 │
│ Aeron │ 1 │
│ Sat SSO SD │ 1 │
│ Sat GTO O │ 1 │
│ Sat LEO P │ 1 │
│ Tourist Flight │ 1 │
│ Sat MOL SO │ 1 │
└────────────────────┴──────────────┘
Below are the Starlink launches for this year.
SELECT Agency,
LV_Type,
Launch_Tag,
Mission
FROM 'launch.parquet'
WHERE Launch_Tag LIKE '2026%'
AND Flight LIKE 'Starlink%'
ORDER BY Launch_Tag DESC;
┌─────────┬──────────┬────────────┬──────────────────────┐
│ Agency │ LV_Type │ Launch_Tag │ Mission │
│ varchar │ varchar │ varchar │ varchar │
├─────────┼──────────┼────────────┼──────────────────────┤
│ SPX │ Falcon 9 │ 2026-089 │ Starlink Group 17-14 │
│ SPX │ Falcon 9 │ 2026-086 │ Starlink Group 17-22 │
│ SPX │ Falcon 9 │ 2026-082 │ Starlink Group 17-27 │
│ SPX │ Falcon 9 │ 2026-081 │ Starlink Group 10-24 │
│ SPX │ Falcon 9 │ 2026-077 │ Starlink Group 17-21 │
│ SPX │ Falcon 9 │ 2026-073 │ Starlink Group 17-35 │
│ SPX │ Falcon 9 │ 2026-070 │ Starlink Group 10-58 │
│ SPX │ Falcon 9 │ 2026-068 │ Starlink Group 10-44 │
│ SPX │ Falcon 9 │ 2026-063 │ Starlink Group 17-17 │
│ SPX │ Falcon 9 │ 2026-059 │ Starlink Group 10-62 │
│ SPX │ Falcon 9 │ 2026-057 │ Starlink Group 17-15 │
│ SPX │ Falcon 9 │ 2026-055 │ Starlink Group 10-33 │
│ SPX │ Falcon 9 │ 2026-054 │ Starlink Group 10-46 │
│ SPX │ Falcon 9 │ 2026-053 │ Starlink Group 17-24 │
│ SPX │ Falcon 9 │ 2026-050 │ Starlink Group 10-48 │
│ SPX │ Falcon 9 │ 2026-049 │ Starlink Group 17-31 │
│ SPX │ Falcon 9 │ 2026-044 │ Starlink Group 17-18 │
│ SPX │ Falcon 9 │ 2026-042 │ Starlink Group 10-40 │
│ SPX │ Falcon 9 │ 2026-041 │ Starlink Group 10-41 │
│ SPX │ Falcon 9 │ 2026-040 │ Starlink Group 17-23 │
│ SPX │ Falcon 9 │ 2026-039 │ Starlink Group 6-108 │
│ SPX │ Falcon 9 │ 2026-038 │ Starlink Group 17-26 │
│ SPX │ Falcon 9 │ 2026-037 │ Starlink Group 6-110 │
│ SPX │ Falcon 9 │ 2026-036 │ Starlink Group 6-104 │
│ SPX │ Falcon 9 │ 2026-035 │ Starlink Group 17-25 │
│ SPX │ Falcon 9 │ 2026-034 │ Starlink Group 10-36 │
│ SPX │ Falcon 9 │ 2026-033 │ Starlink Group 6-103 │
│ SPX │ Falcon 9 │ 2026-032 │ Starlink Group 17-13 │
│ SPX │ Falcon 9 │ 2026-026 │ Starlink Group 17-34 │
│ SPX │ Falcon 9 │ 2026-025 │ Starlink Group 17-33 │
│ SPX │ Falcon 9 │ 2026-022 │ Starlink Group 17-32 │
│ SPX │ Falcon 9 │ 2026-020 │ Starlink Group 6-101 │
│ SPX │ Falcon 9 │ 2026-018 │ Starlink Group 17-19 │
│ SPX │ Falcon 9 │ 2026-016 │ Starlink Group 17-20 │
│ SPX │ Falcon 9 │ 2026-014 │ Starlink Group 17-30 │
│ SPX │ Falcon 9 │ 2026-012 │ Starlink Group 6-100 │
│ SPX │ Falcon 9 │ 2026-008 │ Starlink Group 6-98 │
│ SPX │ Falcon 9 │ 2026-005 │ Starlink Group 6-97 │
│ SPX │ Falcon 9 │ 2026-003 │ Starlink Group 6-96 │
│ SPX │ Falcon 9 │ 2026-002 │ Starlink Group 6-88 │
└─────────┴──────────┴────────────┴──────────────────────┘
These are the top 20 agency-category launch pairs for this decade.
SELECT Agency,
Category,
COUNT(*)
FROM 'launch.parquet'
WHERE Launch_Tag LIKE '202%'
GROUP BY 1, 2
ORDER BY 3 DESC
LIMIT 20;
┌─────────┬─────────────┬──────────────┐
│ Agency │ Category │ count_star() │
│ varchar │ varchar │ int64 │
├─────────┼─────────────┼──────────────┤
│ SPX │ Sat LEO D │ 338 │
│ IR │ Weapon │ 284 │
│ KPA │ Test │ 83 │
│ SPX │ Sat LEO D 4 │ 63 │
│ CALT │ Sat GTO O │ 61 │
│ SAST │ Sat LEO O │ 38 │
│ SAST │ Sat SSO L │ 37 │
│ SPX │ Sat SSO D │ 37 │
│ SPX │ Sat ISS D │ 36 │
│ FKA │ Sat ISS O │ 34 │
│ RLABN │ Sat LEO OO │ 29 │
│ SAST │ Sat SSO O │ 26 │
│ RLABN │ Sat SSO OO │ 25 │
│ ANSAL │ Weapon │ 24 │
│ MDA │ Target │ 24 │
│ SAST │ Sat SSO D │ 24 │
│ USNSSP │ OT │ 21 │
│ PLA2AC │ Test │ 20 │
│ CALT │ Sat LEO O │ 20 │
│ JRG │ OT │ 20 │
└─────────┴─────────────┴──────────────┘
Satellites
Below, I'll download and prepare the objects dataset, which contains satellites, for DuckDB.
$ wget https://planet4589.org/space/gcat/tsv/cat/satcat.tsv
$ head -n1 satcat.tsv | tail -c+2 > working.csv
$ tail -n+3 satcat.tsv >> working.csv
I'll clean up the values, casting them to stronger types where I can and producing a parquet file with their contents.
$ ~/duckdb
COPY(
SELECT AltNames: IF(TRIM(AltNames)='-', NULL, TRIM(AltNames)),
Apogee: IF(TRIM(Apogee) = '-' OR TRIM(Apogee) = 'Inf', NULL, Apogee::INT),
Bus: IF(TRIM(Bus)='-', NULL, TRIM(Bus)),
DDate: IF(TRIM(DDate)='-', NULL, TRIM(DDate)),
Dest: IF(TRIM(Dest)='-', NULL, TRIM(Dest)),
Diameter: IF(TRIM(Diameter)='-', NULL, Diameter::FLOAT),
DryMass: IF(TRIM(DryMass)='-', NULL, TRIM(DryMass)),
Inc: IF(TRIM(Inc)='-', NULL, Inc::FLOAT),
JCAT: TRIM(JCAT),
Launch_Tag: IF(TRIM(Launch_Tag)='-', NULL, TRIM(Launch_Tag)),
LDate: IF(TRIM(LDate)='-', NULL, TRIM(LDate)),
Length: IF(TRIM(Length)='-', NULL, Length::FLOAT),
Manufacturer: IF(TRIM(Manufacturer)='-', NULL, TRIM(Manufacturer)),
Mass: IF(TRIM(Mass)='-', NULL, Mass::FLOAT),
Motor: IF(TRIM(Motor)='-', NULL, TRIM(Motor)),
Name: TRIM(Name),
ODate: IF(TRIM(ODate)='-', NULL, TRIM(ODate)),
OpOrbit: IF(TRIM(OpOrbit)='-', NULL, TRIM(OpOrbit)),
OQUAL: IF(TRIM(OQUAL)='-', NULL, TRIM(OQUAL)),
Owner: IF(TRIM(Owner)='-', NULL, TRIM(Owner)),
Parent: IF(TRIM(Parent)='-', NULL, TRIM(Parent)),
Perigee: IF(TRIM(Perigee)='-', NULL, TRIM(Perigee)),
Piece: IF(TRIM(Piece)='-', NULL, TRIM(Piece)),
PLName: IF(TRIM(PLName)='-', NULL, TRIM(PLName)),
"Primary": IF(TRIM("Primary")='-', NULL, TRIM("Primary")),
Satcat: IF(TRIM(Satcat)='-', NULL, TRIM(Satcat)),
SDate: IF(TRIM(SDate)='-', NULL, TRIM(SDate)),
SDate_Year: IF(TRIM(SDate)='-', NULL, SDate[:4]::INT),
Shape: IF(TRIM(Shape)='-', NULL, TRIM(Shape)),
Span: IF(TRIM(Span)='-', NULL, Span::FLOAT),
State: IF(TRIM(State)='-', NULL, TRIM(State)),
Status: IF(TRIM(Status)='-', NULL, TRIM(Status)),
TotMass: IF(TRIM(TotMass)='-', NULL, TotMass::FLOAT),
Type: IF(TRIM(Type)='-', NULL, TRIM(Type)),
Coarse_type:
CASE WHEN Type[1] = 'P' THEN 'Payload (for orbital attempt)'
WHEN Type[1] = 'C' THEN 'Component'
WHEN Type[1] = 'R' THEN 'Launch vehicle stage'
WHEN Type[1] = 'D' THEN 'Fragmentation debris'
WHEN Type[1] = 'S' THEN 'Suborbital payload (e.g. sounding rocket payload or missile reentry vehicle)'
WHEN Type[1] = 'X' THEN 'Catalog entry that has been deleted (used in auxcat etc.)'
WHEN Type[1] = 'Z' THEN 'Spurious catalog entry (was in SATCAT, perhaps in TLEs, but there was no real object)'
END,
Attach_flag:
CASE WHEN Type[3] = 'A' THEN 'Permanently attached component or payload'
WHEN Type[3] = 'F' THEN 'Stuck attached by mistake'
WHEN Type[3] = 'S' THEN 'Expected to separate in future'
WHEN Type[3] = 'T' THEN 'Never flew free but transferred'
WHEN Type[3] = 'I' THEN 'Internal'
END,
Subtype_flag:
CASE WHEN Type[4] = 'A' THEN 'Payload adapter, support structures, interfaces'
WHEN Type[4] = 'B' THEN 'Battery explosion debris'
WHEN Type[4] = 'C' THEN 'Passive calibration satellites, test objects or chaff'
WHEN Type[4] = 'D' THEN 'Dummy satellite'
WHEN Type[4] = 'E' THEN 'Spacesuit on tethered spacewalk'
WHEN Type[4] = 'F' THEN 'Fairings and other covers'
WHEN Type[4] = 'G' THEN 'General, miscellaneous debris'
WHEN Type[4] = 'H' THEN 'Human spaceflight related'
WHEN Type[4] = 'I' THEN 'Impact (accidental collision)'
WHEN Type[4] = 'J' THEN 'Anomalous debris (insulation, soft material, ablated material)'
WHEN Type[4] = 'K' THEN 'Possible solid motor slag'
WHEN Type[4] = 'L' THEN 'Separated from vehicle after landing (rovers, etc)'
WHEN Type[4] = 'M' THEN 'Jettisoned motor or tank'
WHEN Type[4] = 'N' THEN 'Nuclear reactor core or coolant blob'
WHEN Type[4] = 'O' THEN 'Unknown debris released at orbit insertion'
WHEN Type[4] = 'P' THEN 'Propulsion related, residual-propellant breakup'
WHEN Type[4] = 'Q' THEN 'Aerodynamic breakup at low perigee'
WHEN Type[4] = 'R' THEN 'Reentry vehicle'
WHEN Type[4] = 'S' THEN 'Subsatellite or subpayload'
WHEN Type[4] = 'T' THEN 'Ejected section of payload'
WHEN Type[4] = 'U' THEN 'Untethered EVA'
WHEN Type[4] = 'V' THEN 'Ejection mechanism (deploy canister, clamp band)'
WHEN Type[4] = 'W' THEN 'Weapons test, ASAT debris'
WHEN Type[4] = 'X' THEN 'Debris of unknown nature'
WHEN Type[4] = 'Y' THEN 'Despin (yo-yo) device'
WHEN Type[4] = 'Z' THEN 'Breakup debris from on-board destruct device'
END,
Orbit_flag:
CASE WHEN Type[5] = 'D' THEN 'Deep Space or escape'
WHEN Type[5] = 'E' THEN 'Destroyed in pad explosion'
WHEN Type[5] = 'F' THEN 'Failed to reach orbit'
WHEN Type[5] = 'L' THEN 'Active on planet surface during this phase'
WHEN Type[5] = 'M' THEN 'Missing from SATCAT by mistake (EXPRESS, IXV)'
WHEN Type[5] = 'O' THEN 'Orbital-Energy but Non-Orbit'
WHEN Type[5] = 'P' THEN 'Partial orbit - reached legit orbit but deorbited after less than 1 rev'
WHEN Type[5] = 'R' THEN 'Reentry orbit: objects that were attached and separated in post-deorbit-burn suborbital trajectory'
WHEN Type[5] = 'S' THEN 'Near-Orbit (marginally suborbital)'
WHEN Type[5] = 'T' THEN 'Transient orbit - separated just (perhaps seconds) before deorbit'
WHEN Type[5] = 'V' THEN 'Escape energy but not deep space'
WHEN Type[5] = 'X' THEN 'Extra catalog entry for extraterrestrially launched object'
WHEN Type[5] = 'Z' THEN 'Launch from extraterrestrial object, recataloged with new D series number'
END,
Status_or_failure_flag:
CASE WHEN Type[8] = '*' THEN 'This object was the one that failed during launch.'
WHEN Type[8] = 'A' THEN 'Satellite ascending: orbit raising to op orbit (or lowering from high drift orbit)'
WHEN Type[8] = 'D' THEN 'Satellite in plane drift orbit'
WHEN Type[8] = 'F' THEN 'Satellite failed early in mission, before reaching operational orbit (”screened”)'
WHEN Type[8] = 'G' THEN 'Satellite retired to a graveyard orbit of some kind'
WHEN Type[8] = 'L' THEN 'Satellite removed far from operational constellation by lowering or raising orbit'
WHEN Type[8] = 'M' THEN 'Satellite failed in operational orbit and is undergoing uncontrolled decay - non manueverable'
WHEN Type[8] = 'O' THEN 'Satellite is active in operational orbit'
WHEN Type[8] = 'R' THEN 'Satellite active orbit lowering to reentry'
WHEN Type[8] = 'S' THEN 'Satellite was used for special tests outside of main constellation'
WHEN Type[8] = 'T' THEN 'Satellite removed slighly (and possibly temporarily) from operational constellation by lowering or raising orbit a little'
WHEN Type[8] = 'U' THEN 'Satellite apparently malfunctioning, held in intermediate orbit for debugging?'
END,
ID_flag:
CASE WHEN Type[9] = '?' THEN 'The association of this satellite with this catalog number is a guess because JSpOC has not yet assigned a name. ID may change in future.'
WHEN Type[9] = '+' THEN 'Starlink filing with FCC reports this satellite is out of service but could still maneuver, at the time of filing. Starlink filing with FCC reports this satellite is out of service and can not maneuver.'
WHEN Type[9] = 'm' THEN 'Multiple objects. This entry is a placeholder for a known debris event where no debris has yet been cataloged by DoD.'
WHEN Type[9] = 'C' THEN 'US government orbital data for this object are secret.'
WHEN Type[9] = 'c' THEN 'Older US government orbital data for this object were secret, but current data is public.'
WHEN Type[9] = 'U' THEN 'This object is a cargo item on ISS (or other station) and has been assigned to a likely cargo launch but the actual launch for this object is not known.'
WHEN Type[9] = 'D' THEN 'This object is a cargo item on ISS (or other station). The return date is uncertain and I’ve made a guess at which ship it went down on.'
WHEN Type[9] = 'X' THEN 'We don’t know which launch this object is from, so the launch date is unknown (and other parameters may also be unknown).'
WHEN Type[9] = 's' THEN 'Disagreement between TLE and SupTLE data (true for S45139)'
END
FROM READ_CSV('working.csv', sample_size=-1)
) TO 'satcat.parquet' (
FORMAT 'PARQUET',
CODEC 'ZSTD',
COMPRESSION_LEVEL 22,
ROW_GROUP_SIZE 15000);
There are 68,834 records in this dataset.
SELECT COUNT(*)
FROM 'satcat.parquet'; -- 68834
Below is a breakdown of unique values and NULL coverage across each column.
SELECT column_name,
column_type[:30],
null_percentage,
approx_unique,
min[:30],
max[:30]
FROM (SUMMARIZE
FROM 'satcat.parquet')
ORDER BY LOWER(column_name);
┌────────────────────────┬──────────────────┬─────────────────┬───────────────┬────────────────────────────────┬────────────────────────────────┐
│ column_name │ column_type[:30] │ null_percentage │ approx_unique │ min[:30] │ max[:30] │
│ varchar │ varchar │ decimal(9,2) │ int64 │ varchar │ varchar │
├────────────────────────┼──────────────────┼─────────────────┼───────────────┼────────────────────────────────┼────────────────────────────────┤
│ AltNames │ VARCHAR │ 93.49 │ 6148 │ 02 │ r7411F633 No. 670-51-102:RA │
│ Apogee │ INTEGER │ 0.16 │ 7251 │ -4788696 │ 3971603 │
│ Attach_flag │ VARCHAR │ 99.46 │ 4 │ Expected to separate in future │ Stuck attached by mistake │
│ Bus │ VARCHAR │ 46.24 │ 2057 │ 'Ofeq │ sPCM │
│ Coarse_type │ VARCHAR │ 0.09 │ 5 │ Component │ Spurious catalog entry (was in │
│ DDate │ VARCHAR │ 47.80 │ 16604 │ 1957 Dec 1 1000? │ 2205 Jun 6 │
│ Dest │ VARCHAR │ 94.84 │ 3075 │ 0.5W 43.8S │ Zimbabwe │
│ Diameter │ FLOAT │ 0.09 │ 120 │ 0.0 │ 41.0 │
│ DryMass │ VARCHAR │ 0.09 │ 1661 │ 0 │ 9996 │
│ ID_flag │ VARCHAR │ 97.22 │ 6 │ Disagreement between TLE and S │ US government orbital data for │
│ Inc │ FLOAT │ 0.09 │ 6895 │ 0.0 │ 150.97 │
│ JCAT │ VARCHAR │ 0.00 │ 81214 │ S00001 │ S68834 │
│ Launch_Tag │ VARCHAR │ 0.09 │ 6171 │ 1957 ALP │ 2026-091 │
│ LDate │ VARCHAR │ 0.09 │ 4848 │ 1957 Nov 3 │ 2026 Mar 30 │
│ Length │ FLOAT │ 0.09 │ 221 │ 0.0 │ 41.0 │
│ Manufacturer │ VARCHAR │ 34.40 │ 1718 │ AACH │ ZZQY │
│ Mass │ FLOAT │ 0.09 │ 1971 │ 0.0 │ 104420.0 │
│ Motor │ VARCHAR │ 95.87 │ 341 │ 0 │ YF-85 x 4 │
│ Name │ VARCHAR │ 0.00 │ 29692 │ 'Ofeq-1 │ uHETsat │
│ ODate │ VARCHAR │ 0.16 │ 15034 │ 1957 Nov 3 │ 2026 Mar 31 │
│ OpOrbit │ VARCHAR │ 0.11 │ 27 │ CLO │ VHEO │
│ OQUAL │ VARCHAR │ 99.47 │ 18 │ - s │ UN │
│ Orbit_flag │ VARCHAR │ 99.22 │ 6 │ Deep Space or escape │ Partial orbit - reached legit │
│ Owner │ VARCHAR │ 0.09 │ 1761 │ 21AT │ ZZWYZ? │
│ Parent │ VARCHAR │ 0.11 │ 21722 │ A00002 │ S68824 │
│ Perigee │ VARCHAR │ 0.09 │ 3903 │ -110 │ 9999 │
│ Piece │ VARCHAR │ 0.09 │ 79387 │ 1957 ALP 1 │ 2026-091D │
│ PLName │ VARCHAR │ 44.05 │ 28956 │ 'Ofeq-1 │ was [UNID128] │
│ Primary │ VARCHAR │ 0.09 │ 1 │ Earth │ Earth │
│ Satcat │ VARCHAR │ 0.09 │ 69528 │ 00001 │ NNA │
│ SDate │ VARCHAR │ 0.09 │ 17672 │ 1957 Nov 3 0235 │ 2026 Mar 30 2216:37 │
│ SDate_Year │ INTEGER │ 0.09 │ 69 │ 1957 │ 2026 │
│ Shape │ VARCHAR │ 46.51 │ 543 │ 2 Box + Tether │ Unk │
│ Span │ FLOAT │ 0.09 │ 329 │ 0.0 │ 19695.0 │
│ State │ VARCHAR │ 0.09 │ 123 │ AM │ ZW │
│ Status │ VARCHAR │ 0.09 │ 19 │ AO │ TFR │
│ Status_or_failure_flag │ VARCHAR │ 79.67 │ 11 │ Satellite active orbit lowerin │ This object was the one that f │
│ Subtype_flag │ VARCHAR │ 45.86 │ 26 │ Aerodynamic breakup at low per │ Weapons test, ASAT debris │
│ TotMass │ FLOAT │ 0.09 │ 2064 │ 0.0 │ 117440.0 │
│ Type │ VARCHAR │ 0.09 │ 404 │ C X │ Z X c │
└────────────────────────┴──────────────────┴─────────────────┴───────────────┴────────────────────────────────┴────────────────────────────────┘
Below are the most represented states in this dataset.
SELECT State,
COUNT(*)
FROM 'satcat.parquet'
WHERE SDate_Year >= 2022
GROUP BY 1
ORDER BY 2 DESC
LIMIT 20;
┌─────────┬──────────────┐
│ State │ count_star() │
│ varchar │ int64 │
├─────────┼──────────────┤
│ US │ 11731 │
│ CN │ 3040 │
│ RU │ 513 │
│ UK │ 313 │
│ J │ 148 │
│ NZ │ 126 │
│ I │ 84 │
│ F │ 81 │
│ IN │ 63 │
│ KR │ 55 │
│ E │ 55 │
│ L │ 49 │
│ CA │ 47 │
│ D │ 47 │
│ TR │ 45 │
│ AU │ 35 │
│ FI │ 33 │
│ UY │ 32 │
│ IR │ 29 │
│ IL │ 29 │
└─────────┴──────────────┘
This is the record for Starlink 36609.
$ echo "FROM 'satcat.parquet'
WHERE Name = 'Starlink 36609'
LIMIT 1" \
| ~/duckdb -json \
| jq -S .
[
{
"AltNames": null,
"Apogee": 260,
"Attach_flag": null,
"Bus": "Starlink V2MO",
"Coarse_type": "Payload (for orbital attempt)",
"DDate": null,
"Dest": null,
"Diameter": 4.099999904632568,
"DryMass": "530",
"ID_flag": null,
"Inc": 97.27999877929688,
"JCAT": "S67582",
"LDate": "2026 Jan 25",
"Launch_Tag": "2026-016",
"Length": 0.30000001192092896,
"Manufacturer": "SPXS",
"Mass": 575.0,
"Motor": null,
"Name": "Starlink 36609",
"ODate": "2026 Jan 26",
"OQUAL": null,
"OpOrbit": "LLEO/S",
"Orbit_flag": null,
"Owner": "SPXS",
"PLName": "Starlink Group 17-20-22",
"Parent": "A11741",
"Perigee": "247",
"Piece": "2026-016X",
"Primary": "Earth",
"SDate": "2026 Jan 25 1832:44",
"SDate_Year": 2026,
"Satcat": "67582",
"Shape": "Box + pan",
"Span": 29.0,
"State": "US",
"Status": "O",
"Status_or_failure_flag": "Satellite is active in operational orbit",
"Subtype_flag": null,
"TotMass": 575.0,
"Type": "P O"
}
]
Below are the counts for each "start year" for Starlink satellites.
$ ~/duckdb
SELECT SDate_Year,
COUNT(*)
FROM 'satcat.parquet'
WHERE Name LIKE 'Starlink%'
GROUP BY 1
ORDER BY 1;
┌────────────┬──────────────┐
│ SDate_Year │ count_star() │
│ int32 │ int64 │
├────────────┼──────────────┤
│ 2019 │ 128 │
│ 2020 │ 890 │
│ 2021 │ 1063 │
│ 2022 │ 1826 │
│ 2023 │ 2065 │
│ 2024 │ 1964 │
│ 2025 │ 3169 │
│ 2026 │ 1076 │
└────────────┴──────────────┘
Below is a breakdown of unique values and NULL coverage across each column for Starlink satellites.
SELECT column_name,
column_type[:30],
null_percentage,
approx_unique,
min[:30],
max[:30]
FROM (SUMMARIZE
FROM 'satcat.parquet'
WHERE Name LIKE 'Starlink%')
ORDER BY LOWER(column_name);
┌────────────────────────┬──────────────────┬─────────────────┬───────────────┬────────────────────────────────┬────────────────────────────────┐
│ column_name │ column_type[:30] │ null_percentage │ approx_unique │ min[:30] │ max[:30] │
│ varchar │ varchar │ decimal(9,2) │ int64 │ varchar │ varchar │
├────────────────────────┼──────────────────┼─────────────────┼───────────────┼────────────────────────────────┼────────────────────────────────┤
│ AltNames │ VARCHAR │ 99.98 │ 3 │ 6 │ VisorSat │
│ Apogee │ INTEGER │ 0.00 │ 170 │ 232 │ 588 │
│ Attach_flag │ VARCHAR │ 100.00 │ 0 │ NULL │ NULL │
│ Bus │ VARCHAR │ 2.92 │ 4 │ Starlink │ Starlink V2MO │
│ Coarse_type │ VARCHAR │ 0.00 │ 2 │ Component │ Payload (for orbital attempt) │
│ DDate │ VARCHAR │ 84.60 │ 1582 │ 2020 Apr 2 1100? │ 2026 Mar 31 0937 │
│ Dest │ VARCHAR │ 92.28 │ 1024 │ 0.5W 43.8S │ 9W 13S │
│ Diameter │ FLOAT │ 0.00 │ 6 │ 0.02 │ 4.1 │
│ DryMass │ VARCHAR │ 0.00 │ 16 │ 100 │ 910 │
│ ID_flag │ VARCHAR │ 97.95 │ 2 │ Starlink filing with FCC repor │ The association of this satell │
│ Inc │ FLOAT │ 0.00 │ 37 │ 42.98 │ 97.68 │
│ JCAT │ VARCHAR │ 0.00 │ 10039 │ S44235 │ S68825 │
│ Launch_Tag │ VARCHAR │ 0.00 │ 423 │ 2019-029 │ 2026-089 │
│ LDate │ VARCHAR │ 0.00 │ 368 │ 2019 May 24 │ 2026 Mar 30 │
│ Length │ FLOAT │ 0.00 │ 4 │ 0.2 │ 7.0 │
│ Manufacturer │ VARCHAR │ 0.00 │ 1 │ SPXS │ SPXS │
│ Mass │ FLOAT │ 0.00 │ 15 │ 5.0 │ 960.0 │
│ Motor │ VARCHAR │ 100.00 │ 0 │ NULL │ NULL │
│ Name │ VARCHAR │ 0.00 │ 10070 │ Starlink 1007 │ Starlink rideshare adapter │
│ ODate │ VARCHAR │ 0.00 │ 557 │ 2019 May 26 │ 2026 Mar 31 │
│ OpOrbit │ VARCHAR │ 0.00 │ 2 │ LLEO/I │ LLEO/S │
│ OQUAL │ VARCHAR │ 99.95 │ 5 │ - s │ -x │
│ Orbit_flag │ VARCHAR │ 100.00 │ 0 │ NULL │ NULL │
│ Owner │ VARCHAR │ 0.00 │ 1 │ SPXS │ SPXS │
│ Parent │ VARCHAR │ 0.00 │ 368 │ A09431 │ S67673 │
│ Perigee │ VARCHAR │ 0.00 │ 189 │ 136 │ 566 │
│ Piece │ VARCHAR │ 0.00 │ 12093 │ 2019-029A │ 2026-089Z │
│ PLName │ VARCHAR │ 2.87 │ 12746 │ Starlink Group 10-1-01 │ adapter? │
│ Primary │ VARCHAR │ 0.00 │ 1 │ Earth │ Earth │
│ Satcat │ VARCHAR │ 0.00 │ 9708 │ 44235 │ NNA │
│ SDate │ VARCHAR │ 0.00 │ 471 │ 2019 May 24 0332 │ 2026 Mar 30 2216:37 │
│ SDate_Year │ INTEGER │ 0.00 │ 9 │ 2019 │ 2026 │
│ Shape │ VARCHAR │ 0.00 │ 4 │ Box + pan │ Plate │
│ Span │ FLOAT │ 0.00 │ 5 │ 2.0 │ 29.0 │
│ State │ VARCHAR │ 0.00 │ 1 │ US │ US │
│ Status │ VARCHAR │ 0.00 │ 2 │ O │ R │
│ Status_or_failure_flag │ VARCHAR │ 2.92 │ 9 │ Satellite active orbit lowerin │ Satellite removed slighly (and │
│ Subtype_flag │ VARCHAR │ 97.08 │ 1 │ Payload adapter, support struc │ Payload adapter, support struc │
│ TotMass │ FLOAT │ 0.00 │ 15 │ 5.0 │ 960.0 │
│ Type │ VARCHAR │ 0.00 │ 63 │ C A │ P T r │
└────────────────────────┴──────────────────┴─────────────────┴───────────────┴────────────────────────────────┴────────────────────────────────┘
Below are the status counts for Starlink satellites.
SELECT COUNT(*),
Status_or_failure_flag
FROM 'satcat.parquet'
WHERE Name LIKE 'Starlink%'
GROUP BY 2
ORDER BY 1 DESC;
┌──────────────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ count_star() │ Status_or_failure_flag │
│ int64 │ varchar │
├──────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 8629 │ Satellite is active in operational orbit │
│ 1558 │ Satellite active orbit lowering to reentry │
│ 730 │ Satellite removed far from operational constellation by lowering or raising orbit │
│ 356 │ NULL │
│ 292 │ Satellite ascending: orbit raising to op orbit (or lowering from high drift orbit) │
│ 227 │ Satellite in plane drift orbit │
│ 158 │ Satellite failed in operational orbit and is undergoing uncontrolled decay - non manueverable │
│ 131 │ Satellite failed early in mission, before reaching operational orbit (”screened”) │
│ 100 │ Satellite removed slighly (and possibly temporarily) from operational constellation by lowering or raising orbit a little │
└──────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Starlink satellites have a much more documented status than usual. Of the 68K+ records, almost 55K of them don't have any status or failure flag to speak of.
SELECT COUNT(*),
Status_or_failure_flag
FROM 'satcat.parquet'
GROUP BY 2
ORDER BY 1 DESC;
┌──────────────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ count_star() │ Status_or_failure_flag │
│ int64 │ varchar │
├──────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 54843 │ NULL │
│ 10168 │ Satellite is active in operational orbit │
│ 1634 │ Satellite active orbit lowering to reentry │
│ 735 │ Satellite removed far from operational constellation by lowering or raising orbit │
│ 392 │ Satellite in plane drift orbit │
│ 344 │ Satellite ascending: orbit raising to op orbit (or lowering from high drift orbit) │
│ 238 │ Satellite failed in operational orbit and is undergoing uncontrolled decay - non manueverable │
│ 165 │ Satellite retired to a graveyard orbit of some kind │
│ 144 │ Satellite failed early in mission, before reaching operational orbit (”screened”) │
│ 136 │ Satellite removed slighly (and possibly temporarily) from operational constellation by lowering or raising orbit a little │
│ 33 │ This object was the one that failed during launch. │
│ 2 │ Satellite was used for special tests outside of main constellation │
└──────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Below are the Starlink bus counts.
SELECT Bus,
COUNT(*)
FROM 'satcat.parquet'
WHERE Name LIKE 'Starlink%'
GROUP BY 1;
┌───────────────┬──────────────┐
│ Bus │ count_star() │
│ varchar │ int64 │
├───────────────┼──────────────┤
│ Starlink V2MD │ 663 │
│ NULL │ 356 │
│ Starlink V2M │ 2760 │
│ Starlink V2MO │ 3722 │
│ Starlink │ 4680 │
└───────────────┴──────────────┘
Below are the Starlink operational orbit counts.
SELECT OpOrbit,
COUNT(*)
FROM 'satcat.parquet'
WHERE Name LIKE 'Starlink%'
GROUP BY 1;
┌─────────┬──────────────┐
│ OpOrbit │ count_star() │
│ varchar │ int64 │
├─────────┼──────────────┤
│ LLEO/I │ 11110 │
│ LLEO/S │ 1071 │
└─────────┴──────────────┘
The Type column in this dataset is where a lot of interesting data is baked away. I still have work to do on decoding each of the 12-character strings. If you're interested in working on this, page 34 onward of GCAT's documentation describes the Type column in detail.