Posted on Sun 05 May 2024 under Video

Minimalist Guide to AV1 Video Encoding

AV1 is an open source, royalty-free video codec. It was released in 2018 by the Alliance for Open Media (AOMedia). This group includes firms such as ARM, Apple, Intel, Nvidia and Samsung.

AV1 encodings can often be 4x smaller than an equivalent x264 encoding and half the size of an x265 encoding.

Much of the 8K footage I've come across needs about 1 GB of storage for every minute of video. This is due to x265 encoding being used as it is one of the few codecs that enjoys widespread hardware support at this resolution. But x265 also demands royalty fees from content producers.

To download an 8K video in real-time would require a solid 135 Mbps Internet connection. 8K might be a novelty for watching movies in your living room but it's more of a minimum requirement for Virtual Reality (VR).

Also, asking someone to save 60 GB on their phone for every hour of content they want to consume before a flight is asking a lot, even of the larger flagship smartphones.

AV1 would cut these requirements down by at least 2-3x and would be a huge step forward.
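The figures above are easy to sanity-check. Below is a minimal sketch of the arithmetic, assuming decimal units (1 GB = 1,000 MB) and treating the 2x and 3x reductions as hoped-for savings rather than measurements. At 1 GB per minute it lands at roughly 133 Mbps, in line with the 135 Mbps quoted above.

```python
# Back-of-the-envelope bandwidth and storage figures for the 8K numbers above.
# Assumes decimal units (1 GB = 1,000 MB) and 8 bits per byte.


def mbps_for(gb_per_minute: float) -> float:
    """Sustained bitrate, in megabits per second, needed to stream in real time."""
    megabytes_per_second = gb_per_minute * 1000 / 60
    return megabytes_per_second * 8


def gb_per_hour(gb_per_minute: float) -> float:
    """Storage needed for one hour of footage."""
    return gb_per_minute * 60


hevc_rate = 1.0  # GB per minute, the rough figure quoted above

print(f'HEVC: {mbps_for(hevc_rate):.0f} Mbps, '
      f'{gb_per_hour(hevc_rate):.0f} GB/hour')

# The 2x and 3x reductions are assumed AV1 savings, not measured results.
for reduction in (2, 3):
    rate = hevc_rate / reduction
    print(f'AV1 at {reduction}x smaller: '
          f'{mbps_for(rate):.0f} Mbps, {gb_per_hour(rate):.0f} GB/hour')
```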

There are four major open source software implementations of AV1. I'll briefly cover three of them (aom, dav1d, rav1e) before going more into depth with SVT-AV1.

My Workstation

I'm using a 6 GHz Intel Core i9-14900K CPU. It has 8 performance cores and 16 efficiency cores with a total of 32 threads and 32 MB of L2 cache. It has a liquid cooler attached and is housed in a spacious, full-sized, Cooler Master HAF 700 computer case. I've come across videos on YouTube where people have managed to overclock the i9-14900KF to 9.1 GHz.

The system has 96 GB of DDR5 RAM clocked at 6,000 MT/s and a 5th-generation, Crucial T700 4 TB NVMe M.2 SSD which can read at speeds up to 12,400 MB/s. There is a heatsink on the SSD to help keep its temperature down. This is my system's C drive.

There is also a 2 TB SATA-connected SSD that will be used to store the videos and virtual machine used in this post. This is my system's D drive.

The system is powered by a 1,200-watt, fully modular, Corsair Power Supply and is sat on an ASRock Z790 Pro RS Motherboard.

I'm running Ubuntu 22 LTS via VirtualBox on Windows 11 Pro. VirtualBox has been allocated 100 GB of SATA-connected SSD capacity, 16 GB of RAM and 8 CPU cores. This will help isolate the vast and fast-changing libraries FFMPEG and AV1 encoding rely on.

In case you're wondering why I don't run a Linux-based desktop as my primary work environment, I'm still using an Nvidia GTX 1080 GPU which has better driver support on Windows and I use ArcGIS Pro from time to time which only supports Windows natively.

Installing Prerequisites

I'll be using Python, a few build tools and a few pre-compiled non-AV1 video codecs in this post.

$ sudo apt update
$ sudo apt install \
    build-essential \
    cmake \
    libmp3lame-dev \
    libnuma-dev \
    libssl-dev \
    libunistring-dev \
    libx264-dev \
    libx265-dev \
    libgnutls28-dev \
    libfreetype6-dev \
    nasm \
    pkg-config \
    python3-venv \
    unzip

Rust was used to build rav1e so I'll install its build environment.

$ curl --proto '=https' \
       --tlsv1.2 \
       -sSf \
       https://sh.rustup.rs \
    | sh
$ source ~/.cargo/env

If you already have Rust installed, this command will make sure it's the latest available.

$ rustup update
stable-x86_64-unknown-linux-gnu unchanged - rustc 1.78.0 (9b00956e5 2024-04-29)

I'll be using JSON Convert (jc) to convert the output of git into JSON. This will make it much easier to compile statistics on the git repositories discussed in this post.

$ wget https://github.com/kellyjonbrazil/jc/releases/download/v1.25.2/jc_1.25.2-1_amd64.deb
$ sudo dpkg -i jc_1.25.2-1_amd64.deb

I'll set up a Python Virtual Environment and install a few packages.

$ python3 -m venv ~/.av1
$ source ~/.av1/bin/activate
$ python3 -m pip install \
    pygments \
    shpyx \
    tabulate \
    typer

I'll be using FFMPEG version 7 in this post. I'll fetch the code here and compile it later on in the post.

$ git clone https://github.com/FFmpeg/FFmpeg ~/ffmpeg
$ cd ~/ffmpeg
$ git checkout n7.0

I'll use Netflix's VMAF to determine the quality of various video encodings produced in this post.

$ git clone https://github.com/Netflix/vmaf ~/vmaf
$ wget -O ~/vmaf_bin 'https://github.com/Netflix/vmaf/releases/download/v2.3.1/vmaf'
$ chmod +x ~/vmaf_bin

I'll also use DuckDB, along with its JSON extension in this post.

$ cd ~
$ wget -c https://github.com/duckdb/duckdb/releases/download/v0.10.2/duckdb_cli-linux-amd64.zip
$ unzip -j duckdb_cli-linux-amd64.zip
$ chmod +x ~/duckdb
$ ~/duckdb
INSTALL json;

I'll set up DuckDB to load every installed extension each time it launches.

$ vi ~/.duckdbrc
.timer on
.width 180
LOAD json;

I've written a BASH function which will produce a pivot table of the most active contributors in any given git repository.

$ function top_ten_committers () {
      git log \
        --format=fuller \
        --stat \
          | jc --git-log-s \
          | ~/duckdb -c "
                CREATE OR REPLACE TABLE commits AS
                    SELECT * EXCLUDE(commit_by_date),
                           strptime(commit_by_date,
                                    '%a %b %-d %H:%M:%S %Y %z')
                                AS commit_by_date,
                           strftime(strptime(commit_by_date,
                                    '%a %b %-d %H:%M:%S %Y %z')::DATE,
                                    '%Y') AS year
                    FROM read_json_auto('/dev/stdin');

                CREATE OR REPLACE TEMPORARY TABLE top_committers AS
                    SELECT commit_by,
                           COUNT(*) AS num_commits
                    FROM commits
                    GROUP BY 1
                    ORDER BY 2 DESC
                    LIMIT 10;

                CREATE OR REPLACE TEMPORARY TABLE top_commits AS
                    SELECT commits.*, top_committers.num_commits
                    FROM commits
                    JOIN top_committers
                        ON commits.commit_by = top_committers.commit_by
                    WHERE commits.commit_by IN (
                        SELECT commit_by
                        FROM top_committers);

                WITH pivot_alias AS (
                    PIVOT    top_commits
                    ON       year
                    USING    COUNT(*)
                    GROUP BY commit_by

                )
                SELECT   pivot_alias.*,
                         top_committers.num_commits AS total_commits
                FROM     pivot_alias
                JOIN top_committers
                    ON pivot_alias.commit_by = top_committers.commit_by
                ORDER BY top_committers.num_commits DESC;"
  }
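For anyone who finds the SQL above hard to follow, here is a rough Python sketch of the same idea: count commits per committer, keep the busiest ten and break each one's total down by year. The function name and the sample data are made up for illustration and this isn't what the BASH function runs.

```python
from collections import Counter, defaultdict


def pivot_commits(commits, top_n=10):
    """Pivot (committer, year) pairs into per-year counts.

    Returns a list of (committer, {year: count}, total) tuples,
    busiest committer first, limited to the top_n committers.
    """
    commits = list(commits)  # allow generators; we iterate twice
    totals = Counter(committer for committer, _ in commits)

    per_year = defaultdict(Counter)
    for committer, year in commits:
        per_year[committer][year] += 1

    return [(name, dict(per_year[name]), total)
            for name, total in totals.most_common(top_n)]


# Made-up sample data, purely for illustration.
sample = [('alice', 2023), ('alice', 2024), ('bob', 2024), ('alice', 2024)]

for row in pivot_commits(sample):
    print(row)
```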

Example Video Files

I've recorded two videos of my living room with my Samsung S22 Ultra for this post. They're both about 15 seconds long. One is in 4K at 60 FPS and is 139 MB in H.264 / x264 encoding. The other is in 8K at 24 FPS and is 155 MB in H.265 / HEVC encoding.

The following shows their video encoding details in JSON format.

$ video_codec () {
     ffprobe -v error \
             -hide_banner \
             -print_format json \
             -show_streams "$1" \
          | jq .streams \
          | jq -S 'first(.[] | if .codec_type == "video" then . else empty end)' \
          | jq 'del(.disposition)'
  }
$ video_codec s22.4k.60fps.mp4
{
  "avg_frame_rate": "2169000/36137",
  "bit_rate": "71819495",
  "bits_per_raw_sample": "8",
  "chroma_location": "left",
  "closed_captions": 0,
  "codec_long_name": "H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10",
  "codec_name": "h264",
  "codec_tag": "0x31637661",
  "codec_tag_string": "avc1",
  "codec_type": "video",
  "coded_height": 2160,
  "coded_width": 3840,
  "color_primaries": "bt709",
  "color_range": "tv",
  "color_space": "bt709",
  "color_transfer": "bt709",
  "duration": "16.060889",
  "duration_ts": 1445480,
  "has_b_frames": 0,
  "height": 2160,
  "index": 0,
  "is_avc": "true",
  "level": 52,
  "nal_length_size": "4",
  "nb_frames": "964",
  "pix_fmt": "yuv420p",
  "profile": "High",
  "r_frame_rate": "60/1",
  "refs": 1,
  "start_pts": 0,
  "start_time": "0.000000",
  "tags": {
    "creation_time": "2024-05-04T14:07:24.000000Z",
    "handler_name": "VideoHandle",
    "language": "eng",
    "vendor_id": "[0][0][0][0]"
  },
  "time_base": "1/90000",
  "width": 3840
}
$ video_codec s22.8k.24fps.mp4
{
  "avg_frame_rate": "35640000/1484399",
  "bit_rate": "79968271",
  "chroma_location": "left",
  "closed_captions": 0,
  "codec_long_name": "H.265 / HEVC (High Efficiency Video Coding)",
  "codec_name": "hevc",
  "codec_tag": "0x31637668",
  "codec_tag_string": "hvc1",
  "codec_type": "video",
  "coded_height": 4320,
  "coded_width": 7680,
  "color_primaries": "bt709",
  "color_range": "tv",
  "color_space": "bt709",
  "color_transfer": "bt709",
  "duration": "16.493322",
  "duration_ts": 1484399,
  "has_b_frames": 0,
  "height": 4320,
  "index": 0,
  "level": 183,
  "nb_frames": "396",
  "pix_fmt": "yuv420p",
  "profile": "Main",
  "r_frame_rate": "24/1",
  "refs": 1,
  "start_pts": 2943,
  "start_time": "0.032700",
  "tags": {
    "creation_time": "2024-05-04T14:05:47.000000Z",
    "handler_name": "VideoHandle",
    "language": "eng",
    "vendor_id": "[0][0][0][0]"
  },
  "time_base": "1/90000",
  "width": 7680
}
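ffprobe reports frame rates as fractions and bit rates as strings, which makes them awkward to read at a glance. The sketch below, using helper names of my own choosing, turns the values above into plain numbers. The two clips come out at roughly 60.02 and 24.01 FPS. The size estimate only covers the video stream, uses decimal megabytes and ignores audio and container overhead.

```python
from fractions import Fraction


def fps(avg_frame_rate: str) -> float:
    """Convert ffprobe's 'numerator/denominator' frame rate to a float."""
    return float(Fraction(avg_frame_rate))


def video_stream_mb(bit_rate: str, duration: str) -> float:
    """Approximate size of the video stream alone, in decimal megabytes."""
    return int(bit_rate) * float(duration) / 8 / 1_000_000


# Values copied from the ffprobe output above.
clips = {
    's22.4k.60fps.mp4': ('2169000/36137', '71819495', '16.060889'),
    's22.8k.24fps.mp4': ('35640000/1484399', '79968271', '16.493322'),
}

for name, (rate, bit_rate, duration) in clips.items():
    print(f'{name}: {fps(rate):.2f} FPS, '
          f'~{video_stream_mb(bit_rate, duration):.0f} MB of video')
```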

AOMedia's Reference Implementation

The first codec is AOMedia's Reference Implementation aom. It's made up of almost 500K lines of C++.

The codebase started back in 2010 and has seen fewer and fewer commits as the codebase has matured. Below are the top ten committers / identities broken down by year.

$ git clone https://github.com/ihtsae/aom/ ~/aom
$ cd ~/aom
$ top_ten_committers
┌────────────────────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────────────┐
│     commit_by      │ 2010  │ 2011  │ 2012  │ 2013  │ 2014  │ 2015  │ 2016  │ 2017  │ 2018  │ 2019  │ 2020  │ 2021  │ 2022  │ total_commits │
│      varchar       │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │     int64     │
├────────────────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────────────┤
│ Gerrit Code Review │     0 │    85 │   394 │  1827 │  2304 │  1122 │  1331 │     0 │     0 │     0 │     0 │     0 │     0 │          7063 │
│ Yaowu Xu           │    27 │    62 │   125 │   185 │   152 │   156 │   657 │   228 │   315 │   120 │    59 │    17 │     0 │          2103 │
│ Jingning Han       │     0 │     0 │     6 │   211 │   254 │   223 │   413 │   353 │   140 │   134 │   133 │   136 │     6 │          2009 │
│ Yunqing Wang       │    26 │    67 │    43 │    74 │    84 │    47 │    26 │    58 │   402 │   333 │   388 │   408 │     6 │          1962 │
│ Deb Mukherjee      │     0 │     0 │    64 │    87 │   123 │    39 │   185 │   441 │   370 │   144 │    43 │     4 │     0 │          1500 │
│ James Zern         │     4 │    17 │    28 │   156 │   304 │   289 │   242 │   155 │    72 │    32 │    22 │    79 │     1 │          1401 │
│ Dmitry Kovalev     │     0 │     0 │     0 │   580 │   568 │     0 │     0 │     0 │     0 │     0 │     0 │     0 │     0 │          1148 │
│ John Koleszar      │   288 │   490 │   179 │   173 │     1 │     0 │     0 │     0 │     0 │     0 │     0 │     0 │     0 │          1131 │
│ Angie Chiang       │     0 │     0 │     0 │     0 │     0 │    43 │   149 │   269 │   194 │     3 │    20 │    99 │     3 │           780 │
│ Hui Su             │     0 │     0 │     1 │     0 │     3 │    49 │    98 │   133 │   241 │   179 │    45 │     2 │     0 │           751 │
├────────────────────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────────────┤
│ 10 rows                                                                                                                         15 columns │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Given that this is a reference implementation and it's likely to be slower than other codecs in this post, I won't go into too much detail.

VideoLAN's dav1d

The VideoLAN project, famous for their VLC media player, has built their own AV1 codec called 'dav1d'. This project only supports decoding, not encoding so I'll keep my analysis brief.

It's made up of 230K lines of assembler for the wide array of architectures it supports. This code sits alongside ~38K lines of C. The x86 implementation uses over 290 unique assembler instructions.

The project appears to still be very active with most of the top ten contributors to the code base still working on it in 2024.

$ git clone https://code.videolan.org/videolan/dav1d ~/dav1d
$ cd ~/dav1d
$ top_ten_committers
┌───────────────────────────────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────────────┐
│           commit_by           │ 2018  │ 2019  │ 2020  │ 2021  │ 2022  │ 2023  │ 2024  │ total_commits │
│            varchar            │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │     int64     │
├───────────────────────────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────────────┤
│ Henrik Gramner                │    58 │   114 │    70 │   114 │    66 │    66 │    42 │           530 │
│ Martin Storsjö                │    14 │    83 │   135 │    76 │     7 │    29 │    12 │           356 │
│ Ronald S. Bultje              │   181 │    47 │    29 │    44 │    31 │     6 │     6 │           344 │
│ Jean-Baptiste Kempf           │    74 │    93 │    52 │    55 │     9 │    11 │     9 │           303 │
│ Janne Grunau                  │   155 │    29 │    17 │     5 │     0 │     0 │     0 │           206 │
│ James Almer                   │    55 │    38 │     4 │     4 │    25 │    22 │     1 │           149 │
│ Victorien Le Couviour--Tuffet │     0 │    28 │    19 │    30 │    17 │    13 │     0 │           107 │
│ Nathan E. Egge                │     6 │     0 │     0 │     2 │     0 │     0 │    92 │           100 │
│ Matthias Dressel              │     0 │     1 │     9 │    34 │    23 │    14 │    12 │            93 │
│ Luc Trudeau                   │    25 │     7 │    16 │     0 │     0 │     0 │     0 │            48 │
├───────────────────────────────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────────────┤
│ 10 rows                                                                                     9 columns │
└───────────────────────────────────────────────────────────────────────────────────────────────────────┘

The project has plans for GPU support in future releases. Given the number of mobile devices in the world that don't have AV1 hardware acceleration, this support couldn't come too soon.

The Rust-based rav1e

The third AV1 codec I've looked at is rav1e. It's written in Rust and Assembler and has David Michael Barr, a long-term Samsung staff member, as one of its top contributors.

The Rust line count comes in at just under 60K while the Assembler line count is over 200K.

The top nine contributors have all produced more than 100 commits but only four of these committers have worked on this project this year. It might be a sign of the project's maturity but given other projects are still pretty busy I'm curious as to what's next for this project.

$ git clone https://github.com/xiph/rav1e ~/rav1e
$ cd ~/rav1e
$ top_ten_committers
┌────────────────────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────────────┐
│     commit_by      │ 2017  │ 2018  │ 2019  │ 2020  │ 2021  │ 2022  │ 2023  │ 2024  │ total_commits │
│      varchar       │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │     int64     │
├────────────────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────┼───────────────┤
│ Luca Barbato       │     1 │   100 │   434 │   192 │   174 │    37 │    33 │     5 │           976 │
│ David Michael Barr │     0 │    26 │   240 │   134 │   160 │   169 │   191 │     6 │           926 │
│ Thomas Daede       │    23 │   331 │   248 │    18 │     5 │     7 │     1 │     0 │           633 │
│ GitHub             │     9 │   112 │   122 │    54 │    20 │    30 │    11 │     5 │           363 │
│ Josh Holmer        │     0 │     0 │    44 │   106 │    66 │    60 │    63 │    16 │           355 │
│ Raphaël Zumer      │     0 │    30 │    96 │    33 │     1 │     0 │     0 │     0 │           160 │
│ Kyle Siefring      │     0 │     1 │    46 │    58 │     2 │    10 │     0 │     0 │           117 │
│ Monty Montgomery   │     0 │    37 │    48 │    22 │     0 │     0 │     0 │     0 │           107 │
│ Yushin Cho         │     0 │    42 │    36 │    25 │     0 │     0 │     0 │     0 │           103 │
│ vibhoothi          │     0 │     0 │    91 │     0 │     0 │     0 │     0 │     0 │            91 │
├────────────────────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────────────┤
│ 10 rows                                                                                 10 columns │
└────────────────────────────────────────────────────────────────────────────────────────────────────┘

I was able to compile a stand-alone binary from the latest main branch but I wasn't able to put together sensible steps to include rav1e support when compiling FFMPEG within the timebox I've allocated for this post.

The project boasts that it's the fastest AV1 encoder. If this topic is of interest to a decent portion of my audience, I'll take a second look at getting rav1e working with FFMPEG 7 and run some comparisons against SVT-AV1.

AOMedia's SVT-AV1

The Scalable Video Technology for AV1 codec (SVT-AV1) was first started by Intel and Netflix and later adopted by AOMedia.

The number of significant committers to this project has been shrinking since it started but they're still pushing releases that have a significant impact on encoding times and resulting quality. The main branch below is a month ahead of the v2.0.0 release from March 13th.

$ git clone https://gitlab.com/AOMediaCodec/SVT-AV1 ~/SVT-AV1
$ cd ~/SVT-AV1
$ top_ten_committers
┌────────────────────┬───────┬───────┬───────┬───────┬───────┬───────┬───────────────┐
│     commit_by      │ 2019  │ 2020  │ 2021  │ 2022  │ 2023  │ 2024  │ total_commits │
│      varchar       │ int64 │ int64 │ int64 │ int64 │ int64 │ int64 │     int64     │
├────────────────────┼───────┼───────┼───────┼───────┼───────┼───────┼───────────────┤
│ Hassene Tmar       │   286 │   233 │    46 │   244 │   109 │    33 │           951 │
│ GitHub             │   228 │   324 │     9 │     0 │     0 │     0 │           561 │
│ Christopher Degawa │    49 │     0 │   182 │   145 │    55 │     1 │           432 │
│ Joel Sole          │   196 │     0 │     0 │     0 │     0 │     0 │           196 │
│ Cidana-Developers  │    20 │     0 │   109 │     7 │     5 │     0 │           141 │
│ Chris Degawa       │   126 │     0 │     0 │     0 │     0 │     0 │           126 │
│ Worth              │     0 │     0 │     0 │   108 │     9 │     2 │           119 │
│ htmar              │   101 │     1 │     0 │     0 │     0 │     0 │           102 │
│ hguermaz           │    92 │     0 │     0 │     6 │     1 │     0 │            99 │
│ hassount           │    24 │     0 │    65 │    10 │     0 │     0 │            99 │
├────────────────────┴───────┴───────┴───────┴───────┴───────┴───────┴───────────────┤
│ 10 rows                                                                  8 columns │
└────────────────────────────────────────────────────────────────────────────────────┘

The codebase is made up of 417K lines of C++ and 26K lines of Assembler.

FFMPEG supports both encoding and decoding with SVT-AV1. Building the codec itself is straightforward and FFMPEG should find it without issue when compiling.

$ mkdir -p ~/svt_build
$ cd ~/svt_build
$ cmake ~/SVT-AV1/
$ make -j$(nproc)
$ sudo make install
$ sudo ldconfig

Compiling FFMPEG with AV1 Support

I'll first make sure any previous FFMPEG builds are removed. Stale object files from an earlier configuration can end up linked into the new binaries, so it's worth starting from a clean build tree.

$ cd ~/ffmpeg
$ make clean

The following will compile FFMPEG with x264, x265 and AV1 support via SVT-AV1.

$ ./configure \
      --prefix="$HOME/ffmpeg" \
      --pkg-config-flags="--static" \
      --extra-cflags="-I$HOME/ffmpeg/include" \
      --extra-ldflags="-L$HOME/ffmpeg/lib" \
      --extra-libs="-lpthread -lm" \
      --ld="g++" \
      --bindir="$HOME/ffmpeg/bin" \
      --enable-gpl \
      --enable-gnutls \
      --enable-libfreetype \
      --enable-libmp3lame \
      --enable-libx264 \
      --enable-libx265 \
      --enable-libsvtav1 \
      --enable-nonfree

$ make -j$(nproc)

Below I'll check that the FFMPEG binary compiled supports SVT-AV1.

$ ./ffmpeg -hide_banner \
           -codecs \
    | grep -i av1
DEV.L. av1                  Alliance for Open Media AV1 (encoders: libsvtav1)

FFMPEG Codec Support

As a side note, the FFMPEG codec list is very valuable in understanding what any given instance of FFMPEG supports. With that said, I find its format unhelpful when trying to put together a bird's-eye view.

I'll export the codec support list, parse it, convert it into JSON and analyse it in DuckDB.

$ ./ffmpeg -hide_banner \
           -codecs \
        > codecs.txt
$ python3
import json


lines = [line
         for line in open('codecs.txt').read().splitlines()
         if (line.startswith(' D') or line.startswith(' .'))
         and '=' not in line]

codecs = []

for line in lines:
    support_codec_name  = line.split('   ')[0].strip()
    description         = line.split('   ')[-1].strip()
    support, codec_name = support_codec_name.split(' ')
    codecs.append({
        'name':                      codec_name,
        'description':               description,
        'decoding_supported':        support[0] == 'D',
        'encoding_supported':        support[1] == 'E',
        'is_video_codec':            support[2] == 'V',
        'is_audio_codec':            support[2] == 'A',
        'is_subtitle_codec':         support[2] == 'S',
        'is_data_codec':             support[2] == 'D',
        'is_attachment_codec':       support[2] == 'T',
        'is_intra_frame_only_codec': support[3] == 'I',
        'lossy_compression':         support[4] == 'L',
        'lossless_compression':      support[5] == 'S'})

with open('codecs.json', 'w') as f:
    for codec in codecs:
        f.write(json.dumps(codec, sort_keys=True) + '\n')
$ ~/duckdb
CREATE OR REPLACE TABLE codecs AS
    SELECT *
    FROM READ_JSON('codecs.json');
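The parsing above relies on the codec name and description being separated by at least three spaces. As an alternative, here is a hedged sketch that pulls the six flag characters apart with a regular expression instead. The pattern and function name are my own and I've only checked them against the AV1 line shown earlier, so treat it as a starting point.

```python
import re

# Six flag characters, the codec name, then the free-text description.
CODEC_LINE = re.compile(
    r'^\s*([D.][E.][VASDT.][I.][L.][S.])\s+(\S+)\s+(.*)$')


def parse_codec_line(line):
    """Return a dict describing one codec line, or None for any other line."""
    match = CODEC_LINE.match(line)

    if match is None:
        return None

    flags, name, description = match.groups()

    # Legend lines such as ' D..... = Decoding supported' also match.
    if name == '=':
        return None

    return {'name':                 name,
            'description':          description.strip(),
            'decoding_supported':   flags[0] == 'D',
            'encoding_supported':   flags[1] == 'E',
            'codec_type':           flags[2],
            'intra_frame_only':     flags[3] == 'I',
            'lossy_compression':    flags[4] == 'L',
            'lossless_compression': flags[5] == 'S'}


print(parse_codec_line(
    ' DEV.L. av1                  '
    'Alliance for Open Media AV1 (encoders: libsvtav1)'))
```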

A large chunk of the video codecs in this installation only support decoding but the number supporting encoding (93) is still respectable.

SELECT   decoding_supported,
         encoding_supported,
         COUNT(*) num_codecs
FROM     codecs
WHERE    is_video_codec
GROUP BY 1, 2
ORDER BY 3 DESC;
┌────────────────────┬────────────────────┬────────────┐
│ decoding_supported │ encoding_supported │ num_codecs │
│      boolean       │      boolean       │   int64    │
├────────────────────┼────────────────────┼────────────┤
│ true               │ false              │        175 │
│ true               │ true               │         90 │
│ false              │ false              │          6 │
│ false              │ true               │          3 │
└────────────────────┴────────────────────┴────────────┘

Video codecs make up the largest group with audio codecs not too far behind.

SELECT   is_video_codec,
         is_audio_codec,
         is_subtitle_codec,
         is_data_codec,
         is_attachment_codec,
         COUNT(*) num_codecs
FROM     codecs
GROUP BY 1, 2, 3, 4, 5
ORDER BY 6 DESC;
┌────────────────┬────────────────┬───────────────────┬───────────────┬─────────────────────┬────────────┐
│ is_video_codec │ is_audio_codec │ is_subtitle_codec │ is_data_codec │ is_attachment_codec │ num_codecs │
│    boolean     │    boolean     │      boolean      │    boolean    │       boolean       │   int64    │
├────────────────┼────────────────┼───────────────────┼───────────────┼─────────────────────┼────────────┤
│ true           │ false          │ false             │ false         │ false               │        274 │
│ false          │ true           │ false             │ false         │ false               │        208 │
│ false          │ false          │ true              │ false         │ false               │         26 │
│ false          │ false          │ false             │ true          │ false               │         10 │
└────────────────┴────────────────┴───────────────────┴───────────────┴─────────────────────┴────────────┘

The majority of video codecs use lossy compression. Not surprising given the media-rich nature of the content FFMPEG works with and the historical hardware constraints surrounding video's distribution and storage.

SELECT   lossless_compression,
         lossy_compression,
         COUNT(*)
FROM     codecs
WHERE    is_video_codec
GROUP BY 1, 2
ORDER BY 3 DESC;
┌──────────────────────┬───────────────────┬──────────────┐
│ lossless_compression │ lossy_compression │ count_star() │
│       boolean        │      boolean      │    int64     │
├──────────────────────┼───────────────────┼──────────────┤
│ false                │ true              │          169 │
│ true                 │ false             │           87 │
│ true                 │ true              │           12 │
│ false                │ false             │            6 │
└──────────────────────┴───────────────────┴──────────────┘

54 audio codecs in this build support lossless encoding.

SELECT   lossless_compression,
         lossy_compression,
         COUNT(*)
FROM     codecs
WHERE    is_audio_codec
GROUP BY 1, 2
ORDER BY 3 DESC;
┌──────────────────────┬───────────────────┬──────────────┐
│ lossless_compression │ lossy_compression │ count_star() │
│       boolean        │      boolean      │    int64     │
├──────────────────────┼───────────────────┼──────────────┤
│ false                │ true              │          150 │
│ true                 │ false             │           50 │
│ false                │ false             │            4 │
│ true                 │ true              │            4 │
└──────────────────────┴───────────────────┴──────────────┘

SVT-AV1 Presets

There are three major settings when encoding video with SVT-AV1.

The first setting is film-grain handling. It's very difficult to achieve high efficiency when there is a lot of grain. SVT-AV1 can detect and remove film grain, encode the underlying imagery and then re-apply synthetic film grain afterwards. The test videos I'll be working with don't contain any film grain so I'll exclude this functionality from my tests below.

The second setting is the Constant Rate Factor (CRF). At 0, the codec attempts to produce pristine, lossless encoding. 35 is the default value and at 63 the codec will produce the lowest-quality, lossy output. When encoding for live streaming or video that will mostly be consumed on a mobile device, I find 40 or even 50 to produce acceptable results with few artefacts.

The third setting is the preset. These are numbered 0 to 13. Each preset contains a list of settings and functionality that will be used during the encoding process. The lower the preset number, the higher the resulting encoding quality at the expense of speed. Presets 4 - 7 are considered a middle ground between quality and encoding speed. If you're working with 8K video, only preset 8 or higher is supported in the current version of SVT-AV1.
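To make the CRF and preset settings concrete, here is a small sketch that builds an FFMPEG command line for libsvtav1 using its -crf and -preset options. The function name, the file names and the defaults of 40 and 8 are placeholders of mine. The range checks simply mirror the 0 - 63 and 0 - 13 ranges described above.

```python
import shlex


def svtav1_cmd(source, dest, crf=40, preset=8, ffmpeg='ffmpeg'):
    """Build an FFMPEG argument list for an SVT-AV1 encode.

    Audio is copied as-is; only the video stream is re-encoded.
    """
    if not 0 <= crf <= 63:
        raise ValueError('crf must be between 0 and 63')

    if not 0 <= preset <= 13:
        raise ValueError('preset must be between 0 and 13')

    return [ffmpeg, '-y',
            '-i', source,
            '-c:a', 'copy',
            '-c:v', 'libsvtav1',
            '-crf', str(crf),
            '-preset', str(preset),
            dest]


# Hypothetical file names, purely for illustration.
cmd = svtav1_cmd('input.mp4', 'output.mkv', crf=40, preset=8)
print(shlex.join(cmd))
```

Building the command as a list and only joining it for display keeps file names with spaces intact if the list is later handed to subprocess.run.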

Presets 0 - 3 appear to contain identical settings according to SVT-AV1's documentation.

Preset 4 will disable the following decision features in order to increase encoding speed:

  • Filter intra
  • Global motion compensation
  • Wedge prediction
  • Difference-weighted prediction
  • Distance-weighted prediction

Presets 5 - 7 appear to be identical and will lower the max reference frame count from 7 to 5 and disable the SG Restoration Filter.

Presets 8 - 9 appear to be identical and will reduce the block partitioning sb size from 128 to 64 and the min block size from 4 to 8.

Preset 10 will disable Non-square block partitions and disable Wiener Filter Restoration.

Preset 11 will disable the following decision features:

  • Paeth
  • Intra block copy

Preset 12 will disable the following decision features:

  • Motion Field Motion Vector
  • Overlapped Block Motion Compensation

Preset 13 isn't intended for regular use.
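The notes above can be captured in a small lookup table. This is a sketch transcribed by hand from my reading of the documentation, not something read out of SVT-AV1 itself. It assumes the changes accumulate as the preset number rises and that the last two-item list applies from preset 12.

```python
# Transcribed by hand from the preset notes above; not read from SVT-AV1.
# Keys are the first preset at which each group of changes appears.
PRESET_CHANGES = {
    4:  ['filter intra off',
         'global motion compensation off',
         'wedge prediction off',
         'difference-weighted prediction off',
         'distance-weighted prediction off'],
    5:  ['max reference frames 7 -> 5',
         'SG restoration filter off'],
    8:  ['superblock size 128 -> 64',
         'min block size 4 -> 8'],
    10: ['non-square block partitions off',
         'Wiener filter restoration off'],
    11: ['Paeth off',
         'intra block copy off'],
    12: ['motion field motion vector off',
         'overlapped block motion compensation off'],
}


def changes_at(preset):
    """List every change in effect at a preset, assuming they accumulate."""
    if not 0 <= preset <= 13:
        raise ValueError('preset must be between 0 and 13')

    changes = []

    for first_preset in sorted(PRESET_CHANGES):
        if first_preset <= preset:
            changes.extend(PRESET_CHANGES[first_preset])

    return changes


for preset in (3, 6, 10):
    print(preset, changes_at(preset))
```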

Below is a chart of the differences between each of the presets in SVT-AV1. Many features of SVT-AV1 are supported across all presets so I've excluded those from this chart.

SVT-AV1 Presets

Comparing SVT Settings

I've put together a Python script that can take a section of a video and produce multiple encodings of it using a variety of settings. The resulting file sizes, encoding rates and VMAF values will then be presented. The files are retained so they can be manually examined as well.

$ vi ~/svt.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# pylint: disable=C0116 C0209 R0916 R0912 R0915 R0914 C0201 R1732 W1514 C3001

"""Encodes a small portion of a video in multiple SVT-AV1 settings and
reports the resulting filesizes, encoding rates and VMAF values."""


from   datetime            import datetime
import json
from   itertools           import chain, product
from   os                  import makedirs, unlink
from   os.path             import abspath, getsize, splitext
from   shlex               import quote
# OSError: [Errno 18] Invalid cross-device link
from   shutil              import move as shutil_move
from   tempfile            import NamedTemporaryFile

from   pygments            import highlight
from   pygments.lexers     import BashLexer
from   pygments.formatters import TerminalFormatter
from   shpyx               import run as execute
from   tabulate            import tabulate
import typer


app = typer.Typer(rich_markup_mode='rich')
FFMPEG_DEFAULT  = 'ffmpeg'
FFPROBE_DEFAULT = 'ffprobe'
TEMP_FOLDER     = '/tmp'
alt_bin         = {'rich_help_panel': 'Alternative Binaries'}


remove_ext      = lambda filename: splitext(filename)[0]


def product_from_dict(params):
    keys, values = zip(*params.items())
    return [dict(zip(keys, bundle)) for bundle in product(*values)]


def print_bash(cmd):
    print(highlight(cmd.strip(), BashLexer(), TerminalFormatter()))


def get_size(filename: str, ffprobe: str, verbose: bool=False):
    cmd = '%(ffprobe)s ' \
          '-v error ' \
          '-hide_banner ' \
          '-print_format json ' \
          '-show_streams ' \
          '%(filename)s' % {'filename': quote(filename),
                            'ffprobe': ffprobe}

    if verbose:
        print_bash(cmd)

    try:
        data = json.loads(execute(cmd).stdout)\
                        ['streams']\
                        [0]
    except Exception as exc:
        print(exc)
        return 0, 0, False

    width, height, is_vertical = \
        data.get('width', 0), \
        data.get('height', 0), \
        data.get('height', 0) > data.get('width', 0)

    return width, height, is_vertical


def test_av1_settings(filename:str,
                      ffmpeg:str,
                      ffprobe:str,
                      point:str = '00:00:55.01',
                      seconds:int = 25,
                      vamf_threads:int = 8,
                      use_vmaf:bool = False,
                      v_codec:str = 'libsvtav1',
                      settings:dict = {},
                      verbose:bool = False):
    '''
    Compress a segment of video across a range of settings
    '''
    makedirs(TEMP_FOLDER, exist_ok=True)

    width, height, is_vertical = get_size(filename, ffprobe, verbose)

    cmd_yuv = '%(ffmpeg)s -y -v error -hide_banner ' \
              '-i %(video)s ' \
              '-ss %(point)s ' \
              '-t 00:00:%(seconds)02d ' \
              '-c:v rawvideo ' \
              '-pix_fmt yuv420p10le ' \
              '%(out)s'

    cmd_yuv_no_clip = '%(ffmpeg)s -y -v error -hide_banner ' \
                      '-i %(video)s ' \
                      '-c:v rawvideo ' \
                      '-pix_fmt yuv420p10le ' \
                      '%(out)s'

    cmd_vmaf = '/home/mark/vmaf_bin -r %(source)s '\
               '-d %(comparison_video)s ' \
               '-w %(width)d ' \
               '-h %(height)d ' \
               '-b 10 ' \
               '-p 420 ' \
               '--threads %(vamf_threads)d ' \
               '-m path=/home/mark/vmaf/model/vmaf_v0.6.1.json ' \
               '--json ' \
               '-o %(json_out)s'

    # Create source YUV
    if use_vmaf:
        source_yuv = NamedTemporaryFile(suffix='.source.yuv',
                                        delete=False,
                                        dir=TEMP_FOLDER)
        cmd_ = cmd_yuv % {
                'ffmpeg':  ffmpeg,
                'video':   quote(filename),
                'point':   point,
                'seconds': seconds,
                'out':     quote(source_yuv.name)}

        if verbose:
            print_bash(cmd_)

        try:
            execute(cmd_)
        except Exception as exc:
            print(cmd_)
            print(exc)
            unlink(source_yuv.name)
            return None

    # Generate a video
    cmd = '%(ffmpeg)s -y -v error -hide_banner ' \
          '-i %(video)s ' \
          '-ss %(point)s ' \
          '-t 00:00:%(seconds)02d ' \
          '-c:a copy ' \
          '-c:v %(v_codec)s ' \
          '%(settings)s ' \
          '%(out)s'

    metrics = []

    for settings_ in product_from_dict(settings):
        settings_token = '.'.join(['%s_%s' % (k, v)
                                   for k, v in settings_.items()])

        temp_ = NamedTemporaryFile(
                    suffix='.%s.mp4' % settings_token,
                    delete=False,
                    dir=TEMP_FOLDER)

        settings_params = ' '.join(['-%s %s' % (k, v)
                                    for k, v in settings_.items()])

        cmd_ = cmd % {
            'ffmpeg':   ffmpeg,
            'video':    quote(filename),
            'settings': settings_params,
            'point':    point,
            'seconds':  seconds,
            'out':      quote(temp_.name),
            'v_codec':  v_codec}

        if verbose:
            print_bash(cmd_)

        start_time = datetime.utcnow()

        try:
            execute(cmd_)
        except Exception as exc:
            print(cmd_)
            print(exc)

            if use_vmaf:
                unlink(source_yuv.name)

            unlink(temp_.name)
            return None

        finished = (datetime.utcnow() - start_time).total_seconds()
        realtime = float(seconds) / finished if finished else 0

        # Generate YUV
        vmaf = 0

        if use_vmaf:
            target_yuv = NamedTemporaryFile(suffix='.target.yuv',
                                            delete=False,
                                            dir=TEMP_FOLDER)
            cmd_ = cmd_yuv_no_clip % {
                'ffmpeg':  ffmpeg,
                'video':   quote(temp_.name),
                'out':     quote(target_yuv.name)}

            if verbose:
                print_bash(cmd_)

            try:
                execute(cmd_)
            except Exception as exc:
                print(cmd_)
                print(exc)
                unlink(temp_.name)
                unlink(source_yuv.name)
                unlink(target_yuv.name)
                return None

            # Get VMAF
            vmaf_json = NamedTemporaryFile(suffix='.vmaf.json',
                                           delete=False,
                                           dir=TEMP_FOLDER)

            cmd_ = cmd_vmaf % {
                'source':           quote(source_yuv.name),
                'comparison_video': quote(target_yuv.name),
                'width':            width,
                'height':           height,
                'vamf_threads':     vamf_threads,
                'json_out':         quote(vmaf_json.name)}

            if verbose:
                print_bash(cmd_)

            try:
                execute(cmd_)
            except Exception as exc:
                print(cmd_)
                print(exc)
                unlink(temp_.name)
                unlink(source_yuv.name)
                unlink(target_yuv.name)
                unlink(vmaf_json.name)
                return None

            # Average the per-frame VMAF scores; close the file when done
            with open(vmaf_json.name) as f:
                vmafs = [frame['metrics']['vmaf']
                         for frame in json.load(f)['frames']]
            vmaf = sum(vmafs) / len(vmafs) if vmafs else 0

        # Add to stats
        if use_vmaf:
            metrics.append({**settings_,
                            **{'vmaf': vmaf,
                               'realtime': '%.3f' % realtime,
                               'filesize': getsize(temp_.name)}})
        else:
            metrics.append({**settings_,
                            **{'realtime': '%.3f' % realtime,
                              'filesize': getsize(temp_.name)}})

        shutil_move(temp_.name,
                    remove_ext(abspath(filename)) +
                        '.%s.mp4' % settings_token)

        if use_vmaf:
            unlink(target_yuv.name)
            unlink(vmaf_json.name)

    # Remove source YUV and print stats
    if use_vmaf:
        unlink(source_yuv.name)

    render_table(metrics)


@app.command()
def test_svt(filename:str,
             ffmpeg:  str = typer.Option(FFMPEG_DEFAULT,  **alt_bin),
             ffprobe: str = typer.Option(FFPROBE_DEFAULT, **alt_bin),
             point:str = '00:00:55.01',
             seconds:int = 25,
             vamf_threads:int = 8,
             use_vmaf:bool = False,
             verbose:bool = False,
             # 8k+ resolution support is limited to M8 and faster presets
             presets:str = '8,12',
             crfs:str = '40,50,60'):
    presets = [int(x.strip()) for x in presets.split(',') if len(x.strip())]
    crfs    = [int(x.strip()) for x in crfs.split(',')    if len(x.strip())]

    test_av1_settings(filename=filename,
                      ffmpeg=ffmpeg,
                      ffprobe=ffprobe,
                      point=point,
                      seconds=seconds,
                      vamf_threads=vamf_threads,
                      use_vmaf=use_vmaf,
                      v_codec='libsvtav1',
                      settings= {'crf':     crfs,
                                 'preset': presets},
                      verbose=verbose)


def render_table(out:list):
    # Headers should appear in alphabetical order
    keys = sorted(set(chain(*[x.keys()
                              for x in out])))

    # Fill in any blanks
    out2 = [[None if k not in x.keys() else x[k]
             for k in keys]
             for x in out]

    print(tabulate(out2,
                   headers=keys,
                   tablefmt='orgtbl',
                   floatfmt='.3f',
                   intfmt=','))


if __name__ == "__main__":
    app()
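
The settings grid is expanded by `product_from_dict` before any encoding starts. Below is a minimal sketch of what that expansion yields for the default presets and CRFs. The function body is copied from the script and I'm assuming `product` comes from `itertools`. The names `grid`, `token` and `flags` are only for illustration.

```python
from itertools import product


def product_from_dict(params):
    # Same helper as in the script: every combination of the listed values
    keys, values = zip(*params.items())
    return [dict(zip(keys, bundle)) for bundle in product(*values)]


grid = product_from_dict({'crf': [40, 50, 60], 'preset': [8, 12]})

for settings in grid:
    # Suffix used in the output filename
    token = '.'.join('%s_%s' % (k, v) for k, v in settings.items())
    # Flags appended to the ffmpeg command
    flags = ' '.join('-%s %s' % (k, v) for k, v in settings.items())
    print(token, '->', flags)
```

Three CRFs and two presets give six combinations, so each run produces six encodes.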

Below I'll produce 6 videos covering presets 8 and 12 and CRF rates 40, 50 and 60. I'll run this on the 4K example video.

$ python3 ~/svt.py \
    test-svt \
    --ffmpeg=$HOME/ffmpeg/ffmpeg \
    --ffprobe=$HOME/ffmpeg/ffprobe \
    --verbose \
    --use-vmaf \
    --point=00:00:03.01 \
    --seconds=2 \
    --presets=8,12 \
    --crfs=40,50,60 \
    s22.4k.60fps.mp4
|   crf |   filesize |   preset |   realtime |   vmaf |
|-------+------------+----------+------------+--------|
|    40 |  2,317,739 |        8 |      0.338 | 93.195 |
|    40 |  2,363,697 |       12 |      0.572 | 90.583 |
|    50 |    988,051 |        8 |      0.371 | 88.234 |
|    50 |    961,043 |       12 |      0.594 | 84.167 |
|    60 |    439,489 |        8 |      0.376 | 82.098 |
|    60 |    404,771 |       12 |      0.640 | 75.484 |

The preset values didn't have much of an impact on the resulting file sizes but did have an impact on both the encoding rates and the resulting VMAF values.

The CRF setting has a dramatic impact on the resulting filesize. It's interesting that at CRF 60, the two presets produce files within about 8% of one another in size but the VMAF score fell from 82.1 at preset 8 to 75.5 at preset 12. Even though the encoding rate rose by about 1.7x, the loss in quality calls into question prioritising encoding time above all else.
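
To put numbers on the preset trade-off, here is a rough sketch that compares preset 12 against preset 8 at each CRF. The figures are copied from the 4K table above. The helper `compare_presets` and the dictionary layout are hypothetical and not part of the script.

```python
# (crf, preset) -> (filesize in bytes, realtime factor, mean VMAF)
results_4k = {
    (40, 8):  (2_317_739, 0.338, 93.195),
    (40, 12): (2_363_697, 0.572, 90.583),
    (50, 8):  (988_051,   0.371, 88.234),
    (50, 12): (961_043,   0.594, 84.167),
    (60, 8):  (439_489,   0.376, 82.098),
    (60, 12): (404_771,   0.640, 75.484),
}


def compare_presets(results, crf, slow=8, fast=12):
    # Hypothetical helper: how the faster preset differs from the slower one
    size_s, rt_s, vmaf_s = results[(crf, slow)]
    size_f, rt_f, vmaf_f = results[(crf, fast)]

    return {'size_change_pct': 100.0 * (size_f - size_s) / size_s,
            'speedup':         rt_f / rt_s,
            'vmaf_drop':       vmaf_s - vmaf_f}


for crf in (40, 50, 60):
    diff = compare_presets(results_4k, crf)
    print('CRF %d: size %+.1f%%, speed %.2fx, VMAF -%.3f' % (
        crf,
        diff['size_change_pct'],
        diff['speedup'],
        diff['vmaf_drop']))
```

These are single 2-second samples from one clip, so the ratios should be read as indicative rather than as a benchmark.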

Irrespective of the results, no setting was able to encode any faster than 0.640x of real-time on my system. This is a testament to the heavy CPU overhead AV1 demands over codecs like x264. When I run FFmpeg outside of a virtual machine, it is able to utilise every CPU core on my system. In that case, if I'm sourcing media from mechanical disks, I often find that the disk's sequential throughput (usually ~125 MB/s) becomes the bottleneck.
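
The realtime column is the clip length divided by the wall-clock encoding time, so it can be inverted to estimate how long an encode would take. The sketch below assumes the rate measured on a 2-second clip holds for longer footage, which it may well not. `encode_seconds` is a hypothetical helper.

```python
def encode_seconds(duration_s: float, realtime: float) -> float:
    # realtime = duration / elapsed, so elapsed = duration / realtime
    if realtime <= 0:
        raise ValueError('realtime factor must be positive')

    return duration_s / realtime


# The 2-second clip at the slowest and fastest rates in the 4K table
print('%.1f seconds' % encode_seconds(2, 0.338))
print('%.1f seconds' % encode_seconds(2, 0.640))

# One hour of footage at the fastest rate, under the assumption above
print('%.1f minutes' % (encode_seconds(3600, 0.640) / 60))
```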

I'll run the same test on the 8K video.

$ python3 ~/svt.py \
    test-svt \
    --ffmpeg=$HOME/ffmpeg/ffmpeg \
    --ffprobe=$HOME/ffmpeg/ffprobe \
    --verbose \
    --use-vmaf \
    --point=00:00:03.01 \
    --seconds=2 \
    --presets=8,12 \
    --crfs=40,50,60 \
    s22.8k.24fps.mp4
|   crf |   filesize |   preset |   realtime |   vmaf |
|-------+------------+----------+------------+--------|
|    40 |  3,297,547 |        8 |      0.181 | 96.584 |
|    40 |  3,783,168 |       12 |      0.271 | 96.030 |
|    50 |  1,394,154 |        8 |      0.187 | 87.954 |
|    50 |  1,651,516 |       12 |      0.285 | 86.836 |
|    60 |    623,431 |        8 |      0.203 | 74.976 |
|    60 |    740,496 |       12 |      0.316 | 72.693 |

The spread in VMAF values is even more dramatic in this example. Even worse, preset 12 produces larger files with lower VMAF scores than its preset 8 counterpart.
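
The file sizes can also be turned into approximate bitrates. The sketch below uses the figures from the 8K table and assumes each output is exactly 2 seconds long, as requested with `--seconds=2`. The files also contain the copied audio stream and container overhead, so these are overall bitrates rather than video-only ones. The helper `mbps` is hypothetical.

```python
CLIP_SECONDS = 2  # Assumed from --seconds=2

# (crf, preset) -> (filesize in bytes, mean VMAF)
results_8k = {
    (40, 8):  (3_297_547, 96.584),
    (40, 12): (3_783_168, 96.030),
    (50, 8):  (1_394_154, 87.954),
    (50, 12): (1_651_516, 86.836),
    (60, 8):  (623_431,   74.976),
    (60, 12): (740_496,   72.693),
}


def mbps(filesize_bytes, seconds=CLIP_SECONDS):
    # Bytes to megabits per second over the clip duration
    return filesize_bytes * 8 / seconds / 1_000_000


# True where preset 12 is both larger and lower quality than preset 8
worse = {}

for crf in (40, 50, 60):
    size_8,  vmaf_8  = results_8k[(crf, 8)]
    size_12, vmaf_12 = results_8k[(crf, 12)]
    worse[crf] = size_12 > size_8 and vmaf_12 < vmaf_8

    print('CRF %d: preset 8 %.2f Mbps, preset 12 %.2f Mbps, '
          'preset 12 worse on both: %s' % (crf,
                                           mbps(size_8),
                                           mbps(size_12),
                                           worse[crf]))
```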

Copyright © 2014 - 2024 Mark Litwintschik. This site's template is based off a template by Giulio Fidente.