Versatile Video Coding

Versatile Video Coding (VVC), also known as H.266, is a video encoding standard. It's the successor to H.265 and H.264, the latter of which is still extremely popular with apps and streaming sites delivering video to their end users. At comparable quality, H.265 is able to produce videos half the size of H.264 files, and VVC takes this further by producing files 25-50% smaller than their H.265 counterparts.
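The two savings compound. Below is a back-of-the-envelope sketch that takes the ratios above at face value (H.265 at half the size of H.264, VVC 25-50% smaller than H.265) and applies them to a hypothetical 100 MB H.264 file. The helper name is my own.

```python
def compounded_size(h264_mb: float, h265_ratio: float, vvc_saving: float) -> float:
    """Estimate a VVC file size from an H.264 file size.

    h265_ratio: H.265 size as a fraction of the H.264 size (0.5 = half).
    vvc_saving: fraction VVC saves over H.265 (0.25 = 25% smaller).
    """
    h265_mb = h264_mb * h265_ratio
    return h265_mb * (1 - vvc_saving)


# Hypothetical 100 MB H.264 file, using the ratios quoted above.
for saving in (0.25, 0.50):
    print(f'{saving:.0%} saving over H.265: {compounded_size(100, 0.5, saving):.1f} MB')
```

Taken at face value, a VVC file would land at roughly 25-37.5% of the H.264 size. Real-world savings depend heavily on the content and the encoder settings.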

H.264 supports resolutions of up to 4K (4096 x 2160 pixels), and H.265 expanded this to 8K. VVC maxes out at 16K and also supports 360-degree video.

Seven major aspects of H.265 encoding were expanded and enhanced for VVC. Among them: more coding units were added to its partitioning, the inter-prediction system had its algorithm count doubled, more transforms were added, entropy coding became much more sophisticated and the number of loop filters doubled.

The major drawback of VVC compared to H.265 is that videos can take ~6-7x longer to encode and demand 1.5x more compute to decode. TVs and smartphones are powerful enough that the decoding overhead shouldn't be an issue once hardware acceleration support arrives. The longer encoding times shouldn't be a concern either, as a video that could be viewed millions or billions of times only needs to be encoded once. For short-form videos like the kind you find on TikTok, a 10-year-old computer can encode 5-10 seconds of video per minute using VVC's medium preset, which is still within acceptable bounds for anyone posting to social media.

In this post, I'll walk through encoding VVC video using FFmpeg and MP4 containers. I'll also walk through the resulting MP4 files with some Python-based tooling.

Installing Prerequisites

The following was run on Ubuntu for Windows (WSL), which is based on Ubuntu 20.04 LTS. The system is powered by an Intel Core i5-4670K running at 3.40 GHz and has 32 GB of RAM.

I'll install some build tools, Python and some file manipulation utilities used throughout this post. The version of libx264-dev that ships with Ubuntu 20.04 LTS is from 2019, but it's new enough that I'm not going to bother compiling the latest version. CMake's bootstrap script needs libssl-dev, and FFmpeg's configure step needs libxml2-dev, nasm and pkg-config.

$ sudo apt update
$ sudo apt install \
    build-essential \
    jq \
    libssl-dev \
    libx264-dev \
    libxml2-dev \
    nasm \
    pkg-config \
    python3-pip \
    python3-virtualenv \
    pigz

I'll set up a Python virtual environment which will be used to render charts.

$ virtualenv ~/.ffmpeg
$ source ~/.ffmpeg/bin/activate
$ pip install \
    matplotlib \
    pandas

The following is an MP4 analysis library written in Python.

$ git clone https://github.com/essential61/mp4analyser/ ~/mp4analyser

I'll be using a machine learning-based tool called Video Multi-Method Assessment Fusion (VMAF) from Netflix, which is used to predict subjective video quality. The GitHub repo contains models and source code, but I'll also download a pre-compiled binary from the project's releases page to avoid having to compile the tool myself.

$ git clone https://github.com/Netflix/vmaf ~/vmaf
$ wget -O ~/vmaf_bin 'https://github.com/Netflix/vmaf/releases/download/v2.3.1/vmaf'
$ chmod +x ~/vmaf_bin

I'll be using CMake v3.20 to build the VVC libraries. Ubuntu for Windows and Ubuntu 20.04 LTS ship with CMake v3.16. I need v3.20 for other projects, so I've used it in this post as well.

$ cd ~/
$ wget -c https://github.com/Kitware/CMake/releases/download/v3.20.0/cmake-3.20.0.tar.gz
$ tar -xzf cmake-3.20.0.tar.gz
$ cd cmake-3.20.0
$ ./bootstrap --parallel=$(nproc)
$ make -j$(nproc)
$ sudo make install

I'll build and install the VVC encoding library Fraunhofer has published on GitHub. The repo is 4 years old, with version 1.9.1 released last month. It contains 282K lines of C++. Adam Wieckowski, the Head of the Video Coding Systems Group at the Fraunhofer Heinrich Hertz Institute, has made the most commits to this repository.

$ git clone https://github.com/fraunhoferhhi/vvenc/ ~/vvenc
$ cd ~/vvenc
$ cmake -S . \
        -B build/release-shared \
        -DCMAKE_INSTALL_PREFIX=/usr/local \
        -DCMAKE_BUILD_TYPE=Release \
        -DBUILD_SHARED_LIBS=1
$ cmake --build build/release-shared -j
$ sudo cmake --build build/release-shared --target install

This is the counterpart decoding library for VVC. It's made up of 226K lines of C++ with Adam producing half of the commits.

$ git clone https://github.com/fraunhoferhhi/vvdec/ ~/vvdec
$ cd ~/vvdec
$ cmake -S . \
        -B build/release-shared \
        -DCMAKE_INSTALL_PREFIX=/usr/local \
        -DCMAKE_BUILD_TYPE=Release \
        -DBUILD_SHARED_LIBS=1
$ cmake --build build/release-shared -j
$ sudo cmake --build build/release-shared --target install

I'll clone FFmpeg's repository and make sure it's set to a specific commit before applying a 12K-line VVC support patch. Once that's in place, I'll compile FFmpeg with H.264 and VVC support. This codec list is very minimalist and it's likely that you'll need to expand it if your source media comes from a diverse set of encoders.

$ git clone https://git.ffmpeg.org/ffmpeg.git ~/ffmpeg
$ cd ~/ffmpeg
$ git checkout 9413bdc381

$ wget -O Add-support-for-H266-VVC.patch \
    https://patchwork.ffmpeg.org/series/8577/mbox/
$ git apply --check Add-support-for-H266-VVC.patch
$ git apply Add-support-for-H266-VVC.patch

$ ./configure \
        --enable-pthreads \
        --enable-pic \
        --enable-shared \
        --enable-rpath \
        --arch=amd64 \
        --enable-demuxer=dash \
        --enable-libxml2 \
        --enable-libvvdec \
        --enable-libvvenc \
        --enable-libx264 \
        --enable-gpl
$ make -j$(nproc)
$ sudo make install

The following will confirm that VVC support is present in FFmpeg.

$ ffmpeg -hide_banner -codecs | grep -i vvc

DEV.L. vvc                  H.266 / VVC (Versatile Video Coding) (decoders: libvvdec ) (encoders: libvvenc )
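To check for VVC support from a script rather than by eye, the flag column can be decoded. This is a sketch based on the legend ffmpeg -codecs prints above its list (position 1: D = decoding, position 2: E = encoding, position 3: media type, then I = intra frame-only, L = lossy, S = lossless). parse_codec_line and codec_info are hypothetical helper names.

```python
import re
import subprocess

# Third flag character: the codec's media type, per the -codecs legend.
MEDIA_TYPES = {'V': 'video', 'A': 'audio', 'S': 'subtitle',
               'D': 'data', 'T': 'attachment'}


def _libraries(kind: str, line: str) -> list:
    """Pull the names out of '(decoders: ...)' or '(encoders: ...)'."""
    match = re.search(rf'\({kind}: ([^)]*)\)', line)
    return match.group(1).split() if match else []


def parse_codec_line(line: str) -> dict:
    """Decode one line of `ffmpeg -codecs` output."""
    flags, name = line.split()[:2]
    return {
        'name': name,
        'decode': flags[0] == 'D',
        'encode': flags[1] == 'E',
        'media_type': MEDIA_TYPES.get(flags[2], 'unknown'),
        'intra_only': flags[3] == 'I',
        'lossy': flags[4] == 'L',
        'lossless': flags[5] == 'S',
        'decoders': _libraries('decoders', line),
        'encoders': _libraries('encoders', line),
    }


def codec_info(codec: str):
    """Run ffmpeg and return the parsed entry for one codec, or None."""
    out = subprocess.run(['ffmpeg', '-hide_banner', '-codecs'],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        parts = line.split()
        if len(parts) > 1 and parts[1] == codec:
            return parse_codec_line(line)
    return None
```

Fed the vvc line above, parse_codec_line reports decoding and encoding support, a lossy video codec, and ['libvvdec'] and ['libvvenc'] as the decoder and encoder libraries.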

If you load up the full help and scroll down a bit, you'll see the encoder options available for VVC.

$ ffmpeg -h full | less -S

libvvenc-vvc encoder AVOptions:
  -preset            <int>        E..V....... set encoding preset(0: faster - 4: slower (from 0 to 4) (default medium)
     faster          0            E..V....... 0
     fast            1            E..V....... 1
     medium          2            E..V....... 2
     slow            3            E..V....... 3
     slower          4            E..V....... 4
  -qp                <int>        E..V....... set quantization (from 0 to 63) (default 32)
  -period            <int>        E..V....... set (intra) refresh period in seconds (from 1 to INT_MAX) (default 1)
  -subjopt           <boolean>    E..V....... set subjective (perceptually motivated) optimization (default true)
  -vvenc-params      <dictionary> E..V....... set the vvenc configuration using a :-separated list of key=value parameters
  -levelidc          <int>        E..V....... vvc level_idc (from 0 to 105) (default 0)
     0               0            E..V....... auto
     1               16           E..V....... 1
     2               32           E..V....... 2
     2.1             35           E..V....... 2.1
     3               48           E..V....... 3
     3.1             51           E..V....... 3.1
     4               64           E..V....... 4
     4.1             67           E..V....... 4.1
     5               80           E..V....... 5
     5.1             83           E..V....... 5.1
     5.2             86           E..V....... 5.2
     6               96           E..V....... 6
     6.1             99           E..V....... 6.1
     6.2             102          E..V....... 6.2
     6.3             105          E..V....... 6.3
  -tier              <int>        E..V....... set vvc tier (from 0 to 1) (default main)
     main            0            E..V....... main
     high            1            E..V....... high
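The level_idc values in the listing follow a simple pattern: 16 per major level plus 3 per minor step. The small helper below is my own and isn't part of FFmpeg. It reproduces the table and is useful for sanity-checking a numeric -levelidc value.

```python
def level_idc(level: str) -> int:
    """Map a VVC level name such as '5.1' to its level_idc value.

    Follows the pattern in the listing above: 16 per major level
    plus 3 per minor step.
    """
    major, _, minor = level.partition('.')
    return 16 * int(major) + 3 * int(minor or 0)


for name in ('1', '2.1', '4.1', '5.2', '6.3'):
    print(name, level_idc(name))
```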

Encoding VVC with FFmpeg

The following encodes the video track with VVC using a two-pass method and copies the source's audio track, unchanged, into an MP4 container. The first pass produces a statistics file that the second pass uses to guide its rate control.

The source MP4 file I'll be using is a 913 KB, 10-second, vertical video of someone being served dinner. The video track is 540x960-pixels, contains 304 frames and was encoded with libx264 at a bitrate of 696 kb/s. The audio track is AAC-encoded, 44,100 Hz and in stereo.

$ ffmpeg -i ~/source.mp4 \
         -map 0:v \
         -c:v libvvenc \
         -preset medium \
         -vvenc-params passes=2:pass=1:rcstatsfile=stats.json \
         -f null \
         /dev/null
$ ffmpeg -i ~/source.mp4 \
         -map 0:v \
         -c:v libvvenc \
         -preset medium \
         -vvenc-params passes=2:pass=2:rcstatsfile=stats.json \
         -map 0:a \
         -c:a copy \
         ~/vvc.mp4
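To repeat this at several target bitrates, the two invocations can be generated from Python. This is a sketch under stated assumptions. It assumes the libvvenc wrapper takes its rate-control target from FFmpeg's generic -b:v option, which the commands above don't show. It also adds -y so reruns overwrite earlier output. The helper names are mine.

```python
import subprocess


def two_pass_commands(source: str, output: str, kbps: int,
                      stats: str = 'stats.json') -> tuple:
    """Build the first- and second-pass FFmpeg argument lists.

    Assumption: libvvenc takes its rate-control target from FFmpeg's
    generic -b:v option.
    """
    common = ['ffmpeg', '-y', '-i', source,
              '-map', '0:v',
              '-c:v', 'libvvenc',
              '-preset', 'medium',
              '-b:v', f'{kbps}k']
    # Pass 1 only writes the statistics file; its video output is discarded.
    first = common + ['-vvenc-params', f'passes=2:pass=1:rcstatsfile={stats}',
                      '-f', 'null', '/dev/null']
    # Pass 2 reads the statistics and copies the audio track unchanged.
    second = common + ['-vvenc-params', f'passes=2:pass=2:rcstatsfile={stats}',
                       '-map', '0:a', '-c:a', 'copy',
                       output]
    return first, second


def encode(source: str, output: str, kbps: int) -> None:
    """Run both passes, stopping if either FFmpeg call fails."""
    for cmd in two_pass_commands(source, output, kbps):
        subprocess.run(cmd, check=True)
```

subprocess doesn't expand ~, so pass full paths (for example via os.path.expanduser('~/source.mp4')). Calling encode() at 100, 200, 400, 800, 1200 and 1500 kb/s would repeat the sweep summarised in the table below.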

It's very early days for finding golden encoding settings for various applications and deliveries, but so far I've found that doubling the target bitrate didn't double the encoding time or the final file size when using the medium preset. These were the encoding times and file sizes at six different target bitrates. Each ratio compares a row with the one before it. Note that the last two steps raise the bitrate by 1.5x and 1.25x rather than doubling it.

kb/s rate | rate vs. prev | encoding time | time vs. prev |     bytes | size vs. prev
----------|---------------|---------------|---------------|-----------|--------------
      100 |             - |           94s |             - |   181,214 |             -
      200 |         2.00x |          143s |         1.52x |   307,144 |         1.69x
      400 |         2.00x |          198s |         1.38x |   555,543 |         1.81x
      800 |         2.00x |          260s |         1.31x | 1,034,430 |         1.86x
     1200 |         1.50x |          359s |         1.38x | 1,505,684 |         1.46x
     1500 |         1.25x |          398s |         1.11x | 1,830,187 |         1.22x
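The ratios in the table compare each row with the one before it. Below is a short sketch that recomputes them from the raw measurements, so the same check can be rerun on a different clip. step_ratios is a hypothetical helper.

```python
# (target kb/s, encoding seconds, output bytes) from the table above.
rows = [
    (100, 94, 181_214),
    (200, 143, 307_144),
    (400, 198, 555_543),
    (800, 260, 1_034_430),
    (1200, 359, 1_505_684),
    (1500, 398, 1_830_187),
]


def step_ratios(rows):
    """Yield (kbps, bitrate ratio, time ratio, size ratio) versus the previous row."""
    for (prev_kbps, prev_secs, prev_bytes), (kbps, secs, size) in zip(rows, rows[1:]):
        yield kbps, kbps / prev_kbps, secs / prev_secs, size / prev_bytes


for kbps, rate_ratio, time_ratio, size_ratio in step_ratios(rows):
    print(f'{kbps:>5} kb/s: bitrate x{rate_ratio:.2f}, '
          f'time x{time_ratio:.2f}, size x{size_ratio:.2f}')
```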

Analysing the VVC Encoding Stats

I'll use Pandas and Matplotlib to render the telemetry from the VVC encoding stats file produced above.

$ python3

import json

import pandas as pd
import matplotlib.pyplot as plt


def render_chart(json_stats:str, output:str):
    keys = ('gopNum',
            'numBits',
            'poc',
            'psnrY',
            'qp',
            'visActY')

    # The stats file holds one JSON record per line. Keep only the
    # records carrying a 'lambda' field; those have the keys used below.
    with open(json_stats) as f:
        records = [rec
                   for rec in map(json.loads, f)
                   if 'lambda' in rec]

    df = pd.DataFrame([{key: rec[key] for key in keys}
                       for rec in records])

    # Divide each column by its maximum so every series shares one axis.
    # Columns whose maximum is zero are left unscaled.
    df = df / df.max().replace(0, 1)

    plot = df.plot.line(figsize=(15, 10))
    fig = plot.get_figure()
    fig.savefig(output, format='png', dpi=300)


render_chart(json_stats='stats.json',
             output='vvc_encoding_stats.png')

VVC supports switching resolution between groups of pictures (GOPs). In the resulting chart, a large payload (numBits) is produced at the start of each new GOP, most likely because each GOP opens with an intra-coded frame.
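The same spikes can be quantified instead of eyeballed. The sketch below assumes each retained record carries the gopNum and numBits fields the chart above relies on. It totals the bits per GOP and reports how much of each GOP's budget its single largest frame takes. The function names are my own.

```python
import json

import pandas as pd


def load_frame_records(json_stats: str) -> list:
    """Read the JSON-lines stats file, keeping records with a 'lambda' field."""
    with open(json_stats) as f:
        return [rec for rec in map(json.loads, f) if 'lambda' in rec]


def gop_summary(records: list) -> pd.DataFrame:
    """Frame count, total bits and largest frame for each GOP."""
    df = pd.DataFrame(records)
    summary = df.groupby('gopNum')['numBits'].agg(['count', 'sum', 'max'])
    # Share of each GOP's bits spent on its single largest frame.
    summary['max_share'] = summary['max'] / summary['sum']
    return summary
```

Running gop_summary(load_frame_records('stats.json')) gives one row per GOP. A consistently high max_share would fit the intra-frame explanation.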

MP4 File Analysis

An MP4 file is a tree-like container. It's not too dissimilar to an HTML file in that it contains tags that classify all the different content within. These tags are referred to as atoms or boxes and they're represented with four ASCII characters. Below are a few example tags that were defined in ISO/IEC 14496-12:2004 and 15444-12:2004.

moov: metadata container.
mdat: media data container.
trak: individual track or stream container.
mdia: track media information container.
minf: media information container.
stbl: sample table box, container for the time/space map.
stsd: sample descriptions (codec types, etc.)
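All of these boxes share the header layout set out in ISO/IEC 14496-12. A box starts with a 32-bit big-endian size that covers the whole box, header included, followed by the four-character type. A size of 1 means a 64-bit size follows the type, and a size of 0 means the box runs to the end of the enclosing space (the end of the file at the top level). The sketch below walks one level of boxes using only the standard library. iter_boxes and top_level_boxes are my own names and aren't part of mp4analyser.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, offset: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, offset, size) for each box between offset and end."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from('>I4s', data, offset)
        header = 8
        if size == 1:    # a 64-bit "largesize" follows the type
            size, = struct.unpack_from('>Q', data, offset + 8)
            header = 16
        elif size == 0:  # box extends to the end of the enclosing space
            size = end - offset
        if size < header:  # malformed size; stop rather than loop forever
            break
        # latin-1 keeps QuickTime-style types such as b'\xa9nam' readable.
        yield box_type.decode('latin-1'), offset, size
        offset += size


def top_level_boxes(path: str) -> list:
    with open(path, 'rb') as f:
        return list(iter_boxes(f.read()))
```

For a plain container box such as moov, trak or mdia, calling iter_boxes(data, offset + 8, offset + size) walks its children. Boxes such as stsd carry extra fields before their children, so they need more handling than this.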

There are also user data atoms like albm for album titles and cprt for copyright information, as well as a number of atoms that originated in Apple's QuickTime format.

I'll use a Python package called mp4analyser to explore an MP4 file containing a VVC-encoded video track and an AAC-encoded audio track.

$ cd ~/mp4analyser/src
$ python3

import re

import pandas as pd

from mp4analyser.iso import Mp4File


def get_node(x, selector):
    # Accept either a dotted path such as 'moov.trak[1].mdia.hdlr'
    # or a list of the remaining path segments.
    if isinstance(selector, str):
        selector = selector.split('.')

    atom_type, rest = selector[0], selector[1:]
    index_num = 0

    # 'trak[1]' selects the second (zero-indexed) child of that type.
    if '[' in atom_type:
        atom_type, index_num = atom_type.strip(']').split('[')
        index_num = int(index_num)

    matches = [child for child in x.children if child.type == atom_type]

    if index_num >= len(matches):
        return None  # No such atom at this level.

    child = matches[index_num]

    return get_node(child, rest) if rest else child


def chars_in_hex(b):
    # Keep ASCII bytes above the space character, swap everything else
    # for '_' and collapse runs of underscores into a single one.
    text = ''.join(chr(y) if 32 < y < 128 else '_' for y in b)

    return re.sub(r'_{2,}', '_', text).strip('_')


movie = Mp4File('/home/mark/vvc.mp4')

I've built a CSS selector-like system where I can give a path of atoms for the library to crawl through. Below I'll fetch the handler type of the first track.

print(get_node(movie, 'moov.trak[0].mdia.hdlr').box_info)

{'handler_type': 'vide', 'name': 'VideoHandler'}

This will fetch the handler type of the second track.

print(get_node(movie, 'moov.trak[1].mdia.hdlr').box_info)

{'handler_type': 'soun', 'name': 'SoundHandler'}

Below I'll confirm the first track has been encoded using VVC.

vvc1_node = get_node(movie, 'moov.trak[0].mdia.minf.stbl.stsd.vvc1')
print(vvc1_node.type)

vvc1

I'll print out the byte stream from this metadata node. Any byte outside the visible ASCII range will be replaced by an underscore, and runs of underscores will be collapsed into a single one. You can see the Lavc60.7.101 libvvenc encoder signature near the beginning of the resulting string.

print(chars_in_hex(vvc1_node.get_bytes()))

vvc1_H_H_Lavc60.7.101_libvvenc_>vvcC_e_0_y_0_j_stF_i_;_X_A_B_^_V_jK$E_I_d_Q_"_B_j_I_I_RB_"_DB_b!_u_H_!_1_(_XB_D_d_5_0A_@_@T_@_4_d_@@Y_4_@_,_!_D_4_$_d_r_Aa_B_$_._l_n6J_@@_fiel_colrnclx_pasp_btrt_n_n

I'll plot out how many bytes each track uses throughout the video. As you can see, the audio track largely needs the same amount of data throughout the video but the VVC video uses GOPs so there is a burst of data produced every time there is a new GOP.

def get_chunk_sizes(sample_list:list, track_id:int=1):
    return [(y['chunk_ID'],
             sum([y['chunk_samples'][z]['size']
                  for z in range(0, y['samples_per_chunk'])]))
            for y in sample_list
            if y['track_ID'] == track_id]


sample_list = get_node(movie, 'mdat').sample_list

df = pd.DataFrame({
       'track_1': [x for _, x in get_chunk_sizes(sample_list, 1)],
       'track_2': [x for _, x in get_chunk_sizes(sample_list, 2)]},
   index=[x for x, _ in get_chunk_sizes(sample_list, 1)])

plot = df.plot.line(figsize=(15, 10))
fig = plot.get_figure()
fig.savefig('track_bitrates.png', format='png', dpi=300)

The above library also contains a Python-based GUI tool that works well on Windows. If you have Python installed on Windows (not just via WSL) and launch cmd.exe you can run the following and a GUI-based application will launch. You can open MP4 and MKV files and traverse their atoms.

cd mp4analyser\src
python3 mp4analyser.py

Metadata Overhead

You shouldn't expect a file to get substantially smaller when it's re-compressed with the same compressor. If it did, the compression scheme would be leaving a lot of easily identifiable repeating content on the table. Below is an example where the compressed output actually grows by roughly 570 bytes with each additional round of compression.

$ cat test.mp4 | pigz -9 | wc -c                     # 1,824,221 bytes
$ cat test.mp4 | pigz -9 | pigz -9 | wc -c           # 1,824,788 bytes
$ cat test.mp4 | pigz -9 | pigz -9 | pigz -9 | wc -c # 1,825,366 bytes
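The same effect can be reproduced without an MP4 at hand. This sketch uses Python's gzip module, which writes the same DEFLATE-based gzip format pigz produces. It stands in pseudo-random bytes for already-compressed media. That is an assumption: real MP4 payloads aren't perfectly random, but encoded video comes close.

```python
import gzip
import random


def recompress_sizes(data: bytes, rounds: int = 3, level: int = 9) -> list:
    """Return the compressed size after each successive round of gzip."""
    sizes = []
    for _ in range(rounds):
        data = gzip.compress(data, compresslevel=level)
        sizes.append(len(data))
    return sizes


# Stand-in for already-compressed media: pseudo-random bytes, which
# leave no redundancy for DEFLATE to remove.
payload = random.Random(0).randbytes(500_000)
print(len(payload), recompress_sizes(payload))
```

Each round should come out slightly larger than the last, mirroring the pigz results above.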

There aren't many uncompressed strings in the MP4 file first produced in this post.

$ strings -n10 ~/vvc.mp4

isomiso2mp41
ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ^
ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ^
*rcM%ErMF
/M7|43,7BC
4yq&M/N$RM
VideoHandler
Lavc60.7.101 libvvenc
SoundHandler
Lavf60.4.100
vid:v10044g50000ckhr6pjc77u508a0mg70

But when I target a lower bitrate, the MP4 container's metadata ends up making up a good chunk of the file, and it either isn't compressed at all or uses an extremely light encoding. Below I'll GZIP-compress the resulting MP4 file, which reduces its size by ~15%.

$ cat ~/vvc.mp4 | wc -c # 87,513 bytes
$ cat ~/vvc.mp4 | pigz -9 | wc -c # 74,110 bytes

I'll break the MP4 file up into 5,000-byte files and see how well each of those 18 files compresses individually.

$ cat ~/vvc.mp4 | split --numeric-suffixes \
                        --bytes=5000 \
                        - \
                        part_

$ for FILENAME in part_*; do
    echo "$FILENAME, $(wc -c < "$FILENAME"), $(pigz -9 < "$FILENAME" | wc -c)"
  done

The columns listed below are the filename, the original size in bytes and the size in bytes after GZIP compression.

part_00, 5000, 4433
part_01, 5000, 4789
part_02, 5000, 4916
part_03, 5000, 4937
part_04, 5000, 4832
part_05, 5000, 4916
part_06, 5000, 4851
part_07, 5000, 4916
part_08, 5000, 4924
part_09, 5000, 4770
part_10, 5000, 4940
part_11, 5000, 4897
part_12, 5000, 4796
part_13, 5000, 4964
part_14, 5000, 4834
part_15, 5000, 3098
part_16, 5000, 1675
part_17, 2513, 1424
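The split-and-compress loop can also be done in one pass from Python without writing part files. This sketch uses the standard gzip module at level 9. chunk_compression and report are hypothetical names, and the compressed sizes may differ slightly from pigz's.

```python
import gzip
from pathlib import Path


def chunk_compression(data: bytes, chunk_size: int = 5000) -> list:
    """Return (part name, original size, gzipped size) for each fixed-size chunk."""
    results = []
    for index, start in enumerate(range(0, len(data), chunk_size)):
        chunk = data[start:start + chunk_size]
        packed = gzip.compress(chunk, compresslevel=9)
        results.append((f'part_{index:02d}', len(chunk), len(packed)))
    return results


def report(path: str, chunk_size: int = 5000) -> None:
    data = Path(path).expanduser().read_bytes()
    for name, raw, packed in chunk_compression(data, chunk_size):
        print(f'{name}, {raw}, {packed}')
```

Calling report('~/vvc.mp4') prints lines in the same filename, original size and compressed size format as the listing above.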

The parts containing the VVC- and AAC-encoded media barely deflate, but towards the end, the MP4 metadata achieves decent compression ratios. This wouldn't matter for a feature-length movie, but so much of the content consumed online these days is TikTok-style short video. These videos could conceivably be delivered in VVC in the coming years, and in an MP4 container they could potentially shave ~15% off their payload simply by being GZIP-compressed before delivery.

I'll use hexdump to examine the part of the file where the metadata section starts. hexdump replaces repeated lines with a single asterisk, and the output for this part runs to 314 lines. The first 150 or so lines, everything before the moov atom begins at offset 0x95e, don't have any obvious patterns and are probably stored pretty efficiently. As soon as the moov atom starts, though, there is a lot of repeating data that GZIP is able to capitalise on.

$ hexdump -C part_15 | wc -l # 314
$ hexdump -C part_15 | less -S

00000820  f3 da 53 4b e8 44 a2 5e  d6 fb 5f 95 62 3e d6 eb  |..SK.D.^.._.b>..|
00000830  f6 dd 4d ca b6 e1 9d 05  5d 30 32 b1 09 a3 b3 03  |..M.....]02.....|
00000840  b2 12 b3 91 a1 a1 f9 39  18 ca b4 79 f7 7f 67 35  |.......9...y..g5|
00000850  60 a5 2d 44 64 02 47 9d  da 08 25 0a b9 3c 8c 7b  |`.-Dd.G...%..<.{|
00000860  6b f6 b5 63 c6 cb ee 25  7a 42 9e 1b 1c b7 31 e8  |k..c...%zB....1.|
00000870  b5 65 72 12 47 8e 77 0e  9b 79 15 e0 a2 f0 a8 08  |.er.G.w..y......|
00000880  7f 8f dc 3c 11 79 a6 c3  91 4c 95 a3 9f c9 b7 f3  |...<.y...L......|
00000890  c9 f2 dd ba a4 12 f8 e8  88 06 f0 1d ea 40 02 28  |.............@.(|
000008a0  73 6b 79 6d f9 fc 0e 25  41 00 0e 01 1a 34 ad 30  |skym...%A....4.0|
000008b0  f7 10 bc 00 25 e2 5b 47  3b 68 e3 4c 2d 65 63 56  |....%.[G;h.L-ecV|
000008c0  60 8c 67 e0 01 e3 e2 88  84 27 92 91 19 68 10 fa  |`.g......'...h..|
000008d0  37 c5 76 f7 f9 ba 4c 89  95 62 9a d2 11 22 87 39  |7.v...L..b...".9|
000008e0  61 d9 fc 89 10 40 80 d9  da eb f9 38 a6 45 99 5b  |a....@.....8.E.[|
000008f0  a7 c2 cc d5 e4 d0 db 46  0b 24 b8 20 68 a3 ad 96  |.......F.$. h...|
00000900  5e 68 a3 b9 ce 0b 24 a4  50 8e 89 92 3b 21 72 cd  |^h....$.P...;!r.|
00000910  75 6d 73 52 4a 55 45 49  f7 ba 9a 43 91 b1 64 1e  |umsRJUEI...C..d.|
00000920  18 60 c5 64 e5 08 99 15  91 6a 47 c5 e8 06 1c 8a  |.`.d.....jG.....|
00000930  80 80 19 30 e5 9c d4 c8  9c 2a a3 25 2a 5b 14 72  |...0.....*.%*[.r|
00000940  2a 11 5c 4c c7 73 1a d8  80 0a 66 00 37 80 e8 31  |*.\L.s....f.7..1|
00000950  54 dc ea bb bf a1 96 e2  ee 23 81 18 00 70 00 00  |T........#...p..|
00000960  27 83 6d 6f 6f 76 00 00  00 6c 6d 76 68 64 00 00  |'.moov...lmvhd..|
00000970  00 00 00 00 00 00 00 00  00 00 00 00 03 e8 00 00  |................|
00000980  27 a4 00 01 00 00 01 00  00 00 00 00 00 00 00 00  |'...............|
00000990  00 00 00 01 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000009b0  00 00 40 00 00 00 00 00  00 00 00 00 00 00 00 00  |..@.............|
000009c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000009d0  00 03 00 00 1d 4e 74 72  61 6b 00 00 00 5c 74 6b  |.....Ntrak...\tk|
000009e0  68 64 00 00 00 03 00 00  00 00 00 00 00 00 00 00  |hd..............|
000009f0  00 01 00 00 00 00 00 00  27 a0 00 00 00 00 00 00  |........'.......|
00000a00  00 00 00 00 00 00 00 00  00 00 00 01 00 00 00 00  |................|
*
00000a20  00 00 00 00 00 00 00 00  00 00 40 00 00 00 02 1c  |..........@.....|
00000a30  00 00 03 c0 00 00 00 00  00 24 65 64 74 73 00 00  |.........$edts..|
00000a40  00 1c 65 6c 73 74 00 00  00 00 00 00 00 01 00 00  |..elst..........|
00000a50  27 a0 00 00 07 d0 00 01  00 00 00 00 1c c6 6d 64  |'.............md|
00000a60  69 61 00 00 00 20 6d 64  68 64 00 00 00 00 00 00  |ia... mdhd......|
00000a70  00 00 00 00 00 00 00 00  2e d4 00 01 db 00 55 c4  |..............U.|

The metadata section (the moov atom) for this file is 10,115 bytes, roughly 12% of the 87,513-byte MP4 file.

len(get_node(movie, 'moov').get_bytes()) # 10,115 bytes
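The 10,115-byte figure can also be read straight off the hexdump above. The four bytes just before the ASCII moov are the box's big-endian size field, 00 00 27 83. Since part_15 begins 75,000 bytes into the file, the same arithmetic also shows where moov sits.

```python
# Size field and offset read off the hexdump of part_15 above.
moov_size = int.from_bytes(bytes.fromhex('00002783'), 'big')
moov_offset = 15 * 5000 + 0x95e  # part_15 starts 75,000 bytes into the file
file_size = 87_513

print(moov_size)                             # 10115
print(f'{moov_size / file_size:.1%}')        # 11.6%
print(moov_offset + moov_size == file_size)  # True
```

The moov atom ends exactly at the end of the file. That's FFmpeg's default MP4 layout. The -movflags +faststart option moves moov to the front, which helps progressive playback but doesn't change its size.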

The good thing about GZIP is that it's supported by HTTP, and pretty much every CDN supports GZIP compression. I know some servers turn off GZIP compression for binary files like MP4s, but that might not be optimal once the world begins to adopt MP4s with VVC-encoded video.

Netflix's Video Quality Model

Netflix has a tool called Video Multi-Method Assessment Fusion (VMAF) that can predict quality loss due to compression on a frame-by-frame basis. Others have used this tool to show VVC slightly ahead of H.265 and AV1 in terms of 4K encoding quality.

Below I'll produce raw video files from both the source material I used at the beginning of this post and a VVC-encoded test file.

$ ffmpeg -i ~/source.mp4 \
         -c:v rawvideo \
         -pix_fmt yuv420p10le \
         ~/source.yuv
$ ffmpeg -i ~/test.mp4 \
         -c:v rawvideo \
         -pix_fmt yuv420p10le \
         ~/test.yuv

VMAF needs to know the width, height, pixel format and bit depth of the raw files. There are multiple models to choose from, so it's good to note which one you're testing against.
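Raw .yuv files carry no header, so the file size alone can't tell VMAF how to slice the data into frames. Below is a small sketch of the arithmetic for planar 4:2:0 video. It assumes samples deeper than 8 bits occupy two bytes each, as they do in yuv420p10le. The helper names are mine.

```python
def yuv420_frame_bytes(width: int, height: int, bit_depth: int) -> int:
    """Bytes per frame of planar 4:2:0 video.

    The luma plane is width x height samples; each of the two chroma
    planes is subsampled by two in both directions.
    """
    bytes_per_sample = 1 if bit_depth <= 8 else 2
    luma = width * height
    chroma = 2 * ((width + 1) // 2) * ((height + 1) // 2)
    return (luma + chroma) * bytes_per_sample


def frame_count(file_bytes: int, width: int, height: int, bit_depth: int) -> int:
    return file_bytes // yuv420_frame_bytes(width, height, bit_depth)


frame = yuv420_frame_bytes(540, 960, 10)
print(frame)        # 1555200
print(frame * 304)  # 472780800
```

For the 540x960, 10-bit source, that works out to 1,555,200 bytes per frame, or 472,780,800 bytes for all 304 frames. A mismatch between that figure and the actual file size is a quick sign that the width, height or bit depth is wrong.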

$ ~/vmaf_bin -r ~/source.yuv \
             -d ~/test.yuv \
             -w 540 \
             -h 960 \
             -b 10 \
             -p 420 \
             --threads 4 \
             -m path=/home/mark/vmaf/model/vmaf_v0.6.1.json \
             --json \
             -o res.json

Below is an example report record from VMAF.

$ jq -S '.frames[0]' res.json

{
  "frameNum": 0,
  "metrics": {
    "integer_adm2": 0.950637,
    "integer_adm_scale0": 0.921274,
    "integer_adm_scale1": 0.924992,
    "integer_adm_scale2": 0.946868,
    "integer_adm_scale3": 0.974165,
    "integer_motion": 0,
    "integer_motion2": 0,
    "integer_vif_scale0": 0.471398,
    "integer_vif_scale1": 0.782225,
    "integer_vif_scale2": 0.862763,
    "integer_vif_scale3": 0.910276,
    "vmaf": 74.663332
  }
}
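Per-frame scores are handy for charts, but a single number per encode is easier to compare. The sketch below pools the per-frame vmaf values from res.json into a mean, a minimum and a harmonic mean, which weighs poor frames more heavily. The function names are my own, and libvmaf's report may already include pooled values of its own.

```python
import json
import statistics


def summarise_vmaf(report: dict) -> dict:
    """Mean, minimum and harmonic mean of the per-frame VMAF scores."""
    scores = [frame['metrics']['vmaf'] for frame in report['frames']]
    return {
        'mean': statistics.fmean(scores),
        'min': min(scores),
        'harmonic_mean': statistics.harmonic_mean(scores),
    }


def summarise_vmaf_file(path: str) -> dict:
    with open(path) as f:
        return summarise_vmaf(json.load(f))
```

summarise_vmaf_file('res.json') returns the three pooled figures for the encode under test.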

I'll plot the above report onto a line chart.

$ python3

import json

import pandas as pd
import matplotlib.pyplot as plt


def render_chart(json_stats:str, output:str):
    keys = [
        "integer_adm2",
        "integer_adm_scale0",
        "integer_adm_scale1",
        "integer_adm_scale2",
        "integer_adm_scale3",
        "integer_motion",
        "integer_motion2",
        "integer_vif_scale0",
        "integer_vif_scale1",
        "integer_vif_scale2",
        "integer_vif_scale3",
        "vmaf"]

    with open(json_stats) as f:
        frames = json.load(f)['frames']

    # One row per frame and one column per metric. A metric missing
    # from a frame becomes NaN instead of shifting the other values.
    df = pd.DataFrame([frame['metrics'] for frame in frames])
    df = df[[key for key in keys if key in df.columns]]

    # Divide each metric by its maximum so every series shares one axis.
    # Metrics whose maximum is zero are left unscaled.
    df = df / df.max().replace(0, 1)

    plot = df.plot.line(figsize=(15, 10))
    fig = plot.get_figure()
    fig.savefig(output, format='png', dpi=300)


render_chart('res.json', 'vmaf.png')

Watching VVC Videos

TV and smartphone support for VVC is pretty much non-existent at the moment but, surprisingly, VLC and other free desktop media players aren't shipping with support in their stable binary releases either. VVCEasy produces VVC-enabled builds of VVC ecosystem tools like the vvenc library, FFmpeg, VLC and MPV.

I was able to launch these using cmd.exe. Below I'll VVC-encode a raw video using the fast preset.

cd C:\Users\mark\Downloads\WindowsVVC\x64
vvencapp.exe ^
    --size 540x960 ^
    --framerate 30 ^
    --preset fast ^
    -i source.yuv ^
    --output=vvc.bit

vvencapp: Fraunhofer VVC Encoder ver. 1.8.0 [Windows][VS 1916][64 bit][SIMD=AVX2]
vvenc [info]: Input File                             : c:\users\mark\downloads\windowsvvc\x64\source.yuv
vvenc [info]: Bitstream File                         : vvc.bit
vvenc [info]: Real Format                            : 540x960  yuv420p  30 Hz  SDR  304 frames
vvenc [info]: Frames                                 : encode 304 frames
vvenc [info]: Internal Format                        : 544x960 30Hz SDR
vvenc [info]: Rate Control                           : QP 32
vvenc [info]: Percept QPA                            : Enabled
vvenc [info]: Intra period (Keyframe)                : 32
vvenc [info]: Decoding refresh type                  : CRA

vvenc [info]: stats:   9.9% frame=  30/304 fps=  4.9 avg_fps=  4.9 bitrate=  888.37 kbps avg_bitrate=  888.37 kbps elapsed= 00h:00m:07s left= 00h:00m:56s
...

I'll then convert the VVC-encoded video back into an uncompressed Y4M file and use FFmpeg's ffplay.exe to watch it on my desktop. Y4M stores the frame size in its header, so ffplay.exe doesn't need to be told the dimensions. The -x and -y flags below set the width and height of the playback window.

vvdecapp.exe -b vvc.bit --y4m -o from_vvc.y4m
ffplay.exe -x 540 -y 960 from_vvc.y4m
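A Y4M file starts with a single text header line. The line begins with YUV4MPEG2, followed by space-separated parameters such as W (width), H (height), F (frame rate as a ratio) and C (colour space), and each frame is then preceded by a FRAME marker. The sketch below reads that header. The helper names are mine, and only a few parameters are handled.

```python
def parse_y4m_header(line: str) -> dict:
    """Parse the stream header line of a YUV4MPEG2 (.y4m) file."""
    magic, *params = line.split()
    if magic != 'YUV4MPEG2':
        raise ValueError('not a YUV4MPEG2 stream')
    header = {}
    for param in params:
        key, value = param[0], param[1:]
        if key == 'W':
            header['width'] = int(value)
        elif key == 'H':
            header['height'] = int(value)
        elif key == 'F':
            num, den = value.split(':')
            header['fps'] = int(num) / int(den)
        elif key == 'C':
            header['colorspace'] = value
    return header


def read_y4m_header(path: str) -> dict:
    with open(path, 'rb') as f:
        return parse_y4m_header(f.readline().decode('ascii'))
```

For instance, a header reading YUV4MPEG2 W540 H960 F30:1 C420p10 parses to a width of 540, a height of 960 and 30 fps. That header is an illustrative value, not one taken from the file above.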

Thank you for taking the time to read this post. I offer both consulting and hands-on development services to clients in North America and Europe. If you'd like to discuss how my offerings can help your business please contact me via LinkedIn.