
Improvements of Mobile Real-time EVC Decoder and Player


By Olga Krovyakova - May 26, 2021

Abstract

This article describes the improvements over Stage 1 of the Real-time Mobile Video Player Project (October 2020), which implemented an early version of the Player capable of real-time playback of EVC bitstreams. The Player is based on the ETM reference software version 6.1 and is optimized for the ARM architecture. It was developed and tested on a Huawei P40 Pro smartphone and demonstrates real-time playback at 24 FPS on 1080p test files with a subset of the EVC Main profile tools.

1. Results

To summarize, the first stage presented a solution that allows real-time playback of encoded sequences with the following restrictions:

  • Quantization parameter value: 32; 
  • Coding tools set: Affine, DMVR, HTDF, ADMVP, ADCC, AMVR, ATS, IQT, CM_INIT, ADDB, DBF, HMVP, MMVD, POCS, RPL (Stage1 set). 

The EVC Player presented in Stage 1 demonstrates 25.07 FPS for the “ParkScene” and 26.02 FPS for the “Kimono1” sequences.

The results of the optimizations made during Stage 2 show that the BTT tool can be safely added to the toolset and that the QP can be reduced to 27, which increases the image quality. The resulting conditions are:

  • Quantization parameter value: 27;
  • Coding tools set: Affine, DMVR, HTDF, ADMVP, ADCC, AMVR, ATS, IQT, CM_INIT, ADDB, DBF, HMVP, MMVD, POCS, RPL, BTT (hereafter referred to as the Stage2 set).

 

Table 1. Average maximal playback speed achieved with the selected toolset on the Test Mobile Device for QP=27

| Test sequence | Decoding speed of the Stage 1 decoder (fps) | Decoding speed of the Stage 2 decoder (fps) | Performance gain over the Stage 1 decoder (%) |
|---|---|---|---|
| ParkScene | 21.36 | 32.99 | 53.2 |
| Kimono1 | 22.85 | 35.02 | 54.4 |

As a result of the optimizations, the decoder shows a performance gain of 53-54% in the EVC Player application designed for the Huawei P40 Pro smartphone.
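The gain is the relative increase in decoding speed between the two stages. A minimal sketch of that arithmetic, using the frame rates from Table 1; the function name `performance_gain_percent` is illustrative and not part of the Player. Both values computed this way fall in roughly the 53-54% range reported above.

```python
def performance_gain_percent(stage1_fps: float, stage2_fps: float) -> float:
    """Relative speed-up of Stage 2 over Stage 1, in percent."""
    if stage1_fps <= 0:
        raise ValueError("stage1_fps must be positive")
    return (stage2_fps - stage1_fps) / stage1_fps * 100.0


# Decoding speeds (fps) taken from Table 1, QP=27: (Stage 1, Stage 2).
table1 = {
    "ParkScene": (21.36, 32.99),
    "Kimono1": (22.85, 35.02),
}

gains = {name: performance_gain_percent(s1, s2) for name, (s1, s2) in table1.items()}

for name, gain in gains.items():
    print(f"{name}: {gain:.1f}% faster than Stage 1")
```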

 

Table 2. Selected toolset for the real-time player improvement

| Tool short name | Tool full name | ETM 6.1 CTC RA default configuration | Selected toolset, Stage 1 | Selected toolset, Stage 2 |
|---|---|---|---|---|
| ADMVP | Advanced Motion Vectors Prediction | 1 | 1 | 1 |
| AFFINE | Affine Prediction | 1 | 1 | 1 |
| HTDF | Hadamard Transform Domain Filter | 1 | 1 | 1 |
| DMVR | Decoder-side Motion Vector Refinement | 1 | 1 | 1 |
| ADCC | Advanced Coefficients Coding | 1 | 1 | 1 |
| ADDB | Advanced Deblocking | 1 | 1 | 1 |
| ALF | Adaptive Loop Filter | 1 | 0 | 0 |
| AMVR | Adaptive Motion Vectors Resolution | 1 | 1 | 1 |
| ATS | Adaptive Transforms Selection | 1 | 1 | 1 |
| BTT | Binary and Ternary Trees | 1 | 0 | 1 |
| CM_INIT | Context Modeling Initialization | 1 | 1 | 1 |
| DBF | Deblocking Filter | 1 | 1 | 1 |
| EIPD | Enhanced Intra Prediction Directions | 1 | 0 | 0 |
| HMVP | History Motion Vectors Prediction | 1 | 1 | 1 |
| iQT | Advanced Quantization and Transforms | 1 | 1 | 1 |
| MMVD | Merge with Motion Vectors Difference | 1 | 1 | 1 |
| POCS | Advanced Picture Order Count | 1 | 1 | 1 |
| RPL | Reference Picture List | 1 | 1 | 1 |
| SUCO | Split Unit Coding Order | 1 | 0 | 0 |
| IBC | Intra Block Copy | 0 | 0 | 0 |

2. Supported EVC tools

The BTT tool is considered to have the greatest potential, so we ran additional tests with BTT enabled on top of the Stage1 set (Table 2).

Figures 1-3 show the EVC encoding performance, decoding performance and PSNR values for the ParkScene and Kimono1 files encoded with QP values from 27 to 32, with and without the BTT64 tool. For convenience, each figure combines the values measured for streams without BTT64 and with BTT64.

image

Figure 1. ETM 6.1 encoding performance on a PC with an i7-9700 CPU running Windows 10 x64, with and without the BTT64 tool enabled

image

Figure 2. Stage 1 decoder performance on a HUAWEI P40 Pro running EMUI 10.1, with and without the BTT64 tool enabled

image

Figure 3. PSNR values of files encoded by the ETM 6.1 encoder, with and without the BTT64 tool enabled

 

Figures 2 and 3 show that enabling BTT64 does not reduce the decoder performance significantly and also improves the quality of the output file, so we can safely enable it as an extra EVC tool.
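For reference, PSNR compares the decoded samples with the original ones through the mean squared error. The sketch below shows the usual definition on a flat list of samples; it assumes 8-bit samples (peak value 255), and the sample values are made up for illustration rather than taken from the test sequences. The article does not state which PSNR variant (for example, luma only) is plotted in Figure 3.

```python
import math


def psnr(reference, decoded, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized sample sequences."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 8-bit samples, invented for illustration only.
original = [52, 55, 61, 66, 70, 61, 64, 73]
decoded = [53, 55, 60, 66, 71, 61, 63, 73]
print(f"PSNR: {psnr(original, decoded):.2f} dB")
```

A higher PSNR means the decoded picture is closer to the original, which is the sense in which BTT64 improves quality at a given QP.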

To stay with the decoding concept proposed in Stage 1, we propose to enable the 64x64 CU size for BTT (referred to as BTT64 throughout this article).

To achieve real-time playback speed on the target device, it is important to know how much decoding time each EVC tool takes. This profiling information was collected with the Android Profiler on test bitstreams decoded with the Stage 1 decoder. Figures 4 and 5 show the profiling data obtained on the target device for the ParkScene and Kimono1 bitstreams encoded with the Stage 2 toolset and QP=27.

image

Figure 4. Decoder profiling for the ParkScene bitstream encoded with the Stage 2 toolset (QP=27)

image

Figure 5. Decoder profiling for the Kimono1 bitstream encoded with the Stage 2 toolset (QP=27)

The most time-consuming functions are related to DMVR, Motion Compensation (MC), Deblocking and Reconstruction. Together they consume about 77% of the decoding time and are the best candidates for SIMD optimization.
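The 77% share also bounds what optimizing these functions alone can deliver. The sketch below applies Amdahl's law to that share; the local speed-up factors are hypothetical values chosen for illustration, not measurements from the Player.

```python
def overall_speedup(optimized_share: float, local_speedup: float) -> float:
    """Amdahl's law: whole-program speed-up when only a share of the run time gets faster."""
    if not 0.0 <= optimized_share <= 1.0:
        raise ValueError("optimized_share must be in [0, 1]")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - optimized_share) + optimized_share / local_speedup)


HOT_SHARE = 0.77  # share of decoding time reported for DMVR, MC, Deblocking, Reconstruction

# Hypothetical local speed-ups of the hot functions (illustration only).
for factor in (1.5, 2.0, 4.0):
    print(f"hot functions x{factor}: decoder x{overall_speedup(HOT_SHARE, factor):.2f}")

# Upper bound: the hot functions take no time at all.
upper_bound = 1.0 / (1.0 - HOT_SHARE)
print(f"upper bound: x{upper_bound:.2f}")
```

Under this model the remaining 23% of the decoding time caps the achievable speed-up from these functions, which is one reason to combine SIMD work with multithreading.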

3. Implementation details and playback

To achieve real-time playback with the selected toolset, the following main modifications were made on top of the ETM 6.1 reference software:

  1. ARM SIMD implementation of the most critical functions in the Deblocking and Reconstruction parts;
  2. Wavefront-like parallel processing (WPP) for the deblocking and decoding processes, with a thread pool and line-based startup.
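To illustrate why wavefront processing helps, the sketch below computes an idealized schedule for one frame. It rests on several assumptions that the article does not confirm: each block row can start once the row above is two blocks ahead (the dependency pattern of HEVC-style WPP), every block takes one time step, the frame is 1920x1080 and the block size is 64x64. The function `wavefront_schedule` is hypothetical and does not reproduce the Player's thread pool.

```python
import math


def wavefront_schedule(rows: int, cols: int, lag: int = 2) -> list:
    """Earliest time step (1-based) at which each block can be processed.

    Block (r, c) waits for its left neighbour (r, c - 1) and for the block
    `lag - 1` columns to the right in the row above (assumed dependency).
    """
    step = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            left = step[r][c - 1] if c > 0 else 0
            above = step[r - 1][min(c + lag - 1, cols - 1)] if r > 0 else 0
            step[r][c] = max(left, above) + 1
    return step


# Assumed geometry: a 1920x1080 frame split into 64x64 blocks.
BLOCK = 64
cols = math.ceil(1920 / BLOCK)
rows = math.ceil(1080 / BLOCK)

schedule = wavefront_schedule(rows, cols)
total_steps = schedule[-1][-1]   # steps needed with unlimited threads
sequential_steps = rows * cols   # steps needed with a single thread

# Largest number of blocks that become ready in the same time step.
per_step = {}
for row in schedule:
    for s in row:
        per_step[s] = per_step.get(s, 0) + 1
peak_parallelism = max(per_step.values())

print(f"{sequential_steps} sequential steps vs {total_steps} wavefront steps, "
      f"up to {peak_parallelism} blocks in parallel")
```

In this idealized model the number of blocks ready at the same time exceeds four for most of the frame, which suggests that a wavefront can keep the four cores used by the Player busy; real block costs vary, so actual scaling will be lower.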

In preparation for the multithreading optimizations, the deblocking-related functions were additionally refactored and optimized. After the multithreaded implementation was finalized, the decoder’s FPS gain over the Stage 1 decoder was 40% for the Kimono1 and 42% for the ParkScene sequences, and the Player is capable of 24 fps playback on the device.

To check the playback speed both objectively and subjectively, the Player was deployed and tested on a Huawei P40 Pro smartphone.

Figure 6 shows a picture of the Player during playback of the Kimono bitstream on the device.

image

Figure 6. Picture of the Player working on Huawei P40 Pro

4. Profiling of the optimized decoder

image

Figure 7. Optimized decoder profiling for the ParkScene bitstream encoded with the Stage 2 toolset (QP=27)

image

Figure 8. Optimized decoder profiling for the Kimono1 bitstream encoded with the Stage 2 toolset (QP=27)

5. CPU Utilization

The decoder is optimized for the Huawei P40 Pro in terms of CPU core usage: the software runs only on the 4 most powerful Kirin 990 CPU cores (hi-end and mid-end). Figures 9 and 10 show the CPU utilization percentage for each of these 4 cores during decoding.

image

(a) 

 image

(b)

Figure 9. Hi-end and mid-end CPU cores utilization. Kimono playback, (a) Stage 1, (b) Stage 2

image

(a)

image

(b)

Figure 10. Hi-end and mid-end CPU cores utilization. ParkScene playback, (a) Stage 1, (b) Stage 2

 

Figures 9-10 show that hi- and mid-end cores are used more effectively by the Stage 2 optimized decoder.

6. Power Consumption

To estimate power consumption, an infinite playback loop of the test bitstreams was launched in the Player on a fully charged Test Mobile Device and ran until the device switched off because the battery was depleted. As a result, the Player ran for 4h 25m with the Kimono1 file and 4h 07m with the ParkScene file at a speed of 24 fps. Figures 11 and 12 summarize the results.

image

(a)

image

(b)

Figure 11. Power consumption and playback speed during infinite playback of Kimono, (a) Stage 1, (b) Stage 2

image

(a)

image

(b)

Figure 12. Power consumption and playback speed during infinite playback of ParkScene, (a) Stage 1, (b) Stage 2

 

Figures 11-12 show that battery consumption in Stage 2 has increased slightly compared with Stage 1.
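The run times reported in this section can be turned into a rough frame count and an average drain rate. The sketch below assumes that playback held a constant 24 fps for the whole run and that the charge fell linearly from 100% to 0%; both are simplifications, since the device may shut down slightly before the battery is fully empty.

```python
def runtime_seconds(hours: int, minutes: int) -> int:
    """Convert a run time given as hours and minutes to seconds."""
    return hours * 3600 + minutes * 60


PLAYBACK_FPS = 24  # playback speed reported for both runs

# Time until shutdown from a full charge, as reported above.
runs = {
    "Kimono1": runtime_seconds(4, 25),
    "ParkScene": runtime_seconds(4, 7),
}

summary = {}
for name, seconds in runs.items():
    frames = seconds * PLAYBACK_FPS
    # Average drain, assuming the charge falls linearly from 100% to 0%.
    drain_per_hour = 100.0 / (seconds / 3600)
    summary[name] = (frames, drain_per_hour)
    print(f"{name}: about {frames} frames, {drain_per_hour:.1f}% of charge per hour")
```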

7. Conclusion

In Stage 2, Solveig Multimedia continued the work on the WPP algorithm introduced in Stage 1 and on the SIMD optimization of the most critical functions.

As a result of the optimizations, the decoder shows a performance gain of 53-54% in the EVC Player application designed for the Huawei P40 Pro smartphone.

The optimized decoder shows 33-35 FPS for the proposed toolset with the BTT tool additionally included, so all objectives of Stage 2 are exceeded. This also means that the QP value can be decreased further, to 23. For QP=23 the decoder shows 31.52 FPS for the Kimono1 and 28.37 FPS for the ParkScene files, which is enough to provide real-time 24 FPS playback for both files.
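The margin over real time at QP=23 can be expressed as a per-frame time budget. The sketch below uses the two decoding speeds quoted above; these are averages, so individual frames may still take longer than the budget, and a player would normally rely on frame buffering to absorb such peaks (an assumption, not something stated in this article).

```python
TARGET_FPS = 24.0

# Maximal decoding speeds (fps) for QP=23, as reported in the conclusion.
decode_fps = {"Kimono1": 31.52, "ParkScene": 28.37}

frame_budget_ms = 1000.0 / TARGET_FPS  # time available per frame at 24 fps

headroom = {}
for name, fps in decode_fps.items():
    decode_ms = 1000.0 / fps  # average time spent decoding one frame
    headroom[name] = (fps / TARGET_FPS - 1.0) * 100.0
    print(f"{name}: {decode_ms:.1f} ms of a {frame_budget_ms:.1f} ms budget, "
          f"{headroom[name]:.0f}% headroom")
```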

 


Related topics:

Early Implementation of Mobile Real-time EVC Player MPEG Submission October 2020


 

 About the author

Olga Krovyakova has been the Technical Support Manager at Solveig Multimedia since 2010.

She is the author of many text and video guides for the company's products: Video Splitter, HyperCam, WMP Trimmer Plugin, AVI Trimmer+ and TriMP4.

She works with the programs every day and therefore knows very well how they work. Contact Olga via support@solveigmm.com if you have any questions. She will gladly assist you!