Network Working Group                                          T. Davies
Internet-Draft                                                     Cisco
Intended status: Standards Track                          March 16, 2016
Expires: September 17, 2016


               Quantisation matrices for Thor video coding
                      draft-davies-netvc-qmtx-00

Abstract

   This draft describes a family of default quantisation matrices
   that may be used to improve perceptual quality when encoding with
   Thor. Similar quantisation matrix designs may be used in most
   block-based video and image codecs.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 17, 2016.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Davies                   Expires September 17, 2016              [Page 1]

Internet-Draft                   QMTX                          March 2016


Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2

   2.  Definitions . . . . . . . . . . . . . . . . . . . . . . . . .   2
     2.1.  Terminology . . . . . . . . . . . . . . . . . . . . . . .   2

   3.  Quantisation matrix design  . . . . . . . . . . . . . . . . .   3
     3.1.  The function of quantisation matrices . . . . . . . . . .   3
     3.2.  Quantisation matrices in AVC and HEVC . . . . . . . . . .   4
     3.3.  Quantisation matrices in Thor . . . . . . . . . . . . . .   4
     3.4.  Implementation  . . . . . . . . . . . . . . . . . . . . .   5

   4.  Compression performance . . . . . . . . . . . . . . . . . . .   6

   5.  Informative References  . . . . . . . . . . . . . . . . . . .   7

   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .   7


1.  Introduction

   This document describes a family of default quantisation matrices
   that may be used to improve perceptual quality when encoding with
   Thor. The quantisation matrices are designed to be near-flat at high
   quantisation levels and more strongly profiled at low quantisation
   levels, to avoid ringing artefacts and better shape quantisation
   error across a whole sequence with varying quantisation levels.

2.  Definitions

2.1.  Terminology

   This document uses the following terms.

      QP: quantisation parameter

      QM: quantisation matrix

      CSF: contrast sensitivity function

      BDR: Bjontegaard Delta-Rate


3.  Quantisation matrix design

3.1.  The function of quantisation matrices

   Quantisation matrices work by shaping the residual error after
   quantisation in the spatial frequency domain, usually the DCT domain.
   This is done by varying the quantisation factor applied across
   spatial frequencies in the transform block. Typically a high
   quantisation factor is applied at high spatial frequencies and a
   low one at low spatial frequencies.

   The aim is roughly to match a Contrast Sensitivity Function (CSF)
   for the human visual system, which gives a curve of sensitivity to
   detail (and therefore to coding errors) as a function of spatial
   frequency. Given known resolutions and assumed viewing distances, a
   weighting function can then be defined for all the coefficients in
   a transform block.

   This simple approach is complicated, however, by a number of factors.
   The first is that the CSF is in reality not a simple function
   of spatial frequency, but depends on factors such as brightness,
   which are imperfectly corrected for by television gammas. There is
   little that can be done about that in the quantisation matrices
   themselves, but adjusting QP itself may help.

   The second factor is that CSFs are determined experimentally based
   on models of Just Noticeable Difference (JND), and reflect less well
   the impact of distortions well above this level. Adjustments at high
   levels of quantisation are needed to account for this.

   Finally, applying quantisation matrices to video is affected by the
   fact that most frames are predicted and the QM is applied to the
   residual after prediction. This means that the quantisation error
   for a block consists of the quantisation error in the reference
   block, plus any additional error introduced in the current block.
   These errors will add if they are uncorrelated, but they may well
   be correlated at high QP.

   Despite these difficulties, QMs are widely used and known to work
   well, and are available in video coding standards such as
   H.264/AVC and H.265/HEVC [AVC,HEVC].


3.2.  Quantisation matrices in AVC and HEVC

   Quantisation matrices are available in a number of different codecs.
   The design in AVC and HEVC is to provide default matrices together
   with the ability to signal bespoke matrices [AVC,HEVC]. These
   matrices must cover all the different transform block sizes,
   components (Y, Cb, Cr) and intra and inter frame or block types,
   with fall-backs defined if bespoke matrices are not provided.
   Default inter block matrices are flatter than intra matrices, no
   doubt because of the noise-addition effect described in Section 3.1:
   if inter matrices had the same profile as intra matrices, the
   overall error profile of the combined prediction and residual could
   be over-shaped.

3.3.  Quantisation matrices in Thor

   Thor provides a set of matrices for each component of 4:2:0-sampled
   video, for each block size and each quantisation parameter. The
   principles behind the design are as follows:

   1) QP dependence. Matrices become flatter as quantisation levels
   increase.

   2) Energy preservation for intra. The inverse quantisation matrices
   for intra blocks are normalised to approximately preserve the energy
   of the residual.

   3) DC preservation for inter. The inverse quantisation matrices for
   inter blocks are normalised to preserve the DC level.

   4) Inter flatness. Matrices are also flatter for inter blocks than
   for intra blocks.

   5) Global adjustment. Quantisation matrix strength is globally
   adjustable.

   The QP dependence takes account of a number of factors. Firstly, it
   reflects the fact that inter blocks typically have higher QPs than
   the blocks used to predict them. This means that flattening the
   matrices at higher QP naturally prevents over-shaping the
   quantisation error.

   Secondly, the high-QP flattening process also reflects the fact
   that errors at this level are very visible even at high spatial
   frequencies. Strong error-shaping at these QP levels leads to very
   visible additional ringing.
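
   The flattening in principle 1 can be illustrated with a minimal
   sketch in C. It is not Thor's method (Thor stores a set of matrices
   per QP range): the linear blend, the function name qm_flatten and
   the level numbering are assumptions made for illustration, and the
   unit weight of 64 is inferred from the extra shift of 6 in formula
   (3) of Section 3.4.

```c
#include <stdint.h>

#define QM_UNIT 64  /* assumed unit (flat) weight, inferred from (3) */

/* Hypothetical helper: blend a profiled base weight towards the flat
 * value as 'level' rises from 0 (full profile) to max_level (flat).
 * The linear blend is an illustrative assumption only. */
int32_t qm_flatten(int32_t base, int level, int max_level)
{
    int32_t dev = base - QM_UNIT;   /* deviation from a flat matrix */
    return QM_UNIT + (dev * (max_level - level)) / max_level;
}
```

   For example, a base weight of 128 stays at 128 at level 0, falls to
   98 at level 5 of 11, and reaches the flat value 64 at level 11.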

   SSIM-based metrics [SSIM,MSSSIM,FASTSSIM] indicate that preserving
   image variances, and therefore residual energies, is perceptually
   important. This is feasible for intra, where residuals are
   substantial, but in the case of inter it is also important to
   preserve DC levels, since getting these wrong can produce very
   visible artefacts.
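
   DC preservation can be sketched as rescaling a matrix so that its DC
   entry equals the unit weight. The following C fragment is a
   hypothetical illustration, not the procedure used to derive Thor's
   tables: the function names are invented, the unit weight of 64 is
   inferred from formula (3) in Section 3.4, and the mean-square
   helper assumes equal expected energy in every coefficient.

```c
#include <stdint.h>

#define QM_UNIT 64  /* assumed unit weight, inferred from formula (3) */

/* Rescale n weights in place so that the DC entry m[0] becomes
 * QM_UNIT. Uses rounded integer division. */
void qm_normalise_dc(int32_t *m, int n)
{
    int32_t dc = m[0];
    if (dc <= 0)
        return;                       /* no sensible normalisation */
    for (int i = 0; i < n; i++)
        m[i] = (m[i] * QM_UNIT + dc / 2) / dc;
}

/* Mean of m[i]^2. A result of QM_UNIT*QM_UNIT (4096) would indicate
 * energy preservation under the equal-energy assumption above. */
int32_t qm_mean_square(const int32_t *m, int n)
{
    int64_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += (int64_t)m[i] * m[i];
    return (int32_t)(sum / n);
}
```

   For example, the weights {32, 48, 48, 80} become {64, 96, 96, 160}
   after DC normalisation, and a flat matrix of 64s has a mean square
   of 4096.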


   Intra frames tend to have lower QP than inter frames, and this means
   that QP dependence absorbs most of the requirement for inter
   matrices to be flatter than intra matrices. However, inter matrices
   are still a little flatter, to take account of the different
   characteristics of intra and inter blocks within the same frame.

   In determining the quantisation matrix, there are 12 possible sets
   available, giving a new set of matrices for each change of
   approximately 4 in quantisation parameter. Thor also supports a
   global adjustment or strength parameter, which offsets the QP used
   in the look-up table (LUT) that maps quantisation parameter to
   quantisation matrix set. This is a value from -32 to 31. A value of
   -32 will reduce the QP used by 32, dramatically increasing the
   strength of the quantisation matrices. Likewise, a value of 31 will
   eliminate quantisation matrices for all but the smallest QPs.
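
   The mapping from QP and strength to a matrix set might look roughly
   as follows. This is a sketch under stated assumptions rather than
   Thor's actual LUT: it assumes exactly 4 QP steps per set, that set 0
   is the most strongly profiled and set 11 the flattest, and that
   out-of-range values are clamped. All names are hypothetical.

```c
#define QM_NUM_SETS      12    /* number of matrix sets (from the text) */
#define QM_QP_PER_SET     4    /* "approximately 4"; assumed exact here */
#define QM_STRENGTH_MIN (-32)  /* strength range (from the text)        */
#define QM_STRENGTH_MAX   31

/* Hypothetical mapping from QP and strength offset to a set index.
 * Assumes set 0 is the strongest profile and the last set the
 * flattest. */
int qm_set_index(int qp, int strength)
{
    if (strength < QM_STRENGTH_MIN) strength = QM_STRENGTH_MIN;
    if (strength > QM_STRENGTH_MAX) strength = QM_STRENGTH_MAX;

    int eff_qp = qp + strength;        /* offset the QP used for lookup */
    if (eff_qp < 0) eff_qp = 0;

    int idx = eff_qp / QM_QP_PER_SET;  /* one set per 4 QP steps */
    if (idx > QM_NUM_SETS - 1) idx = QM_NUM_SETS - 1;
    return idx;
}
```

   Under these assumptions, QP 32 selects set 8 with strength 0, set 0
   with strength -32, and set 11 with strength 31.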

   The ability to signal strength and the provision of a range of
   QP-dependent matrices are together intended to remove the need to
   signal bespoke matrices at all.

3.4.  Implementation

  Quantisation matrices are applied as multiplicative factors in the
  forward or inverse quantisation processes. In Thor, the basic
  unweighted dequantisation process for a coefficient c with
  quantisation parameter q is based on two values: scale[q], which
  depends only on q%6, and shift[q], which depends only on q/6, the
  block size and the signal dynamic range. scale[q] takes care of
  quantisation step sizes which fall between powers of 2, and shift[q]
  takes care of the basic power-of-2 part of the quantisation step.

  The formula for unweighted dequantisation is then:

  c -> (c*scale[q] + (1<<(shift[q]-1))) >> shift[q]                  (1)

  for positive shift[q], otherwise

  c -> (c*scale[q])<<(-shift[q])                                     (2)
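
  Formulae (1) and (2) translate directly into C. In this sketch the
  values of scale[q] and shift[q] are passed in as arguments, since
  their tables are not reproduced here; the function name is
  hypothetical, and an arithmetic right shift of negative values is
  assumed, as provided by common compilers.

```c
#include <stdint.h>

/* Unweighted dequantisation of one coefficient, following formulae
 * (1) and (2). 'scale' and 'shift' stand for scale[q] and shift[q]. */
int32_t dequant_unweighted(int32_t c, int32_t scale, int shift)
{
    int64_t v = (int64_t)c * scale;
    if (shift > 0) {
        /* (1): add half a step, then shift right to round.
         * Assumes >> on a negative value is an arithmetic shift. */
        return (int32_t)((v + ((int64_t)1 << (shift - 1))) >> shift);
    }
    /* (2): a non-positive shift is a multiplication by 2^(-shift);
     * written as a multiply to avoid left-shifting a negative value. */
    return (int32_t)(v * ((int64_t)1 << -shift));
}
```

  With an illustrative scale of 40, a coefficient of 3 dequantises to
  8 for a shift of 4, and to 240 for a shift of -1.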

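  To apply a matrix M to a coefficient c[i,j] at position (i,j)
  within a block, the formulae (1), (2) change to:

  c[i,j]->(c[i,j]*M[i,j]*scale[q]+(1<<(shift[q]+5)))>>(shift[q]+6)   (3)

  if shift[q]+6 > 0, and otherwise to:

  c[i,j]->(c[i,j]*M[i,j]*scale[q])<<(-shift[q]-6)                    (4)

  The additional shift of 6 means that M[i,j] = 64 represents unit
  weight: with a flat matrix of 64s, (3) and (4) give the same result
  as (1) and (2).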
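
  Formulae (3) and (4) can be sketched in C as follows. The function
  and macro names are hypothetical, scale[q] and shift[q] are passed
  in as arguments because their tables are not reproduced here, and an
  arithmetic right shift of negative values is assumed.

```c
#include <stdint.h>

#define QM_UNIT_SHIFT 6  /* extra shift in (3) and (4): M = 64 is unity */

/* Weighted dequantisation of one coefficient with matrix entry m,
 * following formulae (3) and (4). */
int32_t dequant_weighted(int32_t c, int32_t m, int32_t scale, int shift)
{
    int64_t v = (int64_t)c * m * scale;
    int s = shift + QM_UNIT_SHIFT;
    if (s > 0) {
        /* (3): the rounding offset 1 << (shift + 5) is half of 2^s */
        return (int32_t)((v + ((int64_t)1 << (s - 1))) >> s);
    }
    /* (4): multiply by 2^(-s) instead of left-shifting a value that
     * may be negative */
    return (int32_t)(v * ((int64_t)1 << -s));
}
```

  With an illustrative scale of 40 and shift of 4, a coefficient of 3
  dequantises to 8 when M = 64, the same value that formula (1) gives,
  and to 11 when M = 96.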


   Exactly complementary formulae can be derived for the forward
   quantisation process.

4.  Compression performance

   Although largely a visual tool, the effectiveness of QMs can be
   inferred from changes in the PSNRHVS [PSNRHVS] and FASTSSIM metrics.
   FASTSSIM tends to over-estimate gains a little, as it has a bias
   towards low-pass filtering. Overall BDR results for the Low-Delay B
   (LDB) and High-Delay B GOP 16 (HDB16) configurations are as follows
   (QPs 22, 27, 32, 37):

   Config |    PSNR   |  PSNRHVS  | FASTSSIM  |
   --------------------------------------------
   LDB    |   +1.1%   |   -3.3%   |   -9.0%   |
   --------------------------------------------
   HDB16  |   +2.2%   |   -2.6%   |  -11.6%   |
   --------------------------------------------

   These were computed on the same test sequences as in [IRFVC].

   FASTSSIM and PSNRHVS gains are typically larger, and PSNR losses
   smaller, for higher resolution material.


5.  Informative References

[AVC]        ITU-T Recommendation H.264, "Advanced video coding for
             generic audiovisual services", March 2010.

[HEVC]       ITU-T Recommendation H.265, "High efficiency video
             coding", April 2013.

[FASTSSIM]   Chen, M. and A. Bovik, "Fast structural similarity index
             algorithm", 2010, <http://live.ece.utexas.edu/publications
             /2011/chen_rtip_2011.pdf>.

[MSSSIM]     Wang, Z., Simoncelli, E., and A. Bovik, "Multi-Scale
             Structural Similarity for Image Quality Assessment", n.d.,
             <http://www.cns.nyu.edu/~zwang/files/papers/msssim.pdf>.

[PSNRHVS]    Egiazarian, K., Astola, J., Ponomarenko, N., Lukin, V.,
             Battisti, F., and M. Carli, "A New Full-Reference Quality
             Metrics Based on HVS", 2002.

[SSIM]       Wang, Z., Bovik, A., Sheikh, H., and E. Simoncelli, "Image
             Quality Assessment: From Error Visibility to Structural
             Similarity", 2004,
             <http://www.cns.nyu.edu/pub/eero/wang03-reprint.pdf>.

[IRFVC]      Davies, T., "Interpolated reference frames for video
             coding", Internet-Draft,
             <https://www.ietf.org/id/draft-davies-netvc-irfvc-00.txt>.


Author's Address

   Thomas Davies
   Cisco
   Feltham
   UK

   Email: thdavies@cisco.com

