Payload Working Group J. Lennox Internet-Draft D. Hong Intended status: Standards Track Vidyo Expires: January 9, 2017 J. Uberti S. Holmer M. Flodman Google July 8, 2016 The Layer Refresh Request (LRR) RTCP Feedback Message draft-ietf-avtext-lrr-03 Abstract This memo describes the RTCP Payload-Specific Feedback Message "Layer Refresh Request" (LRR), which can be used to request a state refresh of one or more substreams of a layered media stream. It also defines its use with several RTP payloads for scalable media formats. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on January 9, 2017. Copyright Notice Copyright (c) 2016 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must Lennox, et al. Expires January 9, 2017 [Page 1] Internet-Draft Layer Refresh Request RTCP Feedback July 2016 include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 2. Conventions, Definitions and Acronyms . . . . . . . . . . . . 2 2.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 3 3. Layer Refresh Request . . . . . . . . . . . . . . . . . . . . 5 3.1. Message Format . . . . . . . . . . . . . . . . . . . . . 5 3.2. Semantics . . . . . . . . . . . . . . . . . . . . . . . . 7 4. Usage with specific codecs . . . . . . . . . . . . . . . . . 7 4.1. H264 SVC . . . . . . . . . . . . . . . . . . . . . . . . 7 4.2. VP8 . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 4.3. H265 . . . . . . . . . . . . . . . . . . . . . . . . . . 9 5. Usage with different scalability transmission mechanisms . . 10 6. Security Considerations . . . . . . . . . . . . . . . . . . . 11 7. SDP Definitions . . . . . . . . . . . . . . . . . . . . . . . 11 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 12 9.1. Normative References . . . . . . . . . . . . . . . . . . 12 9.2. Informative References . . . . . . . . . . . . . . . . . 13 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 13 1. Introduction This memo describes an RTCP [RFC3550] Payload-Specific Feedback Message [RFC4585] "Layer Refresh Request" (LRR). It is designed to allow a receiver of a layered media stream to request that one or more of its substreams be refreshed, such that it can then be decoded by an endpoint which previously was not receiving those layers, without requiring that the entire stream be refreshed (as it would be if the receiver sent a Full Intra Request (FIR); [RFC5104] see also [I-D.ietf-avtext-avpf-ccm-layered]). The feedback message is applicable both to temporally and spatially scaled streams, and to both single-stream and multi-stream scalability modes. 2. Conventions, Definitions and Acronyms The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. Lennox, et al. Expires January 9, 2017 [Page 2] Internet-Draft Layer Refresh Request RTCP Feedback July 2016 2.1. Terminology A "Layer Refresh Point" is a point in a scalable stream after which a decoder, which previously had been able to decode only some (possibly none) of the available layers of stream, is able to decode a greater number of the layers. For spatial (or quality) layers, layer refresh typically requires that a spatial layer be encoded in a way that references only lower- layer subpictures of the current picture, not any earlier pictures of that spatial layer. Additionally, the encoder must promise that no earlier pictures of that spatial layer will be used as reference in the future. In a layer refresh, however, other layers than the ones requested for refresh may still maintain dependency on earlier content of the stream. This is the difference between a layer refresh and a Full Intra Request [RFC5104]. This minimizes the coding overhead of refresh to only those parts of the stream that actually need to be refreshed at any given time. An illustration of spatial layer refresh of an enhancement layer is shown below. ... <-- S1 <-- S1 S1 <-- S1 <-- ... | | | | \/ \/ \/ \/ ... <-- S0 <-- S0 <-- S0 <-- S0 <-- ... 1 2 3 4 In this illustration, frame 3 is a layer refresh point for spatial layer S1; a decoder which had previously only been decoding spatial layer S0 would be able to decode layer S1 starting at frame 3. Figure 1 Lennox, et al. Expires January 9, 2017 [Page 3] Internet-Draft Layer Refresh Request RTCP Feedback July 2016 An illustration of spatial layer refresh of a base layer is shown below. ... <-- S1 <-- S1 <-- S1 <-- S1 <-- ... | | | | \/ \/ \/ \/ ... <-- S0 <-- S0 S0 <-- S0 <-- ... 1 2 3 4 In this illustration, frame 3 is a layer refresh point for spatial layer S0; a decoder which had previously not been decoding the stream at all could decode layer S0 starting at frame 3. Figure 2 For temporal layers, layer refresh requires that the layer be "temporally nested", i.e. use as reference only earlier frames of a lower temporal layer, not any earlier frames of this temporal layer, and also promise that no future frames of this temporal layer will reference frames of this temporal layer before the refresh point. In many cases, the temporal structure of the stream will mean that all frames are temporally nested, in which case decoders will have no need to send LRR messages for the stream. An illustration of temporal layer refresh is shown below. ... <----- T1 <------ T1 T1 <------ ... / / / |_ |_ |_ ... <-- T0 <------ T0 <------ T0 <------ T0 <--- ... 1 2 3 4 5 6 7 In this illustration, frame 6 is a layer refresh point for temporal layer T1; a decoder which had previously only been decoding temporal layer T0 would be able to decode layer T1 starting at frame 6. Figure 3 Lennox, et al. Expires January 9, 2017 [Page 4] Internet-Draft Layer Refresh Request RTCP Feedback July 2016 An illustration of an inherently temporally nested stream is shown below. T1 T1 T1 / / / |_ |_ |_ ... <-- T0 <------ T0 <------ T0 <------ T0 <--- ... 1 2 3 4 5 6 7 In this illustration, the stream is temporally nested in its ordinary structure; a decoder receiving layer T0 can begin decoding layer T1 at any point. Figure 4 3. Layer Refresh Request A layer refresh frame can be requested by sending a Layer Refresh Request (LRR), which is an RTCP payload-specific feedback message [RFC4585] asking the encoder to encode a frame which makes it possible to upgrade to a higher layer. The LRR contains one or two tuples, indicating the layer the decoder wants to upgrade to, and (optionally) the currently highest layer the decoder can decode. The specific format of the tuples, and the mechanism by which a receiver recognizes a refresh frame, is codec-dependent. Usage for several codecs is discussed in Section 4. LRR follows the model of the Full Intra Request (FIR) [RFC5104](Section 3.5.1) for its retransmission, reliability, and use in multipoint conferences. The LRR message is identified by RTCP packet type value PT=PSFB and FMT=TBD. The FCI field MUST contain one or more LRR entries. Each entry applies to a different media sender, identified by its SSRC. 3.1. Message Format The Feedback Control Information (FCI) for the Layer Refresh Request consists of one or more FCI entries, the content of which is depicted in Figure 5. The length of the LRR feedback message MUST be set to 2+3*N, where N is the number of FCI entries. Lennox, et al. Expires January 9, 2017 [Page 5] Internet-Draft Layer Refresh Request RTCP Feedback July 2016 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | SSRC | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Seq nr. |C| Payload Type| Reserved | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | RES | TTID| TLID | RES | CTID| CLID (opt) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 5 SSRC (32 bits) The SSRC value of the media sender that is requested to send a layer refresh point. Seq nr. (8 bits) Command sequence number. The sequence number space is unique for each pairing of the SSRC of command source and the SSRC of the command target. The sequence number SHALL be increased by 1 modulo 256 for each new command. A repetition SHALL NOT increase the sequence number. The initial value is arbitrary. C (1 bit) A flag bit indicating whether the "Current Layer Index" field is present in the FCI. If this bit is false, the sender of the LRR message is requesting refresh of all layers up to and including the target layer. Payload Type (7 bits) The RTP payload type for which the LRR is being requested. This gives the context in which the target layer index is to be interpreted. Reserved (RES) (16 bits / 5 bits / 5 bits) All bits SHALL be set to 0 by the sender and SHALL be ignored on reception. Target Temporal Layer ID (TTID) (3 bits) The temporal ID of the target layer for which the receiver wishes a refresh point. Target Layer ID (TLID) (8 bits) The layer ID of the target layer for which the receiver wishes a refresh point. Its format is dependent on the payload type field. Current Temporal Layer ID (CTID) (3 bits) If C is 1, the ID of the current temporal layer being decoded by the receiver. This message is not requesting refresh of layers at or below this layer. If C is 0, this field SHALL be set to 0 by the sender and SHALL be ignored on reception. Lennox, et al. Expires January 9, 2017 [Page 6] Internet-Draft Layer Refresh Request RTCP Feedback July 2016 Current Layer ID (CLID) (8 bits) If C is 1, the layer ID of the current layer being decoded by the receiver. This message is not requesting refresh of layers at or below this layer. If C is 0, this field SHALL be set to 0 by the sender and SHALL be ignored on reception. Note: the syntax of the TTID, TLID, CTID, and TLID fields are designed to match the TID and LID fields in [I-D.ietf-avtext-framemarking]. 3.2. Semantics Within the common packet header for feedback messages (as defined in section 6.1 of [RFC4585]), the "SSRC of packet sender" field indicates the source of the request, and the "SSRC of media source" is not used and SHALL be set to 0. The SSRCs of the media senders to which the LRR command applies are in the corresponding FCI entries. A LRR message MAY contain requests to multiple media senders, using one FCI entry per target media sender. Upon reception of LRR, the encoder MUST send a decoder refresh point (see section Section 2.1) as soon as possible. The sender MUST consider congestion control as outlined in section 5 of [RFC5104], which MAY restrict its ability to send a layer refresh point quickly. 4. Usage with specific codecs In order for LRR to be used with a scalable codec, the format of the target layer and current target layer fields needs to be specified for that codec's RTP packetization. New RTP packetization specifications for scalable codecs SHOULD define how this is done. (The VP9 payload [I-D.ietf-payload-vp9], for instance, has done so.) If the payload also specifies how it is used with the Frame Marking RTP Header Extension [I-D.ietf-avtext-framemarking], the syntax MUST be defined in the same manner as the TID and LID fields in that header. 4.1. H264 SVC H.264 SVC [RFC6190] defines temporal, dependency (spatial), and quality scalability modes. Lennox, et al. Expires January 9, 2017 [Page 7] Internet-Draft Layer Refresh Request RTCP Feedback July 2016 +---------------+---------------+ |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | RES | TID |R| DID | QID | +---------------+---------------+ Figure 6 Figure 6 shows the format of the layer index field for H.264 SVC streams. The "R" and "RES" fields MUST be set to 0 on transmission and ignored on reception. See [RFC6190] Section 1.1.3 for details on the DID, QID, and TID fields. A dependency or quality layer refresh of a given layer in H.264 SVC can be identified by the "I" bit (idr_flag) in the extended NAL unit header, present in NAL unit types 14 (prefix NAL unit) and 20 (coded scalable slice). Layer refresh of the base layer can also be identified by its NAL unit type of its coded slices, which is "5" rather than "1". A dependency or quality layer refresh is complete once this bit has been seen on all the appropriate layers (in decoding order) above the current layer index (if any, or beginning from the base layer if not) through the target layer index. Note that as the "I" bit in a PACSI header is set if the corresponding bit is set in any of the aggregated NAL units it describes; thus, it is not sufficient to identify layer refresh when NAL units of multiple dependency or quality layers are aggregated. In H.264 SVC, temporal layer refresh information can be determined from various Supplemental Encoding Information (SEI) messages in the bitstream. Whether an H.264 SVC stream is scalably nested can be determined from the Scalability Information SEI message's temporal_id_nesting flag. If this flag is set in a stream's currently applicable Scalability Information SEI, receivers SHOULD NOT send temporal LRR messages for that stream, as every frame is implicitly a temporal layer refresh point. (The Scalability Information SEI message may also be available in the signaling negotiation of H.264 SVC, as the sprop- scalability-info parameter.) If a stream's temporal_id_nesting flag is not set, the Temporal Level Switching Point SEI message identifies temporal layer switching points. A temporal layer refresh is satisfied when this SEI message is present in a frame with the target layer index, if the message's delta_frame_num refers to a frame with the requested current layer index. (Alternately, temporal layer refresh can also be satisfied by a complete state refresh, such as an IDR.) Senders which support Lennox, et al. Expires January 9, 2017 [Page 8] Internet-Draft Layer Refresh Request RTCP Feedback July 2016 receiving LRR for non-temporally-nested streams MUST insert Temporal Level Switching Point SEI messages as appropriate. 4.2. VP8 The VP8 RTP payload format [RFC7741] defines temporal scalability modes. It does not support spatial scalability. +---------------+---------------+ |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | RES | TID | RES | +---------------+---------------+ Figure 7 Figure 7 shows the format of the layer index field for VP8 streams. The "RES" fields MUST be set to 0 on transmission and be ignored on reception. See [RFC7741] Section 4.2 for details on the TID field. A VP8 layer refresh point can be identified by the presence of the "Y" bit in the VP8 payload header. When this bit is set, this and all subsequent frames depend only on the current base temporal layer. On receipt of an LRR for a VP8 stream, A sender which supports LRR MUST encode the stream so it can set the Y bit in a packet whose temporal layer is at or below the target layer index. Note that in VP8, not every layer switch point can be identified by the Y bit, since the Y bit implies layer switch of all layers, not just the layer in which it is sent. Thus the use of LRR with VP8 can result in some inefficiency in transmision. However, this is not expected to be a major issue for temporal structures in normal use. 4.3. H265 The initial version of the H.265 payload format [RFC7798] defines temporal scalability, with protocol elements reserved for spatial or other scalability modes (which are expected to be defined in a future version of the specification). +---------------+---------------+ |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | RES | TID |RES| LayerId | +---------------+---------------+ Figure 8 Lennox, et al. Expires January 9, 2017 [Page 9] Internet-Draft Layer Refresh Request RTCP Feedback July 2016 Figure 8 shows the format of the layer index field for H.265 streams. The "RES" fields MUST be set to 0 on transmission and ignored on reception. See [RFC7798] Section 1.1.4 for details on the LayerId and TID fields. H.265 streams signal whether they are temporally nested, using the vps_temporal_id_nesting_flag in the Video Parameter Set (VPS), and the sps_temporal_id_nesting_flag in the Sequence Parameter Set (SPS). If this flag is set in a stream's currently applicable VPS or SPS, receivers SHOULD NOT send temporal LRR messages for that stream, as every frame is implicitly a temporal layer refresh point. If a stream's sps_temporal_id_nesting_flag is not set, the NAL unit types 2 to 5 inclusively identify temporal layer switching points. A layer refresh to any higher target temporal layer is satisfied when a NAL unit type of 4 or 5 with TID equal to 1 more than current TID is seen. Alternatively, layer refresh to a target temporal layer can be incrementally satisfied with NAL unit type of 2 or 3. In this case, given current TID = TO and target TID = TN, layer refresh to TN is satisfied when NAL unit type of 2 or 3 is seen for TID = T1, then TID = T2, all the way up to TID = TN. During this incremental process, layer refresh to TN can be completely satisfied as soon as a NAL unit type of 2 or 3 is seen. Of course, temporal layer refresh can also be satisfied whenever any Intra Random Access Point (IRAP) NAL unit type (with values 16-23, inclusively) is seen. An IRAP picture is similar to an IDR picture in H.264 (NAL unit type of 5 in H.264) where decoding of the picture can start without any older pictures. In the (future) H.265 payloads that support spatial scalability, a spatial layer refresh of a specific layer can be identified by NAL units with the requested layer ID and NAL unit types between 16 and 21 inclusive. A dependency or quality layer refresh is complete once NAL units of this type have been seen on all the appropriate layers (in decoding order) above the current layer index (if any, or beginning from the base layer if not) through the target layer index. 5. Usage with different scalability transmission mechanisms Several different mechanisms are defined for how scalable streams can be transmitted in RTP. The RTP Taxonomy [RFC7656] Section 3.7 defines three mechanisms: Single RTP Stream on a Single Media Transport (SRST), Multiple RTP Streams on a Single Media Transport (MRST), and Multiple RTP Streams on Multiple Media Transports (MRMT). The LRR message is applicable to all these mechanisms. For MRST and MRMT mechanisms, the "media source" field of the LRR FCI is set to Lennox, et al. Expires January 9, 2017 [Page 10] Internet-Draft Layer Refresh Request RTCP Feedback July 2016 the SSRC of the RTP stream containing the layer indicated by the Current Layer Index (if "C" is 1), or the stream containing the base encoded stream (if "C" is 0). For MRMT, it is sent on the RTP session on which this stream is sent. On receipt, the sender MUST refresh all the layers requested in the stream, simultaneously in decode order. 6. Security Considerations All the security considerations of FIR feedback packets [RFC5104] apply to LRR feedback packets as well. Additionally, media senders receiving LRR feedback packets MUST validate that the payload types and layer indices they are receiving are valid for the stream they are currently sending, and discard the requests if not. 7. SDP Definitions Section 7 of [RFC5104] defines SDP procedures for indicating and negotiating support for codec control messages (CCM) in SDP. This document extends this with a new codec control command, "lrr", which indicates support of the Layer Refresh Request (LRR). Figure 9 gives a formal Augmented Backus-Naur Form (ABNF) [RFC5234] showing this grammar extension, extending the grammar defined in [RFC5104]. rtcp-fb-ccm-param =/ SP "lrr" ; Layer Refresh Request Figure 9: Syntax of the "lrr" ccm The Offer-Answer considerations defined in [RFC5104] Section 7.2 apply. 8. IANA Considerations This document defines a new entry to the "Codec Control Messages" subregistry of the "Session Description Protocol (SDP) Parameters" registry, according to the following data: Value name: lrr Long name: Layer Refresh Request Command Usable with: ccm Reference: RFC XXXX Lennox, et al. Expires January 9, 2017 [Page 11] Internet-Draft Layer Refresh Request RTCP Feedback July 2016 This document also defines a new entry to the "FMT Values for PSFB Payload Types" subregistry of the "Real-Time Transport Protocol (RTP) Parameters" registry, according to the following data: Name: LRR Long Name: Layer Refresh Request Command Value: TBD Reference: RFC XXXX 9. References 9.1. Normative References [I-D.ietf-avtext-framemarking] Berger, E., Nandakumar, S., and M. Zanaty, "Frame Marking RTP Header Extension", draft-ietf-avtext-framemarking-01 (work in progress), March 2016. [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/ RFC2119, March 1997, . [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, July 2003, . [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey, "Extended RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, DOI 10.17487/RFC4585, July 2006, . [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman, "Codec Control Messages in the RTP Audio-Visual Profile with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104, February 2008, . [RFC5234] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, DOI 10.17487/ RFC5234, January 2008, . Lennox, et al. Expires January 9, 2017 [Page 12] Internet-Draft Layer Refresh Request RTCP Feedback July 2016 [RFC6190] Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis, "RTP Payload Format for Scalable Video Coding", RFC 6190, DOI 10.17487/RFC6190, May 2011, . [RFC7741] Westin, P., Lundin, H., Glover, M., Uberti, J., and F. Galligan, "RTP Payload Format for VP8 Video", RFC 7741, DOI 10.17487/RFC7741, March 2016, . [RFC7798] Wang, Y., Sanchez, Y., Schierl, T., Wenger, S., and M. Hannuksela, "RTP Payload Format for High Efficiency Video Coding (HEVC)", RFC 7798, DOI 10.17487/RFC7798, March 2016, . 9.2. Informative References [I-D.ietf-avtext-avpf-ccm-layered] Wenger, S., Lennox, J., Burman, B., and M. Westerlund, "Using Codec Control Messages in the RTP Audio-Visual Profile with Feedback with Layered Codecs", draft-ietf- avtext-avpf-ccm-layered-01 (work in progress), May 2016. [I-D.ietf-payload-vp9] Uberti, J., Holmer, S., Flodman, M., Lennox, J., and D. Hong, "RTP Payload Format for VP9 Video", draft-ietf- payload-vp9-02 (work in progress), March 2016. [RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms for Real-Time Transport Protocol (RTP) Sources", RFC 7656, DOI 10.17487/RFC7656, November 2015, . Authors' Addresses Jonathan Lennox Vidyo, Inc. 433 Hackensack Avenue Seventh Floor Hackensack, NJ 07601 US Email: jonathan@vidyo.com Lennox, et al. Expires January 9, 2017 [Page 13] Internet-Draft Layer Refresh Request RTCP Feedback July 2016 Danny Hong Vidyo, Inc. 433 Hackensack Avenue Seventh Floor Hackensack, NJ 07601 US Email: danny@vidyo.com Justin Uberti Google, Inc. 747 6th Street South Kirkland, WA 98033 USA Email: justin@uberti.name Stefan Holmer Google, Inc. Kungsbron 2 Stockholm 111 22 Sweden Email: holmer@google.com Magnus Flodman Google, Inc. Kungsbron 2 Stockholm 111 22 Sweden Email: mflodman@google.com Lennox, et al. Expires January 9, 2017 [Page 14]