INTERNET-DRAFT                                                    Q. Wei
Intended Status: Standards Track                                Y. Jiang
Expires: January 6, 2017                                        R. Huang
                                                                  Huawei
                                                            July 5, 2016


             RTP Payload Format for HTTP Adaptive Streaming
                   draft-wei-payload-has-over-rtp-00


Abstract

   This document introduces a new RTP payload format for encapsulating
   HTTP Adaptive Streaming data into RTP, so that existing RTP schemes
   can be leveraged for OTT video delivery services.  For example,
   operators can deliver OTT live content through multicast to
   eliminate the impact of live content consumption peaks.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1  Introduction
   2  Terminology
   3. Existing Technologies
     3.1 HTTP Adaptive Streaming
     3.2 Multicast Adaptive Bit Rate (Multicast-ABR)
   4. Overview of HTTP Adaptive Streaming over RTP
   5. HTTP Adaptive Streaming Payload
     5.1. RTP Header Definitions
     5.2. Payload Definitions
     5.3. Packetization Consideration
   6  Payload Format Parameters
     6.1 Media Type Definition
     6.2 SDP Signaling
   7. Congestion Control
   8  Security Considerations
   9  IANA Considerations
   10 Acknowledgments
   11 References
     11.1 Normative References
     11.2 Informative References
   Authors' Addresses

1  Introduction

   Video consumption has exploded over the last few years as more and
   more consumers are watching live Over-the-Top (OTT) content on
   smartphones, tablets, PCs and other IP connected devices.
   OTT video services rely on HTTP adaptive streaming (HAS) technology,
   e.g., DASH and HTTP Live Streaming (HLS), to deliver content, so
   every time a user requests a piece of content, a separate stream is
   sent through the entire network.  If a significant number of users
   request content, the operator's bandwidth is drained.  It is usually
   difficult for operators to predict the popularity of live video
   content, especially for major sporting events.  Even when a content
   delivery network (CDN) is involved, the edge network may become
   congested, and CDN scalability could be a problem.  All of this leads
   to a poor Quality of Experience (QoE) for users.

   The most effective solution is to use multicast technology, even for
   OTT live content delivery.  Through multicast technology, operators
   can stream live content only once in their network, regardless of the
   number of viewers watching, which eliminates the impact of live
   content consumption peaks.

   This document introduces a new RTP payload format for encapsulating
   the HAS data into RTP, so that existing RTP schemes can be leveraged
   for OTT video delivery services.  For example, operators can deliver
   OTT live content through multicast to eliminate the impact of live
   content consumption peaks.

2  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   This document uses the following terms:

3. Existing Technologies

3.1 HTTP Adaptive Streaming

   HTTP adaptive streaming has become a popular approach in commercial
   video deployments.  The multimedia content is captured, divided into
   small segments, and stored on an HTTP server.  The consuming user
   first obtains the manifest file, e.g., the Media Presentation
   Description (MPD), which lists the available segments, their
   bitrates, their URLs, and other characteristics.
   Based on this, the consuming user selects the appropriate encoded
   alternative and starts streaming the content by fetching the
   segments using HTTP GET requests in unicast.

   MPEG developed the specification known as MPEG Dynamic Adaptive
   Streaming over HTTP (DASH) to standardize the MPD and the segment
   formats.  Other proprietary mechanisms like Apple's HTTP Live
   Streaming (HLS) [HLS] are also popular.

   HAS is a typical client pull model.  The manifest files, HAS
   segments, etc., are pulled from the HTTP server one after another by
   clients issuing HTTP requests.

   HTTP adaptive streaming is very efficient for Video on Demand (VOD).
   However, delivering live content simultaneously to millions of users
   becomes a problem.  The peak bandwidth of video consumption is too
   much for an operator to handle, since each viewer counts as a
   separate unicast session.  As live OTT multi-screen video consumption
   shows no signs of slowing down, traditional unicast delivery is
   becoming too expensive in terms of bandwidth and of the investment
   needed to maintain the network.  Partnering with a CDN provider only
   helps optimize the traffic on the backbone for known content.
   Additional infrastructure investment is still required at the edge of
   the network to absorb the load, but it is too costly an undertaking
   and would only be a temporary solution, as more servers would always
   be needed when live OTT consumption increases.

3.2 Multicast Adaptive Bit Rate (Multicast-ABR)

   Operators are seeking ways to improve the quality of the services
   they offer while delivering data in a more balanced and
   cost-efficient way, and to reduce waste of increasingly constrained
   bandwidth.  Multicast-ABR, specified in [CableLabs], is one such
   innovation.
   Multicast-ABR brings HTTP streaming to multicast by keeping the
   different alternatives in separate multicast groups, so that smart
   network nodes or clients can select an appropriate rate by joining
   the corresponding multicast group and delivering the segments to
   clients.  Multicast-ABR uses NACK-Oriented Reliable Multicast (NORM)
   [RFC5740] to deliver HTTP adaptive streaming data in multicast.

   Multicast-ABR is a low-cost and easy-to-deploy solution that allows
   operators to see multicast gains on all in-home devices, leveraging
   their TV Everywhere infrastructure.  However, using NORM to convey
   HTTP adaptive streaming data has three shortcomings.  First, NORM has
   no fast channel change (FCC) mechanism like [RFC6285], so switching
   between video resolutions may take some time and cause the video to
   freeze.  Second, some telecom operators only have an IPTV multicast
   platform, which may not support the NORM protocol.  Third, NORM is
   not aware of media timing in the way RTP is, since RTP was designed
   to carry multimedia.  Based on this, using RTP to deliver HTTP
   adaptive streaming data could be an alternative.

4. Overview of HTTP Adaptive Streaming over RTP

   Figure 1 shows the architecture for HTTP adaptive streaming over RTP,
   which is similar to the one defined for Multicast-ABR.

                              xxxxxxxxxxx
                              x         x
                              x OTT CDN x
                              x Server  x
                              x         x
                              xxxxxxxxxxx         ----- Unicast
                                   |
                                   |
                                   |              ***** Multicast
                                   |
                             xxxxxxxxxxxxxx
                             x            x
                             x Multicast  x
                             x   Server   x
                             x            x
                             xxxxxxxxxxxxxx
                                   *
                                   *
                                   *
                ****************************************
                *                                      *
                *                                      *
          xxxxxxxxxxxxxx                      xxxxxxxxxxxxxx
          x            x                      x            x
          x Multicast  x   ................   x Multicast  x
          x   Client   x                      x   Client   x
          x            x                      x            x
          xxxxxxxxxxxxxx                      xxxxxxxxxxxxxx
              |     |                            |      |
         ------     ----------              ------      ------
         |                   |              |                |
    xxxxxxxxxxx         xxxxxxxxxxx    xxxxxxxxxxx      xxxxxxxxxxx
    x         x         x         x    x         x      x         x
    x   OTT   x         x   OTT   x    x   OTT   x      x   OTT   x
    x Client  x ....... x Client  x    x Client  x .... x Client  x
    x         x         x         x    x         x      x         x
    xxxxxxxxxxx         xxxxxxxxxxx    xxxxxxxxxxx      xxxxxxxxxxx

     Figure 1: Architecture of HTTP Adaptive Streaming over Multicast

   In this figure, the multicast server near the head-end takes a
   standard HAS stream (e.g., a DASH stream) as input and converts it
   into a multicast channel, so that the live streams can be conveyed in
   multicast down to its receivers, i.e., the multicast clients.
   Different resolutions or bitrates are kept in different multicast
   groups.

   Multicast server:  It is responsible for converting the HAS streams
      into multicast data and providing the multicast service to its
      receivers.

   Multicast client:  It is responsible for terminating the multicast.
      It can be a network device deployed inside the ISP-managed
      network, e.g., a home gateway, broadband network gateway (BNG), or
      optical line terminal (OLT), which converts the multicast streams
      back into unicast so that all compatible devices in the home
      network can receive the stream without modifying the end-user
      applications.  It can also be a user device that supports
      multicast.

   As indicated in the figure, HAS over RTP is used between the
   multicast server and the multicast clients.  Unlike HTTP, RTP is
   based on a push model in which the server actively and continuously
   pushes the data to the client, without round trips between the client
   and server for requests.  In principle, the manifest files are
   therefore unnecessary: receivers can handle the HAS data without them
   when using HAS over RTP.  However, since the multicast client may
   still have to convert the multicast stream back into a traditional
   HAS service, the manifest files should be transmitted as well, to
   reduce the workload of the multicast client and of traditional HAS
   clients.

   There are two ways to send the manifest files from the multicast
   server to the multicast receivers.  One way is to send them out of
   band, e.g., through HTTP.
   It is reliable, but requires additional bandwidth and round-trip
   time.  The other way is to send them in the RTP payload, which
   requires handling losses and partial reception.

5. HTTP Adaptive Streaming Payload

   This section specifies the format of the RTP payload for HTTP
   adaptive streaming data.  The structure of the packet is illustrated
   in Figure 2.  This payload format uses the fields of the RTP header
   in a manner consistent with [RFC3550].

                       +----------+------------+
                       |RTP Header|HAS Payload |
                       +----------+------------+

               Figure 2: Packet Structure with RTP Header

5.1. RTP Header Definitions

   The format of the RTP header is specified in [RFC3550] and is shown
   in Figure 3 for convenience.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |          [ contributing source (CSRC) identifiers ]           |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 3: RTP Header Defined in [RFC3550]

   The RTP header fields are set as follows for this RTP payload
   format:

   Marker bit (M): 1 bit

      A marker bit set to "1" SHALL indicate the last RTP packet of the
      media segment carried in the current RTP stream.  This is in line
      with the normal use of the M bit in video formats to allow
      efficient playout buffer handling.

   Payload Type (PT): 7 bits

      The assignment of an RTP payload type for this new packet format
      is outside the scope of this document and will not be specified
      here.  The assignment of a payload type has to be performed either
      through the profile used or in a dynamic way.
   Sequence Number (SN): 16 bits

      Set and used in accordance with [RFC3550].

   Timestamp: 32 bits

      The RTP timestamp is set to the sampling timestamp of the content.
      The clock rate is specified dynamically through non-RTP means.  If
      no clock rate is signaled, 90 kHz MUST be used.  When a media
      segment or an information list is encapsulated into several RTP
      packets, all of them share the same timestamp.

5.2. Payload Definitions

   The format of the HAS payload is illustrated in Figure 4.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |TP |F|   RSV   |            Length             | [URL Length]  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          [ URL ...]
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                           HAS data                            |
   |                                                               |
   |             . . .             +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure 4: Format for HTTP Adaptive Streaming Payload

   TYPE (TP): 2 bits

      This field indicates the content type of this payload:

      TYPE=0:  Manifest - Information about the media, in particular
               the list of segments making up the media.  For example,
               an MPD or a Playlist.

      TYPE=1:  Initial information - Information required to
               initialize the video decoder.  For example, a DASH
               initialization segment.  This information is optional.

      TYPE=2:  Media segment - A part or the whole of a HAS media
               segment.

      TYPE=3:  Reserved

      HSPT denotes the value of the has-type parameter (Section 6.1).
      When HSPT=0, TP=0 means the payload carries an MPD file; TP=1
      means the payload carries an initialization segment; and TP=2
      indicates media segment content.  When HSPT=1, TP=0 means the
      payload carries a Playlist file; TP=2 indicates media segment
      content; and TP=1 is never used.
   Fragmentation (F): 1 bit

      If the fragmentation bit is set, the received packet is a fragment
      and cannot be decoded correctly until all the fragments belonging
      to the same content have been received.  The fragments belonging
      to the same content are ordered by their sequence numbers and
      share the same timestamp.  If the fragmentation bit is set, the
      URL length field and the URL field are omitted.

   RSV: 5 bits

      These bits are reserved for future use.  They MUST be set to zero
      by senders and ignored by receivers.

   Length: 16 bits

      The size of the RTP payload in bytes, excluding the RTP header but
      including the payload header.

   URL length: 8 bits

      The size of the URL field in bytes, including the URL length
      field itself.  This field appears only when the fragmentation bit
      is not set.

   URL: number of bytes defined by URL length

      This field indicates the URL of the content.  Examples would be
      "/PLTV/88888888/224/3221225484/3221225484.mpd" or
      "Stream_1_1944000".  It associates the content with an HTTP
      request, so that receivers can easily turn the content back into
      the HAS scheme after receiving it.  It is used to relate the RTP
      packet to the corresponding HAS segment specified in the manifest.

5.3. Packetization Consideration

   Senders of this payload SHOULD transmit the HAS data, e.g., an MPD or
   segments, by encapsulating the whole HTTP packet.  This reduces the
   processing required of receivers that convert the data back into a
   conventional HTTP streaming scheme when forwarding it to end users,
   and it facilitates support for different user devices.

   This payload format introduces a method to send the manifest or
   initial information through the media tunnel together with the HAS
   segments.  This is optional.  The manifest or initial information can
   also be sent by out-of-band methods like HTTP or SDP.  In particular,
   when a receiver joins the multicast, it is better for it to obtain
   the manifest or initial information out of band in advance.
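
   As a rough illustration of the payload header in Section 5.2, the
   following Python sketch packs and parses a single HAS payload.  It is
   not part of the payload format.  The names pack_payload,
   parse_payload, and HasPayload are hypothetical, and two readings of
   the field definitions are assumed: the URL length and URL fields are
   present only when the F bit is not set, as stated in the
   Fragmentation (F) description, and the URL length value counts its
   own byte in addition to the URL bytes.

```python
import struct
from typing import NamedTuple, Optional

TP_MANIFEST, TP_INIT, TP_MEDIA = 0, 1, 2   # TYPE values; 3 is reserved
HEADER_LEN = 3                             # TP/F/RSV byte + 16-bit Length


class HasPayload(NamedTuple):
    tp: int                # 2-bit TYPE field
    fragment: bool         # F bit
    url: Optional[bytes]   # None when F is set (assumed reading of 5.2)
    data: bytes            # HAS data, without bytes beyond Length


def pack_payload(tp: int, fragment: bool, data: bytes,
                 url: bytes = b"") -> bytes:
    """Build TP(2) F(1) RSV(5) Length(16) [URL length(8) URL] HAS data."""
    if tp not in (TP_MANIFEST, TP_INIT, TP_MEDIA):
        raise ValueError("TP must be 0, 1 or 2")
    first = (tp << 6) | (int(fragment) << 5)   # the five RSV bits stay zero
    url_part = b""
    if not fragment:
        if len(url) + 1 > 0xFF:
            raise ValueError("URL does not fit the 8-bit URL length field")
        # Assumption: the URL length value counts its own byte.
        url_part = bytes([len(url) + 1]) + url
    length = HEADER_LEN + len(url_part) + len(data)
    if length > 0xFFFF:
        raise ValueError("payload does not fit the 16-bit Length field")
    return struct.pack("!BH", first, length) + url_part + data


def parse_payload(payload: bytes) -> HasPayload:
    """Parse one payload; bytes beyond Length (e.g. padding) are dropped."""
    if len(payload) < HEADER_LEN:
        raise ValueError("truncated payload header")
    first, length = struct.unpack_from("!BH", payload)
    if not HEADER_LEN <= length <= len(payload):
        raise ValueError("Length field does not match the packet size")
    tp, fragment = first >> 6, bool(first & 0x20)   # RSV bits are ignored
    offset, url = HEADER_LEN, None
    if not fragment:
        if offset >= length:
            raise ValueError("missing URL length field")
        url_len = payload[offset]
        if url_len < 1 or offset + url_len > length:
            raise ValueError("bad URL length")
        url = payload[offset + 1:offset + url_len]
        offset += url_len
    return HasPayload(tp, fragment, url, payload[offset:length])
```

   Under these assumptions, pack_payload(TP_MANIFEST, False, b"<MPD/>",
   url=b"/a.mpd") yields a 16-byte payload whose Length field is 16:
   3 bytes of payload header, 7 bytes for the URL length and URL, and
   6 bytes of HAS data.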

   In cases where the manifest or initial information is missing or
   partially missing when sent in the media tunnel, it is also suggested
   to retransmit it by reliable means.

   A HAS segment may be much larger than the Maximum Transmission Unit
   (MTU), which results in the segment being fragmented.  In general,
   application data should be split into RTP packets so that each packet
   is usable no matter what is lost.  It is therefore suggested to do so
   if the multicast server is able and authorized to access the HAS
   content.  However, when the OTT content is encrypted and opaque to
   the multicast server, the boundaries of independently decodable
   frames can hardly be determined.  Accordingly, the loss of one
   fragment makes the whole HAS segment undecodable, and receivers that
   join the multicast at an arbitrary time cannot view the content
   immediately.  In this case, mechanisms like FEC [RFC5109] or
   retransmission [RFC4585] MUST be used to alleviate packet losses, and
   FCC [RFC6285] SHOULD be used to ensure that users who join the
   multicast at an arbitrary time can view the content immediately.

   Another possible way to do smart fragmenting is to extend the
   manifest files, e.g., the MPD, to allow OTT content providers to
   indicate the fragmentation points where independently decodable
   application data can be extracted.  The multicast server bridging HAS
   into RTP would then fetch the extended manifest file and use these
   hints to determine how to fragment each segment into RTP packets.
   Since DASH is an MPEG standard, this method requires work in MPEG to
   specify the extended fragmentation points.

6  Payload Format Parameters

   This section specifies the media type and the parameters identifying
   this RTP payload format.

6.1 Media Type Definition

   The media subtype for HAS is allocated from the IETF tree.  The
   receiver MUST ignore any unrecognized parameter.
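
   The rule that receivers ignore unrecognized parameters can be
   sketched as follows.  This is only an illustration: SDP signaling is
   still TBD in this document, so the semicolon-separated "name=value"
   syntax and the function name parse_has_params are assumptions.  The
   has-type parameter and its range of 0 to 7 are taken from the media
   type registration in this section.

```python
def parse_has_params(params: str) -> dict:
    """Return the recognized parameters; unrecognized ones are ignored.

    Assumed input syntax: "name=value" pairs separated by semicolons.
    """
    recognized = {}
    for item in params.split(";"):
        name, sep, value = item.strip().partition("=")
        if sep and name.strip() == "has-type":
            recognized["has-type"] = int(value)   # ValueError if not a number
        # Any other parameter is unrecognized and is skipped.
    has_type = recognized.get("has-type")
    if has_type is None:
        raise ValueError("required parameter has-type is missing")
    if not 0 <= has_type <= 7:
        raise ValueError("has-type must be in the range 0 to 7")
    return recognized
```

   With this sketch, a parameter string that carries has-type together
   with an unknown parameter is accepted and only has-type is returned,
   while a string without has-type or with a value outside 0 to 7 is
   rejected.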

   Type name:  HAS

   Subtype name:  N/A

   Required parameter:

      has-type:  This parameter indicates the HTTP adaptive streaming
         protocol.  The value of has-type MUST be in the range of 0 to
         7, inclusive.  The values are defined as follows.

            HSPT=0:    DASH
            HSPT=1:    HTTP Live Streaming (HLS)
            HSPT=2-6:  Reserved
            HSPT=7:    Profile-specific HTTP adaptive streaming

   Optional parameters:  TBD

   Encoding considerations:  This type is only defined for transfer via
      RTP [RFC3550].

   Security considerations:  See Section 8 of RFCXXXX.

   Published specification:  N/A

   Additional information:  None

   File extensions:  none

   Macintosh file type code:  none

   Object identifier or OID:  none

   Person & email address to contact for further information:  Rachel
      Huang (rachel.huang@huawei.com)

   Intended usage:  COMMON

   Author:  See Authors' Addresses section of RFCXXXX.

   Change controller:  IETF Audio/Video Transport Payloads working group
      delegated from the IESG.

6.2 SDP Signaling

   TBD.  Negotiation of the new RTP payload format is required.  Further
   details will be provided in future versions.

7. Congestion Control

   Current DASH clients perform congestion control individually.  When
   using multicast to transport HAS data, multicast receivers are
   expected to be able to dynamically join the multicast group that
   matches their current network conditions.  Multicast receivers share
   the same stream within one multicast group, whereas HAS receivers
   compete with each other using different streams, which means that the
   congestion control mechanisms used in HAS do not work for HAS over
   RTP.  A new congestion control mechanism is needed to coordinate the
   receivers that face the same problem, so that they can share a stream
   and obtain the best quality available to them.

8  Security Considerations

   TBD.

9  IANA Considerations

   TBD.

10 Acknowledgments

   The authors would like to thank the following individuals, who helped
   review this document and provided very valuable comments: Colin
   Perkins and Roni Even.

11 References

11.1 Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [CableLabs]  "IP Multicast Adaptive Bit Rate Architecture Technical
              Report",
http://www.cablelabs.com/wp-content/uploads/specdocs/OC-TR-IP-MULTI-ARCH-V01-1411121.pdf

11.2 Informative References

   [RFC5740]  Adamson, B., Bormann, C., Handley, M., and J. Macker,
              "NACK-Oriented Reliable Multicast (NORM) Transport
              Protocol", RFC 5740, November 2009.

   [RFC6285]  Ver Steeg, B., Begen, A., Van Caenegem, T., and Z. Vax,
              "Unicast-Based Rapid Acquisition of Multicast RTP
              Sessions", RFC 6285, June 2011.

   [HLS]      Pantos, R. and W. May, "HTTP Live Streaming",
              https://tools.ietf.org/html/draft-pantos-http-live-streaming-19,
              April 2016.

Authors' Addresses

   Qikun Wei
   Huawei

   Email: weiqikun@huawei.com


   Yuping Jiang
   Huawei

   Email: jiangyuping@huawei.com


   Rachel Huang
   Huawei

   Email: rachel.huang@huawei.com