A Scalable and Format-Compliant Encryption Scheme for H.264 ...


A Scalable and Format-Compliant Encryption Scheme for H.264/SVC Bitstreams

Zhuo Wei a,∗, Yongdong Wu b, Xuhua Ding a, Robert H. Deng a

a School of Information Systems, Singapore Management University, Singapore, 178902
b Institute for Infocomm Research, 1 Fusionopolis Way, Singapore 138632

∗ Corresponding author

Abstract

SVC (Scalable Video Coding) is designed to adapt to heterogeneous networks and various terminal devices. This paper presents an encryption scheme for SVC bitstreams which retains the valuable scalability properties of SVC. To this end, we exploit the PACSI (Payload Content Scalability Information) NALU and the RTP (Real-time Transport Protocol) payload format such that encrypted bitstreams are SVC format-compliant. Specifically, the proposed scheme processes the base layer and the enhancement layers in different ways. For the base layer, the scheme encrypts a VCL (Video Coding Layer) NALU (Network Abstraction Layer Unit) into either an SEI (Supplemental Enhancement Information) NALU or a PACSI NALU. For an enhancement layer, however, the scheme replaces a coded slice in scalable extension NALU with a PACSI NALU that carries its encryption. Thus, the proposed encryption scheme preserves SVC scalability and format-compliance. That is, it produces encrypted bitstreams which have the original SVC structure and contain neither emulation markers nor codewords that are illegal for a standard decoder. The analysis and experiments indicate that the protected SVC bitstream is secure against chosen plaintext attack, and that the scheme is cost-effective.

Keywords: Scalable Video Coding; Scalability; Format-compliance; Encryption

Preprint submitted to Elsevier June 22, 2012


1. Introduction

Digital video is pervasive today; it finds applications in such diversified areas as remote education, telemedicine, surveillance, IP TV and video-on-demand. In order to cater for heterogeneous networks and various terminal devices, many video coding schemes (e.g., [1, 2]) have been designed. However, few of them are widely deployed due to low compression efficiency and/or non-scalability. Fortunately, after at least 20 years of research and experiments, the scalable extension of H.264/AVC (Advanced Video Coding) [3, 4] was finally adopted as an international standard for video coding in 2007. Generally speaking, SVC (Scalable Video Coding) achieves significant improvement in coding efficiency and scalability, making it especially attractive in today’s ubiquitous networking environments.

When a video stream is disseminated over an open network, an adversary is able to eavesdrop on the content. A naïve way to thwart the adversary is to treat the video stream as non-structural data, encrypt the bitstream as a whole, and distribute the ciphertext bitstream. However, this naïve approach is not suitable for secure SVC delivery over heterogeneous networks, because the protected bitstream loses its scalability and adaptation therefore incurs a high computational overhead. In addition, several SVC encryption schemes [5, 6, 7, 8] were proposed for MPEG-4 FGS (Fine Grain Scalability) and wavelet-based SVC.

Encryption schemes for SVC should satisfy the following properties:

• Security: No information about the original video content can be deduced from the ciphertext.

• Scalability: The encrypted streams preserve end-to-end scalability for delivery. That is to say, the header of the NAL (Network Abstraction Layer) should be compliant with the SVC specification such that anyone can downgrade the video by simply discarding some of the protected bitstream.

• Format-compliance: The encrypted streams are compliant with the SVC specification and compatible with the standard SVC decoder. To this end, the encrypted bitstream shall have two properties: (1) the encrypted data shall not include SVC markers, i.e., it is emulation-free; (2) the encrypted data shall not contain illegal codewords. Otherwise, a standard player may crash when it plays the encrypted bitstream directly. The crashing of a decoder may cause memory leakage and eventually lead


to abnormal system behavior. Even worse, memory leakage may lead to serious kernel stability issues, or to other attacks such as denial-of-service attacks.

• Computational complexity: The encryption and decryption operations should consume limited computational resources and time.

• Compression overhead: The encryption should have little or no effect on compression efficiency.

Based on the RTP payload format for SVC video [9], the encryption scheme presented in this paper transforms a standard SVC bitstream into an encrypted SVC bitstream which has the same format as the original one. To this end, we encrypt each VCL (Video Coding Layer) NALU (Network Abstraction Layer Unit) of the base layer into either an SEI (Supplemental Enhancement Information) NALU or a PACSI (Payload Content Scalability Information) NALU. For each enhancement layer, we replace a VCL NALU whose type is 20 with a PACSI NALU that carries its encryption. Our analysis and experimental results indicate that the encrypted SVC bitstream is secure and format-compliant and that, at the same time, the computational complexity and compression overhead are small.

The remainder of this paper is organized as follows. Related work is reviewed in Section 2, and the SVC bitstream structure is introduced in Section 3. Section 4 elaborates our encryption scheme. Section 5 discusses the proposed scheme. Section 6 presents experimental results. Finally, conclusions are drawn in Section 7.

2. Related Work

Several encrypted video delivery methods [10] trade off security against performance in terms of scalability, format-compliance, computational complexity, and compression overhead. SVC encryption algorithms are classified according to the relation between encryption and compression. We consider the following two classes of algorithms.

2.1. Compression-integrated encryption

A compression-integrated encryption scheme performs encryption and compression simultaneously (encryption is part of compression), which preserves scalability and format-compliance and adds little computational complexity. For example, in the schemes [11, 12, 13, 14, 15, 16]


which encrypt the signs of Intra modes, residuals, and motion vectors, the encrypted bitstream has the same compression rate as the original one, but the semantic content of the video is exposed by setting all the signs to be positive or negative [17]. In contrast, the method in [18] increases the security level by encrypting both signs and DC coefficients, but decreases the compression rate by 15%.

2.2. Bitstream-oriented encryption

The existing bitstream-oriented encryption schemes produce secure bitstreams with small compression overhead by encrypting the compressed video bitstream directly. However, some of them do not produce (fully) format-compliant bitstreams, so that a standard decoder which cannot parse an encrypted SVC bitstream may crash or freeze. For example, Thomas et al. [19] proposed an SVC encryption scheme which is able to preserve the bit-rate transcoding property, and the encryption techniques in [20, 21] merely preserve the NALU header for the sake of end-to-end scalable video adaptation. On the other hand, some other bitstream-oriented encryption schemes lose the scalability functionality although they are format-compliant. For instance, the schemes in [22, 23] are format-compliant because they selectively encrypt a bitstream, adjust the encrypted bitstream, and map the NALU types of an encrypted SVC bitstream to unspecified NAL unit types (NUT), i.e., types 24-27. As a third-party decoder ignores the unspecified NUTs, it can parse the encrypted bitstream smoothly. However, since these schemes conceal the SVC headers [9], a MANE (Media Aware Network Element) has to understand the mapping table and parse each NALU to replace NUTs for adaptation. Generally speaking, bitstream-oriented encryption has higher computational complexity than compression-integrated encryption.

3. Overview of SVC

According to the specification of the scalable extension of the H.264/AVC standard [3, 24], an SVC bitstream consists of one base layer and one or more enhancement layers. The base layer includes the fundamental information of the video, while the enhancement layers supply more data for video resolution, frame rate and picture quality.

3.1. Structure of SVC Bitstream

As shown in Figure 1, an SVC bitstream is divided into NALUs. Each NALU has a header which includes a forbidden zero bit (F), a 2-bit field


Table 1: NALU types

NALU type   Description                                       SVC class
0           Unspecified                                       Non-VCL
1           Coded slice of a non-IDR picture                  VCL
2-4         Coded slice data partition A, B, C                VCL
5           Coded slice of an IDR picture                     VCL
6           Supplemental enhancement information              Non-VCL
7           Sequence parameter set                            Non-VCL
8           Picture parameter set                             Non-VCL
9           Access unit delimiter                             Non-VCL
10          End of sequence                                   Non-VCL
11          End of stream                                     Non-VCL
12          Filler data                                       Non-VCL
13          Sequence parameter set extension                  Non-VCL
14          Prefix NALU                                       Non-VCL
15          Subset sequence parameter set                     Non-VCL
16-18       Reserved                                          Non-VCL
19          Coded slice of an auxiliary coded picture         Non-VCL
20          Coded slice in scalable extension                 VCL
21-23       Reserved                                          Non-VCL
24-27       STAP-A, STAP-B, MTAP16, MTAP24 (aggregation)      Non-VCL
28, 29      FU-A, FU-B (fragmentation unit)                   Non-VCL
30          PACSI: Payload Content Scalability Information    Non-VCL
31          Subtype = 0: Empty; Subtype = 1: NI-MTAP          Non-VCL


signalling the importance of the NALU (NRI), and an NUT. As shown in Table 1, the NALUs can be classified into Non-VCL NALUs and VCL NALUs. A Non-VCL NALU has an auxiliary payload field for decoding (e.g., the SVC header in a prefix NALU) or for facilitating certain system operations (e.g., supplemental enhancement information (SEI)), while a VCL NALU has a payload field for the compressed visual data. The VCL NALU has two kinds of structure. One has an SVC header itself, while the other has to form a pair with a prefix Non-VCL NALU whose auxiliary payload is an SVC header. The SVC header includes PRID (priority id) as well as DID (dependency id), QID (quality id) and TID (temporal id), which indicate the three scalability dimensions respectively and together determine the layer of the NALU. By ordering NALUs based on PRID, DID, QID and TID in different ways, the SVC bitstream has a different representation but the same visual content.

Figure 1: The structure of an SVC bitstream. A dashed box means that some VCL NALUs shall follow a prefix Non-VCL NALU.

In this paper, we are interested in the Non-VCL NALUs with types 6, 14 and 30 and the VCL NALUs with types 1, 5 and 20. Readers who are interested in other types may refer to [3, 24]. Denote NALU^(n)_i as the i-th NALU in the bitstream, where n is its type. We may omit the subscript if the NALU position is not important.

3.2. Structure of selected NALUs

The VCL NALUs of an SVC bitstream are divided into one base layer and several enhancement layers. For the base layer, a non-VCL NALU^(14)_i is used to carry the SVC header information for each following VCL NALU^(1)_{i+1}


or NALU^(5)_{i+1}, as shown in Figure 2.a. That is, a pair (NALU^(14)_i, NALU^(1)_{i+1}) or (NALU^(14)_i, NALU^(5)_{i+1}) is used together to represent the visual content of the base layer. For the enhancement layer, a coded slice NALU^(20), shown in Figure 2.b, is used to represent the visual content.

Figure 2: The structure of base layer, enhancement layer and PACSI NALU.

The PACSI NALU^(30) [9], shown in Figure 2.c, consists of an NALU header, an SVC NALU header, several flags, and SEI NALUs. A PACSI NALU may include one or more SEI NALUs. NALU^(30) may be carried in a single NALU packet or in an aggregation packet. If a PACSI NALU^(30)_i is carried in a single packet, its SVC header is the same as that of the non-PACSI NALU_{i+1}. But if a PACSI NALU^(30)_i is carried with other NALUs in an aggregation packet,


it must be the first NALU in the aggregation packet, and its SVC NALU header is related to the remaining NALUs in the aggregation packet. Note that NALU^(30) must not be fragmented into several packets. Meanwhile, the flags decide whether the fields “TL0PICIDX”, “IDRPICID”, and “DONC” are present.

3.3. Scalability

An SVC bitstream consists of a low-quality video sub-bitstream as well as one or more supplementary sub-bitstreams. Due to the flexible arrangement of NALUs, SVC provides three kinds of scalability, as depicted in Figure 3. These scalability properties enable an SVC-aware router to reduce the bit-rate directly without decoding the bitstream so as to meet the requirements of network bandwidth and/or end user devices’ capabilities.

Figure 3: An example architecture of the three SVC scalability dimensions (temporal, spatial, quality). There are 4 temporal layers, 2 spatial layers, and 3 quality layers.

3.3.1. Temporal scalability

A temporal enhancement layer consists of hierarchical P- or B-pictures, and logically organizes the bitstream into a hierarchy of images. The temporal base layer is used as the reference for motion-compensated prediction of pictures in all temporal layers; hence it should be coded with the highest fidelity. The enhancement layers, in contrast, use a larger quantization parameter because their quality influences fewer pictures. Thus, when an SVC-aware router discards some picture data (e.g., B-pictures) directly without decoding the bitstream, it produces a lower bit-rate SVC bitstream.

3.3.2. Spatial scalability

SVC spatial scalability means that the base layer represents a video of low spatial resolution, and the enhancement layers increase the spatial resolution of the video. As there are inter-layer prediction mechanisms for the sake of coding efficiency, a lower layer must be present if a higher layer exists, but the lower layer does not need the higher layers at the decoder side. Therefore, when the highest layer is discarded, the rest of the layers can still be decoded. This discarding process can be repeated until only one layer remains. In other words, the spatial resolution of an SVC bitstream can be decreased directly without decoding the bitstream.

3.3.3. Quality scalability

Quality scalability means that the quality base layer is coded at a low visual quality, and the quality enhancement layers increase the visual quality of the decoded sequence. Therefore, when the highest quality layer is discarded, the rest of the quality layers can still be decoded. This discarding process can be repeated until only one quality layer remains.

3.4. Base Layer and Enhancement Layer

An SVC bitstream contains an H.264/AVC compatible base layer and one or more enhancement layers. With reference to its lower layer, such as the base layer, an enhancement layer is able to increase the resolution, quality and frame rate of a video. Thus, a reference layer is required to decode a higher layer. However, this does not mean that a higher layer without its reference layer is meaningless. Indeed, if the frame of the base layer is set to blank and the SVC enhancement layers are decoded, the spatial (e.g., Figure 4.a to Figure 4.e), quality (e.g., Figure 4.f to Figure 4.j) and temporal (e.g., Figure 4.k to Figure 4.o) enhancement layers expose video content if they are not encrypted.
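The layer discarding described in Subsection 3.3 can be illustrated with a minimal Python sketch of what an SVC-aware router might do on an unprotected bitstream. It is not part of the proposed scheme. The bit positions of the one-byte NALU header and of the three-byte SVC header (R, I, PRID, N, DID, QID, TID, U, D, O, RR) are assumed to follow the RTP payload format [9], and the names `SvcHeader`, `parse_nalu_header`, `parse_svc_header` and `extract_substream` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SvcHeader:
    prid: int
    did: int
    qid: int
    tid: int


def parse_nalu_header(first_byte: int):
    """Split the one-byte NALU header into (F, NRI, NUT)."""
    return first_byte >> 7, (first_byte >> 5) & 0x03, first_byte & 0x1F


def parse_svc_header(b: bytes) -> SvcHeader:
    """Parse the 3-byte SVC header (field widths are an assumption, see above)."""
    return SvcHeader(prid=b[0] & 0x3F,
                     did=(b[1] >> 4) & 0x07,
                     qid=b[1] & 0x0F,
                     tid=b[2] >> 5)


def extract_substream(nalus, max_did, max_qid, max_tid):
    """Keep only the NALUs at or below the requested operating point."""
    kept = []
    drop_following = False  # True after a discarded prefix NALU (type 14)
    for nalu in nalus:
        _, _, nut = parse_nalu_header(nalu[0])
        if nut in (14, 20, 30):  # these types carry an SVC header
            h = parse_svc_header(nalu[1:4])
            drop = h.did > max_did or h.qid > max_qid or h.tid > max_tid
            drop_following = drop and nut == 14
            if drop:
                continue
        elif nut in (1, 5) and drop_following:
            # a base-layer slice shares the fate of its prefix NALU
            drop_following = False
            continue
        kept.append(nalu)
    return kept
```

No payload is decoded at any point: the decision uses header bytes only, which is why the encryption scheme has to leave these headers in clear text.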


Figure 4: Visual content of enhancement layers.

4. The Scalable and Format-compliant Encryption Scheme

A format-compliant encrypted SVC bitstream must follow the SVC structure depicted in Subsection 3.1 and retain the scalability merits described in Subsection 3.3. To this end, the proposed encryption scheme does not encrypt the NALU headers, but only the actual visual sample data. Since the base layer conveys the major visual information, and the enhancement layers also represent the visual content to some extent, both of them need to be encrypted. That is to say, the structure of the protected bitstream is public, but all the visual content is encrypted.

4.1. Emulation-free protection of visual data

With the method in [21], we generate an emulation-free SVC bitstream, assuming that the encryptor and the decryptor share a secret key and a stream cipher. Based on the SVC requirements [24], an emulation-free SVC bitstream requires, for every VCL NALU, that (1) no forbidden code in Table 2 occurs; and (2) the last byte is not 0x00.
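The two requirements can be checked mechanically. The sketch below is one possible reading rather than the authors' implementation: the forbidden codes are the 3-byte patterns 0x000000 to 0x000003 of Table 2, and a 0x000003 that is immediately followed by a byte no larger than 0x03 is treated as a legal replacement code from the same table. The function name `is_emulation_free` is hypothetical.

```python
def is_emulation_free(payload: bytes) -> bool:
    """Check requirements (1) and (2) on an already protected payload."""
    if not payload or payload[-1] == 0x00:
        return False  # requirement (2): the last byte must not be 0x00
    zeros = 0  # number of consecutive 0x00 bytes seen so far
    i, n = 0, len(payload)
    while i < n:
        b = payload[i]
        if zeros >= 2 and b <= 0x03:
            # 00 00 03 xx with xx <= 3 is a replacement code and is allowed
            if b == 0x03 and i + 1 < n and payload[i + 1] <= 0x03:
                zeros = 0
                i += 1
                continue
            return False  # requirement (1): a bare forbidden code occurs
        zeros = zeros + 1 if b == 0x00 else 0
        i += 1
    return True
```

For example, the payload `00 00 01 80` is rejected, whereas its escaped form `00 00 03 01 80` is accepted.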


Table 2: Mapping between forbidden codes and their replacements

Forbidden   Replacement
0x000000    0x00000300
0x000001    0x00000301
0x000002    0x00000302
0x000003    0x00000303

4.1.1. Encryption

Denote an NALU’s payload (i.e., visual data) as $m_1, m_2, \cdots, m_n$, and let the encryptor generate a key stream $k_1, k_2, \cdots, k_n$ with the stream key generator, all in bytes.

• For any byte $m_i$ other than the last one, the ciphertext is

  $$c_i = m_i \oplus k_i, \quad i \neq n.$$

  If any 3-byte string $c_i c_{i+1} c_{i+2}$ forms a forbidden code, replace it with its mapped replacement.

• The encrypted last byte is

  $$c_n = ((m_n + k_n) \bmod 255) + 1.$$

All the $\{c_i\}$ or their replacements (if any) constitute the encryption of the visual data. This encrypted visual data has no emulation markers. Note that the number of bytes in the resulting encrypted data may be more than $n$ due to our emulation prevention measure.

4.1.2. Decryption

The decryption recovers the original visual data of a VCL NALU, i.e., it transforms the encrypted visual data $c_i$, $i = 1, 2, \cdots, n'$, back into the plaintext.

• For any 4-byte string $b = c_i c_{i+1} c_{i+2} c_{i+3}$, if $b$ is one of the replacement codes in the 2nd column of Table 2, replace $b$ with the corresponding 3-byte code in the 1st column of Table 2 and denote it as $c'_i c'_{i+1} c'_{i+2}$.


• The decryptor generates a key stream $k_1, k_2, \cdots, k_n$ with the stream key generator.

• For any byte $c'_i$ other than the last one, the plaintext is

  $$m_i = c'_i \oplus k_i, \quad i \neq n.$$

• The last byte is recovered as

  $$m_n = ((c'_n - k_n - 2) \bmod 255) + 1,$$

  which inverts the encryption of Subsection 4.1.1 because the last plaintext byte of a VCL NALU is never 0x00.

All the $\{m_i\}$ constitute the original visual data.

4.2. Encrypted NALU

In order to ensure format-compliance and security, the proposed scheme creates an encrypted NALU by adjusting the NALU header information and encrypting the NALU visual data with the method in Subsection 4.1, according to the NALU type.

4.2.1. “Root” VCL NALU at base layer

Figure 5: Encryption of “Root” VCL NALU. (a) Original NALU group: prefix NALU (type 14, TID = 0, QID = 0) and VCL NALU (type 1 or 5, slice header, visual data). (b) New SEI NALU (type 6, payload type = 5, payload size, 16-byte UUID, slice header, encrypted visual data) which replaces the original VCL NALU.
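Before turning to the individual NALU types, the byte-level primitive of Subsection 4.1 that all of them rely on can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' code. The key stream is passed in as bytes (the paper obtains it from a stream cipher). The Table 2 mapping is realised by inserting 0x03 after two zero bytes whenever the next byte is at most 0x03. The last byte is recovered with the arithmetic inverse of the encryption rule, $m_n = ((c_n - k_n - 2) \bmod 255) + 1$, which is valid only because $m_n \neq$ 0x00. All function names are hypothetical.

```python
def escape(data: bytes) -> bytes:
    """Table 2, left to right: 00 00 0x becomes 00 00 03 0x for x in 0..3."""
    out, zeros = bytearray(), 0
    for b in data:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)


def unescape(data: bytes) -> bytes:
    """Inverse of escape(): drop the 0x03 that follows two zero bytes."""
    out, zeros = bytearray(), 0
    for b in data:
        if zeros >= 2 and b == 0x03:
            zeros = 0
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)


def encrypt_payload(m: bytes, k: bytes) -> bytes:
    """XOR every byte but the last, map the last byte into 1..255, then escape."""
    n = len(m)
    if n == 0 or len(k) < n:
        raise ValueError("need a non-empty payload and n key-stream bytes")
    if m[-1] == 0x00:
        raise ValueError("the last payload byte must not be 0x00")
    c = bytearray(mi ^ ki for mi, ki in zip(m[:-1], k))
    c.append((m[-1] + k[n - 1]) % 255 + 1)
    return escape(bytes(c))


def decrypt_payload(c: bytes, k: bytes) -> bytes:
    """Undo the escaping, then invert both byte rules."""
    c = unescape(c)
    n = len(c)
    if n == 0 or len(k) < n:
        raise ValueError("need a non-empty ciphertext and n key-stream bytes")
    m = bytearray(ci ^ ki for ci, ki in zip(c[:-1], k))
    m.append((c[-1] - k[n - 1] - 2) % 255 + 1)
    return bytes(m)
```

With an all-zero key stream, the payload `00 00 01 25 FF` encrypts to `00 00 03 01 25 01`: the XOR leaves the forbidden code `00 00 01` in place, the escaping step breaks it up, and the last byte 0xFF maps to 0x01.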


A “root” VCL NALU belongs to the base layer, and the TID of its prefix NALU^(14) is zero. As it is used as the reference layer of all the other enhancement layers, a “root” VCL NALU must be encrypted to provide the basic protection. To encrypt the “root” NALU, we create a new SEI NALU^(6) whose payload type is 5 (user data unregistered message) and replace the “root” NALU with the new NALU^(6). The encryption of the visual sample data of the original VCL NALU acts as the payload of the SEI NALU^(6). Figure 5 illustrates the structure of the new NALU.

4.2.2. Temporal enhancement layer’s NALU at base layer

Figure 6: Encryption of temporal enhancement layer at base layer. (a) Original NALU group: prefix NALU (type 14, TID ≠ 0, QID = 0) and VCL NALU (type 1 or 5, slice header, visual data). (b) New PACSI NALU which replaces the original VCL NALU, followed by the prefix NALU.

For the temporal enhancement layer NALU^(1) or NALU^(5) of the base layer following a prefix NALU^(14) with TID ≠ 0, a new NALU is constructed as follows.

• A new PACSI NALU^(30) including:


– an NALU header which is the same as that of the original prefix NALU^(14);

– an SVC header which is the same as that of the original prefix NALU^(14);

– an SEI NALU whose NALU type is 6 and whose payload type is 5 (i.e., user data unregistered message);

– a payload which is the encryption of the visual data of the original NALU.

Additionally, the original prefix NALU^(14) is appended to the new PACSI NALU^(30). Figure 6 illustrates the structure of the new NALUs which are used to replace the original NALUs.

4.2.3. NALU of quality and spatial enhancement layer

Figure 7: Encryption of quality or spatial enhancement layer. (a) Original VCL NALU (type 20, slice header, visual data) which is in an enhancement layer. (b) New PACSI NALU which replaces the original VCL NALU, and new prefix NALU.

For the NALU^(20) of a spatial (or quality) enhancement layer with DID ≠ 0 or QID ≠ 0, two NALUs are created as shown in Figure 7.


• A new PACSI NALU^(30) including:

– an NALU header which is the same as that of the original NALU^(20);

– an SVC header which is the same as that of the original NALU^(20);

– an SEI NALU whose NALU type is 6 and whose payload type is 5 (i.e., user data unregistered message);

– a payload which is the encryption of the visual data of the original NALU.

• A prefix NALU^(14) consisting of the SVC header of the original NALU^(20).

Table 3: Summary of VCL NALU encryption

TID   DID   QID   Original NUT   New NUT   Layer         Subsection
0     0     0     1 or 5 ∗       6         base          4.2.1
0     1     0     20             30 #      enhancement   4.2.3
0     0     1     20             30 #      enhancement   4.2.3
0     1     1     20             30 #      enhancement   4.2.3
> 0   0     0     1 or 5 ∗       30 +      base          4.2.2
> 0   1     0     20             30 #      enhancement   4.2.3
> 0   0     1     20             30 #      enhancement   4.2.3
> 0   1     1     20             30 #      enhancement   4.2.3
-     -     -     30             30        enhancement   4.2.4

∗ Following a prefix NALU^(14).
+ Swapping the sequence with the prefix NALU^(14).
# Padding with the prefix NALU^(14).

4.2.4. Existing PACSI NALU

According to the SVC specification, a non-protected SVC bitstream may contain PACSI NALU^(30). For any of those PACSI NALUs, a new PACSI


NALU^(30) is constructed as given in Figure 8, where the original NALU^(30) is the payload of the new NALU^(30).

Figure 8: Encrypted PACSI NALU constructed from the PACSI NALU of a non-protected bitstream.

Table 3 summarizes the encryption method based on the NALU parameters. The last column shows the subsections which describe the encryption methods for the VCL NALU.

4.3. The Flow of Encryption

Our scheme encrypts SVC bitstreams NALU by NALU. With reference to Figure 9, the encryption process proceeds as follows.

(1) Parse the bitstream into NALUs, then process the NALUs one by one.


• For a prefix NALU which is followed by a coded slice NALU (non-IDR/IDR), if TID = 0, construct an SEI NALU as described in Subsection 4.2.1; otherwise, construct a new PACSI NALU^(30) as described in Subsection 4.2.2;

• For any NALU^(20), construct a PACSI NALU as described in Subsection 4.2.3;

• For any PACSI NALU^(30), construct a new PACSI NALU as described in Subsection 4.2.4.

(2) Replace the original NALU of the SVC bitstream with the new NALU.

Figure 9: The flowchart of bitstream encryption.

In the above encryption process, the visual data are encrypted and the VCL NALUs at different scalability layers are replaced by the constructed NALUs. Thus, the encrypted SVC bitstream is format-compliant.
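The type-level effect of steps (1) and (2) can be sketched as follows. The sketch tracks NALU types only and builds no payloads. The emitted order ([14, 6] for a “root” NALU, [30, 14] otherwise) is one reading of the footnotes of Table 3 and of Figures 5 to 7, and the names `Nalu` and `encrypted_nut_sequence` are hypothetical.

```python
from typing import NamedTuple


class Nalu(NamedTuple):
    nut: int      # NALU type
    did: int = 0
    qid: int = 0
    tid: int = 0


def encrypted_nut_sequence(nalus):
    """Return the NALU types of the protected bitstream, in order."""
    out = []
    prefix = None  # pending prefix NALU (type 14), if any
    for n in nalus:
        if n.nut in (1, 5):
            if prefix is None:
                raise ValueError("base-layer VCL NALU without a prefix NALU")
            if prefix.tid == 0:
                out += [14, 6]    # Subsection 4.2.1: prefix kept, SEI replaces slice
            else:
                out += [30, 14]   # Subsection 4.2.2: PACSI first, then the prefix
            prefix = None
            continue
        if prefix is not None:    # prefix not followed by a type 1/5 slice
            out.append(14)
            prefix = None
        if n.nut == 14:
            prefix = n            # hold until the next NALU is known
        elif n.nut == 20:
            out += [30, 14]       # Subsection 4.2.3: PACSI plus new prefix
        elif n.nut == 30:
            out.append(30)        # Subsection 4.2.4: PACSI wrapped in a PACSI
        else:
            out.append(n.nut)     # all other NALUs are left untouched
    if prefix is not None:
        out.append(14)
    return out
```

Every type in the output (6, 14, 30 and the untouched non-VCL types) is a standard NALU type, and the SVC header of each replaced NALU is still present, which is the basis of the format-compliance and scalability claims.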


4.4. The Flow of Decryption

Similar to the encryption architecture, Figure 10 illustrates the decryption process, which is the reverse of the encryption flow elaborated in Subsection 4.3.

Figure 10: The flowchart of bitstream decryption.

(1) Parse the bitstream into NALUs, then decrypt each NALU as follows.

• For any SEI NALU^(6) whose payload type is 5 and which follows a prefix NALU^(14), construct a VCL NALU^(1) or NALU^(5) with DID = QID = TID = 0.

• For any PACSI NALU^(30) whose DID = 0 and QID = 0, copy the prefix NALU^(14), and recover NALU^(1) or NALU^(5) based on the encrypted NALU^(30).


• For any PACSI NALU^(30) whose DID or QID is not zero, recover an NALU^(20) based on the NALU header and the SEI NALUs of the PACSI NALU.

• For any PACSI NALU, if its SEI payload includes the header of a PACSI NALU^(30), recover the existing PACSI NALU.

(2) Decrypt the visual sample data in the NALU^(1), NALU^(5) or NALU^(20) with the method in Subsection 4.1.2, and replace the NALU of the protected SVC bitstream with the new NALU.

Following the above decryption process, an authorized user can recover the SVC bitstream with his/her decryption key and then decode the decrypted bitstream.

5. Discussion

5.1. Security

The secret key is transmitted under the secure socket layer (SSL, http://www.openssl.org) protocol, and a stream cipher such as RC4 [25] with an Initialization Vector (IV) is used for encryption. The IV should be unique for each message such that the ciphertexts do not repeat themselves even if the corresponding plaintexts are the same. This ensures that the encryption scheme is secure against chosen plaintext attack (CPA).

In our encryption scheme, the IV is generated as

$$IV = F(H_n, H_s)$$

where $F$ is a one-way function such as SHA1 [26], $H_n$ represents the SVC NALU header fields such as DID, QID, and TID at different scalability layers, and $H_s$ denotes the slice header of each VCL NALU at the same scalability layer. Because the proposed scheme only encrypts the visual sample data, and the header information is in clear text, the IV can be deduced from the protected bitstream at the decoder/decryptor side.

5.2. Packet fragmentation

As mentioned in Subsection 3.2, a PACSI NALU must be encapsulated in a single RTP packet. However, when an SVC bitstream is delivered over heterogeneous networks, the RTP packet may have to be fragmented due to the network MTU (Maximum Transmission Unit) limit. Therefore, with regard to Figure 11,


Figure 11: Fragment packets of PACSI NALU. (a) A packet of the original PACSI NALU, (b) Fragment packets of the original PACSI NALU.


an SVC-aware router is required to fragment a PACSI NALU into several new PACSI NALUs as follows.

• Preserve the prefix NALU^(14);

• Construct some new PACSI NALUs^(30) whose headers are the same as that of the original PACSI NALU^(30);

• Fragment the payload of the original PACSI NALU, and distribute these payload fragments to the new PACSI NALUs;

• Encapsulate each new NALU into a separate packet.

Since each new PACSI NALU has the same header as the original PACSI NALU and prefix NALU, it satisfies the standard requirement of a PACSI NALU.

5.3. Scalability

As the proposed method uses a new PACSI NALU to replace a VCL NALU with type 1, 5 or 20, and the PACSI NALU possesses the same SVC NALU header which indicates the scalability information, MANEs are able to decide whether to forward, process, or discard the NALU. Thus, the proposed scheme is transparently adaptable in an end-to-end video delivery scenario.

5.4. Format-compliance

Since an encrypted SVC bitstream consists of standard NALUs (e.g., SEI, PACSI and prefix NALUs), the SVC bitstream satisfies the semantic and syntactic requirements of the scalable extension of the H.264/AVC standard. Thus the dedicated decoder can process the protected bitstream correctly, while a third-party decoder processes an encrypted NALU, whose payload type indicates a user data unregistered message, as a string rather than as codewords, such that it will not crash or freeze. Therefore, the present encryption scheme is format-compliant.

6. Experimental Result

In the experiments, the SVC bitstreams are produced by a generator implemented with JSVM 9.19 [28] and stored in the streaming server Live555 [27], while the open SVC decoder [29] is used as the client. Both the server and the client run on a PC (2.53GHz Intel dual-core processor).


Table 4: The SVC sequences in the experiments

Spatial scalability        Quality scalability
                           Coarse grain scalability   Medium grain scalability
QCIF/CIF     CIF/4CIF      CIF/CIF                    CIF/CIF
foreman      soccer        football                   bus, mobile, football

As listed in Table 4, the test videos include football (249 frames), soccer (299 frames), bus (149 frames), mobile (299 frames), and foreman (299 frames), where B/E means base layer B and enhancement layer E. Each GOP^1 includes 16 frames and there are 30 frames between two I-frames. In addition, the QP (Quantization Parameter) of the base layer is 32, and the QPs of the spatial and quality (CGS/MGS) enhancement layers are 30 and 24, respectively.

1 A picture of the temporal base layer and all temporal refinement pictures between the base layer picture and the previous base picture build a group of pictures.

Figure 12: Decrypted pictures by unauthorized and authorized user, respectively. (a) Decrypted picture by an unauthorized user. (b) Decrypted pictures by an authorized user.

6.1. Encrypted bitstream

With the present encryption scheme, an encrypted SVC bitstream is constructed from an original SVC bitstream by the following processing, based on the scalability information: if DID = QID = TID = 0, substitute NALU^(1) (NALU^(5)) with NALU^(6); otherwise, substitute NALU^(1) (NALU^(20)) with NALU^(30). The encrypted bitstreams are stored at a server. When a user


starts a session with the server based on his/her network bandwidth or terminal capability, an encrypted SVC bitstream is transmitted to the user. If the user’s decoder does not have the right key, it handles the SEI NALU^(6) or PACSI NALU^(30), which actually contain the visual data, as ordinary SEI or PACSI NALUs. Therefore, no visual information can be viewed, as shown in Figure 12.a. For the authorized user with the correct key, however, the SEI NALU^(6) or PACSI NALU^(30) will be parsed and decrypted into VCL NALUs, and the reconstructed VCL NALUs can then be decoded and rendered, as shown in Figure 12.b.

Figure 13: Overhead percentage of NALUs of base layer. (a) Overhead percentage of NALUs of the “root” layer (DID = QID = TID = 0). (b) Overhead percentage of the temporal enhancement layer of the base layer (DID = QID = 0 and TID ≠ 0). Sequences: bus mgs, mobile mgs, football mgs, football cgs, foreman spatial, soccer spatial.


6.2. Communication cost

As described in Section 4, the scheme replaces the original VCL NALU of the base layer or enhancement layer with a new SEI or PACSI NALU. Hence, the overhead of each VCL NALU is

$$\text{overhead} = H + P + U + M$$

where $H$ is the size of the header of the SEI or PACSI NALU, $P$ is the size of the (variable-length) payload size field of the SEI, $U = 16$ is the fixed size of the UUID, and $M$ is the number of bytes added by the encryption primitive to avoid marker emulation, as described in Subsection 4.1.

Figure 14: Overhead percentage of NALUs of spatial or quality enhancement layers. (a) Overhead percentage of spatial and CGS enhancement layers (DID ≠ 0 and QID = 0); sequences: foreman spatial, football cgs, soccer spatial. (b) Overhead percentage of MGS enhancement layers (DID = 0 and QID ≠ 0); sequences: bus mgs, mobile mgs, football mgs.
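The terms of the overhead formula in Subsection 6.2 can be made concrete with a small sketch. Two points are assumptions rather than statements from the paper: the size field $P$ is taken to follow the usual H.264 SEI coding (one 0xFF byte per full 255, plus a final remainder byte), and $M$ is counted as the number of 0x03 bytes that the Table 2 mapping inserts. The header size $H$ is left as a caller-supplied parameter, and all names are hypothetical.

```python
UUID_SIZE = 16  # U: fixed size of the UUID in bytes


def sei_size_field_length(payload_size: int) -> int:
    """P: bytes needed to code an SEI payload size (assumed H.264 coding)."""
    if payload_size < 0:
        raise ValueError("payload size must be non-negative")
    return payload_size // 255 + 1


def escape_bytes_added(ciphertext: bytes) -> int:
    """M: number of 0x03 bytes that the Table 2 mapping would insert."""
    added = zeros = 0
    for b in ciphertext:
        if zeros >= 2 and b <= 0x03:
            added += 1
            zeros = 0
        zeros = zeros + 1 if b == 0x00 else 0
    return added


def nalu_overhead(header_size: int, sei_payload_size: int, ciphertext: bytes) -> int:
    """overhead = H + P + U + M, in bytes."""
    return (header_size
            + sei_size_field_length(sei_payload_size)
            + UUID_SIZE
            + escape_bytes_added(ciphertext))
```

Since $H$ and $U$ are fixed and $P$ and $M$ grow slowly, the overhead per NALU is nearly constant in bytes, so short NALUs show a larger overhead percentage than long ones.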


Figure 13 and Figure 14 illustrate our experimental results on the average compression overhead of the base layer and the enhancement layers. The x-axis unit is NALUs, which are indicated by their “DTQ” value. For example, “DTQ:010” means DID = 0, TID = 1, and QID = 0.

Table 5: Overhead of the proposed algorithm

                           football   football   soccer    bus      mobile    foreman
                           cgs        mgs        spatial   mgs      mgs       spatial
Original (bytes)           1390808    1511335    3378146   687275   1414621   689425
Encryption (bytes)         1410150    1523372    3384273   694198   1428522   695552
Overhead (bytes)           19342      35110      29285     19465    39231     18749
VCL NALU (number)          598        1196       598       596      1196      598
Overhead per NAL (bytes)   37.34      33.89      48.97     32.66    32.80     31.35
Overhead (%)               1.39       2.32       0.867     2.83     2.77      2.72

Figure 13(a) shows the overhead percentage of the “root” layer NALUs, which is between 0.85% and 1.38% of the SVC bitstreams. Figure 13(b) shows the overhead percentage of the temporal layers of the base layer inside one GOP, in which both DID and QID are 0. Experimental results indicate that NALUs of different temporal layers have a similar overhead in bytes (e.g., 30 bytes). Due to inter prediction, an NALU of a higher temporal layer may be shorter than an NALU of a lower temporal layer, for two reasons. First, a frame of a higher temporal layer generally uses bi-directional prediction and takes nearer frames, whose content is similar to that of the current frame, as references. Therefore, the encoded frame carries less signal (i.e., its NALU is shorter), especially when the scene of the SVC sequence is static or smooth. Second, a frame of a higher temporal layer normally uses a larger quantization parameter (QP) than a frame of a lower temporal layer. Thus, with similar overhead in bytes, the


NALU’s overhead percentage rises as TID increases. However, it dose notmean the entire SVC bitstreams have large overhead percentage. In fact,the proposed scheme only causes average 2.14% overhead of SVC bitstreams,which is due to the following reasons:• Compared with all NALUs, there are few shorter NALUs belonging totemporal layers of base layer.• Compared with the size of the entire SVC bitstream, the overheadcaused from these shorter NALUs are small.Moreover, Figure 14 shows the overhead percentage of spatial/CGS andMGS enhancement layers inside one GOP. Both Figure 14(a) and Figure14(b) illustrate that the NALU’s overhead percentage is higher for biggerTID (bigger TID corresponds to higher temporal layer), which it is the samereason with Figure 13(b). In addition, experimental results and analysisindicate each MGS causes similar overhead in byte. Based on MGS theory,each MGS layer corresponds to DCT (discrete cosine transform) coefficientsof different frequency band (i.e., higher MGS layer contains higherfrequency DCT coefficients while lower MGS contains lower frequency DCTcoefficients). Most non-zero DCT coefficients of image concentrate in lowfrequency bands, that is to say, the NALU length of higher MGS layer isshorter than lower MGS layer’s. Therefore, with similar overhead in byte, anNALU of higher MGS layer (e.g., QID = 3) has larger overhead percentagethan a lower one (e.g., QID = 1) at the same TID, which is illustrated inFigure 14(b).Table 6: Computation cost in msDecoding Our scheme Naïve Overhead (%)timeenc dec enc dec decfootball cgs 7,062 117.2 115.82 35.57 34.34 1.64football mgs 11,670 128.7 131.60 41.23 39.86 1.13soccer spatial 38,477 268.9 271.01 71.37 68.68 0.70bus mgs 6,577 60.3 61.99 21.85 21.17 0.94mobile mgs 13,229 122.7 125.75 38.83 37.81 0.95foreman spatial 9,365 61.1 62.06 21.86 21.26 0.6626
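The last column of Table 6 can be reproduced as the ratio of our scheme's decryption time to the decoding time of the original sequence. The following Python sketch only re-does this arithmetic on the values printed in Table 6. The dictionary and function names are hypothetical and not part of the paper.

```python
# Per sequence, copied from Table 6 (all times in ms):
# (decoding time, our scheme decryption, naive decryption)
TABLE6 = {
    "football cgs": (7062, 115.82, 34.34),
    "football mgs": (11670, 131.60, 39.86),
    "soccer spatial": (38477, 271.01, 68.68),
    "bus mgs": (6577, 61.99, 21.17),
    "mobile mgs": (13229, 125.75, 37.81),
    "foreman spatial": (9365, 62.06, 21.26),
}


def decryption_overhead_percent(decoding_ms: float, decryption_ms: float) -> float:
    """Decryption time as a percentage of the decoding time."""
    return 100.0 * decryption_ms / decoding_ms


def slowdown_vs_naive(ours_ms: float, naive_ms: float) -> float:
    """How many times slower our decryption is than the naive one."""
    return ours_ms / naive_ms


if __name__ == "__main__":
    for name, (decoding, ours, naive) in TABLE6.items():
        pct = decryption_overhead_percent(decoding, ours)
        ratio = slowdown_vs_naive(ours, naive)
        print(f"{name:16s} overhead {pct:.2f}%  slowdown {ratio:.2f}x")
```

On these numbers the decryption overhead stays at or below 1.64% of the decoding time for every sequence, while the slowdown relative to the naïve scheme lies between roughly 2.9 and 3.9 times.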


Table 5 states that the overhead introduced by the proposed scheme is between 0.8% and 2.8% of the SVC bitstream, or 2.14% on average. In comparison, the overhead percentage is 1.42% in [21], and 2.14% in [22]. Therefore, the compression overhead of the proposed scheme is similar to that of previous works, while the scheme preserves both format-compliance and scalability.

6.3. Computation cost

The computation overhead of our proposed scheme consists of three parts: T_parsing, T_constructing, and T_encrypting. T_parsing is the time of parsing the slice header; T_constructing is the time of constructing the SEI or PACSI NALU header; T_encrypting is the time of encrypting the visual data with a stream cipher.

Table 6 lists the computation cost for six SVC sequences, including the decoding time of the original SVC sequences and the encryption/decryption time of our scheme and of the naïve scheme. The naïve scheme encrypts the SVC bitstream directly with a stream cipher or block cipher, without preserving SVC’s format-compliance and scalability. The last column is the decryption overhead of our scheme relative to the decoding time. According to Table 6, our proposed scheme is about three times slower than the naïve scheme for both encryption and decryption. Fortunately, this computation cost is usually acceptable, as the last column indicates that the decryption time is at most 1.64% of the decoding time.

6.4. Comparison with other SVC encryption schemes

Table 7 summarizes the performance comparison between our proposed scheme and previous SVC encryption schemes.

For security, [11, 12, 13, 14, 15, 16, 18] offer a low/weak security level because the encrypted video can expose sensitive information, as explained in Section 2.1; [19, 20, 21, 22, 23] provide a high security level due to the use of cryptographic techniques; our proposed scheme also exhibits high security, as shown in our evaluation (Section 5.3).

Note that scalability-transparency and format-compliance cannot be met simultaneously in [11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 22, 23] (see the analysis in Section 2), whereas our proposed scheme preserves both scalability-transparency and format-compliance.

For computation complexity, [19, 20, 21, 22, 23] and our proposed scheme require XOR-ing every bit, which causes a medium computation complexity; [11, 12, 13, 14, 15, 16, 18] only need to perform the XOR operation on the signs of non-zero coefficients, the Intra mode and the motion vectors, and thus have a low computation complexity.


Table 7: Comparison between the proposed scheme and other SVC encryption schemes

                            [11, 12, 13]   [18]     [19, 20, 21]   [22, 23]     Present
                            [14, 15, 16]
Security                    low            medium   high           high         high
Scalability-transparency    yes            yes      yes            no           yes
Format-compliance           yes            yes      no             yes          yes
Computation complexity      low            low      medium         medium       medium
Compression overhead (%)    no             15       1.30 ∼ 3.4     0.92 ∼ 2.0   0.8 ∼ 2.8

For compression overhead, [11, 12, 13, 14, 15, 16] incur no compression overhead, unlike [18]; [19, 20, 21, 22, 23] can cause compression overhead if the stream or block cipher uses an initialization vector (128 bits) for each NALU and has to handle the synchronization markers of H.264. Our proposed scheme causes similar compression overhead, but in exchange provides additional properties, e.g., adaptation-transparency, format-compliance and low computation complexity.

7. Conclusions

This paper presents a novel encryption scheme for SVC bitstreams. The scheme creates new NALUs to replace the original VCL NALUs. Because each new NALU has an SVC NAL header which indicates the scalability information in an SVC bitstream, the encrypted bitstream preserves the SVC scalability. At the same time, since the payload of each new NALU is identified as a user data unregistered message, the encrypted bitstream satisfies the SVC specification. The experimental results indicate that the proposed scheme incurs little overhead and has low processing cost.

Acknowledgment

This work was supported in part by A*STAR SERC Grant No. 102 1010027 in Singapore.


References

[1] D. L. Goeckel, Adaptive coding for time-varying channels using outdated fading estimates (Jun. 1999).
[2] V. K. Goyal, Multiple description coding: Compression meets the network (Sep. 2001).
[3] H. Schwarz, D. Marpe, T. Wiegand, Overview of the scalable video coding extension of the H.264/AVC standard, IEEE Transactions on Circuits and Systems for Video Technology 17 (9) (2007) 1103–1120.
[4] M. Wien, H. Schwarz, T. Oelbaum, Performance analysis of SVC, IEEE Transactions on Circuits and Systems for Video Technology 17 (9) (2007) 1194–1203.
[5] J. G. Apostolopoulos, S. J. Wee, Secure scalable streaming enabling transcoding without decryption, in: ICIP (1), 2001, pp. 437–440.
[6] V. Gergely, G. Fehér, Enhancing progressive encryption for scalable video streams, in: EUNICE, 2009, pp. 51–58.
[7] S. G. Lian, Secure service convergence based on scalable media coding, Telecommunication Systems 45 (1) (2010) 21–35.
[8] B. B. Zhu, C. Yuan, Y. Wang, S. Li, Scalable protection for MPEG-4 fine granularity scalability, IEEE Transactions on Multimedia 7 (2) (2005) 222–233.
[9] S. Wenger, Y. K. Wang, T. Schierl, A. Eleftheriadis, RTP payload format for scalable video coding, RFC 6190, 2011.
[10] T. Stütz, A. Uhl, A survey of H.264 AVC/SVC encryption, IEEE Transactions on Circuits and Systems for Video Technology 22 (3) (2012) 325–339.
[11] Y. G. Won, T. M. Bae, Y. M. Ro, Scalable protection and access control in full scalable video coding, International Workshop on Digital Watermarking (2006) 407–421.


[12] S. W. Park, S. U. Shin, Efficient selective encryption scheme for the H.264/scalable video coding (SVC), Fourth International Conference on Networked Computing and Advanced Information Management (2008) 371–376.
[13] S.-W. Park, S.-U. Shin, Combined scheme of encryption and watermarking in H.264/scalable video coding (SVC), in: New Directions in Intelligent Interactive Multimedia, 2008, pp. 351–361.
[14] S.-W. Park, S.-U. Shin, An efficient encryption and key management scheme for layered access control of H.264/scalable video coding, IEICE Transactions 92-D (5) (2009) 851–858.
[15] Y. Kim, S. H. Jin, T. M. Bae, Y. M. Ro, A selective video encryption for the region of interest in scalable video coding, IEEE Region 10 Conference (2007) 1–4.
[16] C. H. Li, X. X. Zhou, Y. Z. Zhong, NAL level encryption for scalable video coding, PCM (2008) 496–505.
[17] C. Wu, C. Kuo, Fast encryption methods for audiovisual data confidentiality, SPIE International Symposia on Information Technologies (2000) 284–295.
[18] G. B. Algin, E. T. Tunali, Scalable video encryption of H.264 SVC codec, Journal of Visual Communication and Image Representation 22 (4) (2011) 353–364.
[19] N. Thomas, D. Bull, D. Redmill, A novel H.264 SVC encryption scheme for secure bit-rate transcoding, Picture Coding Symposium (May 2009) 1–4.
[20] E. Magli, M. Grangetto, G. Olmo, Transparent encryption techniques for H.264/AVC and H.264/SVC compressed video, Signal Processing 91 (5) (2011) 1103–1114.
[21] H. K. Arachchi, X. Perramon, S. Dogan, A. M. Kondoz, Adaptation-aware encryption of scalable H.264/AVC video for content security, Signal Processing: Image Communication 24 (6) (2009) 468–483.


[22] H. Hellwagner, R. Kuschnig, T. Stütz, A. Uhl, Efficient in-network adaptation of encrypted H.264/SVC content, Signal Processing: Image Communication 24 (9) (2009) 740–758.
[23] T. Stütz, A. Uhl, Format-compliant encryption of H.264/AVC and SVC, in: ISM, 2008, pp. 446–451.
[24] ITU-T and ISO/IEC JTC 1, Advanced video coding for generic audiovisual services, ITU-T Recommendation H.264 and ISO/IEC 14496-10 (MPEG-4 AVC), 2005.
[25] R. L. Rivest, The RC4 encryption algorithm, RSA Data Security, Inc., March 12, 1992.
[26] D. Eastlake, P. Jones, US Secure Hash Algorithm 1 (SHA1), IETF Request for Comments 3174, September 2001.
[27] Live Networks Inc., LIVE555 Streaming Media, http://www.live555.com/liveMedia/.
[28] J. Reichel, H. Schwarz, M. Wien, Joint Scalable Video Model JSVM-19, doc, Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, 2011.
[29] M. Blestel, M. Raulet, Open SVC decoder: a flexible SVC library, ACM MM (Oct. 2010) 1463–1466.
