% -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
%!TEX root = Vorbis_I_spec.tex
\section{Embedding Vorbis into an Ogg stream} \label{vorbis:over:ogg}

\subsection{Overview}

This document describes using Ogg logical and physical transport
streams to encapsulate Vorbis compressed audio packet data into file
form.

The \xref{vorbis:spec:intro} provides an overview of the construction
of Vorbis audio packets.

The \href{oggstream.html}{Ogg
bitstream overview} and \href{framing.html}{Ogg logical
bitstream and framing spec} provide detailed descriptions of Ogg
transport streams. This specification document assumes a working
knowledge of the concepts covered in these named background
documents. Please read them first.

\subsubsection{Restrictions}

The Ogg/Vorbis I specification currently dictates that Ogg/Vorbis
streams use Ogg transport streams in degenerate, unmultiplexed
form only. That is:

\begin{itemize}
 \item
  A meta-headerless Ogg file encapsulates the Vorbis I packets.

 \item
  The Ogg stream may be chained, i.e., contain multiple, contiguous logical streams (links).

 \item
  The Ogg stream must be unmultiplexed (only one stream, a Vorbis audio stream, per link).

\end{itemize}


This does not mean that Vorbis cannot currently be multiplexed
with other media types into a multi-stream Ogg file. At the
time this document was written, Ogg was becoming a popular container
for low-bitrate movies consisting of DivX video and Vorbis audio.
However, a 'Vorbis I audio file' is taken to imply Vorbis audio
existing alone within a degenerate Ogg stream. A compliant 'Vorbis
audio player' is not required to implement Ogg support beyond the
specific support of Vorbis within a degenerate Ogg stream (naturally,
application authors are encouraged to support full multiplexed Ogg
handling).


\subsubsection{MIME type}

The MIME type of Ogg files depends on the context.
Specifically, complex
multimedia and applications should use \literal{application/ogg},
while visual media should use \literal{video/ogg}, and audio
\literal{audio/ogg}. Vorbis data encapsulated in Ogg may appear
in any of those types. RTP encapsulated Vorbis should use
\literal{audio/vorbis} + \literal{audio/vorbis-config}.


\subsection{Encapsulation}

Ogg encapsulation of a Vorbis packet stream is straightforward.

\begin{itemize}

\item
  The first Vorbis packet (the identification header), which
  uniquely identifies a stream as Vorbis audio, is placed alone in the
  first page of the logical Ogg stream. This results in a first Ogg
  page of exactly 58 bytes at the very beginning of the logical stream.


\item
  This first page is marked 'beginning of stream' in the page flags.


\item
  The second and third Vorbis packets (comment and setup
  headers) may span one or more pages beginning on the second page of
  the logical stream. However many pages they span, the third header
  packet finishes the page on which it ends. The next (first audio) packet
  must begin on a fresh page.


\item
  The granule position of these first pages containing only headers is zero.


\item
  The first audio packet of the logical stream begins a fresh Ogg page.


\item
  Packets are placed into Ogg pages in order until the end of stream.


\item
  The last page is marked 'end of stream' in the page flags.


\item
  Vorbis packets may span page boundaries.


\item
  The granule position of pages containing Vorbis audio is in units
  of PCM audio samples (per channel; a stereo stream's granule position
  does not increment at twice the speed of a mono stream).


\item
  The granule position of a page represents the end PCM sample
  position of the last packet \emph{completed} on that
  page.
  The 'last PCM sample' is the last complete sample returned by
  decode, not an internal sample awaiting lapping with a
  subsequent block. A page that is entirely spanned by a single
  packet (that completes on a subsequent page) has no granule
  position, and the granule position is set to '-1'.


  Note that the last decoded (fully lapped) PCM sample from a packet
  is not necessarily the middle sample from that block. If, e.g., the
  current Vorbis packet encodes a "long block" and the next Vorbis
  packet encodes a "short block", the last decodable sample from the
  current packet will be at position (3*long\_block\_length/4) -
  (short\_block\_length/4).


\item
  The granule (PCM) position of the first page need not indicate
  that the stream started at position zero. Although the granule
  position belongs to the last completed packet on the page and a
  valid granule position must be positive, by
  inference it may indicate that the PCM position of the beginning
  of audio is positive or negative.


  \begin{itemize}
   \item
    A positive starting value simply indicates that this stream begins at
    some positive time offset, potentially within a larger
    program. This is a common case when connecting to the middle
    of a broadcast stream.

   \item
    A negative value indicates that
    output samples preceding time zero should be discarded during
    decoding; this technique is used to allow sample-granularity
    editing of the stream start time of already-encoded Vorbis
    streams. The number of samples to be discarded must not exceed
    the overlap-add span of the first two audio packets.

  \end{itemize}


  In both of these cases in which the initial audio PCM starting
  offset is nonzero, the second finished audio packet must flush the
  page on which it appears and the third packet begin a fresh page.
  This allows the decoder to always be able to perform PCM position
  adjustments before needing to return any PCM data from synthesis,
  resulting in correct positioning information without any additional
  seeking logic.


  \begin{note}
    Failure to do so should, at worst, cause a
    decoder implementation to return incorrect positioning information
    for seeking operations at the very beginning of the stream.
  \end{note}


\item
  A granule position on the final page in a stream that indicates
  less audio data than the final packet would normally return is used to
  end the stream on other than even frame boundaries. The difference
  between the actual available data returned and the declared amount
  indicates how many trailing samples to discard from the decoding
  process.

\end{itemize}

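
The arithmetic in the encapsulation rules above can be illustrated with
a few worked figures. This is an illustrative sketch only: the block
sizes (2048 and 256), the decoded sample count (45056) and the final
granule position (44100) are example values assumed here, not values
stated in this section. The 27-byte fixed Ogg page header and the
30-byte identification header are taken from the Ogg framing
specification and the Vorbis I header definition respectively.

First-page size. The identification header fits in a single lacing
segment, so the page carries one segment-table byte in addition to the
fixed page header:
\[
  27 + 1 + 30 = 58 \mbox{ bytes.}
\]

Last decodable sample of a long block followed by a short block,
assuming long\_block\_length is 2048 and short\_block\_length is 256:
\[
  \frac{3 \cdot 2048}{4} - \frac{256}{4} = 1536 - 64 = 1472.
\]

End-of-stream trimming. Assuming that decoding every packet would
return audio up to PCM position 45056 while the final page declares a
granule position of 44100, the number of trailing samples to discard is
\[
  45056 - 44100 = 956.
\]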