% -*- mode: latex; TeX-master: "Vorbis_I_spec"; -*-
%!TEX root = Vorbis_I_spec.tex
\section{Embedding Vorbis into an Ogg stream} \label{vorbis:over:ogg}

\subsection{Overview}

This document describes using Ogg logical and physical transport
streams to encapsulate Vorbis compressed audio packet data into file
form.

The \xref{vorbis:spec:intro} provides an overview of the construction
of Vorbis audio packets.

The \href{oggstream.html}{Ogg
bitstream overview} and \href{framing.html}{Ogg logical
bitstream and framing spec} provide detailed descriptions of Ogg
transport streams. This specification document assumes a working
knowledge of the concepts covered in these named background
documents.  Please read them first.

\subsubsection{Restrictions}

The Ogg/Vorbis I specification currently dictates that Ogg/Vorbis
streams use Ogg transport streams in degenerate, unmultiplexed
form only. That is:

\begin{itemize}
 \item
  A meta-headerless Ogg file encapsulates the Vorbis I packets.

 \item
  The Ogg stream may be chained, i.e., contain multiple, contiguous logical streams (links).

 \item
  The Ogg stream must be unmultiplexed (only one stream, a Vorbis audio stream, per link).

\end{itemize}


This does not mean that Vorbis cannot currently be multiplexed
with other media types into a multi-stream Ogg file.  At the
time this document was written, Ogg was becoming a popular container
for low-bitrate movies consisting of DivX video and Vorbis audio.
However, a `Vorbis I audio file' is taken to imply Vorbis audio
existing alone within a degenerate Ogg stream.  A compliant `Vorbis
audio player' is not required to implement Ogg support beyond the
specific support of Vorbis within a degenerate Ogg stream (naturally,
application authors are encouraged to support full multiplexed Ogg
handling).

\subsubsection{MIME type}

The MIME type of an Ogg file depends on the context.  Specifically, complex
multimedia and applications should use \literal{application/ogg},
while visual media should use \literal{video/ogg}, and audio
\literal{audio/ogg}.  Vorbis data encapsulated in Ogg may appear
in any of those types.  RTP encapsulated Vorbis should use
\literal{audio/vorbis} + \literal{audio/vorbis-config}.

\subsection{Encapsulation}

Ogg encapsulation of a Vorbis packet stream is straightforward.

\begin{itemize}

\item
  The first Vorbis packet (the identification header), which
  uniquely identifies a stream as Vorbis audio, is placed alone in the
  first page of the logical Ogg stream.  This results in a first Ogg
  page of exactly 58 bytes at the very beginning of the logical stream.


\item
  This first page is marked `beginning of stream' in the page flags.


\item
  The second and third Vorbis packets (comment and setup
  headers) may span one or more pages beginning on the second page of
  the logical stream.  However many pages they span, the third header
  packet finishes the page on which it ends.  The next (first audio) packet
  must begin on a fresh page.


\item
  The granule position of these first pages containing only headers is zero.


\item
  The first audio packet of the logical stream begins a fresh Ogg page.


\item
  Packets are placed into Ogg pages in order until the end of stream.


\item
  The last page is marked `end of stream' in the page flags.


\item
  Vorbis packets may span page boundaries.

\item
  The granule position of pages containing Vorbis audio is in units
  of PCM audio samples (per channel; a stereo stream's granule position
  does not increment at twice the speed of a mono stream).


\item
  The granule position of a page represents the end PCM sample
  position of the last packet \emph{completed} on that
  page.  The `last PCM sample' is the last complete sample returned by
  decode, not an internal sample awaiting lapping with a
  subsequent block.  A page that is entirely spanned by a single
  packet (that completes on a subsequent page) has no granule
  position, and the granule position is set to `-1'.


  Note that the last decoded (fully lapped) PCM sample from a packet
  is not necessarily the middle sample from that block. If, e.g., the
  current Vorbis packet encodes a ``long block'' and the next Vorbis
  packet encodes a ``short block'', the last decodable sample from the
  current packet will be at position (3*long\_block\_length/4) -
  (short\_block\_length/4).

\item
    The granule (PCM) position of the first page need not indicate
    that the stream started at position zero.  Although the granule
    position belongs to the last completed packet on the page and a
    valid granule position must be positive, by
    inference it may indicate that the PCM position of the beginning
    of audio is positive or negative.


  \begin{itemize}
    \item
        A positive starting value simply indicates that this stream begins at
        some positive time offset, potentially within a larger
        program. This is a common case when connecting to the middle
        of a broadcast stream.

    \item
        A negative value indicates that
        output samples preceding time zero should be discarded during
        decoding; this technique is used to allow sample-granularity
        editing of the stream start time of already-encoded Vorbis
        streams.  The number of samples to be discarded must not exceed
        the overlap-add span of the first two audio packets.

  \end{itemize}


    In both of these cases in which the initial audio PCM starting
    offset is nonzero, the second finished audio packet must flush the
    page on which it appears and the third packet must begin a fresh page.
    This allows the decoder to always be able to perform PCM position
    adjustments before needing to return any PCM data from synthesis,
    resulting in correct positioning information without any additional
    seeking logic.


  \begin{note}
    Failure to do so should, at worst, cause a
    decoder implementation to return incorrect positioning information
    for seeking operations at the very beginning of the stream.
  \end{note}

\item
  A granule position on the final page in a stream that indicates
  less audio data than the final packet would normally return is used to
  end the stream on other than even frame boundaries.  The difference
  between the actual available data returned and the declared amount
  indicates how many trailing samples to discard from the decoding
  process.

\end{itemize}
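
As a worked check of the 58-byte figure given for the first page in the
list above, the size can be rebuilt from the individual field widths.
This is a sketch that assumes the page layout from the Ogg framing
specification (a 27-byte fixed page header followed by one lacing value
per segment) and the identification header layout from the Vorbis I
header description; the symbols $P$, $H$ and $S$ are introduced here
only for illustration.  The identification header packet occupies

\[
P = 1 + 6 + 4 + 1 + 4 + (3 \times 4) + 1 + 1 = 30
\]

bytes, where the terms are, in order, the packet type byte, the
six-byte `vorbis' string, the version, the channel count, the sample
rate, the three bitrate fields, the byte holding both blocksize
exponents, and the byte holding the framing flag.  A 30-byte packet
fits in a single segment, so the segment table holds $S = 1$ lacing
value, and with the fixed header of $H = 27$ bytes the page size is

\[
H + S + P = 27 + 1 + 30 = 58 .
\]
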
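
The granule position rules in the list above reduce to a few lines of
arithmetic.  The following is a sketch rather than normative text: it
assumes the per-packet output count from the Vorbis I decode procedure,
and the symbols $n_k$, $r_k$, $g$, $s_0$, $e$ and $d$ are introduced
here only for illustration.  Let $n_k$ be the blocksize of the $k$-th
audio packet and $r_k$ the number of finished PCM samples (per channel)
returned by decoding it:

\[
r_1 = 0, \qquad r_k = \frac{n_{k-1}}{4} + \frac{n_k}{4} \quad (k \geq 2)
\]

For the long-to-short transition, taking a long blocksize of 2048 and a
short blocksize of 256 as example values, the last fully lapped sample
of the long block lies at

\[
\frac{3 \times 2048}{4} - \frac{256}{4} = 1536 - 64 = 1472
\]

rather than at the block midpoint, 1024.

If the first audio page completes exactly the first two audio packets
and carries granule position $g$, the PCM position of the first
returned sample is

\[
s_0 = g - (r_1 + r_2) .
\]

With $n_1 = n_2 = 256$ we have $r_2 = 128$.  A granule position of
$g = 48128$ then gives $s_0 = 48000$, a stream starting at a positive
offset, while $g = 100$ gives $s_0 = -28$, so the first 28 decoded
samples are discarded.

At the end of the stream, let $e$ be the PCM position reached after
decoding the final packet (the granule position of the preceding page
plus the $r_k$ of every packet completed on the final page, assuming
the preceding page carries a valid granule position) and let
$g_{\mathrm{last}}$ be the granule position of the final page.  The
number of trailing samples to discard is

\[
d = e - g_{\mathrm{last}} .
\]

For example, if the preceding page ends at 44000 and the final page
completes two packets returning 1024 samples each, then $e = 46048$; a
final granule position of $g_{\mathrm{last}} = 45500$ gives $d = 548$.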