\documentclass{spec}
\usepackage[pdftex]{graphicx}
\newcommand{\syntax}[1]{

  \subsubsection*{Syntax}

  \begin{tabbing}

  \hspace{2cm}\=\\[-16pt]

  #1

  \end{tabbing}

}
\newcommand{\secspec}[1]{Section:\>\texttt{#1}}
\newcommand{\secspecs}[2]{Sections:\>\texttt{#1}, \texttt{#2}}
\newcommand{\HRule}{\rule{\linewidth}{0.5mm}}


\begin{document}
\title{AS5 Subtitle Format Draft}
\author{Rodrigo Braz Monteiro, Niels Martin Hansen, David Lamparter, Karl Blomster}

\begin{titlepage}
\begin{center}

\vspace*{3cm}

\HRule \\[0.5cm]
\textsc{\huge AS5 Subtitle Format}\\
\HRule \\[1.1cm]
{\large By Rodrigo Braz Monteiro, Niels Martin Hansen, David Lamparter and Karl Blomster}\\[0.3cm]
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License.\\
\vfill

\begin{minipage}{0.4\textwidth}
\begin{flushleft} \large
\includegraphics[width=0.7\textwidth]{./aegisub}
\end{flushleft}
\end{minipage}
\begin{minipage}{0.4\textwidth}
\begin{flushright} \large
\includegraphics[width=0.6\textwidth]{./asa}
\end{flushright}
\end{minipage}\\[1.5cm]

{\large \today}

\end{center}
\end{titlepage}

\setlength{\parskip}{0pt}
\tableofcontents
\newpage
\setlength{\parskip}{8pt}


\section{Abstract}
This document specifies the \emph{AS5 Subtitle Format}, developed jointly by the
Aegisub\cite{Aegisub} and asa\cite{asa} teams in order to replace the old
\emph{Sub Station Alpha}\cite{SSA} subtitle format and its extensions:

\begin{itemize}
\item Advanced Sub Station Alpha (ASS) implemented by Gabest in VSFilter\cite{VSFilter}
\item Advanced Sub Station Alpha 2 (ASS2), also implemented by Gabest in VSFilter
\item Advanced Sub Station Alpha 3 (ASS3) implemented by equinox in asa.
\end{itemize}

The goal is to create a flexible, easy to understand and powerful subtitle format
that can be used in hardsubs or multiplexed into Matroska Video\cite{mkv} files as
softsubs. The syntax is heavily influenced by the older SSA and ASS formats, which in
turn vaguely resemble the TeX typesetting language; but AS5 also has many differences
compared to these older formats and you should not expect it to behave exactly like them.

AS5 has no official meaning. The ``A'' can stand for Aegisub, asa, ASS or Advanced,
the ``S'' for Subtitles, and the 5 is a reference to the fact that it's a major
rework of the SSA4 format (from which ASS, ASS2 and ASS3 derive). The full
name of the format is ``AS5 Subtitle Format''.


\newpage
\section{AS5 Files}
\subsection{File Format}
All AS5 files are \emph{REQUIRED} to comply with the three requirements below:

\begin{itemize}
\item Be encoded with one of the \emph{UTF-8}\cite{UTF-8}, \emph{UTF-16 Big Endian}
\cite{UTF-16} or \emph{UTF-16 Little Endian} Unicode Transformation Formats. UTF-8 is
preferred.
\item Contain no character below Unicode code point U+20, except for U+09, U+0A and U+0D.
That is, it must be a plain-text file.
\item End every line with a Windows line ending, that is, U+0D followed by U+0A.
\end{itemize}

These requirements are important so the AS5 format can be edited in most plain-text editors
across most operating systems and languages without problems. The character set of a
subtitle file can be determined automatically from its Byte-Order Mark or from the value of the first
two bytes. See below.

When used as a standalone file, the extension should be \textsc{.as5}. When multiplexed
into a Matroska container, the Codec ID used is \textsc{S\_TEXT/AS5}.
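
As a rough illustration of the requirements above, the following Python sketch guesses the
encoding from the first two bytes (using the byte patterns listed in the [AS5] section) and
reports violations of the plain-text and line-ending rules. The function names
\texttt{detect\_encoding} and \texttt{format\_problems} are hypothetical and not part of this
specification; a real parser may well be structured differently.

```python
import codecs


def detect_encoding(data: bytes) -> tuple[str, int]:
    """Return (codec name, BOM length) guessed from the start of the file."""
    boms = (
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    )
    for bom, name in boms:
        if data.startswith(bom):
            return name, len(bom)
    # Without a BOM the file starts with "[" (U+5B) from the [AS5] header.
    if data[:2] == b"[\x00":
        return "utf-16-le", 0
    if data[:2] == b"\x00[":
        return "utf-16-be", 0
    return "utf-8", 0


def format_problems(text: str) -> list[str]:
    """List violations of the plain-text and line-ending requirements."""
    problems = []
    for offset, ch in enumerate(text):
        if ord(ch) < 0x20 and ch not in "\t\n\r":
            problems.append(f"control character U+{ord(ch):04X} at offset {offset}")
    # After removing every CR LF pair, no lone CR or LF may remain.
    remainder = text.replace("\r\n", "")
    if "\r" in remainder or "\n" in remainder:
        problems.append("line ending other than CR LF")
    if not text.endswith("\r\n"):
        problems.append("last line is not terminated by CR LF")
    return problems
```

A caller would decode \texttt{data[bom\_length:]} with the returned codec name before running
the second check.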

\todo{Get clearance from the Matroska team to use that Codec ID.}


\subsection{File Structure}
The file is divided into \emph{sections}, which are uniquely identified by a string inside
square brackets, on a line of its own. From that point on, every following line is considered
to be part of the last found section until another section is found. There is no end-of-section
termination mark; sections always end at the start of the next one or at the end of the file.
Each section \must\ appear at most once; if the parser finds two lines containing
the same section header, it \must\ reject the file as invalid. \emph{Section names are case sensitive.}

Each section is divided into lines, each line representing one command or definition. Empty
lines (that is, lines only containing a line ending) \must\ be ignored by the parser.
It is recommended that programs generating AS5 files insert a blank line at the end of each
section to increase readability. There \must\ always be a blank line at the end of the file
(as every line is required to end in a line break).

Each line in a section takes the general form of \textit{Type: data1,data2,...,dataN}. An
unknown \textit{Type} \must\ be ignored by a parser. Subtitle editing programs \should\ keep
such ignored lines in the file after re-saving it. Note that the space after the colon is \emph{mandatory}.

There are two sections which are required, \emph{[AS5]} and \emph{[Events]}, the former being
the equivalent of \emph{[Script Info]} in previous formats. If either of those sections is
missing, the file is invalid and \must\ be rejected by the parser. Any other section
can be omitted from the file, and need not be implemented by all parsers.

Finally, there is a special type of undefined group, \emph{[Private:PROGNAME]}, which
\must\ be \emph{ENTIRELY} preserved by other programs when re-saving it.
This is used to
store program-specific data. For example, Aegisub would create a group called
\emph{[Private:Aegisub]} to store its data inside. This type of group is identified
by the fact that it starts with \emph{``[Private:''}.

Additionally, private data may also be stored in any other section by using commented-out
lines: any line where the first character is a semicolon (\textit{;} - U+3B) is considered a
``comment line'' and \must\ be ignored by the parser; such lines also \must\ be preserved by an editing program
when resaving. It is suggested that an editing program \should\ check whether commented lines are
actually valid AS5 lines, and if they are, display them to the user in some way as ``disabled'' lines.
Note that commented-out lines \mustnot\ influence subtitle rendering in any way.

The sections \may\ be written in any order, with the exception of the \emph{[AS5]} section which
\must\ always be the first section.

In general, malformed lines in AS5 (such as unrecognized lines, lines with missing fields, fields
with invalid data for their type (for example, malformed timestamps) or unrecognized section headers)
are not considered fatal syntax errors. If nothing else is explicitly specified, the renderer \must\
ignore such lines completely, and the parser \should\ emit a warning describing the syntax error. The
spirit of this rule is to be forgiving; something that doesn't make the entire file unusable or dangerously
ambiguous should not be a fatal syntax error. It is usually better to render the valid parts of the file
correctly and tell the user about the problematic lines by way of warning messages. Under certain
circumstances it may be desirable to suppress warning messages; a well-behaved parser \should\ include
an option to do so, but in general it is probably more useful to let the user know about the problem
instead of just silently failing to render the line.
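
The structural rules of this subsection could be sketched in Python roughly as follows. The
sketch assumes the text has already been decoded and any Byte-Order Mark removed; the name
\texttt{parse\_sections} and the shape of its return value are hypothetical, and an editing
program would additionally have to keep comment lines and unknown lines for re-saving.

```python
def parse_sections(text: str) -> tuple[dict[str, list[tuple[str, str]]], list[str]]:
    """Split decoded AS5 text into sections of (Type, data) pairs plus warnings."""
    lines = text.split("\r\n")
    if lines[0] != "[AS5]":
        raise ValueError("the first line must be [AS5]")
    sections: dict[str, list[tuple[str, str]]] = {}
    warnings: list[str] = []
    current: list[tuple[str, str]] = []
    for number, line in enumerate(lines, start=1):
        # Empty lines and comment lines are ignored by the parser.
        if line == "" or line.startswith(";"):
            continue
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1]  # section names are case sensitive
            if name in sections:
                raise ValueError(f"line {number}: duplicate section [{name}]")
            current = sections[name] = []
            continue
        kind, colon, rest = line.partition(":")
        # The space after the colon is mandatory.
        if not kind or not colon or not rest.startswith(" "):
            warnings.append(f"line {number}: malformed line ignored")
            continue
        current.append((kind, rest[1:]))
    if "Events" not in sections:
        raise ValueError("required section [Events] is missing")
    return sections, warnings
```

Fatal conditions (wrong first line, duplicate or missing required sections) raise an
exception, while malformed lines only produce warnings, mirroring the forgiving spirit
described above.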

\subsubsection{[AS5]}
This \must\ be the first section in every AS5 file. If the very first line of the file is not
[AS5], the file \must\ be rejected by the parser as invalid. Note, however, that the first
line is allowed to contain a Byte-Order Mark (BOM), which is the character U+FEFF encoded in
the encoding used for the rest of the script\cite{Unicode BOM}. The first four bytes will therefore be:

\begin{itemize}
\item 0xEF 0xBB 0xBF 0x5B - UTF-8 (with BOM)
\item 0x5B 0x41 0x53 0x35 - UTF-8 (without BOM)
\item 0xFF 0xFE 0x5B 0x00 - UTF-16 LE (with BOM)
\item 0x5B 0x00 0x41 0x00 - UTF-16 LE (without BOM)
\item 0xFE 0xFF 0x00 0x5B - UTF-16 BE (with BOM)
\item 0x00 0x5B 0x00 0x41 - UTF-16 BE (without BOM)
\end{itemize}

It is possible, therefore, to determine the encoding of the file by checking its first two bytes.

This section is used to declare several script properties that affect its parsing and rendering.
All properties are stored in the format \textit{Name: data}, with one property per line.

This section \must\ always declare the following properties (a file that is missing one of them is not valid):

\begin{itemize}
\item ScriptType: Should always be set to \textit{AS5}, for this particular version of the specification.
An unrecognized ScriptType value is considered a fatal syntax error, and \must\ cause the parser to
reject the entire file as invalid.
\item Resolution: Should contain the script resolution in \textit{WxH} format. For example, for a 640x480
script, this should say \textit{``Resolution: 640x480''}. Note that this does not need to correspond to the
video resolution; however, subtitles \must\ be rendered in this coordinate space. That is, in a
640x480 script, \textbackslash{pos(320,240)} always represents the center of the script, no matter the
resolution of the video it's being drawn on.
Also, in a 100x100 script, a circle of radius 25 centered on
the center will always take half of the height and half of the width of the video, even if that means
being distorted when drawn on a video with a non-1:1 aspect ratio (for example, a 640x480 video).
An unrecognized or malformed Resolution value is considered a fatal syntax error, and \must\ cause the parser
to reject the entire file as invalid.
\end{itemize}

The following items \may\ also be used; they are not required, but are recommended. They all have default values:

\begin{itemize}
\item Generator: The name of the program that generated this script, e.g. \textit{``Generator: Aegisub''}.
Default value is empty. This should be ignored by the renderer, but might be useful for inter-editing-program
interaction.
\item Wrapping: The line wrapping style. This can be ``Manual'', in which case only \textbackslash{n} can
break lines, or ``Automatic'', in which case the renderer chooses how to break them. If this is not set, or if the
value set is not recognized, the renderer \must\ default to ``Automatic''.
Even if it is set to Automatic, \textbackslash{n} will still insert a forced line break.
On the other hand, if set to Manual, the line can NEVER be broken anywhere other than at forced line breaks,
even if it means that the line will become unreadable because it goes outside the display area.
This property is not case sensitive.
\item Extensions: A comma-separated list of all extensions being used in this file. At the moment, there are
no extensions available. Renderers should read this to enable any extensions that they might support.
Editing programs \must\ keep this field intact, unless the user chooses otherwise. Scripts WILL break
if the list of extensions is suddenly lost.
\item Credits: Credits for the people who worked on this subtitle file. Purely for informational purposes and
\should\ be ignored by the renderer.
Subtitling programs \should\ be able to display these credits to the user.
\item Title: The title of this script. Purely for informational purposes and \should\ be ignored by the renderer.
Subtitling programs \should\ be able to display this title to the user.
\end{itemize}

Unlike in the previous incarnations of the format, storing private properties here is strongly discouraged,
which means that this section \shouldnot\ contain any properties not listed here. It \may, just like any other
section, contain commented-out lines prefixed with a semicolon (;) which of course may contain anything, but it
is strongly recommended that any application-specific or otherwise private data \should\ be stored in the
\textit{[Private:PROGNAME]} section instead, as mentioned above, or if it is line-specific data, in the User field.


\subsubsection{[Events]}

The most important section, [Events], lists all the actual subtitle lines in the file. The syntax has
been radically simplified from previous incarnations of the format, and now consists of only five fields.
Each line is represented as:

\begin{verbatim}
Line: start,end,style,user,content
\end{verbatim}

Where:

\begin{itemize}
\item Start: The start time of the line. See below for the timestamp format. A line is only displayed if
the timestamp of the current frame is \emph{greater than or equal} to the start time. That is, start
time is \emph{inclusive}.
\item End: The end time of the line. It follows the same format as the start time. The line is only
displayed if the timestamp of the current frame is \emph{less than} the end time. That is, end time is
\emph{exclusive}. In particular, it means that a line whose start time is equal to its end time will
never be displayed.
If the end time is earlier than the start time, the renderer \should\ issue a warning,
but this is not considered a fatal syntax error and it \should\ render the remaining lines regardless of the issue.
For rendering purposes, such an end time should be considered to be equal to
the start time, and editing programs \may\ automatically reset the end time to be equal to the start time.
\item Style: The name of the default style used for this line. See the [Styles] section below. If left blank,
the script's global default style \must\ be used. If there is no default style defined, or if an unknown
style name is specified, the renderer \must\ fall back to its own defaults (see below), and \should\ issue a warning.
\item User: This field is used by the program to store program-specific data in each line. Renderers
\should\ ignore this (but \may\ use it for application-specific extension features). This field \should\
be left empty if it's not used. Note that whatever data is stored here \mustnot\ contain any commas!

It is suggested that text in the User field is encoded with the following scheme: The characters
0x00 to 0x1F (control codes), 0x23 (number sign), 0x2C (comma), 0x3A (colon) and 0x7C (pipe)
are replaced with a number sign (0x23) followed by the hexadecimal code for the character, for example
a comma is replaced with ``\#2C''. This scheme allows the field to contain several sub-fields separated
with pipe characters, optionally using a ``Name:Value'' format.
\item Content: The actual text of the line. This contains actual text and override tags. See the section
on override tags for more information.
\end{itemize}

The timestamp format is h...h:mm:ss[.s...], that is, it begins with an integer of one to four
digits representing the number of hours, followed by a one-digit or two-digit integer
representing minutes, and a floating point number representing seconds. Leading zeroes \may\ be omitted.
Localization is irrelevant: a period (``.'') is always used as the decimal separator. This way,
0:21:42.5 and 0000:21:42.5000 are equivalent, and both represent 0 hours, 21 minutes, 42 seconds and 500 milliseconds.

Spaces between each field \must\ be ignored by the parser. Any spaces at the beginning of the
content line \should\ be stripped by any editing program. A hard space (see the overrides section) or empty
override block should be used if space at the start of a line is truly desirable. That is, the two
following lines are syntactically identical:

\begin{verbatim}
Line: 0:2:31.57 , 0:02:34.22 , , , Hello world of {\b1}AS5{\b0}!
Line: 0:02:31.570,00:02:34.22,,,Hello world of {\b1}AS5{\b0}!
\end{verbatim}


\subsubsection{[Styles]}

This is equivalent to the \emph{[V4 Styles]} (and subsequent variations) from the Sub Station Alpha format.
Like \emph{[Events]}, it has been greatly simplified when compared to the previous formats, and now
each entry contains only three fields. They are declared as:

\begin{verbatim}
Style: name,parent,overrides
\end{verbatim}

Where:

\begin{itemize}
\item Name: The name of this style. Style names are not case-sensitive, but \must\ be unique. A
script with conflicting style names \must\ be rejected by the parser. If the style name is ``Default'', it
will be used for all lines that omit the style name. If there is no ``Default'' line, the renderer
default is used.
\item Parent: The style from which the current style derives. See below for more information.
Leaving this field blank means that the style derives from the renderer's default style.
\item Overrides: A list of override tags to define this style. See below.
\end{itemize}

Styles work in a very different way from the way they did in previous formats (with the notable exception
of ASS3, which implements this same style system, based on this format, as ``StyleEx'').
Instead of setting multiple parameters across many commas, you simply specify override tags. When a line
uses a style, it's as if the overrides of the style were inserted right before the start of the line
contents, with one exception: certain tags without parameters revert to the style default. For example,
\textbackslash 1c will revert the primary colour to the one specified in the style. Such use of tags is invalid
in a style definition, and \must\ be ignored if found there; the parser \may\ choose to emit a warning.

Also, a style can inherit from another style, and define new overrides which are then appended to those
of the parent style. The parent style \must\ have been declared \emph{BEFORE} the style trying to use
it as a parent. If the parent doesn't exist or wasn't declared yet, the parser \must\ refuse to parse the
script. This is important because otherwise you could get an ``inheritance loop'', where styles derive from
each other in a cycle.
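
The inheritance rule can be illustrated with a short Python sketch. It takes the data part of
each \emph{Style:} line in file order and returns the full override string of every style; the
function name \texttt{resolve\_styles} is hypothetical, and skipping entries with fewer than three
fields is this sketch's reading of the general rule for malformed lines, not something this
section states explicitly.

```python
def resolve_styles(entries: list[str]) -> dict[str, str]:
    """Map each style name (case-folded) to its complete override string."""
    resolved: dict[str, str] = {}
    for entry in entries:
        # Only split on the first two commas: overrides may contain commas.
        fields = entry.split(",", 2)
        if len(fields) != 3:
            continue  # assumption: malformed entries are skipped
        name, parent, overrides = (field.strip() for field in fields)
        key = name.casefold()  # style names are not case-sensitive
        if key in resolved:
            raise ValueError(f"conflicting style name: {name}")
        inherited = ""  # blank parent: derive from the renderer defaults
        if parent:
            parent_key = parent.casefold()
            if parent_key not in resolved:
                raise ValueError(f"style {name}: parent {parent} not declared yet")
            inherited = resolved[parent_key]
        # Deriving a style appends its tags to those of the parent.
        resolved[key] = inherited + overrides
    return resolved
```

Because a parent must already be present in \texttt{resolved}, a cycle can never be formed,
which is exactly why the declaration-order requirement exists.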

For example, see the following \emph{[Styles]} group:

\begin{verbatim}
[Styles]
Style: Default,,\fn(Arial)\fs20
Style: Speech,,\fn(Respublica)\fs24\bord2\shad2\4a#80\2c#000000
Style: Actor1,Speech,\1c#B9C5E3
Style: Actor2,Speech,\1c#FFB3CF
Style: UglinessItself,Default,\fn(Comic Sans MS)
\end{verbatim}

In the above fragment, the first style defines the Default style that will be used on all lines that
don't set any style and the second style defines a base speech style that will be used for all actors
(note that it doesn't inherit from Default: even though Default overrides the renderer's default for lines,
the renderer's default is still the base for style definitions without a parent).

The third and fourth styles are based on the second, and simply assign different colours to it. They
will both have all properties of Speech, and only differ in primary colour. Finally, the last example
shows how to derive from the overridden default. In this case, font size would be 20 points, regardless
of the renderer's default.

The two Actor styles could have been defined without a parent style as follows:

\begin{verbatim}
[Styles]
Style: Actor1,,\fn(Respublica)\fs24\bord2\shad2\4a#80\2c#000000\1c#B9C5E3
Style: Actor2,,\fn(Respublica)\fs24\bord2\shad2\4a#80\2c#000000\1c#FFB3CF
\end{verbatim}

Since all that deriving a style from another does is append the new tags to the end of the previous,
this way of declaring styles is identical to the one above, but is more verbose.

\todo{This is bad, we need to fix it with specified defaults to get consistent rendering}
If no Default style is defined, the renderer \must\ choose its own defaults to render the text with.
The defaults \must\ also be used for any properties not specified in a given style (in other words,
styles with no parent inherit from the renderer defaults).
To ensure consistent rendering while still
avoiding having to explicitly define every single property, some of these defaults are mandatory and
specified below; some others have recommended values, also specified below, but a well-featured renderer
\may\ allow the user to change these defaults at will.

The following default overrides are mandatory and \must\ be set as follows:
\begin{itemize}
\item \textbackslash i(0)
\item \textbackslash b(0)
\item \textbackslash u(0)
\item \textbackslash s(0)
\item \textbackslash fe(Unicode)
\item \textbackslash bordstyle(0)
\item \textbackslash fscx(100)
\item \textbackslash fscy(100)
\item \textbackslash fsp() - undefined (font default)
\item \textbackslash fsvp() - undefined (font default)
\item \textbackslash 1a(\#00)
\item \textbackslash 2a(\#00)
\item \textbackslash 3a(\#00)
\item \textbackslash 4a(\#80)
\item \textbackslash left(12)
\item \textbackslash right(12)
\item \textbackslash top(12)
\item \textbackslash bottom(12)
\item \textbackslash ax(50)
\item \textbackslash ay(100)
\item \textbackslash nx(50)
\item \textbackslash ny(100)
\item \textbackslash rel(0)
\item \textbackslash vertical(0)
\item \textbackslash q(1)
\item \textbackslash pos() - undefined (defined by alignment, margins and script resolution)
\item \textbackslash org() - undefined (defined by alignment, margins and script resolution)
\item \textbackslash bls(0)
\item \textbackslash frx(0)
\item \textbackslash fry(0)
\item \textbackslash frz(0)
\item \textbackslash fax(0)
\item \textbackslash fay(0)
\item \textbackslash fad(0,0)
\item \textbackslash distort() - undefined (none)
\item \textbackslash baseline() - undefined (none)
\item \textbackslash blpos(0)
\item \textbackslash vc() - undefined (none)
\item \textbackslash blend(normal)
\item \textbackslash clip() - undefined (none)
\item \textbackslash iclip() -
undefined (none)
\item \textbackslash \$blur(0)
\end{itemize}

\subsubsection{[Resources]}

The new \emph{[Resources]} section can be used to store information on external file resources,
such as images and fonts. The general syntax is:

\begin{verbatim}
Resource: type,name,path
\end{verbatim}

Where:

\begin{itemize}
\item Type: Must be either ``font'' or ``image''. Any other types \must\ be ignored by the parser.
\item Name: A unique name identifying this resource. For fonts, it must correspond to the font
name, e.g., ``Verdana''. For images, it's the name by which the file will be referred to in the rest
of the script. If there is already a resource with this same name, the parser \must\ abort the
parsing.
\item Path: The location of the file relative to the subtitles. This \must\ be a relative path
for external .as5 files, or a container-specific string for AS5 multiplexed into a container.
The relative path \must\ use forward slashes and be case-sensitive, in order to avoid UNIX
compatibility issues.
\end{itemize}


\newpage
\section{Style Overrides}

\subsection{General Information on Override Tags}
As with previous formats, AS5 uses override tags to set the style for lines. Also, it uses those
same tags to set style definitions themselves (see above). Although many tags were imported from
\emph{Advanced Sub Station Alpha}, do not assume that they behave exactly the same. Some had their
behavior changed or properly defined. Also, AS5 defines many new tags in addition to the old ones.

All tags must be inserted between a pair of curly brackets (\emph{\{\}}), except in style definitions.
A pair can contain any number of override tags inside it. They should be listed one after the other,
with no spaces or any other kind of separator between them.
Tags then affect all text that follows
them, unless overridden again or reset by the \emph{\textbackslash r} tag. For example:

\begin{verbatim}
{\fn(Verdana)\fs26\c#FFA040}Welcome to {\b1}AS5{\b0}!
\end{verbatim}

In the above example, the first override block affects the entire text, but only ``AS5'' is bolded.

Some tags begin with a \$ in their names. This means that there are actually five variations
of this specific tag, the tag with \$ replaced with a number from \emph{1} to \emph{4} (inclusive)
or without it altogether - in that case, the tag is assumed to mean the \emph{1} variation. Those
numbers represent the four different colours available on any given line (see below). If no number
is specified, the tag will affect all 4 colours. The 4 colours are:

\begin{itemize}
\item 1 - Primary colour, used for the main face of the text.
\item 2 - Secondary colour, used on karaoke. See the karaoke tags for more information.
\item 3 - Border colour. This is the colour of the border that outlines the text. See the \textbackslash
bord tag for more information.
\item 4 - Shadow colour. This is the colour of the shadow dropped by the text. See the \textbackslash
shad tag for more information.
\end{itemize}

So, for example, you would use \textbackslash 1c to set the primary colour, or \textbackslash 3c to set
the colour of the border. \textbackslash \$c, however, does not exist in itself.

When a tag requires a floating point parameter, the decimal part \must\ be specified using a period (.);
never a comma. When a tag requires a colour parameter, it is given in HTML hexadecimal code, which is
\# followed by a 6-digit hexadecimal string, where the first two digits represent the red component,
the next two the green component, and the last two the blue component (\#RRGGBB). Sub Station Alpha
style (Visual Basic hexadecimal) is not supported.

In the tag specification in this document, optional parameters are denoted by being enclosed by square
brackets (``[]''), and may be omitted. For example, \emph{\textbackslash baseline(curve1[,curve2])}
means that the second parameter is entirely optional. It's also possible that the entire parameter set
is enclosed in square brackets, e.g. \emph{\textbackslash vc[(c1,c2,c3,c4)]}.

The parameters of a tag \must\ be enclosed within parentheses, with the exception of tags with only one numerical
parameter, for which the parentheses \may\ be omitted.

All tags \must\ start with a backslash (\textbackslash ). If an override block (a pair of curly brackets)
or any tag starts with anything other than a backslash, it is considered a syntax error and the parser \must\
ignore the block or tag and \should\ emit a warning (see the section ``Invalid or Malformed Tags and Syntax Errors''
below). Thus it is not possible, as it was in earlier formats, to hide inline comments inside normal override blocks.
There is, however, a special kind of comment block that can be used for this. Any curly opening brace that is
immediately followed by an exclamation mark (!) starts a comment block (ending with a matching closing curly brace),
the contents of which \must\ be ignored by the parser and the renderer.
For example:

\begin{verbatim}
{\fn(Verdana)\fs26\c#FFA040}Welcome to {\b1}AS5{\b0}!{!It's a nifty format, isn't it?}
\end{verbatim}


\subsection{Invalid or Malformed Tags and Syntax Errors}
Any override tag (excluding the special character escape) that meets any of the following conditions:
\begin{itemize}
\item - is not specified in this document (that is, tags not present in the standard or just simply
misspelled variants of existing tags)
\item - does not start with a backslash
\item - is found outside an override block (that is, not within curly braces)
\item - is missing parentheses where they should be present, or is missing a matching opening/closing parenthesis
\item - has arguments not matching those expected by the parser
\end{itemize}
is considered \emph{invalid} or \emph{malformed}. Invalid or malformed tags are syntax errors, and the renderer
\must\ ignore them. The parser \should\ also emit warnings about these errors, although it should be noted that
under certain circumstances it may be desirable to suppress warnings. The parser \should\ include an option to do so.

Any curly brace (start/end of an override block) which is missing its matching pair is also a syntax error; the
resulting line \must\ be drawn as if it was just plain text without the override block. Naturally, the parser
\should\ warn about this.

\todo{Finish this}


\subsection{Vector Path Format}
\todo{Write me}


\todo{Write detailed descriptions for all the override tags}

\subsection{Special Character Escapes}
The following tags are not considered override tags, but rather escape codes for special characters. They
\mustnot\ be inside an override block, but only in the middle of the text (i.e. not between \{ and \}).
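
To make the interplay between override blocks, comment blocks and character escapes more
concrete, here is a rough Python sketch that splits the content field of a line into tokens.
The escape table mirrors the escapes defined in the subsections of this section. The name
\texttt{tokenize} and the token labels are hypothetical, and keeping an unterminated block as
literal text is only this sketch's assumption about how a missing closing brace is handled.

```python
ESCAPES = {"n": "\n", "h": "\u00a0", "{": "{", "}": "}", "\\": "\\"}


def tokenize(content: str) -> list[tuple[str, str]]:
    """Split line content into ("text" | "override" | "comment", value) tokens."""
    tokens: list[tuple[str, str]] = []
    text: list[str] = []

    def flush() -> None:
        # Emit the text collected so far as one token.
        if text:
            tokens.append(("text", "".join(text)))
            text.clear()

    i = 0
    while i < len(content):
        ch = content[i]
        if ch == "\\" and i + 1 < len(content) and content[i + 1] in ESCAPES:
            # Escapes are only recognised outside override blocks.
            text.append(ESCAPES[content[i + 1]])
            i += 2
        elif ch == "{":
            end = content.find("}", i + 1)
            if end == -1:
                # Assumption: an unterminated block is kept as literal text.
                text.append(content[i:])
                break
            flush()
            body = content[i + 1:end]
            if body.startswith("!"):
                tokens.append(("comment", body[1:]))
            else:
                tokens.append(("override", body))
            i = end + 1
        else:
            text.append(ch)
            i += 1
    flush()
    return tokens
```

A renderer would then discard the comment tokens and hand each override token to a separate
tag parser, which is where the malformed-tag rules of the previous subsection apply.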


\subsubsection{\textbackslash n}
\textbf{Usage:}
\begin{verbatim}
Line 1\nLine 2
\end{verbatim}

\textbf{Description:}
Inserts a forced line break.

\todo{Should the presence of a forced line break in a line disable automatic line breaking for that line?}

\subsubsection{\textbackslash h}
\textbf{Usage:}
\begin{verbatim}
Word1\hWord2
\end{verbatim}

\textbf{Description:}
Inserts a ``hard'' space. This is equivalent to Unicode character U+00A0 No-Break Space, but script authors
are recommended to use \textbackslash h over U+00A0 since U+00A0 can easily be mistaken visually for a regular
space character.

\subsubsection{\textbackslash \{, \textbackslash \}}
\textbf{Usage:}
\begin{verbatim}
Text \{inside curly braces\}
\end{verbatim}

\textbf{Description:}
Insert a literal \{ and \}, respectively, into the rendered output.

\subsubsection{\textbackslash \textbackslash}
\textbf{Usage:}
\begin{verbatim}
A \\ (backslash)
\end{verbatim}

\textbf{Description:}
Insert a literal \textbackslash\ into the rendered output.


\subsection{Basic Typography Tags}

\subsubsection{\textbackslash i}
\textbf{Usage:}
\begin{verbatim}
\i(1)
\i(0)
\end{verbatim}

\textbf{Description:}
Enable (parameter 1) or disable (parameter 0) italics font style. If the selected font face does not
have a native italics variation, a simulated italics style \must\ be used. If the selected font face
does not have a non-italics variation, the italics variation \must\ be used even when \textbackslash i(0)
is specified.

\subsubsection{\textbackslash b}
\textbf{Usage:}
\begin{verbatim}
\b(1)
\b(0)
\end{verbatim}

\textbf{Description:}
Enable (parameter 1) or disable (parameter 0) boldface font style.
If the selected font face does not
have a native boldface variation, a simulated boldface \must\ be used. If the selected font face
does not have a non-boldface variation, the boldface variation \must\ be used even when \textbackslash b(0)
is specified.

AS5 does not support specifying a specific font weight with \textbackslash b, and any parameter other
than 0 or 1 (zero or one) is an error. To specify a specific weight version of a font that has more
than two weight variations, the textual name of the weight variation must be specified with the
\textbackslash fn override.

\subsubsection{\textbackslash u}
\textbf{Usage:}
\begin{verbatim}
\u(1)
\u(0)
\end{verbatim}

\textbf{Description:}
Add an underline decoration to the text (parameter 1) or not (parameter 0). The underline is a straight
line parallel to the text baseline, placed slightly below the baseline.

\subsubsection{\textbackslash s}
\textbf{Usage:}
\begin{verbatim}
\s(1)
\s(0)
\end{verbatim}

\textbf{Description:}
Add a strikeout decoration to the text (parameter 1) or not (parameter 0). The strikeout is a straight
line parallel to the text baseline, which strikes through the letters.

\subsubsection{\textbackslash fn}
\textbf{Usage:}
\begin{verbatim}
\fn(fontname1,fontname2,...,fontnameN)
\end{verbatim}

\textbf{Description:}
List of preferred fonts in descending order of preference.

\todo{What about fonts that have commas or parentheses in their names?}

\subsubsection{\textbackslash fe}
\textbf{Usage:}
\begin{verbatim}
\fe(fontencoding)
\end{verbatim}

\textbf{Description:}
Set the font encoding, given as some ISO code.

\todo{What does this affect?
Apart from possibly selecting national variations of some characters
and possibly fixing things in Windows.}

\subsubsection{\textbackslash fs}
\textbf{Usage:}
\begin{verbatim}
\fs(size)
\end{verbatim}

\textbf{Description:}
Set font height in pixels. The font nominal character width is also set by \textbackslash fs to the default
of the font face.

The parameter can also be interpreted as a typographic point value, when
the script resolution is assumed to be 72 dpi and the size of a typographic point is defined as
$1/72$ inch.

\todo{Can this be defined more clearly?}

A negative font size must be considered an error and \must\ be ignored.

\subsubsection{\textbackslash bord}
\textbf{Usage:}
\begin{verbatim}
\bord(bordersize)
\end{verbatim}

\textbf{Description:}
Set the width of the text outline. The outline width \mustnot\ be negative.

The text outline can be defined by a morphological dilation operation using the rasterised text
and a circular element with the radius specified by the \textbackslash bord tag. The outline is the
original rasterised text subtracted from the result of the dilation operation, i.e.:
\[O = (T \oplus E_{bord}) - T\]
where $O$ is the image of the outline, $T$ is the image of the text, $E_{bord}$ is the image of the
circular element with radius $bord$ and $\oplus$ is the morphological dilation operation.

The border can also be calculated from the vector outlines of the text.

\todo{Define border by vector operations?}

\todo{Is the outline calculated before or after applying other transformations? I.e., do X/Y axis
rotations affect it?}

\subsubsection{\textbackslash shad}
\textbf{Usage:}
\begin{verbatim}
\shad(shadowsize)
\end{verbatim}

\textbf{Description:}
Set shadow depth in script resolution pixels. The shadow depth \mustnot\ be negative.

\todo{Or define what a negative shadow depth should mean instead?}

The shadow can be defined as a shadow image offset from the text and outline images. The shadow image
\must\ be rendered visually ``further away'' than the text and outline images, i.e.\ ``behind'' them.

The shadow image is the sum of the text and outline images, rendered entirely in the fourth colour.

The shadow image is offset from the text and outline images by $shadowsize$ script resolution pixels in
both the X and Y directions.

After offsetting the shadow image, the text and outline images are subtracted from it at its new position.

\subsubsection{\textbackslash bordstyle}
\textbf{Usage:}
\begin{verbatim}
\bordstyle(0)
\bordstyle(1)
\end{verbatim}

\textbf{Description:}
Set border style; 0 means normal, 1 means solid bounding box.

When border style is 1, the outline image defined by the \textbackslash bord override \mustnot\ be used
and instead an opaque box of the border colour must be drawn behind the text.

\todo{Define that box further}

\subsection{Font Scaling Tags}

\subsubsection{\textbackslash fsc, \textbackslash fscx, \textbackslash fscy}
\textbf{Usage:}
\begin{verbatim}
\fsc(scale)
\fscx(xscale)
\fscy(yscale)
\end{verbatim}

\textbf{Description:}
Set font X/Y scaling in percent.

\todo{Implementation for this should probably go in a section that deals with transformation pipeline.}

\subsubsection{\textbackslash fsp}
\textbf{Usage:}
\begin{verbatim}
\fsp(fontspacing)
\end{verbatim}

\textbf{Description:}
Set additional spacing between characters in pixels. When the spacing is non-zero, an additional
number of script pixels equal to the parameter given to \textbackslash fsp is skipped after rendering
each glyph in the text. When the spacing is non-zero, any ligatures defined by the font face
\mustnot\ be used.
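
As an illustration of the spacing rule above, the horizontal pen position can be written as a simple
recurrence. This is only a sketch: the symbols $x_i$ (pen position before glyph $i$), $a_i$ (the
nominal advance width of glyph $i$) and $s$ (the \textbackslash fsp parameter) are notation assumed
here, and kerning and font scaling are ignored.
\[x_{i+1} = x_i + a_i + s\]
For example, three glyphs with an advance of 10 script pixels each, rendered with \textbackslash fsp(2)
from $x_0 = 0$, would start at 0, 12 and 24, and the pen would end at 36.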

\todo{Does non-zero spacing have further implications? How about complex scripts?}
\todo{What about negative spacing?}

\subsubsection{\textbackslash fsvp}
\textbf{Usage:}
\begin{verbatim}
\fsvp(verticalspacing)
\end{verbatim}

\textbf{Description:}
Set font spacing between vertical baselines in pixels. This is an additional number of script pixels
to skip after each rendered line of text.

\todo{Any further implications on text rendering? What about negative values?}

\subsection{Colouring Tags}

\subsubsection{\textbackslash \$c}
\textbf{Usage:}
\begin{verbatim}
\$c(colour)
\end{verbatim}

\textbf{Description:}
Set font colouring in hexadecimal RGB.

\subsubsection{\textbackslash \$a}
\textbf{Usage:}
\begin{verbatim}
\$a(alpha)
\end{verbatim}

\textbf{Description:}
Set font alpha channel (transparency) in hexadecimal.

\subsection{Positioning and Rotation Tags}

\subsubsection{\textbackslash left, \textbackslash right, \textbackslash top, \textbackslash bottom}
\textbf{Usage:}
\begin{verbatim}
\left(distance)
\right(distance)
\top(distance)
\bottom(distance)
\end{verbatim}

\textbf{Description:}
Margins are the distance between the subtitle text and the edge of the frame. They are used for
improved aesthetics, readability, and to avoid issues with overscan. Unless manually overridden
by another tag (such as \textbackslash pos), the text should always be contained inside the box
defined by the script area minus the four margins, as long as automatic line breaking mode is
set (see the section on [AS5]).

All distance values are specified in script coordinates. The default value for all margins is 12.
Margin tags can only be present once per line, and will affect all of it, not just the following
block. Margin tags cannot be animated.
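
As a sketch of the containment rule above, assume a script area of width $w$ and height $h$ with the
origin at the top-left corner, and let $l$, $r$, $t$ and $b$ be the left, right, top and bottom margins.
These symbols and the orientation of the axes are assumptions made for illustration. The box that
should contain the text is then
\[l \le x \le w - r, \qquad t \le y \le h - b.\]
With a hypothetical $640 \times 480$ script and the default margins of 12, this gives
$12 \le x \le 628$ and $12 \le y \le 468$.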

\textbf{Implementation:}
The default positioning of the pivot point of the subtitles box is also determined by the margins.
On left-align, the \emph{x} of the pivot is set to the left margin; on right-align, to $w - r$;
and on middle-align, to $\frac{w + l - r}{2}$, where \emph{w} is the script width, \emph{r} is
the value of the right margin and \emph{l} is the value of the left margin. That is, it is put
halfway between the edges defined by the margins. The rules for the \emph{y} coordinate are analogous.

See the alignment tags for more information regarding screen alignment.

\subsubsection{\textbackslash an, \textbackslash ax, \textbackslash ay, \textbackslash nx, \textbackslash ny}
\textbf{Usage:}
\begin{verbatim}
\an(numpadalignment)
\ax(xalignment)
\ay(yalignment)
\nx(xinneralignment)
\ny(yinneralignment)
\end{verbatim}

\textbf{Description:}
Set alignment in various ways.

\todo{How about an alignment mode where the position set controls the text baseline position instead
of an edge of the text bounding box?}

\subsubsection{\textbackslash rel}
\textbf{Usage:}
\begin{verbatim}
\rel(0)
\rel(1)
\end{verbatim}

\textbf{Description:}
Script resolution relative to video area (0) or not (1).

\todo{Is this really a good tag name?}

\subsubsection{\textbackslash vertical}
\textbf{Usage:}
\begin{verbatim}
\vertical(0)
\vertical(1)
\end{verbatim}

\textbf{Description:}
Makes text vertical. This in particular affects the use of some glyph variations in CJK scripts.

\todo{Does vertical imply that the baseline is vertical, i.e.
\verb/{\vertical1\fscx0}this is vertical text/ is indeed shown top-down?}

\subsubsection{\textbackslash q}
\textbf{Usage:}
\begin{verbatim}
\q(0)
\q(1)
\end{verbatim}

\textbf{Description:}
Set wrap style to manual (0) or automatic (1).

\subsubsection{\textbackslash pos}
\textbf{Usage:}
\begin{verbatim}
\pos(x,y)
\end{verbatim}

\textbf{Description:}
Set line position to x,y in script coordinates.

Can be animated with \textbackslash t.

\subsubsection{\textbackslash org}
\textbf{Usage:}
\begin{verbatim}
\org(x,y)
\end{verbatim}

\textbf{Description:}
Set origin to x,y in script coordinates.

Can be animated with \textbackslash t.

\subsubsection{\textbackslash bls}
\textbf{Usage:}
\begin{verbatim}
\bls[#]
\end{verbatim}

\textbf{Description:}
This sets the baseline shift, that is, the vertical spacing between each character and the baseline
on which it is supposed to sit. The default value is 0, and the parameter is given in
script coordinates.

This tag can be animated with \textbackslash t, and can be reverted to the style default by omitting
its parameter.

\subsubsection{\textbackslash frx, \textbackslash fry, \textbackslash frz}
\textbf{Usage:}
\begin{verbatim}
\frx(xrotation)
\fry(yrotation)
\frz(zrotation)
\end{verbatim}

\textbf{Description:}
Set font rotation around x/y/z axis in degrees.

\todo{Define the axes}

\subsubsection{\textbackslash fax, \textbackslash fay}
\textbf{Usage:}
\begin{verbatim}
\fax(xshearing)
\fay(yshearing)
\end{verbatim}

\textbf{Description:}
Set shearing along the x and y axes. 0 means no shearing takes place. Negative values are allowed.
The parameters are multipliers in a shearing matrix.
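
One conventional reading of this, offered as a sketch rather than a normative definition, is the
two-dimensional shearing matrix below, where $a$ and $b$ stand for the \textbackslash fax and
\textbackslash fay parameters. The symbols, and the assumption that both shears are applied in a
single matrix, are not fixed by this draft.
\[\begin{bmatrix} x' \\ y' \end{bmatrix} =
\begin{bmatrix} 1 & a \\ b & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix},
\qquad x' = x + a y, \quad y' = y + b x\]
With $a = b = 0$ the matrix is the identity, which matches the rule that 0 means no shearing. Under
this reading, \textbackslash fax(0.5) alone would move the point $(0, 10)$ to $(5, 10)$.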

\subsection{Animation Tags}

\subsubsection{\textbackslash fad}
\textbf{Usage:}
\begin{verbatim}
\fad(t1,t2)
\end{verbatim}

\textbf{Description:}
Fading text.

\subsubsection{\textbackslash t}
\textbf{Usage:}
\begin{verbatim}
\t([t1,t2,]tags)
\end{verbatim}

\textbf{Description:}
Animate tags between t1 and t2.

\subsection{Shape Transformation Tags}
These are tags characterized by the fact that they distort the shape of the text itself. They
were designed to enhance the flexibility of the format while dealing with unusually-shaped
imagery.

\subsubsection{\textbackslash distort}

\textbf{Usage:}
\begin{verbatim}
\distort(x1,y1,x2,y2,x3,y3)
\end{verbatim}

\textbf{Description:}
The distort tag allows you to apply an arbitrary distortion to the block that follows it.
It takes three coordinate pairs that, along with the origin (at the current baseline position),
specify a quadrilateral.

$P_0$ is the origin, $P_1 = (x1,y1)$ is the corner at the end of the baseline for the affected text,
$P_2 = (x2,y2)$ is the point above that, and $P_3 = (x3,y3)$ is the point above $P_0$. That is, they
are listed clockwise from origin ($P_0$).

The following picture illustrates how this tag works:\\
\begin{center}
\includegraphics[width=0.7\textwidth]{./distort}
\end{center}

If the parameter list is omitted, the distort reverts to the style's default (none by default).
This tag can be animated with \textbackslash t.

\textbf{Implementation:}
This tag cannot be reduced to an affine transformation, so it cannot be expressed in matrix form.
To transform a given (x,y) coordinate pair:

\begin{enumerate}
\item Normalize the (x,y) coordinates to a (u,v) system, so that $P_0$ = (0,0) and $P_2$ = (1,1).
This can be done by subtracting $P_0$ and then dividing x by the block's baseline length (bl) and y by the block height (h).
The affine 3D transformation matrix for this operation is:\\
\begin{center}
$\displaystyle \begin{bmatrix}
\frac{1}{bl} & 0 & 0 & -\frac{P_{0x}}{bl} \\
0 & \frac{1}{h} & 0 & -\frac{P_{0y}}{h} \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}$
\end{center}
%\vspace{10pt}
That is, $\displaystyle u = \frac{P_x - P_{0x}}{bl}; v = \frac{P_y - P_{0y}}{h}$.
\item Apply the following formula: $P = P_0 + (P_1-P_0) u + (P_3-P_0) v + (P_0+P_2-P_1-P_3) u v$.\\
This can be interpreted as simple vector operations, that is, apply it once using the x coordinates
and once using the y coordinates. Since the four points are constant, the coefficients can be
precalculated, resulting in a very fast transformation.
\end{enumerate}

\subsubsection{\textbackslash baseline}

\textbf{Usage:}
\begin{verbatim}
\baseline(path1[,path2])
\end{verbatim}

\textbf{Description:}
Like \textbackslash distort, this tag distorts the text; however, it does so by curving the
baseline along a vector path, so you can write curved text. Optionally, you can specify a second
path to work as the ``ceiling'' of the text. The format of both path parameters is the standard
vector path format (see above).

\textbf{Implementation:}
Implementation of this tag can be summarized by the conversion of a generic $P_n = (x,y)$ point into
$P'_n = (x',y')$. Let $c1(t)$ and $c2(t)$ be the parametric equations of the two paths specified.
The conversion can then be done in the following manner:

\begin{enumerate}
\item Find the parameter \emph{t} along the baseline path that corresponds to the x position of
the point being converted. This can be done with a function that calculates the length from the
beginning of the path until an arbitrary point $P_t = c1(t)$ along it.
\item Calculate the base point along path1: $P_0 = c1(t)$
\item Calculate \emph{u} so that $u = \frac{y-y_0}{h}$, where $y_0$ is the y coordinate of the original
baseline and \emph{h} is the height of the block box.
\end{enumerate}

Now, for the single curve version:

\begin{enumerate}
\item Find the tangent vector of path1 at point $c1(t)$ and find the \emph{V} unit vector that is
perpendicular to the curve at that point, by rotating the tangent vector by $-90$ degrees around the Z axis.
This should give you a vector pointing ``up'', towards where the letters go. This can be summarized as:\\
$\displaystyle V = ( \lim_{h \to 0} (c1_y(t)-c1_y(t+h)) , \lim _{h\to0} (c1_x(t)-c1_x(t+h)))\\
V = \frac{V}{\left \| V \right \|}\\$
\todo{Is that correct?}
\item Multiply \emph{u} by the vector to find the offset from $P_0$, that is, $P'_n = P_0 + u V$.
\end{enumerate}

And for the two-curve version:

\begin{enumerate}
\item Calculate the ceiling point along path2: $P_1 = c2(t)$
\item Get $P'_n$ with the parametric equation of the line defined by $(P_0,P_1)$: $P'_n = (1-u) P_0 + u P_1$.
\end{enumerate}

\subsubsection{\textbackslash blpos}
\textbf{Usage:}
\begin{verbatim}
\blpos#
\end{verbatim}

\textbf{Description:}
This sets the position of the text relative to the baseline start. This tag can be animated.
\todo{Write proper specs for this.}

\subsection{Rastering Tags}
These tags affect how the subtitles are rasterized, that is, they affect things such as
colour, blurring, etc.

\subsubsection{\textbackslash \$vc}
\textbf{Usage:}
\begin{verbatim}
\$vc(colour1,colour2,colour3,colour4)
\end{verbatim}

\textbf{Description:}
Sets the primary colour to blend with each of the four vertices of the draw polygon.
The primary use for this is to make smooth gradients easily, which are often required
for proper blending with the background. Note that you can also set alpha using this tag.

\subsubsection{\textbackslash \$blend}
\textbf{Usage:}
\begin{verbatim}
\$blend(mode)
\end{verbatim}

\textbf{Description:}
Sets the blending mode for the colour specified. Acceptable values are ``normal'', ``add'' and ``multiply''.

\subsubsection{\textbackslash clip}
\textbf{Usage:}
\begin{verbatim}
\clip(x1,y1,x2,y2)
\end{verbatim}

\textbf{Description:}
Clips so only text inside the rectangle formed by x1,y1,x2,y2 will be drawn.

\subsubsection{\textbackslash iclip}
\textbf{Usage:}
\begin{verbatim}
\iclip(x1,y1,x2,y2)
\end{verbatim}

\textbf{Description:}
The inverse of \textbackslash clip, i.e. clips so only text outside the rectangle formed
by x1,y1,x2,y2 will be drawn.

\subsubsection{\textbackslash \$blur}
\textbf{Usage:}
\begin{verbatim}
\$blur(???)
\end{verbatim}

\textbf{Description:}
Blurs the rendered subtitle. This tag can be animated.

\todo{Gaussian kernel or a number of applications of box blur?}

\subsection{Advanced Typography Tags}
These are more advanced tags, which might prove to be fairly complex to implement. They include
things such as ruby text support (also known as furigana, when used with Japanese Kanji).

\todo{Write me}


\newpage
\section{Renderer Behaviour Specification}
\todo{Write this section}


\newpage
\section{Container Multiplexing Specification}

\subsection{Matroska}
AS5 files are stored in Matroska files in much the same way as SSA/ASS subtitles.\cite{mkv ssa}
The Codec ID used is \textsc{S\_TEXT/AS5}.

First, the entire file is converted to UTF-8 (if it isn't already UTF-8).
Then, all sections other
than \emph{[Events]} and \emph{[Resources]} are stored in the \emph{CodecPrivate} element. For the
\emph{[Resources]} section, each line is parsed and files are converted to Matroska file attachments.
\todo{Specify this more clearly.}

Finally, each line in the \emph{[Events]} section is read and stored in its own block. The \emph{start}
and \emph{end} fields are parsed (see the specifications on the section describing [Events]) and used
to set the \emph{TimeStamp} and \emph{BlockDuration} elements. The line itself is then stored in the
following format:

\begin{verbatim}
Line: readOrder,style,userData,contents
\end{verbatim}

Where \emph{readOrder} is the number that the line had in the file. This is necessary so the file
can be demultiplexed back in its original order, since lines will be stored in chronological order
while inside the Matroska file. The remaining fields should just be copied from the original line.


\newpage
\addcontentsline{toc}{section}{References}
\begin{thebibliography}{1}

\bibitem{Aegisub} Rodrigo Braz Monteiro, Niels Martin Hansen, David Lamparter et al., Aegisub. Application, 2005-2007.\\
\url{http://www.aegisub.net/}

\bibitem{asa} David Lamparter, asa. Application, 2004-2007.\\
\url{http://asa.diac24.net/}

\bibitem{SSA} Kotus, Sub Station Alpha. Website, 1997-2003.\\
\url{http://web.archive.org/web/*/http://www.eswat.demon.co.uk/substation.html}

\bibitem{ASS} \#Anime-Fansubs, Advanced Sub Station Alpha.\\
\url{http://www.anime-fansubs.org}\\
\url{http://moodub.free.fr/video/ass-specs.doc}

\bibitem{VSFilter} Gabest, VSFilter. Application, 2003-2007.\\
\url{http://sourceforge.net/projects/guliverkli/}

\bibitem{ASS3} David Lamparter, Advanced Sub Station Alpha 3.
Website, 2007.\\
\url{http://asa.diac24.net/ass3.pdf}

\bibitem{mkv} The Matroska project. Website.\\
\url{http://www.matroska.org/}

\bibitem{UTF-8} The Internet Society, RFC 3629, ``UTF-8, a transformation format of ISO 10646''. Website, 2003.\\
\url{http://tools.ietf.org/html/rfc3629}

\bibitem{UTF-16} The Internet Society, RFC 2781, ``UTF-16, an encoding of ISO 10646''. Website, 2000.\\
\url{http://tools.ietf.org/html/rfc2781}

\bibitem{Unicode BOM} Unicode, Inc, The Unicode Standard, Chapter 13. PDF, 1991-2000.\\
\url{http://www.unicode.org/unicode/uni2book/ch13.pdf}

\bibitem{mkv ssa} The Matroska project, specification for SSA/ASS subtitle formats. Website.\\
\url{http://www.matroska.org/technical/specs/subtitles/ssa.html}

\end{thebibliography}

\end{document}