NAME
    Audio::Scan - Fast C metadata and tag reader for all common audio file
    formats

SYNOPSIS
        use Audio::Scan;

        my $data = Audio::Scan->scan('/path/to/file.mp3');

        # Just file info
        my $info = Audio::Scan->scan_info('/path/to/file.mp3');

        # Just tags
        my $tags = Audio::Scan->scan_tags('/path/to/file.mp3');

        # Scan without reading (possibly large) artwork into memory.
        # The size of the artwork is returned in place of the binary data.
        {
            local $ENV{AUDIO_SCAN_NO_ARTWORK} = 1;
            my $data = Audio::Scan->scan('/path/to/file.mp3');
        }

        # Scan a filehandle
        open my $fh, '<', 'my.mp3' or die "Cannot open my.mp3: $!";
        $data = Audio::Scan->scan_fh( mp3 => $fh );
        close $fh;

        # Scan and compute an audio MD5 checksum
        $data = Audio::Scan->scan( '/path/to/file.mp3', { md5_size => 100 * 1024 } );
        my $md5 = $data->{info}->{audio_md5};

DESCRIPTION
    Audio::Scan is a C-based scanner for audio file metadata and tag
    information. It currently supports MP3, MP4, AAC (ADTS), Ogg Vorbis,
    FLAC, ASF, WAV, AIFF, Musepack, Monkey's Audio, and WavPack.

    See below for specific details about each file format.

METHODS
  scan( $path, [ \%OPTIONS ] )
    Scans $path for both metadata and tag information. The type of scan
    performed is determined by the file's extension. Supported extensions
    are:

        MP3:            mp3, mp2
        MP4:            mp4, m4a, m4b, m4p, m4v, m4r, k3g, skm, 3gp, 3g2, mov
        AAC (ADTS):     aac
        Ogg:            ogg, oga
        FLAC:           flc, flac, fla
        ASF:            wma, wmv, asf
        Musepack:       mpc, mpp, mp+
        Monkey's Audio: ape, apl
        WAV:            wav
        AIFF:           aiff, aif
        WavPack:        wv

    This method returns a hashref containing two other hashrefs: info and
    tags. The contents of the info and tags hashes vary by file format;
    see the format sections below for details.

    An optional hashref may be provided. Currently this supports one item:

        md5_size => $audio_bytes_to_checksum

    An MD5 will be computed of the first N audio bytes. Any tags in the file
    are automatically skipped, so this is a useful way of determining
    whether a file's audio content is the same even if its tags have
    changed. The hex MD5 value is returned in the $info->{audio_md5} key.
    This option reduces performance, so choose the smallest size that works
    for you; more than 64K, for example, is probably best avoided.
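
    As a minimal sketch of the comparison described above, the following
    checks whether two scan results carry the same audio_md5. The helper
    name same_audio is hypothetical and not part of Audio::Scan; the files
    are only scanned when two paths are given on the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: compare the audio_md5 values of two scan() results.
# Returns 1 if both are present and equal, 0 if they differ, and undef if
# either result lacks the key (for example, md5_size was not passed).
sub same_audio {
    my ( $first, $second ) = @_;
    my $md5_a = $first->{info}{audio_md5};
    my $md5_b = $second->{info}{audio_md5};
    return unless defined $md5_a && defined $md5_b;
    return lc($md5_a) eq lc($md5_b) ? 1 : 0;
}

# Scan real files only when two paths are given on the command line.
if ( @ARGV == 2 ) {
    require Audio::Scan;
    my %opts = ( md5_size => 64 * 1024 );
    my ( $first, $second ) = map { Audio::Scan->scan( $_, \%opts ) } @ARGV;
    my $same = same_audio( $first, $second );
    print defined $same
        ? ( $same ? "same audio\n" : "different audio\n" )
        : "no audio_md5 available\n";
}
```

    Both files should be scanned with the same md5_size, since a checksum
    over a different number of bytes cannot be expected to match.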

  scan_info( $path, [ \%OPTIONS ] )
    If you only need file metadata and don't care about tags, you can use
    this method.

  scan_tags( $path, [ \%OPTIONS ] )
    If you only need the tags and don't care about the metadata, use this
    method.

  scan_fh( $type => $fh, [ \%OPTIONS ] )
    Scans a filehandle. $type is the type of file to scan as, e.g. "mp3" or
    "ogg". Note that FLAC does not support reading from a filehandle.
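
    The following sketch shows one way to scan a filehandle when the type
    has to be derived from a file name. The helper type_from_path is
    hypothetical, and passing the lower-cased extension as $type is an
    assumption based on the "mp3" and "ogg" examples above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Audio::Scan): derive the $type argument
# for scan_fh() from a file name by taking its lower-cased extension.
sub type_from_path {
    my ($path) = @_;
    return $path =~ /\.([^.\/\\]+)\z/ ? lc $1 : undef;
}

# Scan a real file only when a path is given on the command line.
if (@ARGV) {
    require Audio::Scan;
    my $path = $ARGV[0];
    my $type = type_from_path($path);
    die "Cannot determine a type for $path\n" unless defined $type;
    die "FLAC cannot be scanned from a filehandle\n"
        if $type =~ /\A(?:flc|flac|fla)\z/;

    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;    # audio files are binary data
    my $data = Audio::Scan->scan_fh( $type => $fh );
    close $fh;

    printf "%d info keys, %d tags\n",
        scalar keys %{ $data->{info} || {} },
        scalar keys %{ $data->{tags} || {} };
}
```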

  find_frame( $path, $timestamp_in_ms )
    Returns the byte offset to the first audio frame starting from the given
    timestamp (in milliseconds).

    MP3, Ogg, FLAC, ASF, MP4
        The byte offset to the data packet containing this timestamp will be
        returned. For file formats that don't provide timestamp information,
        such as MP3, the best estimate for the location of the timestamp
        will be returned. This will be more accurate if the file has a Xing
        header or is CBR, for example.

    WAV, AIFF, Musepack, Monkey's Audio, WavPack
        Not yet supported by find_frame.
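
    A minimal sketch of using find_frame to position a filehandle follows.
    The helper timestamp_ms is hypothetical, and treating an undefined or
    negative return value as "not found" is an assumption, since the text
    above does not state what is returned on failure.

```perl
use strict;
use warnings;

# Hypothetical helper: convert "SS", "MM:SS" or "HH:MM:SS" (seconds may be
# fractional) into the millisecond timestamp that find_frame() expects.
sub timestamp_ms {
    my ($text) = @_;
    my $seconds = 0;
    $seconds = $seconds * 60 + $_ for split /:/, $text;
    return int( $seconds * 1000 + 0.5 );
}

# Seek in a real file only when a path and a timestamp are given.
if ( @ARGV == 2 ) {
    require Audio::Scan;
    my ( $path, $when ) = @ARGV;
    my $offset = Audio::Scan->find_frame( $path, timestamp_ms($when) );

    # Assumption: an undefined or negative value means no frame was found.
    die "No frame found at $when in $path\n"
        unless defined $offset && $offset >= 0;

    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    seek( $fh, $offset, 0 ) or die "Cannot seek to $offset: $!";
    print "Positioned at byte $offset of $path\n";
    close $fh;
}
```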

  find_frame_return_info( $mp4_path, $timestamp_in_ms )
    The header of an MP4 file contains various metadata that refers to the
    structure of the audio data, making seeking more difficult to perform.
    This method will return the usual $info hash with 2 additional keys:

        seek_offset - The seek offset in bytes
        seek_header - A rewritten MP4 header that can be prepended to the audio data
                      found at seek_offset to construct a valid bitstream. Specifically,
                      the following boxes are rewritten: stts, stsc, stsz, stco

    For example, to seek 30 seconds into a file and write out a new MP4 file
    that starts at this point:

        use Fcntl qw(SEEK_SET);

        my $info = Audio::Scan->find_frame_return_info( $file, 30000 );

        open my $in, '<', $file or die "Cannot open $file: $!";
        binmode $in;

        # Position the input at seek_offset, measured from the start of the file
        sysseek( $in, $info->{seek_offset}, SEEK_SET ) or die "Cannot seek: $!";

        open my $out, '>', 'seeked.m4a' or die "Cannot create seeked.m4a: $!";
        binmode $out;

        # Write the rewritten header, then copy the remaining audio data
        print {$out} $info->{seek_header};

        while ( sysread( $in, my $buf, 65536 ) ) {
            print {$out} $buf;
        }

        close $in;
        close $out or die "Cannot close seeked.m4a: $!";

  find_frame_fh( $type => $fh, $offset )
    Same as "find_frame", but with a filehandle.

  find_frame_fh_return_info( $type => $fh, $offset )
    Same as "find_frame_return_info", but with a filehandle.

  has_flac()
    Deprecated. Always returns 1 now that FLAC is always enabled.

  is_supported( $path )
    Returns 1 if the given path can be scanned by Audio::Scan, or 0 if not.

  get_types()
    Returns an array of strings of the file types supported by Audio::Scan.

  extensions_for( $type )
    Returns an array of strings of the file extensions that are considered
    to be the file type *$type*.

  type_for( $extension )
    Returns the file type for a given extension. Returns *undef* for
    unsupported extensions.
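
    The introspection methods above can be combined to filter a list of
    candidate paths, as in this sketch. The helpers as_list and
    partition_paths are hypothetical. Because the text says "array" without
    stating whether a list or an array reference is returned, as_list
    accepts either form; that tolerance is an assumption, not documented
    behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: flatten either a plain list or array references
# into a list.
sub as_list { return map { ref($_) eq 'ARRAY' ? @$_ : $_ } @_ }

# Hypothetical helper: split paths into those accepted and those rejected
# by a checker callback. Taking the check as a code reference keeps the
# helper usable without Audio::Scan loaded.
sub partition_paths {
    my ( $is_ok, @paths ) = @_;
    my ( @accepted, @rejected );
    push @{ $is_ok->($_) ? \@accepted : \@rejected }, $_ for @paths;
    return ( \@accepted, \@rejected );
}

# Query the module only when paths are given on the command line.
if (@ARGV) {
    require Audio::Scan;

    # List each supported type with the extensions mapped to it.
    for my $type ( as_list( Audio::Scan->get_types ) ) {
        printf "%-6s %s\n", $type,
            join ', ', as_list( Audio::Scan->extensions_for($type) );
    }

    my ( $ok, $skipped ) =
        partition_paths( sub { Audio::Scan->is_supported( $_[0] ) }, @ARGV );
    print "scannable: $_\n" for @$ok;
    print "skipped:   $_\n" for @$skipped;
}
```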

MP3
  INFO
    The following metadata about a file may be returned:

        id3_version (e.g. "ID3v2.4.0")
        song_length_ms (duration in milliseconds)
        layer (e.g. 3)
        stereo
        samples_per_frame
        padding
        audio_size (size of all audio frames)
        audio_offset (byte offset to first audio frame)
        bitrate (in bps, determined using Xing/LAME/VBRI if possible, or average in the worst case)
        samplerate (in kHz)
        vbr (1 if file is VBR)

        If a Xing header is found:
        xing_frames
        xing_bytes
        xing_quality

        If a VBRI header is found:
        vbri_delay
        vbri_frames
        vbri_bytes
        vbri_quality

        If a LAME header is found:
        lame_encoder_version
        lame_tag_revision
        lame_vbr_method
        lame_lowpass
        lame_replay_gain_radio
        lame_replay_gain_audiophile
        lame_encoder_delay
        lame_encoder_padding
        lame_noise_shaping
        lame_stereo_mode
        lame_unwise_settings
        lame_source_freq
        lame_surround
        lame_preset
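
    As an illustration of consuming the info hash, this sketch formats
    song_length_ms, bitrate and vbr into a one-line summary. The helper
    describe_info is hypothetical and the sample values are invented for
    the example.

```perl
use strict;
use warnings;

# Hypothetical helper: render an info hash as "M:SS @ NNN kbps (VBR|CBR)".
# Only song_length_ms, bitrate (bps) and vbr from the list above are used.
sub describe_info {
    my ($info) = @_;
    my $total_s = int( ( $info->{song_length_ms} || 0 ) / 1000 );
    my $kbps    = int( ( $info->{bitrate} || 0 ) / 1000 + 0.5 );
    return sprintf '%d:%02d @ %d kbps (%s)',
        int( $total_s / 60 ), $total_s % 60, $kbps,
        $info->{vbr} ? 'VBR' : 'CBR';
}

# Illustrative values only, not taken from a real file.
# Prints "3:45 @ 192 kbps (CBR)".
print describe_info(
    { song_length_ms => 225_400, bitrate => 192_000, vbr => 0 } ), "\n";
```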

  TAGS
    Raw tags are returned as found, with the following conversions: older
    tags such as ID3v1 and ID3v2.2/v2.3 are converted to ID3v2.4 tag names,
    all tag keys are converted to upper-case, and all text is converted to
    UTF-8. Multiple instances of a tag in a file are returned as arrays.
    Complex tags such as APIC and COMM are also returned as arrays.

    Sample tag data:

        tags => {
              ALBUMARTISTSORT => "Solar Fields",
              APIC => [ "image/jpeg", 3, "", <binary data snipped> ],
              CATALOGNUMBER => "INRE 017",
              COMM => ["eng", "", "Amazon.com Song ID: 202981429"],
              "MUSICBRAINZ ALBUM ARTIST ID" => "a2af1f31-c9eb-4fff-990c-c4f547a11b75",
              "MUSICBRAINZ ALBUM ID" => "282143c9-6191-474d-a31a-1117b8c88cc0",
              "MUSICBRAINZ ALBUM RELEASE COUNTRY" => "FR",
              "MUSICBRAINZ ALBUM STATUS" => "official",
              "MUSICBRAINZ ALBUM TYPE" => "album",
              "MUSICBRAINZ ARTIST ID" => "a2af1f31-c9eb-4fff-990c-c4f547a11b75",
              "REPLAYGAIN_ALBUM_GAIN" => "-2.96 dB",
              "REPLAYGAIN_ALBUM_PEAK" => "1.045736",
              "REPLAYGAIN_TRACK_GAIN" => "+3.60 dB",
              "REPLAYGAIN_TRACK_PEAK" => "0.892606",
              TALB => "Leaving Home",
              TCOM => "Magnus Birgersson",
              TCON => "Ambient",
              TCOP => "2005 ULTIMAE RECORDS",
              TDRC => "2004-10",
              TIT2 => "Home",
              TPE1 => "Solar Fields",
              TPE2 => "Solar Fields",
              TPOS => "1/1",
              TPUB => "Ultimae Records",
              TRCK => "1/11",
              TSOP => "Solar Fields",
              UFID => [
                "http://musicbrainz.org",
                "1084278a-2254-4613-a03c-9fed7a8937ca",
              ],
        },
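
    Because a tag may be a scalar or an array, calling code usually has to
    normalise values before use. The sketch below shows one way to do that
    with values copied from the sample above. All three helpers are
    hypothetical, and the assumption that several COMM frames arrive as an
    array of arrays is not confirmed by the text.

```perl
use strict;
use warnings;

# Hypothetical helper: return every value stored under a text-frame key as
# a flat list, whether the tag occurred once (scalar) or several times
# (array reference).
sub tag_values {
    my ( $tags, $key ) = @_;
    my $value = $tags->{$key};
    return () unless defined $value;
    return ref($value) eq 'ARRAY' ? @$value : ($value);
}

# Hypothetical helper: split "1/11"-style TRCK or TPOS values into
# (number, total); total is undef when only "1" is stored.
sub number_and_total {
    my ($text) = @_;
    return unless defined $text && $text =~ m{\A\s*(\d+)(?:\s*/\s*(\d+))?};
    return ( $1 + 0, defined $2 ? $2 + 0 : undef );
}

# Hypothetical helper: collect the text of every COMM frame. A single
# frame is [ language, description, text ]; several frames are assumed
# to arrive as an array of such arrays.
sub comment_texts {
    my ($tags) = @_;
    my $comm = $tags->{COMM} or return ();
    my @frames = ref( $comm->[0] ) eq 'ARRAY' ? @$comm : ($comm);
    return map { $_->[2] } @frames;
}

# Values copied from the sample tag data above.
my %sample = (
    TPE1 => 'Solar Fields',
    TRCK => '1/11',
    COMM => [ 'eng', '', 'Amazon.com Song ID: 202981429' ],
);
my ( $track, $total ) = number_and_total( $sample{TRCK} );
print join( ', ', tag_values( \%sample, 'TPE1' ) ),
    " (track $track of $total)\n";
print "$_\n" for comment_texts( \%sample );
```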

MP4
  INFO
    The following metadata about a file may be returned:

        audio_offset (byte offset to start of mdat)
        audio_size
        compatible_brands
        file_size
        leading_mdat (if file has mdat before moov)
        major_brand
        minor_version
        song_length_ms
        timescale
        tracks (array of tracks in the file)
          Each track may contain:

          audio_type
          avg_bitrate
          bits_per_sample
          channels
          duration
          encoding
          handler_name
          handler_type
          id
          max_bitrate
          samplerate
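
    The tracks array can be walked like any other array reference. This
    sketch prints one summary line per track using only keys from the list
    above. The helper track_lines is hypothetical, and the sample values
    (including the encoding names) are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: one summary line per entry in $info->{tracks}.
# Keys that a track does not carry are shown as "?".
sub track_lines {
    my ($info) = @_;
    return map {
        my $track = $_;
        sprintf 'track %s: %s, %s channel(s), avg_bitrate %s',
            map { defined $track->{$_} ? $track->{$_} : '?' }
            qw(id encoding channels avg_bitrate);
    } @{ $info->{tracks} || [] };
}

# Illustrative values only, not taken from a real file.
my %info = (
    tracks => [
        { id => 1, encoding => 'mp4a', channels => 2, avg_bitrate => 128000 },
        { id => 2, encoding => 'alac', channels => 2 },
    ],
);
print "$_\n" for track_lines( \%info );
```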

  TAGS
    Tags are returned in a hash with all keys converted to upper-case. Keys
    starting with 0xA9 (copyright symbol) will have this character stripped
    out. Sample tag data:

        tags => {
              AART => "Album Artist",
              ALB => "Album",
              ART => "Artist",
              CMT => "Comments",
              COVR => <binary data snipped>,
              CPIL => 1,
              DAY => 2009,
              DESC => "Video Description",
              DISK => "1/2",
              "ENCODING PARAMS" => "vers\0\0\0\1acbf\0\0\0\2brat\0\1w\0cdcv\0\1\6\5",
              GNRE => "Jazz",
              GRP => "Grouping",
              ITUNNORM => " 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000",
              ITUNSMPB => " 00000000 00000840 000001E4 00000000000001DC 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000",
              LYR => "Lyrics",
              NAM => "Name",
              PGAP => 1,
              SOAA => "Sort Album Artist",
              SOAL => "Sort Album",
              SOAR => "Sort Artist",
              SOCO => "Sort Composer",
              SONM => "Sort Name",
              SOSN => "Sort Show",
              TMPO => 120,
              TOO => "iTunes 8.1.1, QuickTime 7.6",
              TRKN => "1/10",
              TVEN => "Episode ID",
              TVES => 12,
              TVSH => "Show",
              TVSN => 12,
              WRT => "Composer",
        },
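
    Some of these values need further decoding before they are useful. As
    a sketch, the ITUNSMPB string from the sample above can be split into
    its gapless-playback fields. The helper parse_itunsmpb is hypothetical,
    and the field order (reserved, encoder delay, end padding, original
    sample count) follows the iTunes convention rather than anything stated
    in this document.

```perl
use strict;
use warnings;

# Hypothetical helper: decode the gapless-playback fields of an ITUNSMPB
# value. The space-separated fields are hexadecimal; by iTunes convention
# they are: reserved, encoder delay, end padding, original sample count.
sub parse_itunsmpb {
    my ($value) = @_;
    my @fields = split ' ', $value;
    return unless @fields >= 4;
    return {
        encoder_delay => hex( $fields[1] ),
        end_padding   => hex( $fields[2] ),
        sample_count  => hex( $fields[3] ),
    };
}

# The ITUNSMPB string from the sample tag data above.
my $gapless = parse_itunsmpb(
    ' 00000000 00000840 000001E4 00000000000001DC 00000000 00000000'
  . ' 00000000 00000000 00000000 00000000 00000000 00000000' );

# Prints "delay 2112, padding 484, samples 476".
printf "delay %d, padding %d, samples %d\n",
    @{$gapless}{qw(encoder_delay end_padding sample_count)};
```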

AAC (ADTS)
  INFO
    The following metadata about a file is returned:

        audio_offset
        audio_size
        bitrate (in bps)
        channels
        file_size
        profile (Main, LC, or SSR)
        samplerate (in kHz)
        song_length_ms (duration in milliseconds)

OGG VORBIS
  INFO
    The following metadata about a file is returned:

        version
        channels
        stereo
        samplerate (in kHz)
        bitrate_average (in bps)
        bitrate_upper
        bitrate_nominal
        bitrate_lower
        blocksize_0
        blocksize_1
        audio_offset (byte offset to audio)
        audio_size
        song_length_ms (duration in milliseconds)

  TAGS
    Raw Vorbis comments are returned. All comment keys are converted to
    upper-case.

FLAC
  INFO
    The following metadata about a file is returned:

        channels
        samplerate (in kHz)
        bitrate (in bps)
        file_size
        audio_offset (byte offset to first audio frame)
        audio_size
        song_length_ms (duration in milliseconds)
        bits_per_sample
        frames
        minimum_blocksize
        maximum_blocksize
        minimum_framesize
        maximum_framesize
        md5
        total_samples

  TAGS
    Raw FLAC comments are returned. All comment keys are converted to
    upper-case. Some of the data returned is special:

        APPLICATION

          Each application block is returned in the APPLICATION tag, keyed
          by application ID.

        CUESHEET_BLOCK

          The CUESHEET_BLOCK tag is an array containing each line of the
          cue sheet.

        ALLPICTURES

          Embedded pictures are returned in an ALLPICTURES array. Each
          picture has the following metadata:

            mime_type
            description
            width
            height
            depth
            color_index
            image_data
            picture_type
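
    As a sketch of working with ALLPICTURES, the following writes each
    embedded picture to its own file. The helper picture_filename is
    hypothetical and its extension table is an illustrative subset;
    picture type 3 is "front cover" in the FLAC picture type numbering.
    Files are only written when a path is given on the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: choose an output file name for one ALLPICTURES
# entry from its position, picture_type and mime_type.
sub picture_filename {
    my ( $picture, $index ) = @_;
    my %extension_for = (
        'image/jpeg' => 'jpg',
        'image/png'  => 'png',
        'image/gif'  => 'gif',
    );
    my $ext  = $extension_for{ lc( $picture->{mime_type} || '' ) } || 'bin';
    my $kind = ( $picture->{picture_type} || 0 ) == 3 ? 'front' : 'picture';
    return sprintf '%02d-%s.%s', $index, $kind, $ext;
}

# Extract pictures only when a FLAC path is given on the command line.
if (@ARGV) {
    require Audio::Scan;
    my $tags     = Audio::Scan->scan( $ARGV[0] )->{tags} || {};
    my $pictures = $tags->{ALLPICTURES} || [];
    for my $i ( 0 .. $#$pictures ) {
        my $file = picture_filename( $pictures->[$i], $i );
        open my $out, '>', $file or die "Cannot create $file: $!";
        binmode $out;
        print {$out} $pictures->[$i]{image_data};
        close $out or die "Cannot close $file: $!";
        print "wrote $file\n";
    }
}
```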

ASF (Windows Media Audio/Video)
  INFO
    The following metadata about a file may be returned. Reading the ASF
    spec is encouraged if you want to find out more about any of these
    values.

        audio_offset (byte offset to first data packet)
        audio_size
        broadcast (boolean, whether the file is a live broadcast or not)
        codec_list (array of information about codecs used in the file)
        creation_date (UNIX timestamp when file was created)
        data_packets
        drm_key
        drm_license_url
        drm_protection_type
        drm_data
        file_id (unique file ID)
        file_size
        index_blocks
        index_entry_interval (in milliseconds)
        index_offsets (byte offsets for each second of audio, per stream. Useful for seeking)
        index_specifiers (indicates which stream a given index_offset points to)
        language_list (array of languages referenced by the file's metadata)
        lossless (boolean)
        max_bitrate
        max_packet_size
        min_packet_size
        mutex_list (mutually exclusive stream information)
        play_duration_ms
        preroll
        script_commands
        script_types
        seekable (boolean, whether the file is seekable or not)
        send_duration_ms
        song_length_ms (the actual length of the audio, in milliseconds)

  STREAMS
    The streams array contains metadata for each individual stream within
    the file. The following metadata may be returned:

        DeviceConformanceTemplate
        IsVBR
        alt_bitrate
        alt_buffer_fullness
        alt_buffer_size
        avg_bitrate (most accurate bitrate for this stream)
        avg_bytes_per_sec (audio only)
        bitrate
        bits_per_sample (audio only)
        block_alignment (audio only)
        bpp (video only)
        buffer_fullness
        buffer_size
        channels (audio only)
        codec_id (audio only)
        compression_id (video only)
        encode_options
        encrypted (boolean)
        error_correction_type
        flag_seekable (boolean)
        height (video only)
        index_type
        language_index (offset into language_list array)
        max_object_size
        samplerate (in kHz) (audio only)
        samples_per_block
        stream_number
        stream_type
        super_block_align
        time_offset
        width (video only)

  TAGS
    Raw tags are returned. Tags that occur more than once are returned as
    arrays. In contrast to the other formats, tag keys are NOT converted to
    upper-case. There is one special key:

        WM/Picture

          Pictures are returned as a hash with the following keys:

            image_type (numeric type, same as ID3v2 APIC)
            mime_type
            description
            image
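
    Since ASF keys keep their original case while other formats use
    upper-case keys, code that handles several formats may prefer a
    case-insensitive lookup. The helper tag_ci below is a hypothetical
    sketch of that idea, and the sample keys are illustrative rather than
    taken from a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: look up a tag without regard to key case. An exact
# match wins; otherwise the first case-insensitive match in sorted key
# order is returned, or undef if there is none.
sub tag_ci {
    my ( $tags, $wanted ) = @_;
    return $tags->{$wanted} if exists $tags->{$wanted};
    my ($key) = grep { lc($_) eq lc($wanted) } sort keys %$tags;
    return defined $key ? $tags->{$key} : undef;
}

# Illustrative keys and values only.
my %asf_tags = ( 'WM/AlbumTitle' => 'Leaving Home', Author => 'Solar Fields' );
print tag_ci( \%asf_tags, 'wm/albumtitle' ), "\n";    # prints "Leaving Home"
print tag_ci( \%asf_tags, 'AUTHOR' ),        "\n";    # prints "Solar Fields"
```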

WAV
  INFO
    The following metadata about a file may be returned.

        audio_offset
        audio_size
        bitrate (in bps)
        bits_per_sample
        block_align
        channels
        file_size
        format (WAV format code, 1 == PCM)
        id3_version (if an ID3v2 tag is found)
        samplerate (in kHz)
        song_length_ms

  TAGS
    WAV files can contain several different types of tags. "Native" WAV tags
    found in a LIST block may include these and others:

        IARL - Archival Location
        IART - Artist
        ICMS - Commissioned
        ICMT - Comment
        ICOP - Copyright
        ICRD - Creation Date
        ICRP - Cropped
        IENG - Engineer
        IGNR - Genre
        IKEY - Keywords
        IMED - Medium
        INAM - Name (Title)
        IPRD - Product (Album)
        ISBJ - Subject
        ISFT - Software
        ISRC - Source
        ISRF - Source Form
        TORG - Label
        LOCA - Location
        TVER - Version
        TURL - URL
        TLEN - Length
        ITCH - Technician
        TRCK - Track
        ITRK - Track

    ID3v2 tags can also be embedded within WAV files. These are returned
    exactly as for MP3 files.
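
    For display purposes the four-character LIST codes can be mapped back
    to the descriptions listed above. This is a minimal sketch using a
    subset of that table; the helper labelled_tags is hypothetical and
    assumes plain string values, so it is not suited to ID3v2 tags.

```perl
use strict;
use warnings;

# A subset of the LIST codes above, mapped to their descriptions.
my %WAV_LABEL = (
    IART => 'Artist',
    ICMT => 'Comment',
    ICOP => 'Copyright',
    ICRD => 'Creation Date',
    IGNR => 'Genre',
    INAM => 'Name (Title)',
    IPRD => 'Product (Album)',
    ITRK => 'Track',
);

# Hypothetical helper: return "Label: value" lines sorted by code. Codes
# that are not in the table are used as their own label.
sub labelled_tags {
    my ($tags) = @_;
    return map {
        my $label = exists $WAV_LABEL{$_} ? $WAV_LABEL{$_} : $_;
        "$label: $tags->{$_}";
    } sort keys %$tags;
}

# Illustrative values only.
print "$_\n" for labelled_tags( { INAM => 'Home', IART => 'Solar Fields' } );
```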

AIFF
  INFO
    The following metadata about a file may be returned.

        audio_offset
        audio_size
        bitrate (in bps)
        bits_per_sample
        block_align
        channels
        compression_name (if AIFC)
        compression_type (if AIFC)
        file_size
        id3_version (if an ID3v2 tag is found)
        samplerate (in kHz)
        song_length_ms

  TAGS
    ID3v2 tags can be embedded within AIFF files. These are returned exactly
    as for MP3 files.

MONKEY'S AUDIO (APE)
  INFO
    The following metadata about a file may be returned.

        audio_offset
        audio_size
        bitrate (in bps)
        channels
        compression
        file_size
        samplerate (in kHz)
        song_length_ms
        version

  TAGS
    APEv2 tags are returned as a hash of key/value pairs.

MUSEPACK
  INFO
    The following metadata about a file may be returned.

        audio_offset
        audio_size
        bitrate (in bps)
        channels
        encoder
        file_size
        profile
        samplerate (in kHz)
        song_length_ms

  TAGS
    Musepack uses APEv2 tags. They are returned as a hash of key/value
    pairs.

WAVPACK
  INFO
    The following metadata about a file may be returned.

        audio_offset
        audio_size
        bitrate (in bps)
        bits_per_sample
        channels
        encoder_version
        file_size
        hybrid (1 if file is lossy) (v4 only)
        lossless (1 if file is lossless) (v4 only)
        samplerate
        song_length_ms
        total_samples

  TAGS
    WavPack uses APEv2 tags. They are returned as a hash of key/value pairs.

THANKS
    Some code from the Rockbox project was very helpful in implementing ASF
    and MP4 seeking.

    Some of the file format parsing code was derived from the mt-daapd
    project, and adapted by Netgear. It has been heavily rewritten to fix
    bugs and add more features.

    The source to the original Netgear C scanner for SqueezeCenter is
    located at
    <http://svn.slimdevices.com/repos/slim/7.3/trunk/platforms/readynas/contrib/scanner>

    The audio MD5 feature uses an MD5 implementation by L. Peter Deutsch,
    <ghost@aladdin.com>.

SEE ALSO
    ASF Spec:
    <http://www.microsoft.com/windows/windowsmedia/forpros/format/asfspec.aspx>

    MP4 Info:
    <http://standards.iso.org/ittf/PubliclyAvailableStandards/c051533_ISO_IEC_14496-12_2008.zip>
    <http://www.geocities.com/xhelmboyx/quicktime/formats/mp4-layout.txt>

AUTHORS
    Andy Grundman, <andy@hybridized.org>

    Dan Sully, <daniel@cpan.org>

COPYRIGHT AND LICENSE
    Copyright (C) 2010 Logitech, Inc.

    This program is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by the
    Free Software Foundation; either version 2 of the License, or (at your
    option) any later version.