NAME
    Audio::Scan - Fast C metadata and tag reader for all common audio file
    formats

SYNOPSIS
        use Audio::Scan;

        my $data = Audio::Scan->scan('/path/to/file.mp3');

        # Just file info
        my $info = Audio::Scan->scan_info('/path/to/file.mp3');

        # Just tags
        my $tags = Audio::Scan->scan_tags('/path/to/file.mp3');

        # Scan without reading (possibly large) artwork into memory.
        # Instead of binary artwork data, the size of the artwork will be returned instead.
        {
            local $ENV{AUDIO_SCAN_NO_ARTWORK} = 1;
            my $data = Audio::Scan->scan('/path/to/file.mp3');
        }

        # Scan a filehandle
        open my $fh, '<', 'my.mp3';
        my $data = Audio::Scan->scan_fh( mp3 => $fh );
        close $fh;

        # Scan and compute an audio MD5 checksum
        my $data = Audio::Scan->scan( '/path/to/file.mp3', { md5_size => 100 * 1024 } );
        my $md5 = $data->{info}->{audio_md5};

DESCRIPTION
    Audio::Scan is a C-based scanner for audio file metadata and tag
    information. It currently supports MP3, MP4, Ogg Vorbis, FLAC, ASF, WAV,
    AIFF, Musepack, Monkey's Audio, and WavPack.

    See below for specific details about each file format.

METHODS
  scan( $path, [ \%OPTIONS ] )
    Scans $path for both metadata and tag information. The type of scan
    performed is determined by the file's extension. Supported extensions
    are:

        MP3: mp3, mp2
        MP4: mp4, m4a, m4b, m4p, m4v, m4r, k3g, skm, 3gp, 3g2, mov
        AAC (ADTS): aac
        Ogg: ogg, oga
        FLAC: flc, flac, fla
        ASF: wma, wmv, asf
        Musepack: mpc, mpp, mp+
        Monkey's Audio: ape, apl
        WAV: wav
        AIFF: aiff, aif
        WavPack: wv

    This method returns a hashref containing two other hashrefs: info and
    tags. The contents of the info and tag hashes vary depending on file
    format, see below for details.

    An optional hashref may be provided. Currently this supports one item:

        md5_size => $audio_bytes_to_checksum

    An MD5 will be computed of the first N audio bytes.
    Any tags in the file are automatically skipped, so this is a useful way
    of determining if a file's audio content is the same even if tags may
    have been changed. The hex MD5 value is returned in the
    $info->{audio_md5} key. This option will reduce performance, so choose a
    small enough size that works for you; for example, you should probably
    avoid using more than 64K.

  scan_info( $path, [ \%OPTIONS ] )
    If you only need file metadata and don't care about tags, you can use
    this method.

  scan_tags( $path, [ \%OPTIONS ] )
    If you only need the tags and don't care about the metadata, use this
    method.

  scan_fh( $type => $fh, [ \%OPTIONS ] )
    Scans a filehandle. $type is the type of file to scan as, e.g. "mp3" or
    "ogg". Note that FLAC does not support reading from a filehandle.

  find_frame( $path, $timestamp_in_ms )
    Returns the byte offset to the first audio frame starting from the given
    timestamp (in milliseconds).

    MP3, Ogg, FLAC, ASF, MP4
        The byte offset to the data packet containing this timestamp will be
        returned. For file formats that don't provide timestamp information
        such as MP3, the best estimate for the location of the timestamp
        will be returned. This will be more accurate if the file has a Xing
        header or is CBR for example.

    WAV, AIFF, Musepack, Monkey's Audio, WavPack
        Not yet supported by find_frame.

  find_frame_return_info( $mp4_path, $timestamp_in_ms )
    The header of an MP4 file contains various metadata that refers to the
    structure of the audio data, making seeking more difficult to perform.
    This method will return the usual $info hash with 2 additional keys:

        seek_offset - The seek offset in bytes
        seek_header - A rewritten MP4 header that can be prepended to the audio data
                      found at seek_offset to construct a valid bitstream.
                      Specifically, the following boxes are rewritten: stts, stsc, stsz, stco

    For example, to seek 30 seconds into a file and write out a new MP4 file
    starting at this point:

        my $info = Audio::Scan->find_frame_return_info( $file, 30000 );

        # Open the source file and move to the first audio byte to keep
        open my $f, '<', $file or die "Cannot open $file: $!";
        binmode $f;
        sysseek $f, $info->{seek_offset}, 1;

        # Write the rewritten header, followed by the remaining audio data
        open my $fh, '>', 'seeked.m4a' or die "Cannot create seeked.m4a: $!";
        binmode $fh;
        print $fh $info->{seek_header};

        while ( sysread( $f, my $buf, 65536 ) ) {
            print $fh $buf;
        }

        close $f;
        close $fh;

  find_frame_fh( $type => $fh, $offset )
    Same as "find_frame", but with a filehandle.

  find_frame_fh_return_info( $type => $fh, $offset )
    Same as "find_frame_return_info", but with a filehandle.

  has_flac()
    Deprecated. Always returns 1 now that FLAC is always enabled.

  is_supported( $path )
    Returns 1 if the given path can be scanned by Audio::Scan, or 0 if not.

  get_types()
    Returns an array of strings of the file types supported by Audio::Scan.

  extensions_for( $type )
    Returns an array of strings of the file extensions that are considered
    to be the file type *$type*.

  type_for( $extension )
    Returns file type for a given extension. Returns *undef* for unsupported
    extensions.

MP3
  INFO
    The following metadata about a file may be returned:

        id3_version (e.g. "ID3v2.4.0")
        song_length_ms (duration in milliseconds)
        layer (e.g. 3)
        stereo
        samples_per_frame
        padding
        audio_size (size of all audio frames)
        audio_offset (byte offset to first audio frame)
        bitrate (in bps, determined using Xing/LAME/VBRI if possible, or average in the worst case)
        samplerate (in kHz)
        vbr (1 if file is VBR)

        If a Xing header is found:
          xing_frames
          xing_bytes
          xing_quality

        If a VBRI header is found:
          vbri_delay
          vbri_frames
          vbri_bytes
          vbri_quality

        If a LAME header is found:
          lame_encoder_version
          lame_tag_revision
          lame_vbr_method
          lame_lowpass
          lame_replay_gain_radio
          lame_replay_gain_audiophile
          lame_encoder_delay
          lame_encoder_padding
          lame_noise_shaping
          lame_stereo_mode
          lame_unwise_settings
          lame_source_freq
          lame_surround
          lame_preset

  TAGS
    Raw tags are returned as found. This means older tags such as ID3v1 and
    ID3v2.2/v2.3 are converted to ID3v2.4 tag names. Multiple instances of a
    tag in a file will be returned as arrays. Complex tags such as APIC and
    COMM are returned as arrays. All tag fields are converted to upper-case.
    All text is converted to UTF-8.
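
    Because a tag value may be either a plain scalar or an array reference,
    calling code usually has to normalize values before using them. The
    following is a minimal sketch and not part of Audio::Scan: the helper
    name text_values and the sample values (including the second TPE1
    entry) are hypothetical, and it is meant only for simple text tags, not
    for structured frames such as APIC or COMM.

```perl
use strict;
use warnings;

# Hand-written sample shaped like a returned tags hash (values are illustrative).
my $tags = {
    TIT2 => "Home",
    TPE1 => [ "Solar Fields", "Guest Artist" ],    # a tag that occurs twice
};

# Return the values of a simple text tag as a list, whether the tag was
# stored as a plain scalar or as an array of repeated instances.
sub text_values {
    my ( $tags, $key ) = @_;
    my $value = $tags->{ uc $key };    # tag fields are upper-cased
    return () unless defined $value;
    return ref $value eq 'ARRAY' ? @$value : ($value);
}

printf "%s: %s\n", $_, join( ', ', text_values( $tags, $_ ) ) for qw(TIT2 TPE1);
```

    Returning a list in both cases lets callers loop over values without
    checking ref() at every call site.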

    Sample tag data:

        tags => {
              ALBUMARTISTSORT => "Solar Fields",
              APIC => [ "image/jpeg", 3, "", <binary data snipped> ],
              CATALOGNUMBER => "INRE 017",
              COMM => ["eng", "", "Amazon.com Song ID: 202981429"],
              "MUSICBRAINZ ALBUM ARTIST ID" => "a2af1f31-c9eb-4fff-990c-c4f547a11b75",
              "MUSICBRAINZ ALBUM ID" => "282143c9-6191-474d-a31a-1117b8c88cc0",
              "MUSICBRAINZ ALBUM RELEASE COUNTRY" => "FR",
              "MUSICBRAINZ ALBUM STATUS" => "official",
              "MUSICBRAINZ ALBUM TYPE" => "album",
              "MUSICBRAINZ ARTIST ID" => "a2af1f31-c9eb-4fff-990c-c4f547a11b75",
              "REPLAYGAIN_ALBUM_GAIN" => "-2.96 dB",
              "REPLAYGAIN_ALBUM_PEAK" => "1.045736",
              "REPLAYGAIN_TRACK_GAIN" => "+3.60 dB",
              "REPLAYGAIN_TRACK_PEAK" => "0.892606",
              TALB => "Leaving Home",
              TCOM => "Magnus Birgersson",
              TCON => "Ambient",
              TCOP => "2005 ULTIMAE RECORDS",
              TDRC => "2004-10",
              TIT2 => "Home",
              TPE1 => "Solar Fields",
              TPE2 => "Solar Fields",
              TPOS => "1/1",
              TPUB => "Ultimae Records",
              TRCK => "1/11",
              TSOP => "Solar Fields",
              UFID => [
                    "http://musicbrainz.org",
                    "1084278a-2254-4613-a03c-9fed7a8937ca",
              ],
        },

MP4
  INFO
    The following metadata about a file may be returned:

        audio_offset (byte offset to start of mdat)
        audio_size
        compatible_brands
        file_size
        leading_mdat (if file has mdat before moov)
        major_brand
        minor_version
        song_length_ms
        timescale
        tracks (array of tracks in the file)
          Each track may contain:

            audio_type
            avg_bitrate
            bits_per_sample
            channels
            duration
            encoding
            handler_name
            handler_type
            id
            max_bitrate
            samplerate

  TAGS
    Tags are returned in a hash with all keys converted to upper-case. Keys
    starting with 0xA9 (copyright symbol) will have this character stripped
    out.
    Sample tag data:

        tags => {
              AART => "Album Artist",
              ALB => "Album",
              ART => "Artist",
              CMT => "Comments",
              COVR => <binary data snipped>,
              CPIL => 1,
              DAY => 2009,
              DESC => "Video Description",
              DISK => "1/2",
              "ENCODING PARAMS" => "vers\0\0\0\1acbf\0\0\0\2brat\0\1w\0cdcv\0\1\6\5",
              GNRE => "Jazz",
              GRP => "Grouping",
              ITUNNORM => " 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000",
              ITUNSMPB => " 00000000 00000840 000001E4 00000000000001DC 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000",
              LYR => "Lyrics",
              NAM => "Name",
              PGAP => 1,
              SOAA => "Sort Album Artist",
              SOAL => "Sort Album",
              SOAR => "Sort Artist",
              SOCO => "Sort Composer",
              SONM => "Sort Name",
              SOSN => "Sort Show",
              TMPO => 120,
              TOO => "iTunes 8.1.1, QuickTime 7.6",
              TRKN => "1/10",
              TVEN => "Episode ID",
              TVES => 12,
              TVSH => "Show",
              TVSN => 12,
              WRT => "Composer",
        },

AAC (ADTS)
  INFO
    The following metadata about a file is returned:

        audio_offset
        audio_size
        bitrate (in bps)
        channels
        file_size
        profile (Main, LC, or SSR)
        samplerate (in kHz)
        song_length_ms (duration in milliseconds)

OGG VORBIS
  INFO
    The following metadata about a file is returned:

        version
        channels
        stereo
        samplerate (in kHz)
        bitrate_average (in bps)
        bitrate_upper
        bitrate_nominal
        bitrate_lower
        blocksize_0
        blocksize_1
        audio_offset (byte offset to audio)
        audio_size
        song_length_ms (duration in milliseconds)

  TAGS
    Raw Vorbis comments are returned. All comment keys are capitalized.
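
    As an illustration of using the info and tags hashes together, the
    sketch below formats song_length_ms as minutes and seconds and looks up
    comments by their upper-case keys. It runs on a hand-written hash rather
    than on a real scan result: the sample values and the helper name
    format_duration are hypothetical, ARTIST and TITLE are the conventional
    Vorbis comment field names (a given file may not contain them), and each
    comment is assumed to occur only once.

```perl
use strict;
use warnings;

# Hand-written stand-in for the hashref returned by scan(); values are made up.
my $data = {
    info => { song_length_ms => 215_400, bitrate_average => 160_000 },
    tags => { ARTIST => "Solar Fields", TITLE => "Home" },
};

# Convert a duration in milliseconds to "M:SS", truncating partial seconds.
sub format_duration {
    my ($ms) = @_;
    my $seconds = int( $ms / 1000 );
    return sprintf '%d:%02d', int( $seconds / 60 ), $seconds % 60;
}

my ( $info, $tags ) = @{$data}{qw(info tags)};

# Comment keys are upper-cased, so fixed upper-case names can be used directly.
printf "%s - %s (%s, %d kbps)\n",
    $tags->{ARTIST} // 'Unknown artist',
    $tags->{TITLE}  // 'Unknown title',
    format_duration( $info->{song_length_ms} ),
    $info->{bitrate_average} / 1000;
```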

FLAC
  INFO
    The following metadata about a file is returned:

        channels
        samplerate (in kHz)
        bitrate (in bps)
        file_size
        audio_offset (byte offset to first audio frame)
        audio_size
        song_length_ms (duration in milliseconds)
        bits_per_sample
        frames
        minimum_blocksize
        maximum_blocksize
        minimum_framesize
        maximum_framesize
        md5
        total_samples

  TAGS
    Raw FLAC comments are returned. All comment keys are capitalized. Some
    data returned is special:

    APPLICATION

        Each application block is returned in the APPLICATION tag keyed by application ID.

    CUESHEET_BLOCK

        The CUESHEET_BLOCK tag is an array containing each line of the cue sheet.

    ALLPICTURES

        Embedded pictures are returned in an ALLPICTURES array. Each picture has the following metadata:

            mime_type
            description
            width
            height
            depth
            color_index
            image_data
            picture_type

ASF (Windows Media Audio/Video)
  INFO
    The following metadata about a file may be returned. Reading the ASF
    spec is encouraged if you want to find out more about any of these
    values.

        audio_offset (byte offset to first data packet)
        audio_size
        broadcast (boolean, whether the file is a live broadcast or not)
        codec_list (array of information about codecs used in the file)
        creation_date (UNIX timestamp when file was created)
        data_packets
        drm_key
        drm_license_url
        drm_protection_type
        drm_data
        file_id (unique file ID)
        file_size
        index_blocks
        index_entry_interval (in milliseconds)
        index_offsets (byte offsets for each second of audio, per stream. Useful for seeking)
        index_specifiers (indicates which stream a given index_offset points to)
        language_list (array of languages referenced by the file's metadata)
        lossless (boolean)
        max_bitrate
        max_packet_size
        min_packet_size
        mutex_list (mutually exclusive stream information)
        play_duration_ms
        preroll
        script_commands
        script_types
        seekable (boolean, whether the file is seekable or not)
        send_duration_ms
        song_length_ms (the actual length of the audio, in milliseconds)

    STREAMS

    The streams array contains metadata related to an individual stream
    within the file. The following metadata may be returned:

        DeviceConformanceTemplate
        IsVBR
        alt_bitrate
        alt_buffer_fullness
        alt_buffer_size
        avg_bitrate (most accurate bitrate for this stream)
        avg_bytes_per_sec (audio only)
        bitrate
        bits_per_sample (audio only)
        block_alignment (audio only)
        bpp (video only)
        buffer_fullness
        buffer_size
        channels (audio only)
        codec_id (audio only)
        compression_id (video only)
        encode_options
        encrypted (boolean)
        error_correction_type
        flag_seekable (boolean)
        height (video only)
        index_type
        language_index (offset into language_list array)
        max_object_size
        samplerate (in kHz) (audio only)
        samples_per_block
        stream_number
        stream_type
        super_block_align
        time_offset
        width (video only)

  TAGS
    Raw tags are returned. Tags that occur more than once are returned as
    arrays. In contrast to the other formats, tag keys are NOT capitalized.
    There is one special key:

    WM/Picture

    Pictures are returned as a hash with the following keys:

        image_type (numeric type, same as ID3v2 APIC)
        mime_type
        description
        image

WAV
  INFO
    The following metadata about a file may be returned.

        audio_offset
        audio_size
        bitrate (in bps)
        bits_per_sample
        block_align
        channels
        file_size
        format (WAV format code, 1 == PCM)
        id3_version (if an ID3v2 tag is found)
        samplerate (in kHz)
        song_length_ms

  TAGS
    WAV files can contain several different types of tags. "Native" WAV tags
    found in a LIST block may include these and others:

        IARL - Archival Location
        IART - Artist
        ICMS - Commissioned
        ICMT - Comment
        ICOP - Copyright
        ICRD - Creation Date
        ICRP - Cropped
        IENG - Engineer
        IGNR - Genre
        IKEY - Keywords
        IMED - Medium
        INAM - Name (Title)
        IPRD - Product (Album)
        ISBJ - Subject
        ISFT - Software
        ISRC - Source
        ISRF - Source Form
        TORG - Label
        LOCA - Location
        TVER - Version
        TURL - URL
        TLEN - Length
        ITCH - Technician
        TRCK - Track
        ITRK - Track

    ID3v2 tags can also be embedded within WAV files. These are returned
    exactly as for MP3 files.

AIFF
  INFO
    The following metadata about a file may be returned.

        audio_offset
        audio_size
        bitrate (in bps)
        bits_per_sample
        block_align
        channels
        compression_name (if AIFC)
        compression_type (if AIFC)
        file_size
        id3_version (if an ID3v2 tag is found)
        samplerate (in kHz)
        song_length_ms

  TAGS
    ID3v2 tags can be embedded within AIFF files. These are returned exactly
    as for MP3 files.

MONKEY'S AUDIO (APE)
  INFO
    The following metadata about a file may be returned.

        audio_offset
        audio_size
        bitrate (in bps)
        channels
        compression
        file_size
        samplerate (in kHz)
        song_length_ms
        version

  TAGS
    APEv2 tags are returned as a hash of key/value pairs.

MUSEPACK
  INFO
    The following metadata about a file may be returned.

        audio_offset
        audio_size
        bitrate (in bps)
        channels
        encoder
        file_size
        profile
        samplerate (in kHz)
        song_length_ms

  TAGS
    Musepack uses APEv2 tags. They are returned as a hash of key/value
    pairs.

WAVPACK
  INFO
    The following metadata about a file may be returned.

        audio_offset
        audio_size
        bitrate (in bps)
        bits_per_sample
        channels
        encoder_version
        file_size
        hybrid (1 if file is lossy) (v4 only)
        lossless (1 if file is lossless) (v4 only)
        samplerate
        song_length_ms
        total_samples

  TAGS
    WavPack uses APEv2 tags. They are returned as a hash of key/value pairs.

THANKS
    Some code from the Rockbox project was very helpful in implementing ASF
    and MP4 seeking.

    Some of the file format parsing code was derived from the mt-daapd
    project, and adapted by Netgear. It has been heavily rewritten to fix
    bugs and add more features.

    The source to the original Netgear C scanner for SqueezeCenter is
    located at
    <http://svn.slimdevices.com/repos/slim/7.3/trunk/platforms/readynas/contrib/scanner>

    The audio MD5 feature uses an MD5 implementation by L. Peter Deutsch,
    <ghost@aladdin.com>.

SEE ALSO
    ASF Spec
    <http://www.microsoft.com/windows/windowsmedia/forpros/format/asfspec.aspx>

    MP4 Info:
    <http://standards.iso.org/ittf/PubliclyAvailableStandards/c051533_ISO_IEC_14496-12_2008.zip>
    <http://www.geocities.com/xhelmboyx/quicktime/formats/mp4-layout.txt>

AUTHORS
    Andy Grundman, <andy@hybridized.org>

    Dan Sully, <daniel@cpan.org>

COPYRIGHT AND LICENSE
    Copyright (C) 2010 Logitech, Inc.

    This program is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by the
    Free Software Foundation; either version 2 of the License, or (at your
    option) any later version.