1.. _pxb.xtrabackup.table-statistics.analyzing: 2 3================================================================================ 4 Analyzing Table Statistics 5================================================================================ 6 7The |xtrabackup| binary can analyze InnoDB data files in read-only mode to give 8statistics about them. To do this, you should use the :option:`--stats` 9option. You can combine this with the :option:`--tables` option to limit the 10files to examine. It also uses the :option:`--use-memory` option. 11 12You can perform the analysis on a running server, with some chance of errors due 13to the data being changed during analysis. Or, you can analyze a backup copy of 14the database. Either way, to use the statistics feature, you need a clean copy 15of the database including correctly sized log files, so you need to execute with 16:option:`--prepare` twice to use this functionality on a backup. 17 18The result of running on a backup might look like the following: :: 19 20 <INDEX STATISTICS> 21 table: test/table1, index: PRIMARY, space id: 12, root page 3 22 estimated statistics in dictionary: 23 key vals: 25265338, leaf pages 497839, size pages 498304 24 real statistics: 25 level 2 pages: pages=1, data=5395 bytes, data/pages=32% 26 level 1 pages: pages=415, data=6471907 bytes, data/pages=95% 27 leaf pages: recs=25958413, pages=497839, data=7492026403 bytes, data/pages=91% 28 29This can be interpreted as follows: 30 31* The first line simply shows the table and index name and its internal 32 identifiers. If you see an index named ``GEN_CLUST_INDEX``, that is the 33 table's clustered index, automatically created because you did not explicitly 34 create a ``PRIMARY KEY``. 35* The estimated statistics in dictionary information is similar to the data 36 that's gathered through ``ANALYZE TABLE`` inside of |InnoDB| to be stored as 37 estimated cardinality statistics and passed to the query optimizer. 38* The real statistics information is the result of scanning the data pages and 39 computing exact information about the index. 40* ``The level <X> pages``: output means that the line shows information about 41 pages at that level in the index tree. The larger ``<X>`` is, the farther it 42 is from the leaf pages, which are level 0. The first line is the root page. 43* The ``leaf pages`` output shows the leaf pages, of course. This is where the 44 table's data is stored. 45* The ``external pages``: output (not shown) shows large external pages that 46 hold values too long to fit in the row itself, such as long ``BLOB`` and 47 ``TEXT`` values. 48* The ``recs`` is the real number of records (rows) in leaf pages. 49* The ``pages`` is the page count. 50* The ``data`` is the total size of the data in the pages, in bytes. 51* The ``data/pages`` is calculated as (``data`` / (``pages`` * ``PAGE_SIZE``)) * 52 100%. It will never reach 100% because of space reserved for page headers and 53 footers. 54 55A more detailed example is posted as a MySQL Performance Blog `post 56<http://www.mysqlperformanceblog.com/2009/09/14/statistics-of-innodb-tables-and-indexes-available-in-xtrabackup/>`_. 57 58Script to Format Output 59================================================================================ 60 61The following script can be used to summarize and tabulate the output of the 62statistics information: :: 63 64 tabulate-xtrabackup-stats.pl 65 66 #!/usr/bin/env perl 67 use strict; 68 use warnings FATAL => 'all'; 69 my $script_version = "0.1"; 70 71 my $PG_SIZE = 16_384; # InnoDB defaults to 16k pages, change if needed. 72 my ($cur_idx, $cur_tbl); 73 my (%idx_stats, %tbl_stats); 74 my ($max_tbl_len, $max_idx_len) = (0, 0); 75 while ( my $line = <> ) { 76 if ( my ($t, $i) = $line =~ m/table: (.*), index: (.*), space id:/ ) { 77 $t =~ s!/!.!; 78 $cur_tbl = $t; 79 $cur_idx = $i; 80 if ( length($i) > $max_idx_len ) { 81 $max_idx_len = length($i); 82 } 83 if ( length($t) > $max_tbl_len ) { 84 $max_tbl_len = length($t); 85 } 86 } 87 elsif ( my ($kv, $lp, $sp) = $line =~ m/key vals: (\d+), \D*(\d+), \D*(\d+)/ ) { 88 @{$idx_stats{$cur_tbl}->{$cur_idx}}{qw(est_kv est_lp est_sp)} = ($kv, $lp, $sp); 89 $tbl_stats{$cur_tbl}->{est_kv} += $kv; 90 $tbl_stats{$cur_tbl}->{est_lp} += $lp; 91 $tbl_stats{$cur_tbl}->{est_sp} += $sp; 92 } 93 elsif ( my ($l, $pages, $bytes) = $line =~ m/(?:level (\d+)|leaf) pages:.*pages=(\d+), data=(\d+) bytes/ ) { 94 $l ||= 0; 95 $idx_stats{$cur_tbl}->{$cur_idx}->{real_pages} += $pages; 96 $idx_stats{$cur_tbl}->{$cur_idx}->{real_bytes} += $bytes; 97 $tbl_stats{$cur_tbl}->{real_pages} += $pages; 98 $tbl_stats{$cur_tbl}->{real_bytes} += $bytes; 99 } 100 } 101 102 my $hdr_fmt = "%${max_tbl_len}s %${max_idx_len}s %9s %10s %10s\n"; 103 my @headers = qw(TABLE INDEX TOT_PAGES FREE_PAGES PCT_FULL); 104 printf $hdr_fmt, @headers; 105 106 my $row_fmt = "%${max_tbl_len}s %${max_idx_len}s %9d %10d %9.1f%%\n"; 107 foreach my $t ( sort keys %tbl_stats ) { 108 my $tbl = $tbl_stats{$t}; 109 printf $row_fmt, $t, "", $tbl->{est_sp}, $tbl->{est_sp} - $tbl->{real_pages}, 110 $tbl->{real_bytes} / ($tbl->{real_pages} * $PG_SIZE) * 100; 111 foreach my $i ( sort keys %{$idx_stats{$t}} ) { 112 my $idx = $idx_stats{$t}->{$i}; 113 printf $row_fmt, $t, $i, $idx->{est_sp}, $idx->{est_sp} - $idx->{real_pages}, 114 $idx->{real_bytes} / ($idx->{real_pages} * $PG_SIZE) * 100; 115 } 116 } 117 118.. rubric:: Sample Script Output 119 120The output of the above Perl script, when run against the sample shown in the 121previously mentioned blog post, will appear as follows: :: 122 123 TABLE INDEX TOT_PAGES FREE_PAGES PCT_FULL 124 art.link_out104 832383 38561 86.8% 125 art.link_out104 PRIMARY 498304 49 91.9% 126 art.link_out104 domain_id 49600 6230 76.9% 127 art.link_out104 domain_id_2 26495 3339 89.1% 128 art.link_out104 from_message_id 28160 142 96.3% 129 art.link_out104 from_site_id 38848 4874 79.4% 130 art.link_out104 revert_domain 153984 19276 71.4% 131 art.link_out104 site_message 36992 4651 83.4% 132 133The columns are the table and index, followed by the total number of pages in 134that index, the number of pages not actually occupied by data, and the number of 135bytes of real data as a percentage of the total size of the pages of real 136data. The first line in the above output, in which the ``INDEX`` column is 137empty, is a summary of the entire table. 138