1<!--
2doc/src/sgml/ref/create_statistics.sgml
3PostgreSQL documentation
4-->
5
6<refentry id="sql-createstatistics">
7 <indexterm zone="sql-createstatistics">
8  <primary>CREATE STATISTICS</primary>
9 </indexterm>
10
11 <refmeta>
12  <refentrytitle>CREATE STATISTICS</refentrytitle>
13  <manvolnum>7</manvolnum>
14  <refmiscinfo>SQL - Language Statements</refmiscinfo>
15 </refmeta>
16
17 <refnamediv>
18  <refname>CREATE STATISTICS</refname>
19  <refpurpose>define extended statistics</refpurpose>
20 </refnamediv>
21
22 <refsynopsisdiv>
23<synopsis>
24CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="parameter">statistics_name</replaceable>
25    [ ( <replaceable class="parameter">statistics_kind</replaceable> [, ... ] ) ]
26    ON <replaceable class="parameter">column_name</replaceable>, <replaceable class="parameter">column_name</replaceable> [, ...]
27    FROM <replaceable class="parameter">table_name</replaceable>
28</synopsis>
29
30 </refsynopsisdiv>
31
32 <refsect1 id="sql-createstatistics-description">
33  <title>Description</title>
34
35  <para>
36   <command>CREATE STATISTICS</command> will create a new extended statistics
37   object tracking data about the specified table, foreign table or
38   materialized view.  The statistics object will be created in the current
39   database and will be owned by the user issuing the command.
40  </para>
41
42  <para>
43   If a schema name is given (for example, <literal>CREATE STATISTICS
44   myschema.mystat ...</literal>) then the statistics object is created in the
45   specified schema.  Otherwise it is created in the current schema.
46   The name of the statistics object must be distinct from the name of any
47   other statistics object in the same schema.
48  </para>
49 </refsect1>
50
51 <refsect1>
52  <title>Parameters</title>
53
54  <variablelist>
55
56   <varlistentry>
57    <term><literal>IF NOT EXISTS</literal></term>
58    <listitem>
59     <para>
60      Do not throw an error if a statistics object with the same name already
61      exists.  A notice is issued in this case.  Note that only the name of
62      the statistics object is considered here, not the details of its
63      definition.
64     </para>
65    </listitem>
66   </varlistentry>
67
68   <varlistentry>
69    <term><replaceable class="parameter">statistics_name</replaceable></term>
70    <listitem>
71     <para>
72      The name (optionally schema-qualified) of the statistics object to be
73      created.
74     </para>
75    </listitem>
76   </varlistentry>
77
78   <varlistentry>
79    <term><replaceable class="parameter">statistics_kind</replaceable></term>
80    <listitem>
81     <para>
82      A statistics kind to be computed in this statistics object.
83      Currently supported kinds are
84      <literal>ndistinct</literal>, which enables n-distinct statistics,
85      <literal>dependencies</literal>, which enables functional
86      dependency statistics, and <literal>mcv</literal> which enables
87      most-common values lists.
88      If this clause is omitted, all supported statistics kinds are
89      included in the statistics object.
90      For more information, see <xref linkend="planner-stats-extended"/>
91      and <xref linkend="multivariate-statistics-examples"/>.
92     </para>
93    </listitem>
94   </varlistentry>
95
96   <varlistentry>
97    <term><replaceable class="parameter">column_name</replaceable></term>
98    <listitem>
99     <para>
100      The name of a table column to be covered by the computed statistics.
101      At least two column names must be given;  the order of the column names
102      is insignificant.
103     </para>
104    </listitem>
105   </varlistentry>
106
107   <varlistentry>
108    <term><replaceable class="parameter">table_name</replaceable></term>
109    <listitem>
110     <para>
111      The name (optionally schema-qualified) of the table containing the
112      column(s) the statistics are computed on.
113     </para>
114    </listitem>
115   </varlistentry>
116
117  </variablelist>
118 </refsect1>
119
120 <refsect1>
121  <title>Notes</title>
122
123  <para>
124   You must be the owner of a table to create a statistics object
125   reading it.  Once created, however, the ownership of the statistics
126   object is independent of the underlying table(s).
127  </para>
128 </refsect1>
129
130 <refsect1 id="sql-createstatistics-examples">
131  <title>Examples</title>
132
133  <para>
134   Create table <structname>t1</structname> with two functionally dependent columns, i.e.,
135   knowledge of a value in the first column is sufficient for determining the
136   value in the other column. Then functional dependency statistics are built
137   on those columns:
138
139<programlisting>
140CREATE TABLE t1 (
141    a   int,
142    b   int
143);
144
145INSERT INTO t1 SELECT i/100, i/500
146                 FROM generate_series(1,1000000) s(i);
147
148ANALYZE t1;
149
150-- the number of matching rows will be drastically underestimated:
151EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 0);
152
153CREATE STATISTICS s1 (dependencies) ON a, b FROM t1;
154
155ANALYZE t1;
156
157-- now the row count estimate is more accurate:
158EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 0);
159</programlisting>
160
161   Without functional-dependency statistics, the planner would assume
162   that the two <literal>WHERE</literal> conditions are independent, and would
163   multiply their selectivities together to arrive at a much-too-small
164   row count estimate.
165   With such statistics, the planner recognizes that the <literal>WHERE</literal>
166   conditions are redundant and does not underestimate the row count.
167  </para>
168
169  <para>
170   Create table <structname>t2</structname> with two perfectly correlated columns
171   (containing identical data), and a MCV list on those columns:
172
173<programlisting>
174CREATE TABLE t2 (
175    a   int,
176    b   int
177);
178
179INSERT INTO t2 SELECT mod(i,100), mod(i,100)
180                 FROM generate_series(1,1000000) s(i);
181
182CREATE STATISTICS s2 (mcv) ON a, b FROM t2;
183
184ANALYZE t2;
185
186-- valid combination (found in MCV)
187EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 1);
188
189-- invalid combination (not found in MCV)
190EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 2);
191</programlisting>
192
193   The MCV list gives the planner more detailed information about the
194   specific values that commonly appear in the table, as well as an upper
195   bound on the selectivities of combinations of values that do not appear
196   in the table, allowing it to generate better estimates in both cases.
197  </para>
198
199 </refsect1>
200
201 <refsect1>
202  <title>Compatibility</title>
203
204  <para>
205   There is no <command>CREATE STATISTICS</command> command in the SQL standard.
206  </para>
207 </refsect1>
208
209 <refsect1>
210  <title>See Also</title>
211
212  <simplelist type="inline">
213   <member><xref linkend="sql-alterstatistics"/></member>
214   <member><xref linkend="sql-dropstatistics"/></member>
215  </simplelist>
216 </refsect1>
217</refentry>
218