1.. |join| replace:: :func:`~astropy.table.join`
2
3.. _table_operations:
4
5Table Operations
6****************
7
8In this section we describe high-level operations that can be used to generate
9a new table from one or more input tables. This includes:
10
11=======================
12
13.. list-table::
14   :header-rows: 1
15   :widths: 28 52 20
16
17   * - Documentation
18     - Description
19     - Function
20   * - `Grouped operations`_
21     - Group tables and columns by keys
22     - :func:`~astropy.table.Table.group_by`
23   * - `Binning`_
24     - Binning tables
25     - :func:`~astropy.table.Table.group_by`
26   * - `Stack vertically`_
27     - Concatenate input tables along rows
28     - :func:`~astropy.table.vstack`
29   * - `Stack horizontally`_
30     - Concatenate input tables along columns
31     - :func:`~astropy.table.hstack`
32   * - `Join`_
33     - Database-style join of two tables
34     - |join|
35   * - `Unique rows`_
36     - Unique table rows by keys
37     - :func:`~astropy.table.unique`
38   * - `Set difference`_
39     - Set difference of two tables
40     - :func:`~astropy.table.setdiff`
41   * - `Table diff`_
42     - Generic difference of two simple tables
43     - :func:`~astropy.utils.diff.report_diff_values`
44
45
46.. _grouped-operations:
47
48Grouped Operations
49------------------
50
51.. EXAMPLE START: Grouped Operations in Tables
52
53Sometimes in a table or table column there are natural groups within the dataset
54for which it makes sense to compute some derived values. A minimal example is a
55list of objects with photometry from various observing runs::
56
57  >>> from astropy.table import Table
58  >>> obs = Table.read("""name    obs_date    mag_b  mag_v
59  ...                     M31     2012-01-02  17.0   17.5
60  ...                     M31     2012-01-02  17.1   17.4
61  ...                     M101    2012-01-02  15.1   13.5
62  ...                     M82     2012-02-14  16.2   14.5
63  ...                     M31     2012-02-14  16.9   17.3
64  ...                     M82     2012-02-14  15.2   15.5
65  ...                     M101    2012-02-14  15.0   13.6
66  ...                     M82     2012-03-26  15.7   16.5
67  ...                     M101    2012-03-26  15.1   13.5
68  ...                     M101    2012-03-26  14.8   14.3
69  ...                     """, format='ascii')
70  >>> # Make sure magnitudes are printed with one digit after the decimal point
71  >>> obs['mag_b'].info.format = '{:.1f}'
72  >>> obs['mag_v'].info.format = '{:.1f}'
73
74.. EXAMPLE END
75
76Table Groups
77^^^^^^^^^^^^
78
79Now suppose we want the mean magnitudes for each object. We first group the data
80by the ``name`` column with the :func:`~astropy.table.Table.group_by` method.
81This returns a new table sorted by ``name`` which has a ``groups`` property
82specifying the unique values of ``name`` and the corresponding table rows::
83
84  >>> obs_by_name = obs.group_by('name')
85  >>> print(obs_by_name)  # doctest: +SKIP
86  name  obs_date  mag_b mag_v
87  ---- ---------- ----- -----
88  M101 2012-01-02  15.1  13.5  << First group (index=0, key='M101')
89  M101 2012-02-14  15.0  13.6
90  M101 2012-03-26  15.1  13.5
91  M101 2012-03-26  14.8  14.3
92   M31 2012-01-02  17.0  17.5  << Second group (index=4, key='M31')
93   M31 2012-01-02  17.1  17.4
94   M31 2012-02-14  16.9  17.3
95   M82 2012-02-14  16.2  14.5  << Third group (index=7, key='M83')
96   M82 2012-02-14  15.2  15.5
97   M82 2012-03-26  15.7  16.5
98                               << End of groups (index=10)
99  >>> print(obs_by_name.groups.keys)
100  name
101  ----
102  M101
103   M31
104   M82
105  >>> print(obs_by_name.groups.indices)
106  [ 0  4  7 10]
107
108The ``groups`` property is the portal to all grouped operations with tables and
109columns. It defines how the table is grouped via an array of the unique row key
110values and the indices of the group boundaries for those key values. The groups
111here correspond to the row slices ``0:4``, ``4:7``, and ``7:10`` in the
112``obs_by_name`` table.
113
114The initial argument (``keys``) for the :func:`~astropy.table.Table.group_by`
115function can take a number of input data types:
116
117- Single string value with a table column name (as shown above)
118- List of string values with table column names
119- Another |Table| or |Column| with same length as table
120- ``numpy`` structured array with same length as table
121- ``numpy`` homogeneous array with same length as table
122
123In all cases the corresponding row elements are considered as a :class:`tuple`
124of values which form a key value that is used to sort the original table and
125generate the required groups.
126
127As an example, to get the average magnitudes for each object on each observing
128night, we would first group the table on both ``name`` and ``obs_date`` as
129follows::
130
131  >>> print(obs.group_by(['name', 'obs_date']).groups.keys)
132  name  obs_date
133  ---- ----------
134  M101 2012-01-02
135  M101 2012-02-14
136  M101 2012-03-26
137   M31 2012-01-02
138   M31 2012-02-14
139   M82 2012-02-14
140   M82 2012-03-26
141
142
143Manipulating Groups
144^^^^^^^^^^^^^^^^^^^
145
146.. EXAMPLE START: Manipulating Groups in Tables
147
148Once you have applied grouping to a table then you can access the individual
149groups or subsets of groups. In all cases this returns a new grouped table.
150For instance, to get the subtable which corresponds to the second group
151(index=1) do::
152
153  >>> print(obs_by_name.groups[1])
154  name  obs_date  mag_b mag_v
155  ---- ---------- ----- -----
156   M31 2012-01-02  17.0  17.5
157   M31 2012-01-02  17.1  17.4
158   M31 2012-02-14  16.9  17.3
159
160To get the first and second groups together use a :class:`slice`::
161
162  >>> groups01 = obs_by_name.groups[0:2]
163  >>> print(groups01)
164  name  obs_date  mag_b mag_v
165  ---- ---------- ----- -----
166  M101 2012-01-02  15.1  13.5
167  M101 2012-02-14  15.0  13.6
168  M101 2012-03-26  15.1  13.5
169  M101 2012-03-26  14.8  14.3
170   M31 2012-01-02  17.0  17.5
171   M31 2012-01-02  17.1  17.4
172   M31 2012-02-14  16.9  17.3
173  >>> print(groups01.groups.keys)
174  name
175  ----
176  M101
177   M31
178
179You can also supply a ``numpy`` array of indices or a boolean mask to select
180particular groups, for example::
181
182  >>> mask = obs_by_name.groups.keys['name'] == 'M101'
183  >>> print(obs_by_name.groups[mask])
184  name  obs_date  mag_b mag_v
185  ---- ---------- ----- -----
186  M101 2012-01-02  15.1  13.5
187  M101 2012-02-14  15.0  13.6
188  M101 2012-03-26  15.1  13.5
189  M101 2012-03-26  14.8  14.3
190
191You can iterate over the group subtables and corresponding keys with::
192
193  >>> for key, group in zip(obs_by_name.groups.keys, obs_by_name.groups):
194  ...     print(f'****** {key["name"]} *******')
195  ...     print(group)
196  ...     print('')
197  ...
198  ****** M101 *******
199  name  obs_date  mag_b mag_v
200  ---- ---------- ----- -----
201  M101 2012-01-02  15.1  13.5
202  M101 2012-02-14  15.0  13.6
203  M101 2012-03-26  15.1  13.5
204  M101 2012-03-26  14.8  14.3
205  ****** M31 *******
206  name  obs_date  mag_b mag_v
207  ---- ---------- ----- -----
208   M31 2012-01-02  17.0  17.5
209   M31 2012-01-02  17.1  17.4
210   M31 2012-02-14  16.9  17.3
211  ****** M82 *******
212  name  obs_date  mag_b mag_v
213  ---- ---------- ----- -----
214   M82 2012-02-14  16.2  14.5
215   M82 2012-02-14  15.2  15.5
216   M82 2012-03-26  15.7  16.5
217
218.. EXAMPLE END
219
220Column Groups
221^^^^^^^^^^^^^
222
223Like |Table| objects, |Column| objects can also be grouped for subsequent
224manipulation with grouped operations. This can apply both to columns within a
225|Table| or bare |Column| objects.
226
227As for |Table|, the grouping is generated with the
228:func:`~astropy.table.Table.group_by` method. The difference here is that
229there is no option of providing one or more column names since that
230does not make sense for a |Column|.
231
232Examples
233~~~~~~~~
234
235.. EXAMPLE START: Grouping Column Objects in Tables
236
237To generate grouping in columns::
238
239  >>> from astropy.table import Column
240  >>> import numpy as np
241  >>> c = Column([1, 2, 3, 4, 5, 6], name='a')
242  >>> key_vals = np.array(['foo', 'bar', 'foo', 'foo', 'qux', 'qux'])
243  >>> cg = c.group_by(key_vals)
244
245  >>> for key, group in zip(cg.groups.keys, cg.groups):
246  ...     print(f'****** {key} *******')
247  ...     print(group)
248  ...     print('')
249  ...
250  ****** bar *******
251   a
252  ---
253    2
254  ****** foo *******
255   a
256  ---
257    1
258    3
259    4
260  ****** qux *******
261   a
262  ---
263    5
264    6
265
266.. EXAMPLE END
267
268Aggregation
269^^^^^^^^^^^
270
271Aggregation is the process of applying a specified reduction function to the
272values within each group for each non-key column. This function must accept a
273|ndarray| as the first argument and return a single scalar value. Common
274function examples are :func:`numpy.sum`, :func:`numpy.mean`, and
275:func:`numpy.std`.
276
277For the example grouped table ``obs_by_name`` from above, we compute the group
278means with the :meth:`~astropy.table.groups.TableGroups.aggregate` method::
279
280  >>> obs_mean = obs_by_name.groups.aggregate(np.mean)  # doctest: +SHOW_WARNINGS
281  AstropyUserWarning: Cannot aggregate column 'obs_date' with type '<U10'
282  >>> print(obs_mean)
283  name mag_b mag_v
284  ---- ----- -----
285  M101  15.0  13.7
286   M31  17.0  17.4
287   M82  15.7  15.5
288
289It seems the magnitude values were successfully averaged, but what about the
290:class:`~astropy.utils.exceptions.AstropyUserWarning`? Since the ``obs_date``
291column is a string-type array, the :func:`numpy.mean` function failed and
292raised an exception.  Any time this happens
293:meth:`~astropy.table.groups.TableGroups.aggregate` will issue a warning and
294then drop that column from the output result. Note that the ``name`` column is
295one of the ``keys`` used to determine the grouping so it is automatically
296ignored from aggregation.
297
298.. EXAMPLE START: Performing Aggregation on Grouped Tables
299
300From a grouped table it is possible to select one or more columns on which
301to perform the aggregation::
302
303  >>> print(obs_by_name['mag_b'].groups.aggregate(np.mean))
304  mag_b
305  -----
306   15.0
307   17.0
308   15.7
309
310The order of the columns can be specified too::
311
312  >>> print(obs_by_name['name', 'mag_v', 'mag_b'].groups.aggregate(np.mean))
313  name mag_v mag_b
314  ---- ----- -----
315  M101  13.7  15.0
316   M31  17.4  17.0
317   M82  15.5  15.7
318
319
320A single column of data can be aggregated as well::
321
322  >>> c = Column([1, 2, 3, 4, 5, 6], name='a')
323  >>> key_vals = np.array(['foo', 'bar', 'foo', 'foo', 'qux', 'qux'])
324  >>> cg = c.group_by(key_vals)
325  >>> cg_sums = cg.groups.aggregate(np.sum)
326  >>> for key, cg_sum in zip(cg.groups.keys, cg_sums):
327  ...     print(f'Sum for {key} = {cg_sum}')
328  ...
329  Sum for bar = 2
330  Sum for foo = 8
331  Sum for qux = 11
332
333.. EXAMPLE END
334
335If the specified function has a :meth:`numpy.ufunc.reduceat` method, this will
336be called instead. This can improve the performance by a factor of 10 to 100
337(or more) for large unmasked tables or columns with many relatively small
338groups.  It also allows for the use of certain ``numpy`` functions which
339normally take more than one input array but also work as reduction functions,
340like `numpy.add`.  The ``numpy`` functions which should take advantage of using
341:meth:`numpy.ufunc.reduceat` include:
342
343- `numpy.add`
344- `numpy.arctan2`
345- `numpy.bitwise_and`
346- `numpy.bitwise_or`
347- `numpy.bitwise_xor`
348- `numpy.copysign`
349- `numpy.divide`
350- `numpy.equal`
351- `numpy.floor_divide`
352- `numpy.fmax`
353- `numpy.fmin`
354- `numpy.fmod`
355- `numpy.greater_equal`
356- `numpy.greater`
357- `numpy.hypot`
358- `numpy.left_shift`
359- `numpy.less_equal`
360- `numpy.less`
361- `numpy.logaddexp2`
362- `numpy.logaddexp`
363- `numpy.logical_and`
364- `numpy.logical_or`
365- `numpy.logical_xor`
366- `numpy.maximum`
367- `numpy.minimum`
368- `numpy.mod`
369- `numpy.multiply`
370- `numpy.not_equal`
371- `numpy.power`
372- `numpy.remainder`
373- `numpy.right_shift`
374- `numpy.subtract`
375- `numpy.true_divide`
376
377In special cases, :func:`numpy.sum` and :func:`numpy.mean` are substituted with
378their respective ``reduceat`` methods.
379
380Filtering
381^^^^^^^^^
382
383Table groups can be filtered by means of the
384:meth:`~astropy.table.groups.TableGroups.filter` method. This is done by
385supplying a function which is called for each group. The function
386which is passed to this method must accept two arguments:
387
388- ``table`` : |Table| object
389- ``key_colnames`` : list of columns in ``table`` used as keys for grouping
390
391It must then return either `True` or `False`.
392
393Example
394~~~~~~~
395
396.. EXAMPLE START: Filtering Table Groups
397
398The following will select all table groups with only positive values in the non-
399key columns::
400
401  >>> def all_positive(table, key_colnames):
402  ...     colnames = [name for name in table.colnames if name not in key_colnames]
403  ...     for colname in colnames:
404  ...         if np.any(table[colname] <= 0):
405  ...             return False
406  ...     return True
407
408An example of using this function is::
409
410  >>> t = Table.read(""" a   b    c
411  ...                   -2  7.0   2
412  ...                   -2  5.0   1
413  ...                    1  3.0  -5
414  ...                    1 -2.0  -6
415  ...                    1  1.0   7
416  ...                    0  4.0   4
417  ...                    3  3.0   5
418  ...                    3 -2.0   6
419  ...                    3  1.0   7""", format='ascii')
420  >>> tg = t.group_by('a')
421  >>> t_positive = tg.groups.filter(all_positive)
422  >>> for group in t_positive.groups:
423  ...     print(group)
424  ...     print('')
425  ...
426   a   b   c
427  --- --- ---
428   -2 7.0   2
429   -2 5.0   1
430  <BLANKLINE>
431   a   b   c
432  --- --- ---
433    0 4.0   4
434
435As can be seen only the groups with ``a == -2`` and ``a == 0`` have all
436positive values in the non-key columns, so those are the ones that are selected.
437
438Likewise a grouped column can be filtered with the
439:meth:`~astropy.table.groups.ColumnGroups.filter`, method but in this case the
440filtering function takes only a single argument which is the column group. It
441still must return either `True` or `False`. For example::
442
443  def all_positive(column):
444      return np.all(column > 0)
445
446.. EXAMPLE END
447
448.. _table_binning:
449
450Binning
451-------
452
453A common tool in analysis is to bin a table based on some reference value.
454Examples:
455
456- Photometry of a binary star in several bands taken over a
457  span of time which should be binned by orbital phase.
458- Reducing the sampling density for a table by combining
459  100 rows at a time.
460- Unevenly sampled historical data which should binned to
461  four points per year.
462
463All of these examples of binning a table can be accomplished using
464`grouped operations`_. The examples in that section are focused on the
465case of discrete key values such as the name of a source. In this
466section we show a concise yet powerful way of applying grouped operations to
467accomplish binning on key values such as time, phase, or row number.
468
469The common theme in all of these cases is to convert the key value array into
470a new float- or int-valued array whose values are identical for rows in the same
471output bin.
472
473Example
474^^^^^^^
475
476.. EXAMPLE START: Binning a Table using Grouped Operations
477
478As an example, we generate a fake light curve::
479
480  >>> year = np.linspace(2000.0, 2010.0, 200)  # 200 observations over 10 years
481  >>> period = 1.811
482  >>> y0 = 2005.2
483  >>> mag = 14.0 + 1.2 * np.sin(2 * np.pi * (year - y0) / period)
484  >>> phase = ((year - y0) / period) % 1.0
485  >>> dat = Table([year, phase, mag], names=['year', 'phase', 'mag'])
486
487Now we make an array that will be used for binning the data by 0.25 year
488intervals::
489
490  >>> year_bin = np.trunc(year / 0.25)
491
492This has the property that all samples in each 0.25 year bin have the same
493value of ``year_bin``. Think of ``year_bin`` as the bin number for ``year``.
494Then do the binning by grouping and immediately aggregating with
495:func:`numpy.mean`.
496
497  >>> dat_grouped = dat.group_by(year_bin)
498  >>> dat_binned = dat_grouped.groups.aggregate(np.mean)
499
500We can plot the results with ``plt.plot(dat_binned['year'], dat_binned['mag'],
501'.')``. Alternately, we could bin into 10 phase bins::
502
503  >>> phase_bin = np.trunc(phase / 0.1)
504  >>> dat_grouped = dat.group_by(phase_bin)
505  >>> dat_binned = dat_grouped.groups.aggregate(np.mean)
506
507This time, try plotting with ``plt.plot(dat_binned['phase'],
508dat_binned['mag'])``.
509
510.. EXAMPLE END
511
512.. _stack-vertically:
513
514Stack Vertically
515----------------
516
517The |Table| class supports stacking tables vertically with the
518:func:`~astropy.table.vstack` function. This process is also commonly known as
519concatenating or appending tables in the row direction. It corresponds roughly
520to the :func:`numpy.vstack` function.
521
522Examples
523^^^^^^^^
524
525.. EXAMPLE START: Stacking (or Concatenating) Tables Vertically
526
527Suppose we have two tables of observations with several column names in
528common::
529
530  >>> from astropy.table import Table, vstack
531  >>> obs1 = Table.read("""name    obs_date    mag_b  logLx
532  ...                      M31     2012-01-02  17.0   42.5
533  ...                      M82     2012-10-29  16.2   43.5
534  ...                      M101    2012-10-31  15.1   44.5""", format='ascii')
535
536  >>> obs2 = Table.read("""name    obs_date    logLx
537  ...                      NGC3516 2011-11-11  42.1
538  ...                      M31     1999-01-05  43.1
539  ...                      M82     2012-10-30  45.0""", format='ascii')
540
541Now we can stack these two tables::
542
543  >>> print(vstack([obs1, obs2]))
544    name   obs_date  mag_b logLx
545  ------- ---------- ----- -----
546      M31 2012-01-02  17.0  42.5
547      M82 2012-10-29  16.2  43.5
548     M101 2012-10-31  15.1  44.5
549  NGC3516 2011-11-11    --  42.1
550      M31 1999-01-05    --  43.1
551      M82 2012-10-30    --  45.0
552
553Notice that the ``obs2`` table is missing the ``mag_b`` column, so in the
554stacked output table those values are marked as missing. This is the default
555behavior and corresponds to ``join_type='outer'``. There are two other allowed
556values for the ``join_type`` argument, ``'inner'`` and ``'exact'``::
557
558  >>> print(vstack([obs1, obs2], join_type='inner'))
559    name   obs_date  logLx
560  ------- ---------- -----
561      M31 2012-01-02  42.5
562      M82 2012-10-29  43.5
563     M101 2012-10-31  44.5
564  NGC3516 2011-11-11  42.1
565      M31 1999-01-05  43.1
566      M82 2012-10-30  45.0
567
568  >>> print(vstack([obs1, obs2], join_type='exact'))  # doctest: +IGNORE_EXCEPTION_DETAIL
569  Traceback (most recent call last):
570    ...
571  TableMergeError: Inconsistent columns in input arrays (use 'inner'
572  or 'outer' join_type to allow non-matching columns)
573
574In the case of ``join_type='inner'``, only the common columns (the intersection)
575are present in the output table. When ``join_type='exact'`` is specified, then
576:func:`~astropy.table.vstack` requires that all of the input tables have
577exactly the same column names.
578
579More than two tables can be stacked by supplying a longer list of tables::
580
581  >>> obs3 = Table.read("""name    obs_date    mag_b  logLx
582  ...                      M45     2012-02-03  15.0   40.5""", format='ascii')
583  >>> print(vstack([obs1, obs2, obs3]))
584    name   obs_date  mag_b logLx
585  ------- ---------- ----- -----
586      M31 2012-01-02  17.0  42.5
587      M82 2012-10-29  16.2  43.5
588     M101 2012-10-31  15.1  44.5
589  NGC3516 2011-11-11    --  42.1
590      M31 1999-01-05    --  43.1
591      M82 2012-10-30    --  45.0
592      M45 2012-02-03  15.0  40.5
593
594See also the sections on `Merging metadata`_ and `Merging column attributes`_
595for details on how these characteristics of the input tables are merged in the
596single output table. Note also that you can use a single table |Row| instead of
597a full table as one of the inputs.
598
599.. EXAMPLE END
600
601.. _stack-horizontally:
602
603Stack Horizontally
604------------------
605
606The |Table| class supports stacking tables horizontally (in the column-wise
607direction) with the :func:`~astropy.table.hstack` function. It corresponds
608roughly to the :func:`numpy.hstack` function.
609
610Examples
611^^^^^^^^
612
613.. EXAMPLE START: Stacking (or Concatenating) Tables Horizontally
614
615Suppose we have the following two tables::
616
617  >>> from astropy.table import Table, hstack
618  >>> t1 = Table.read("""a   b    c
619  ...                    1   foo  1.4
620  ...                    2   bar  2.1
621  ...                    3   baz  2.8""", format='ascii')
622  >>> t2 = Table.read("""d     e
623  ...                    ham   eggs
624  ...                    spam  toast""", format='ascii')
625
626Now we can stack these two tables horizontally::
627
628  >>> print(hstack([t1, t2]))
629   a   b   c   d     e
630  --- --- --- ---- -----
631    1 foo 1.4  ham  eggs
632    2 bar 2.1 spam toast
633    3 baz 2.8   --    --
634
635As with :func:`~astropy.table.vstack`, there is an optional ``join_type``
636argument that can take values ``'inner'``, ``'exact'``, and ``'outer'``. The
637default is ``'outer'``, which effectively takes the union of available rows and
638masks out any missing values. This is illustrated in the example above. The
639other options give the intersection of rows, where ``'exact'`` requires that
640all tables have exactly the same number of rows::
641
642  >>> print(hstack([t1, t2], join_type='inner'))
643   a   b   c   d     e
644  --- --- --- ---- -----
645    1 foo 1.4  ham  eggs
646    2 bar 2.1 spam toast
647
648  >>> print(hstack([t1, t2], join_type='exact'))  # doctest: +IGNORE_EXCEPTION_DETAIL
649  Traceback (most recent call last):
650    ...
651  TableMergeError: Inconsistent number of rows in input arrays (use 'inner' or
652  'outer' join_type to allow non-matching rows)
653
654More than two tables can be stacked by supplying a longer list of tables. The
655example below also illustrates the behavior when there is a conflict in the
656input column names (see the section on `Column renaming`_ for details)::
657
658  >>> t3 = Table.read("""a    b
659  ...                    M45  2012-02-03""", format='ascii')
660  >>> print(hstack([t1, t2, t3]))
661  a_1 b_1  c   d     e   a_3    b_3
662  --- --- --- ---- ----- --- ----------
663    1 foo 1.4  ham  eggs M45 2012-02-03
664    2 bar 2.1 spam toast  --         --
665    3 baz 2.8   --    --  --         --
666
667The metadata from the input tables is merged by the process described in the
668`Merging metadata`_ section. Note also that you can use a single table |Row|
669instead of a full table as one of the inputs.
670
671.. EXAMPLE END
672
673.. _stack-depthwise:
674
675Stack Depth-Wise
676----------------
677
678The |Table| class supports stacking columns within tables depth-wise using the
679:func:`~astropy.table.dstack` function. It corresponds roughly to running the
680:func:`numpy.dstack` function on the individual columns matched by name.
681
682Examples
683^^^^^^^^
684
685.. EXAMPLE START: Stacking (or Concatenating) Tables Depth-Wise
686
687Suppose we have tables of data for sources giving information on the enclosed
688source counts for different PSF fractions::
689
690  >>> from astropy.table import Table, dstack
691  >>> src1 = Table.read("""psf_frac  counts
692  ...                      0.10        45
693  ...                      0.50        90
694  ...                      0.90       120
695  ...                      """, format='ascii')
696
697  >>> src2 = Table.read("""psf_frac  counts
698  ...                      0.10       200
699  ...                      0.50       300
700  ...                      0.90       350
701  ...                      """, format='ascii')
702
703Now we can stack these two tables depth-wise to get a single table with the
704characteristics of both sources::
705
706  >>> srcs = dstack([src1, src2])
707  >>> print(srcs)
708  psf_frac [2] counts [2]
709  ------------ ----------
710    0.1 .. 0.1  45 .. 200
711    0.5 .. 0.5  90 .. 300
712    0.9 .. 0.9 120 .. 350
713
714In this case the counts for the first source are accessible as
715``srcs['counts'][:, 0]``, and likewise the second source counts are
716``srcs['counts'][:, 1]``.
717
718For this function the length of all input tables must be the same. This
719function can accept ``join_type`` and ``metadata_conflicts`` just like the
720:func:`~astropy.table.vstack` function. The ``join_type`` argument controls how
721to handle mismatches in the columns of the input table.
722
723See also the sections on `Merging metadata`_ and `Merging column attributes`_
724for details on how these characteristics of the input tables are merged in the
725single output table. Note also that you can use a single table |Row| instead of
726a full table as one of the inputs.
727
728.. EXAMPLE END
729
730.. _table-join:
731
732Join
733----
734
735The |Table| class supports the `database join
736<https://en.wikipedia.org/wiki/Join_(SQL)>`_ operation. This provides a flexible
737and powerful way to combine tables based on the values in one or more key
738columns.
739
740Examples
741^^^^^^^^
742
743.. EXAMPLE START: Combining Tables using the Database Join Operation
744
745Suppose we have two tables of observations, the first with B and V magnitudes
746and the second with X-ray luminosities of an overlapping (but not identical)
747sample::
748
749  >>> from astropy.table import Table, join
750  >>> optical = Table.read("""name    obs_date    mag_b  mag_v
751  ...                         M31     2012-01-02  17.0   16.0
752  ...                         M82     2012-10-29  16.2   15.2
753  ...                         M101    2012-10-31  15.1   15.5""", format='ascii')
754  >>> xray = Table.read("""   name    obs_date    logLx
755  ...                         NGC3516 2011-11-11  42.1
756  ...                         M31     1999-01-05  43.1
757  ...                         M82     2012-10-29  45.0""", format='ascii')
758
759The |join| method allows you to merge these two tables into a single table based
760on matching values in the "key columns". By default, the key columns are the set
761of columns that are common to both tables. In this case the key columns are
762``name`` and ``obs_date``. We can find all of the observations of the same
763object on the same date as follows::
764
765  >>> opt_xray = join(optical, xray)
766  >>> print(opt_xray)
767  name  obs_date  mag_b mag_v logLx
768  ---- ---------- ----- ----- -----
769   M82 2012-10-29  16.2  15.2  45.0
770
771We can perform the match by ``name`` only by providing the ``keys`` argument,
772which can be either a single column name or a list of column names::
773
774  >>> print(join(optical, xray, keys='name'))
775  name obs_date_1 mag_b mag_v obs_date_2 logLx
776  ---- ---------- ----- ----- ---------- -----
777   M31 2012-01-02  17.0  16.0 1999-01-05  43.1
778   M82 2012-10-29  16.2  15.2 2012-10-29  45.0
779
780This output table has all of the observations that have both optical and X-ray
781data for an object (M31 and M82). Notice that since the ``obs_date`` column
782occurs in both tables, it has been split into two columns, ``obs_date_1`` and
783``obs_date_2``. The values are taken from the "left" (``optical``) and "right"
784(``xray``) tables, respectively.
785
786.. EXAMPLE END
787
788Different Join Options
789^^^^^^^^^^^^^^^^^^^^^^
790
791The table joins so far are known as "inner" joins and represent the strict
792intersection of the two tables on the key columns.
793
794.. EXAMPLE START: Table Join Options
795
796If you want to make a new table which has *every* row from the left table and
797includes matching values from the right table when available, this is known as a
798left join::
799
800  >>> print(join(optical, xray, join_type='left'))
801  name  obs_date  mag_b mag_v logLx
802  ---- ---------- ----- ----- -----
803  M101 2012-10-31  15.1  15.5    --
804   M31 2012-01-02  17.0  16.0    --
805   M82 2012-10-29  16.2  15.2  45.0
806
807Two of the observations do not have X-ray data, as indicated by the ``--`` in
808the table. You might be surprised that there is no X-ray data for M31 in the
809output. Remember that the default matching key includes both ``name`` and
810``obs_date``. Specifying the key as only the ``name`` column gives::
811
812  >>> print(join(optical, xray, join_type='left', keys='name'))
813  name obs_date_1 mag_b mag_v obs_date_2 logLx
814  ---- ---------- ----- ----- ---------- -----
815  M101 2012-10-31  15.1  15.5         --    --
816   M31 2012-01-02  17.0  16.0 1999-01-05  43.1
817   M82 2012-10-29  16.2  15.2 2012-10-29  45.0
818
819Likewise you can construct a new table with every row of the right table and
820matching left values (when available) using ``join_type='right'``.
821
822To make a table with the union of rows from both tables do an "outer" join::
823
824  >>> print(join(optical, xray, join_type='outer'))
825    name   obs_date  mag_b mag_v logLx
826  ------- ---------- ----- ----- -----
827     M101 2012-10-31  15.1  15.5    --
828      M31 1999-01-05    --    --  43.1
829      M31 2012-01-02  17.0  16.0    --
830      M82 2012-10-29  16.2  15.2  45.0
831  NGC3516 2011-11-11    --    --  42.1
832
833In all the above cases the output join table will be sorted by the key
834column(s) and in general will not preserve the row order of the input tables.
835
836Finally, you can do a "Cartesian" join, which is the Cartesian product of all
837available rows. In this case there are no key columns (and supplying the
838``keys`` argument is an error)::
839
840  >>> print(join(optical, xray, join_type='cartesian'))
841  name_1 obs_date_1 mag_b mag_v  name_2 obs_date_2 logLx
842  ------ ---------- ----- ----- ------- ---------- -----
843     M31 2012-01-02  17.0  16.0 NGC3516 2011-11-11  42.1
844     M31 2012-01-02  17.0  16.0     M31 1999-01-05  43.1
845     M31 2012-01-02  17.0  16.0     M82 2012-10-29  45.0
846     M82 2012-10-29  16.2  15.2 NGC3516 2011-11-11  42.1
847     M82 2012-10-29  16.2  15.2     M31 1999-01-05  43.1
848     M82 2012-10-29  16.2  15.2     M82 2012-10-29  45.0
849    M101 2012-10-31  15.1  15.5 NGC3516 2011-11-11  42.1
850    M101 2012-10-31  15.1  15.5     M31 1999-01-05  43.1
851    M101 2012-10-31  15.1  15.5     M82 2012-10-29  45.0
852
853.. EXAMPLE END
854
855Non-Identical Key Column Names
856^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
857
858.. EXAMPLE START: Joining Tables with Unique Key Column Names
859
860To use the |join| function with non-identical key column names, use the
861``keys_left`` and ``keys_right`` arguments. In the following example one table
862has a ``'name'`` column while the other has an ``'obj_id'`` column::
863
864  >>> optical = Table.read("""name    obs_date    mag_b  mag_v
865  ...                         M31     2012-01-02  17.0   16.0
866  ...                         M82     2012-10-29  16.2   15.2
867  ...                         M101    2012-10-31  15.1   15.5""", format='ascii')
868  >>> xray_1 = Table.read("""obj_id    obs_date    logLx
869  ...                        NGC3516 2011-11-11  42.1
870  ...                        M31     1999-01-05  43.1
871  ...                        M82     2012-10-29  45.0""", format='ascii')
872
873In order to perform a match based on the names of the objects, do the
874following::
875
876  >>> print(join(optical, xray_1, keys_left='name', keys_right='obj_id'))
877  name obs_date_1 mag_b mag_v obj_id obs_date_2 logLx
878  ---- ---------- ----- ----- ------ ---------- -----
879   M31 2012-01-02  17.0  16.0    M31 1999-01-05  43.1
880   M82 2012-10-29  16.2  15.2    M82 2012-10-29  45.0
881
882The ``keys_left`` and ``keys_right`` arguments can also take a list of column
883names or even a list of column-like objects. The latter case allows specifying
884the matching key column values independent of the tables being joined.
885
886.. EXAMPLE END
887
888Identical Key Values
889^^^^^^^^^^^^^^^^^^^^
890
891.. EXAMPLE START: Joining Tables with Identical Key Values
892
893The |Table| join operation works even if there are multiple rows with identical
894key values. For example, the following tables have multiple rows for the column
895``'key'``::
896
897  >>> from astropy.table import Table, join
898  >>> left = Table([[0, 1, 1, 2], ['L1', 'L2', 'L3', 'L4']], names=('key', 'L'))
899  >>> right = Table([[1, 1, 2, 4], ['R1', 'R2', 'R3', 'R4']], names=('key', 'R'))
900  >>> print(left)
901  key  L
902  --- ---
903    0  L1
904    1  L2
905    1  L3
906    2  L4
907  >>> print(right)
908  key  R
909  --- ---
910    1  R1
911    1  R2
912    2  R3
913    4  R4
914
915Doing an outer join on these tables shows that what is really happening is a
916`Cartesian product <https://en.wikipedia.org/wiki/Cartesian_product>`_. For
917each matching key, every combination of the left and right tables is
918represented. When there is no match in either the left or right table, the
919corresponding column values are designated as missing::
920
921  >>> print(join(left, right, join_type='outer'))
922  key  L   R
923  --- --- ---
924    0  L1  --
925    1  L2  R1
926    1  L2  R2
927    1  L3  R1
928    1  L3  R2
929    2  L4  R3
930    4  --  R4
931
932An inner join is the same but only returns rows where there is a key match in
933both the left and right tables::
934
935  >>> print(join(left, right, join_type='inner'))
936  key  L   R
937  --- --- ---
938    1  L2  R1
939    1  L2  R2
940    1  L3  R1
941    1  L3  R2
942    2  L4  R3
943
944Conflicts in the input table names are handled by the process described in the
945section on `Column renaming`_. See also the sections on `Merging metadata`_ and
946`Merging column attributes`_ for details on how these characteristics of the
947input tables are merged in the single output table.
948
949.. EXAMPLE END
950
951Merging Details
952---------------
953
954When combining two or more tables there is the need to merge certain
955characteristics in the inputs and potentially resolve conflicts. This
956section describes the process.
957
958Column Renaming
959^^^^^^^^^^^^^^^
960
961In cases where the input tables have conflicting column names, there
962is a mechanism to generate unique output column names. There are two
963keyword arguments that control the renaming behavior:
964
965``table_names``
966    List of strings that provide names for the tables being joined.
967    By default this is ``['1', '2', ...]``, where the numbers correspond to
968    the input tables.
969
970``uniq_col_name``
971    String format specifier with a default value of ``'{col_name}_{table_name}'``.
972
973This is best understood by example using the ``optical`` and ``xray`` tables
974in the |join| example defined previously::
975
976  >>> print(join(optical, xray, keys='name',
977  ...            table_names=['OPTICAL', 'XRAY'],
978  ...            uniq_col_name='{table_name}_{col_name}'))
979  name OPTICAL_obs_date mag_b mag_v XRAY_obs_date logLx
980  ---- ---------------- ----- ----- ------------- -----
981   M31       2012-01-02  17.0  16.0    1999-01-05  43.1
982   M82       2012-10-29  16.2  15.2    2012-10-29  45.0
983
984.. _merging_metadata:
985
986Merging Metadata
987^^^^^^^^^^^^^^^^
988
989|Table| objects can have associated metadata:
990
991- ``Table.meta``: table-level metadata as an ordered dictionary
992- ``Column.meta``: per-column metadata as an ordered dictionary
993
994The table operations described here handle the task of merging the metadata in
995the input tables into a single output structure. Because the metadata can be
996arbitrarily complex there is no unique way to do the merge. The current
997implementation uses a recursive algorithm with four rules:
998
999- :class:`dict` elements are merged by keys.
1000- Conflicting :class:`list` or :class:`tuple` elements are concatenated.
1001- Conflicting :class:`dict` elements are merged by recursively calling the
1002  merge function.
1003- Conflicting elements that are not :class:`list`, :class:`tuple`, or
1004  :class:`dict` will follow the following rules:
1005
1006    - If both metadata values are identical, the output is set to this value.
1007    - If one of the conflicting metadata values is `None`, the other value is
1008      picked.
1009    - If both metadata values are different and neither is `None`, the one for
1010      the last table in the list is picked.
1011
1012By default, a warning is emitted in the last case (both metadata values are not
1013`None`). The warning can be silenced or made into an exception using the
1014``metadata_conflicts`` argument to :func:`~astropy.table.hstack`,
1015:func:`~astropy.table.vstack`, or
1016:func:`~astropy.table.join`. The ``metadata_conflicts`` option can be set to:
1017
1018- ``'silent'`` – no warning is emitted, the value for the last table is silently
1019  picked.
1020- ``'warn'`` – a warning is emitted, the value for the last table is picked.
1021- ``'error'`` – an exception is raised.
1022
1023The default strategies for merging metadata can be augmented or customized by
1024defining subclasses of the `~astropy.utils.metadata.MergeStrategy` base class.
1025In most cases you will also use
1026:func:`~astropy.utils.metadata.enable_merge_strategies` for enabling the custom
1027strategies. The linked documentation strings provide details.
1028
1029Merging Column Attributes
1030^^^^^^^^^^^^^^^^^^^^^^^^^
1031
1032In addition to the table and column ``meta`` attributes, the column attributes
1033``unit``, ``format``, and ``description`` are merged by going through the input
1034tables in order and taking the last value which is defined (i.e., is not
1035`None`).
1036
1037Example
1038~~~~~~~
1039
1040.. EXAMPLE START: Merging Column Attributes in a Table
1041
1042To merge column attributes ``unit``, ``format``, or ``description``::
1043
1044  >>> from astropy.table import Column, Table, vstack
1045  >>> col1 = Column([1], name='a')
1046  >>> col2 = Column([2], name='a', unit='cm')
1047  >>> col3 = Column([3], name='a', unit='m')
1048  >>> t1 = Table([col1])
1049  >>> t2 = Table([col2])
1050  >>> t3 = Table([col3])
1051  >>> out = vstack([t1, t2, t3])  # doctest: +SHOW_WARNINGS
1052  MergeConflictWarning: In merged column 'a' the 'unit' attribute does
1053  not match (cm != m).  Using m for merged output
1054  >>> out['a'].unit
1055  Unit("m")
1056
1057The rules for merging are the same as for `Merging metadata`_, and the
1058``metadata_conflicts`` option also controls the merging of column attributes.
1059
1060.. EXAMPLE END
1061
1062.. _astropy-table-join-functions:
1063
1064Joining Coordinates and Custom Join Functions
1065^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1066
1067Source catalogs that have |SkyCoord| coordinate columns can be joined using
1068cross-matching of the coordinates with a specified distance threshold. This is
1069a special case of a more general problem of "fuzzy" matching of key column
1070values, where instead of an exact match we require only an approximate match.
1071This is supported using the ``join_funcs`` argument.
1072
1073.. warning::
1074
1075   The coordinate and distance table joins discussed in this section are most
1076   applicable in the case where the relevant entries in at least one of the
1077   tables are all separated from one another by more than twice the join
1078   distance. If this is not satisfied then the join results may be unexpected.
1079
1080   This is a consequence of the algorithm which effectively finds clusters of
1081   nearby points (an "equivalence class") and assigns a unique cluster
1082   identifier to each entry in both tables. This assumes the join matching
1083   function is a transitive relation where ``join_func(A, B)`` and
1084   ``join_func(B, C)`` implies ``join_func(A, C)``. With multiple matches on
1085   both left and right sides it is possible for the cluster of points having a
1086   single cluster identifier to expand in size beyond the distance threshold.
1087
1088   Users should be especially aware of this issue if additional join keys
1089   are provided beyond the ``join_funcs``. The code does not do a "pre-join"
1090   on the other keys, so the possibility of having overlaps within the distance
1091   in both tables is higher.
1092
1093Example
1094~~~~~~~
1095
1096.. EXAMPLE START: Joining a Table on Coordinates
1097
1098To join two tables on a |SkyCoord| key column we use the ``join_funcs`` keyword
1099to supply a :class:`dict` of functions that specify how to match a particular
1100key column by name. In the example below we are joining on the ``sc`` column,
1101so we provide the following argument::
1102
1103  join_funcs={'sc': join_skycoord(0.2 * u.deg)}
1104
1105This tells |join| to match the ``sc`` key column using the join function
1106:func:`~astropy.table.join_skycoord` with a matching distance threshold of 0.2
1107deg. Under the hood this calls
1108:meth:`~astropy.coordinates.SkyCoord.search_around_sky` or
1109:meth:`~astropy.coordinates.SkyCoord.search_around_3d` to do the
1110cross-matching. The default is to use
1111:meth:`~astropy.coordinates.SkyCoord.search_around_sky` (angle) matching, but
1112:meth:`~astropy.coordinates.SkyCoord.search_around_3d` (length or
1113dimensionless) is also available. This is specified using the ``distance_func``
1114argument of :func:`~astropy.table.join_skycoord`, which can also be a function
1115that matches the input and output API of
1116:meth:`~astropy.coordinates.SkyCoord.search_around_sky`.
1117
1118Now we show the whole process:
1119
1120..  doctest-requires:: scipy
1121
1122  >>> from astropy.coordinates import SkyCoord
1123  >>> import astropy.units as u
1124  >>> from astropy.table import Table, join, join_skycoord
1125
1126..  doctest-requires:: scipy
1127
1128  >>> sc1 = SkyCoord([0, 1, 1.1, 2], [0, 0, 0, 0], unit='deg')
1129  >>> sc2 = SkyCoord([1.05, 0.5, 2.1], [0, 0, 0], unit='deg')
1130
1131..  doctest-requires:: scipy
1132
1133  >>> t1 = Table([sc1, [0, 1, 2, 3]], names=['sc', 'idx'])
1134  >>> t2 = Table([sc2, [0, 1, 2]], names=['sc', 'idx'])
1135
1136..  doctest-requires:: scipy
1137
1138  >>> t12 = join(t1, t2, keys='sc', join_funcs={'sc': join_skycoord(0.2 * u.deg)})
1139  >>> print(t12)
1140  sc_id   sc_1  idx_1   sc_2   idx_2
1141        deg,deg       deg,deg
1142  ----- ------- ----- -------- -----
1143      1 1.0,0.0     1 1.05,0.0     0
1144      1 1.1,0.0     2 1.05,0.0     0
1145      2 2.0,0.0     3  2.1,0.0     2
1146
1147The joined table has matched the sources within 0.2 deg and created a new
1148column ``sc_id`` with a unique identifier for each source.
1149
1150.. EXAMPLE END
1151
1152You might be wondering what is happening in the join function defined above,
1153especially if you are interested in defining your own such function. This could
1154be done in order to allow fuzzy word matching of tables, for example joining
1155tables of people by name where the names do not always match exactly.
1156
1157The first thing to note here is that the :func:`~astropy.table.join_skycoord`
1158function actually returns a function itself. This allows specifying a variable
1159match distance via a function enclosure. The requirement of the join function
1160is that it accepts two arguments corresponding to the two key columns, and
1161returns a tuple of ``(ids1, ids2)``. These identifiers correspond to the
1162identification of each column entry with a unique matched source.
1163
1164..  doctest-requires:: scipy
1165
1166    >>> join_func = join_skycoord(0.2 * u.deg)
1167    >>> join_func(sc1, sc2)  # Associate each coordinate with unique source ID
1168    (array([3, 1, 1, 2]), array([1, 4, 2]))
1169
1170If you would like to write your own fuzzy matching function, we suggest starting
1171from the source code for :func:`~astropy.table.join_skycoord` or
1172:func:`~astropy.table.join_distance`.
1173
1174Join on Distance
1175~~~~~~~~~~~~~~~~
1176
1177The example above focused on joining on a |SkyCoord|, but you can also join on
1178a generic distance between column values using the
1179:func:`~astropy.table.join_distance` join function. This can apply to 1D or 2D
1180(vector) columns. This will look very similar to the coordinates example, but
1181here there is a bit more flexibility. The matching is done using
1182:class:`scipy.spatial.cKDTree` and
1183:meth:`scipy.spatial.cKDTree.query_ball_tree`, and the behavior of these can be
1184controlled via the ``kdtree_args`` and ``query_args`` arguments, respectively.
1185
1186.. _unique-rows:
1187
1188Unique Rows
1189-----------
1190
1191Sometimes it makes sense to use only rows with unique key columns or even
1192fully unique rows from a table. This can be done using the above described
1193:meth:`~astropy.table.Table.group_by` method and ``groups`` attribute, or with
1194the :func:`~astropy.table.unique` convenience function. The
1195:func:`~astropy.table.unique` function returns a sorted table containing the
1196first row for each unique ``keys`` column value. If no ``keys`` is provided, it
1197returns a sorted table containing all of the fully unique rows.
1198
1199Example
1200^^^^^^^
1201
1202.. EXAMPLE START: Grouping Unique Rows in Tables
1203
1204An example of a situation where you might want to use rows with unique key
1205columns is a list of objects with photometry from various observing
1206runs. Using ``'name'`` as the only ``keys``, it returns with the first
1207occurrence of each of the three targets::
1208
1209  >>> from astropy import table
1210  >>> obs = table.Table.read("""name    obs_date    mag_b  mag_v
1211  ...                           M31     2012-01-02  17.0   17.5
1212  ...                           M82     2012-02-14  16.2   14.5
1213  ...                           M101    2012-01-02  15.1   13.5
1214  ...                           M31     2012-01-02  17.1   17.4
1215  ...                           M101    2012-01-02  15.1   13.5
1216  ...                           M82     2012-02-14  16.2   14.5
1217  ...                           M31     2012-02-14  16.9   17.3
1218  ...                           M82     2012-02-14  15.2   15.5
1219  ...                           M101    2012-02-14  15.0   13.6
1220  ...                           M82     2012-03-26  15.7   16.5
1221  ...                           M101    2012-03-26  15.1   13.5
1222  ...                           M101    2012-03-26  14.8   14.3
1223  ...                           """, format='ascii')
1224  >>> unique_by_name = table.unique(obs, keys='name')
1225  >>> print(unique_by_name)
1226  name  obs_date  mag_b mag_v
1227  ---- ---------- ----- -----
1228  M101 2012-01-02  15.1  13.5
1229   M31 2012-01-02  17.0  17.5
1230   M82 2012-02-14  16.2  14.5
1231
1232Using multiple columns as ``keys``::
1233
1234  >>> unique_by_name_date = table.unique(obs, keys=['name', 'obs_date'])
1235  >>> print(unique_by_name_date)
1236  name  obs_date  mag_b mag_v
1237  ---- ---------- ----- -----
1238  M101 2012-01-02  15.1  13.5
1239  M101 2012-02-14  15.0  13.6
1240  M101 2012-03-26  15.1  13.5
1241   M31 2012-01-02  17.0  17.5
1242   M31 2012-02-14  16.9  17.3
1243   M82 2012-02-14  16.2  14.5
1244   M82 2012-03-26  15.7  16.5
1245
1246.. EXAMPLE END
1247
1248.. _set-difference:
1249
1250Set Difference
1251--------------
1252
1253A set difference will tell you the elements that are contained in the first set
1254but not in the other. This concept can be applied to rows of a table by using
1255the :func:`~astropy.table.setdiff` function. You provide the function with two
1256input tables and it will return all rows in the first table which do not occur
1257in the second table.
1258
1259The optional ``keys`` parameter specifies the names of columns that are used to
1260match table rows. This can be a subset of the full list of columns, but both
1261the first and second tables must contain all columns specified by ``keys``.
1262If not provided, then ``keys`` defaults to all column names in the first table.
1263
1264If no different rows are found, the :func:`~astropy.table.setdiff` function
1265will return an empty table.
1266
1267Example
1268^^^^^^^
1269
1270.. EXAMPLE START: Using Set Difference in Tables
1271
1272The example below illustrates finding the set difference of two observation
1273lists using a common subset of the columns in two tables.::
1274
1275  >>> from astropy.table import Table, setdiff
1276  >>> cat_1 = Table.read("""name    obs_date    mag_b  mag_v
1277  ...                       M31     2012-01-02  17.0   16.0
1278  ...                       M82     2012-10-29  16.2   15.2
1279  ...                       M101    2012-10-31  15.1   15.5""", format='ascii')
1280  >>> cat_2 = Table.read("""   name    obs_date    logLx
1281  ...                          NGC3516 2011-11-11  42.1
1282  ...                          M31     2012-01-02  43.1
1283  ...                          M82     2012-10-29  45.0""", format='ascii')
1284  >>> sdiff = setdiff(cat_1, cat_2, keys=['name', 'obs_date'])
1285  >>> print(sdiff)
1286  name  obs_date  mag_b mag_v
1287  ---- ---------- ----- -----
1288  M101 2012-10-31  15.1  15.5
1289
1290In this example there is a column in the first table that is not
1291present in the second table, so the ``keys`` parameter must be used to specify
1292the desired column names.
1293
1294.. EXAMPLE END
1295
1296.. _table-diff:
1297
1298Table Diff
1299----------
1300
1301To compare two tables, you can use
1302:func:`~astropy.utils.diff.report_diff_values`, which would produce a report
1303identical to :ref:`FITS diff <io-fits-differs>`.
1304
1305Example
1306^^^^^^^
1307
1308.. EXAMPLE START: Using Table Diff to Compare Tables
1309
1310The example below illustrates finding the difference between two tables::
1311
1312  >>> from astropy.table import Table
1313  >>> from astropy.utils.diff import report_diff_values
1314  >>> import sys
1315  >>> cat_1 = Table.read("""name    obs_date    mag_b  mag_v
1316  ...                       M31     2012-01-02  17.0   16.0
1317  ...                       M82     2012-10-29  16.2   15.2
1318  ...                       M101    2012-10-31  15.1   15.5""", format='ascii')
1319  >>> cat_2 = Table.read("""name    obs_date    mag_b  mag_v
1320  ...                       M31     2012-01-02  17.0   16.5
1321  ...                       M82     2012-10-29  16.2   15.2
1322  ...                       M101    2012-10-30  15.1   15.5
1323  ...                       NEW     2018-05-08   nan    9.0""", format='ascii')
1324  >>> identical = report_diff_values(cat_1, cat_2, fileobj=sys.stdout)
1325       name  obs_date  mag_b mag_v
1326       ---- ---------- ----- -----
1327    a>  M31 2012-01-02  17.0  16.0
1328     ?                           ^
1329    b>  M31 2012-01-02  17.0  16.5
1330     ?                           ^
1331        M82 2012-10-29  16.2  15.2
1332    a> M101 2012-10-31  15.1  15.5
1333     ?               ^
1334    b> M101 2012-10-30  15.1  15.5
1335     ?               ^
1336    b>  NEW 2018-05-08   nan   9.0
1337  >>> identical
1338  False
1339
1340.. EXAMPLE END
1341