1.. _dsintro:
2
3{{ header }}
4
5************************
6Intro to data structures
7************************
8
We'll start with a quick, non-comprehensive overview of the fundamental data
structures in pandas to get you started. The fundamental behavior of data
types, indexing, and axis labeling / alignment applies across all of the
objects. To get started, import NumPy and load pandas into your namespace:
13
14.. ipython:: python
15
16   import numpy as np
17   import pandas as pd
18
19Here is a basic tenet to keep in mind: **data alignment is intrinsic**. The link
20between labels and data will not be broken unless done so explicitly by you.
21
22We'll give a brief intro to the data structures, then consider all of the broad
23categories of functionality and methods in separate sections.
24
25.. _basics.series:
26
27Series
28------
29
30:class:`Series` is a one-dimensional labeled array capable of holding any data
31type (integers, strings, floating point numbers, Python objects, etc.). The axis
32labels are collectively referred to as the **index**. The basic method to create a Series is to call:
33
34::
35
36    >>> s = pd.Series(data, index=index)
37
38Here, ``data`` can be many different things:
39
40* a Python dict
41* an ndarray
42* a scalar value (like 5)
43
The passed **index** is a list of axis labels. Thus, this separates into a few
cases depending on what **data** is:
46
47**From ndarray**
48
49If ``data`` is an ndarray, **index** must be the same length as **data**. If no
50index is passed, one will be created having values ``[0, ..., len(data) - 1]``.
51
52.. ipython:: python
53
54   s = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])
55   s
56   s.index
57
58   pd.Series(np.random.randn(5))
59
60.. note::
61
    pandas supports non-unique index values. If an operation
    that does not support duplicate index values is attempted, an exception
    will be raised at that time. The check is deferred mostly for performance
    reasons (there are many instances in computations, like parts of GroupBy,
    where the index is not used).
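
The deferred check described in the note can be sketched as follows. This is a
minimal illustration with made-up names (``dup``, ``matches``); the exact error
message depends on the pandas version, so only the exception type is relied on.

.. code-block:: python

   import pandas as pd

   # Duplicate labels are accepted at construction time.
   dup = pd.Series([1, 2, 3], index=["a", "a", "b"])

   # Label lookup still works and returns every match for "a".
   matches = dup["a"]

   # Reindexing needs unique labels, so the error only surfaces here.
   try:
       dup.reindex(["a", "b", "c"])
       reindex_failed = False
   except ValueError:
       reindex_failed = True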
67
68**From dict**
69
70Series can be instantiated from dicts:
71
72.. ipython:: python
73
74   d = {"b": 1, "a": 0, "c": 2}
75   pd.Series(d)
76
77.. note::
78
79   When the data is a dict, and an index is not passed, the ``Series`` index
80   will be ordered by the dict's insertion order, if you're using Python
81   version >= 3.6 and pandas version >= 0.23.
82
83   If you're using Python < 3.6 or pandas < 0.23, and an index is not passed,
84   the ``Series`` index will be the lexically ordered list of dict keys.
85
86In the example above, if you were on a Python version lower than 3.6 or a
87pandas version lower than 0.23, the ``Series`` would be ordered by the lexical
88order of the dict keys (i.e. ``['a', 'b', 'c']`` rather than ``['b', 'a', 'c']``).
89
90If an index is passed, the values in data corresponding to the labels in the
91index will be pulled out.
92
93.. ipython:: python
94
95   d = {"a": 0.0, "b": 1.0, "c": 2.0}
96   pd.Series(d)
97   pd.Series(d, index=["b", "c", "d", "a"])
98
99.. note::
100
101    NaN (not a number) is the standard missing data marker used in pandas.
102
103**From scalar value**
104
105If ``data`` is a scalar value, an index must be
106provided. The value will be repeated to match the length of **index**.
107
108.. ipython:: python
109
110   pd.Series(5.0, index=["a", "b", "c", "d", "e"])
111
112Series is ndarray-like
113~~~~~~~~~~~~~~~~~~~~~~
114
``Series`` acts very similarly to an ``ndarray`` and is a valid argument to most NumPy functions.
116However, operations such as slicing will also slice the index.
117
.. ipython:: python

    s.iloc[0]
    s.iloc[:3]
    s[s > s.median()]
    s.iloc[[4, 3, 1]]
    np.exp(s)

.. note::

   We will address array-based indexing like ``s.iloc[[4, 3, 1]]``
   in the :ref:`section on indexing <indexing>`.
130
131Like a NumPy array, a pandas Series has a :attr:`~Series.dtype`.
132
133.. ipython:: python
134
135   s.dtype
136
137This is often a NumPy dtype. However, pandas and 3rd-party libraries
138extend NumPy's type system in a few places, in which case the dtype would
139be an :class:`~pandas.api.extensions.ExtensionDtype`. Some examples within
140pandas are :ref:`categorical` and :ref:`integer_na`. See :ref:`basics.dtypes`
141for more.
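
As a small sketch of the two extension dtypes mentioned above, the following
builds one categorical and one nullable-integer Series. The variable names are
illustrative only.

.. code-block:: python

   import pandas as pd

   cat = pd.Series(["low", "high", "low"], dtype="category")
   nullable = pd.Series([1, 2, None], dtype="Int64")

   # Both dtypes are pandas extension dtypes rather than NumPy dtypes.
   is_extension = [
       isinstance(ser.dtype, pd.api.extensions.ExtensionDtype)
       for ser in (cat, nullable)
   ]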
142
143If you need the actual array backing a ``Series``, use :attr:`Series.array`.
144
145.. ipython:: python
146
147   s.array
148
149Accessing the array can be useful when you need to do some operation without the
150index (to disable :ref:`automatic alignment <dsintro.alignment>`, for example).
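
A minimal sketch of the difference, assuming two Series that carry the same
labels in a different order (``left`` and ``right`` are made-up names):

.. code-block:: python

   import pandas as pd

   left = pd.Series([1, 2, 3], index=["a", "b", "c"])
   right = pd.Series([10, 20, 30], index=["c", "b", "a"])

   # Series arithmetic matches labels: "a" pairs 1 with 30, and so on.
   aligned = left + right

   # The backing arrays carry no labels, so values are paired by position.
   positional = left.array + right.array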
151
152:attr:`Series.array` will always be an :class:`~pandas.api.extensions.ExtensionArray`.
153Briefly, an ExtensionArray is a thin wrapper around one or more *concrete* arrays like a
154:class:`numpy.ndarray`. pandas knows how to take an ``ExtensionArray`` and
155store it in a ``Series`` or a column of a ``DataFrame``.
156See :ref:`basics.dtypes` for more.
157
158While Series is ndarray-like, if you need an *actual* ndarray, then use
159:meth:`Series.to_numpy`.
160
161.. ipython:: python
162
163   s.to_numpy()
164
165Even if the Series is backed by a :class:`~pandas.api.extensions.ExtensionArray`,
166:meth:`Series.to_numpy` will return a NumPy ndarray.
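
For instance, a Series with the nullable ``Int64`` dtype is backed by an
extension array, yet ``to_numpy`` still hands back a plain ndarray. The sketch
below uses illustrative names and the ``dtype`` and ``na_value`` arguments of
:meth:`Series.to_numpy`.

.. code-block:: python

   import numpy as np
   import pandas as pd

   nullable = pd.Series([1, 2, None], dtype="Int64")

   # Default conversion keeps the missing value as pd.NA.
   as_default = nullable.to_numpy()

   # Requesting a float dtype and a NaN fill gives a numeric ndarray.
   as_float = nullable.to_numpy(dtype="float64", na_value=np.nan)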
167
168Series is dict-like
169~~~~~~~~~~~~~~~~~~~
170
171A Series is like a fixed-size dict in that you can get and set values by index
172label:
173
174.. ipython:: python
175
176    s["a"]
177    s["e"] = 12.0
178    s
179    "e" in s
180    "f" in s
181
182If a label is not contained, an exception is raised:
183
184.. code-block:: python
185
186    >>> s["f"]
187    KeyError: 'f'
188
With the ``get`` method, a missing label returns ``None`` or the specified default:
190
191.. ipython:: python
192
193   s.get("f")
194
195   s.get("f", np.nan)
196
197See also the :ref:`section on attribute access<indexing.attribute_access>`.
198
199Vectorized operations and label alignment with Series
200~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
201
202When working with raw NumPy arrays, looping through value-by-value is usually
203not necessary. The same is true when working with Series in pandas.
Series can also be passed into most NumPy functions expecting an ndarray.
205
206.. ipython:: python
207
208    s + s
209    s * 2
210    np.exp(s)
211
212A key difference between Series and ndarray is that operations between Series
213automatically align the data based on label. Thus, you can write computations
214without giving consideration to whether the Series involved have the same
215labels.
216
217.. ipython:: python
218
219    s[1:] + s[:-1]
220
221The result of an operation between unaligned Series will have the **union** of
222the indexes involved. If a label is not found in one Series or the other, the
223result will be marked as missing ``NaN``. Being able to write code without doing
224any explicit data alignment grants immense freedom and flexibility in
225interactive data analysis and research. The integrated data alignment features
226of the pandas data structures set pandas apart from the majority of related
227tools for working with labeled data.
228
229.. note::
230
231    In general, we chose to make the default result of operations between
232    differently indexed objects yield the **union** of the indexes in order to
233    avoid loss of information. Having an index label, though the data is
234    missing, is typically important information as part of a computation. You
235    of course have the option of dropping labels with missing data via the
236    **dropna** function.
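
The union behavior and the ``dropna`` escape hatch can be sketched with two
partially overlapping Series (the names ``a`` and ``b`` are illustrative):

.. code-block:: python

   import pandas as pd

   a = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
   b = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

   # The result is indexed by the union of labels; "a" and "d" become NaN.
   total = a + b

   # Dropping missing values keeps only the labels present in both inputs.
   complete = total.dropna()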
237
238Name attribute
239~~~~~~~~~~~~~~
240
241.. _dsintro.name_attribute:
242
243Series can also have a ``name`` attribute:
244
245.. ipython:: python
246
247   s = pd.Series(np.random.randn(5), name="something")
248   s
249   s.name
250
251The Series ``name`` will be assigned automatically in many cases, in particular
252when taking 1D slices of DataFrame as you will see below.
253
254You can rename a Series with the :meth:`pandas.Series.rename` method.
255
256.. ipython:: python
257
258   s2 = s.rename("different")
259   s2.name
260
261Note that ``s`` and ``s2`` refer to different objects.
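
A quick way to confirm this, using illustrative names:

.. code-block:: python

   import pandas as pd

   original = pd.Series([1, 2, 3], name="something")
   renamed = original.rename("different")

   # rename returned a new Series, so the original keeps its name.
   same_object = renamed is original
   names = (original.name, renamed.name)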
262
263.. _basics.dataframe:
264
265DataFrame
266---------
267
268**DataFrame** is a 2-dimensional labeled data structure with columns of
269potentially different types. You can think of it like a spreadsheet or SQL
270table, or a dict of Series objects. It is generally the most commonly used
271pandas object. Like Series, DataFrame accepts many different kinds of input:
272
273* Dict of 1D ndarrays, lists, dicts, or Series
274* 2-D numpy.ndarray
275* `Structured or record
276  <https://numpy.org/doc/stable/user/basics.rec.html>`__ ndarray
277* A ``Series``
278* Another ``DataFrame``
279
280Along with the data, you can optionally pass **index** (row labels) and
281**columns** (column labels) arguments. If you pass an index and / or columns,
282you are guaranteeing the index and / or columns of the resulting
283DataFrame. Thus, a dict of Series plus a specific index will discard all data
284not matching up to the passed index.
285
286If axis labels are not passed, they will be constructed from the input data
287based on common sense rules.
288
289.. note::
290
291   When the data is a dict, and ``columns`` is not specified, the ``DataFrame``
292   columns will be ordered by the dict's insertion order, if you are using
293   Python version >= 3.6 and pandas >= 0.23.
294
295   If you are using Python < 3.6 or pandas < 0.23, and ``columns`` is not
296   specified, the ``DataFrame`` columns will be the lexically ordered list of dict
297   keys.
298
299From dict of Series or dicts
300~~~~~~~~~~~~~~~~~~~~~~~~~~~~
301
302The resulting **index** will be the **union** of the indexes of the various
303Series. If there are any nested dicts, these will first be converted to
304Series. If no columns are passed, the columns will be the ordered list of dict
305keys.
306
307.. ipython:: python
308
309    d = {
310        "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
311        "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
312    }
313    df = pd.DataFrame(d)
314    df
315
316    pd.DataFrame(d, index=["d", "b", "a"])
317    pd.DataFrame(d, index=["d", "b", "a"], columns=["two", "three"])
318
.. note::

   When a particular set of columns is passed along with a dict of data, the
   passed columns override the keys in the dict.

The row and column labels can be accessed through the **index** and
**columns** attributes, respectively:
327.. ipython:: python
328
329   df.index
330   df.columns
331
332From dict of ndarrays / lists
333~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
334
The ndarrays must all be the same length. If an index is passed, it must
also be the same length as the arrays. If no index is passed, the
resulting index will be ``range(n)``, where ``n`` is the array length.
338
339.. ipython:: python
340
341   d = {"one": [1.0, 2.0, 3.0, 4.0], "two": [4.0, 3.0, 2.0, 1.0]}
342   pd.DataFrame(d)
343   pd.DataFrame(d, index=["a", "b", "c", "d"])
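
The length requirement can be checked with a short sketch. The error messages
vary between pandas versions, so the example only relies on ``ValueError``
being raised; the variable names are illustrative.

.. code-block:: python

   import pandas as pd

   # Columns of different lengths are rejected.
   try:
       pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [4.0, 3.0]})
       length_error = None
   except ValueError as err:
       length_error = err

   # An index whose length differs from the arrays is rejected as well.
   try:
       pd.DataFrame({"one": [1.0, 2.0, 3.0]}, index=["a", "b"])
       index_error = None
   except ValueError as err:
       index_error = err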
344
345From structured or record array
346~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
347
348This case is handled identically to a dict of arrays.
349
350.. ipython:: python
351
   data = np.zeros((2,), dtype=[("A", "i4"), ("B", "f4"), ("C", "S10")])
353   data[:] = [(1, 2.0, "Hello"), (2, 3.0, "World")]
354
355   pd.DataFrame(data)
356   pd.DataFrame(data, index=["first", "second"])
357   pd.DataFrame(data, columns=["C", "A", "B"])
358
359.. note::
360
361    DataFrame is not intended to work exactly like a 2-dimensional NumPy
362    ndarray.
363
364.. _basics.dataframe.from_list_of_dicts:
365
366From a list of dicts
367~~~~~~~~~~~~~~~~~~~~
368
369.. ipython:: python
370
371   data2 = [{"a": 1, "b": 2}, {"a": 5, "b": 10, "c": 20}]
372   pd.DataFrame(data2)
373   pd.DataFrame(data2, index=["first", "second"])
374   pd.DataFrame(data2, columns=["a", "b"])
375
376.. _basics.dataframe.from_dict_of_tuples:
377
378From a dict of tuples
379~~~~~~~~~~~~~~~~~~~~~
380
You can automatically create a MultiIndexed frame by passing a dictionary of
tuples.
383
384.. ipython:: python
385
386   pd.DataFrame(
387       {
388           ("a", "b"): {("A", "B"): 1, ("A", "C"): 2},
389           ("a", "a"): {("A", "C"): 3, ("A", "B"): 4},
390           ("a", "c"): {("A", "B"): 5, ("A", "C"): 6},
391           ("b", "a"): {("A", "C"): 7, ("A", "B"): 8},
392           ("b", "b"): {("A", "D"): 9, ("A", "B"): 10},
393       }
394   )
395
396.. _basics.dataframe.from_series:
397
398From a Series
399~~~~~~~~~~~~~
400
The result will be a DataFrame with the same index as the input Series, and
with one column whose name is the original name of the Series (only if no other
column name is provided).
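
A minimal sketch of this case, with illustrative names. It also shows
:meth:`Series.to_frame`, which is one way to choose the column label
explicitly.

.. code-block:: python

   import pandas as pd

   named = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"], name="price")
   frame = pd.DataFrame(named)  # one column, labelled "price"

   unnamed = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
   frame_default = pd.DataFrame(unnamed)  # one column, labelled 0

   relabelled = named.to_frame(name="cost")  # explicit column label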
404
405
406.. _basics.dataframe.from_list_namedtuples:
407
408From a list of namedtuples
409~~~~~~~~~~~~~~~~~~~~~~~~~~
410
411The field names of the first ``namedtuple`` in the list determine the columns
412of the ``DataFrame``. The remaining namedtuples (or tuples) are simply unpacked
413and their values are fed into the rows of the ``DataFrame``. If any of those
414tuples is shorter than the first ``namedtuple`` then the later columns in the
415corresponding row are marked as missing values. If any are longer than the
416first ``namedtuple``, a ``ValueError`` is raised.
417
418.. ipython:: python
419
420    from collections import namedtuple
421
422    Point = namedtuple("Point", "x y")
423
424    pd.DataFrame([Point(0, 0), Point(0, 3), (2, 3)])
425
426    Point3D = namedtuple("Point3D", "x y z")
427
428    pd.DataFrame([Point3D(0, 0, 0), Point3D(0, 3, 5), Point(2, 3)])
429
430
431.. _basics.dataframe.from_list_dataclasses:
432
433From a list of dataclasses
434~~~~~~~~~~~~~~~~~~~~~~~~~~
435
436.. versionadded:: 1.1.0
437
Data classes, as introduced in `PEP557 <https://www.python.org/dev/peps/pep-0557>`__,
can be passed into the DataFrame constructor.
Passing a list of dataclasses is equivalent to passing a list of dictionaries.

Please be aware that all values in the list should be dataclasses; mixing
types in the list would result in a ``TypeError``.
444
445.. ipython:: python
446
447    from dataclasses import make_dataclass
448
449    Point = make_dataclass("Point", [("x", int), ("y", int)])
450
451    pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])
452
453**Missing data**
454
455Much more will be said on this topic in the :ref:`Missing data <missing_data>`
456section. To construct a DataFrame with missing data, we use ``np.nan`` to
represent missing values. Alternatively, you may pass a ``numpy.ma.MaskedArray``
458as the data argument to the DataFrame constructor, and its masked entries will
459be considered missing.
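
Both routes can be sketched as follows. The names and the sample values are
made up for illustration.

.. code-block:: python

   import numpy as np
   import pandas as pd

   # Route 1: mark the missing entry with np.nan directly.
   with_nan = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})

   # Route 2: mask an entry in a NumPy masked array (here the value 2.0).
   masked = np.ma.masked_array(
       [[1.0, 2.0], [3.0, 4.0]],
       mask=[[False, True], [False, False]],
   )
   from_masked = pd.DataFrame(masked, columns=["a", "b"])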
460
461Alternate constructors
462~~~~~~~~~~~~~~~~~~~~~~
463
464.. _basics.dataframe.from_dict:
465
466**DataFrame.from_dict**
467
468``DataFrame.from_dict`` takes a dict of dicts or a dict of array-like sequences
469and returns a DataFrame. It operates like the ``DataFrame`` constructor except
470for the ``orient`` parameter which is ``'columns'`` by default, but which can be
471set to ``'index'`` in order to use the dict keys as row labels.
472
473
474.. ipython:: python
475
476   pd.DataFrame.from_dict(dict([("A", [1, 2, 3]), ("B", [4, 5, 6])]))
477
478If you pass ``orient='index'``, the keys will be the row labels. In this
479case, you can also pass the desired column names:
480
481.. ipython:: python
482
483   pd.DataFrame.from_dict(
484       dict([("A", [1, 2, 3]), ("B", [4, 5, 6])]),
485       orient="index",
486       columns=["one", "two", "three"],
487   )
488
489.. _basics.dataframe.from_records:
490
491**DataFrame.from_records**
492
493``DataFrame.from_records`` takes a list of tuples or an ndarray with structured
494dtype. It works analogously to the normal ``DataFrame`` constructor, except that
495the resulting DataFrame index may be a specific field of the structured
496dtype. For example:
497
498.. ipython:: python
499
500   data
501   pd.DataFrame.from_records(data, index="C")
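
The list-of-tuples input works the same way. In this sketch the field names
are supplied through ``columns`` and one of them is used as the index; the
record values and names are illustrative.

.. code-block:: python

   import pandas as pd

   records = [(1, "x", 0.5), (2, "y", 1.5)]

   frame = pd.DataFrame.from_records(
       records,
       columns=["id", "label", "score"],
       index="id",  # use the "id" field as the row labels
   )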
502
503.. _basics.dataframe.sel_add_del:
504
505Column selection, addition, deletion
506~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
507
508You can treat a DataFrame semantically like a dict of like-indexed Series
509objects. Getting, setting, and deleting columns works with the same syntax as
510the analogous dict operations:
511
512.. ipython:: python
513
514   df["one"]
515   df["three"] = df["one"] * df["two"]
516   df["flag"] = df["one"] > 2
517   df
518
519Columns can be deleted or popped like with a dict:
520
521.. ipython:: python
522
523   del df["two"]
524   three = df.pop("three")
525   df
526
527When inserting a scalar value, it will naturally be propagated to fill the
528column:
529
530.. ipython:: python
531
532   df["foo"] = "bar"
533   df
534
535When inserting a Series that does not have the same index as the DataFrame, it
536will be conformed to the DataFrame's index:
537
538.. ipython:: python
539
540   df["one_trunc"] = df["one"][:2]
541   df
542
543You can insert raw ndarrays but their length must match the length of the
544DataFrame's index.
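
A short sketch of the length rule, with illustrative names. Only the exception
type is relied on, since the message wording differs across pandas versions.

.. code-block:: python

   import numpy as np
   import pandas as pd

   frame = pd.DataFrame({"one": [1.0, 2.0, 3.0]}, index=["a", "b", "c"])

   # Length 3 matches the index, so the values are inserted by position.
   frame["ok"] = np.array([10, 20, 30])

   # Length 2 does not match, so the assignment is rejected.
   try:
       frame["bad"] = np.array([10, 20])
       insert_error = None
   except ValueError as err:
       insert_error = err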
545
By default, columns get inserted at the end. The ``insert`` method is
547available to insert at a particular location in the columns:
548
549.. ipython:: python
550
551   df.insert(1, "bar", df["one"])
552   df
553
554.. _dsintro.chained_assignment:
555
556Assigning new columns in method chains
557~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
558
559Inspired by `dplyr's
560<https://dplyr.tidyverse.org/reference/mutate.html>`__
561``mutate`` verb, DataFrame has an :meth:`~pandas.DataFrame.assign`
562method that allows you to easily create new columns that are potentially
563derived from existing columns.
564
565.. ipython:: python
566
567   iris = pd.read_csv("data/iris.data")
568   iris.head()
569   iris.assign(sepal_ratio=iris["SepalWidth"] / iris["SepalLength"]).head()
570
571In the example above, we inserted a precomputed value. We can also pass in
572a function of one argument to be evaluated on the DataFrame being assigned to.
573
574.. ipython:: python
575
576   iris.assign(sepal_ratio=lambda x: (x["SepalWidth"] / x["SepalLength"])).head()
577
578``assign`` **always** returns a copy of the data, leaving the original
579DataFrame untouched.
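
This can be verified on a small frame (the names ``base`` and ``extended`` are
illustrative):

.. code-block:: python

   import pandas as pd

   base = pd.DataFrame({"A": [1, 2, 3]})
   extended = base.assign(B=lambda x: x["A"] * 2)

   # The new column exists only on the returned object.
   base_columns = list(base.columns)
   extended_columns = list(extended.columns)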
580
581Passing a callable, as opposed to an actual value to be inserted, is
582useful when you don't have a reference to the DataFrame at hand. This is
583common when using ``assign`` in a chain of operations. For example,
584we can limit the DataFrame to just those observations with a Sepal Length
585greater than 5, calculate the ratio, and plot:
586
587.. ipython:: python
588
589   @savefig basics_assign.png
590   (
591       iris.query("SepalLength > 5")
592       .assign(
593           SepalRatio=lambda x: x.SepalWidth / x.SepalLength,
594           PetalRatio=lambda x: x.PetalWidth / x.PetalLength,
595       )
596       .plot(kind="scatter", x="SepalRatio", y="PetalRatio")
597   )
598
599Since a function is passed in, the function is computed on the DataFrame
600being assigned to. Importantly, this is the DataFrame that's been filtered
601to those rows with sepal length greater than 5. The filtering happens first,
602and then the ratio calculations. This is an example where we didn't
603have a reference to the *filtered* DataFrame available.
604
605The function signature for ``assign`` is simply ``**kwargs``. The keys
606are the column names for the new fields, and the values are either a value
607to be inserted (for example, a ``Series`` or NumPy array), or a function
608of one argument to be called on the ``DataFrame``. A *copy* of the original
609DataFrame is returned, with the new values inserted.
610
611Starting with Python 3.6 the order of ``**kwargs`` is preserved. This allows
612for *dependent* assignment, where an expression later in ``**kwargs`` can refer
613to a column created earlier in the same :meth:`~DataFrame.assign`.
614
615.. ipython:: python
616
617   dfa = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
618   dfa.assign(C=lambda x: x["A"] + x["B"], D=lambda x: x["A"] + x["C"])
619
In the second expression, ``x['C']`` refers to the newly created column,
which is equal to ``dfa['A'] + dfa['B']``.
622
623
624Indexing / selection
~~~~~~~~~~~~~~~~~~~~

The basics of indexing are as follows:
627
628.. csv-table::
629    :header: "Operation", "Syntax", "Result"
630    :widths: 30, 20, 10
631
632    Select column, ``df[col]``, Series
633    Select row by label, ``df.loc[label]``, Series
634    Select row by integer location, ``df.iloc[loc]``, Series
635    Slice rows, ``df[5:10]``, DataFrame
636    Select rows by boolean vector, ``df[bool_vec]``, DataFrame
637
638Row selection, for example, returns a Series whose index is the columns of the
639DataFrame:
640
641.. ipython:: python
642
643   df.loc["b"]
644   df.iloc[2]
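
The remaining rows of the table (row slicing and boolean selection) can be
sketched on a small, self-contained frame. The frame and its labels are made
up for this example.

.. code-block:: python

   import pandas as pd

   frame = pd.DataFrame(
       {"x": [1, 2, 3, 4], "y": [10, 20, 30, 40]},
       index=["a", "b", "c", "d"],
   )

   column = frame["x"]  # Series
   row = frame.loc["b"]  # Series indexed by the column labels
   sliced = frame[1:3]  # DataFrame with rows "b" and "c"
   filtered = frame[frame["x"] > 2]  # DataFrame with rows "c" and "d"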
645
646For a more exhaustive treatment of sophisticated label-based indexing and
647slicing, see the :ref:`section on indexing <indexing>`. We will address the
648fundamentals of reindexing / conforming to new sets of labels in the
649:ref:`section on reindexing <basics.reindexing>`.
650
651.. _dsintro.alignment:
652
653Data alignment and arithmetic
654~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
655
Operations between DataFrame objects automatically align on **both the
columns and the index (row labels)**. Again, the resulting object will have the
union of the column and row labels.
659
660.. ipython:: python
661
662    df = pd.DataFrame(np.random.randn(10, 4), columns=["A", "B", "C", "D"])
663    df2 = pd.DataFrame(np.random.randn(7, 3), columns=["A", "B", "C"])
664    df + df2
665
666When doing an operation between DataFrame and Series, the default behavior is
667to align the Series **index** on the DataFrame **columns**, thus `broadcasting
668<https://numpy.org/doc/stable/user/basics.broadcasting.html>`__
669row-wise. For example:
670
671.. ipython:: python
672
673   df - df.iloc[0]
674
675For explicit control over the matching and broadcasting behavior, see the
676section on :ref:`flexible binary operations <basics.binop>`.
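
As a brief sketch of that explicit control, :meth:`DataFrame.sub` with
``axis=0`` matches a Series against the row labels instead of the columns. The
frame below is illustrative.

.. code-block:: python

   import pandas as pd

   frame = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]})

   # Default: the Series index is matched to the columns (row-wise broadcast).
   by_row = frame - frame.iloc[0]

   # axis=0: the Series index is matched to the row labels instead.
   by_column = frame.sub(frame["A"], axis=0)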
677
678Operations with scalars are just as you would expect:
679
680.. ipython:: python
681
682   df * 5 + 2
683   1 / df
684   df ** 4
685
686.. _dsintro.boolean:
687
688Boolean operators work as well:
689
690.. ipython:: python
691
692   df1 = pd.DataFrame({"a": [1, 0, 1], "b": [0, 1, 1]}, dtype=bool)
693   df2 = pd.DataFrame({"a": [0, 1, 1], "b": [1, 1, 0]}, dtype=bool)
694   df1 & df2
695   df1 | df2
696   df1 ^ df2
697   -df1
698
699Transposing
700~~~~~~~~~~~
701
To transpose, access the ``T`` attribute or call the ``transpose`` method,
similar to an ndarray:
704
705.. ipython:: python
706
707   # only show the first 5 rows
708   df[:5].T
709
710.. _dsintro.numpy_interop:
711
712DataFrame interoperability with NumPy functions
713~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
714
715Elementwise NumPy ufuncs (log, exp, sqrt, ...) and various other NumPy functions
716can be used with no issues on Series and DataFrame, assuming the data within
717are numeric:
718
719.. ipython:: python
720
721   np.exp(df)
722   np.asarray(df)
723
724DataFrame is not intended to be a drop-in replacement for ndarray as its
725indexing semantics and data model are quite different in places from an n-dimensional
726array.
727
728:class:`Series` implements ``__array_ufunc__``, which allows it to work with NumPy's
729`universal functions <https://numpy.org/doc/stable/reference/ufuncs.html>`_.
730
731The ufunc is applied to the underlying array in a Series.
732
733.. ipython:: python
734
735   ser = pd.Series([1, 2, 3, 4])
736   np.exp(ser)
737
738.. versionchanged:: 0.25.0
739
740   When multiple ``Series`` are passed to a ufunc, they are aligned before
741   performing the operation.
742
743Like other parts of the library, pandas will automatically align labeled inputs
as part of a ufunc with multiple inputs. For example, using :func:`numpy.remainder`
745on two :class:`Series` with differently ordered labels will align before the operation.
746
747.. ipython:: python
748
749   ser1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
750   ser2 = pd.Series([1, 3, 5], index=["b", "a", "c"])
751   ser1
752   ser2
753   np.remainder(ser1, ser2)
754
755As usual, the union of the two indices is taken, and non-overlapping values are filled
756with missing values.
757
758.. ipython:: python
759
760   ser3 = pd.Series([2, 4, 6], index=["b", "c", "d"])
761   ser3
762   np.remainder(ser1, ser3)
763
764When a binary ufunc is applied to a :class:`Series` and :class:`Index`, the Series
765implementation takes precedence and a Series is returned.
766
767.. ipython:: python
768
769   ser = pd.Series([1, 2, 3])
770   idx = pd.Index([4, 5, 6])
771
772   np.maximum(ser, idx)
773
774NumPy ufuncs are safe to apply to :class:`Series` backed by non-ndarray arrays,
775for example :class:`arrays.SparseArray` (see :ref:`sparse.calculation`). If possible,
776the ufunc is applied without converting the underlying data to an ndarray.
777
778Console display
779~~~~~~~~~~~~~~~
780
781Very large DataFrames will be truncated to display them in the console.
782You can also get a summary using :meth:`~pandas.DataFrame.info`.
(Here we read a CSV version of the **baseball** dataset from the **plyr**
R package):
785
786.. ipython:: python
787   :suppress:
788
789   # force a summary to be printed
790   pd.set_option("display.max_rows", 5)
791
792.. ipython:: python
793
794   baseball = pd.read_csv("data/baseball.csv")
795   print(baseball)
796   baseball.info()
797
798.. ipython:: python
799   :suppress:
800   :okwarning:
801
802   # restore GlobalPrintConfig
803   pd.reset_option(r"^display\.")
804
805However, using ``to_string`` will return a string representation of the
806DataFrame in tabular form, though it won't always fit the console width:
807
808.. ipython:: python
809
810   print(baseball.iloc[-20:, :12].to_string())
811
812Wide DataFrames will be printed across multiple rows by
813default:
814
815.. ipython:: python
816
817   pd.DataFrame(np.random.randn(3, 12))
818
819You can change how much to print on a single row by setting the ``display.width``
820option:
821
822.. ipython:: python
823
824   pd.set_option("display.width", 40)  # default is 80
825
826   pd.DataFrame(np.random.randn(3, 12))
827
You can adjust the max width of the individual columns by setting ``display.max_colwidth``:
829
830.. ipython:: python
831
832   datafile = {
833       "filename": ["filename_01", "filename_02"],
834       "path": [
835           "media/user_name/storage/folder_01/filename_01",
836           "media/user_name/storage/folder_02/filename_02",
837       ],
838   }
839
840   pd.set_option("display.max_colwidth", 30)
841   pd.DataFrame(datafile)
842
843   pd.set_option("display.max_colwidth", 100)
844   pd.DataFrame(datafile)
845
846.. ipython:: python
847   :suppress:
848
849   pd.reset_option("display.width")
850   pd.reset_option("display.max_colwidth")
851
You can also disable the wrapping of wide DataFrames across multiple rows via
the ``expand_frame_repr`` option. The table is then printed in one block.
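
A minimal sketch of the option in use, assuming a frame wide enough to wrap at
the default width. ``pd.option_context`` keeps the change local to the
``with`` block; ``display.max_columns`` is set to ``None`` here only so that no
columns are elided in the example. The frame and its column names are made up.

.. code-block:: python

   import numpy as np
   import pandas as pd

   wide = pd.DataFrame(
       np.zeros((3, 12)),
       columns=[f"column_{i:02d}" for i in range(12)],
   )

   with pd.option_context(
       "display.expand_frame_repr", False, "display.max_columns", None
   ):
       # One header line plus one line per row, however wide the frame is.
       one_block = repr(wide)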
854
855DataFrame column attribute access and IPython completion
856~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
857
858If a DataFrame column label is a valid Python variable name, the column can be
859accessed like an attribute:
860
861.. ipython:: python
862
863   df = pd.DataFrame({"foo1": np.random.randn(5), "foo2": np.random.randn(5)})
864   df
865   df.foo1
866
867The columns are also connected to the `IPython <https://ipython.org>`__
868completion mechanism so they can be tab-completed:
869
870.. code-block:: ipython
871
872    In [5]: df.fo<TAB>  # noqa: E225, E999
873    df.foo1  df.foo2
874