.. _dsintro:

{{ header }}

************************
Intro to data structures
************************

We'll start with a quick, non-comprehensive overview of the fundamental data
structures in pandas to get you started. The fundamental behavior regarding data
types, indexing, axis labeling, and alignment applies across all of the
objects. To get started, import NumPy and load pandas into your namespace:

.. ipython:: python

   import numpy as np
   import pandas as pd

Here is a basic tenet to keep in mind: **data alignment is intrinsic**. The link
between labels and data will not be broken unless you do so explicitly.

We'll give a brief intro to the data structures, then consider all of the broad
categories of functionality and methods in separate sections.

.. _basics.series:

Series
------

:class:`Series` is a one-dimensional labeled array capable of holding any data
type (integers, strings, floating point numbers, Python objects, etc.). The axis
labels are collectively referred to as the **index**. The basic method to create
a :class:`Series` is to call:

::

   >>> s = pd.Series(data, index=index)

Here, ``data`` can be many different things:

* a Python dict
* an ndarray
* a scalar value (like 5)

The passed **index** is a list of axis labels. Thus, this separates into a few
cases depending on what **data** is:

**From ndarray**

If ``data`` is an ndarray, **index** must be the same length as **data**. If no
index is passed, one will be created having values ``[0, ..., len(data) - 1]``.

.. ipython:: python

   s = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])
   s
   s.index

   pd.Series(np.random.randn(5))

.. note::

   pandas supports non-unique index values. If an operation
   that does not support duplicate index values is attempted, an exception
   will be raised at that time. The reason for checking lazily is nearly all
   performance-based (there are many instances in computations, like parts of
   GroupBy, where the index is not used).

**From dict**

Series can be instantiated from dicts:

.. ipython:: python

   d = {"b": 1, "a": 0, "c": 2}
   pd.Series(d)

.. note::

   When the data is a dict, and an index is not passed, the ``Series`` index
   will be ordered by the dict's insertion order, if you're using Python
   version >= 3.6 and pandas version >= 0.23.

   If you're using Python < 3.6 or pandas < 0.23, and an index is not passed,
   the ``Series`` index will be the lexically ordered list of dict keys.

In the example above, if you were on a Python version lower than 3.6 or a
pandas version lower than 0.23, the ``Series`` would be ordered by the lexical
order of the dict keys (i.e. ``['a', 'b', 'c']`` rather than ``['b', 'a', 'c']``).

If an index is passed, the values in ``data`` corresponding to the labels in the
index will be pulled out. Labels in the index that are not keys of the dict
(``"d"`` below) get a missing value.

.. ipython:: python

   d = {"a": 0.0, "b": 1.0, "c": 2.0}
   pd.Series(d)
   pd.Series(d, index=["b", "c", "d", "a"])

.. note::

   NaN (not a number) is the standard missing data marker used in pandas.

**From scalar value**

If ``data`` is a scalar value, an index must be provided. The value will be
repeated to match the length of **index**.

.. ipython:: python

   pd.Series(5.0, index=["a", "b", "c", "d", "e"])

Series is ndarray-like
~~~~~~~~~~~~~~~~~~~~~~

``Series`` acts very similarly to an ``ndarray`` and is a valid argument to most
NumPy functions. However, operations such as slicing will also slice the index.

.. ipython:: python

   s.iloc[0]
   s.iloc[:3]
   s[s > s.median()]
   s.iloc[[4, 3, 1]]
   np.exp(s)

.. note::

   We will address array-based indexing like ``s.iloc[[4, 3, 1]]``
   in the :ref:`section on indexing <indexing>`.
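
The construction cases and the ndarray-like behavior above can be condensed into
one short script. This is a sketch with made-up values. It uses the label-based
``.loc`` and position-based ``.iloc`` accessors so that the intent of each lookup
is explicit.

.. code-block:: python

   import numpy as np
   import pandas as pd

   # Illustrative values; any numeric data would do.
   s = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], index=["a", "b", "c", "d", "e"])

   # From a dict with an explicit index: labels missing from the dict get NaN.
   from_dict = pd.Series({"a": 0.0, "b": 1.0}, index=["b", "z"])

   # From a scalar: the value is repeated once per index label.
   from_scalar = pd.Series(5.0, index=["x", "y", "z"])

   # The same element, looked up by label and by position.
   by_label = s.loc["c"]
   by_position = s.iloc[2]

   # Slicing by position also slices the index.
   head = s.iloc[:3]

   # Boolean masks and NumPy ufuncs keep the labels attached to the values.
   above_median = s[s > s.median()]
   exp_s = np.exp(s)

   print(from_dict)
   print(head.index.tolist())          # ['a', 'b', 'c']
   print(above_median.index.tolist())  # ['d', 'e']

The median of these five values is ``30.0``, so only the labels ``"d"`` and
``"e"`` survive the boolean mask.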

Like a NumPy array, a pandas Series has a :attr:`~Series.dtype`.

.. ipython:: python

   s.dtype

This is often a NumPy dtype. However, pandas and 3rd-party libraries
extend NumPy's type system in a few places, in which case the dtype would
be an :class:`~pandas.api.extensions.ExtensionDtype`. Some examples within
pandas are :ref:`categorical` and :ref:`integer_na`. See :ref:`basics.dtypes`
for more.

If you need the actual array backing a ``Series``, use :attr:`Series.array`.

.. ipython:: python

   s.array

Accessing the array can be useful when you need to do some operation without the
index (to disable :ref:`automatic alignment <dsintro.alignment>`, for example).

:attr:`Series.array` will always be an :class:`~pandas.api.extensions.ExtensionArray`.
Briefly, an ExtensionArray is a thin wrapper around one or more *concrete* arrays like a
:class:`numpy.ndarray`. pandas knows how to take an ``ExtensionArray`` and
store it in a ``Series`` or a column of a ``DataFrame``.
See :ref:`basics.dtypes` for more.

While Series is ndarray-like, if you need an *actual* ndarray, then use
:meth:`Series.to_numpy`.

.. ipython:: python

   s.to_numpy()

Even if the Series is backed by a :class:`~pandas.api.extensions.ExtensionArray`,
:meth:`Series.to_numpy` will return a NumPy ndarray.

Series is dict-like
~~~~~~~~~~~~~~~~~~~

A Series is like a fixed-size dict in that you can get and set values by index
label:

.. ipython:: python

   s["a"]
   s["e"] = 12.0
   s
   "e" in s
   "f" in s

If a label is not contained, an exception is raised:

.. code-block:: python

   >>> s["f"]
   KeyError: 'f'

With the ``get`` method, a missing label returns ``None`` or a specified default:

.. ipython:: python

   s.get("f")

   s.get("f", np.nan)

See also the :ref:`section on attribute access <indexing.attribute_access>`.

Vectorized operations and label alignment with Series
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When working with raw NumPy arrays, looping through value-by-value is usually
not necessary. The same is true when working with Series in pandas.
Series can also be passed into most NumPy functions expecting an ndarray.

.. ipython:: python

   s + s
   s * 2
   np.exp(s)

A key difference between Series and ndarray is that operations between Series
automatically align the data based on label. Thus, you can write computations
without giving consideration to whether the Series involved have the same
labels.

.. ipython:: python

   s.iloc[1:] + s.iloc[:-1]

The result of an operation between unaligned Series will have the **union** of
the indexes involved. If a label is not found in one Series or the other, the
result will be marked as missing (``NaN``). Being able to write code without doing
any explicit data alignment grants immense freedom and flexibility in
interactive data analysis and research. The integrated data alignment features
of the pandas data structures set pandas apart from the majority of related
tools for working with labeled data.

.. note::

   In general, we chose to make the default result of operations between
   differently indexed objects yield the **union** of the indexes in order to
   avoid loss of information. Having an index label, though the data is
   missing, is typically important information as part of a computation. You
   of course have the option of dropping labels with missing data via the
   :meth:`~Series.dropna` method.

.. _dsintro.name_attribute:

Name attribute
~~~~~~~~~~~~~~

Series can also have a ``name`` attribute:

.. ipython:: python

   s = pd.Series(np.random.randn(5), name="something")
   s
   s.name

The Series ``name`` will be assigned automatically in many cases, in particular
when taking 1D slices of a DataFrame, as you will see below.

You can rename a Series with the :meth:`pandas.Series.rename` method.

.. ipython:: python

   s2 = s.rename("different")
   s2.name

Note that ``s`` and ``s2`` refer to different objects.

.. _basics.dataframe:

DataFrame
---------

**DataFrame** is a 2-dimensional labeled data structure with columns of
potentially different types. You can think of it like a spreadsheet or SQL
table, or a dict of Series objects. It is generally the most commonly used
pandas object. Like Series, DataFrame accepts many different kinds of input:

* Dict of 1D ndarrays, lists, dicts, or Series
* 2-D numpy.ndarray
* `Structured or record
  <https://numpy.org/doc/stable/user/basics.rec.html>`__ ndarray
* A ``Series``
* Another ``DataFrame``

Along with the data, you can optionally pass **index** (row labels) and
**columns** (column labels) arguments. If you pass an index and / or columns,
you are guaranteeing the index and / or columns of the resulting
DataFrame. Thus, a dict of Series plus a specific index will discard all data
not matching up to the passed index.

If axis labels are not passed, they will be constructed from the input data
based on common sense rules.

.. note::

   When the data is a dict, and ``columns`` is not specified, the ``DataFrame``
   columns will be ordered by the dict's insertion order, if you are using
   Python version >= 3.6 and pandas >= 0.23.

   If you are using Python < 3.6 or pandas < 0.23, and ``columns`` is not
   specified, the ``DataFrame`` columns will be the lexically ordered list of dict
   keys.
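
The ``name`` behavior and the column-ordering note can be checked directly. The
sketch below uses small, made-up data. Selecting ``df["a"]`` is a 1D slice of a
DataFrame, so the resulting Series takes its name from the column label.

.. code-block:: python

   import pandas as pd

   s = pd.Series([1.0, 2.0, 3.0], name="something")

   # rename with a scalar returns a new Series carrying the new name;
   # the original keeps its name.
   s2 = s.rename("different")

   # Dict keys are inserted as "b" first, then "a"; no columns argument is given.
   df = pd.DataFrame({"b": [1, 2], "a": [3, 4]})

   # Selecting a column yields a Series named after that column.
   col = df["a"]

   print(s.name, s2.name)      # something different
   print(df.columns.tolist())  # ['b', 'a']
   print(col.name)             # a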

From dict of Series or dicts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The resulting **index** will be the **union** of the indexes of the various
Series. If there are any nested dicts, these will first be converted to
Series. If no columns are passed, the columns will be the dict keys, in
insertion order.

.. ipython:: python

   d = {
       "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
       "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
   }
   df = pd.DataFrame(d)
   df

   pd.DataFrame(d, index=["d", "b", "a"])
   pd.DataFrame(d, index=["d", "b", "a"], columns=["two", "three"])

.. note::

   When a particular set of columns is passed along with a dict of data, the
   passed columns override the keys in the dict.

The row and column labels can be accessed through the **index** and
**columns** attributes, respectively:

.. ipython:: python

   df.index
   df.columns

From dict of ndarrays / lists
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ndarrays or lists must all be the same length. If an index is passed, it
must also be the same length as the arrays. If no index is passed, the
resulting index will be ``range(n)``, where ``n`` is the array length.

.. ipython:: python

   d = {"one": [1.0, 2.0, 3.0, 4.0], "two": [4.0, 3.0, 2.0, 1.0]}
   pd.DataFrame(d)
   pd.DataFrame(d, index=["a", "b", "c", "d"])

From structured or record array
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This case is handled identically to a dict of arrays.

.. ipython:: python

   data = np.zeros((2,), dtype=[("A", "i4"), ("B", "f4"), ("C", "a10")])
   data[:] = [(1, 2.0, "Hello"), (2, 3.0, "World")]

   pd.DataFrame(data)
   pd.DataFrame(data, index=["first", "second"])
   pd.DataFrame(data, columns=["C", "A", "B"])

.. note::

   DataFrame is not intended to work exactly like a 2-dimensional NumPy
   ndarray.

.. _basics.dataframe.from_list_of_dicts:

From a list of dicts
~~~~~~~~~~~~~~~~~~~~

Each dict becomes one row. The columns are the union of the dict keys, and a key
missing from a given dict produces a missing value in that row.

.. ipython:: python

   data2 = [{"a": 1, "b": 2}, {"a": 5, "b": 10, "c": 20}]
   pd.DataFrame(data2)
   pd.DataFrame(data2, index=["first", "second"])
   pd.DataFrame(data2, columns=["a", "b"])

.. _basics.dataframe.from_dict_of_tuples:

From a dict of tuples
~~~~~~~~~~~~~~~~~~~~~

You can automatically create a MultiIndexed frame by passing a dictionary whose
keys are tuples.

.. ipython:: python

   pd.DataFrame(
       {
           ("a", "b"): {("A", "B"): 1, ("A", "C"): 2},
           ("a", "a"): {("A", "C"): 3, ("A", "B"): 4},
           ("a", "c"): {("A", "B"): 5, ("A", "C"): 6},
           ("b", "a"): {("A", "C"): 7, ("A", "B"): 8},
           ("b", "b"): {("A", "D"): 9, ("A", "B"): 10},
       }
   )

.. _basics.dataframe.from_series:

From a Series
~~~~~~~~~~~~~

The result will be a DataFrame with the same index as the input Series, and
with one column whose name is the original name of the Series (only if no other
column name is provided).

.. _basics.dataframe.from_list_namedtuples:

From a list of namedtuples
~~~~~~~~~~~~~~~~~~~~~~~~~~

The field names of the first ``namedtuple`` in the list determine the columns
of the ``DataFrame``. The remaining namedtuples (or tuples) are simply unpacked
and their values are fed into the rows of the ``DataFrame``. If any of those
tuples is shorter than the first ``namedtuple``, then the later columns in the
corresponding row are marked as missing values. If any are longer than the
first ``namedtuple``, a ``ValueError`` is raised.

.. ipython:: python

   from collections import namedtuple

   Point = namedtuple("Point", "x y")

   pd.DataFrame([Point(0, 0), Point(0, 3), (2, 3)])

   Point3D = namedtuple("Point3D", "x y z")

   pd.DataFrame([Point3D(0, 0, 0), Point3D(0, 3, 5), Point(2, 3)])

.. _basics.dataframe.from_list_dataclasses:

From a list of dataclasses
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. versionadded:: 1.1.0

Data classes, as introduced in `PEP 557 <https://www.python.org/dev/peps/pep-0557>`__,
can be passed into the DataFrame constructor.
Passing a list of dataclasses is equivalent to passing a list of dictionaries.

Be aware that all values in the list should be dataclasses; mixing
types in the list results in a ``TypeError``.

.. ipython:: python

   from dataclasses import make_dataclass

   Point = make_dataclass("Point", [("x", int), ("y", int)])

   pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])

**Missing data**

Much more will be said on this topic in the :ref:`Missing data <missing_data>`
section. To construct a DataFrame with missing data, we use ``np.nan`` to
represent missing values. Alternatively, you may pass a ``numpy.ma.MaskedArray``
as the data argument to the DataFrame constructor, and its masked entries will
be considered missing.

Alternate constructors
~~~~~~~~~~~~~~~~~~~~~~

.. _basics.dataframe.from_dict:

**DataFrame.from_dict**

``DataFrame.from_dict`` takes a dict of dicts or a dict of array-like sequences
and returns a DataFrame. It operates like the ``DataFrame`` constructor except
for the ``orient`` parameter, which is ``'columns'`` by default, but which can be
set to ``'index'`` in order to use the dict keys as row labels.

.. ipython:: python

   pd.DataFrame.from_dict(dict([("A", [1, 2, 3]), ("B", [4, 5, 6])]))

If you pass ``orient='index'``, the keys will be the row labels. In this
case, you can also pass the desired column names:

.. ipython:: python

   pd.DataFrame.from_dict(
       dict([("A", [1, 2, 3]), ("B", [4, 5, 6])]),
       orient="index",
       columns=["one", "two", "three"],
   )

.. _basics.dataframe.from_records:

**DataFrame.from_records**

``DataFrame.from_records`` takes a list of tuples or an ndarray with structured
dtype. It works analogously to the normal ``DataFrame`` constructor, except that
the resulting DataFrame index may be a specific field of the structured
dtype. For example:

.. ipython:: python

   data
   pd.DataFrame.from_records(data, index="C")

.. _basics.dataframe.sel_add_del:

Column selection, addition, deletion
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can treat a DataFrame semantically like a dict of like-indexed Series
objects. Getting, setting, and deleting columns works with the same syntax as
the analogous dict operations:

.. ipython:: python

   df["one"]
   df["three"] = df["one"] * df["two"]
   df["flag"] = df["one"] > 2
   df

Columns can be deleted or popped as with a dict:

.. ipython:: python

   del df["two"]
   three = df.pop("three")
   df

When inserting a scalar value, it will naturally be propagated to fill the
column:

.. ipython:: python

   df["foo"] = "bar"
   df

When inserting a Series that does not have the same index as the DataFrame, it
will be conformed to the DataFrame's index:

.. ipython:: python

   df["one_trunc"] = df["one"][:2]
   df

You can insert raw ndarrays, but their length must match the length of the
DataFrame's index.

By default, columns get inserted at the end. The :meth:`~DataFrame.insert`
method is available to insert at a particular location in the columns:

.. ipython:: python

   df.insert(1, "bar", df["one"])
   df

.. _dsintro.chained_assignment:

Assigning new columns in method chains
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Inspired by `dplyr's
<https://dplyr.tidyverse.org/reference/mutate.html>`__
``mutate`` verb, DataFrame has an :meth:`~pandas.DataFrame.assign`
method that allows you to easily create new columns that are potentially
derived from existing columns.

.. ipython:: python

   iris = pd.read_csv("data/iris.data")
   iris.head()
   iris.assign(sepal_ratio=iris["SepalWidth"] / iris["SepalLength"]).head()

In the example above, we inserted a precomputed value. We can also pass in
a function of one argument to be evaluated on the DataFrame being assigned to.

.. ipython:: python

   iris.assign(sepal_ratio=lambda x: (x["SepalWidth"] / x["SepalLength"])).head()

``assign`` **always** returns a copy of the data, leaving the original
DataFrame untouched.

Passing a callable, as opposed to an actual value to be inserted, is
useful when you don't have a reference to the DataFrame at hand. This is
common when using ``assign`` in a chain of operations. For example,
we can limit the DataFrame to just those observations with a sepal length
greater than 5, calculate the ratio, and plot:

.. ipython:: python

   @savefig basics_assign.png
   (
       iris.query("SepalLength > 5")
       .assign(
           SepalRatio=lambda x: x.SepalWidth / x.SepalLength,
           PetalRatio=lambda x: x.PetalWidth / x.PetalLength,
       )
       .plot(kind="scatter", x="SepalRatio", y="PetalRatio")
   )

Since a function is passed in, the function is computed on the DataFrame
being assigned to. Importantly, this is the DataFrame that's been filtered
to those rows with sepal length greater than 5. The filtering happens first,
and then the ratio calculations. This is an example where we didn't
have a reference to the *filtered* DataFrame available.
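
The iris example relies on a data file, but the same chain can be sketched with
a small, self-contained frame. The three rows below are made-up stand-ins for
iris measurements. The point is that the callable passed to ``assign`` receives
the output of ``query``, and that the original ``df`` is left unchanged.

.. code-block:: python

   import pandas as pd

   # Made-up stand-in for the iris data used above.
   df = pd.DataFrame(
       {
           "SepalLength": [4.9, 5.8, 6.3],
           "SepalWidth": [3.0, 2.7, 3.3],
       }
   )

   result = (
       df.query("SepalLength > 5")
       # x is the filtered frame (rows 1 and 2), not df.
       .assign(SepalRatio=lambda x: x["SepalWidth"] / x["SepalLength"])
   )

   print(result)
   # assign returned a new object, so df has no SepalRatio column.
   print("SepalRatio" in df.columns)  # False

Row ``0`` is dropped by ``query`` before the lambda runs, so the ratio is only
ever computed for the rows that remain.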

The function signature for ``assign`` is simply ``**kwargs``. The keys
are the column names for the new fields, and the values are either a value
to be inserted (for example, a ``Series`` or NumPy array), or a function
of one argument to be called on the ``DataFrame``. A *copy* of the original
DataFrame is returned, with the new values inserted.

Starting with Python 3.6, the order of ``**kwargs`` is preserved. This allows
for *dependent* assignment, where an expression later in ``**kwargs`` can refer
to a column created earlier in the same :meth:`~DataFrame.assign` call.

.. ipython:: python

   dfa = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
   dfa.assign(C=lambda x: x["A"] + x["B"], D=lambda x: x["A"] + x["C"])

In the second expression, ``x['C']`` will refer to the newly created column,
which is equal to ``dfa['A'] + dfa['B']``.

Indexing / selection
~~~~~~~~~~~~~~~~~~~~

The basics of indexing are as follows:

.. csv-table::
   :header: "Operation", "Syntax", "Result"
   :widths: 30, 20, 10

   Select column, ``df[col]``, Series
   Select row by label, ``df.loc[label]``, Series
   Select row by integer location, ``df.iloc[loc]``, Series
   Slice rows, ``df[5:10]``, DataFrame
   Select rows by boolean vector, ``df[bool_vec]``, DataFrame

Row selection, for example, returns a Series whose index is the columns of the
DataFrame:

.. ipython:: python

   df.loc["b"]
   df.iloc[2]

For a more exhaustive treatment of sophisticated label-based indexing and
slicing, see the :ref:`section on indexing <indexing>`. We will address the
fundamentals of reindexing / conforming to new sets of labels in the
:ref:`section on reindexing <basics.reindexing>`.

.. _dsintro.alignment:

Data alignment and arithmetic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Operations between DataFrame objects automatically align on **both the
columns and the index (row labels)**. Again, the resulting object will have the
union of the column and row labels.

.. ipython:: python

   df = pd.DataFrame(np.random.randn(10, 4), columns=["A", "B", "C", "D"])
   df2 = pd.DataFrame(np.random.randn(7, 3), columns=["A", "B", "C"])
   df + df2

When doing an operation between DataFrame and Series, the default behavior is
to align the Series **index** on the DataFrame **columns**, thus `broadcasting
<https://numpy.org/doc/stable/user/basics.broadcasting.html>`__
row-wise. For example:

.. ipython:: python

   df - df.iloc[0]

For explicit control over the matching and broadcasting behavior, see the
section on :ref:`flexible binary operations <basics.binop>`.

Operations with scalars are just as you would expect:

.. ipython:: python

   df * 5 + 2
   1 / df
   df ** 4

.. _dsintro.boolean:

Boolean operators work as well:

.. ipython:: python

   df1 = pd.DataFrame({"a": [1, 0, 1], "b": [0, 1, 1]}, dtype=bool)
   df2 = pd.DataFrame({"a": [0, 1, 1], "b": [1, 1, 0]}, dtype=bool)
   df1 & df2
   df1 | df2
   df1 ^ df2
   -df1

Transposing
~~~~~~~~~~~

To transpose, access the ``T`` attribute (or call the :meth:`~DataFrame.transpose`
method), similar to an ndarray:

.. ipython:: python

   # only show the first 5 rows
   df[:5].T

.. _dsintro.numpy_interop:

DataFrame interoperability with NumPy functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Elementwise NumPy ufuncs (log, exp, sqrt, ...) and various other NumPy functions
can be used with no issues on Series and DataFrame, assuming the data within
is numeric:

.. ipython:: python

   np.exp(df)
   np.asarray(df)

DataFrame is not intended to be a drop-in replacement for ndarray, as its
indexing semantics and data model are quite different in places from an
n-dimensional array.

:class:`Series` implements ``__array_ufunc__``, which allows it to work with NumPy's
`universal functions <https://numpy.org/doc/stable/reference/ufuncs.html>`_.

The ufunc is applied to the underlying array in a Series.

.. ipython:: python

   ser = pd.Series([1, 2, 3, 4])
   np.exp(ser)

.. versionchanged:: 0.25.0

   When multiple ``Series`` are passed to a ufunc, they are aligned before
   performing the operation.

Like other parts of the library, pandas will automatically align labeled inputs
as part of a ufunc with multiple inputs. For example, using :func:`numpy.remainder`
on two :class:`Series` with differently ordered labels will align before the operation.

.. ipython:: python

   ser1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
   ser2 = pd.Series([1, 3, 5], index=["b", "a", "c"])
   ser1
   ser2
   np.remainder(ser1, ser2)

As usual, the union of the two indices is taken, and non-overlapping values are filled
with missing values.

.. ipython:: python

   ser3 = pd.Series([2, 4, 6], index=["b", "c", "d"])
   ser3
   np.remainder(ser1, ser3)

When a binary ufunc is applied to a :class:`Series` and an :class:`Index`, the Series
implementation takes precedence and a Series is returned.

.. ipython:: python

   ser = pd.Series([1, 2, 3])
   idx = pd.Index([4, 5, 6])

   np.maximum(ser, idx)

NumPy ufuncs are safe to apply to :class:`Series` backed by non-ndarray arrays,
for example :class:`arrays.SparseArray` (see :ref:`sparse.calculation`). If possible,
the ufunc is applied without converting the underlying data to an ndarray.
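
The alignment rule for multi-input ufuncs, and the earlier remark that dropping
to the underlying array sidesteps automatic alignment, can be seen side by side.
This sketch reuses the values of ``ser1``, ``ser2``, and ``ser3`` from above,
with :func:`numpy.add` in place of :func:`numpy.remainder` so the results are
easy to check by hand.

.. code-block:: python

   import numpy as np
   import pandas as pd

   ser1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
   ser2 = pd.Series([1, 3, 5], index=["b", "a", "c"])
   ser3 = pd.Series([2, 4, 6], index=["b", "c", "d"])

   # Labels are matched first: "a" pairs 1 with 3, "b" pairs 2 with 1,
   # and "c" pairs 3 with 5.
   aligned = np.add(ser1, ser2)

   # Converting to NumPy drops the labels, so values pair up by position.
   positional = np.add(ser1.to_numpy(), ser2.to_numpy())

   # With partially overlapping labels, the result has the union of the
   # indexes; "a" and "d" appear in only one input and become NaN.
   partial = np.add(ser1, ser3)

   print(aligned)
   print(positional)  # [2 5 8]
   print(partial)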

Console display
~~~~~~~~~~~~~~~

Very large DataFrames will be truncated to display them in the console.
You can also get a summary using :meth:`~pandas.DataFrame.info`.
Here we read a CSV version of the **baseball** dataset from the **plyr**
R package:

.. ipython:: python
   :suppress:

   # force a summary to be printed
   pd.set_option("display.max_rows", 5)

.. ipython:: python

   baseball = pd.read_csv("data/baseball.csv")
   print(baseball)
   baseball.info()

.. ipython:: python
   :suppress:
   :okwarning:

   # restore GlobalPrintConfig
   pd.reset_option(r"^display\.")

By contrast, :meth:`~DataFrame.to_string` returns a string representation of the
whole DataFrame in tabular form, though it won't always fit the console width:

.. ipython:: python

   print(baseball.iloc[-20:, :12].to_string())

Wide DataFrames will be printed across multiple rows by
default:

.. ipython:: python

   pd.DataFrame(np.random.randn(3, 12))

You can change how much to print on a single row by setting the ``display.width``
option:

.. ipython:: python

   pd.set_option("display.width", 40)  # default is 80

   pd.DataFrame(np.random.randn(3, 12))

You can adjust the max width of the individual columns by setting ``display.max_colwidth``:

.. ipython:: python

   datafile = {
       "filename": ["filename_01", "filename_02"],
       "path": [
           "media/user_name/storage/folder_01/filename_01",
           "media/user_name/storage/folder_02/filename_02",
       ],
   }

   pd.set_option("display.max_colwidth", 30)
   pd.DataFrame(datafile)

   pd.set_option("display.max_colwidth", 100)
   pd.DataFrame(datafile)

.. ipython:: python
   :suppress:

   pd.reset_option("display.width")
   pd.reset_option("display.max_colwidth")

You can also disable the wrapping of wide DataFrames via the
``display.expand_frame_repr`` option. This will print the table in one block.
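
Options changed with ``pd.set_option`` stay in effect until they are reset. For a
one-off change, ``pd.option_context`` restores the previous values when its
``with`` block exits. A minimal sketch, using a made-up 20-row frame, that
compares the truncated ``repr`` with :meth:`~DataFrame.to_string`, which renders
every row by default:

.. code-block:: python

   import pandas as pd

   df = pd.DataFrame({"a": range(20)})

   before = pd.get_option("display.max_rows")

   # The option only applies inside the with-block.
   with pd.option_context("display.max_rows", 5):
       inside = pd.get_option("display.max_rows")
       truncated = repr(df)  # middle rows are elided with ".."

   after = pd.get_option("display.max_rows")

   # to_string renders every row regardless of display.max_rows.
   full = df.to_string()

   print(truncated)
   print(before == after)  # True: the previous setting was restored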

DataFrame column attribute access and IPython completion
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If a DataFrame column label is a valid Python variable name, the column can be
accessed like an attribute:

.. ipython:: python

   df = pd.DataFrame({"foo1": np.random.randn(5), "foo2": np.random.randn(5)})
   df
   df.foo1

The columns are also connected to the `IPython <https://ipython.org>`__
completion mechanism so they can be tab-completed:

.. code-block:: ipython

   In [5]: df.fo<TAB>  # noqa: E225, E999
   df.foo1  df.foo2
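
Attribute access only works when the label is a valid identifier that does not
collide with an existing DataFrame attribute or method. A sketch with
hypothetical column names chosen to show three cases; bracket indexing works for
every label:

.. code-block:: python

   import pandas as pd

   # Hypothetical column labels chosen to show the three cases.
   df = pd.DataFrame({"foo1": [1, 2], "my col": [3, 4], "count": [5, 6]})

   # A valid identifier with no name clash: attribute access returns the column.
   via_attr = df.foo1

   # "my col" contains a space, so it is only reachable with brackets.
   spaced = df["my col"]

   # "count" collides with the DataFrame.count method; the method wins,
   # so brackets are needed to get the column.
   method = df.count
   column = df["count"]

   print(via_attr.equals(df["foo1"]))  # True
   print(callable(method))             # True
   print(column.tolist())              # [5, 6]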