{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Pandas support\n",
    "\n",
    "<div class=\"alert alert-warning\">\n",
    "\n",
    "**Warning:** pandas support is currently experimental; don't expect everything to work.\n",
    "\n",
    "</div>\n",
    "\n",
    "It is convenient to use the pandas package when dealing with numerical data, so Pint provides `PintArray`. A PintArray is a pandas Extension Array, which allows pandas to recognise Quantities and store them in DataFrames and Series."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Installation\n",
    "\n",
    "Pandas support is provided by the `pint-pandas` package. To install it, use either:\n",
    "```\n",
    "python -m pip install pint-pandas\n",
    "```\n",
    "or:\n",
    "```\n",
    "conda install -c conda-forge pint-pandas\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Basic example"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This example shows the simplest way to use pandas with pint, along with the underlying objects. It is slightly fiddly because the data is not read from a file; a more typical use case is given in *Reading from csv* below.\n",
    "\n",
    "First, some imports:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import pint\n",
    "import pint_pandas"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we create a DataFrame with PintArrays as columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame({\n",
    "    \"torque\": pd.Series([1., 2., 2., 3.], dtype=\"pint[lbf ft]\"),\n",
    "    \"angular_velocity\": pd.Series([1., 2., 2., 3.], dtype=\"pint[rpm]\"),\n",
    "})\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Operations between columns are unit-aware, so they behave as we would intuitively expect."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df['power'] = df['torque'] * df['angular_velocity']\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see the columns' units in the `dtypes` attribute:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each column can be accessed as a pandas Series,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.power"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "which contains a PintArray:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.power.values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The PintArray contains a Quantity:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.power.values.quantity"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A `pint` accessor on pandas Series provides most Quantity properties and methods, and converts the result to a Series where possible."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.power.pint.units"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.power.pint.to(\"kW\").values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reading from csv\n",
    "\n",
    "Reading from files is by far the more common way to use pandas. To facilitate this, a DataFrame accessor is provided that makes it easy to get to PintArrays."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import pint\n",
    "import pint_pandas\n",
    "import io"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here are the contents of the csv file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test_data = '''ShaftSpeedIndex,rpm,1200,1200,1200,1600,1600,1600,2300,2300,2300\n",
    "pump,,A,B,C,A,B,C,A,B,C\n",
    "ShaftSpeed,rpm,1200,1200,1200,1600,1600,1600,2300,2300,2300\n",
    "FlowRate,m^3 h^-1,8.72,9.28,9.31,11.61,12.78,13.51,18.32,17.90,19.23\n",
    "DifferentialPressure,kPa,162.03,144.16,136.47,286.86,241.41,204.21,533.17,526.74,440.76\n",
    "ShaftPower,kW,1.32,1.23,1.18,3.09,2.78,2.50,8.59,8.51,7.61\n",
    "Efficiency,dimensionless,30.60,31.16,30.70,30.72,31.83,31.81,32.52,31.67,32.05'''"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's read that into a DataFrame.\n",
    "Here `io.StringIO` is used in place of a file on disk; reading from a csv file path, which is the more typical case, is shown commented out."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The file holds one quantity per row (name, unit, values...), so transpose it\n",
    "# to get one column per quantity, labelled by a (name, unit) column MultiIndex\n",
    "df = pd.read_csv(io.StringIO(test_data), header=[0, 1], index_col=[0, 1]).T\n",
    "# df = pd.read_csv(\"/path/to/test_data.csv\", header=[0, 1], index_col=[0, 1]).T\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then use the DataFrame's `pint.quantify` method to convert the columns from `np.ndarray`s to PintArrays, taking the units from the bottom column level. Before the conversion, the columns have plain NumPy dtypes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.dtypes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_ = df.pint.quantify(level=-1)\n",
    "df_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's confirm that the units have been parsed correctly:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_.dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here the `h` in `m^3 h^-1` has been parsed as the Planck constant. Let's change the unit to hours."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Rebuild the column from its magnitudes, attaching the intended unit (cubic metres per hour)\n",
    "df_['FlowRate'] = pint_pandas.PintArray(df_['FlowRate'].values.quantity.m, dtype=\"pint[m^3/hr]\")\n",
    "df_.dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As before, operations between DataFrame columns are unit-aware:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_.ShaftPower / df_.ShaftSpeed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_['ShaftTorque'] = df_.ShaftPower / df_.ShaftSpeed\n",
    "df_['FluidPower'] = df_['FlowRate'] * df_['DifferentialPressure']\n",
    "df_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The DataFrame's `pint.dequantify` method then moves the unit information back into a header row:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_.pint.dequantify()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This makes some powerful operations easy. For example, changing the units of individual columns:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_['FluidPower'] = df_['FluidPower'].pint.to(\"kW\")\n",
    "df_['FlowRate'] = df_['FlowRate'].pint.to(\"L/s\")\n",
    "df_['ShaftTorque'] = df_['ShaftTorque'].pint.to(\"N m\")\n",
    "df_.pint.dequantify()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The units are harder to read than they need to be, so let's change pint's default format for displaying units:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pint_pandas.PintType.ureg.default_format = \"~P\"\n",
    "df_.pint.dequantify()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also convert the units of the entire table, for example to base units:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_.pint.to_base_units().pint.dequantify()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plotting\n",
    "\n",
    "Pint's matplotlib support allows columns with the same dimensionality to be plotted together."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pint_pandas.PintType.ureg.setup_matplotlib()\n",
    "# ax = df_[['ShaftPower', 'FluidPower']].unstack(\"pump\").plot()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ax.yaxis.units"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that indexes cannot store PintArrays, so they do not carry unit information:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# print(ax.xaxis.units)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Advanced example\n",
    "\n",
    "This example shows alternative ways to use pint with pandas, along with some other features.\n",
    "\n",
    "Start with the same imports:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import pint\n",
    "import pint_pandas"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll use a shorthand for PintArray:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "PA_ = pint_pandas.PintArray"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And set up a unit registry and a Quantity shorthand:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ureg = pint.UnitRegistry()\n",
    "Q_ = ureg.Quantity"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Operations between PintArrays from different unit registries will not work. To prevent this, we can change the unit registry used when creating new PintArrays:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pint_pandas.PintType.ureg = ureg"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These are the possible ways to create a PintArray.\n",
    "\n",
    "Note that the `pint[unit]` form must be used with the Series constructor, whereas the PintArray constructor also accepts a plain unit string or a unit object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame({\n",
    "    # Series constructor: requires the \"pint[unit]\" dtype string\n",
    "    \"length\": pd.Series([1., 2.], dtype=\"pint[m]\"),\n",
    "    # PintArray constructor: accepts \"pint[unit]\", a unit string or a Unit object\n",
    "    \"width\": PA_([2., 3.], dtype=\"pint[m]\"),\n",
    "    \"distance\": PA_([2., 3.], dtype=\"m\"),\n",
    "    \"height\": PA_([2., 3.], dtype=ureg.m),\n",
    "    # Or build one from an existing 1-D Quantity\n",
    "    \"depth\": PA_.from_1darray_quantity(Q_([2, 3], ureg.m)),\n",
    "})\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.length.values.units"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}