1================================================================================ 2pyexcel - Let you focus on data, instead of file formats 3================================================================================ 4 5.. image:: https://raw.githubusercontent.com/pyexcel/pyexcel.github.io/master/images/patreon.png 6 :target: https://www.patreon.com/chfw 7 8.. image:: https://raw.githubusercontent.com/pyexcel/pyexcel-mobans/master/images/awesome-badge.svg 9 :target: https://awesome-python.com/#specific-formats-processing 10 11.. image:: https://github.com/pyexcel/pyexcel/workflows/run_tests/badge.svg 12 :target: http://github.com/pyexcel/pyexcel/actions 13 14.. image:: https://codecov.io/gh/pyexcel/pyexcel/branch/master/graph/badge.svg 15 :target: https://codecov.io/gh/pyexcel/pyexcel 16 17.. image:: https://badge.fury.io/py/pyexcel.svg 18 :target: https://pypi.org/project/pyexcel 19 20.. image:: https://anaconda.org/conda-forge/pyexcel/badges/version.svg 21 :target: https://anaconda.org/conda-forge/pyexcel 22 23.. image:: https://pepy.tech/badge/pyexcel/month 24 :target: https://pepy.tech/project/pyexcel 25 26.. image:: https://anaconda.org/conda-forge/pyexcel/badges/downloads.svg 27 :target: https://anaconda.org/conda-forge/pyexcel 28 29.. image:: https://img.shields.io/gitter/room/gitterHQ/gitter.svg 30 :target: https://gitter.im/pyexcel/Lobby 31 32.. image:: https://img.shields.io/static/v1?label=continuous%20templating&message=%E6%A8%A1%E7%89%88%E6%9B%B4%E6%96%B0&color=blue&style=flat-square 33 :target: https://moban.readthedocs.io/en/latest/#at-scale-continous-templating-for-open-source-projects 34 35.. image:: https://img.shields.io/static/v1?label=coding%20style&message=black&color=black&style=flat-square 36 :target: https://github.com/psf/black 37.. image:: https://readthedocs.org/projects/pyexcel/badge/?version=latest 38 :target: http://pyexcel.readthedocs.org/en/latest/ 39 40Support the project 41================================================================================ 42 43If your company has embedded pyexcel and its components into a revenue generating 44product, please support me on github, `patreon <https://www.patreon.com/bePatron?u=5537627>`_ 45or `bounty source <https://salt.bountysource.com/teams/chfw-pyexcel>`_ to maintain 46the project and develop it further. 47 48If you are an individual, you are welcome to support me too and for however long 49you feel like. As my backer, you will receive 50`early access to pyexcel related contents <https://www.patreon.com/pyexcel/posts>`_. 51 52And your issues will get prioritized if you would like to become my patreon as `pyexcel pro user`. 53 54With your financial support, I will be able to invest 55a little bit more time in coding, documentation and writing interesting posts. 56 57 58Known constraints 59================== 60 61Fonts, colors and charts are not supported. 62 63Nor to read password protected xls, xlsx and ods files. 64 65Introduction 66================================================================================ 67 68Feature Highlights 69=================== 70 71.. table:: A list of supported file formats 72 73 ============ ======================================================= 74 file format definition 75 ============ ======================================================= 76 csv comma separated values 77 tsv tab separated values 78 csvz a zip file that contains one or many csv files 79 tsvz a zip file that contains one or many tsv files 80 xls a spreadsheet file format created by 81 MS-Excel 97-2003 82 xlsx MS-Excel Extensions to the Office Open XML 83 SpreadsheetML File Format. 84 xlsm an MS-Excel Macro-Enabled Workbook file 85 ods open document spreadsheet 86 fods flat open document spreadsheet 87 json java script object notation 88 html html table of the data structure 89 simple simple presentation 90 rst rStructured Text presentation of the data 91 mediawiki media wiki table 92 ============ ======================================================= 93 94 95.. image:: https://github.com/pyexcel/pyexcel/raw/dev/docs/source/_static/images/architecture.svg 96 97 981. One application programming interface(API) to handle multiple data sources: 99 100 * physical file 101 * memory file 102 * SQLAlchemy table 103 * Django Model 104 * Python data structures: dictionary, records and array 105 1062. One API to read and write data in various excel file formats. 1073. For large data sets, data streaming are supported. A genenerator can be returned to you. Checkout iget_records, iget_array, isave_as and isave_book_as. 108 109 110 111 112Installation 113================================================================================ 114 115You can install pyexcel via pip: 116 117.. code-block:: bash 118 119 $ pip install pyexcel 120 121 122or clone it and install it: 123 124.. code-block:: bash 125 126 $ git clone https://github.com/pyexcel/pyexcel.git 127 $ cd pyexcel 128 $ python setup.py install 129 130 131 132One liners 133================================================================================ 134 135This section shows you how to get data from your excel files and how to 136export data to excel files in **one line** 137 138Read from the excel files 139-------------------------------------------------------------------------------- 140 141Get a list of dictionaries 142******************************************************************************** 143 144 145Suppose you want to process the following coffee data (data source `coffee chart <https://cspinet.org/eating-healthy/ingredients-of-concern/caffeine-chart>`_ on the center for science in the public interest): 146 147 148Top 5 coffeine drinks: 149 150===================================== =============== ============= 151Coffees Serving Size Caffeine (mg) 152Starbucks Coffee Blonde Roast venti(20 oz) 475 153Dunkin' Donuts Coffee with Turbo Shot large(20 oz.) 398 154Starbucks Coffee Pike Place Roast grande(16 oz.) 310 155Panera Coffee Light Roast regular(16 oz.) 300 156===================================== =============== ============= 157 158 159Let's get a list of dictionary out from the xls file: 160 161.. code-block:: python 162 163 >>> records = p.get_records(file_name="your_file.xls") 164 165And let's check what do we have: 166 167.. code-block:: python 168 169 >>> for r in records: 170 ... print(f"{r['Serving Size']} of {r['Coffees']} has {r['Caffeine (mg)']} mg") 171 venti(20 oz) of Starbucks Coffee Blonde Roast has 475 mg 172 large(20 oz.) of Dunkin' Donuts Coffee with Turbo Shot has 398 mg 173 grande(16 oz.) of Starbucks Coffee Pike Place Roast has 310 mg 174 regular(16 oz.) of Panera Coffee Light Roast has 300 mg 175 176 177Get two dimensional array 178******************************************************************************** 179 180Instead, what if you have to use `pyexcel.get_array` to do the same: 181 182.. code-block:: python 183 184 >>> for row in p.get_array(file_name="your_file.xls", start_row=1): 185 ... print(f"{row[1]} of {row[0]} has {row[2]} mg") 186 venti(20 oz) of Starbucks Coffee Blonde Roast has 475 mg 187 large(20 oz.) of Dunkin' Donuts Coffee with Turbo Shot has 398 mg 188 grande(16 oz.) of Starbucks Coffee Pike Place Roast has 310 mg 189 regular(16 oz.) of Panera Coffee Light Roast has 300 mg 190 191 192where `start_row` skips the header row. 193 194 195Get a dictionary 196******************************************************************************** 197 198You can get a dictionary too: 199 200Now let's get a dictionary out from the spreadsheet: 201 202.. code-block:: python 203 204 >>> my_dict = p.get_dict(file_name="your_file.xls", name_columns_by_row=0) 205 206And check what do we have: 207 208.. code-block:: python 209 210 >>> from pyexcel._compact import OrderedDict 211 >>> isinstance(my_dict, OrderedDict) 212 True 213 >>> for key, values in my_dict.items(): 214 ... print(key + " : " + ','.join([str(item) for item in values])) 215 Coffees : Starbucks Coffee Blonde Roast,Dunkin' Donuts Coffee with Turbo Shot,Starbucks Coffee Pike Place Roast,Panera Coffee Light Roast 216 Serving Size : venti(20 oz),large(20 oz.),grande(16 oz.),regular(16 oz.) 217 Caffeine (mg) : 475,398,310,300 218 219Please note that my_dict is an OrderedDict. 220 221Get a dictionary of two dimensional array 222******************************************************************************** 223 224 225Suppose you have a multiple sheet book as the following: 226 227 228pyexcel:Sheet 1: 229 230===================== = = 2311 2 3 2324 5 6 2337 8 9 234===================== = = 235 236pyexcel:Sheet 2: 237 238===================== = = 239X Y Z 2401 2 3 2414 5 6 242===================== = = 243 244pyexcel:Sheet 3: 245 246===================== = = 247O P Q 2483 2 1 2494 3 2 250===================== = = 251 252 253Here is the code to obtain those sheets as a single dictionary: 254 255.. code-block:: python 256 257 >>> book_dict = p.get_book_dict(file_name="book.xls") 258 259And check: 260 261.. code-block:: python 262 263 >>> isinstance(book_dict, OrderedDict) 264 True 265 >>> import json 266 >>> for key, item in book_dict.items(): 267 ... print(json.dumps({key: item})) 268 {"Sheet 1": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]} 269 {"Sheet 2": [["X", "Y", "Z"], [1, 2, 3], [4, 5, 6]]} 270 {"Sheet 3": [["O", "P", "Q"], [3, 2, 1], [4, 3, 2]]} 271 272 273Write data 274--------------------------------------------- 275 276Export an array 277********************** 278 279Suppose you have the following array: 280 281.. code-block:: python 282 283 >>> data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] 284 285And here is the code to save it as an excel file : 286 287.. code-block:: python 288 289 >>> p.save_as(array=data, dest_file_name="example.xls") 290 291Let's verify it: 292 293.. code-block:: python 294 295 >>> p.get_sheet(file_name="example.xls") 296 pyexcel_sheet1: 297 +---+---+---+ 298 | 1 | 2 | 3 | 299 +---+---+---+ 300 | 4 | 5 | 6 | 301 +---+---+---+ 302 | 7 | 8 | 9 | 303 +---+---+---+ 304 305 306And here is the code to save it as a csv file : 307 308.. code-block:: python 309 310 >>> p.save_as(array=data, 311 ... dest_file_name="example.csv", 312 ... dest_delimiter=':') 313 314Let's verify it: 315 316.. code-block:: python 317 318 >>> with open("example.csv") as f: 319 ... for line in f.readlines(): 320 ... print(line.rstrip()) 321 ... 322 1:2:3 323 4:5:6 324 7:8:9 325 326Export a list of dictionaries 327********************************** 328 329.. code-block:: python 330 331 >>> records = [ 332 ... {"year": 1903, "country": "Germany", "speed": "206.7km/h"}, 333 ... {"year": 1964, "country": "Japan", "speed": "210km/h"}, 334 ... {"year": 2008, "country": "China", "speed": "350km/h"} 335 ... ] 336 >>> p.save_as(records=records, dest_file_name='high_speed_rail.xls') 337 338 339Export a dictionary of single key value pair 340******************************************************************************** 341 342.. code-block:: python 343 344 >>> henley_on_thames_facts = { 345 ... "area": "5.58 square meters", 346 ... "population": "11,619", 347 ... "civial parish": "Henley-on-Thames", 348 ... "latitude": "51.536", 349 ... "longitude": "-0.898" 350 ... } 351 >>> p.save_as(adict=henley_on_thames_facts, dest_file_name='henley.xlsx') 352 353 354Export a dictionary of single dimensonal array 355******************************************************************************** 356 357.. code-block:: python 358 359 >>> ccs_insights = { 360 ... "year": ["2017", "2018", "2019", "2020", "2021"], 361 ... "smart phones": [1.53, 1.64, 1.74, 1.82, 1.90], 362 ... "feature phones": [0.46, 0.38, 0.30, 0.23, 0.17] 363 ... } 364 >>> p.save_as(adict=ccs_insights, dest_file_name='ccs.csv') 365 366 367Export a dictionary of two dimensional array as a book 368******************************************************************************** 369 370Suppose you want to save the below dictionary to an excel file : 371 372.. code-block:: python 373 374 >>> a_dictionary_of_two_dimensional_arrays = { 375 ... 'Sheet 1': 376 ... [ 377 ... [1.0, 2.0, 3.0], 378 ... [4.0, 5.0, 6.0], 379 ... [7.0, 8.0, 9.0] 380 ... ], 381 ... 'Sheet 2': 382 ... [ 383 ... ['X', 'Y', 'Z'], 384 ... [1.0, 2.0, 3.0], 385 ... [4.0, 5.0, 6.0] 386 ... ], 387 ... 'Sheet 3': 388 ... [ 389 ... ['O', 'P', 'Q'], 390 ... [3.0, 2.0, 1.0], 391 ... [4.0, 3.0, 2.0] 392 ... ] 393 ... } 394 395Here is the code: 396 397.. code-block:: python 398 399 >>> p.save_book_as( 400 ... bookdict=a_dictionary_of_two_dimensional_arrays, 401 ... dest_file_name="book.xls" 402 ... ) 403 404If you want to preserve the order of sheets in your dictionary, you have to 405pass on an ordered dictionary to the function itself. For example: 406 407.. code-block:: python 408 409 >>> data = OrderedDict() 410 >>> data.update({"Sheet 2": a_dictionary_of_two_dimensional_arrays['Sheet 2']}) 411 >>> data.update({"Sheet 1": a_dictionary_of_two_dimensional_arrays['Sheet 1']}) 412 >>> data.update({"Sheet 3": a_dictionary_of_two_dimensional_arrays['Sheet 3']}) 413 >>> p.save_book_as(bookdict=data, dest_file_name="book.xls") 414 415Let's verify its order: 416 417.. code-block:: python 418 419 >>> book_dict = p.get_book_dict(file_name="book.xls") 420 >>> for key, item in book_dict.items(): 421 ... print(json.dumps({key: item})) 422 {"Sheet 2": [["X", "Y", "Z"], [1, 2, 3], [4, 5, 6]]} 423 {"Sheet 1": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]} 424 {"Sheet 3": [["O", "P", "Q"], [3, 2, 1], [4, 3, 2]]} 425 426Please notice that "Sheet 2" is the first item in the *book_dict*, meaning the order of sheets are preserved. 427 428 429Transcoding 430------------------------------------------- 431 432.. note:: 433 434 Please note that `pyexcel-cli` can perform file transcoding at command line. 435 No need to open your editor, save the problem, then python run. 436 437 438The following code does a simple file format transcoding from xls to csv: 439 440.. code-block:: python 441 442 >>> p.save_as(file_name="birth.xls", dest_file_name="birth.csv") 443 444Again it is really simple. Let's verify what we have gotten: 445 446.. code-block:: python 447 448 >>> sheet = p.get_sheet(file_name="birth.csv") 449 >>> sheet 450 birth.csv: 451 +-------+--------+----------+ 452 | name | weight | birth | 453 +-------+--------+----------+ 454 | Adam | 3.4 | 03/02/15 | 455 +-------+--------+----------+ 456 | Smith | 4.2 | 12/11/14 | 457 +-------+--------+----------+ 458 459.. NOTE:: 460 461 Please note that csv(comma separate value) file is pure text file. Formula, charts, images and formatting in xls file will disappear no matter which transcoding tool you use. Hence, pyexcel is a quick alternative for this transcoding job. 462 463 464Let use previous example and save it as xlsx instead 465 466.. code-block:: python 467 468 >>> p.save_as(file_name="birth.xls", 469 ... dest_file_name="birth.xlsx") # change the file extension 470 471Again let's verify what we have gotten: 472 473.. code-block:: python 474 475 >>> sheet = p.get_sheet(file_name="birth.xlsx") 476 >>> sheet 477 pyexcel_sheet1: 478 +-------+--------+----------+ 479 | name | weight | birth | 480 +-------+--------+----------+ 481 | Adam | 3.4 | 03/02/15 | 482 +-------+--------+----------+ 483 | Smith | 4.2 | 12/11/14 | 484 +-------+--------+----------+ 485 486 487Excel book merge and split operation in one line 488-------------------------------------------------------------------------------- 489 490Merge all excel files in directory into a book where each file become a sheet 491******************************************************************************** 492 493The following code will merge every excel files into one file, say "output.xls": 494 495.. code-block:: python 496 497 from pyexcel.cookbook import merge_all_to_a_book 498 import glob 499 500 501 merge_all_to_a_book(glob.glob("your_csv_directory\*.csv"), "output.xls") 502 503You can mix and match with other excel formats: xls, xlsm and ods. For example, if you are sure you have only xls, xlsm, xlsx, ods and csv files in `your_excel_file_directory`, you can do the following: 504 505.. code-block:: python 506 507 from pyexcel.cookbook import merge_all_to_a_book 508 import glob 509 510 511 merge_all_to_a_book(glob.glob("your_excel_file_directory\*.*"), "output.xls") 512 513Split a book into single sheet files 514**************************************** 515 516 517Suppose you have many sheets in a work book and you would like to separate each into a single sheet excel file. You can easily do this: 518 519.. code-block:: python 520 521 >>> from pyexcel.cookbook import split_a_book 522 >>> split_a_book("megabook.xls", "output.xls") 523 >>> import glob 524 >>> outputfiles = glob.glob("*_output.xls") 525 >>> for file in sorted(outputfiles): 526 ... print(file) 527 ... 528 Sheet 1_output.xls 529 Sheet 2_output.xls 530 Sheet 3_output.xls 531 532for the output file, you can specify any of the supported formats 533 534 535Extract just one sheet from a book 536************************************* 537 538 539Suppose you just want to extract one sheet from many sheets that exists in a work book and you would like to separate it into a single sheet excel file. You can easily do this: 540 541.. code-block:: python 542 543 >>> from pyexcel.cookbook import extract_a_sheet_from_a_book 544 >>> extract_a_sheet_from_a_book("megabook.xls", "Sheet 1", "output.xls") 545 >>> if os.path.exists("Sheet 1_output.xls"): 546 ... print("Sheet 1_output.xls exists") 547 ... 548 Sheet 1_output.xls exists 549 550for the output file, you can specify any of the supported formats 551 552 553Hidden feature: partial read 554=============================================== 555 556Most pyexcel users do not know, but other library users were requesting `the similar features <https://github.com/jazzband/tablib/issues/467>`_ 557 558 559When you are dealing with huge amount of data, e.g. 64GB, obviously you would not 560like to fill up your memory with those data. What you may want to do is, record 561data from Nth line, take M records and stop. And you only want to use your memory 562for the M records, not for beginning part nor for the tail part. 563 564Hence partial read feature is developed to read partial data into memory for 565processing. 566 567You can paginate by row, by column and by both, hence you dictate what portion of the 568data to read back. But remember only row limit features help you save memory. Let's 569you use this feature to record data from Nth column, take M number of columns and skip 570the rest. You are not going to reduce your memory footprint. 571 572Why did not I see above benefit? 573-------------------------------------------------------------------------------- 574 575This feature depends heavily on the implementation details. 576 577`pyexcel-xls`_ (xlrd), `pyexcel-xlsx`_ (openpyxl), `pyexcel-ods`_ (odfpy) and 578`pyexcel-ods3`_ (pyexcel-ezodf) will read all data into memory. Because xls, 579xlsx and ods file are effective a zipped folder, all four will unzip the folder 580and read the content in xml format in **full**, so as to make sense of all details. 581 582Hence, during the partial data is been returned, the memory consumption won't 583differ from reading the whole data back. Only after the partial 584data is returned, the memory comsumption curve shall jump the cliff. So pagination 585code here only limits the data returned to your program. 586 587With that said, `pyexcel-xlsxr`_, `pyexcel-odsr`_ and `pyexcel-htmlr`_ DOES read 588partial data into memory. Those three are implemented in such a way that they 589consume the xml(html) when needed. When they have read designated portion of the 590data, they stop, even if they are half way through. 591 592In addition, pyexcel's csv readers can read partial data into memory too. 593 594 595Let's assume the following file is a huge csv file: 596 597.. code-block:: python 598 599 >>> import datetime 600 >>> import pyexcel as pe 601 >>> data = [ 602 ... [1, 21, 31], 603 ... [2, 22, 32], 604 ... [3, 23, 33], 605 ... [4, 24, 34], 606 ... [5, 25, 35], 607 ... [6, 26, 36] 608 ... ] 609 >>> pe.save_as(array=data, dest_file_name="your_file.csv") 610 611 612And let's pretend to read partial data: 613 614 615.. code-block:: python 616 617 >>> pe.get_sheet(file_name="your_file.csv", start_row=2, row_limit=3) 618 your_file.csv: 619 +---+----+----+ 620 | 3 | 23 | 33 | 621 +---+----+----+ 622 | 4 | 24 | 34 | 623 +---+----+----+ 624 | 5 | 25 | 35 | 625 +---+----+----+ 626 627And you could as well do the same for columns: 628 629.. code-block:: python 630 631 >>> pe.get_sheet(file_name="your_file.csv", start_column=1, column_limit=2) 632 your_file.csv: 633 +----+----+ 634 | 21 | 31 | 635 +----+----+ 636 | 22 | 32 | 637 +----+----+ 638 | 23 | 33 | 639 +----+----+ 640 | 24 | 34 | 641 +----+----+ 642 | 25 | 35 | 643 +----+----+ 644 | 26 | 36 | 645 +----+----+ 646 647Obvious, you could do both at the same time: 648 649.. code-block:: python 650 651 >>> pe.get_sheet(file_name="your_file.csv", 652 ... start_row=2, row_limit=3, 653 ... start_column=1, column_limit=2) 654 your_file.csv: 655 +----+----+ 656 | 23 | 33 | 657 +----+----+ 658 | 24 | 34 | 659 +----+----+ 660 | 25 | 35 | 661 +----+----+ 662 663 664The pagination support is available across all pyexcel plugins. 665 666.. note:: 667 668 No column pagination support for query sets as data source. 669 670 671Formatting while transcoding a big data file 672-------------------------------------------------------------------------------- 673 674If you are transcoding a big data set, conventional formatting method would not 675help unless a on-demand free RAM is available. However, there is a way to minimize 676the memory footprint of pyexcel while the formatting is performed. 677 678Let's continue from previous example. Suppose we want to transcode "your_file.csv" 679to "your_file.xls" but increase each element by 1. 680 681What we can do is to define a row renderer function as the following: 682 683.. code-block:: python 684 685 >>> def increment_by_one(row): 686 ... for element in row: 687 ... yield element + 1 688 689Then pass it onto save_as function using row_renderer: 690 691.. code-block:: python 692 693 >>> pe.isave_as(file_name="your_file.csv", 694 ... row_renderer=increment_by_one, 695 ... dest_file_name="your_file.xlsx") 696 697 698.. note:: 699 700 If the data content is from a generator, isave_as has to be used. 701 702We can verify if it was done correctly: 703 704.. code-block:: python 705 706 >>> pe.get_sheet(file_name="your_file.xlsx") 707 your_file.csv: 708 +---+----+----+ 709 | 2 | 22 | 32 | 710 +---+----+----+ 711 | 3 | 23 | 33 | 712 +---+----+----+ 713 | 4 | 24 | 34 | 714 +---+----+----+ 715 | 5 | 25 | 35 | 716 +---+----+----+ 717 | 6 | 26 | 36 | 718 +---+----+----+ 719 | 7 | 27 | 37 | 720 +---+----+----+ 721 722 723Stream APIs for big file : A set of two liners 724================================================================================ 725 726When you are dealing with **BIG** excel files, you will want **pyexcel** to use 727constant memory. 728 729This section shows you how to get data from your **BIG** excel files and how to 730export data to excel files in **two lines** at most, without eating all 731your computer memory. 732 733 734Two liners for get data from big excel files 735-------------------------------------------------------------------------------- 736 737Get a list of dictionaries 738******************************************************************************** 739 740 741 742Suppose you want to process the following coffee data again: 743 744Top 5 coffeine drinks: 745 746===================================== =============== ============= 747Coffees Serving Size Caffeine (mg) 748Starbucks Coffee Blonde Roast venti(20 oz) 475 749Dunkin' Donuts Coffee with Turbo Shot large(20 oz.) 398 750Starbucks Coffee Pike Place Roast grande(16 oz.) 310 751Panera Coffee Light Roast regular(16 oz.) 300 752===================================== =============== ============= 753 754 755Let's get a list of dictionary out from the xls file: 756 757.. code-block:: python 758 759 >>> records = p.iget_records(file_name="your_file.xls") 760 761And let's check what do we have: 762 763.. code-block:: python 764 765 >>> for r in records: 766 ... print(f"{r['Serving Size']} of {r['Coffees']} has {r['Caffeine (mg)']} mg") 767 venti(20 oz) of Starbucks Coffee Blonde Roast has 475 mg 768 large(20 oz.) of Dunkin' Donuts Coffee with Turbo Shot has 398 mg 769 grande(16 oz.) of Starbucks Coffee Pike Place Roast has 310 mg 770 regular(16 oz.) of Panera Coffee Light Roast has 300 mg 771 772Please do not forgot the second line to close the opened file handle: 773 774.. code-block:: python 775 776 >>> p.free_resources() 777 778Get two dimensional array 779******************************************************************************** 780 781Instead, what if you have to use `pyexcel.get_array` to do the same: 782 783.. code-block:: python 784 785 >>> for row in p.iget_array(file_name="your_file.xls", start_row=1): 786 ... print(f"{row[1]} of {row[0]} has {row[2]} mg") 787 venti(20 oz) of Starbucks Coffee Blonde Roast has 475 mg 788 large(20 oz.) of Dunkin' Donuts Coffee with Turbo Shot has 398 mg 789 grande(16 oz.) of Starbucks Coffee Pike Place Roast has 310 mg 790 regular(16 oz.) of Panera Coffee Light Roast has 300 mg 791 792Again, do not forgot the second line: 793 794.. code-block:: python 795 796 >>> p.free_resources() 797 798where `start_row` skips the header row. 799 800Data export in one liners 801--------------------------------------------- 802 803Export an array 804********************** 805 806Suppose you have the following array: 807 808.. code-block:: python 809 810 >>> data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] 811 812And here is the code to save it as an excel file : 813 814.. code-block:: python 815 816 >>> p.isave_as(array=data, dest_file_name="example.xls") 817 818But the following line is not required because the data source 819are not file sources: 820 821.. code-block:: python 822 823 >>> # p.free_resources() 824 825Let's verify it: 826 827.. code-block:: python 828 829 >>> p.get_sheet(file_name="example.xls") 830 pyexcel_sheet1: 831 +---+---+---+ 832 | 1 | 2 | 3 | 833 +---+---+---+ 834 | 4 | 5 | 6 | 835 +---+---+---+ 836 | 7 | 8 | 9 | 837 +---+---+---+ 838 839 840And here is the code to save it as a csv file : 841 842.. code-block:: python 843 844 >>> p.isave_as(array=data, 845 ... dest_file_name="example.csv", 846 ... dest_delimiter=':') 847 848Let's verify it: 849 850.. code-block:: python 851 852 >>> with open("example.csv") as f: 853 ... for line in f.readlines(): 854 ... print(line.rstrip()) 855 ... 856 1:2:3 857 4:5:6 858 7:8:9 859 860Export a list of dictionaries 861********************************** 862 863.. code-block:: python 864 865 >>> records = [ 866 ... {"year": 1903, "country": "Germany", "speed": "206.7km/h"}, 867 ... {"year": 1964, "country": "Japan", "speed": "210km/h"}, 868 ... {"year": 2008, "country": "China", "speed": "350km/h"} 869 ... ] 870 >>> p.isave_as(records=records, dest_file_name='high_speed_rail.xls') 871 872Export a dictionary of single key value pair 873******************************************************************************** 874 875.. code-block:: python 876 877 >>> henley_on_thames_facts = { 878 ... "area": "5.58 square meters", 879 ... "population": "11,619", 880 ... "civial parish": "Henley-on-Thames", 881 ... "latitude": "51.536", 882 ... "longitude": "-0.898" 883 ... } 884 >>> p.isave_as(adict=henley_on_thames_facts, dest_file_name='henley.xlsx') 885 886Export a dictionary of single dimensonal array 887******************************************************************************** 888 889.. code-block:: python 890 891 >>> ccs_insights = { 892 ... "year": ["2017", "2018", "2019", "2020", "2021"], 893 ... "smart phones": [1.53, 1.64, 1.74, 1.82, 1.90], 894 ... "feature phones": [0.46, 0.38, 0.30, 0.23, 0.17] 895 ... } 896 >>> p.isave_as(adict=ccs_insights, dest_file_name='ccs.csv') 897 >>> p.free_resources() 898 899Export a dictionary of two dimensional array as a book 900******************************************************************************** 901 902Suppose you want to save the below dictionary to an excel file : 903 904.. code-block:: python 905 906 >>> a_dictionary_of_two_dimensional_arrays = { 907 ... 'Sheet 1': 908 ... [ 909 ... [1.0, 2.0, 3.0], 910 ... [4.0, 5.0, 6.0], 911 ... [7.0, 8.0, 9.0] 912 ... ], 913 ... 'Sheet 2': 914 ... [ 915 ... ['X', 'Y', 'Z'], 916 ... [1.0, 2.0, 3.0], 917 ... [4.0, 5.0, 6.0] 918 ... ], 919 ... 'Sheet 3': 920 ... [ 921 ... ['O', 'P', 'Q'], 922 ... [3.0, 2.0, 1.0], 923 ... [4.0, 3.0, 2.0] 924 ... ] 925 ... } 926 927Here is the code: 928 929.. code-block:: python 930 931 >>> p.isave_book_as( 932 ... bookdict=a_dictionary_of_two_dimensional_arrays, 933 ... dest_file_name="book.xls" 934 ... ) 935 936If you want to preserve the order of sheets in your dictionary, you have to 937pass on an ordered dictionary to the function itself. For example: 938 939.. code-block:: python 940 941 >>> from pyexcel._compact import OrderedDict 942 >>> data = OrderedDict() 943 >>> data.update({"Sheet 2": a_dictionary_of_two_dimensional_arrays['Sheet 2']}) 944 >>> data.update({"Sheet 1": a_dictionary_of_two_dimensional_arrays['Sheet 1']}) 945 >>> data.update({"Sheet 3": a_dictionary_of_two_dimensional_arrays['Sheet 3']}) 946 >>> p.isave_book_as(bookdict=data, dest_file_name="book.xls") 947 >>> p.free_resources() 948 949Let's verify its order: 950 951.. code-block:: python 952 953 >>> import json 954 >>> book_dict = p.get_book_dict(file_name="book.xls") 955 >>> for key, item in book_dict.items(): 956 ... print(json.dumps({key: item})) 957 {"Sheet 2": [["X", "Y", "Z"], [1, 2, 3], [4, 5, 6]]} 958 {"Sheet 1": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]} 959 {"Sheet 3": [["O", "P", "Q"], [3, 2, 1], [4, 3, 2]]} 960 961Please notice that "Sheet 2" is the first item in the *book_dict*, meaning the order of sheets are preserved. 962 963 964File format transcoding on one line 965------------------------------------------- 966 967.. note:: 968 969 Please note that the following file transcoding could be with zero line. Please 970 install pyexcel-cli and you will do the transcode in one command. No need to 971 open your editor, save the problem, then python run. 972 973 974The following code does a simple file format transcoding from xls to csv: 975 976.. code-block:: python 977 978 >>> import pyexcel 979 >>> p.save_as(file_name="birth.xls", dest_file_name="birth.csv") 980 981Again it is really simple. Let's verify what we have gotten: 982 983.. code-block:: python 984 985 >>> sheet = p.get_sheet(file_name="birth.csv") 986 >>> sheet 987 birth.csv: 988 +-------+--------+----------+ 989 | name | weight | birth | 990 +-------+--------+----------+ 991 | Adam | 3.4 | 03/02/15 | 992 +-------+--------+----------+ 993 | Smith | 4.2 | 12/11/14 | 994 +-------+--------+----------+ 995 996.. note:: 997 998 Please note that csv(comma separate value) file is pure text file. Formula, charts, images and formatting in xls file will disappear no matter which transcoding tool you use. Hence, pyexcel is a quick alternative for this transcoding job. 999 1000 1001Let use previous example and save it as xlsx instead 1002 1003.. code-block:: python 1004 1005 >>> import pyexcel 1006 >>> p.isave_as(file_name="birth.xls", 1007 ... dest_file_name="birth.xlsx") # change the file extension 1008 1009Again let's verify what we have gotten: 1010 1011.. code-block:: python 1012 1013 >>> sheet = p.get_sheet(file_name="birth.xlsx") 1014 >>> sheet 1015 pyexcel_sheet1: 1016 +-------+--------+----------+ 1017 | name | weight | birth | 1018 +-------+--------+----------+ 1019 | Adam | 3.4 | 03/02/15 | 1020 +-------+--------+----------+ 1021 | Smith | 4.2 | 12/11/14 | 1022 +-------+--------+----------+ 1023 1024 1025Available Plugins 1026================= 1027 1028.. _file-format-list: 1029.. _a-map-of-plugins-and-file-formats: 1030 1031.. table:: A list of file formats supported by external plugins 1032 1033 ======================== ======================= ================= 1034 Package name Supported file formats Dependencies 1035 ======================== ======================= ================= 1036 `pyexcel-io`_ csv, csvz [#f1]_, tsv, 1037 tsvz [#f2]_ 1038 `pyexcel-xls`_ xls, xlsx(read only), `xlrd`_, 1039 xlsm(read only) `xlwt`_ 1040 `pyexcel-xlsx`_ xlsx `openpyxl`_ 1041 `pyexcel-ods3`_ ods `pyexcel-ezodf`_, 1042 lxml 1043 `pyexcel-ods`_ ods `odfpy`_ 1044 ======================== ======================= ================= 1045 1046.. table:: Dedicated file reader and writers 1047 1048 ======================== ======================= ================= 1049 Package name Supported file formats Dependencies 1050 ======================== ======================= ================= 1051 `pyexcel-xlsxw`_ xlsx(write only) `XlsxWriter`_ 1052 `pyexcel-libxlsxw`_ xlsx(write only) `libxlsxwriter`_ 1053 `pyexcel-xlsxr`_ xlsx(read only) lxml 1054 `pyexcel-xlsbr`_ xlsb(read only) pyxlsb 1055 `pyexcel-odsr`_ read only for ods, fods lxml 1056 `pyexcel-odsw`_ write only for ods loxun 1057 `pyexcel-htmlr`_ html(read only) lxml,html5lib 1058 `pyexcel-pdfr`_ pdf(read only) camelot 1059 ======================== ======================= ================= 1060 1061 1062Plugin shopping guide 1063------------------------ 1064 1065Since 2020, all pyexcel-io plugins have dropped the support for python version 1066lower than 3.6. If you want to use any python verions, please use pyexcel-io 1067and its plugins version lower than 0.6.0. 1068 1069 1070Except csv files, xls, xlsx and ods files are a zip of a folder containing a lot of 1071xml files 1072 1073The dedicated readers for excel files can stream read 1074 1075 1076In order to manage the list of plugins installed, you need to use pip to add or remove 1077a plugin. When you use virtualenv, you can have different plugins per virtual 1078environment. In the situation where you have multiple plugins that does the same thing 1079in your environment, you need to tell pyexcel which plugin to use per function call. 1080For example, pyexcel-ods and pyexcel-odsr, and you want to get_array to use pyexcel-odsr. 1081You need to append get_array(..., library='pyexcel-odsr'). 1082 1083 1084 1085.. _pyexcel-io: https://github.com/pyexcel/pyexcel-io 1086.. _pyexcel-xls: https://github.com/pyexcel/pyexcel-xls 1087.. _pyexcel-xlsx: https://github.com/pyexcel/pyexcel-xlsx 1088.. _pyexcel-ods: https://github.com/pyexcel/pyexcel-ods 1089.. _pyexcel-ods3: https://github.com/pyexcel/pyexcel-ods3 1090.. _pyexcel-odsr: https://github.com/pyexcel/pyexcel-odsr 1091.. _pyexcel-odsw: https://github.com/pyexcel/pyexcel-odsw 1092.. _pyexcel-pdfr: https://github.com/pyexcel/pyexcel-pdfr 1093 1094.. _pyexcel-xlsxw: https://github.com/pyexcel/pyexcel-xlsxw 1095.. _pyexcel-libxlsxw: https://github.com/pyexcel/pyexcel-libxlsxw 1096.. _pyexcel-xlsxr: https://github.com/pyexcel/pyexcel-xlsxr 1097.. _pyexcel-xlsbr: https://github.com/pyexcel/pyexcel-xlsbr 1098.. _pyexcel-htmlr: https://github.com/pyexcel/pyexcel-htmlr 1099 1100.. _xlrd: https://github.com/python-excel/xlrd 1101.. _xlwt: https://github.com/python-excel/xlwt 1102.. _openpyxl: https://bitbucket.org/openpyxl/openpyxl 1103.. _XlsxWriter: https://github.com/jmcnamara/XlsxWriter 1104.. _pyexcel-ezodf: https://github.com/pyexcel/pyexcel-ezodf 1105.. _odfpy: https://github.com/eea/odfpy 1106.. _libxlsxwriter: http://libxlsxwriter.github.io/getting_started.html 1107 1108.. table:: Other data renderers 1109 1110 ======================== ======================= ================= ================== 1111 Package name Supported file formats Dependencies Python versions 1112 ======================== ======================= ================= ================== 1113 `pyexcel-text`_ write only:rst, `tabulate`_ 2.6, 2.7, 3.3, 3.4 1114 mediawiki, html, 3.5, 3.6, pypy 1115 latex, grid, pipe, 1116 orgtbl, plain simple 1117 read only: ndjson 1118 r/w: json 1119 `pyexcel-handsontable`_ handsontable in html `handsontable`_ same as above 1120 `pyexcel-pygal`_ svg chart `pygal`_ 2.7, 3.3, 3.4, 3.5 1121 3.6, pypy 1122 `pyexcel-sortable`_ sortable table in html `csvtotable`_ same as above 1123 `pyexcel-gantt`_ gantt chart in html `frappe-gantt`_ except pypy, same 1124 as above 1125 ======================== ======================= ================= ================== 1126 1127.. _pyexcel-text: https://github.com/pyexcel/pyexcel-text 1128.. _tabulate: https://bitbucket.org/astanin/python-tabulate 1129.. _pyexcel-handsontable: https://github.com/pyexcel/pyexcel-handsontable 1130.. _handsontable: https://cdnjs.com/libraries/handsontable 1131.. _pyexcel-pygal: https://github.com/pyexcel/pyexcel-chart 1132.. _pygal: https://github.com/Kozea/pygal 1133.. _pyexcel-matplotlib: https://github.com/pyexcel/pyexcel-matplotlib 1134.. _matplotlib: https://matplotlib.org 1135.. _pyexcel-sortable: https://github.com/pyexcel/pyexcel-sortable 1136.. _csvtotable: https://github.com/vividvilla/csvtotable 1137.. _pyexcel-gantt: https://github.com/pyexcel/pyexcel-gantt 1138.. _frappe-gantt: https://github.com/frappe/gantt 1139 1140.. rubric:: Footnotes 1141 1142.. [#f1] zipped csv file 1143.. [#f2] zipped tsv file 1144 1145 1146Acknowledgement 1147=============== 1148 1149All great work have been done by odf, ezodf, xlrd, xlwt, tabulate and other 1150individual developers. This library unites only the data access code. 1151 1152 1153 1154 1155License 1156================================================================================ 1157 1158New BSD License 1159