1================================================================================
2pyexcel - Let you focus on data, instead of file formats
3================================================================================
4
5.. image:: https://raw.githubusercontent.com/pyexcel/pyexcel.github.io/master/images/patreon.png
6   :target: https://www.patreon.com/chfw
7
8.. image:: https://raw.githubusercontent.com/pyexcel/pyexcel-mobans/master/images/awesome-badge.svg
9   :target: https://awesome-python.com/#specific-formats-processing
10
11.. image:: https://github.com/pyexcel/pyexcel/workflows/run_tests/badge.svg
12   :target: http://github.com/pyexcel/pyexcel/actions
13
14.. image:: https://codecov.io/gh/pyexcel/pyexcel/branch/master/graph/badge.svg
15   :target: https://codecov.io/gh/pyexcel/pyexcel
16
17.. image:: https://badge.fury.io/py/pyexcel.svg
18   :target: https://pypi.org/project/pyexcel
19
20.. image:: https://anaconda.org/conda-forge/pyexcel/badges/version.svg
21   :target: https://anaconda.org/conda-forge/pyexcel
22
23.. image:: https://pepy.tech/badge/pyexcel/month
24   :target: https://pepy.tech/project/pyexcel
25
26.. image:: https://anaconda.org/conda-forge/pyexcel/badges/downloads.svg
27   :target: https://anaconda.org/conda-forge/pyexcel
28
29.. image:: https://img.shields.io/gitter/room/gitterHQ/gitter.svg
30   :target: https://gitter.im/pyexcel/Lobby
31
32.. image:: https://img.shields.io/static/v1?label=continuous%20templating&message=%E6%A8%A1%E7%89%88%E6%9B%B4%E6%96%B0&color=blue&style=flat-square
33    :target: https://moban.readthedocs.io/en/latest/#at-scale-continous-templating-for-open-source-projects
34
35.. image:: https://img.shields.io/static/v1?label=coding%20style&message=black&color=black&style=flat-square
36    :target: https://github.com/psf/black
37.. image:: https://readthedocs.org/projects/pyexcel/badge/?version=latest
38   :target: http://pyexcel.readthedocs.org/en/latest/
39
40Support the project
41================================================================================
42
43If your company has embedded pyexcel and its components into a revenue generating
44product, please support me on github, `patreon <https://www.patreon.com/bePatron?u=5537627>`_
45or `bounty source <https://salt.bountysource.com/teams/chfw-pyexcel>`_ to maintain
46the project and develop it further.
47
48If you are an individual, you are welcome to support me too and for however long
49you feel like. As my backer, you will receive
50`early access to pyexcel related contents <https://www.patreon.com/pyexcel/posts>`_.
51
52And your issues will get prioritized if you would like to become my patreon as `pyexcel pro user`.
53
54With your financial support, I will be able to invest
55a little bit more time in coding, documentation and writing interesting posts.
56
57
58Known constraints
59==================
60
61Fonts, colors and charts are not supported.
62
63Nor to read password protected xls, xlsx and ods files.
64
65Introduction
66================================================================================
67
68Feature Highlights
69===================
70
71.. table:: A list of supported file formats
72
73    ============ =======================================================
74    file format  definition
75    ============ =======================================================
76    csv          comma separated values
77    tsv          tab separated values
78    csvz         a zip file that contains one or many csv files
79    tsvz         a zip file that contains one or many tsv files
80    xls          a spreadsheet file format created by
81                 MS-Excel 97-2003
82    xlsx         MS-Excel Extensions to the Office Open XML
83                 SpreadsheetML File Format.
84    xlsm         an MS-Excel Macro-Enabled Workbook file
85    ods          open document spreadsheet
86    fods         flat open document spreadsheet
87    json         java script object notation
88    html         html table of the data structure
89    simple       simple presentation
90    rst          rStructured Text presentation of the data
91    mediawiki    media wiki table
92    ============ =======================================================
93
94
95.. image:: https://github.com/pyexcel/pyexcel/raw/dev/docs/source/_static/images/architecture.svg
96
97
981. One application programming interface(API) to handle multiple data sources:
99
100   * physical file
101   * memory file
102   * SQLAlchemy table
103   * Django Model
104   * Python data structures: dictionary, records and array
105
1062. One API to read and write data in various excel file formats.
1073. For large data sets, data streaming are supported. A genenerator can be returned to you. Checkout iget_records, iget_array, isave_as and isave_book_as.
108
109
110
111
112Installation
113================================================================================
114
115You can install pyexcel via pip:
116
117.. code-block:: bash
118
119    $ pip install pyexcel
120
121
122or clone it and install it:
123
124.. code-block:: bash
125
126    $ git clone https://github.com/pyexcel/pyexcel.git
127    $ cd pyexcel
128    $ python setup.py install
129
130
131
132One liners
133================================================================================
134
135This section shows you how to get data from your excel files and how to
136export data to excel files in **one line**
137
138Read from the excel files
139--------------------------------------------------------------------------------
140
141Get a list of dictionaries
142********************************************************************************
143
144
145Suppose you want to process the following coffee data (data source `coffee chart <https://cspinet.org/eating-healthy/ingredients-of-concern/caffeine-chart>`_ on the center for science in the public interest):
146
147
148Top 5 coffeine drinks:
149
150=====================================  ===============  =============
151Coffees                                Serving Size     Caffeine (mg)
152Starbucks Coffee Blonde Roast          venti(20 oz)     475
153Dunkin' Donuts Coffee with Turbo Shot  large(20 oz.)    398
154Starbucks Coffee Pike Place Roast      grande(16 oz.)   310
155Panera Coffee Light Roast              regular(16 oz.)  300
156=====================================  ===============  =============
157
158
159Let's get a list of dictionary out from the xls file:
160
161.. code-block:: python
162
163   >>> records = p.get_records(file_name="your_file.xls")
164
165And let's check what do we have:
166
167.. code-block:: python
168
169   >>> for r in records:
170   ...     print(f"{r['Serving Size']} of {r['Coffees']} has {r['Caffeine (mg)']} mg")
171   venti(20 oz) of Starbucks Coffee Blonde Roast has 475 mg
172   large(20 oz.) of Dunkin' Donuts Coffee with Turbo Shot has 398 mg
173   grande(16 oz.) of Starbucks Coffee Pike Place Roast has 310 mg
174   regular(16 oz.) of Panera Coffee Light Roast has 300 mg
175
176
177Get two dimensional array
178********************************************************************************
179
180Instead, what if you have to use `pyexcel.get_array` to do the same:
181
182.. code-block:: python
183
184   >>> for row in p.get_array(file_name="your_file.xls", start_row=1):
185   ...     print(f"{row[1]} of {row[0]} has {row[2]} mg")
186   venti(20 oz) of Starbucks Coffee Blonde Roast has 475 mg
187   large(20 oz.) of Dunkin' Donuts Coffee with Turbo Shot has 398 mg
188   grande(16 oz.) of Starbucks Coffee Pike Place Roast has 310 mg
189   regular(16 oz.) of Panera Coffee Light Roast has 300 mg
190
191
192where `start_row` skips the header row.
193
194
195Get a dictionary
196********************************************************************************
197
198You can get a dictionary too:
199
200Now let's get a dictionary out from the spreadsheet:
201
202.. code-block:: python
203
204   >>> my_dict = p.get_dict(file_name="your_file.xls", name_columns_by_row=0)
205
206And check what do we have:
207
208.. code-block:: python
209
210   >>> from pyexcel._compact import OrderedDict
211   >>> isinstance(my_dict, OrderedDict)
212   True
213   >>> for key, values in my_dict.items():
214   ...     print(key + " : " + ','.join([str(item) for item in values]))
215   Coffees : Starbucks Coffee Blonde Roast,Dunkin' Donuts Coffee with Turbo Shot,Starbucks Coffee Pike Place Roast,Panera Coffee Light Roast
216   Serving Size : venti(20 oz),large(20 oz.),grande(16 oz.),regular(16 oz.)
217   Caffeine (mg) : 475,398,310,300
218
219Please note that my_dict is an OrderedDict.
220
221Get a dictionary of two dimensional array
222********************************************************************************
223
224
225Suppose you have a multiple sheet book as the following:
226
227
228pyexcel:Sheet 1:
229
230=====================  =  =
2311                      2  3
2324                      5  6
2337                      8  9
234=====================  =  =
235
236pyexcel:Sheet 2:
237
238=====================  =  =
239X                      Y  Z
2401                      2  3
2414                      5  6
242=====================  =  =
243
244pyexcel:Sheet 3:
245
246=====================  =  =
247O                      P  Q
2483                      2  1
2494                      3  2
250=====================  =  =
251
252
253Here is the code to obtain those sheets as a single dictionary:
254
255.. code-block:: python
256
257   >>> book_dict = p.get_book_dict(file_name="book.xls")
258
259And check:
260
261.. code-block:: python
262
263   >>> isinstance(book_dict, OrderedDict)
264   True
265   >>> import json
266   >>> for key, item in book_dict.items():
267   ...     print(json.dumps({key: item}))
268   {"Sheet 1": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]}
269   {"Sheet 2": [["X", "Y", "Z"], [1, 2, 3], [4, 5, 6]]}
270   {"Sheet 3": [["O", "P", "Q"], [3, 2, 1], [4, 3, 2]]}
271
272
273Write data
274---------------------------------------------
275
276Export an array
277**********************
278
279Suppose you have the following array:
280
281.. code-block:: python
282
283   >>> data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
284
285And here is the code to save it as an excel file :
286
287.. code-block:: python
288
289   >>> p.save_as(array=data, dest_file_name="example.xls")
290
291Let's verify it:
292
293.. code-block:: python
294
295    >>> p.get_sheet(file_name="example.xls")
296    pyexcel_sheet1:
297    +---+---+---+
298    | 1 | 2 | 3 |
299    +---+---+---+
300    | 4 | 5 | 6 |
301    +---+---+---+
302    | 7 | 8 | 9 |
303    +---+---+---+
304
305
306And here is the code to save it as a csv file :
307
308.. code-block:: python
309
310   >>> p.save_as(array=data,
311   ...           dest_file_name="example.csv",
312   ...           dest_delimiter=':')
313
314Let's verify it:
315
316.. code-block:: python
317
318   >>> with open("example.csv") as f:
319   ...     for line in f.readlines():
320   ...         print(line.rstrip())
321   ...
322   1:2:3
323   4:5:6
324   7:8:9
325
326Export a list of dictionaries
327**********************************
328
329.. code-block:: python
330
331    >>> records = [
332    ...     {"year": 1903, "country": "Germany", "speed": "206.7km/h"},
333    ...     {"year": 1964, "country": "Japan", "speed": "210km/h"},
334    ...     {"year": 2008, "country": "China", "speed": "350km/h"}
335    ... ]
336    >>> p.save_as(records=records, dest_file_name='high_speed_rail.xls')
337
338
339Export a dictionary of single key value pair
340********************************************************************************
341
342.. code-block:: python
343
344    >>> henley_on_thames_facts = {
345    ...     "area": "5.58 square meters",
346    ...     "population": "11,619",
347    ...     "civial parish": "Henley-on-Thames",
348    ...     "latitude": "51.536",
349    ...     "longitude": "-0.898"
350    ... }
351    >>> p.save_as(adict=henley_on_thames_facts, dest_file_name='henley.xlsx')
352
353
354Export a dictionary of single dimensonal array
355********************************************************************************
356
357.. code-block:: python
358
359    >>> ccs_insights = {
360    ...     "year": ["2017", "2018", "2019", "2020", "2021"],
361    ...     "smart phones": [1.53, 1.64, 1.74, 1.82, 1.90],
362    ...     "feature phones": [0.46, 0.38, 0.30, 0.23, 0.17]
363    ... }
364    >>> p.save_as(adict=ccs_insights, dest_file_name='ccs.csv')
365
366
367Export a dictionary of two dimensional array as a book
368********************************************************************************
369
370Suppose you want to save the below dictionary to an excel file :
371
372.. code-block:: python
373
374   >>> a_dictionary_of_two_dimensional_arrays = {
375   ...      'Sheet 1':
376   ...          [
377   ...              [1.0, 2.0, 3.0],
378   ...              [4.0, 5.0, 6.0],
379   ...              [7.0, 8.0, 9.0]
380   ...          ],
381   ...      'Sheet 2':
382   ...          [
383   ...              ['X', 'Y', 'Z'],
384   ...              [1.0, 2.0, 3.0],
385   ...              [4.0, 5.0, 6.0]
386   ...          ],
387   ...      'Sheet 3':
388   ...          [
389   ...              ['O', 'P', 'Q'],
390   ...              [3.0, 2.0, 1.0],
391   ...              [4.0, 3.0, 2.0]
392   ...          ]
393   ...  }
394
395Here is the code:
396
397.. code-block:: python
398
399   >>> p.save_book_as(
400   ...    bookdict=a_dictionary_of_two_dimensional_arrays,
401   ...    dest_file_name="book.xls"
402   ... )
403
404If you want to preserve the order of sheets in your dictionary, you have to
405pass on an ordered dictionary to the function itself. For example:
406
407.. code-block:: python
408
409   >>> data = OrderedDict()
410   >>> data.update({"Sheet 2": a_dictionary_of_two_dimensional_arrays['Sheet 2']})
411   >>> data.update({"Sheet 1": a_dictionary_of_two_dimensional_arrays['Sheet 1']})
412   >>> data.update({"Sheet 3": a_dictionary_of_two_dimensional_arrays['Sheet 3']})
413   >>> p.save_book_as(bookdict=data, dest_file_name="book.xls")
414
415Let's verify its order:
416
417.. code-block:: python
418
419   >>> book_dict = p.get_book_dict(file_name="book.xls")
420   >>> for key, item in book_dict.items():
421   ...     print(json.dumps({key: item}))
422   {"Sheet 2": [["X", "Y", "Z"], [1, 2, 3], [4, 5, 6]]}
423   {"Sheet 1": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]}
424   {"Sheet 3": [["O", "P", "Q"], [3, 2, 1], [4, 3, 2]]}
425
426Please notice that "Sheet 2" is the first item in the *book_dict*, meaning the order of sheets are preserved.
427
428
429Transcoding
430-------------------------------------------
431
432.. note::
433
434   Please note that `pyexcel-cli` can perform file transcoding at command line.
435   No need to open your editor, save the problem, then python run.
436
437
438The following code does a simple file format transcoding from xls to csv:
439
440.. code-block:: python
441
442   >>> p.save_as(file_name="birth.xls", dest_file_name="birth.csv")
443
444Again it is really simple. Let's verify what we have gotten:
445
446.. code-block:: python
447
448   >>> sheet = p.get_sheet(file_name="birth.csv")
449   >>> sheet
450   birth.csv:
451   +-------+--------+----------+
452   | name  | weight | birth    |
453   +-------+--------+----------+
454   | Adam  | 3.4    | 03/02/15 |
455   +-------+--------+----------+
456   | Smith | 4.2    | 12/11/14 |
457   +-------+--------+----------+
458
459.. NOTE::
460
461   Please note that csv(comma separate value) file is pure text file. Formula, charts, images and formatting in xls file will disappear no matter which transcoding tool you use. Hence, pyexcel is a quick alternative for this transcoding job.
462
463
464Let use previous example and save it as xlsx instead
465
466.. code-block:: python
467
468   >>> p.save_as(file_name="birth.xls",
469   ...           dest_file_name="birth.xlsx") # change the file extension
470
471Again let's verify what we have gotten:
472
473.. code-block:: python
474
475   >>> sheet = p.get_sheet(file_name="birth.xlsx")
476   >>> sheet
477   pyexcel_sheet1:
478   +-------+--------+----------+
479   | name  | weight | birth    |
480   +-------+--------+----------+
481   | Adam  | 3.4    | 03/02/15 |
482   +-------+--------+----------+
483   | Smith | 4.2    | 12/11/14 |
484   +-------+--------+----------+
485
486
487Excel book merge and split operation in one line
488--------------------------------------------------------------------------------
489
490Merge all excel files in directory into  a book where each file become a sheet
491********************************************************************************
492
493The following code will merge every excel files into one file, say "output.xls":
494
495.. code-block:: python
496
497    from pyexcel.cookbook import merge_all_to_a_book
498    import glob
499
500
501    merge_all_to_a_book(glob.glob("your_csv_directory\*.csv"), "output.xls")
502
503You can mix and match with other excel formats: xls, xlsm and ods. For example, if you are sure you have only xls, xlsm, xlsx, ods and csv files in `your_excel_file_directory`, you can do the following:
504
505.. code-block:: python
506
507    from pyexcel.cookbook import merge_all_to_a_book
508    import glob
509
510
511    merge_all_to_a_book(glob.glob("your_excel_file_directory\*.*"), "output.xls")
512
513Split a book into single sheet files
514****************************************
515
516
517Suppose you have many sheets in a work book and you would like to separate each into a single sheet excel file. You can easily do this:
518
519.. code-block:: python
520
521   >>> from pyexcel.cookbook import split_a_book
522   >>> split_a_book("megabook.xls", "output.xls")
523   >>> import glob
524   >>> outputfiles = glob.glob("*_output.xls")
525   >>> for file in sorted(outputfiles):
526   ...     print(file)
527   ...
528   Sheet 1_output.xls
529   Sheet 2_output.xls
530   Sheet 3_output.xls
531
532for the output file, you can specify any of the supported formats
533
534
535Extract just one sheet from a book
536*************************************
537
538
539Suppose you just want to extract one sheet from many sheets that exists in a work book and you would like to separate it into a single sheet excel file. You can easily do this:
540
541.. code-block:: python
542
543    >>> from pyexcel.cookbook import extract_a_sheet_from_a_book
544    >>> extract_a_sheet_from_a_book("megabook.xls", "Sheet 1", "output.xls")
545    >>> if os.path.exists("Sheet 1_output.xls"):
546    ...     print("Sheet 1_output.xls exists")
547    ...
548    Sheet 1_output.xls exists
549
550for the output file, you can specify any of the supported formats
551
552
553Hidden feature: partial read
554===============================================
555
556Most pyexcel users do not know, but other library users were requesting `the similar features <https://github.com/jazzband/tablib/issues/467>`_
557
558
559When you are dealing with huge amount of data, e.g. 64GB, obviously you would not
560like to fill up your memory with those data. What you may want to do is, record
561data from Nth line, take M records and stop. And you only want to use your memory
562for the M records, not for beginning part nor for the tail part.
563
564Hence partial read feature is developed to read partial data into memory for
565processing.
566
567You can paginate by row, by column and by both, hence you dictate what portion of the
568data to read back. But remember only row limit features help you save memory. Let's
569you use this feature to record data from Nth column, take M number of columns and skip
570the rest. You are not going to reduce your memory footprint.
571
572Why did not I see above benefit?
573--------------------------------------------------------------------------------
574
575This feature depends heavily on the implementation details.
576
577`pyexcel-xls`_ (xlrd), `pyexcel-xlsx`_ (openpyxl), `pyexcel-ods`_ (odfpy) and
578`pyexcel-ods3`_ (pyexcel-ezodf) will read all data into memory. Because xls,
579xlsx and ods file are effective a zipped folder, all four will unzip the folder
580and read the content in xml format in **full**, so as to make sense of all details.
581
582Hence, during the partial data is been returned, the memory consumption won't
583differ from reading the whole data back. Only after the partial
584data is returned, the memory comsumption curve shall jump the cliff. So pagination
585code here only limits the data returned to your program.
586
587With that said, `pyexcel-xlsxr`_, `pyexcel-odsr`_ and `pyexcel-htmlr`_ DOES read
588partial data into memory. Those three are implemented in such a way that they
589consume the xml(html) when needed. When they have read designated portion of the
590data, they stop, even if they are half way through.
591
592In addition, pyexcel's csv readers can read partial data into memory too.
593
594
595Let's assume the following file is a huge csv file:
596
597.. code-block:: python
598
599   >>> import datetime
600   >>> import pyexcel as pe
601   >>> data = [
602   ...     [1, 21, 31],
603   ...     [2, 22, 32],
604   ...     [3, 23, 33],
605   ...     [4, 24, 34],
606   ...     [5, 25, 35],
607   ...     [6, 26, 36]
608   ... ]
609   >>> pe.save_as(array=data, dest_file_name="your_file.csv")
610
611
612And let's pretend to read partial data:
613
614
615.. code-block:: python
616
617   >>> pe.get_sheet(file_name="your_file.csv", start_row=2, row_limit=3)
618   your_file.csv:
619   +---+----+----+
620   | 3 | 23 | 33 |
621   +---+----+----+
622   | 4 | 24 | 34 |
623   +---+----+----+
624   | 5 | 25 | 35 |
625   +---+----+----+
626
627And you could as well do the same for columns:
628
629.. code-block:: python
630
631   >>> pe.get_sheet(file_name="your_file.csv", start_column=1, column_limit=2)
632   your_file.csv:
633   +----+----+
634   | 21 | 31 |
635   +----+----+
636   | 22 | 32 |
637   +----+----+
638   | 23 | 33 |
639   +----+----+
640   | 24 | 34 |
641   +----+----+
642   | 25 | 35 |
643   +----+----+
644   | 26 | 36 |
645   +----+----+
646
647Obvious, you could do both at the same time:
648
649.. code-block:: python
650
651   >>> pe.get_sheet(file_name="your_file.csv",
652   ...     start_row=2, row_limit=3,
653   ...     start_column=1, column_limit=2)
654   your_file.csv:
655   +----+----+
656   | 23 | 33 |
657   +----+----+
658   | 24 | 34 |
659   +----+----+
660   | 25 | 35 |
661   +----+----+
662
663
664The pagination support is available across all pyexcel plugins.
665
666.. note::
667
668   No column pagination support for query sets as data source.
669
670
671Formatting while transcoding a big data file
672--------------------------------------------------------------------------------
673
674If you are transcoding a big data set, conventional formatting method would not
675help unless a on-demand free RAM is available. However, there is a way to minimize
676the memory footprint of pyexcel while the formatting is performed.
677
678Let's continue from previous example. Suppose we want to transcode "your_file.csv"
679to "your_file.xls" but increase each element by 1.
680
681What we can do is to define a row renderer function as the following:
682
683.. code-block:: python
684
685   >>> def increment_by_one(row):
686   ...     for element in row:
687   ...         yield element + 1
688
689Then pass it onto save_as function using row_renderer:
690
691.. code-block:: python
692
693   >>> pe.isave_as(file_name="your_file.csv",
694   ...             row_renderer=increment_by_one,
695   ...             dest_file_name="your_file.xlsx")
696
697
698.. note::
699
700   If the data content is from a generator, isave_as has to be used.
701
702We can verify if it was done correctly:
703
704.. code-block:: python
705
706   >>> pe.get_sheet(file_name="your_file.xlsx")
707   your_file.csv:
708   +---+----+----+
709   | 2 | 22 | 32 |
710   +---+----+----+
711   | 3 | 23 | 33 |
712   +---+----+----+
713   | 4 | 24 | 34 |
714   +---+----+----+
715   | 5 | 25 | 35 |
716   +---+----+----+
717   | 6 | 26 | 36 |
718   +---+----+----+
719   | 7 | 27 | 37 |
720   +---+----+----+
721
722
723Stream APIs for big file : A set of two liners
724================================================================================
725
726When you are dealing with **BIG** excel files, you will want **pyexcel** to use
727constant memory.
728
729This section shows you how to get data from your **BIG** excel files and how to
730export data to excel files in **two lines** at most, without eating all
731your computer memory.
732
733
734Two liners for get data from big excel files
735--------------------------------------------------------------------------------
736
737Get a list of dictionaries
738********************************************************************************
739
740
741
742Suppose you want to process the following coffee data again:
743
744Top 5 coffeine drinks:
745
746=====================================  ===============  =============
747Coffees                                Serving Size     Caffeine (mg)
748Starbucks Coffee Blonde Roast          venti(20 oz)     475
749Dunkin' Donuts Coffee with Turbo Shot  large(20 oz.)    398
750Starbucks Coffee Pike Place Roast      grande(16 oz.)   310
751Panera Coffee Light Roast              regular(16 oz.)  300
752=====================================  ===============  =============
753
754
755Let's get a list of dictionary out from the xls file:
756
757.. code-block:: python
758
759   >>> records = p.iget_records(file_name="your_file.xls")
760
761And let's check what do we have:
762
763.. code-block:: python
764
765   >>> for r in records:
766   ...     print(f"{r['Serving Size']} of {r['Coffees']} has {r['Caffeine (mg)']} mg")
767   venti(20 oz) of Starbucks Coffee Blonde Roast has 475 mg
768   large(20 oz.) of Dunkin' Donuts Coffee with Turbo Shot has 398 mg
769   grande(16 oz.) of Starbucks Coffee Pike Place Roast has 310 mg
770   regular(16 oz.) of Panera Coffee Light Roast has 300 mg
771
772Please do not forgot the second line to close the opened file handle:
773
774.. code-block:: python
775
776   >>> p.free_resources()
777
778Get two dimensional array
779********************************************************************************
780
781Instead, what if you have to use `pyexcel.get_array` to do the same:
782
783.. code-block:: python
784
785   >>> for row in p.iget_array(file_name="your_file.xls", start_row=1):
786   ...     print(f"{row[1]} of {row[0]} has {row[2]} mg")
787   venti(20 oz) of Starbucks Coffee Blonde Roast has 475 mg
788   large(20 oz.) of Dunkin' Donuts Coffee with Turbo Shot has 398 mg
789   grande(16 oz.) of Starbucks Coffee Pike Place Roast has 310 mg
790   regular(16 oz.) of Panera Coffee Light Roast has 300 mg
791
792Again, do not forgot the second line:
793
794.. code-block:: python
795
796   >>> p.free_resources()
797
798where `start_row` skips the header row.
799
800Data export in one liners
801---------------------------------------------
802
803Export an array
804**********************
805
806Suppose you have the following array:
807
808.. code-block:: python
809
810   >>> data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
811
812And here is the code to save it as an excel file :
813
814.. code-block:: python
815
816   >>> p.isave_as(array=data, dest_file_name="example.xls")
817
818But the following line is not required because the data source
819are not file sources:
820
821.. code-block:: python
822
823   >>> # p.free_resources()
824
825Let's verify it:
826
827.. code-block:: python
828
829    >>> p.get_sheet(file_name="example.xls")
830    pyexcel_sheet1:
831    +---+---+---+
832    | 1 | 2 | 3 |
833    +---+---+---+
834    | 4 | 5 | 6 |
835    +---+---+---+
836    | 7 | 8 | 9 |
837    +---+---+---+
838
839
840And here is the code to save it as a csv file :
841
842.. code-block:: python
843
844   >>> p.isave_as(array=data,
845   ...            dest_file_name="example.csv",
846   ...            dest_delimiter=':')
847
848Let's verify it:
849
850.. code-block:: python
851
852   >>> with open("example.csv") as f:
853   ...     for line in f.readlines():
854   ...         print(line.rstrip())
855   ...
856   1:2:3
857   4:5:6
858   7:8:9
859
860Export a list of dictionaries
861**********************************
862
863.. code-block:: python
864
865    >>> records = [
866    ...     {"year": 1903, "country": "Germany", "speed": "206.7km/h"},
867    ...     {"year": 1964, "country": "Japan", "speed": "210km/h"},
868    ...     {"year": 2008, "country": "China", "speed": "350km/h"}
869    ... ]
870    >>> p.isave_as(records=records, dest_file_name='high_speed_rail.xls')
871
872Export a dictionary of single key value pair
873********************************************************************************
874
875.. code-block:: python
876
877    >>> henley_on_thames_facts = {
878    ...     "area": "5.58 square meters",
879    ...     "population": "11,619",
880    ...     "civial parish": "Henley-on-Thames",
881    ...     "latitude": "51.536",
882    ...     "longitude": "-0.898"
883    ... }
884    >>> p.isave_as(adict=henley_on_thames_facts, dest_file_name='henley.xlsx')
885
886Export a dictionary of single dimensonal array
887********************************************************************************
888
889.. code-block:: python
890
891    >>> ccs_insights = {
892    ...     "year": ["2017", "2018", "2019", "2020", "2021"],
893    ...     "smart phones": [1.53, 1.64, 1.74, 1.82, 1.90],
894    ...     "feature phones": [0.46, 0.38, 0.30, 0.23, 0.17]
895    ... }
896    >>> p.isave_as(adict=ccs_insights, dest_file_name='ccs.csv')
897    >>> p.free_resources()
898
899Export a dictionary of two dimensional array as a book
900********************************************************************************
901
902Suppose you want to save the below dictionary to an excel file :
903
904.. code-block:: python
905
906   >>> a_dictionary_of_two_dimensional_arrays = {
907   ...      'Sheet 1':
908   ...          [
909   ...              [1.0, 2.0, 3.0],
910   ...              [4.0, 5.0, 6.0],
911   ...              [7.0, 8.0, 9.0]
912   ...          ],
913   ...      'Sheet 2':
914   ...          [
915   ...              ['X', 'Y', 'Z'],
916   ...              [1.0, 2.0, 3.0],
917   ...              [4.0, 5.0, 6.0]
918   ...          ],
919   ...      'Sheet 3':
920   ...          [
921   ...              ['O', 'P', 'Q'],
922   ...              [3.0, 2.0, 1.0],
923   ...              [4.0, 3.0, 2.0]
924   ...          ]
925   ...  }
926
927Here is the code:
928
929.. code-block:: python
930
931   >>> p.isave_book_as(
932   ...    bookdict=a_dictionary_of_two_dimensional_arrays,
933   ...    dest_file_name="book.xls"
934   ... )
935
936If you want to preserve the order of sheets in your dictionary, you have to
937pass on an ordered dictionary to the function itself. For example:
938
939.. code-block:: python
940
941   >>> from pyexcel._compact import OrderedDict
942   >>> data = OrderedDict()
943   >>> data.update({"Sheet 2": a_dictionary_of_two_dimensional_arrays['Sheet 2']})
944   >>> data.update({"Sheet 1": a_dictionary_of_two_dimensional_arrays['Sheet 1']})
945   >>> data.update({"Sheet 3": a_dictionary_of_two_dimensional_arrays['Sheet 3']})
946   >>> p.isave_book_as(bookdict=data, dest_file_name="book.xls")
947   >>> p.free_resources()
948
949Let's verify its order:
950
951.. code-block:: python
952
953   >>> import json
954   >>> book_dict = p.get_book_dict(file_name="book.xls")
955   >>> for key, item in book_dict.items():
956   ...     print(json.dumps({key: item}))
957   {"Sheet 2": [["X", "Y", "Z"], [1, 2, 3], [4, 5, 6]]}
958   {"Sheet 1": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]}
959   {"Sheet 3": [["O", "P", "Q"], [3, 2, 1], [4, 3, 2]]}
960
961Please notice that "Sheet 2" is the first item in the *book_dict*, meaning the order of sheets are preserved.
962
963
964File format transcoding on one line
965-------------------------------------------
966
967.. note::
968
969   Please note that the following file transcoding could be with zero line. Please
970   install pyexcel-cli and you will do the transcode in one command. No need to
971   open your editor, save the problem, then python run.
972
973
974The following code does a simple file format transcoding from xls to csv:
975
976.. code-block:: python
977
978   >>> import pyexcel
979   >>> p.save_as(file_name="birth.xls", dest_file_name="birth.csv")
980
981Again it is really simple. Let's verify what we have gotten:
982
983.. code-block:: python
984
985   >>> sheet = p.get_sheet(file_name="birth.csv")
986   >>> sheet
987   birth.csv:
988   +-------+--------+----------+
989   | name  | weight | birth    |
990   +-------+--------+----------+
991   | Adam  | 3.4    | 03/02/15 |
992   +-------+--------+----------+
993   | Smith | 4.2    | 12/11/14 |
994   +-------+--------+----------+
995
996.. note::
997
998   Please note that csv(comma separate value) file is pure text file. Formula, charts, images and formatting in xls file will disappear no matter which transcoding tool you use. Hence, pyexcel is a quick alternative for this transcoding job.
999
1000
1001Let use previous example and save it as xlsx instead
1002
1003.. code-block:: python
1004
1005   >>> import pyexcel
1006   >>> p.isave_as(file_name="birth.xls",
1007   ...            dest_file_name="birth.xlsx") # change the file extension
1008
1009Again let's verify what we have gotten:
1010
1011.. code-block:: python
1012
1013   >>> sheet = p.get_sheet(file_name="birth.xlsx")
1014   >>> sheet
1015   pyexcel_sheet1:
1016   +-------+--------+----------+
1017   | name  | weight | birth    |
1018   +-------+--------+----------+
1019   | Adam  | 3.4    | 03/02/15 |
1020   +-------+--------+----------+
1021   | Smith | 4.2    | 12/11/14 |
1022   +-------+--------+----------+
1023
1024
1025Available Plugins
1026=================
1027
1028.. _file-format-list:
1029.. _a-map-of-plugins-and-file-formats:
1030
1031.. table:: A list of file formats supported by external plugins
1032
1033   ======================== ======================= =================
1034   Package name              Supported file formats  Dependencies
1035   ======================== ======================= =================
1036   `pyexcel-io`_            csv, csvz [#f1]_, tsv,
1037                            tsvz [#f2]_
1038   `pyexcel-xls`_           xls, xlsx(read only),   `xlrd`_,
1039                            xlsm(read only)         `xlwt`_
1040   `pyexcel-xlsx`_          xlsx                    `openpyxl`_
1041   `pyexcel-ods3`_          ods                     `pyexcel-ezodf`_,
1042                                                    lxml
1043   `pyexcel-ods`_           ods                     `odfpy`_
1044   ======================== ======================= =================
1045
1046.. table:: Dedicated file reader and writers
1047
1048   ======================== ======================= =================
1049   Package name              Supported file formats  Dependencies
1050   ======================== ======================= =================
1051   `pyexcel-xlsxw`_         xlsx(write only)        `XlsxWriter`_
1052   `pyexcel-libxlsxw`_      xlsx(write only)        `libxlsxwriter`_
1053   `pyexcel-xlsxr`_         xlsx(read only)         lxml
1054   `pyexcel-xlsbr`_         xlsb(read only)         pyxlsb
1055   `pyexcel-odsr`_          read only for ods, fods lxml
1056   `pyexcel-odsw`_          write only for ods      loxun
1057   `pyexcel-htmlr`_         html(read only)         lxml,html5lib
1058   `pyexcel-pdfr`_          pdf(read only)          camelot
1059   ======================== ======================= =================
1060
1061
1062Plugin shopping guide
1063------------------------
1064
1065Since 2020, all pyexcel-io plugins have dropped the support for python version
1066lower than 3.6. If you want to use any python verions, please use pyexcel-io
1067and its plugins version lower than 0.6.0.
1068
1069
1070Except csv files, xls, xlsx and ods files are a zip of a folder containing a lot of
1071xml files
1072
1073The dedicated readers for excel files can stream read
1074
1075
1076In order to manage the list of plugins installed, you need to use pip to add or remove
1077a plugin. When you use virtualenv, you can have different plugins per virtual
1078environment. In the situation where you have multiple plugins that does the same thing
1079in your environment, you need to tell pyexcel which plugin to use per function call.
1080For example, pyexcel-ods and pyexcel-odsr, and you want to get_array to use pyexcel-odsr.
1081You need to append get_array(..., library='pyexcel-odsr').
1082
1083
1084
1085.. _pyexcel-io: https://github.com/pyexcel/pyexcel-io
1086.. _pyexcel-xls: https://github.com/pyexcel/pyexcel-xls
1087.. _pyexcel-xlsx: https://github.com/pyexcel/pyexcel-xlsx
1088.. _pyexcel-ods: https://github.com/pyexcel/pyexcel-ods
1089.. _pyexcel-ods3: https://github.com/pyexcel/pyexcel-ods3
1090.. _pyexcel-odsr: https://github.com/pyexcel/pyexcel-odsr
1091.. _pyexcel-odsw: https://github.com/pyexcel/pyexcel-odsw
1092.. _pyexcel-pdfr: https://github.com/pyexcel/pyexcel-pdfr
1093
1094.. _pyexcel-xlsxw: https://github.com/pyexcel/pyexcel-xlsxw
1095.. _pyexcel-libxlsxw: https://github.com/pyexcel/pyexcel-libxlsxw
1096.. _pyexcel-xlsxr: https://github.com/pyexcel/pyexcel-xlsxr
1097.. _pyexcel-xlsbr: https://github.com/pyexcel/pyexcel-xlsbr
1098.. _pyexcel-htmlr: https://github.com/pyexcel/pyexcel-htmlr
1099
1100.. _xlrd: https://github.com/python-excel/xlrd
1101.. _xlwt: https://github.com/python-excel/xlwt
1102.. _openpyxl: https://bitbucket.org/openpyxl/openpyxl
1103.. _XlsxWriter: https://github.com/jmcnamara/XlsxWriter
1104.. _pyexcel-ezodf: https://github.com/pyexcel/pyexcel-ezodf
1105.. _odfpy: https://github.com/eea/odfpy
1106.. _libxlsxwriter: http://libxlsxwriter.github.io/getting_started.html
1107
1108.. table:: Other data renderers
1109
1110   ======================== ======================= ================= ==================
1111   Package name              Supported file formats  Dependencies     Python versions
1112   ======================== ======================= ================= ==================
1113   `pyexcel-text`_          write only:rst,         `tabulate`_       2.6, 2.7, 3.3, 3.4
1114                            mediawiki, html,                          3.5, 3.6, pypy
1115                            latex, grid, pipe,
1116                            orgtbl, plain simple
1117                            read only: ndjson
1118                            r/w: json
1119   `pyexcel-handsontable`_  handsontable in html    `handsontable`_   same as above
1120   `pyexcel-pygal`_         svg chart               `pygal`_          2.7, 3.3, 3.4, 3.5
1121                                                                      3.6, pypy
1122   `pyexcel-sortable`_      sortable table in html  `csvtotable`_     same as above
1123   `pyexcel-gantt`_         gantt chart in html     `frappe-gantt`_   except pypy, same
1124                                                                      as above
1125   ======================== ======================= ================= ==================
1126
1127.. _pyexcel-text: https://github.com/pyexcel/pyexcel-text
1128.. _tabulate: https://bitbucket.org/astanin/python-tabulate
1129.. _pyexcel-handsontable: https://github.com/pyexcel/pyexcel-handsontable
1130.. _handsontable: https://cdnjs.com/libraries/handsontable
1131.. _pyexcel-pygal: https://github.com/pyexcel/pyexcel-chart
1132.. _pygal: https://github.com/Kozea/pygal
1133.. _pyexcel-matplotlib: https://github.com/pyexcel/pyexcel-matplotlib
1134.. _matplotlib: https://matplotlib.org
1135.. _pyexcel-sortable: https://github.com/pyexcel/pyexcel-sortable
1136.. _csvtotable: https://github.com/vividvilla/csvtotable
1137.. _pyexcel-gantt: https://github.com/pyexcel/pyexcel-gantt
1138.. _frappe-gantt: https://github.com/frappe/gantt
1139
1140.. rubric:: Footnotes
1141
1142.. [#f1] zipped csv file
1143.. [#f2] zipped tsv file
1144
1145
1146Acknowledgement
1147===============
1148
1149All great work have been done by odf, ezodf, xlrd, xlwt, tabulate and other
1150individual developers. This library unites only the data access code.
1151
1152
1153
1154
1155License
1156================================================================================
1157
1158New BSD License
1159