:mod:`cgi` --- Common Gateway Interface support
===============================================

.. module:: cgi
   :synopsis: Helpers for running Python scripts via the Common Gateway Interface.

**Source code:** :source:`Lib/cgi.py`

.. index::
   pair: WWW; server
   pair: CGI; protocol
   pair: HTTP; protocol
   pair: MIME; headers
   single: URL
   single: Common Gateway Interface

--------------

Support module for Common Gateway Interface (CGI) scripts.

This module defines a number of utilities for use by CGI scripts written in
Python.


Introduction
------------

.. _cgi-intro:

A CGI script is invoked by an HTTP server, usually to process user input
submitted through an HTML ``<FORM>`` or ``<ISINDEX>`` element.

Most often, CGI scripts live in the server's special :file:`cgi-bin` directory.
The HTTP server places all sorts of information about the request (such as the
client's hostname, the requested URL, the query string, and lots of other
goodies) in the script's shell environment, executes the script, and sends the
script's output back to the client.

The script's input is connected to the client too, and sometimes the form data
is read this way; at other times the form data is passed via the "query string"
part of the URL.  This module is intended to take care of the different cases
and provide a simpler interface to the Python script.  It also provides a number
of utilities that help in debugging scripts, as well as support for file uploads
from a form (if your browser supports them).

The output of a CGI script should consist of two sections, separated by a blank
line.  The first section contains a number of headers, telling the client what
kind of data is following.  Python code to generate a minimal header section
looks like this::

   print("Content-Type: text/html")    # HTML is following
   print()                             # blank line, end of headers

The second section is usually HTML, which allows the client software to display
nicely formatted text with headings, inline images, etc.  Here's Python code that
prints a simple piece of HTML::

   print("<TITLE>CGI script output</TITLE>")
   print("<H1>This is my first CGI script</H1>")
   print("Hello, world!")
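
As a minimal sketch of this two-section layout, the function below builds a
whole response as one string, which makes the blank line that ends the headers
explicit.  ``render_cgi_response`` is a hypothetical helper, not part of the
:mod:`cgi` module, and escaping the text with :func:`html.escape` is an added
precaution that the snippets above do not take.

.. code-block:: python

   import html


   def render_cgi_response(title, text):
       """Return headers, a blank line, then a small HTML page.

       Hypothetical helper for illustration; not part of the cgi module.
       """
       headers = ["Content-Type: text/html"]  # first section: the headers
       page = (
           f"<TITLE>{html.escape(title)}</TITLE>\n"
           f"<H1>{html.escape(title)}</H1>\n"
           f"{html.escape(text)}\n"
       )
       # "\n\n" ends the last header line and adds the separating blank line.
       return "\n".join(headers) + "\n\n" + page


   if __name__ == "__main__":
       print(render_cgi_response("CGI script output", "Hello, world!"), end="")

Writing the script's output with a single ``print(..., end="")`` call keeps
the header/body boundary in one place.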


.. _using-the-cgi-module:

Using the cgi module
--------------------

Begin by writing ``import cgi``.

When you write a new script, consider adding these lines::

   import cgitb
   cgitb.enable()

This activates a special exception handler that will display detailed reports in
the web browser if any errors occur.  If you'd rather not show the guts of your
program to users of your script, you can have the reports saved to files
instead, with code like this::

   import cgitb
   cgitb.enable(display=0, logdir="/path/to/logdir")

It's very helpful to use this feature during script development.  The reports
produced by :mod:`cgitb` provide information that can save you a lot of time in
tracking down bugs.  You can always remove the ``cgitb`` line later when you
have tested your script and are confident that it works correctly.

To get at submitted form data, use the :class:`FieldStorage` class.  If the form
contains non-ASCII characters, use the *encoding* keyword parameter set to the
value of the encoding defined for the document.  It is usually declared in a
META tag in the HEAD section of the HTML document or in the
:mailheader:`Content-Type` header.  Instantiating :class:`FieldStorage` reads
the form contents from the standard input or the environment (depending on the
values of various environment variables set according to the CGI standard).
Since it may consume standard input, it should be instantiated only once.

The :class:`FieldStorage` instance can be indexed like a Python dictionary.
It allows membership testing with the :keyword:`in` operator, and also supports
the standard dictionary method :meth:`~dict.keys` and the built-in function
:func:`len`.  Form fields containing empty strings are ignored and do not appear
in the dictionary; to keep such values, provide a true value for the optional
*keep_blank_values* keyword parameter when creating the :class:`FieldStorage`
instance.
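
As a sketch of the *keep_blank_values* rule, the snippet below parses a
made-up query string with :func:`urllib.parse.parse_qs`, the function to which
:func:`parse` (described under Functions) hands this parameter.  It illustrates
the parsing rule only, not :class:`FieldStorage` itself, and the field values
are hypothetical.

.. code-block:: python

   from urllib.parse import parse_qs

   # Hypothetical query string in which "addr" was submitted empty.
   query = "name=Joe+Blow&addr="

   default = parse_qs(query)                       # blank values dropped
   kept = parse_qs(query, keep_blank_values=True)  # blank values kept as ""

   print(default)  # {'name': ['Joe Blow']}
   print(kept)     # {'name': ['Joe Blow'], 'addr': ['']}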

For instance, the following code (which assumes that the
:mailheader:`Content-Type` header and blank line have already been printed)
checks that the fields ``name`` and ``addr`` are both set to a non-empty
string::

   import cgi

   def main():
       form = cgi.FieldStorage()
       if "name" not in form or "addr" not in form:
           print("<H1>Error</H1>")
           print("Please fill in the name and addr fields.")
           return
       print("<p>name:", form["name"].value)
       print("<p>addr:", form["addr"].value)
       # ...further form processing here...

   main()

Here the fields, accessed through ``form[key]``, are themselves instances of
:class:`FieldStorage` (or :class:`MiniFieldStorage`, depending on the form
encoding).  The :attr:`~FieldStorage.value` attribute of the instance yields
the string value of the field.  The :meth:`~FieldStorage.getvalue` method
returns this string value directly; it also accepts an optional second argument
as a default to return if the requested key is not present.

If the submitted form data contains more than one field with the same name, the
object retrieved by ``form[key]`` is not a :class:`FieldStorage` or
:class:`MiniFieldStorage` instance but a list of such instances.  Similarly, in
this situation, ``form.getvalue(key)`` would return a list of strings.  If you
expect this possibility (when your HTML form contains multiple fields with the
same name), use the :meth:`~FieldStorage.getlist` method, which always returns
a list of values (so that you do not need to special-case the single-item
case).  For example, this code concatenates any number of username fields,
separated by commas::

   value = form.getlist("username")
   usernames = ",".join(value)
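
The single-value-versus-list rule can be modelled on a plain dictionary from
:func:`urllib.parse.parse_qs`, which maps each name to a list of values.  The
sketch below mimics the documented return values of
:meth:`~FieldStorage.getvalue` and :meth:`~FieldStorage.getlist`; the helper
names and the query string are hypothetical, and it deliberately does not
import :mod:`cgi`, which was deprecated in Python 3.11 and removed in 3.13.

.. code-block:: python

   from urllib.parse import parse_qs


   def getvalue_like(fields, key, default=None):
       """Model getvalue(): one str for a single value, a list for several."""
       values = fields.get(key)
       if not values:
           return default
       return values[0] if len(values) == 1 else values


   def getlist_like(fields, key):
       """Model getlist(): always a list, empty when the key is missing."""
       return list(fields.get(key, []))


   fields = parse_qs("username=ann&username=bob&city=Oslo")
   print(getvalue_like(fields, "city"))               # Oslo
   print(getvalue_like(fields, "username"))           # ['ann', 'bob']
   print(",".join(getlist_like(fields, "username")))  # ann,bob
   print(getlist_like(fields, "missing"))             # []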

If a field represents an uploaded file, accessing the value via the
:attr:`~FieldStorage.value` attribute or the :meth:`~FieldStorage.getvalue`
method reads the entire file into memory as bytes.  This may not be what you
want.  You can test for an uploaded file by testing either the
:attr:`~FieldStorage.filename` attribute or the :attr:`~FieldStorage.file`
attribute.  You can then read the data from the :attr:`!file`
attribute before it is automatically closed as part of the garbage collection of
the :class:`FieldStorage` instance
(the :meth:`~io.RawIOBase.read` and :meth:`~io.IOBase.readline` methods will
return bytes)::

   fileitem = form["userfile"]
   if fileitem.file:
       # It's an uploaded file; count its lines.
       linecount = 0
       while True:
           line = fileitem.file.readline()
           if not line:
               break
           linecount += 1

:class:`FieldStorage` objects also support being used in a :keyword:`with`
statement, which will automatically close them when done.
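
The line-counting loop above can also be written by iterating over the file
object, since a binary file yields one ``bytes`` line per iteration.  In this
sketch :class:`io.BytesIO` stands in for ``fileitem.file`` so that it runs
without an HTTP server, and ``count_lines`` is a hypothetical helper; the
``with`` block closes the stand-in when it ends, which is the behavior the
paragraph above describes for :class:`FieldStorage`.

.. code-block:: python

   import io


   def count_lines(fileobj):
       """Count the lines in a binary file object (each line is bytes)."""
       return sum(1 for _ in fileobj)


   # io.BytesIO stands in for an uploaded file such as fileitem.file.
   upload = io.BytesIO(b"first line\nsecond line\nno trailing newline")
   with upload:
       print(count_lines(upload))  # 3
   print(upload.closed)            # True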

If an error is encountered when obtaining the contents of an uploaded file
(for example, when the user interrupts the form submission by clicking on
a Back or Cancel button), the :attr:`~FieldStorage.done` attribute of the
object for the field will be set to the value -1.

The file upload draft standard entertains the possibility of uploading multiple
files from one field (using a recursive :mimetype:`multipart/\*` encoding).
When this occurs, the item will be a dictionary-like :class:`FieldStorage` item.
This can be determined by testing its :attr:`!type` attribute, which should be
:mimetype:`multipart/form-data` (or perhaps another MIME type matching
:mimetype:`multipart/\*`).  In this case, it can be iterated over recursively
just like the top-level form object.

When a form is submitted in the "old" format (as the query string or as a single
data part of type :mimetype:`application/x-www-form-urlencoded`), the items will
actually be instances of the class :class:`MiniFieldStorage`.  In this case, the
:attr:`!list`, :attr:`!file`, and :attr:`!filename` attributes are always ``None``.

A form submitted via POST that also has a query string will contain both
:class:`FieldStorage` and :class:`MiniFieldStorage` items.

.. versionchanged:: 3.4
   The :attr:`~FieldStorage.file` attribute is automatically closed upon the
   garbage collection of the creating :class:`FieldStorage` instance.

.. versionchanged:: 3.5
   Added support for the context management protocol to the
   :class:`FieldStorage` class.


Higher Level Interface
----------------------

The previous section explains how to read CGI form data using the
:class:`FieldStorage` class.  This section describes a higher-level interface
which was added to this class to allow one to do it in a more readable and
intuitive way.  The interface doesn't make the techniques described in previous
sections obsolete; they are still useful to process file uploads efficiently,
for example.

.. XXX: Is this true ?

The interface consists of two simple methods.  Using these methods you can
process form data in a generic way, without having to worry about whether one
or several values were posted under a single name.

In the previous section, you learned to write the following code whenever you
expected a user to post more than one value under one name::

   item = form.getvalue("item")
   if isinstance(item, list):
       # The user is requesting more than one item.
       ...
   else:
       # The user is requesting only one item.
       ...

This situation is common, for example, when a form contains a group of multiple
checkboxes with the same name:

.. code-block:: html

   <input type="checkbox" name="item" value="1" />
   <input type="checkbox" name="item" value="2" />

In most situations, however, there's only one form control with a particular
name in a form, and then you expect and need only one value associated with this
name.  So you might write a script containing, for example, this code::

   user = form.getvalue("user").upper()

The problem with this code is that you should never expect that a client will
provide valid input to your scripts.  For example, if a curious user appends
another ``user=foo`` pair to the query string, then the script would crash,
because in this situation the ``getvalue("user")`` method call returns a list
instead of a string.  Calling the :meth:`~str.upper` method on a list is not valid
(since lists do not have a method of this name) and results in an
:exc:`AttributeError` exception.

Therefore, the appropriate way to read form data values has so far been to
always use code that checks whether the obtained value is a single value or a
list of values.  That's annoying and leads to less readable scripts.

A more convenient approach is to use the methods :meth:`~FieldStorage.getfirst`
and :meth:`~FieldStorage.getlist` provided by this higher-level interface.


.. method:: FieldStorage.getfirst(name, default=None)

   This method always returns only one value associated with form field *name*.
   The method returns only the first value if more than one value was posted
   under that name.  Please note that the order in which the values are received
   may vary from browser to browser and should not be counted on. [#]_  If no such
   form field or value exists then the method returns the value specified by the
   optional parameter *default*.  This parameter defaults to ``None`` if not
   specified.


.. method:: FieldStorage.getlist(name)

   This method always returns a list of values associated with form field *name*.
   The method returns an empty list if no such form field or value exists for
   *name*.  It returns a list consisting of one item if only one such value exists.

Using these methods you can write nice compact code::

   import cgi
   form = cgi.FieldStorage()
   user = form.getfirst("user", "").upper()    # This way it's safe.
   for item in form.getlist("item"):
       do_something(item)    # do_something() stands for your own processing
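
The sketch below models the :meth:`~FieldStorage.getfirst` behavior described
above on a plain :func:`urllib.parse.parse_qs` result, to show why the
``.upper()`` call stays safe when a name is repeated or missing.
``getfirst_like`` and the query string are hypothetical; the model takes
"first" to mean first in the received order, which, as noted above, should not
be relied on across browsers.

.. code-block:: python

   from urllib.parse import parse_qs


   def getfirst_like(fields, name, default=None):
       """Model getfirst(): the first received value for *name*, or *default*."""
       values = fields.get(name)
       return values[0] if values else default


   # "user" was posted twice, as in the curious-user scenario above.
   fields = parse_qs("user=alice&user=foo")

   print(fields["user"])                             # ['alice', 'foo']
   print(getfirst_like(fields, "user", "").upper())  # ALICE
   print(repr(getfirst_like(fields, "item", "")))    # ''
   print(getfirst_like(fields, "item"))              # None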


.. _functions-in-cgi-module:

Functions
---------

These are useful if you want more control, or if you want to employ some of the
algorithms implemented in this module in other circumstances.


.. function:: parse(fp=None, environ=os.environ, keep_blank_values=False, strict_parsing=False, separator="&")

   Parse a query in the environment or from a file (the file defaults to
   ``sys.stdin``).  The *keep_blank_values*, *strict_parsing* and *separator*
   parameters are passed to :func:`urllib.parse.parse_qs` unchanged.
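
   Because these parameters reach :func:`urllib.parse.parse_qs` unchanged, their
   effect can be tried on that function directly.  The query strings below are
   made up, and the *separator* argument is available in Python 3.10 and later.

   .. code-block:: python

      from urllib.parse import parse_qs

      # separator: only the given character splits fields ("&" by default).
      print(parse_qs("a=1;b=2", separator=";"))  # {'a': ['1'], 'b': ['2']}
      print(parse_qs("a=1;b=2"))                 # {'a': ['1;b=2']}

      # strict_parsing: a field without "=" is silently dropped by default...
      print(parse_qs("a=1&b"))                   # {'a': ['1']}

      # ...but is rejected with ValueError when strict_parsing is true.
      strict_error = None
      try:
          parse_qs("a=1&b", strict_parsing=True)
      except ValueError as exc:
          strict_error = exc
      print("rejected:", strict_error)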


.. function:: parse_multipart(fp, pdict, encoding="utf-8", errors="replace", separator="&")

   Parse input of type :mimetype:`multipart/form-data` (for file uploads).
   Arguments are *fp* for the input file, *pdict* for a dictionary containing
   other parameters in the :mailheader:`Content-Type` header, *encoding*,
   the request encoding, and *errors*, the error handler used when decoding it.

   Returns a dictionary just like :func:`urllib.parse.parse_qs`: keys are the
   field names, each value is a list of values for that field.  For non-file
   fields, the value is a list of strings.

   This is easy to use but not much good if you are expecting megabytes to be
   uploaded; in that case, use the :class:`FieldStorage` class instead,
   which is much more flexible.

   .. versionchanged:: 3.7
      Added the *encoding* and *errors* parameters.  For non-file fields, the
      value is now a list of strings, not bytes.

   .. versionchanged:: 3.10
      Added the *separator* parameter.


.. function:: parse_header(string)

   Parse a MIME header (such as :mailheader:`Content-Type`) into a main value and a
   dictionary of parameters.
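
   To illustrate what splitting a header into a main value and a parameter
   dictionary means, the sketch below performs the same kind of split with
   :class:`email.message.Message` from the standard library.  This is a
   swapped-in technique, not the implementation of :func:`parse_header`, and
   the header value is made up; note that the quotes around the boundary value
   are removed.

   .. code-block:: python

      from email.message import Message

      msg = Message()
      msg["Content-Type"] = 'multipart/form-data; boundary="----abc123"; charset=utf-8'

      main_value = msg.get_content_type()  # lowercased "maintype/subtype"
      params = dict(msg.get_params()[1:])  # skip the leading (main value, "") pair

      print(main_value)  # multipart/form-data
      print(params)      # {'boundary': '----abc123', 'charset': 'utf-8'}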


.. function:: test()

   Robust test CGI script, usable as main program.  Writes minimal HTTP headers and
   formats all information provided to the script in HTML format.


.. function:: print_environ()

   Format the shell environment in HTML.


.. function:: print_form(form)

   Format a form in HTML.


.. function:: print_directory()

   Format the current directory in HTML.


.. function:: print_environ_usage()

   Print a list of useful environment variables (those used by CGI) in HTML.


.. _cgi-security:

Caring about security
---------------------

.. index:: pair: CGI; security

There's one important rule: if you invoke an external program (via
:func:`os.system`, :func:`os.popen` or other functions with similar
functionality), make very sure you don't pass arbitrary strings received from
the client to the shell.  This is a well-known security hole whereby clever
hackers anywhere on the web can exploit a gullible CGI script to invoke
arbitrary shell commands.  Even parts of the URL or field names cannot be
trusted, since the request doesn't have to come from your form!

To be on the safe side, if you must pass a string obtained from a form to a shell
command, you should make sure the string contains only alphanumeric characters,
dashes, underscores, and periods.
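
One way to apply this rule is a whitelist check with a regular expression that
accepts only ASCII letters, digits, dashes, underscores, and periods, and
rejects everything else (including the empty string).  ``is_safe_for_shell`` is
a hypothetical helper; the explicit character class avoids ``\w``, which also
matches non-ASCII letters.

.. code-block:: python

   import re

   # Whitelist: ASCII letters, digits, "_", ".", and "-" (nothing else).
   _SAFE_ARG = re.compile(r"[A-Za-z0-9_.-]+")


   def is_safe_for_shell(value):
       """Return True only if *value* is non-empty and uses whitelisted characters."""
       return _SAFE_ARG.fullmatch(value) is not None


   for candidate in ["report-2024_v1.txt", "x; rm -rf /", "$(whoami)", ""]:
       print(repr(candidate), is_safe_for_shell(candidate))

Where possible, it is safer still not to involve a shell at all, for example by
passing an argument list to :func:`subprocess.run`, which runs the program
directly unless ``shell=True`` is given.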


Installing your CGI script on a Unix system
-------------------------------------------

Read the documentation for your HTTP server and check with your local system
administrator to find the directory where CGI scripts should be installed;
usually this is in a directory :file:`cgi-bin` in the server tree.

Make sure that your script is readable and executable by "others"; the Unix file
mode should be ``0o755`` octal (use ``chmod 0755 filename``).  Make sure that the
first line of the script contains ``#!`` starting in column 1 followed by the
pathname of the Python interpreter, for instance::

   #!/usr/local/bin/python

Make sure the Python interpreter exists and is executable by "others".

Make sure that any files your script needs to read or write are readable or
writable, respectively, by "others": their mode should be ``0o644`` for
readable and ``0o666`` for writable.  This is because, for security reasons, the
HTTP server usually executes your script as user "nobody", without any special
privileges.  It can only read (write, execute) files that everybody can read
(write, execute).  The current directory at execution time is also different (it
is usually the server's cgi-bin directory) and the set of environment variables
is also different from what you get when you log in.  In particular, don't count
on the shell's search path for executables (:envvar:`PATH`) or the Python module
search path (:envvar:`PYTHONPATH`) to be set to anything interesting.
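
The octal modes mentioned above can be spelled out with the :mod:`stat`
module's permission constants, which makes each owner/group/others bit visible.
The sketch only computes and prints the modes; applying one would be a call such
as ``os.chmod(path, SCRIPT_MODE)``, where ``path`` and the constant names are
hypothetical.

.. code-block:: python

   import stat

   # Mode for the CGI script itself: rwx for the owner, r-x for everyone else.
   SCRIPT_MODE = (stat.S_IRWXU
                  | stat.S_IRGRP | stat.S_IXGRP
                  | stat.S_IROTH | stat.S_IXOTH)

   # Data file the script only reads: rw- for the owner, r-- for everyone else.
   READABLE_DATA = (stat.S_IRUSR | stat.S_IWUSR
                    | stat.S_IRGRP | stat.S_IROTH)

   # Data file the script also writes: rw- for everyone.
   WRITABLE_DATA = READABLE_DATA | stat.S_IWGRP | stat.S_IWOTH

   for mode in (SCRIPT_MODE, READABLE_DATA, WRITABLE_DATA):
       # filemode() expects a file-type bit too; S_IFREG marks a regular file.
       print(oct(mode), stat.filemode(stat.S_IFREG | mode))
   # 0o755 -rwxr-xr-x
   # 0o644 -rw-r--r--
   # 0o666 -rw-rw-rw-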

If you need to load modules from a directory which is not on Python's default
module search path, you can change the path in your script, before importing
other modules.  For example::

   import sys
   sys.path.insert(0, "/usr/home/joe/lib/python")
   sys.path.insert(0, "/usr/local/lib/python")

(This way, the directory inserted last will be searched first!)
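
Because each ``insert(0, ...)`` call pushes the new entry to the front, the
resulting search order is the reverse of the order of the calls.  This sketch
uses a plain list in place of :data:`sys.path` so that it does not change the
real module search path; the starting entry is hypothetical.

.. code-block:: python

   # A plain list standing in for sys.path; the first entry is hypothetical.
   search_path = ["/usr/lib/python3"]

   search_path.insert(0, "/usr/home/joe/lib/python")
   search_path.insert(0, "/usr/local/lib/python")

   # Entries are searched front to back, so the last insertion is tried first.
   print(search_path)
   # ['/usr/local/lib/python', '/usr/home/joe/lib/python', '/usr/lib/python3']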

Instructions for non-Unix systems will vary; check your HTTP server's
documentation (it will usually have a section on CGI scripts).


Testing your CGI script
-----------------------

Unfortunately, a CGI script will generally not run when you try it from the
command line, and a script that works perfectly from the command line may fail
mysteriously when run from the server.  There's one reason why you should still
test your script from the command line: if it contains a syntax error, the
Python interpreter won't execute it at all, and the HTTP server will most likely
send a cryptic error to the client.
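
A syntax check needs neither the server nor a run of the script: the built-in
:func:`compile` parses and compiles source code without executing it.
``find_syntax_error`` is a hypothetical helper in this sketch; from a shell,
``python -m py_compile script.py`` performs a similar check on a file.

.. code-block:: python

   def find_syntax_error(source, filename="<cgi script>"):
       """Return "file:line: message" for the first syntax error, or None."""
       try:
           compile(source, filename, "exec")  # parse and compile only; nothing runs
       except SyntaxError as exc:
           return f"{exc.filename}:{exc.lineno}: {exc.msg}"
       return None


   good = 'print("Content-Type: text/plain")\nprint()\n'
   bad = 'print("Content-Type: text/plain"\nprint()\n'  # missing ")"

   print(find_syntax_error(good))  # None
   print(find_syntax_error(bad))   # message text varies between Python versions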

Assuming your script has no syntax errors, yet it does not work, you have no
choice but to read the next section.


Debugging CGI scripts
---------------------

.. index:: pair: CGI; debugging

First of all, check for trivial installation errors; reading the section
above on installing your CGI script carefully can save you a lot of time.  If
you wonder whether you have understood the installation procedure correctly, try
installing a copy of this module file (:file:`cgi.py`) as a CGI script.  When
invoked as a script, the file will dump its environment and the contents of the
form in HTML format.  Give it the right mode etc., and send it a request.  If it's
installed in the standard :file:`cgi-bin` directory, it should be possible to
send it a request by entering a URL into your browser of the form:

.. code-block:: none

   http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home

If this gives an error of type 404, the server cannot find the script; perhaps
you need to install it in a different directory.  If it gives another error,
there's an installation problem that you should fix before trying to go any
further.  If you get a nicely formatted listing of the environment and form
content (in this example, the fields should be listed as "addr" with value "At
Home" and "name" with value "Joe Blow"), the :file:`cgi.py` script has been
installed correctly.  If you follow the same procedure for your own script, you
should now be able to debug it.

The next step could be to call the :mod:`cgi` module's :func:`test` function
from your script: replace its main code with the single statement ::

   cgi.test()

This should produce the same results as those obtained from installing the
:file:`cgi.py` file itself.

When an ordinary Python script raises an unhandled exception (for whatever
reason: a typo in a module name, a file that can't be opened, etc.), the
Python interpreter prints a nice traceback and exits.  While the Python
interpreter will still do this when your CGI script raises an exception, most
likely the traceback will end up in one of the HTTP server's log files, or be
discarded altogether.

Fortunately, once you have managed to get your script to execute *some* code,
you can easily send tracebacks to the web browser using the :mod:`cgitb` module.
If you haven't done so already, just add the lines::

   import cgitb
   cgitb.enable()

to the top of your script.  Then try running it again; when a problem occurs,
you should see a detailed report that will likely make apparent the cause of the
crash.

If you suspect that there may be a problem in importing the :mod:`cgitb` module,
you can use an even more robust approach (which only uses built-in modules)::

   import sys
   sys.stderr = sys.stdout
   print("Content-Type: text/plain")
   print()
   # ...your code here...

This relies on the Python interpreter to print the traceback.  The content type
of the output is set to plain text, which disables all HTML processing.  If your
script works, the raw HTML will be displayed by your client.  If it raises an
exception, most likely after the first two lines have been printed, a traceback
will be displayed.  Because no HTML interpretation is going on, the traceback
will be readable.


Common problems and solutions
-----------------------------

* Most HTTP servers buffer the output from CGI scripts until the script is
  completed.  This means that it is not possible to display a progress report on
  the client's display while the script is running.

* Check the installation instructions above.

* Check the HTTP server's log files.  (``tail -f logfile`` in a separate window
  may be useful!)

* Always check a script for syntax errors first, by doing something like
  ``python script.py``.

* If your script does not have any syntax errors, try adding ``import cgitb;
  cgitb.enable()`` to the top of the script.

* When invoking external programs, make sure they can be found.  Usually, this
  means using absolute path names, since :envvar:`PATH` is rarely set to a very
  useful value in a CGI script.

* When reading or writing external files, make sure they can be read or written
  by the userid under which your CGI script will be running: this is typically the
  userid under which the web server is running, or some explicitly specified
  userid for a web server's ``suexec`` feature.

* Don't try to give a CGI script a set-uid mode.  This doesn't work on most
  systems, and is a security liability as well.

.. rubric:: Footnotes

.. [#] Note that some recent versions of the HTML specification do state what
   order the field values should be supplied in, but knowing whether a request
   was received from a conforming browser, or even from a browser at all, is
   tedious and error-prone.
