1:mod:`cgi` --- Common Gateway Interface support
2===============================================
3
4.. module:: cgi
5   :synopsis: Helpers for running Python scripts via the Common Gateway Interface.
6
7**Source code:** :source:`Lib/cgi.py`
8
9.. index::
10   pair: WWW; server
11   pair: CGI; protocol
12   pair: HTTP; protocol
13   pair: MIME; headers
14   single: URL
15   single: Common Gateway Interface
16
17--------------
18
19Support module for Common Gateway Interface (CGI) scripts.
20
21This module defines a number of utilities for use by CGI scripts written in
22Python.
23
24
.. _cgi-intro:

Introduction
------------
29
30A CGI script is invoked by an HTTP server, usually to process user input
31submitted through an HTML ``<FORM>`` or ``<ISINDEX>`` element.
32
33Most often, CGI scripts live in the server's special :file:`cgi-bin` directory.
34The HTTP server places all sorts of information about the request (such as the
35client's hostname, the requested URL, the query string, and lots of other
36goodies) in the script's shell environment, executes the script, and sends the
37script's output back to the client.
38
The script's input is connected to the client too, and sometimes the form data
is read this way; at other times the form data is passed via the "query string"
part of the URL.  This module is intended to take care of the different cases
and provide a simpler interface to the Python script.  It also provides a number
of utilities that help in debugging scripts, and it supports file uploads from a
form (if the client's browser supports them).
45
46The output of a CGI script should consist of two sections, separated by a blank
47line.  The first section contains a number of headers, telling the client what
48kind of data is following.  Python code to generate a minimal header section
49looks like this::
50
51   print("Content-Type: text/html")    # HTML is following
52   print()                             # blank line, end of headers
53
The second section is usually HTML, which allows the client software to display
nicely formatted text with headings, in-line images, etc.  Here's Python code
that prints a simple piece of HTML::
57
58   print("<TITLE>CGI script output</TITLE>")
59   print("<H1>This is my first CGI script</H1>")
60   print("Hello, world!")
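
Putting the two sections together, a complete minimal script might look like
the following sketch.  The helper name ``write_response`` is hypothetical (it is
not part of the :mod:`cgi` module); it writes to a stream passed in as an
argument so that the output is easy to inspect outside a server, and it targets
standard output when run as a script, since that is where the HTTP server
collects the response.

.. code-block:: python

   import sys


   def write_response(out):
       """Write a complete CGI response (headers, blank line, body) to *out*."""
       # Header section: tell the client what kind of data follows.
       print("Content-Type: text/html", file=out)
       # A blank line ends the header section.
       print(file=out)
       # Body section: the HTML the browser will render.
       print("<TITLE>CGI script output</TITLE>", file=out)
       print("<H1>This is my first CGI script</H1>", file=out)
       print("Hello, world!", file=out)


   if __name__ == "__main__":
       write_response(sys.stdout)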
61
62
63.. _using-the-cgi-module:
64
65Using the cgi module
66--------------------
67
68Begin by writing ``import cgi``.
69
70When you write a new script, consider adding these lines::
71
72   import cgitb
73   cgitb.enable()
74
75This activates a special exception handler that will display detailed reports in
76the Web browser if any errors occur.  If you'd rather not show the guts of your
77program to users of your script, you can have the reports saved to files
78instead, with code like this::
79
80   import cgitb
81   cgitb.enable(display=0, logdir="/path/to/logdir")
82
83It's very helpful to use this feature during script development. The reports
84produced by :mod:`cgitb` provide information that can save you a lot of time in
85tracking down bugs.  You can always remove the ``cgitb`` line later when you
86have tested your script and are confident that it works correctly.
87
To get at submitted form data, use the :class:`FieldStorage` class.  Creating an
instance reads the form contents from the standard input or the environment
(depending on the value of various environment variables set according to the
CGI standard).  Since it may consume standard input, it should be instantiated
only once.  If the form contains non-ASCII characters, set the *encoding*
keyword parameter to the encoding defined for the document; this is usually
declared in the META tag in the HEAD section of the HTML document or in the
:mailheader:`Content-Type` header.
96
97The :class:`FieldStorage` instance can be indexed like a Python dictionary.
98It allows membership testing with the :keyword:`in` operator, and also supports
99the standard dictionary method :meth:`~dict.keys` and the built-in function
100:func:`len`.  Form fields containing empty strings are ignored and do not appear
101in the dictionary; to keep such values, provide a true value for the optional
102*keep_blank_values* keyword parameter when creating the :class:`FieldStorage`
103instance.
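
Because :class:`FieldStorage` takes its request information from the CGI
environment variables, you can try it outside a server by passing a hand-built
*environ* mapping.  The following sketch assumes a GET request with an invented
query string in which the ``addr`` field is empty, and shows how
*keep_blank_values* changes what the dictionary-like interface reports:

.. code-block:: python

   import cgi

   # An invented GET request; a real HTTP server would set these variables.
   environ = {"REQUEST_METHOD": "GET", "QUERY_STRING": "name=Joe+Blow&addr="}

   # By default the empty "addr" field is dropped.
   form = cgi.FieldStorage(environ=environ)
   print("addr" in form, len(form), sorted(form.keys()))
   # False 1 ['name']

   # With keep_blank_values the empty field is kept, with an empty string value.
   form_with_blanks = cgi.FieldStorage(environ=environ, keep_blank_values=True)
   print("addr" in form_with_blanks, len(form_with_blanks),
         sorted(form_with_blanks.keys()))
   # True 2 ['addr', 'name']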
104
105For instance, the following code (which assumes that the
106:mailheader:`Content-Type` header and blank line have already been printed)
107checks that the fields ``name`` and ``addr`` are both set to a non-empty
108string::
109
   import cgi

   def main():
       form = cgi.FieldStorage()
       if "name" not in form or "addr" not in form:
           print("<H1>Error</H1>")
           print("Please fill in the name and addr fields.")
           return
       print("<p>name:", form["name"].value)
       print("<p>addr:", form["addr"].value)
       # ...further form processing here...

   main()
118
119Here the fields, accessed through ``form[key]``, are themselves instances of
120:class:`FieldStorage` (or :class:`MiniFieldStorage`, depending on the form
121encoding). The :attr:`~FieldStorage.value` attribute of the instance yields
122the string value of the field.  The :meth:`~FieldStorage.getvalue` method
123returns this string value directly; it also accepts an optional second argument
124as a default to return if the requested key is not present.
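
As a sketch of the difference, assuming a simulated GET request with an invented
query string: indexing returns a field object whose :attr:`~FieldStorage.value`
holds the string, while :meth:`~FieldStorage.getvalue` returns the string itself
or the default you supply.

.. code-block:: python

   import cgi

   environ = {"REQUEST_METHOD": "GET", "QUERY_STRING": "name=Joe+Blow&addr=At+Home"}
   form = cgi.FieldStorage(environ=environ)

   field = form["name"]                  # a field object, not a string
   print(type(field).__name__)           # MiniFieldStorage (query-string encoding)
   print(field.value)                    # Joe Blow
   print(form.getvalue("addr"))          # At Home
   print(form.getvalue("phone", "n/a"))  # n/a, because "phone" was not submitted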
125
126If the submitted form data contains more than one field with the same name, the
127object retrieved by ``form[key]`` is not a :class:`FieldStorage` or
128:class:`MiniFieldStorage` instance but a list of such instances.  Similarly, in
129this situation, ``form.getvalue(key)`` would return a list of strings. If you
130expect this possibility (when your HTML form contains multiple fields with the
131same name), use the :meth:`~FieldStorage.getlist` method, which always returns
132a list of values (so that you do not need to special-case the single item
133case).  For example, this code concatenates any number of username fields,
134separated by commas::
135
136   value = form.getlist("username")
137   usernames = ",".join(value)
138
If a field represents an uploaded file, accessing the value via the
:attr:`~FieldStorage.value` attribute or the :meth:`~FieldStorage.getvalue`
method reads the entire file into memory as bytes.  This may not be what you
want.  You can test for an uploaded file by testing either the
:attr:`~FieldStorage.filename` attribute or the :attr:`~FieldStorage.file`
attribute.  You can then read the data from the :attr:`!file` attribute before
it is automatically closed as part of the garbage collection of the
:class:`FieldStorage` instance (the :meth:`~io.RawIOBase.read` and
:meth:`~io.IOBase.readline` methods will return bytes)::
149
   fileitem = form["userfile"]
   if fileitem.file:
       # It's an uploaded file; count lines
       linecount = 0
       while True:
           line = fileitem.file.readline()
           if not line:
               break
           linecount += 1
158
159:class:`FieldStorage` objects also support being used in a :keyword:`with`
160statement, which will automatically close them when done.
161
162If an error is encountered when obtaining the contents of an uploaded file
163(for example, when the user interrupts the form submission by clicking on
a Back or Cancel button), the :attr:`~FieldStorage.done` attribute of the
165object for the field will be set to the value -1.
166
167The file upload draft standard entertains the possibility of uploading multiple
168files from one field (using a recursive :mimetype:`multipart/\*` encoding).
169When this occurs, the item will be a dictionary-like :class:`FieldStorage` item.
170This can be determined by testing its :attr:`!type` attribute, which should be
171:mimetype:`multipart/form-data` (or perhaps another MIME type matching
172:mimetype:`multipart/\*`).  In this case, it can be iterated over recursively
173just like the top-level form object.
174
175When a form is submitted in the "old" format (as the query string or as a single
176data part of type :mimetype:`application/x-www-form-urlencoded`), the items will
177actually be instances of the class :class:`MiniFieldStorage`.  In this case, the
:attr:`!list`, :attr:`!file`, and :attr:`!filename` attributes are always ``None``.
179
180A form submitted via POST that also has a query string will contain both
181:class:`FieldStorage` and :class:`MiniFieldStorage` items.
182
183.. versionchanged:: 3.4
184   The :attr:`~FieldStorage.file` attribute is automatically closed upon the
185   garbage collection of the creating :class:`FieldStorage` instance.
186
187.. versionchanged:: 3.5
188   Added support for the context management protocol to the
189   :class:`FieldStorage` class.
190
191
192Higher Level Interface
193----------------------
194
195The previous section explains how to read CGI form data using the
196:class:`FieldStorage` class.  This section describes a higher level interface
197which was added to this class to allow one to do it in a more readable and
198intuitive way.  The interface doesn't make the techniques described in previous
199sections obsolete --- they are still useful to process file uploads efficiently,
200for example.
201
The interface consists of two simple methods.  Using the methods you can process
form data in a generic way, without having to worry whether one value or several
values were posted under one name.
207
In the previous section, you learned to write the following code whenever you
expected a user to post more than one value under one name::
210
   item = form.getvalue("item")
   if isinstance(item, list):
       # The user is requesting more than one item.
       ...
   else:
       # The user is requesting only one item.
       ...
216
This situation is common, for example, when a form contains a group of multiple
checkboxes with the same name::
219
220   <input type="checkbox" name="item" value="1" />
221   <input type="checkbox" name="item" value="2" />
222
In most situations, however, there's only one form control with a particular
name in a form, and you expect and need only one value associated with this
name.  So you might write a script containing, for example, this code::
226
227   user = form.getvalue("user").upper()
228
229The problem with the code is that you should never expect that a client will
230provide valid input to your scripts.  For example, if a curious user appends
231another ``user=foo`` pair to the query string, then the script would crash,
232because in this situation the ``getvalue("user")`` method call returns a list
233instead of a string.  Calling the :meth:`~str.upper` method on a list is not valid
234(since lists do not have a method of this name) and results in an
235:exc:`AttributeError` exception.
236
237Therefore, the appropriate way to read form data values was to always use the
238code which checks whether the obtained value is a single value or a list of
239values.  That's annoying and leads to less readable scripts.
240
241A more convenient approach is to use the methods :meth:`~FieldStorage.getfirst`
242and :meth:`~FieldStorage.getlist` provided by this higher level interface.
243
244
245.. method:: FieldStorage.getfirst(name, default=None)
246
   This method always returns only one value associated with form field *name*.
   If more than one value was posted under that name, only the first one is
   returned.  Please note that the order in which the values are received
   may vary from browser to browser and should not be counted on. [#]_  If no such
   form field or value exists, the method returns the value specified by the
   optional parameter *default*.  This parameter defaults to ``None`` if not
   specified.
254
255
256.. method:: FieldStorage.getlist(name)
257
258   This method always returns a list of values associated with form field *name*.
259   The method returns an empty list if no such form field or value exists for
260   *name*.  It returns a list consisting of one item if only one such value exists.
261
262Using these methods you can write nice compact code::
263
264   import cgi
265   form = cgi.FieldStorage()
266   user = form.getfirst("user", "").upper()    # This way it's safe.
267   for item in form.getlist("item"):
268       do_something(item)
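
The following sketch, again using a simulated GET request with an invented query
string, shows the scenario described above in which a client appends a second
``user=foo`` pair.  :meth:`~FieldStorage.getvalue` then returns a list, while
:meth:`~FieldStorage.getfirst` and :meth:`~FieldStorage.getlist` keep returning
a predictable type:

.. code-block:: python

   import cgi

   environ = {
       "REQUEST_METHOD": "GET",
       "QUERY_STRING": "user=joe&user=foo&item=1&item=2",
   }
   form = cgi.FieldStorage(environ=environ)

   print(form.getvalue("user"))              # ['joe', 'foo'], so .upper() would fail
   print(form.getfirst("user", "").upper())  # JOE
   print(form.getlist("item"))               # ['1', '2']
   print(repr(form.getfirst("missing", ""))) # '' (the supplied default)
   print(form.getlist("missing"))            # []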
269
270
271.. _functions-in-cgi-module:
272
273Functions
274---------
275
276These are useful if you want more control, or if you want to employ some of the
277algorithms implemented in this module in other circumstances.
278
279
280.. function:: parse(fp=None, environ=os.environ, keep_blank_values=False, strict_parsing=False, separator="&")
281
282   Parse a query in the environment or from a file (the file defaults to
283   ``sys.stdin``).  The *keep_blank_values*, *strict_parsing* and *separator* parameters are
284   passed to :func:`urllib.parse.parse_qs` unchanged.
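
   As a minimal sketch, assuming a GET request supplied through a hand-built
   *environ* mapping (for GET requests the query string is taken from the
   ``QUERY_STRING`` variable), the result has the same dictionary-of-lists
   shape as :func:`urllib.parse.parse_qs`:

   .. code-block:: python

      import cgi

      # Invented requests; a real HTTP server would set these variables.
      environ = {"REQUEST_METHOD": "GET", "QUERY_STRING": "a=1&a=2&b="}
      print(cgi.parse(environ=environ))
      # {'a': ['1', '2']}
      print(cgi.parse(environ=environ, keep_blank_values=True))
      # {'a': ['1', '2'], 'b': ['']}

      # A query string that uses ";" instead of "&" between fields.
      environ_semicolon = {"REQUEST_METHOD": "GET", "QUERY_STRING": "a=1;b=2"}
      print(cgi.parse(environ=environ_semicolon, separator=";"))
      # {'a': ['1'], 'b': ['2']}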
285
286   .. versionchanged:: 3.8.8
287      Added the *separator* parameter.
288
289.. function:: parse_multipart(fp, pdict, encoding="utf-8", errors="replace", separator="&")
290
   Parse input of type :mimetype:`multipart/form-data` (for file uploads).
   Arguments are *fp* for the input file, *pdict* for a dictionary containing
   other parameters in the :mailheader:`Content-Type` header, and *encoding* and
   *errors*, the encoding and error handler used to decode the request.  The
   ``boundary`` value in *pdict* must be given as bytes, since it is decoded as
   ASCII.
295
296   Returns a dictionary just like :func:`urllib.parse.parse_qs`: keys are the
297   field names, each value is a list of values for that field. For non-file
298   fields, the value is a list of strings.
299
300   This is easy to use but not much good if you are expecting megabytes to be
301   uploaded --- in that case, use the :class:`FieldStorage` class instead
302   which is much more flexible.
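
   A minimal sketch with an invented two-field request body.  It assumes the
   boundary string ``XyZ`` and passes it as bytes; the optional
   ``CONTENT-LENGTH`` entry is given as a string, like a header value.
   Because the key order of the returned dictionary is not guaranteed, the
   items are sorted before printing:

   .. code-block:: python

      import cgi
      import io

      # An invented multipart body; browsers use CRLF line endings.
      body = (
          b"--XyZ\r\n"
          b'Content-Disposition: form-data; name="name"\r\n'
          b"\r\n"
          b"Joe Blow\r\n"
          b"--XyZ\r\n"
          b'Content-Disposition: form-data; name="addr"\r\n'
          b"\r\n"
          b"At Home\r\n"
          b"--XyZ--\r\n"
      )
      # The boundary must be bytes; parse_multipart() decodes it as ASCII.
      pdict = {"boundary": b"XyZ", "CONTENT-LENGTH": str(len(body))}
      fields = cgi.parse_multipart(io.BytesIO(body), pdict)
      print(sorted(fields.items()))
      # [('addr', ['At Home']), ('name', ['Joe Blow'])]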
303
304   .. versionchanged:: 3.7
305      Added the *encoding* and *errors* parameters.  For non-file fields, the
306      value is now a list of strings, not bytes.
307
308   .. versionchanged:: 3.8.8
309      Added the *separator* parameter.
310
311
312.. function:: parse_header(string)
313
314   Parse a MIME header (such as :mailheader:`Content-Type`) into a main value and a
315   dictionary of parameters.
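
   For example (the header values here are invented):

   .. code-block:: python

      import cgi

      main_value, params = cgi.parse_header('text/html; charset="utf-8"')
      print(main_value, params)  # text/html {'charset': 'utf-8'}

      # Parameter names are lower-cased and surrounding quotes are removed.
      print(cgi.parse_header("multipart/form-data; Boundary=XyZ"))
      # ('multipart/form-data', {'boundary': 'XyZ'})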
316
317
318.. function:: test()
319
320   Robust test CGI script, usable as main program. Writes minimal HTTP headers and
321   formats all information provided to the script in HTML form.
322
323
324.. function:: print_environ()
325
326   Format the shell environment in HTML.
327
328
329.. function:: print_form(form)
330
331   Format a form in HTML.
332
333
334.. function:: print_directory()
335
336   Format the current directory in HTML.
337
338
339.. function:: print_environ_usage()
340
   Print a list of useful environment variables (those used by CGI) in HTML.
342
343
344.. _cgi-security:
345
346Caring about security
347---------------------
348
349.. index:: pair: CGI; security
350
351There's one important rule: if you invoke an external program (via the
:func:`os.system` or :func:`os.popen` functions, or others with similar
353functionality), make very sure you don't pass arbitrary strings received from
354the client to the shell.  This is a well-known security hole whereby clever
355hackers anywhere on the Web can exploit a gullible CGI script to invoke
356arbitrary shell commands.  Even parts of the URL or field names cannot be
357trusted, since the request doesn't have to come from your form!
358
To be on the safe side, if you must pass a string obtained from a form to a shell
360command, you should make sure the string contains only alphanumeric characters,
361dashes, underscores, and periods.
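
One way to apply this rule is sketched below.  The helper names and the
:file:`/bin/ls` command are hypothetical; the check accepts only ASCII letters,
digits, dashes, underscores, and periods, and the command is run with
:func:`subprocess.run` given a list of arguments (rather than ``shell=True``),
so no shell ever interprets the string.

.. code-block:: python

   import re
   import subprocess

   # Only ASCII letters, digits, dashes, underscores and periods are accepted.
   _SAFE_ARG = re.compile(r"[A-Za-z0-9_.-]+")


   def is_safe_arg(value):
       """Return True if *value* consists solely of allowed characters."""
       return _SAFE_ARG.fullmatch(value) is not None


   def list_file(filename):
       """Run a command on a form-supplied name without involving a shell."""
       if not is_safe_arg(filename):
           raise ValueError(f"refusing unsafe argument: {filename!r}")
       # A list of arguments (and no shell=True) means no shell parses the value;
       # "--" keeps a value such as "-rf" from being read as options.
       result = subprocess.run(["/bin/ls", "-l", "--", filename],
                               capture_output=True, text=True)
       return result.stdout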
362
363
364Installing your CGI script on a Unix system
365-------------------------------------------
366
367Read the documentation for your HTTP server and check with your local system
368administrator to find the directory where CGI scripts should be installed;
369usually this is in a directory :file:`cgi-bin` in the server tree.
370
371Make sure that your script is readable and executable by "others"; the Unix file
372mode should be ``0o755`` octal (use ``chmod 0755 filename``).  Make sure that the
373first line of the script contains ``#!`` starting in column 1 followed by the
374pathname of the Python interpreter, for instance::
375
376   #!/usr/local/bin/python
377
378Make sure the Python interpreter exists and is executable by "others".
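
The checks above can be automated with a small sketch like the one below.  The
function name is hypothetical, and it only inspects the permission bits for
"others" and the first line of the file; it cannot tell whether the interpreter
named on the ``#!`` line exists.

.. code-block:: python

   import os
   import stat


   def check_cgi_script(path):
       """Return a list of problems with the script's mode and #! line."""
       problems = []
       mode = os.stat(path).st_mode
       if not mode & stat.S_IROTH:
           problems.append("not readable by others")
       if not mode & stat.S_IXOTH:
           problems.append("not executable by others")
       with open(path, "rb") as f:
           if not f.readline().startswith(b"#!"):
               problems.append("first line does not start with #!")
       return problems


   # To fix the mode from Python instead of the shell (same as chmod 0755):
   # os.chmod(path, 0o755)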
379
380Make sure that any files your script needs to read or write are readable or
381writable, respectively, by "others" --- their mode should be ``0o644`` for
382readable and ``0o666`` for writable.  This is because, for security reasons, the
HTTP server usually executes your script as user "nobody", without any special
384privileges.  It can only read (write, execute) files that everybody can read
385(write, execute).  The current directory at execution time is also different (it
386is usually the server's cgi-bin directory) and the set of environment variables
387is also different from what you get when you log in.  In particular, don't count
388on the shell's search path for executables (:envvar:`PATH`) or the Python module
389search path (:envvar:`PYTHONPATH`) to be set to anything interesting.
390
391If you need to load modules from a directory which is not on Python's default
392module search path, you can change the path in your script, before importing
393other modules.  For example::
394
395   import sys
396   sys.path.insert(0, "/usr/home/joe/lib/python")
397   sys.path.insert(0, "/usr/local/lib/python")
398
399(This way, the directory inserted last will be searched first!)
400
401Instructions for non-Unix systems will vary; check your HTTP server's
402documentation (it will usually have a section on CGI scripts).
403
404
405Testing your CGI script
406-----------------------
407
408Unfortunately, a CGI script will generally not run when you try it from the
409command line, and a script that works perfectly from the command line may fail
410mysteriously when run from the server.  There's one reason why you should still
411test your script from the command line: if it contains a syntax error, the
412Python interpreter won't execute it at all, and the HTTP server will most likely
413send a cryptic error to the client.
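
Checking for syntax errors can also be done from Python itself.  The sketch
below (the function name is hypothetical) compiles the script's source with the
built-in :func:`compile` function, without running it, and reports the first
syntax error it finds; the command ``python -m py_compile script.py`` performs a
similar check from the shell.

.. code-block:: python

   import sys


   def find_syntax_error(path):
       """Return a message for the first syntax error in *path*, or None."""
       with open(path, "rb") as f:
           source = f.read()
       try:
           # Compiling checks the syntax without executing any of the code.
           compile(source, path, "exec")
       except SyntaxError as exc:
           return f"{path}:{exc.lineno}: {exc.msg}"
       return None


   if __name__ == "__main__":
       for script in sys.argv[1:]:
           message = find_syntax_error(script)
           print(message or f"{script}: OK")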
414
If your script has no syntax errors but still does not work, you have no
choice but to read the next section.
417
418
419Debugging CGI scripts
420---------------------
421
422.. index:: pair: CGI; debugging
423
424First of all, check for trivial installation errors --- reading the section
425above on installing your CGI script carefully can save you a lot of time.  If
426you wonder whether you have understood the installation procedure correctly, try
installing a copy of this module file (:file:`cgi.py`) as a CGI script.  When
invoked as a script, the file will dump its environment and the contents of the
form as HTML.  Give it the right mode, etc., and send it a request.  If it's
installed in the standard :file:`cgi-bin` directory, it should be possible to
send it a request by entering a URL into your browser of the form:
432
433.. code-block:: none
434
435   http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home
436
If this gives a 404 error, the server cannot find the script; perhaps
you need to install it in a different directory.  If it gives another error,
439there's an installation problem that you should fix before trying to go any
440further.  If you get a nicely formatted listing of the environment and form
441content (in this example, the fields should be listed as "addr" with value "At
442Home" and "name" with value "Joe Blow"), the :file:`cgi.py` script has been
443installed correctly.  If you follow the same procedure for your own script, you
444should now be able to debug it.
445
446The next step could be to call the :mod:`cgi` module's :func:`test` function
447from your script: replace its main code with the single statement ::
448
449   cgi.test()
450
This should produce the same results as those obtained by installing the
:file:`cgi.py` file itself.
453
When an ordinary Python script raises an unhandled exception (for whatever
reason: a typo in a module name, a file that can't be opened, etc.), the
456Python interpreter prints a nice traceback and exits.  While the Python
457interpreter will still do this when your CGI script raises an exception, most
458likely the traceback will end up in one of the HTTP server's log files, or be
459discarded altogether.
460
461Fortunately, once you have managed to get your script to execute *some* code,
462you can easily send tracebacks to the Web browser using the :mod:`cgitb` module.
463If you haven't done so already, just add the lines::
464
465   import cgitb
466   cgitb.enable()
467
468to the top of your script.  Then try running it again; when a problem occurs,
469you should see a detailed report that will likely make apparent the cause of the
470crash.
471
472If you suspect that there may be a problem in importing the :mod:`cgitb` module,
473you can use an even more robust approach (which only uses built-in modules)::
474
475   import sys
476   sys.stderr = sys.stdout
477   print("Content-Type: text/plain")
478   print()
   # ...your code here...
480
481This relies on the Python interpreter to print the traceback.  The content type
482of the output is set to plain text, which disables all HTML processing.  If your
483script works, the raw HTML will be displayed by your client.  If it raises an
484exception, most likely after the first two lines have been printed, a traceback
485will be displayed. Because no HTML interpretation is going on, the traceback
486will be readable.
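
A variant of the same idea, sketched below, uses the standard :mod:`traceback`
module to write the traceback to standard output explicitly instead of
redirecting :data:`sys.stderr`.  The helper name is hypothetical, and *main*
stands for your script's own entry point.

.. code-block:: python

   import sys
   import traceback


   def run_with_plain_text_errors(main, out=None):
       """Send a plain-text header, run *main*, and show any traceback in *out*."""
       if out is None:
           out = sys.stdout
       print("Content-Type: text/plain", file=out)
       print(file=out)
       try:
           main()
       except Exception:
           # The traceback goes into the response, where the browser shows it.
           traceback.print_exc(file=out)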
487
488
489Common problems and solutions
490-----------------------------
491
492* Most HTTP servers buffer the output from CGI scripts until the script is
493  completed.  This means that it is not possible to display a progress report on
494  the client's display while the script is running.
495
496* Check the installation instructions above.
497
498* Check the HTTP server's log files.  (``tail -f logfile`` in a separate window
499  may be useful!)
500
501* Always check a script for syntax errors first, by doing something like
502  ``python script.py``.
503
504* If your script does not have any syntax errors, try adding ``import cgitb;
505  cgitb.enable()`` to the top of the script.
506
507* When invoking external programs, make sure they can be found. Usually, this
508  means using absolute path names --- :envvar:`PATH` is usually not set to a very
509  useful value in a CGI script.
510
511* When reading or writing external files, make sure they can be read or written
512  by the userid under which your CGI script will be running: this is typically the
513  userid under which the web server is running, or some explicitly specified
514  userid for a web server's ``suexec`` feature.
515
516* Don't try to give a CGI script a set-uid mode.  This doesn't work on most
517  systems, and is a security liability as well.
518
519.. rubric:: Footnotes
520
521.. [#] Note that some recent versions of the HTML specification do state what
522   order the field values should be supplied in, but knowing whether a request
523   was received from a conforming browser, or even from a browser at all, is
524   tedious and error-prone.
525