****************************
  What's New In Python 3.1
****************************

:Author: Raymond Hettinger

.. $Id$
   Rules for maintenance:

   * Anyone can add text to this document.  Do not spend very much time
   on the wording of your changes, because your text will probably
   get rewritten to some degree.

   * The maintainer will go through Misc/NEWS periodically and add
   changes; it's therefore more important to add your changes to
   Misc/NEWS than to this file.

   * This is not a complete list of every single change; completeness
   is the purpose of Misc/NEWS.  Some changes I consider too small
   or esoteric to include.  If such a change is added to the text,
   I'll just remove it.  (This is another reason you shouldn't spend
   too much time on writing your addition.)

   * If you want to draw your new text to the attention of the
   maintainer, add 'XXX' to the beginning of the paragraph or
   section.

   * It's OK to just add a fragmentary note about a change.  For
   example: "XXX Describe the transmogrify() function added to the
   socket module."  The maintainer will research the change and
   write the necessary text.

   * You can comment out your additions if you like, but it's not
   necessary (especially when a final release is some months away).

   * Credit the author of a patch or bugfix.   Just the name is
   sufficient; the e-mail address isn't necessary.

   * It's helpful to add the bug/patch number as a comment:

   % Patch 12345
   XXX Describe the transmogrify() function added to the socket
   module.
   (Contributed by P.Y. Developer.)

   This saves the maintainer the effort of going through the SVN log
   when researching a change.

This article explains the new features in Python 3.1, compared to 3.0.


PEP 372: Ordered Dictionaries
=============================

Regular Python dictionaries iterate over key/value pairs in arbitrary order.
Over the years, a number of authors have written alternative implementations
that remember the order in which the keys were originally inserted.  Based on
the experiences from those implementations, a new
:class:`collections.OrderedDict` class has been introduced.

The OrderedDict API is substantially the same as that of regular dictionaries,
but it iterates over keys and values in a guaranteed order determined by
when each key was first inserted.  If a new entry overwrites an existing entry,
the original insertion position is left unchanged.  Deleting an entry and
reinserting it will move it to the end.
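The ordering rules above can be checked in a few lines.  This is a minimal
sketch; the keys and values are made up for illustration:

```python
from collections import OrderedDict

d = OrderedDict()
d['first'] = 1
d['second'] = 2
d['third'] = 3

# Overwriting an existing key keeps its original position.
d['first'] = 10
order_after_overwrite = list(d)

# Deleting a key and reinserting it moves it to the end.
del d['second']
d['second'] = 20
order_after_reinsert = list(d)

print(order_after_overwrite)   # ['first', 'second', 'third']
print(order_after_reinsert)    # ['first', 'third', 'second']
```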

The standard library now supports use of ordered dictionaries in several
modules.  The :mod:`configparser` module uses them by default.  This lets
configuration files be read, modified, and then written back in their original
order.  The *_asdict()* method for :func:`collections.namedtuple` now
returns an ordered dictionary with the values appearing in the same order as
the underlying tuple indices.  The :mod:`json` module now accepts
an *object_pairs_hook* so that the decoder can build OrderedDicts.
Support was also added for third-party tools like `PyYAML <http://pyyaml.org/>`_.
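As a rough illustration of the :mod:`json` and named tuple support described
above, the sketch below decodes a JSON object into an OrderedDict and lists
the fields of a named tuple in index order.  The JSON text and the ``Point``
type are hypothetical and exist only for this example:

```python
import json
from collections import OrderedDict, namedtuple

# The decoder passes the key/value pairs, in document order, to the hook.
text = '{"zebra": 1, "apple": 2, "mango": 3}'
decoded = json.loads(text, object_pairs_hook=OrderedDict)
keys = list(decoded)

# _asdict() lists the fields in the same order as the tuple indices.
Point = namedtuple('Point', 'x y')   # hypothetical record type
fields = list(Point(x=3, y=4)._asdict().items())

print(keys)     # ['zebra', 'apple', 'mango']
print(fields)   # [('x', 3), ('y', 4)]
```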

.. seealso::

   :pep:`372` - Ordered Dictionaries
      PEP written by Armin Ronacher and Raymond Hettinger.  Implementation
      written by Raymond Hettinger.


PEP 378: Format Specifier for Thousands Separator
=================================================

The built-in :func:`format` function and the :meth:`str.format` method use
a mini-language that now includes a simple, non-locale-aware way to format
a number with a thousands separator.  This provides a way to humanize a
program's output, improving its professional appearance and readability::

    >>> from decimal import Decimal
    >>> format(1234567, ',d')
    '1,234,567'
    >>> format(1234567.89, ',.2f')
    '1,234,567.89'
    >>> format(12345.6 + 8901234.12j, ',f')
    '12,345.600000+8,901,234.120000j'
    >>> format(Decimal('1234567.89'), ',f')
    '1,234,567.89'

The supported types are :class:`int`, :class:`float`, :class:`complex`
and :class:`decimal.Decimal`.
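The same separator can be used inside :meth:`str.format` replacement fields,
where it combines with the usual width, alignment, and precision options.
A small sketch with made-up values:

```python
from decimal import Decimal

# Width and alignment come before the comma; precision and type follow it.
size_line = '{:>12,d} bytes'.format(1234567)
price_line = 'Total: {0:,.2f}'.format(Decimal('9876543.21'))

print(size_line)    # '   1,234,567 bytes'
print(price_line)   # 'Total: 9,876,543.21'
```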

Discussions are underway about how to specify alternative separators
like dots, spaces, apostrophes, or underscores.  Locale-aware applications
should use the existing *n* format specifier which already has some support
for thousands separators.

.. seealso::

   :pep:`378` - Format Specifier for Thousands Separator
      PEP written by Raymond Hettinger and implemented by Eric Smith and
      Mark Dickinson.


Other Language Changes
======================

Some smaller changes made to the core Python language are:

* Directories and zip archives containing a :file:`__main__.py`
  file can now be executed directly by passing their name to the
  interpreter. The directory/zipfile is automatically inserted as the
  first entry in ``sys.path``.  (Suggestion and initial patch by Andy Chu;
  revised patch by Phillip J. Eby and Nick Coghlan; :issue:`1739468`.)

* The :class:`int` type gained a ``bit_length`` method that returns the
  number of bits necessary to represent its argument in binary::

      >>> n = 37
      >>> bin(n)
      '0b100101'
      >>> n.bit_length()
      6
      >>> n = 2**123-1
      >>> n.bit_length()
      123
      >>> (n+1).bit_length()
      124

  (Contributed by Fredrik Johansson, Victor Stinner, Raymond Hettinger,
  and Mark Dickinson; :issue:`3439`.)

* The fields in :meth:`str.format` strings can now be automatically
  numbered::

    >>> 'Sir {} of {}'.format('Gallahad', 'Camelot')
    'Sir Gallahad of Camelot'

  Formerly, the string would have required numbered fields such as:
  ``'Sir {0} of {1}'``.

  (Contributed by Eric Smith; :issue:`5237`.)

* The :func:`string.maketrans` function is deprecated and is replaced by new
  static methods, :meth:`bytes.maketrans` and :meth:`bytearray.maketrans`.
  This change resolves the confusion around which types were supported by the
  :mod:`string` module. Now, :class:`str`, :class:`bytes`, and
  :class:`bytearray` each have their own **maketrans** and **translate**
  methods with intermediate translation tables of the appropriate type.

  (Contributed by Georg Brandl; :issue:`5675`.)

* The syntax of the :keyword:`with` statement now allows multiple context
  managers in a single statement::

    >>> with open('mylog.txt') as infile, open('a.out', 'w') as outfile:
    ...     for line in infile:
    ...         if '<critical>' in line:
    ...             outfile.write(line)

  With the new syntax, the :func:`contextlib.nested` function is no longer
  needed and is now deprecated.

  (Contributed by Georg Brandl and Mattias Brändström;
  `appspot issue 53094 <https://codereview.appspot.com/53094>`_.)

* ``round(x, n)`` now returns an integer if *x* is an integer.
  Previously it returned a float::

    >>> round(1123, -2)
    1100

  (Contributed by Mark Dickinson; :issue:`4707`.)

* Python now uses David Gay's algorithm for finding the shortest floating
  point representation that doesn't change its value.  This should help
  mitigate some of the confusion surrounding binary floating point
  numbers.

  The significance is easily seen with a number like ``1.1`` which does not
  have an exact equivalent in binary floating point.  Since there is no exact
  equivalent, an expression like ``float('1.1')`` evaluates to the nearest
  representable value which is ``0x1.199999999999ap+0`` in hex or
  ``1.100000000000000088817841970012523233890533447265625`` in decimal. That
  nearest value was and still is used in subsequent floating point
  calculations.

  What is new is how the number gets displayed.  Formerly, Python used a
  simple approach.  The value of ``repr(1.1)`` was computed as ``format(1.1,
  '.17g')`` which evaluated to ``'1.1000000000000001'``. The advantage of
  using 17 digits was that it relied on IEEE-754 guarantees to assure that
  ``eval(repr(1.1))`` would round-trip exactly to its original value.  The
  disadvantage is that many people found the output to be confusing (mistaking
  intrinsic limitations of binary floating point representation for a
  problem with Python itself).

  The new algorithm for ``repr(1.1)`` is smarter and returns ``'1.1'``.
  Effectively, it searches all equivalent string representations (ones that
  get stored with the same underlying float value) and returns the shortest
  representation.

  The new algorithm tends to emit cleaner representations when possible, but
  it does not change the underlying values.  So, it is still the case that
  ``1.1 + 2.2 != 3.3`` even though the representations may suggest otherwise.

  The new algorithm depends on certain features in the underlying floating
  point implementation.  If the required features are not found, the old
  algorithm will continue to be used.  Also, the text pickle protocols
  assure cross-platform portability by using the old algorithm.

  (Contributed by Eric Smith and Mark Dickinson; :issue:`1580`.)
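The difference between how a float is displayed and what it stores, as
described in the last item above, can be checked directly.  The sketch below
assumes a platform where the new shortest-representation algorithm is in use:

```python
x = float('1.1')

short = repr(x)                  # shortest string that round-trips
old_style = format(x, '.17g')    # the 17-digit form used by the earlier repr

# Both strings convert back to exactly the same stored value.
assert float(short) == x
assert float(old_style) == x

# The stored value is unchanged, so the arithmetic is unchanged too.
total = 1.1 + 2.2

print(short)          # '1.1'
print(old_style)      # '1.1000000000000001'
print(x.hex())        # '0x1.199999999999ap+0'
print(total == 3.3)   # False
```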

New, Improved, and Deprecated Modules
=====================================

* Added a :class:`collections.Counter` class to support convenient
  counting of unique items in a sequence or iterable::

      >>> from collections import Counter
      >>> Counter(['red', 'blue', 'red', 'green', 'blue', 'blue'])
      Counter({'blue': 3, 'red': 2, 'green': 1})

  (Contributed by Raymond Hettinger; :issue:`1696199`.)

* Added a new module, :mod:`tkinter.ttk`, for access to the Tk themed widget set.
  The basic idea of ttk is to separate, to the extent possible, the code
  implementing a widget's behavior from the code implementing its appearance.

  (Contributed by Guilherme Polo; :issue:`2983`.)

* The :class:`gzip.GzipFile` and :class:`bz2.BZ2File` classes now support
  the context management protocol::

        >>> # Automatically close file after writing
        >>> with gzip.GzipFile(filename, "wb") as f:
        ...     f.write(b"xxx")

  (Contributed by Antoine Pitrou.)

* The :mod:`decimal` module now supports methods for creating a
  decimal object from a binary :class:`float`.  The conversion is
  exact but can sometimes be surprising::

      >>> Decimal.from_float(1.1)
      Decimal('1.100000000000000088817841970012523233890533447265625')

  The long decimal result shows the actual binary fraction being
  stored for *1.1*.  The fraction has many digits because *1.1* cannot
  be exactly represented in binary.

  (Contributed by Raymond Hettinger and Mark Dickinson.)

* The :mod:`itertools` module grew two new functions.  The
  :func:`itertools.combinations_with_replacement` function is one of
  four combinatoric generators, alongside those for permutations,
  combinations, and Cartesian products.  The :func:`itertools.compress`
  function mimics its namesake from APL.  Also, the existing
  :func:`itertools.count` function now has
  an optional *step* argument and can accept any type of counting
  sequence including :class:`fractions.Fraction` and
  :class:`decimal.Decimal`::

    >>> [p+q for p,q in combinations_with_replacement('LOVE', 2)]
    ['LL', 'LO', 'LV', 'LE', 'OO', 'OV', 'OE', 'VV', 'VE', 'EE']

    >>> list(compress(data=range(10), selectors=[0,0,1,1,0,1,0,1,0,0]))
    [2, 3, 5, 7]

    >>> c = count(start=Fraction(1,2), step=Fraction(1,6))
    >>> [next(c), next(c), next(c), next(c)]
    [Fraction(1, 2), Fraction(2, 3), Fraction(5, 6), Fraction(1, 1)]

  (Contributed by Raymond Hettinger.)

* :func:`collections.namedtuple` now supports a keyword argument
  *rename* which lets invalid fieldnames be automatically converted to
  positional names in the form _0, _1, etc.  This is useful when
  the field names are being created by an external source such as a
  CSV header, SQL field list, or user input::

    >>> query = input()
    SELECT region, dept, count(*) FROM main GROUP BY region, dept

    >>> cursor.execute(query)
    >>> query_fields = [desc[0] for desc in cursor.description]
    >>> UserQuery = namedtuple('UserQuery', query_fields, rename=True)
    >>> pprint.pprint([UserQuery(*row) for row in cursor])
    [UserQuery(region='South', dept='Shipping', _2=185),
     UserQuery(region='North', dept='Accounting', _2=37),
     UserQuery(region='West', dept='Sales', _2=419)]

  (Contributed by Raymond Hettinger; :issue:`1818`.)

* The :func:`re.sub`, :func:`re.subn` and :func:`re.split` functions now
  accept a *flags* parameter.

  (Contributed by Gregory Smith.)

* The :mod:`logging` module now implements a simple :class:`logging.NullHandler`
  class for applications that are not using logging but are calling
  library code that does.  Setting up a null handler will suppress
  spurious warnings such as "No handlers could be found for logger foo"::

    >>> h = logging.NullHandler()
    >>> logging.getLogger("foo").addHandler(h)

  (Contributed by Vinay Sajip; :issue:`4384`.)

* The :mod:`runpy` module, which supports the ``-m`` command line switch,
  now supports the execution of packages by looking for and executing
  a ``__main__`` submodule when a package name is supplied.

  (Contributed by Andi Vajda; :issue:`4195`.)

* The :mod:`pdb` module can now access and display source code loaded via
  :mod:`zipimport` (or any other conformant :pep:`302` loader).

  (Contributed by Alexander Belopolsky; :issue:`4201`.)

* :class:`functools.partial` objects can now be pickled.

  (Suggested by Antoine Pitrou and Jesse Noller.  Implemented by
  Jack Diederich; :issue:`5228`.)

* Added :mod:`pydoc` help topics for symbols so that ``help('@')``
  works as expected in the interactive environment.

  (Contributed by David Laban; :issue:`4739`.)

* The :mod:`unittest` module now supports skipping individual tests or classes
  of tests.  It also supports marking a test as an expected failure: a test
  that is known to be broken but shouldn't be counted as a failure on a
  TestResult::

    class TestGizmo(unittest.TestCase):

        @unittest.skipUnless(sys.platform.startswith("win"), "requires Windows")
        def test_gizmo_on_windows(self):
            ...

        @unittest.expectedFailure
        def test_gizmo_without_required_library(self):
            ...

  Also, tests for exceptions have been extended to work with context managers
  using the :keyword:`with` statement::

      def test_division_by_zero(self):
          with self.assertRaises(ZeroDivisionError):
              1 / 0

  In addition, several new assertion methods were added including
  :meth:`assertSetEqual`, :meth:`assertDictEqual`,
  :meth:`assertDictContainsSubset`, :meth:`assertListEqual`,
  :meth:`assertTupleEqual`, :meth:`assertSequenceEqual`,
  :meth:`assertRaisesRegexp`, :meth:`assertIsNone`,
  and :meth:`assertIsNotNone`.

  (Contributed by Benjamin Peterson and Antoine Pitrou.)

* The :mod:`io` module has three new constants for the :meth:`seek`
  method: :data:`SEEK_SET`, :data:`SEEK_CUR`, and :data:`SEEK_END`.

* The :attr:`sys.version_info` tuple is now a named tuple::

    >>> sys.version_info
    sys.version_info(major=3, minor=1, micro=0, releaselevel='alpha', serial=2)

  (Contributed by Ross Light; :issue:`4285`.)

* The :mod:`nntplib` and :mod:`imaplib` modules now support IPv6.

  (Contributed by Derek Morr; :issue:`1655` and :issue:`1664`.)

* The :mod:`pickle` module has been adapted for better interoperability with
  Python 2.x when used with protocol 2 or lower.  The reorganization of the
  standard library changed the formal reference for many objects.  For
  example, ``__builtin__.set`` in Python 2 is called ``builtins.set`` in
  Python 3.  This change confounded efforts to share data between different
  versions of Python.  But now when protocol 2 or lower is selected, the
  pickler will automatically use the old Python 2 names for both loading and
  dumping.  This remapping is turned on by default but can be disabled with
  the *fix_imports* option::

    >>> s = {1, 2, 3}
    >>> pickle.dumps(s, protocol=0)
    b'c__builtin__\nset\np0\n((lp1\nL1L\naL2L\naL3L\natp2\nRp3\n.'
    >>> pickle.dumps(s, protocol=0, fix_imports=False)
    b'cbuiltins\nset\np0\n((lp1\nL1L\naL2L\naL3L\natp2\nRp3\n.'

  An unfortunate but unavoidable side-effect of this change is that protocol 2
  pickles produced by Python 3.1 won't be readable with Python 3.0. The latest
  pickle protocol, protocol 3, should be used when migrating data between
  Python 3.x implementations, as it doesn't attempt to remain compatible with
  Python 2.x.

  (Contributed by Alexandre Vassalotti and Antoine Pitrou, :issue:`6137`.)

* A new module, :mod:`importlib`, was added.  It provides a complete, portable,
  pure Python reference implementation of the :keyword:`import` statement and its
  counterpart, the :func:`__import__` function.  It represents a substantial
  step forward in documenting and defining the actions that take place during
  imports.

  (Contributed by Brett Cannon.)
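For everyday use, the most visible entry point of the new :mod:`importlib`
module is ``importlib.import_module()``, which imports a module whose name is
only known at run time.  A minimal sketch; the module names are arbitrary
standard library modules chosen for the example:

```python
import importlib

# Import a module from a name held in a variable.
module_name = 'fractions'
fractions_mod = importlib.import_module(module_name)
half = fractions_mod.Fraction(1, 2)

# A leading dot makes the name relative to the given package.
decoder_mod = importlib.import_module('.decoder', package='json')

print(half)                   # 1/2
print(decoder_mod.__name__)   # json.decoder
```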

Optimizations
=============

Major performance enhancements have been added:

* The new I/O library (as defined in :pep:`3116`) was mostly written in
  Python and quickly proved to be a problematic bottleneck in Python 3.0.
  In Python 3.1, the I/O library has been entirely rewritten in C and is
  2 to 20 times faster depending on the task at hand. The pure Python
  version is still available for experimentation purposes through
  the ``_pyio`` module.

  (Contributed by Amaury Forgeot d'Arc and Antoine Pitrou.)

* Added a heuristic so that tuples and dicts containing only untrackable objects
  are not tracked by the garbage collector. This can reduce the size of
  collections and therefore the garbage collection overhead on long-running
  programs, depending on their particular use of datatypes.

  (Contributed by Antoine Pitrou, :issue:`4688`.)

* When the configure option ``--with-computed-gotos`` is enabled
  on compilers that support it (notably: gcc, SunPro, icc), the bytecode
  evaluation loop is compiled with a new dispatch mechanism which gives
  speedups of up to 20%, depending on the system, the compiler, and
  the benchmark.

  (Contributed by Antoine Pitrou along with a number of other participants,
  :issue:`4753`.)

* The decoding of UTF-8, UTF-16 and LATIN-1 is now two to four times
  faster.

  (Contributed by Antoine Pitrou and Amaury Forgeot d'Arc, :issue:`4868`.)

* The :mod:`json` module now has a C extension to substantially improve
  its performance.  In addition, the API was modified so that json works
  only with :class:`str`, not with :class:`bytes`.  That change makes the
  module closely match the `JSON specification <http://json.org/>`_
  which is defined in terms of Unicode.

  (Contributed by Bob Ippolito and converted to Py3.1 by Antoine Pitrou
  and Benjamin Peterson; :issue:`4136`.)

* Unpickling now interns the attribute names of pickled objects.  This saves
  memory and allows pickles to be smaller.

  (Contributed by Jake McGuire and Antoine Pitrou; :issue:`5084`.)
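The first item above notes that the pure Python I/O implementation remains
available as ``_pyio``.  As a rough sketch of what that means in practice,
the helper below (``round_trip`` is a hypothetical name) runs the same code
against both implementations and gets the same result:

```python
import io
import _pyio   # pure Python implementation kept alongside the C one


def round_trip(module, payload):
    """Write payload to an in-memory binary buffer and read it back."""
    buffer = module.BytesIO()
    buffer.write(payload)
    buffer.seek(0)
    return buffer.read()


payload = b'spam and eggs\n' * 3
c_result = round_trip(io, payload)
py_result = round_trip(_pyio, payload)

print(c_result == py_result == payload)   # True
```

Only the speed differs between the two, so ``_pyio`` is mainly useful for
experimentation and for reading the implementation.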

IDLE
====

* IDLE's format menu now provides an option to strip trailing whitespace
  from a source file.

  (Contributed by Roger D. Serwy; :issue:`5150`.)

Build and C API Changes
=======================

Changes to Python's build process and to the C API include:

* Integers are now stored internally either in base 2**15 or in base
  2**30, the base being determined at build time.  Previously, they
  were always stored in base 2**15.  Using base 2**30 gives
  significant performance improvements on 64-bit machines, but
  benchmark results on 32-bit machines have been mixed.  Therefore,
  the default is to use base 2**30 on 64-bit machines and base 2**15
  on 32-bit machines; on Unix, there's a new configure option
  ``--enable-big-digits`` that can be used to override this default.

  Apart from the performance improvements this change should be invisible to
  end users, with one exception: for testing and debugging purposes there's a
  new :attr:`sys.int_info` that provides information about the
  internal format, giving the number of bits per digit and the size in bytes
  of the C type used to store each digit::

     >>> import sys
     >>> sys.int_info
     sys.int_info(bits_per_digit=30, sizeof_digit=4)

  (Contributed by Mark Dickinson; :issue:`4258`.)

* The :c:func:`PyLong_AsUnsignedLongLong()` function now handles a negative
  *pylong* by raising :exc:`OverflowError` instead of :exc:`TypeError`.

  (Contributed by Mark Dickinson and Lisandro Dalcin; :issue:`5175`.)

* Deprecated :c:func:`PyNumber_Int`.  Use :c:func:`PyNumber_Long` instead.

  (Contributed by Mark Dickinson; :issue:`4910`.)

* Added a new :c:func:`PyOS_string_to_double` function to replace the
  deprecated functions :c:func:`PyOS_ascii_strtod` and :c:func:`PyOS_ascii_atof`.

  (Contributed by Mark Dickinson; :issue:`5914`.)

* Added :c:type:`PyCapsule` as a replacement for the :c:type:`PyCObject` API.
  The principal difference is that the new type has a well-defined interface
  for passing type-safety information and a less complicated signature
  for calling a destructor.  The old type had a problematic API and is now
  deprecated.

  (Contributed by Larry Hastings; :issue:`5630`.)
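The digit size reported by :attr:`sys.int_info`, described in the first item
of this section, can be combined with ``int.bit_length()`` to estimate how
many internal digits a given integer occupies.  This is only a sketch:
``internal_digits`` is a hypothetical helper, and reporting one digit for
zero is a simplification rather than a statement about the actual storage:

```python
import sys


def internal_digits(n):
    """Estimate the number of base 2**bits_per_digit digits used by n."""
    bits = abs(n).bit_length()
    per_digit = sys.int_info.bits_per_digit
    # Ceiling division; report at least one digit (simplification for zero).
    return max(1, -(-bits // per_digit))


per_digit = sys.int_info.bits_per_digit
largest_single_digit = 2**per_digit - 1

print(internal_digits(largest_single_digit))       # 1
print(internal_digits(largest_single_digit + 1))   # 2
```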

Porting to Python 3.1
=====================

This section lists previously described changes and other bugfixes
that may require changes to your code:

* The new floating point string representations can break existing doctests.
  For example::

    import doctest
    import math

    def e():
        '''Compute the base of natural logarithms.

        >>> e()
        2.7182818284590451

        '''
        return sum(1/math.factorial(x) for x in reversed(range(30)))

    doctest.testmod()

    **********************************************************************
    Failed example:
        e()
    Expected:
        2.7182818284590451
    Got:
        2.718281828459045
    **********************************************************************

* The automatic name remapping in the pickle module for protocol 2 or lower can
  make Python 3.1 pickles unreadable in Python 3.0.  One solution is to use
  protocol 3.  Another solution is to set the *fix_imports* option to ``False``.
  See the discussion above for more details.
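The two pickle workarounds mentioned above can be sketched as follows.  The
set being pickled is arbitrary example data, and the byte checks simply look
for the module name recorded in each pickle:

```python
import pickle

data = {1, 2, 3}

# Default for protocol 2: names are remapped to their Python 2 spelling.
legacy = pickle.dumps(data, protocol=2)

# Workaround 1: keep the Python 3 names by disabling the remapping.
native = pickle.dumps(data, protocol=2, fix_imports=False)

# Workaround 2: use protocol 3, which never applies the remapping.
modern = pickle.dumps(data, protocol=3)

print(b'__builtin__' in legacy)       # True
print(b'builtins' in native)          # True
print(pickle.loads(modern) == data)   # True
```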