1.. _sortinghowto: 2 3Sorting HOW TO 4************** 5 6:Author: Andrew Dalke and Raymond Hettinger 7:Release: 0.1 8 9 10Python lists have a built-in :meth:`list.sort` method that modifies the list 11in-place. There is also a :func:`sorted` built-in function that builds a new 12sorted list from an iterable. 13 14In this document, we explore the various techniques for sorting data using Python. 15 16 17Sorting Basics 18============== 19 20A simple ascending sort is very easy: just call the :func:`sorted` function. It 21returns a new sorted list:: 22 23 >>> sorted([5, 2, 3, 1, 4]) 24 [1, 2, 3, 4, 5] 25 26You can also use the :meth:`list.sort` method. It modifies the list 27in-place (and returns ``None`` to avoid confusion). Usually it's less convenient 28than :func:`sorted` - but if you don't need the original list, it's slightly 29more efficient. 30 31 >>> a = [5, 2, 3, 1, 4] 32 >>> a.sort() 33 >>> a 34 [1, 2, 3, 4, 5] 35 36Another difference is that the :meth:`list.sort` method is only defined for 37lists. In contrast, the :func:`sorted` function accepts any iterable. 38 39 >>> sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'}) 40 [1, 2, 3, 4, 5] 41 42Key Functions 43============= 44 45Both :meth:`list.sort` and :func:`sorted` have a *key* parameter to specify a 46function to be called on each list element prior to making comparisons. 47 48For example, here's a case-insensitive string comparison: 49 50 >>> sorted("This is a test string from Andrew".split(), key=str.lower) 51 ['a', 'Andrew', 'from', 'is', 'string', 'test', 'This'] 52 53The value of the *key* parameter should be a function that takes a single argument 54and returns a key to use for sorting purposes. This technique is fast because 55the key function is called exactly once for each input record. 56 57A common pattern is to sort complex objects using some of the object's indices 58as keys. For example: 59 60 >>> student_tuples = [ 61 ... ('john', 'A', 15), 62 ... ('jane', 'B', 12), 63 ... ('dave', 'B', 10), 64 ... ] 65 >>> sorted(student_tuples, key=lambda student: student[2]) # sort by age 66 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 67 68The same technique works for objects with named attributes. For example: 69 70 >>> class Student: 71 ... def __init__(self, name, grade, age): 72 ... self.name = name 73 ... self.grade = grade 74 ... self.age = age 75 ... def __repr__(self): 76 ... return repr((self.name, self.grade, self.age)) 77 78 >>> student_objects = [ 79 ... Student('john', 'A', 15), 80 ... Student('jane', 'B', 12), 81 ... Student('dave', 'B', 10), 82 ... ] 83 >>> sorted(student_objects, key=lambda student: student.age) # sort by age 84 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 85 86Operator Module Functions 87========================= 88 89The key-function patterns shown above are very common, so Python provides 90convenience functions to make accessor functions easier and faster. The 91:mod:`operator` module has :func:`~operator.itemgetter`, 92:func:`~operator.attrgetter`, and a :func:`~operator.methodcaller` function. 93 94Using those functions, the above examples become simpler and faster: 95 96 >>> from operator import itemgetter, attrgetter 97 98 >>> sorted(student_tuples, key=itemgetter(2)) 99 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 100 101 >>> sorted(student_objects, key=attrgetter('age')) 102 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 103 104The operator module functions allow multiple levels of sorting. For example, to 105sort by *grade* then by *age*: 106 107 >>> sorted(student_tuples, key=itemgetter(1,2)) 108 [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)] 109 110 >>> sorted(student_objects, key=attrgetter('grade', 'age')) 111 [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)] 112 113Ascending and Descending 114======================== 115 116Both :meth:`list.sort` and :func:`sorted` accept a *reverse* parameter with a 117boolean value. This is used to flag descending sorts. For example, to get the 118student data in reverse *age* order: 119 120 >>> sorted(student_tuples, key=itemgetter(2), reverse=True) 121 [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] 122 123 >>> sorted(student_objects, key=attrgetter('age'), reverse=True) 124 [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] 125 126Sort Stability and Complex Sorts 127================================ 128 129Sorts are guaranteed to be `stable 130<https://en.wikipedia.org/wiki/Sorting_algorithm#Stability>`_\. That means that 131when multiple records have the same key, their original order is preserved. 132 133 >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)] 134 >>> sorted(data, key=itemgetter(0)) 135 [('blue', 1), ('blue', 2), ('red', 1), ('red', 2)] 136 137Notice how the two records for *blue* retain their original order so that 138``('blue', 1)`` is guaranteed to precede ``('blue', 2)``. 139 140This wonderful property lets you build complex sorts in a series of sorting 141steps. For example, to sort the student data by descending *grade* and then 142ascending *age*, do the *age* sort first and then sort again using *grade*: 143 144 >>> s = sorted(student_objects, key=attrgetter('age')) # sort on secondary key 145 >>> sorted(s, key=attrgetter('grade'), reverse=True) # now sort on primary key, descending 146 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 147 148This can be abstracted out into a wrapper function that can take a list and 149tuples of field and order to sort them on multiple passes. 150 151 >>> def multisort(xs, specs): 152 ... for key, reverse in reversed(specs): 153 ... xs.sort(key=attrgetter(key), reverse=reverse) 154 ... return xs 155 156 >>> multisort(list(student_objects), (('grade', True), ('age', False))) 157 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 158 159The `Timsort <https://en.wikipedia.org/wiki/Timsort>`_ algorithm used in Python 160does multiple sorts efficiently because it can take advantage of any ordering 161already present in a dataset. 162 163The Old Way Using Decorate-Sort-Undecorate 164========================================== 165 166This idiom is called Decorate-Sort-Undecorate after its three steps: 167 168* First, the initial list is decorated with new values that control the sort order. 169 170* Second, the decorated list is sorted. 171 172* Finally, the decorations are removed, creating a list that contains only the 173 initial values in the new order. 174 175For example, to sort the student data by *grade* using the DSU approach: 176 177 >>> decorated = [(student.grade, i, student) for i, student in enumerate(student_objects)] 178 >>> decorated.sort() 179 >>> [student for grade, i, student in decorated] # undecorate 180 [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] 181 182This idiom works because tuples are compared lexicographically; the first items 183are compared; if they are the same then the second items are compared, and so 184on. 185 186It is not strictly necessary in all cases to include the index *i* in the 187decorated list, but including it gives two benefits: 188 189* The sort is stable -- if two items have the same key, their order will be 190 preserved in the sorted list. 191 192* The original items do not have to be comparable because the ordering of the 193 decorated tuples will be determined by at most the first two items. So for 194 example the original list could contain complex numbers which cannot be sorted 195 directly. 196 197Another name for this idiom is 198`Schwartzian transform <https://en.wikipedia.org/wiki/Schwartzian_transform>`_\, 199after Randal L. Schwartz, who popularized it among Perl programmers. 200 201Now that Python sorting provides key-functions, this technique is not often needed. 202 203 204The Old Way Using the *cmp* Parameter 205===================================== 206 207Many constructs given in this HOWTO assume Python 2.4 or later. Before that, 208there was no :func:`sorted` builtin and :meth:`list.sort` took no keyword 209arguments. Instead, all of the Py2.x versions supported a *cmp* parameter to 210handle user specified comparison functions. 211 212In Py3.0, the *cmp* parameter was removed entirely (as part of a larger effort to 213simplify and unify the language, eliminating the conflict between rich 214comparisons and the :meth:`__cmp__` magic method). 215 216In Py2.x, sort allowed an optional function which can be called for doing the 217comparisons. That function should take two arguments to be compared and then 218return a negative value for less-than, return zero if they are equal, or return 219a positive value for greater-than. For example, we can do: 220 221 >>> def numeric_compare(x, y): 222 ... return x - y 223 >>> sorted([5, 2, 4, 1, 3], cmp=numeric_compare) # doctest: +SKIP 224 [1, 2, 3, 4, 5] 225 226Or you can reverse the order of comparison with: 227 228 >>> def reverse_numeric(x, y): 229 ... return y - x 230 >>> sorted([5, 2, 4, 1, 3], cmp=reverse_numeric) # doctest: +SKIP 231 [5, 4, 3, 2, 1] 232 233When porting code from Python 2.x to 3.x, the situation can arise when you have 234the user supplying a comparison function and you need to convert that to a key 235function. The following wrapper makes that easy to do:: 236 237 def cmp_to_key(mycmp): 238 'Convert a cmp= function into a key= function' 239 class K: 240 def __init__(self, obj, *args): 241 self.obj = obj 242 def __lt__(self, other): 243 return mycmp(self.obj, other.obj) < 0 244 def __gt__(self, other): 245 return mycmp(self.obj, other.obj) > 0 246 def __eq__(self, other): 247 return mycmp(self.obj, other.obj) == 0 248 def __le__(self, other): 249 return mycmp(self.obj, other.obj) <= 0 250 def __ge__(self, other): 251 return mycmp(self.obj, other.obj) >= 0 252 def __ne__(self, other): 253 return mycmp(self.obj, other.obj) != 0 254 return K 255 256To convert to a key function, just wrap the old comparison function: 257 258.. testsetup:: 259 260 from functools import cmp_to_key 261 262.. doctest:: 263 264 >>> sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric)) 265 [5, 4, 3, 2, 1] 266 267In Python 3.2, the :func:`functools.cmp_to_key` function was added to the 268:mod:`functools` module in the standard library. 269 270Odd and Ends 271============ 272 273* For locale aware sorting, use :func:`locale.strxfrm` for a key function or 274 :func:`locale.strcoll` for a comparison function. 275 276* The *reverse* parameter still maintains sort stability (so that records with 277 equal keys retain the original order). Interestingly, that effect can be 278 simulated without the parameter by using the builtin :func:`reversed` function 279 twice: 280 281 >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)] 282 >>> standard_way = sorted(data, key=itemgetter(0), reverse=True) 283 >>> double_reversed = list(reversed(sorted(reversed(data), key=itemgetter(0)))) 284 >>> assert standard_way == double_reversed 285 >>> standard_way 286 [('red', 1), ('red', 2), ('blue', 1), ('blue', 2)] 287 288* The sort routines are guaranteed to use :meth:`__lt__` when making comparisons 289 between two objects. So, it is easy to add a standard sort order to a class by 290 defining an :meth:`__lt__` method:: 291 292 >>> Student.__lt__ = lambda self, other: self.age < other.age 293 >>> sorted(student_objects) 294 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 295 296* Key functions need not depend directly on the objects being sorted. A key 297 function can also access external resources. For instance, if the student grades 298 are stored in a dictionary, they can be used to sort a separate list of student 299 names: 300 301 >>> students = ['dave', 'john', 'jane'] 302 >>> newgrades = {'john': 'F', 'jane':'A', 'dave': 'C'} 303 >>> sorted(students, key=newgrades.__getitem__) 304 ['jane', 'dave', 'john'] 305