1.. _querying:
2
3Querying
4========
5
6This section will cover the basic CRUD operations commonly performed on a
7relational database:
8
9* :py:meth:`Model.create`, for executing *INSERT* queries.
10* :py:meth:`Model.save` and :py:meth:`Model.update`, for executing *UPDATE*
11  queries.
12* :py:meth:`Model.delete_instance` and :py:meth:`Model.delete`, for executing
13  *DELETE* queries.
14* :py:meth:`Model.select`, for executing *SELECT* queries.
15
16.. note::
17    There is also a large collection of example queries taken from the
18    `Postgresql Exercises <https://pgexercises.com/>`_ website. Examples are
19    listed on the :ref:`query examples <query_examples>` document.
20
21Creating a new record
22---------------------
23
24You can use :py:meth:`Model.create` to create a new model instance. This method
25accepts keyword arguments, where the keys correspond to the names of the
26model's fields. A new instance is returned and a row is added to the table.
27
28.. code-block:: pycon
29
30    >>> User.create(username='Charlie')
31    <__main__.User object at 0x2529350>
32
33This will *INSERT* a new row into the database. The primary key will
34automatically be retrieved and stored on the model instance.
35
36Alternatively, you can build up a model instance programmatically and then call
37:py:meth:`~Model.save`:
38
39.. code-block:: pycon
40
41    >>> user = User(username='Charlie')
42    >>> user.save()  # save() returns the number of rows modified.
43    1
44    >>> user.id
45    1
46    >>> huey = User()
47    >>> huey.username = 'Huey'
48    >>> huey.save()
49    1
50    >>> huey.id
51    2
52
53When a model has a foreign key, you can directly assign a model instance to the
54foreign key field when creating a new record.
55
56.. code-block:: pycon
57
58    >>> tweet = Tweet.create(user=huey, message='Hello!')
59
60You can also use the value of the related object's primary key:
61
62.. code-block:: pycon
63
64    >>> tweet = Tweet.create(user=2, message='Hello again!')
65
66If you simply wish to insert data and do not need to create a model instance,
67you can use :py:meth:`Model.insert`:
68
69.. code-block:: pycon
70
71    >>> User.insert(username='Mickey').execute()
72    3
73
74After executing the insert query, the primary key of the new row is returned.
75
76.. note::
77    There are several ways you can speed up bulk insert operations. Check out
78    the :ref:`bulk_inserts` recipe section for more information.
79
80.. _bulk_inserts:
81
82Bulk inserts
83------------
84
85There are a couple of ways you can load lots of data quickly. The naive
86approach is to simply call :py:meth:`Model.create` in a loop:
87
88.. code-block:: python
89
90    data_source = [
91        {'field1': 'val1-1', 'field2': 'val1-2'},
92        {'field1': 'val2-1', 'field2': 'val2-2'},
93        # ...
94    ]
95
96    for data_dict in data_source:
97        MyModel.create(**data_dict)
98
99The above approach is slow for a couple of reasons:
100
1011. If you are not wrapping the loop in a transaction then each call to
102   :py:meth:`~Model.create` happens in its own transaction. That is going to be
103   really slow!
1042. There is a decent amount of Python logic getting in your way, and each
105   :py:class:`InsertQuery` must be generated and parsed into SQL.
1063. That's a lot of data (in terms of raw bytes of SQL) you are sending to your
107   database to parse.
1084. We are retrieving the *last insert id*, which causes an additional query to
109   be executed in some cases.
110
111You can get a significant speedup by simply wrapping this in a transaction with
112:py:meth:`~Database.atomic`.
113
114.. code-block:: python
115
116    # This is much faster.
117    with db.atomic():
118        for data_dict in data_source:
119            MyModel.create(**data_dict)
120
121The above code still suffers from points 2, 3 and 4. We can get another big
122boost by using :py:meth:`~Model.insert_many`. This method accepts a list of
123tuples or dictionaries, and inserts multiple rows in a single query:
124
125.. code-block:: python
126
127    data_source = [
128        {'field1': 'val1-1', 'field2': 'val1-2'},
129        {'field1': 'val2-1', 'field2': 'val2-2'},
130        # ...
131    ]
132
133    # Fastest way to INSERT multiple rows.
134    MyModel.insert_many(data_source).execute()
135
136The :py:meth:`~Model.insert_many` method also accepts a list of row-tuples,
137provided you also specify the corresponding fields:
138
139.. code-block:: python
140
141    # We can INSERT tuples as well...
142    data = [('val1-1', 'val1-2'),
143            ('val2-1', 'val2-2'),
144            ('val3-1', 'val3-2')]
145
146    # But we need to indicate which fields the values correspond to.
147    MyModel.insert_many(data, fields=[MyModel.field1, MyModel.field2]).execute()
148
149It is also a good practice to wrap the bulk insert in a transaction:
150
151.. code-block:: python
152
153    # You can, of course, wrap this in a transaction as well:
154    with db.atomic():
155        MyModel.insert_many(data, fields=fields).execute()
156
157.. note::
158    SQLite users should be aware of some caveats when using bulk inserts.
159    Specifically, your SQLite3 version must be 3.7.11.0 or newer to take
160    advantage of the bulk insert API. Additionally, by default SQLite limits
161    the number of bound variables in a SQL query to ``999`` for SQLite versions
162    prior to 3.32.0 (2020-05-22) and 32766 for SQLite versions after 3.32.0.
163
164Inserting rows in batches
165^^^^^^^^^^^^^^^^^^^^^^^^^
166
167Depending on the number of rows in your data source, you may need to break it
168up into chunks. SQLite in particular typically has a `limit of 999 or 32766 <https://www.sqlite.org/limits.html#max_variable_number>`_
169variables-per-query (batch size would then be 999 // row length or 32766 // row length).
170
171You can write a loop to batch your data into chunks (in which case it is
172**strongly recommended** you use a transaction):
173
174.. code-block:: python
175
176    # Insert rows 100 at a time.
177    with db.atomic():
178        for idx in range(0, len(data_source), 100):
179            MyModel.insert_many(data_source[idx:idx+100]).execute()
180
181Peewee comes with a :py:func:`chunked` helper function which you can use for
182*efficiently* chunking a generic iterable into a series of *batch*-sized
183iterables:
184
185.. code-block:: python
186
187    from peewee import chunked
188
189    # Insert rows 100 at a time.
190    with db.atomic():
191        for batch in chunked(data_source, 100):
192            MyModel.insert_many(batch).execute()
193
194Alternatives
195^^^^^^^^^^^^
196
197The :py:meth:`Model.bulk_create` method behaves much like
198:py:meth:`Model.insert_many`, but instead it accepts a list of unsaved model
199instances to insert, and it optionally accepts a batch-size parameter. To use
200the :py:meth:`~Model.bulk_create` API:
201
202.. code-block:: python
203
204    # Read list of usernames from a file, for example.
205    with open('user_list.txt') as fh:
206        # Create a list of unsaved User instances.
207        users = [User(username=line.strip()) for line in fh.readlines()]
208
209    # Wrap the operation in a transaction and batch INSERT the users
210    # 100 at a time.
211    with db.atomic():
212        User.bulk_create(users, batch_size=100)
213
214.. note::
215    If you are using Postgresql (which supports the ``RETURNING`` clause), then
216    the previously-unsaved model instances will have their new primary key
217    values automatically populated.
218
219In addition, Peewee also offers :py:meth:`Model.bulk_update`, which can
220efficiently update one or more columns on a list of models. For example:
221
222.. code-block:: python
223
224    # First, create 3 users with usernames u1, u2, u3.
225    u1, u2, u3 = [User.create(username='u%s' % i) for i in (1, 2, 3)]
226
227    # Now we'll modify the user instances.
228    u1.username = 'u1-x'
229    u2.username = 'u2-y'
230    u3.username = 'u3-z'
231
232    # Update all three users with a single UPDATE query.
233    User.bulk_update([u1, u2, u3], fields=[User.username])
234
235.. note::
236    For large lists of objects, you should specify a reasonable batch_size and
237    wrap the call to :py:meth:`~Model.bulk_update` with
238    :py:meth:`Database.atomic`:
239
240    .. code-block:: python
241
242        with database.atomic():
243            User.bulk_update(list_of_users, fields=['username'], batch_size=50)
244
245Alternatively, you can use the :py:meth:`Database.batch_commit` helper to
246process chunks of rows inside *batch*-sized transactions. This method also
247provides a workaround for databases besides Postgresql, when the primary-key of
248the newly-created rows must be obtained.
249
250.. code-block:: python
251
252    # List of row data to insert.
253    row_data = [{'username': 'u1'}, {'username': 'u2'}, ...]
254
255    # Assume there are 789 items in row_data. The following code will result in
256    # 8 total transactions (7x100 rows + 1x89 rows).
257    for row in db.batch_commit(row_data, 100):
258        User.create(**row)
259
260Bulk-loading from another table
261^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
262
263If the data you would like to bulk load is stored in another table, you can
264also create *INSERT* queries whose source is a *SELECT* query. Use the
265:py:meth:`Model.insert_from` method:
266
267.. code-block:: python
268
269    res = (TweetArchive
270           .insert_from(
271               Tweet.select(Tweet.user, Tweet.message),
272               fields=[TweetArchive.user, TweetArchive.message])
273           .execute())
274
275The above query is equivalent to the following SQL:
276
277.. code-block:: sql
278
279    INSERT INTO "tweet_archive" ("user_id", "message")
280    SELECT "user_id", "message" FROM "tweet";
281
282
283Updating existing records
284-------------------------
285
286Once a model instance has a primary key, any subsequent call to
287:py:meth:`~Model.save` will result in an *UPDATE* rather than another *INSERT*.
288The model's primary key will not change:
289
290.. code-block:: pycon
291
292    >>> user.save()  # save() returns the number of rows modified.
293    1
294    >>> user.id
295    1
296    >>> user.save()
297    >>> user.id
298    1
299    >>> huey.save()
300    1
301    >>> huey.id
302    2
303
304If you want to update multiple records, issue an *UPDATE* query. The following
305example will update all ``Tweet`` objects, marking them as *published*, if they
306were created before today. :py:meth:`Model.update` accepts keyword arguments
307where the keys correspond to the model's field names:
308
309.. code-block:: pycon
310
311    >>> today = datetime.today()
312    >>> query = Tweet.update(is_published=True).where(Tweet.creation_date < today)
313    >>> query.execute()  # Returns the number of rows that were updated.
314    4
315
316For more information, see the documentation on :py:meth:`Model.update`,
317:py:class:`Update` and :py:meth:`Model.bulk_update`.
318
319.. note::
320    If you would like more information on performing atomic updates (such as
321    incrementing the value of a column), check out the :ref:`atomic update <atomic_updates>`
322    recipes.
323
324.. _atomic_updates:
325
326Atomic updates
327--------------
328
329Peewee allows you to perform atomic updates. Let's suppose we need to update
330some counters. The naive approach would be to write something like this:
331
332.. code-block:: pycon
333
334    >>> for stat in Stat.select().where(Stat.url == request.url):
335    ...     stat.counter += 1
336    ...     stat.save()
337
338**Do not do this!** Not only is this slow, but it is also vulnerable to race
339conditions if multiple processes are updating the counter at the same time.
340
341Instead, you can update the counters atomically using :py:meth:`~Model.update`:
342
343.. code-block:: pycon
344
345    >>> query = Stat.update(counter=Stat.counter + 1).where(Stat.url == request.url)
346    >>> query.execute()
347
348You can make these update statements as complex as you like. Let's give all our
349employees a bonus equal to their previous bonus plus 10% of their salary:
350
351.. code-block:: pycon
352
353    >>> query = Employee.update(bonus=(Employee.bonus + (Employee.salary * .1)))
354    >>> query.execute()  # Give everyone a bonus!
355
356We can even use a subquery to update the value of a column. Suppose we had a
357denormalized column on the ``User`` model that stored the number of tweets a
358user had made, and we updated this value periodically. Here is how you might
359write such a query:
360
361.. code-block:: pycon
362
363    >>> subquery = Tweet.select(fn.COUNT(Tweet.id)).where(Tweet.user == User.id)
364    >>> update = User.update(num_tweets=subquery)
365    >>> update.execute()
366
367Upsert
368^^^^^^
369
370Peewee provides support for varying types of upsert functionality. With SQLite
371prior to 3.24.0 and MySQL, Peewee offers the :py:meth:`~Model.replace`, which
372allows you to insert a record or, in the event of a constraint violation,
373replace the existing record.
374
375Example of using :py:meth:`~Model.replace` and :py:meth:`~Insert.on_conflict_replace`:
376
377.. code-block:: python
378
379    class User(Model):
380        username = TextField(unique=True)
381        last_login = DateTimeField(null=True)
382
383    # Insert or update the user. The "last_login" value will be updated
384    # regardless of whether the user existed previously.
385    user_id = (User
386               .replace(username='the-user', last_login=datetime.now())
387               .execute())
388
389    # This query is equivalent:
390    user_id = (User
391               .insert(username='the-user', last_login=datetime.now())
392               .on_conflict_replace()
393               .execute())
394
395.. note::
396    In addition to *replace*, SQLite, MySQL and Postgresql provide an *ignore*
397    action (see: :py:meth:`~Insert.on_conflict_ignore`) if you simply wish to
398    insert and ignore any potential constraint violation.
399
400**MySQL** supports upsert via the *ON DUPLICATE KEY UPDATE* clause. For
401example:
402
403.. code-block:: python
404
405    class User(Model):
406        username = TextField(unique=True)
407        last_login = DateTimeField(null=True)
408        login_count = IntegerField()
409
410    # Insert a new user.
411    User.create(username='huey', login_count=0)
412
413    # Simulate the user logging in. The login count and timestamp will be
414    # either created or updated correctly.
415    now = datetime.now()
416    rowid = (User
417             .insert(username='huey', last_login=now, login_count=1)
418             .on_conflict(
419                 preserve=[User.last_login],  # Use the value we would have inserted.
420                 update={User.login_count: User.login_count + 1})
421             .execute())
422
423In the above example, we could safely invoke the upsert query as many times as
424we wanted. The login count will be incremented atomically, the last login
425column will be updated, and no duplicate rows will be created.
426
427**Postgresql and SQLite** (3.24.0 and newer) provide a different syntax that
428allows for more granular control over which constraint violation should trigger
429the conflict resolution, and what values should be updated or preserved.
430
431Example of using :py:meth:`~Insert.on_conflict` to perform a Postgresql-style
432upsert (or SQLite 3.24+):
433
434.. code-block:: python
435
436    class User(Model):
437        username = TextField(unique=True)
438        last_login = DateTimeField(null=True)
439        login_count = IntegerField()
440
441    # Insert a new user.
442    User.create(username='huey', login_count=0)
443
444    # Simulate the user logging in. The login count and timestamp will be
445    # either created or updated correctly.
446    now = datetime.now()
447    rowid = (User
448             .insert(username='huey', last_login=now, login_count=1)
449             .on_conflict(
450                 conflict_target=[User.username],  # Which constraint?
451                 preserve=[User.last_login],  # Use the value we would have inserted.
452                 update={User.login_count: User.login_count + 1})
453             .execute())
454
455In the above example, we could safely invoke the upsert query as many times as
456we wanted. The login count will be incremented atomically, the last login
457column will be updated, and no duplicate rows will be created.
458
459.. note::
460    The main difference between MySQL and Postgresql/SQLite is that Postgresql
461    and SQLite require that you specify a ``conflict_target``.
462
463Here is a more advanced (if contrived) example using the :py:class:`EXCLUDED`
464namespace. The :py:class:`EXCLUDED` helper allows us to reference values in the
465conflicting data. For our example, we'll assume a simple table mapping a unique
466key (string) to a value (integer):
467
468.. code-block:: python
469
470    class KV(Model):
471        key = CharField(unique=True)
472        value = IntegerField()
473
474    # Create one row.
475    KV.create(key='k1', value=1)
476
477    # Demonstrate usage of EXCLUDED.
478    # Here we will attempt to insert a new value for a given key. If that
479    # key already exists, then we will update its value with the *sum* of its
480    # original value and the value we attempted to insert -- provided that
481    # the new value is larger than the original value.
482    query = (KV.insert(key='k1', value=10)
483             .on_conflict(conflict_target=[KV.key],
484                          update={KV.value: KV.value + EXCLUDED.value},
485                          where=(EXCLUDED.value > KV.value)))
486
487    # Executing the above query will result in the following data being
488    # present in the "kv" table:
489    # (key='k1', value=11)
490    query.execute()
491
492    # If we attempted to execute the query *again*, then nothing would be
493    # updated, as the new value (10) is now less than the value in the
494    # original row (11).
495
496For more information, see :py:meth:`Insert.on_conflict` and
497:py:class:`OnConflict`.
498
499Deleting records
500----------------
501
502To delete a single model instance, you can use the
503:py:meth:`Model.delete_instance` shortcut. :py:meth:`~Model.delete_instance`
504will delete the given model instance and can optionally delete any dependent
505objects recursively (by specifying `recursive=True`).
506
507.. code-block:: pycon
508
509    >>> user = User.get(User.id == 1)
510    >>> user.delete_instance()  # Returns the number of rows deleted.
511    1
512
513    >>> User.get(User.id == 1)
514    UserDoesNotExist: instance matching query does not exist:
515    SQL: SELECT t1."id", t1."username" FROM "user" AS t1 WHERE t1."id" = ?
516    PARAMS: [1]
517
518To delete an arbitrary set of rows, you can issue a *DELETE* query. The
519following will delete all ``Tweet`` objects that are over one year old:
520
521.. code-block:: pycon
522
523    >>> query = Tweet.delete().where(Tweet.creation_date < one_year_ago)
524    >>> query.execute()  # Returns the number of rows deleted.
525    7
526
527For more information, see the documentation on:
528
529* :py:meth:`Model.delete_instance`
530* :py:meth:`Model.delete`
531* :py:class:`DeleteQuery`
532
533Selecting a single record
534-------------------------
535
536You can use the :py:meth:`Model.get` method to retrieve a single instance
537matching the given query. For primary-key lookups, you can also use the
538shortcut method :py:meth:`Model.get_by_id`.
539
540This method is a shortcut that calls :py:meth:`Model.select` with the given
541query, but limits the result set to a single row. Additionally, if no model
542matches the given query, a ``DoesNotExist`` exception will be raised.
543
544.. code-block:: pycon
545
546    >>> User.get(User.id == 1)
547    <__main__.User object at 0x25294d0>
548
549    >>> User.get_by_id(1)  # Same as above.
550    <__main__.User object at 0x252df10>
551
552    >>> User[1]  # Also same as above.
553    <__main__.User object at 0x252dd10>
554
555    >>> User.get(User.id == 1).username
556    u'Charlie'
557
558    >>> User.get(User.username == 'Charlie')
559    <__main__.User object at 0x2529410>
560
561    >>> User.get(User.username == 'nobody')
562    UserDoesNotExist: instance matching query does not exist:
563    SQL: SELECT t1."id", t1."username" FROM "user" AS t1 WHERE t1."username" = ?
564    PARAMS: ['nobody']
565
566For more advanced operations, you can use :py:meth:`SelectBase.get`. The
567following query retrieves the latest tweet from the user named *charlie*:
568
569.. code-block:: pycon
570
571    >>> (Tweet
572    ...  .select()
573    ...  .join(User)
574    ...  .where(User.username == 'charlie')
575    ...  .order_by(Tweet.created_date.desc())
576    ...  .get())
577    <__main__.Tweet object at 0x2623410>
578
579For more information, see the documentation on:
580
581* :py:meth:`Model.get`
582* :py:meth:`Model.get_by_id`
583* :py:meth:`Model.get_or_none` - if no matching row is found, return ``None``.
584* :py:meth:`Model.first`
585* :py:meth:`Model.select`
586* :py:meth:`SelectBase.get`
587
588Create or get
589-------------
590
591Peewee has one helper method for performing "get/create" type operations:
592:py:meth:`Model.get_or_create`, which first attempts to retrieve the matching
593row. Failing that, a new row will be created.
594
595For "create or get" type logic, typically one would rely on a *unique*
596constraint or primary key to prevent the creation of duplicate objects. As an
597example, let's say we wish to implement registering a new user account using
598the :ref:`example User model <blog-models>`. The *User* model has a *unique*
599constraint on the username field, so we will rely on the database's integrity
600guarantees to ensure we don't end up with duplicate usernames:
601
602.. code-block:: python
603
604    try:
605        with db.atomic():
606            return User.create(username=username)
607    except peewee.IntegrityError:
608        # `username` is a unique column, so this username already exists,
609        # making it safe to call .get().
610        return User.get(User.username == username)
611
612You can easily encapsulate this type of logic as a ``classmethod`` on your own
613``Model`` classes.
614
615The above example first attempts at creation, then falls back to retrieval,
616relying on the database to enforce a unique constraint. If you prefer to
617attempt to retrieve the record first, you can use
618:py:meth:`~Model.get_or_create`. This method is implemented along the same
619lines as the Django function of the same name. You can use the Django-style
620keyword argument filters to specify your ``WHERE`` conditions. The function
621returns a 2-tuple containing the instance and a boolean value indicating if the
622object was created.
623
624Here is how you might implement user account creation using
625:py:meth:`~Model.get_or_create`:
626
627.. code-block:: python
628
629    user, created = User.get_or_create(username=username)
630
631Suppose we have a different model ``Person`` and would like to get or create a
632person object. The only conditions we care about when retrieving the ``Person``
633are their first and last names, **but** if we end up needing to create a new
634record, we will also specify their date-of-birth and favorite color:
635
636.. code-block:: python
637
638    person, created = Person.get_or_create(
639        first_name=first_name,
640        last_name=last_name,
641        defaults={'dob': dob, 'favorite_color': 'green'})
642
643Any keyword argument passed to :py:meth:`~Model.get_or_create` will be used in
644the ``get()`` portion of the logic, except for the ``defaults`` dictionary,
645which will be used to populate values on newly-created instances.
646
647For more details read the documentation for :py:meth:`Model.get_or_create`.
648
649Selecting multiple records
650--------------------------
651
652We can use :py:meth:`Model.select` to retrieve rows from the table. When you
653construct a *SELECT* query, the database will return any rows that correspond
654to your query. Peewee allows you to iterate over these rows, as well as use
655indexing and slicing operations:
656
657.. code-block:: pycon
658
659    >>> query = User.select()
660    >>> [user.username for user in query]
661    ['Charlie', 'Huey', 'Peewee']
662
663    >>> query[1]
664    <__main__.User at 0x7f83e80f5550>
665
666    >>> query[1].username
667    'Huey'
668
669    >>> query[:2]
670    [<__main__.User at 0x7f83e80f53a8>, <__main__.User at 0x7f83e80f5550>]
671
672:py:class:`Select` queries are smart, in that you can iterate, index and slice
673the query multiple times but the query is only executed once.
674
675In the following example, we will simply call :py:meth:`~Model.select` and
676iterate over the return value, which is an instance of :py:class:`Select`.
677This will return all the rows in the *User* table:
678
679.. code-block:: pycon
680
681    >>> for user in User.select():
682    ...     print user.username
683    ...
684    Charlie
685    Huey
686    Peewee
687
688.. note::
689    Subsequent iterations of the same query will not hit the database as the
690    results are cached. To disable this behavior (to reduce memory usage), call
691    :py:meth:`Select.iterator` when iterating.
692
693When iterating over a model that contains a foreign key, be careful with the
694way you access values on related models. Accidentally resolving a foreign key
695or iterating over a back-reference can cause :ref:`N+1 query behavior <nplusone>`.
696
697When you create a foreign key, such as ``Tweet.user``, you can use the
698*backref* to create a back-reference (``User.tweets``). Back-references
699are exposed as :py:class:`Select` instances:
700
701.. code-block:: pycon
702
703    >>> tweet = Tweet.get()
704    >>> tweet.user  # Accessing a foreign key returns the related model.
705    <tw.User at 0x7f3ceb017f50>
706
707    >>> user = User.get()
708    >>> user.tweets  # Accessing a back-reference returns a query.
709    <peewee.ModelSelect at 0x7f73db3bafd0>
710
711You can iterate over the ``user.tweets`` back-reference just like any other
712:py:class:`Select`:
713
714.. code-block:: pycon
715
716    >>> for tweet in user.tweets:
717    ...     print(tweet.message)
718    ...
719    hello world
720    this is fun
721    look at this picture of my food
722
723In addition to returning model instances, :py:class:`Select` queries can return
724dictionaries, tuples and namedtuples. Depending on your use-case, you may find
725it easier to work with rows as dictionaries, for example:
726
727.. code-block:: pycon
728
729    >>> query = User.select().dicts()
730    >>> for row in query:
731    ...     print(row)
732
733    {'id': 1, 'username': 'Charlie'}
734    {'id': 2, 'username': 'Huey'}
735    {'id': 3, 'username': 'Peewee'}
736
737See :py:meth:`~BaseQuery.namedtuples`, :py:meth:`~BaseQuery.tuples`,
738:py:meth:`~BaseQuery.dicts` for more information.
739
740Iterating over large result-sets
741^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
742
743By default peewee will cache the rows returned when iterating over a
744:py:class:`Select` query. This is an optimization to allow multiple iterations
745as well as indexing and slicing without causing additional queries. This
746caching can be problematic, however, when you plan to iterate over a large
747number of rows.
748
749To reduce the amount of memory used by peewee when iterating over a query, use
750the :py:meth:`~BaseQuery.iterator` method. This method allows you to iterate
751without caching each model returned, using much less memory when iterating over
752large result sets.
753
754.. code-block:: python
755
756    # Let's assume we've got 10 million stat objects to dump to a csv file.
757    stats = Stat.select()
758
759    # Our imaginary serializer class
760    serializer = CSVSerializer()
761
762    # Loop over all the stats and serialize.
763    for stat in stats.iterator():
764        serializer.serialize_object(stat)
765
766For simple queries you can see further speed improvements by returning rows as
767dictionaries, namedtuples or tuples. The following methods can be used on any
768:py:class:`Select` query to change the result row type:
769
770* :py:meth:`~BaseQuery.dicts`
771* :py:meth:`~BaseQuery.namedtuples`
772* :py:meth:`~BaseQuery.tuples`
773
774Don't forget to append the :py:meth:`~BaseQuery.iterator` method call to also
775reduce memory consumption. For example, the above code might look like:
776
777.. code-block:: python
778
779    # Let's assume we've got 10 million stat objects to dump to a csv file.
780    stats = Stat.select()
781
782    # Our imaginary serializer class
783    serializer = CSVSerializer()
784
785    # Loop over all the stats (rendered as tuples, without caching) and serialize.
786    for stat_tuple in stats.tuples().iterator():
787        serializer.serialize_tuple(stat_tuple)
788
789When iterating over a large number of rows that contain columns from multiple
790tables, peewee will reconstruct the model graph for each row returned. This
791operation can be slow for complex graphs. For example, if we were selecting a
792list of tweets along with the username and avatar of the tweet's author, Peewee
793would have to create two objects for each row (a tweet and a user). In addition
794to the above row-types, there is a fourth method :py:meth:`~BaseQuery.objects`
795which will return the rows as model instances, but will not attempt to resolve
796the model graph.
797
798For example:
799
800.. code-block:: python
801
802    query = (Tweet
803             .select(Tweet, User)  # Select tweet and user data.
804             .join(User))
805
806    # Note that the user columns are stored in a separate User instance
807    # accessible at tweet.user:
808    for tweet in query:
809        print(tweet.user.username, tweet.content)
810
811    # Using ".objects()" will not create the tweet.user object and assigns all
812    # user attributes to the tweet instance:
813    for tweet in query.objects():
814        print(tweet.username, tweet.content)
815
816For maximum performance, you can execute queries and then iterate over the
817results using the underlying database cursor. :py:meth:`Database.execute`
818accepts a query object, executes the query, and returns a DB-API 2.0 ``Cursor``
819object. The cursor will return the raw row-tuples:
820
821.. code-block:: python
822
823    query = Tweet.select(Tweet.content, User.username).join(User)
824    cursor = database.execute(query)
825    for (content, username) in cursor:
826        print(username, '->', content)
827
828Filtering records
829-----------------
830
831You can filter for particular records using normal python operators. Peewee
832supports a wide variety of :ref:`query operators <query-operators>`.
833
834.. code-block:: pycon
835
836    >>> user = User.get(User.username == 'Charlie')
837    >>> for tweet in Tweet.select().where(Tweet.user == user, Tweet.is_published == True):
838    ...     print(tweet.user.username, '->', tweet.message)
839    ...
840    Charlie -> hello world
841    Charlie -> this is fun
842
843    >>> for tweet in Tweet.select().where(Tweet.created_date < datetime.datetime(2011, 1, 1)):
844    ...     print(tweet.message, tweet.created_date)
845    ...
846    Really old tweet 2010-01-01 00:00:00
847
848You can also filter across joins:
849
850.. code-block:: pycon
851
852    >>> for tweet in Tweet.select().join(User).where(User.username == 'Charlie'):
853    ...     print(tweet.message)
854    hello world
855    this is fun
856    look at this picture of my food
857
858If you want to express a complex query, use parentheses and python's bitwise
859*or* and *and* operators:
860
861.. code-block:: pycon
862
863    >>> Tweet.select().join(User).where(
864    ...     (User.username == 'Charlie') |
865    ...     (User.username == 'Peewee Herman'))
866
867.. note::
868    Note that Peewee uses **bitwise** operators (``&`` and ``|``) rather than
869    logical operators (``and`` and ``or``). The reason for this is that Python
870    coerces the return value of logical operations to a boolean value. This is
871    also the reason why "IN" queries must be expressed using ``.in_()`` rather
872    than the ``in`` operator.
873
874Check out :ref:`the table of query operations <query-operators>` to see what
875types of queries are possible.
876
877.. note::
878
879    A lot of fun things can go in the where clause of a query, such as:
880
881    * A field expression, e.g. ``User.username == 'Charlie'``
882    * A function expression, e.g. ``fn.Lower(fn.Substr(User.username, 1, 1)) == 'a'``
883    * A comparison of one column to another, e.g. ``Employee.salary < (Employee.tenure * 1000) + 40000``
884
885    You can also nest queries, for example tweets by users whose username
886    starts with "a":
887
888    .. code-block:: python
889
890        # get users whose username starts with "a"
891        a_users = User.select().where(fn.Lower(fn.Substr(User.username, 1, 1)) == 'a')
892
893        # the ".in_()" method signifies an "IN" query
894        a_user_tweets = Tweet.select().where(Tweet.user.in_(a_users))
895
896More query examples
897^^^^^^^^^^^^^^^^^^^
898
899.. note::
900    For a wide range of example queries, see the :ref:`Query Examples <query_examples>`
901    document, which shows how to implements queries from the `PostgreSQL Exercises <https://pgexercises.com/>`_
902    website.
903
904Get active users:
905
906.. code-block:: python
907
908    User.select().where(User.active == True)
909
910Get users who are either staff or superusers:
911
912.. code-block:: python
913
914    User.select().where(
915        (User.is_staff == True) | (User.is_superuser == True))
916
917Get tweets by user named "charlie":
918
919.. code-block:: python
920
921    Tweet.select().join(User).where(User.username == 'charlie')
922
923Get tweets by staff or superusers (assumes FK relationship):
924
925.. code-block:: python
926
927    Tweet.select().join(User).where(
928        (User.is_staff == True) | (User.is_superuser == True))
929
930Get tweets by staff or superusers using a subquery:
931
932.. code-block:: python
933
934    staff_super = User.select(User.id).where(
935        (User.is_staff == True) | (User.is_superuser == True))
936    Tweet.select().where(Tweet.user.in_(staff_super))
937
938Sorting records
939---------------
940
941To return rows in order, use the :py:meth:`~Query.order_by` method:
942
943.. code-block:: pycon
944
945    >>> for t in Tweet.select().order_by(Tweet.created_date):
946    ...     print(t.pub_date)
947    ...
948    2010-01-01 00:00:00
949    2011-06-07 14:08:48
950    2011-06-07 14:12:57
951
952    >>> for t in Tweet.select().order_by(Tweet.created_date.desc()):
953    ...     print(t.pub_date)
954    ...
955    2011-06-07 14:12:57
956    2011-06-07 14:08:48
957    2010-01-01 00:00:00
958
959You can also use ``+`` and ``-`` prefix operators to indicate ordering:
960
961.. code-block:: python
962
963    # The following queries are equivalent:
964    Tweet.select().order_by(Tweet.created_date.desc())
965
966    Tweet.select().order_by(-Tweet.created_date)  # Note the "-" prefix.
967
968    # Similarly you can use "+" to indicate ascending order, though ascending
969    # is the default when no ordering is otherwise specified.
970    User.select().order_by(+User.username)
971
972You can also order across joins. Assuming you want to order tweets by the
973username of the author, then by created_date:
974
975.. code-block:: pycon
976
977    query = (Tweet
978             .select()
979             .join(User)
980             .order_by(User.username, Tweet.created_date.desc()))
981
982.. code-block:: sql
983
984    SELECT t1."id", t1."user_id", t1."message", t1."is_published", t1."created_date"
985    FROM "tweet" AS t1
986    INNER JOIN "user" AS t2
987      ON t1."user_id" = t2."id"
988    ORDER BY t2."username", t1."created_date" DESC
989
990When sorting on a calculated value, you can either include the necessary SQL
991expressions, or reference the alias assigned to the value. Here are two
992examples illustrating these methods:
993
994.. code-block:: python
995
996    # Let's start with our base query. We want to get all usernames and the number of
997    # tweets they've made. We wish to sort this list from users with most tweets to
998    # users with fewest tweets.
999    query = (User
1000             .select(User.username, fn.COUNT(Tweet.id).alias('num_tweets'))
1001             .join(Tweet, JOIN.LEFT_OUTER)
1002             .group_by(User.username))
1003
1004You can order using the same COUNT expression used in the ``select`` clause. In
1005the example below we are ordering by the ``COUNT()`` of tweet ids descending:
1006
1007.. code-block:: python
1008
1009    query = (User
1010             .select(User.username, fn.COUNT(Tweet.id).alias('num_tweets'))
1011             .join(Tweet, JOIN.LEFT_OUTER)
1012             .group_by(User.username)
1013             .order_by(fn.COUNT(Tweet.id).desc()))
1014
1015Alternatively, you can reference the alias assigned to the calculated value in
1016the ``select`` clause. This method has the benefit of being a bit easier to
1017read. Note that we are not referring to the named alias directly, but are
1018wrapping it using the :py:class:`SQL` helper:
1019
1020.. code-block:: python
1021
1022    query = (User
1023             .select(User.username, fn.COUNT(Tweet.id).alias('num_tweets'))
1024             .join(Tweet, JOIN.LEFT_OUTER)
1025             .group_by(User.username)
1026             .order_by(SQL('num_tweets').desc()))
1027
1028Or, to do things the "peewee" way:
1029
1030.. code-block:: python
1031
1032    ntweets = fn.COUNT(Tweet.id)
1033    query = (User
1034             .select(User.username, ntweets.alias('num_tweets'))
1035             .join(Tweet, JOIN.LEFT_OUTER)
1036             .group_by(User.username)
1037             .order_by(ntweets.desc())
1038
1039Getting random records
1040----------------------
1041
1042Occasionally you may want to pull a random record from the database. You can
1043accomplish this by ordering by the *random* or *rand* function (depending on
1044your database):
1045
1046Postgresql and Sqlite use the *Random* function:
1047
1048.. code-block:: python
1049
1050    # Pick 5 lucky winners:
1051    LotteryNumber.select().order_by(fn.Random()).limit(5)
1052
1053MySQL uses *Rand*:
1054
1055.. code-block:: python
1056
1057    # Pick 5 lucky winners:
1058    LotteryNumber.select().order_by(fn.Rand()).limit(5)
1059
1060Paginating records
1061------------------
1062
1063The :py:meth:`~Query.paginate` method makes it easy to grab a *page* or
1064records. :py:meth:`~Query.paginate` takes two parameters,
1065``page_number``, and ``items_per_page``.
1066
1067.. attention::
1068    Page numbers are 1-based, so the first page of results will be page 1.
1069
1070.. code-block:: pycon
1071
1072    >>> for tweet in Tweet.select().order_by(Tweet.id).paginate(2, 10):
1073    ...     print(tweet.message)
1074    ...
1075    tweet 10
1076    tweet 11
1077    tweet 12
1078    tweet 13
1079    tweet 14
1080    tweet 15
1081    tweet 16
1082    tweet 17
1083    tweet 18
1084    tweet 19
1085
1086If you would like more granular control, you can always use
1087:py:meth:`~Query.limit` and :py:meth:`~Query.offset`.
1088
1089Counting records
1090----------------
1091
1092You can count the number of rows in any select query:
1093
1094.. code-block:: python
1095
1096    >>> Tweet.select().count()
1097    100
1098    >>> Tweet.select().where(Tweet.id > 50).count()
1099    50
1100
1101Peewee will wrap your query in an outer query that performs a count, which
1102results in SQL like:
1103
1104.. code-block:: sql
1105
1106    SELECT COUNT(1) FROM ( ... your query ... );
1107
1108Aggregating records
1109-------------------
1110
1111Suppose you have some users and want to get a list of them along with the count
1112of tweets in each.
1113
1114.. code-block:: python
1115
1116    query = (User
1117             .select(User, fn.Count(Tweet.id).alias('count'))
1118             .join(Tweet, JOIN.LEFT_OUTER)
1119             .group_by(User))
1120
1121The resulting query will return *User* objects with all their normal attributes
1122plus an additional attribute *count* which will contain the count of tweets for
1123each user. We use a left outer join to include users who have no tweets.
1124
1125Let's assume you have a tagging application and want to find tags that have a
1126certain number of related objects. For this example we'll use some different
1127models in a :ref:`many-to-many <manytomany>` configuration:
1128
1129.. code-block:: python
1130
1131    class Photo(Model):
1132        image = CharField()
1133
1134    class Tag(Model):
1135        name = CharField()
1136
1137    class PhotoTag(Model):
1138        photo = ForeignKeyField(Photo)
1139        tag = ForeignKeyField(Tag)
1140
1141Now say we want to find tags that have at least 5 photos associated with them:
1142
1143.. code-block:: python
1144
1145    query = (Tag
1146             .select()
1147             .join(PhotoTag)
1148             .join(Photo)
1149             .group_by(Tag)
1150             .having(fn.Count(Photo.id) > 5))
1151
1152This query is equivalent to the following SQL:
1153
1154.. code-block:: sql
1155
1156    SELECT t1."id", t1."name"
1157    FROM "tag" AS t1
1158    INNER JOIN "phototag" AS t2 ON t1."id" = t2."tag_id"
1159    INNER JOIN "photo" AS t3 ON t2."photo_id" = t3."id"
1160    GROUP BY t1."id", t1."name"
1161    HAVING Count(t3."id") > 5
1162
1163Suppose we want to grab the associated count and store it on the tag:
1164
1165.. code-block:: python
1166
1167    query = (Tag
1168             .select(Tag, fn.Count(Photo.id).alias('count'))
1169             .join(PhotoTag)
1170             .join(Photo)
1171             .group_by(Tag)
1172             .having(fn.Count(Photo.id) > 5))
1173
1174Retrieving Scalar Values
1175------------------------
1176
1177You can retrieve scalar values by calling :py:meth:`Query.scalar`. For
1178instance:
1179
1180.. code-block:: python
1181
1182    >>> PageView.select(fn.Count(fn.Distinct(PageView.url))).scalar()
1183    100
1184
1185You can retrieve multiple scalar values by passing ``as_tuple=True``:
1186
1187.. code-block:: python
1188
1189    >>> Employee.select(
1190    ...     fn.Min(Employee.salary), fn.Max(Employee.salary)
1191    ... ).scalar(as_tuple=True)
1192    (30000, 50000)
1193
1194.. _window-functions:
1195
1196Window functions
1197----------------
1198
1199A :py:class:`Window` function refers to an aggregate function that operates on
1200a sliding window of data that is being processed as part of a ``SELECT`` query.
1201Window functions make it possible to do things like:
1202
12031. Perform aggregations against subsets of a result-set.
12042. Calculate a running total.
12053. Rank results.
12064. Compare a row value to a value in the preceding (or succeeding!) row(s).
1207
1208peewee comes with support for SQL window functions, which can be created by
1209calling :py:meth:`Function.over` and passing in your partitioning or ordering
1210parameters.
1211
1212For the following examples, we'll use the following model and sample data:
1213
1214.. code-block:: python
1215
1216    class Sample(Model):
1217        counter = IntegerField()
1218        value = FloatField()
1219
1220    data = [(1, 10),
1221            (1, 20),
1222            (2, 1),
1223            (2, 3),
1224            (3, 100)]
1225    Sample.insert_many(data, fields=[Sample.counter, Sample.value]).execute()
1226
1227Our sample table now contains:
1228
1229=== ======== ======
1230id  counter  value
1231=== ======== ======
12321   1        10.0
12332   1        20.0
12343   2        1.0
12354   2        3.0
12365   3        100.0
1237=== ======== ======
1238
1239Ordered Windows
1240^^^^^^^^^^^^^^^
1241
1242Let's calculate a running sum of the ``value`` field. In order for it to be a
1243"running" sum, we need it to be ordered, so we'll order with respect to the
1244Sample's ``id`` field:
1245
1246.. code-block:: python
1247
1248    query = Sample.select(
1249        Sample.counter,
1250        Sample.value,
1251        fn.SUM(Sample.value).over(order_by=[Sample.id]).alias('total'))
1252
1253    for sample in query:
1254        print(sample.counter, sample.value, sample.total)
1255
1256    # 1    10.    10.
1257    # 1    20.    30.
1258    # 2     1.    31.
1259    # 2     3.    34.
1260    # 3   100    134.
1261
1262For another example, we'll calculate the difference between the current value
1263and the previous value, when ordered by the ``id``:
1264
1265.. code-block:: python
1266
1267    difference = Sample.value - fn.LAG(Sample.value, 1).over(order_by=[Sample.id])
1268    query = Sample.select(
1269        Sample.counter,
1270        Sample.value,
1271        difference.alias('diff'))
1272
1273    for sample in query:
1274        print(sample.counter, sample.value, sample.diff)
1275
1276    # 1    10.   NULL
1277    # 1    20.    10.  -- (20 - 10)
1278    # 2     1.   -19.  -- (1 - 20)
1279    # 2     3.     2.  -- (3 - 1)
1280    # 3   100     97.  -- (100 - 3)
1281
1282Partitioned Windows
1283^^^^^^^^^^^^^^^^^^^
1284
1285Let's calculate the average ``value`` for each distinct "counter" value. Notice
1286that there are three possible values for the ``counter`` field (1, 2, and 3).
1287We can do this by calculating the ``AVG()`` of the ``value`` column over a
1288window that is partitioned depending on the ``counter`` field:
1289
1290.. code-block:: python
1291
1292    query = Sample.select(
1293        Sample.counter,
1294        Sample.value,
1295        fn.AVG(Sample.value).over(partition_by=[Sample.counter]).alias('cavg'))
1296
1297    for sample in query:
1298        print(sample.counter, sample.value, sample.cavg)
1299
1300    # 1    10.    15.
1301    # 1    20.    15.
1302    # 2     1.     2.
1303    # 2     3.     2.
1304    # 3   100    100.
1305
1306We can use ordering within partitions by specifying both the ``order_by`` and
1307``partition_by`` parameters. For an example, let's rank the samples by value
1308within each distinct ``counter`` group.
1309
1310.. code-block:: python
1311
1312    query = Sample.select(
1313        Sample.counter,
1314        Sample.value,
1315        fn.RANK().over(
1316            order_by=[Sample.value],
1317            partition_by=[Sample.counter]).alias('rank'))
1318
1319    for sample in query:
1320        print(sample.counter, sample.value, sample.rank)
1321
1322    # 1    10.    1
1323    # 1    20.    2
1324    # 2     1.    1
1325    # 2     3.    2
1326    # 3   100     1
1327
1328Bounded windows
1329^^^^^^^^^^^^^^^
1330
1331By default, window functions are evaluated using an *unbounded preceding* start
1332for the window, and the *current row* as the end. We can change the bounds of
1333the window our aggregate functions operate on by specifying a ``start`` and/or
1334``end`` in the call to :py:meth:`Function.over`. Additionally, Peewee comes
1335with helper-methods on the :py:class:`Window` object for generating the
1336appropriate boundary references:
1337
1338* :py:attr:`Window.CURRENT_ROW` - attribute that references the current row.
1339* :py:meth:`Window.preceding` - specify number of row(s) preceding, or omit
1340  number to indicate **all** preceding rows.
1341* :py:meth:`Window.following` - specify number of row(s) following, or omit
1342  number to indicate **all** following rows.
1343
1344To examine how boundaries work, we'll calculate a running total of the
1345``value`` column, ordered with respect to ``id``, **but** we'll only look the
1346running total of the current row and it's two preceding rows:
1347
1348.. code-block:: python
1349
1350    query = Sample.select(
1351        Sample.counter,
1352        Sample.value,
1353        fn.SUM(Sample.value).over(
1354            order_by=[Sample.id],
1355            start=Window.preceding(2),
1356            end=Window.CURRENT_ROW).alias('rsum'))
1357
1358    for sample in query:
1359        print(sample.counter, sample.value, sample.rsum)
1360
1361    # 1    10.    10.
1362    # 1    20.    30.  -- (20 + 10)
1363    # 2     1.    31.  -- (1 + 20 + 10)
1364    # 2     3.    24.  -- (3 + 1 + 20)
1365    # 3   100    104.  -- (100 + 3 + 1)
1366
1367.. note::
1368    Technically we did not need to specify the ``end=Window.CURRENT`` because
1369    that is the default. It was shown in the example for demonstration.
1370
1371Let's look at another example. In this example we will calculate the "opposite"
1372of a running total, in which the total sum of all values is decreased by the
1373value of the samples, ordered by ``id``. To accomplish this, we'll calculate
1374the sum from the current row to the last row.
1375
1376.. code-block:: python
1377
1378    query = Sample.select(
1379        Sample.counter,
1380        Sample.value,
1381        fn.SUM(Sample.value).over(
1382            order_by=[Sample.id],
1383            start=Window.CURRENT_ROW,
1384            end=Window.following()).alias('rsum'))
1385
1386    # 1    10.   134.  -- (10 + 20 + 1 + 3 + 100)
1387    # 1    20.   124.  -- (20 + 1 + 3 + 100)
1388    # 2     1.   104.  -- (1 + 3 + 100)
1389    # 2     3.   103.  -- (3 + 100)
1390    # 3   100    100.  -- (100)
1391
1392Filtered Aggregates
1393^^^^^^^^^^^^^^^^^^^
1394
1395Aggregate functions may also support filter functions (Postgres and Sqlite
13963.25+), which get translated into a ``FILTER (WHERE...)`` clause. Filter
1397expressions are added to an aggregate function with the
1398:py:meth:`Function.filter` method.
1399
1400For an example, we will calculate the running sum of the ``value`` field with
1401respect to the ``id``, but we will filter-out any samples whose ``counter=2``.
1402
1403.. code-block:: python
1404
1405    query = Sample.select(
1406        Sample.counter,
1407        Sample.value,
1408        fn.SUM(Sample.value).filter(Sample.counter != 2).over(
1409            order_by=[Sample.id]).alias('csum'))
1410
1411    for sample in query:
1412        print(sample.counter, sample.value, sample.csum)
1413
1414    # 1    10.    10.
1415    # 1    20.    30.
1416    # 2     1.    30.
1417    # 2     3.    30.
1418    # 3   100    130.
1419
1420.. note::
1421    The call to :py:meth:`~Function.filter` must precede the call to
1422    :py:meth:`~Function.over`.
1423
1424Reusing Window Definitions
1425^^^^^^^^^^^^^^^^^^^^^^^^^^
1426
1427If you intend to use the same window definition for multiple aggregates, you
1428can create a :py:class:`Window` object. The :py:class:`Window` object takes the
1429same parameters as :py:meth:`Function.over`, and can be passed to the
1430``over()`` method in-place of the individual parameters.
1431
1432Here we'll declare a single window, ordered with respect to the sample ``id``,
1433and call several window functions using that window definition:
1434
1435.. code-block:: python
1436
1437    win = Window(order_by=[Sample.id])
1438    query = Sample.select(
1439        Sample.counter,
1440        Sample.value,
1441        fn.LEAD(Sample.value).over(win),
1442        fn.LAG(Sample.value).over(win),
1443        fn.SUM(Sample.value).over(win)
1444    ).window(win)  # Include our window definition in query.
1445
1446    for row in query.tuples():
1447        print(row)
1448
1449    # counter  value  lead()  lag()  sum()
1450    # 1          10.     20.   NULL    10.
1451    # 1          20.      1.    10.    30.
1452    # 2           1.      3.    20.    31.
1453    # 2           3.    100.     1.    34.
1454    # 3         100.    NULL     3.   134.
1455
1456Multiple window definitions
1457^^^^^^^^^^^^^^^^^^^^^^^^^^^
1458
1459In the previous example, we saw how to declare a :py:class:`Window` definition
1460and re-use it for multiple different aggregations. You can include as many
1461window definitions as you need in your queries, but it is necessary to ensure
1462each window has a unique alias:
1463
1464.. code-block:: python
1465
1466    w1 = Window(order_by=[Sample.id]).alias('w1')
1467    w2 = Window(partition_by=[Sample.counter]).alias('w2')
1468    query = Sample.select(
1469        Sample.counter,
1470        Sample.value,
1471        fn.SUM(Sample.value).over(w1).alias('rsum'),  # Running total.
1472        fn.AVG(Sample.value).over(w2).alias('cavg')   # Avg per category.
1473    ).window(w1, w2)  # Include our window definitions.
1474
1475    for sample in query:
1476        print(sample.counter, sample.value, sample.rsum, sample.cavg)
1477
1478    # counter  value   rsum     cavg
1479    # 1          10.     10.     15.
1480    # 1          20.     30.     15.
1481    # 2           1.     31.      2.
1482    # 2           3.     34.      2.
1483    # 3         100     134.    100.
1484
1485Similarly, if you have multiple window definitions that share similar
1486definitions, it is possible to extend a previously-defined window definition.
1487For example, here we will be partitioning the data-set by the counter value, so
1488we'll be doing our aggregations with respect to the counter. Then we'll define
1489a second window that extends this partitioning, and adds an ordering clause:
1490
1491.. code-block:: python
1492
1493    w1 = Window(partition_by=[Sample.counter]).alias('w1')
1494
1495    # By extending w1, this window definition will also be partitioned
1496    # by "counter".
1497    w2 = Window(extends=w1, order_by=[Sample.value.desc()]).alias('w2')
1498
1499    query = (Sample
1500             .select(Sample.counter, Sample.value,
1501                     fn.SUM(Sample.value).over(w1).alias('group_sum'),
1502                     fn.RANK().over(w2).alias('revrank'))
1503             .window(w1, w2)
1504             .order_by(Sample.id))
1505
1506    for sample in query:
1507        print(sample.counter, sample.value, sample.group_sum, sample.revrank)
1508
1509    # counter  value   group_sum   revrank
1510    # 1        10.     30.         2
1511    # 1        20.     30.         1
1512    # 2        1.      4.          2
1513    # 2        3.      4.          1
1514    # 3        100.    100.        1
1515
1516.. _window-frame-types:
1517
1518Frame types: RANGE vs ROWS vs GROUPS
1519^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1520
1521Depending on the frame type, the database will process ordered groups
1522differently. Let's create two additional ``Sample`` rows to visualize the
1523difference:
1524
1525.. code-block:: pycon
1526
1527    >>> Sample.create(counter=1, value=20.)
1528    <Sample 6>
1529    >>> Sample.create(counter=2, value=1.)
1530    <Sample 7>
1531
1532Our table now contains:
1533
1534=== ======== ======
1535id  counter  value
1536=== ======== ======
15371   1        10.0
15382   1        20.0
15393   2        1.0
15404   2        3.0
15415   3        100.0
15426   1        20.0
15437   2        1.0
1544=== ======== ======
1545
1546Let's examine the difference by calculating a "running sum" of the samples,
1547ordered with respect to the ``counter`` and ``value`` fields. To specify the
1548frame type, we can use either:
1549
1550* :py:attr:`Window.RANGE`
1551* :py:attr:`Window.ROWS`
1552* :py:attr:`Window.GROUPS`
1553
1554The behavior of :py:attr:`~Window.RANGE`, when there are logical duplicates,
1555may lead to unexpected results:
1556
1557.. code-block:: python
1558
1559    query = Sample.select(
1560        Sample.counter,
1561        Sample.value,
1562        fn.SUM(Sample.value).over(
1563            order_by=[Sample.counter, Sample.value],
1564            frame_type=Window.RANGE).alias('rsum'))
1565
1566    for sample in query.order_by(Sample.counter, Sample.value):
1567        print(sample.counter, sample.value, sample.rsum)
1568
1569    # counter  value   rsum
1570    # 1          10.     10.
1571    # 1          20.     50.
1572    # 1          20.     50.
1573    # 2           1.     52.
1574    # 2           1.     52.
1575    # 2           3.     55.
1576    # 3         100     155.
1577
1578With the inclusion of the new rows we now have some rows that have duplicate
1579``category`` and ``value`` values. The :py:attr:`~Window.RANGE` frame type
1580causes these duplicates to be evaluated together rather than separately.
1581
1582The more expected result can be achieved by using :py:attr:`~Window.ROWS` as
1583the frame-type:
1584
1585.. code-block:: python
1586
1587    query = Sample.select(
1588        Sample.counter,
1589        Sample.value,
1590        fn.SUM(Sample.value).over(
1591            order_by=[Sample.counter, Sample.value],
1592            frame_type=Window.ROWS).alias('rsum'))
1593
1594    for sample in query.order_by(Sample.counter, Sample.value):
1595        print(sample.counter, sample.value, sample.rsum)
1596
1597    # counter  value   rsum
1598    # 1          10.     10.
1599    # 1          20.     30.
1600    # 1          20.     50.
1601    # 2           1.     51.
1602    # 2           1.     52.
1603    # 2           3.     55.
1604    # 3         100     155.
1605
1606Peewee uses these rules for determining what frame-type to use:
1607
1608* If the user specifies a ``frame_type``, that frame type will be used.
1609* If ``start`` and/or ``end`` boundaries are specified Peewee will default to
1610  using ``ROWS``.
1611* If the user did not specify frame type or start/end boundaries, Peewee will
1612  use the database default, which is ``RANGE``.
1613
1614The :py:attr:`Window.GROUPS` frame type looks at the window range specification
1615in terms of groups of rows, based on the ordering term(s). Using ``GROUPS``, we
1616can define the frame so it covers distinct groupings of rows. Let's look at an
1617example:
1618
1619.. code-block:: python
1620
1621    query = (Sample
1622             .select(Sample.counter, Sample.value,
1623                     fn.SUM(Sample.value).over(
1624                        order_by=[Sample.counter, Sample.value],
1625                        frame_type=Window.GROUPS,
1626                        start=Window.preceding(1)).alias('gsum'))
1627             .order_by(Sample.counter, Sample.value))
1628
1629    for sample in query:
1630        print(sample.counter, sample.value, sample.gsum)
1631
1632    #  counter   value    gsum
1633    #  1         10       10
1634    #  1         20       50
1635    #  1         20       50   (10) + (20+0)
1636    #  2         1        42
1637    #  2         1        42   (20+20) + (1+1)
1638    #  2         3        5    (1+1) + 3
1639    #  3         100      103  (3) + 100
1640
1641As you can hopefully infer, the window is grouped by its ordering term, which
1642is ``(counter, value)``. We are looking at a window that extends between one
1643previous group and the current group.
1644
1645.. note::
1646    For information about the window function APIs, see:
1647
1648    * :py:meth:`Function.over`
1649    * :py:meth:`Function.filter`
1650    * :py:class:`Window`
1651
1652    For general information on window functions, read the postgres `window functions tutorial <https://www.postgresql.org/docs/current/tutorial-window.html>`_
1653
1654    Additionally, the `postgres docs <https://www.postgresql.org/docs/current/sql-select.html#SQL-WINDOW>`_
1655    and the `sqlite docs <https://www.sqlite.org/windowfunctions.html>`_
1656    contain a lot of good information.
1657
1658.. _rowtypes:
1659
1660Retrieving row tuples / dictionaries / namedtuples
1661--------------------------------------------------
1662
1663Sometimes you do not need the overhead of creating model instances and simply
1664want to iterate over the row data without needing all the APIs provided
1665:py:class:`Model`. To do this, use:
1666
1667* :py:meth:`~BaseQuery.dicts`
1668* :py:meth:`~BaseQuery.namedtuples`
1669* :py:meth:`~BaseQuery.tuples`
1670* :py:meth:`~BaseQuery.objects` -- accepts an arbitrary constructor function
1671  which is called with the row tuple.
1672
1673.. code-block:: python
1674
1675    stats = (Stat
1676             .select(Stat.url, fn.Count(Stat.url))
1677             .group_by(Stat.url)
1678             .tuples())
1679
1680    # iterate over a list of 2-tuples containing the url and count
1681    for stat_url, stat_count in stats:
1682        print(stat_url, stat_count)
1683
1684Similarly, you can return the rows from the cursor as dictionaries using
1685:py:meth:`~BaseQuery.dicts`:
1686
1687.. code-block:: python
1688
1689    stats = (Stat
1690             .select(Stat.url, fn.Count(Stat.url).alias('ct'))
1691             .group_by(Stat.url)
1692             .dicts())
1693
1694    # iterate over a list of 2-tuples containing the url and count
1695    for stat in stats:
1696        print(stat['url'], stat['ct'])
1697
1698.. _returning-clause:
1699
1700Returning Clause
1701----------------
1702
1703:py:class:`PostgresqlDatabase` supports a ``RETURNING`` clause on ``UPDATE``,
1704``INSERT`` and ``DELETE`` queries. Specifying a ``RETURNING`` clause allows you
1705to iterate over the rows accessed by the query.
1706
1707By default, the return values upon execution of the different queries are:
1708
1709* ``INSERT`` - auto-incrementing primary key value of the newly-inserted row.
1710  When not using an auto-incrementing primary key, Postgres will return the new
1711  row's primary key, but SQLite and MySQL will not.
1712* ``UPDATE`` - number of rows modified
1713* ``DELETE`` - number of rows deleted
1714
1715When a returning clause is used the return value upon executing a query will be
1716an iterable cursor object.
1717
1718Postgresql allows, via the ``RETURNING`` clause, to return data from the rows
1719inserted or modified by a query.
1720
1721For example, let's say you have an :py:class:`Update` that deactivates all
1722user accounts whose registration has expired. After deactivating them, you want
1723to send each user an email letting them know their account was deactivated.
1724Rather than writing two queries, a ``SELECT`` and an ``UPDATE``, you can do
1725this in a single ``UPDATE`` query with a ``RETURNING`` clause:
1726
1727.. code-block:: python
1728
1729    query = (User
1730             .update(is_active=False)
1731             .where(User.registration_expired == True)
1732             .returning(User))
1733
1734    # Send an email to every user that was deactivated.
1735    for deactivate_user in query.execute():
1736        send_deactivation_email(deactivated_user.email)
1737
1738The ``RETURNING`` clause is also available on :py:class:`Insert` and
1739:py:class:`Delete`. When used with ``INSERT``, the newly-created rows will be
1740returned. When used with ``DELETE``, the deleted rows will be returned.
1741
1742The only limitation of the ``RETURNING`` clause is that it can only consist of
1743columns from tables listed in the query's ``FROM`` clause. To select all
1744columns from a particular table, you can simply pass in the :py:class:`Model`
1745class.
1746
1747As another example, let's add a user and set their creation-date to the
1748server-generated current timestamp. We'll create and retrieve the new user's
1749ID, Email and the creation timestamp in a single query:
1750
1751.. code-block:: python
1752
1753    query = (User
1754             .insert(email='foo@bar.com', created=fn.now())
1755             .returning(User))  # Shorthand for all columns on User.
1756
1757    # When using RETURNING, execute() returns a cursor.
1758    cursor = query.execute()
1759
1760    # Get the user object we just inserted and log the data:
1761    user = cursor[0]
1762    logger.info('Created user %s (id=%s) at %s', user.email, user.id, user.created)
1763
1764By default the cursor will return :py:class:`Model` instances, but you can
1765specify a different row type:
1766
1767.. code-block:: python
1768
1769    data = [{'name': 'charlie'}, {'name': 'huey'}, {'name': 'mickey'}]
1770    query = (User
1771             .insert_many(data)
1772             .returning(User.id, User.username)
1773             .dicts())
1774
1775    for new_user in query.execute():
1776        print('Added user "%s", id=%s' % (new_user['username'], new_user['id']))
1777
1778Just as with :py:class:`Select` queries, you can specify various :ref:`result row types <rowtypes>`.
1779
1780.. _cte:
1781
1782Common Table Expressions
1783------------------------
1784
1785Peewee supports the inclusion of common table expressions (CTEs) in all types
1786of queries. CTEs may be useful for:
1787
1788* Factoring out a common subquery.
1789* Grouping or filtering by a column derived in the CTE's result set.
1790* Writing recursive queries.
1791
1792To declare a :py:class:`Select` query for use as a CTE, use
1793:py:meth:`~SelectQuery.cte` method, which wraps the query in a :py:class:`CTE`
1794object. To indicate that a :py:class:`CTE` should be included as part of a
1795query, use the :py:meth:`Query.with_cte` method, passing a list of CTE objects.
1796
1797Simple Example
1798^^^^^^^^^^^^^^
1799
1800For an example, let's say we have some data points that consist of a key and a
1801floating-point value. Let's define our model and populate some test data:
1802
1803.. code-block:: python
1804
1805    class Sample(Model):
1806        key = TextField()
1807        value = FloatField()
1808
1809    data = (
1810        ('a', (1.25, 1.5, 1.75)),
1811        ('b', (2.1, 2.3, 2.5, 2.7, 2.9)),
1812        ('c', (3.5, 3.5)))
1813
1814    # Populate data.
1815    for key, values in data:
1816        Sample.insert_many([(key, value) for value in values],
1817                           fields=[Sample.key, Sample.value]).execute()
1818
1819Let's use a CTE to calculate, for each distinct key, which values were
1820above-average for that key.
1821
1822.. code-block:: python
1823
1824    # First we'll declare the query that will be used as a CTE. This query
1825    # simply determines the average value for each key.
1826    cte = (Sample
1827           .select(Sample.key, fn.AVG(Sample.value).alias('avg_value'))
1828           .group_by(Sample.key)
1829           .cte('key_avgs', columns=('key', 'avg_value')))
1830
1831    # Now we'll query the sample table, using our CTE to find rows whose value
1832    # exceeds the average for the given key. We'll calculate how far above the
1833    # average the given sample's value is, as well.
1834    query = (Sample
1835             .select(Sample.key, Sample.value)
1836             .join(cte, on=(Sample.key == cte.c.key))
1837             .where(Sample.value > cte.c.avg_value)
1838             .order_by(Sample.value)
1839             .with_cte(cte))
1840
1841We can iterate over the samples returned by the query to see which samples had
1842above-average values for their given group:
1843
1844.. code-block:: pycon
1845
1846    >>> for sample in query:
1847    ...     print(sample.key, sample.value)
1848
1849    # 'a', 1.75
1850    # 'b', 2.7
1851    # 'b', 2.9
1852
1853Complex Example
1854^^^^^^^^^^^^^^^
1855
1856For a more complete example, let's consider the following query which uses
1857multiple CTEs to find per-product sales totals in only the top sales regions.
1858Our model looks like this:
1859
1860.. code-block:: python
1861
1862    class Order(Model):
1863        region = TextField()
1864        amount = FloatField()
1865        product = TextField()
1866        quantity = IntegerField()
1867
1868Here is how the query might be written in SQL. This example can be found in
1869the `postgresql documentation <https://www.postgresql.org/docs/current/static/queries-with.html>`_.
1870
1871.. code-block:: sql
1872
1873    WITH regional_sales AS (
1874        SELECT region, SUM(amount) AS total_sales
1875        FROM orders
1876        GROUP BY region
1877      ), top_regions AS (
1878        SELECT region
1879        FROM regional_sales
1880        WHERE total_sales > (SELECT SUM(total_sales) / 10 FROM regional_sales)
1881      )
1882    SELECT region,
1883           product,
1884           SUM(quantity) AS product_units,
1885           SUM(amount) AS product_sales
1886    FROM orders
1887    WHERE region IN (SELECT region FROM top_regions)
1888    GROUP BY region, product;
1889
1890With Peewee, we would write:
1891
1892.. code-block:: python
1893
1894    reg_sales = (Order
1895                 .select(Order.region,
1896                         fn.SUM(Order.amount).alias('total_sales'))
1897                 .group_by(Order.region)
1898                 .cte('regional_sales'))
1899
1900    top_regions = (reg_sales
1901                   .select(reg_sales.c.region)
1902                   .where(reg_sales.c.total_sales > (
1903                       reg_sales.select(fn.SUM(reg_sales.c.total_sales) / 10)))
1904                   .cte('top_regions'))
1905
1906    query = (Order
1907             .select(Order.region,
1908                     Order.product,
1909                     fn.SUM(Order.quantity).alias('product_units'),
1910                     fn.SUM(Order.amount).alias('product_sales'))
1911             .where(Order.region.in_(top_regions.select(top_regions.c.region)))
1912             .group_by(Order.region, Order.product)
1913             .with_cte(regional_sales, top_regions))
1914
1915Recursive CTEs
1916^^^^^^^^^^^^^^
1917
1918Peewee supports recursive CTEs. Recursive CTEs can be useful when, for example,
1919you have a tree data-structure represented by a parent-link foreign key.
1920Suppose, for example, that we have a hierarchy of categories for an online
1921bookstore. We wish to generate a table showing all categories and their
1922absolute depths, along with the path from the root to the category.
1923
1924We'll assume the following model definition, in which each category has a
1925foreign-key to its immediate parent category:
1926
1927.. code-block:: python
1928
1929    class Category(Model):
1930        name = TextField()
1931        parent = ForeignKeyField('self', backref='children', null=True)
1932
1933To list all categories along with their depth and parents, we can use a
1934recursive CTE:
1935
1936.. code-block:: python
1937
1938    # Define the base case of our recursive CTE. This will be categories that
1939    # have a null parent foreign-key.
1940    Base = Category.alias()
1941    level = Value(1).alias('level')
1942    path = Base.name.alias('path')
1943    base_case = (Base
1944                 .select(Base.id, Base.name, Base.parent, level, path)
1945                 .where(Base.parent.is_null())
1946                 .cte('base', recursive=True))
1947
1948    # Define the recursive terms.
1949    RTerm = Category.alias()
1950    rlevel = (base_case.c.level + 1).alias('level')
1951    rpath = base_case.c.path.concat('->').concat(RTerm.name).alias('path')
1952    recursive = (RTerm
1953                 .select(RTerm.id, RTerm.name, RTerm.parent, rlevel, rpath)
1954                 .join(base_case, on=(RTerm.parent == base_case.c.id)))
1955
1956    # The recursive CTE is created by taking the base case and UNION ALL with
1957    # the recursive term.
1958    cte = base_case.union_all(recursive)
1959
1960    # We will now query from the CTE to get the categories, their levels,  and
1961    # their paths.
1962    query = (cte
1963             .select_from(cte.c.name, cte.c.level, cte.c.path)
1964             .order_by(cte.c.path))
1965
1966    # We can now iterate over a list of all categories and print their names,
1967    # absolute levels, and path from root -> category.
1968    for category in query:
1969        print(category.name, category.level, category.path)
1970
1971    # Example output:
1972    # root, 1, root
1973    # p1, 2, root->p1
1974    # c1-1, 3, root->p1->c1-1
1975    # c1-2, 3, root->p1->c1-2
1976    # p2, 2, root->p2
1977    # c2-1, 3, root->p2->c2-1
1978
1979Foreign Keys and Joins
1980----------------------
1981
1982This section have been moved into its own document: :ref:`relationships`.
1983