.. % Introduction
.. % What is ZODB?
.. % What is ZEO?
.. % OODBs vs. Relational DBs
.. % Other OODBs


Introduction
============

This guide explains how to write Python programs that use the Z Object Database
(ZODB) and Zope Enterprise Objects (ZEO).  The latest version of the guide is
always available at `<http://www.zope.org/Wikis/ZODB/guide/index.html>`_.


What is the ZODB?
-----------------

The ZODB is a persistence system for Python objects.  Persistent programming
languages provide facilities that automatically write objects to disk and read
them in again when they're required by a running program.  By installing the
ZODB, you add such facilities to Python.

It's certainly possible to build your own system for making Python objects
persistent.  The usual starting points are the :mod:`pickle` module, for
converting objects into a byte-string representation, and various database
modules, such as the :mod:`dbm` family (historically :mod:`gdbm` or
:mod:`bsddb`), that provide ways to write strings to disk and read them back.
It's straightforward to combine the :mod:`pickle` module and a database module
to store and retrieve objects, and in fact the :mod:`shelve` module, included in
Python's standard library, does this.
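
As a rough illustration of that combination, the following sketch stores an
object with :mod:`shelve` and reads it back.  The file name ``runs.db``, the key
``run-123``, and the record contents are made up for this example.

```python
import os
import shelve
import tempfile

# Hypothetical record; any picklable Python object would do.
run = {"run_id": 123, "owner": "alice", "title": "Oxide deposition test"}

# Keep the example's database file in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "runs.db")

# shelve pickles each value and stores it in a dbm-style file
# under a string key.
with shelve.open(path) as db:
    db["run-123"] = run

# Reopening the shelf reads the pickle back and rebuilds the object.
with shelve.open(path) as db:
    loaded = db["run-123"]

print(loaded["title"])
```

The object that comes back is an equal copy rebuilt from the pickle, not the
original object.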

The downside is that the programmer has to explicitly manage objects, reading an
object when it's needed and writing it out to disk when the object is no longer
required.  The ZODB manages objects for you, keeping them in a cache, writing
them out to disk when they are modified, and dropping them from the cache if
they haven't been used in a while.
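
To make that bookkeeping concrete, here is a small sketch that uses
:mod:`shelve` with its default settings.  The key and record layout are
invented for the example; the point is only that a change to a stored object is
lost unless the program writes the object back itself.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "runs.db")

with shelve.open(path) as db:
    db["run-123"] = {"title": "Oxide test", "steps": []}

with shelve.open(path) as db:
    # db[...] returns a fresh copy unpickled from disk, so this change is
    # made to a temporary object and is silently lost.
    db["run-123"]["steps"].append("deposit")

    # To persist a change, the program must read the object, modify it,
    # and explicitly write it back.
    run = db["run-123"]
    run["steps"].append("etch")
    db["run-123"] = run

with shelve.open(path) as db:
    steps = db["run-123"]["steps"]

print(steps)  # ['etch']
```

Only the step that was explicitly written back survives.  This is the kind of
manual read-modify-write cycle that the ZODB is meant to take off your hands.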


OODBs vs. Relational DBs
------------------------

Another way to look at it is that the ZODB is a Python-specific object-oriented
database (OODB).  Commercial object databases for C++ or Java often require that
you jump through some hoops, such as using a special preprocessor or avoiding
certain data types.  As we'll see, the ZODB has some hoops of its own to jump
through, but in comparison the naturalness of the ZODB is astonishing.

Relational databases (RDBs) are far more common than OODBs. Relational databases
store information in tables; a table consists of any number of rows, each row
containing several columns of information.  (Tables are more formally called
relations, which is where the term "relational database" originates.)

Let's look at a concrete example.  The example comes from my day job working for
the MEMS Exchange, in a greatly simplified version.  The job is to track process
runs, which are lists of manufacturing steps to be performed in a semiconductor
fab.  A run is owned by a particular user, and has a name and an assigned ID
number.  Runs consist of a number of operations; an operation is a single step
to be performed, such as depositing something on a wafer or etching something
off it.

Operations may have parameters, which are additional information required to
perform an operation.  For example, if you're depositing something on a wafer,
you need to know two things: 1) what you're depositing, and 2) how much should
be deposited.  You might deposit 100 microns of silicon oxide, or 1 micron of
copper.

Mapping these structures to a relational database is straightforward::

   CREATE TABLE runs (
     run_id    int,
     owner     varchar,
     title     varchar,
     acct_num  int,
     PRIMARY KEY(run_id)
   );

   CREATE TABLE operations (
     run_id      int,
     step_num    int,
     process_id  varchar,
     PRIMARY KEY(run_id, step_num),
     FOREIGN KEY(run_id) REFERENCES runs(run_id)
   );

   CREATE TABLE parameters (
     run_id       int,
     step_num     int,
     param_name   varchar,
     param_value  varchar,
     PRIMARY KEY(run_id, step_num, param_name),
     FOREIGN KEY(run_id, step_num)
        REFERENCES operations(run_id, step_num)
   );

In Python, you would write three classes named :class:`Run`, :class:`Operation`,
and :class:`Parameter`.  I won't present code for defining these classes, since
that code is uninteresting at this point. Each class would contain a single
method to begin with, an :meth:`__init__` method that assigns default values,
such as 0 or ``None``, to each attribute of the class.

It's not difficult to write Python code that will create a :class:`Run` instance
and populate it with the data from the relational tables; with a little more
effort, you can build a straightforward tool, usually called an
object-relational mapper, to do this automatically. (See
`<http://www.amk.ca/python/unmaintained/ordb.html>`_ for a quick hack at a
Python object-relational mapper, and
`<http://www.python.org/workshops/1997-10/proceedings/shprentz.html>`_ for Joel
Shprentz's more successful implementation of the same idea; unlike mine,
Shprentz's system has been used for actual work.)

However, it is difficult to make an object-relational mapper reasonably quick; a
simple-minded implementation like mine is quite slow because it has to do
several queries to access all of an object's data.  Higher-performance
object-relational mappers cache objects, only performing SQL queries when they
actually need to.
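
The sketch below shows the shape of such a simple-minded mapper, using an
in-memory SQLite database so that it is self-contained.  The helper names
(``query``, ``load_run``), the sample data, and the use of nested dictionaries
instead of :class:`Run` instances are assumptions made for brevity; this is not
the code of either mapper mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE runs (run_id int PRIMARY KEY, owner varchar, title varchar);
    CREATE TABLE operations (run_id int, step_num int, process_id varchar,
                             PRIMARY KEY (run_id, step_num));
    CREATE TABLE parameters (run_id int, step_num int, param_name varchar,
                             param_value varchar,
                             PRIMARY KEY (run_id, step_num, param_name));
    INSERT INTO runs VALUES (123, 'alice', 'Oxide test');
    INSERT INTO operations VALUES (123, 1, 'deposit');
    INSERT INTO parameters VALUES (123, 1, 'material', 'silicon oxide');
    INSERT INTO parameters VALUES (123, 1, 'thickness', '2.0');
""")

stats = {"queries": 0}   # how many SQL statements the mapper has issued
cache = {}               # run_id -> already-loaded run


def query(sql, args):
    """Run one SELECT and count it."""
    stats["queries"] += 1
    return conn.execute(sql, args).fetchall()


def load_run(run_id):
    """Build a nested dict for one run, consulting the cache first."""
    if run_id in cache:
        return cache[run_id]
    owner, title = query(
        "SELECT owner, title FROM runs WHERE run_id = ?", (run_id,))[0]
    run = {"run_id": run_id, "owner": owner, "title": title, "operations": []}
    ops = query("SELECT step_num, process_id FROM operations "
                "WHERE run_id = ? ORDER BY step_num", (run_id,))
    for step_num, process_id in ops:
        # One more query per step to fetch that step's parameters.
        params = dict(query(
            "SELECT param_name, param_value FROM parameters "
            "WHERE run_id = ? AND step_num = ?", (run_id, step_num)))
        run["operations"].append({"step_num": step_num,
                                  "process_id": process_id,
                                  "parameters": params})
    cache[run_id] = run
    return run


first = load_run(123)
queries_after_first_load = stats["queries"]   # runs + operations + one per step
second = load_run(123)                        # served from the cache
```

Loading a single one-step run already takes three queries; the second call
costs none because the cached object is returned.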

That helps if you suddenly want to access run number 123.  But what if
you want to find all runs where a step has a parameter named 'thickness' with a
value of 2.0?  In the relational version, you have two unappealing choices:

#. Write a specialized SQL query for this case: ``SELECT run_id FROM parameters
   WHERE param_name = 'thickness' AND param_value = '2.0'``

   If such queries are common, you can end up with lots of specialized queries.
   When the database tables get rearranged, all these queries will need to be
   modified.

#. Scan through the runs with an object-relational mapper, which doesn't help
   much.  The mapper performs the required SQL queries to read run #1,
   and then a simple Python loop checks whether any of its steps have the
   parameter you're looking for. Repeat for runs #2, #3, and so forth.  This
   does a vast number of SQL queries, and therefore is incredibly slow.
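
The two choices can be compared side by side with a small experiment.  The
following sketch again uses an in-memory SQLite database; the fifty sample runs
and the query counter are invented for the illustration, and the scan is
written out by hand rather than through a real mapper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE runs (run_id int PRIMARY KEY, owner varchar, title varchar);
    CREATE TABLE operations (run_id int, step_num int, process_id varchar,
                             PRIMARY KEY (run_id, step_num));
    CREATE TABLE parameters (run_id int, step_num int, param_name varchar,
                             param_value varchar,
                             PRIMARY KEY (run_id, step_num, param_name));
""")

# Hypothetical data: 50 one-step runs; every tenth run has thickness 2.0.
for run_id in range(1, 51):
    thickness = "2.0" if run_id % 10 == 0 else "1.0"
    conn.execute("INSERT INTO runs VALUES (?, 'alice', 'test run')", (run_id,))
    conn.execute("INSERT INTO operations VALUES (?, 1, 'deposit')", (run_id,))
    conn.execute("INSERT INTO parameters VALUES (?, 1, 'thickness', ?)",
                 (run_id, thickness))

# Choice 1: a single specialized query that knows the table layout.
rows = conn.execute(
    "SELECT DISTINCT run_id FROM parameters "
    "WHERE param_name = 'thickness' AND param_value = '2.0' "
    "ORDER BY run_id").fetchall()
by_sql = [run_id for (run_id,) in rows]

# Choice 2: scan run by run, the way a naive mapper would.
scan_queries = 0
by_scan = []
run_ids = [r for (r,) in conn.execute("SELECT run_id FROM runs ORDER BY run_id")]
scan_queries += 1
for run_id in run_ids:
    steps = conn.execute(
        "SELECT step_num FROM operations WHERE run_id = ?",
        (run_id,)).fetchall()
    scan_queries += 1
    for (step_num,) in steps:
        params = dict(conn.execute(
            "SELECT param_name, param_value FROM parameters "
            "WHERE run_id = ? AND step_num = ?",
            (run_id, step_num)).fetchall())
        scan_queries += 1
        if params.get("thickness") == "2.0":
            by_scan.append(run_id)
            break

print(by_sql, scan_queries)
```

Both approaches find the same five runs, but the scan issues 101 queries where
the specialized query issues one.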

An object database such as ZODB simply stores internal pointers from object to
object, so reading in a single object is much faster than doing a bunch of SQL
queries and assembling the results. Scanning all runs, therefore, is still
inefficient, but not grossly inefficient.
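
In object form, the same search is an ordinary loop over references.  The
sketch below uses bare-bones versions of the three classes and a hypothetical
helper, ``runs_with_parameter``; the objects here live only in memory, so it
shows the traversal and nothing about how the ZODB stores or loads them.

```python
class Parameter:
    def __init__(self, name=None, value=None):
        self.name = name
        self.value = value


class Operation:
    def __init__(self, process_id=None, parameters=None):
        self.process_id = process_id
        self.parameters = list(parameters or [])


class Run:
    def __init__(self, run_id=0, owner=None, title=None, operations=None):
        self.run_id = run_id
        self.owner = owner
        self.title = title
        self.operations = list(operations or [])


def runs_with_parameter(runs, name, value):
    """Return the runs where some operation has the given parameter value."""
    return [
        run for run in runs
        if any(p.name == name and p.value == value
               for op in run.operations
               for p in op.parameters)
    ]


# Made-up sample data.
runs = [
    Run(1, "alice", "thin oxide",
        [Operation("deposit", [Parameter("thickness", 1.0)])]),
    Run(2, "bob", "thick oxide",
        [Operation("deposit", [Parameter("thickness", 2.0)]),
         Operation("etch")]),
]

matches = runs_with_parameter(runs, "thickness", 2.0)
print([run.run_id for run in matches])  # [2]
```

Each run is still visited once, so the scan is linear in the number of runs,
but following a reference from a run to its operations involves no query
assembly.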


What is ZEO?
------------

The ZODB comes with a few different classes that implement the :class:`Storage`
interface.  Such classes handle the job of writing out Python objects to a
physical storage medium, which can be a disk file (the :class:`FileStorage`
class), a BerkeleyDB file (:class:`BDBFullStorage`), a relational database
(:class:`DCOracleStorage`), or some other medium.  ZEO adds
:class:`ClientStorage`, a new :class:`Storage` that doesn't write to physical
media but just forwards all requests across a network to a server.  The server,
which is running an instance of the :class:`StorageServer` class, simply acts as
a front-end for some physical :class:`Storage` class.  It's a fairly simple
idea, but as we'll see later on in this document, it opens up many
possibilities.
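
The forwarding idea can be pictured with a toy model.  None of the classes
below are ZEO's: ``DictStorage``, ``ToyServer``, and ``ForwardingStorage`` are
hypothetical stand-ins, the two-method ``store``/``load`` interface is a
drastic simplification, and a plain function call takes the place of the
network connection.

```python
class DictStorage:
    """Toy stand-in for a physical storage: keeps records in a dict."""

    def __init__(self):
        self._records = {}

    def store(self, oid, data):
        self._records[oid] = data

    def load(self, oid):
        return self._records[oid]


class ToyServer:
    """Front end for one physical storage; dispatches requests by name."""

    def __init__(self, storage):
        self._storage = storage

    def handle(self, method, *args):
        return getattr(self._storage, method)(*args)


class ForwardingStorage:
    """Offers the same store/load interface but forwards every call."""

    def __init__(self, send):
        self._send = send   # callable standing in for the network link

    def store(self, oid, data):
        return self._send("store", oid, data)

    def load(self, oid):
        return self._send("load", oid)


backend = DictStorage()
server = ToyServer(backend)

# Two clients share one server, and therefore one physical storage.
client_a = ForwardingStorage(server.handle)
client_b = ForwardingStorage(server.handle)

oid = b"\x00" * 8
client_a.store(oid, b"pickled run 123")
seen_by_b = client_b.load(oid)
```

Because both clients forward to the same server, data stored through one is
visible through the other, which hints at why such a simple layer is useful.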


About this guide
----------------

The primary author of this guide works on a project which uses the ZODB and ZEO
as its primary storage technology.  We use the ZODB to store process runs and
operations, a catalog of available processes, user information, accounting
information, and other data.  Part of the goal of writing this document is to
make our experience more widely available.  A few times we've spent hours or
even days trying to figure out a problem, and this guide is an attempt to gather
up the knowledge we've gained so that others don't have to make the same
mistakes we did while learning.

The author's ZODB project is described in a paper available at
`<http://www.amk.ca/python/writing/mx-architecture/>`_.

This document will always be a work in progress.  If you wish to suggest
clarifications or additional topics, please send your comments to the
`ZODB-dev mailing list <https://groups.google.com/forum/#!forum/zodb>`_.


Acknowledgements
----------------

Andrew Kuchling wrote the original version of this guide, which provided some of
the first ZODB documentation for Python programmers. His initial version has
been updated over time by Jeremy Hylton and Tim Peters.

I'd like to thank the people who've pointed out inaccuracies and bugs, offered
suggestions on the text, or proposed new topics that should be covered: Jeff
Bauer, Willem Broekema, Thomas Guettler, Chris McDonough, George Runyan.