.. % Introduction
.. % What is ZODB?
.. % What is ZEO?
.. % OODBs vs. Relational DBs
.. % Other OODBs


Introduction
============

This guide explains how to write Python programs that use the Z Object Database
(ZODB) and Zope Enterprise Objects (ZEO). The latest version of the guide is
always available at `<http://www.zope.org/Wikis/ZODB/guide/index.html>`_.


What is the ZODB?
-----------------

The ZODB is a persistence system for Python objects. Persistent programming
languages provide facilities that automatically write objects to disk and read
them in again when they're required by a running program. By installing the
ZODB, you add such facilities to Python.

It's certainly possible to build your own system for making Python objects
persistent. The usual starting points are the :mod:`pickle` module, for
converting objects into a string representation, and various database modules,
such as the :mod:`gdbm` or :mod:`bsddb` modules, that provide ways to write
strings to disk and read them back. It's straightforward to combine the
:mod:`pickle` module and a database module to store and retrieve objects, and in
fact the :mod:`shelve` module, included in Python's standard library, does this.

The downside is that the programmer has to explicitly manage objects, reading an
object when it's needed and writing it out to disk when the object is no longer
required. The ZODB manages objects for you, keeping them in a cache, writing
them out to disk when they are modified, and dropping them from the cache if
they haven't been used in a while.


OODBs vs. Relational DBs
------------------------

Another way to look at it is that the ZODB is a Python-specific object-oriented
database (OODB). Commercial object databases for C++ or Java often require that
you jump through some hoops, such as using a special preprocessor or avoiding
certain data types.
As we'll see, the ZODB has some hoops of its own to jump
through, but in comparison the naturalness of the ZODB is astonishing.

Relational databases (RDBs) are far more common than OODBs. Relational databases
store information in tables; a table consists of any number of rows, each row
containing several columns of information. (Tables are more formally called
relations, which is where the term "relational database" originates.)

Let's look at a concrete example. The example comes from my day job working for
the MEMS Exchange, in a greatly simplified version. The job is to track process
runs, which are lists of manufacturing steps to be performed in a semiconductor
fab. A run is owned by a particular user, and has a name and assigned ID
number. Runs consist of a number of operations; an operation is a single step
to be performed, such as depositing something on a wafer or etching something
off it.

Operations may have parameters, which are additional information required to
perform an operation. For example, if you're depositing something on a wafer,
you need to know two things: 1) what you're depositing, and 2) how much should
be deposited. You might deposit 100 microns of silicon oxide, or 1 micron of
copper.

Mapping these structures to a relational database is straightforward::

   CREATE TABLE runs (
     run_id INT,
     owner VARCHAR,
     title VARCHAR,
     acct_num INT,
     PRIMARY KEY (run_id)
   );

   CREATE TABLE operations (
     run_id INT,
     step_num INT,
     process_id VARCHAR,
     PRIMARY KEY (run_id, step_num),
     FOREIGN KEY (run_id) REFERENCES runs(run_id)
   );

   CREATE TABLE parameters (
     run_id INT,
     step_num INT,
     param_name VARCHAR,
     param_value VARCHAR,
     PRIMARY KEY (run_id, step_num, param_name),
     FOREIGN KEY (run_id, step_num)
        REFERENCES operations(run_id, step_num)
   );

In Python, you would write three classes named :class:`Run`, :class:`Operation`,
and :class:`Parameter`.
I won't present code for defining these classes, since
that code is uninteresting at this point. Each class would contain a single
method to begin with, an :meth:`__init__` method that assigns default values,
such as 0 or ``None``, to each attribute of the class.

It's not difficult to write Python code that will create a :class:`Run` instance
and populate it with the data from the relational tables; with a little more
effort, you can build a straightforward tool, usually called an
object-relational mapper, to do this automatically. (See
`<http://www.amk.ca/python/unmaintained/ordb.html>`_ for a quick hack at a
Python object-relational mapper, and
`<http://www.python.org/workshops/1997-10/proceedings/shprentz.html>`_ for Joel
Shprentz's more successful implementation of the same idea; unlike mine,
Shprentz's system has been used for actual work.)

However, it is difficult to make an object-relational mapper reasonably quick; a
simple-minded implementation like mine is quite slow because it has to do
several queries to access all of an object's data. Higher-performance
object-relational mappers cache objects to improve performance, only performing
SQL queries when they actually need to.

That helps if you want to access run number 123 all of a sudden. But what if
you want to find all runs where a step has a parameter named 'thickness' with a
value of 2.0? In the relational version, you have two unappealing choices:

#. Write a specialized SQL query for this case: ``SELECT run_id FROM parameters
   WHERE param_name = 'thickness' AND param_value = '2.0'``

   If such queries are common, you can end up with lots of specialized queries.
   When the database tables get rearranged, all these queries will need to be
   modified.

#. An object-relational mapper doesn't help much.
   Scanning through the runs
   means that the mapper will perform the required SQL queries to read run #1,
   and then a simple Python loop can check whether any of its steps have the
   parameter you're looking for. Repeat for runs #2, #3, and so forth. This does
   a vast number of SQL queries, and therefore is incredibly slow.

An object database such as ZODB simply stores internal pointers from object to
object, so reading in a single object is much faster than doing a bunch of SQL
queries and assembling the results. Scanning all runs, therefore, is still
inefficient, but not grossly inefficient.


What is ZEO?
------------

The ZODB comes with a few different classes that implement the :class:`Storage`
interface. Such classes handle the job of writing out Python objects to a
physical storage medium, which can be a disk file (the :class:`FileStorage`
class), a BerkeleyDB file (:class:`BDBFullStorage`), a relational database
(:class:`DCOracleStorage`), or some other medium. ZEO adds
:class:`ClientStorage`, a new :class:`Storage` that doesn't write to physical
media but just forwards all requests across a network to a server. The server,
which is running an instance of the :class:`StorageServer` class, simply acts as
a front-end for some physical :class:`Storage` class. It's a fairly simple
idea, but as we'll see later on in this document, it opens up many
possibilities.


About this guide
----------------

The primary author of this guide works on a project which uses the ZODB and ZEO
as its primary storage technology. We use the ZODB to store process runs and
operations, a catalog of available processes, user information, accounting
information, and other data. Part of the goal of writing this document is to
make our experience more widely available.
A few times we've spent hours or
even days trying to figure out a problem, and this guide is an attempt to gather
up the knowledge we've gained so that others don't have to make the same
mistakes we did while learning.

The author's ZODB project is described in a paper available at
`<http://www.amk.ca/python/writing/mx-architecture/>`_.

This document will always be a work in progress. If you wish to suggest
clarifications or additional topics, please send your comments to the
`ZODB-dev mailing list <https://groups.google.com/forum/#!forum/zodb>`_.


Acknowledgements
----------------

Andrew Kuchling wrote the original version of this guide, which provided some of
the first ZODB documentation for Python programmers. His initial version has
been updated over time by Jeremy Hylton and Tim Peters.

I'd like to thank the people who've pointed out inaccuracies and bugs, offered
suggestions on the text, or proposed new topics that should be covered: Jeff
Bauer, Willem Broekema, Thomas Guettler, Chris McDonough, George Runyan.