/*
  This file is part of MADNESS.

  Copyright (C) 2015 Stony Brook University

  This program is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation; either version 2 of the License, or
  (at your option) any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program; if not, write to the Free Software
  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

  For more information please contact:

  Robert J. Harrison
  Oak Ridge National Laboratory
  One Bethel Valley Road
  P.O. Box 2008, MS-6367

  email: harrisonrj@ornl.gov
  tel:   865-241-3937
  fax:   865-572-0680
*/

/**
 \file serialization.dox
 \brief Overview of the interface templates for archives (serialization).
 \addtogroup serialization

The programmer should not need to include madness/world/archive.h directly. Instead, include the header file for the actual archive (binary file, text/xml file, vector in memory, etc.) that is desired.

\par Background

The interface and implementation are deliberately modelled, albeit loosely, upon the Boost serialization library (thanks boost!). The major differences are that this archive class does \em not break cycles and does \em not automatically store unique copies of data referenced by multiple objects. Also, classes are responsible for managing their own version information. At the lowest level, the interface to an archive also differs to facilitate vectorization and high-bandwidth data transfer. The implementation employs templates that are almost entirely inlined.
This should enable low-overhead use of archives in applications such as interprocess communication.

\par How to use an archive?

An archive is a uni-directional stream of typed data to/from disk, memory, or another process. Whether the stream is for input or for output, you can use the \c & operator to transfer data to/from the stream. If you really want, you can also use \c << or \c >> for output or input, respectively, but there is no reason to do so. The \c & operator chains just like \c << for \c cout or \c >> for \c cin. You may discover other interfaces in \c archive.h, but you should \em not use them; use the \c & operator! The lower-level interfaces will probably not, or only inconsistently, incorporate type information, and may even appear to work when they do not.

Unless type checking has not been implemented by an archive for reasons of efficiency (e.g., message passing), a C-string exception will be thrown on a type mismatch when deserializing. End-of-file, out-of-memory, and other errors also generate string exceptions.

Fundamental types (see below), STL complex, vector, strings, pairs and maps, and tensors (int, long, float, double, float_complex, double_complex) all work without you doing anything, as do fixed-dimension arrays of the same (STL allocators are not presently accommodated). For example,
\code
    bool finished = false;
    int info[3] = {1, 33, 2};
    map<int, double> fred;
    fred[0] = 55.0; fred[1] = 99.0;

    BinaryFstreamOutputArchive ar("restart.dat");
    ar & fred & info & finished;
\endcode
Deserializing is identical, except that you need to use an input archive, cf.
\code
    bool finished;
    int info[3];
    map<int, double> fred;

    BinaryFstreamInputArchive ar("restart.dat");
    ar & fred & info & finished;
\endcode

Variable dimension and dynamically allocated arrays do not have their dimension encoded in their type.
The best way to (de)serialize them is to wrap them in an \c archive_array as follows.
\code
    int a[n]; // n is not known at compile time
    double *p = new double[n];
    ar & wrap(a,n) & wrap(p,n);
\endcode
The \c wrap() function template is a factory function to simplify instantiation of a correctly typed \c archive_array template. Note that when deserializing, you must have first allocated the array. With that done, the above code can be used for both serializing and deserializing. If you want the memory to be automatically allocated, consider using either an STL vector or a madness tensor.

To transfer the actual value of a pointer to a stream (is this really what you want?), store an \c archive_ptr wrapping it. The factory function \c wrap_ptr() assists in doing this, e.g., here for a function pointer
\code
    int foo();
    ar & wrap_ptr(foo);
\endcode

\par User-defined types

User-defined types require a little more effort. Three cases are distinguished.
- symmetric load and store
  - intrusive
  - non-intrusive
- non-symmetric load and store

We will examine each in turn, but we first need to discuss a little about the implementation.

When transferring an object \c obj to/from an archive \c ar with `ar & obj`, you invoke the templated function
\code
    template <class Archive, class T>
    inline const Archive& operator&(const Archive& ar, T& obj);
\endcode
that then invokes other templated functions to redirect to input or output streams as appropriate, manage type checking, etc. We would now like to overload the behavior of these functions in order to accommodate your fancy object. However, function templates cannot be partially specialized. Following the technique recommended <a href="http://www.gotw.ca/publications/mill17.htm">here</a> (look for Moral #2), each of the templated functions directly calls a member of a templated class.
Classes, unlike functions, can be partially specialized, so it is easy to control and predict what is happening. Thus, in order to change the behavior of all archives for an object, you just have to provide a partial specialization of the appropriate class(es). Do \em not overload any of the function templates.

<em>Symmetric intrusive method</em>

Many classes can use the same code for serializing and deserializing. If such a class can be modified, the cleanest way of enabling serialization is to add a templated method as follows.
\code
    class A {
        float a;

    public:
        A(float a = 0.0) : a(a) {}

        template <class Archive>
        inline void serialize(const Archive& ar) {
            ar & a;
        }
    };
\endcode

<em>Symmetric non-intrusive method</em>

If a class with symmetric serialization cannot be modified, then you can define an external class template with the following signature in the \c madness::archive namespace (where \c Obj is the name of your type).
\code
    namespace madness {
        namespace archive {
            template <class Archive>
            struct ArchiveSerializeImpl<Archive,Obj> {
                static inline void serialize(const Archive& ar, Obj& obj);
            };
        }
    }
\endcode

For example,
\code
    class B {
    public:
        bool b;
        B(bool b = false)
            : b(b) {}
    };

    namespace madness {
        namespace archive {
            template <class Archive>
            struct ArchiveSerializeImpl<Archive, B> {
                static inline void serialize(const Archive& ar, B& b) {
                    ar & b.b;
                }
            };
        }
    }
\endcode

<em>Non-symmetric non-intrusive</em>

For classes that do not have symmetric (de)serialization, you must define separate partial specializations of the class templates that provide the functions \c load and \c store, with these signatures and again in the \c madness::archive namespace.
\code
    namespace madness {
        namespace archive {
            template <class Archive>
            struct ArchiveLoadImpl<Archive, Obj> {
                static inline void load(const Archive& ar, Obj& obj);
            };

            template <class Archive>
            struct ArchiveStoreImpl<Archive, Obj> {
                static inline void store(const Archive& ar, const Obj& obj);
            };
        }
    }
\endcode

First, a simple but artificial example.
\code
    class C {
    public:
        long c;
        C(long c = 0)
            : c(c) {}
    };

    namespace madness {
        namespace archive {
            template <class Archive>
            struct ArchiveLoadImpl<Archive, C> {
                static inline void load(const Archive& ar, C& c) {
                    ar & c.c;
                }
            };

            template <class Archive>
            struct ArchiveStoreImpl<Archive, C> {
                static inline void store(const Archive& ar, const C& c) {
                    ar & c.c;
                }
            };
        }
    }
\endcode

Now a more complicated example that genuinely requires asymmetric load and store. First, a class definition for a simple linked list.
\code
    class linked_list {
        int value;
        linked_list *next;

    public:
        linked_list(int value = 0)
            : value(value), next(0) {}

        // Recursively walk to the tail and attach a new node there.
        void append(int value) {
            if (next)
                next->append(value);
            else
                next = new linked_list(value);
        }

        void set_value(int val) {
            value = val;
        }

        int get_value() const {
            return value;
        }

        linked_list* get_next() const {
            return next;
        }
    };
\endcode
And this is how you (de)serialize it.
\code
    namespace madness {
        namespace archive {
            template <class Archive>
            struct ArchiveStoreImpl<Archive, linked_list> {
                static void store(const Archive& ar, const linked_list& c) {
                    // Write the value, then a flag saying whether another node follows.
                    ar & c.get_value() & bool(c.get_next());
                    if (c.get_next())
                        ar & *c.get_next();
                }
            };

            template <class Archive>
            struct ArchiveLoadImpl<Archive, linked_list> {
                static void load(const Archive& ar, linked_list& c) {
                    int value;
                    bool flag;

                    ar & value & flag;
                    c.set_value(value);
                    if (flag) {
                        // Allocate the next node, then recursively load into it.
                        c.append(0);
                        ar & *c.get_next();
                    }
                }
            };
        }
    }
\endcode

Given the above implementation of a linked list, you can (de)serialize an entire list using a single statement.
\code
    linked_list list(0);
    for (int i=1; i<=10; ++i)
        list.append(i);

    BinaryFstreamOutputArchive ar("list.dat");
    ar & list;
\endcode

\par Non-default constructor

There are various options for objects that do not have a default constructor. The most appealing and totally non-intrusive approach is to define load/store functions for a pointer to the object. Then, in the load method, you can deserialize all of the information necessary to invoke the constructor and return a pointer to a new object.

Things that you know are stored contiguously in memory and are painful to serialize with full type safety can be serialized by wrapping them opaquely as byte streams using the \c wrap_opaque() interface. However, this should be regarded as a last resort.

\par Type checking and registering your own types

To enable type checking for user-defined types you must register them with the system. There are 64 empty slots for user types beginning at cookie=128. Type-checked archives (currently all except the MPI archive) store a cookie (byte with value 0-255) with each datum.
Unknown (user-defined) types all end up with the same cookie indicating unknown, i.e., there is no type checking unless you register.

Two steps are required to register your own types (e.g., here for the types \c %Foo and \c Bar)
-# In a header file, after including madness/world/archive.h, associate your types and pointers to them with cookie values.
   \code
   namespace madness {
       namespace archive {
           ARCHIVE_REGISTER_TYPE_AND_PTR(Foo,128);
           ARCHIVE_REGISTER_TYPE_AND_PTR(Bar,129);
       }
   }
   \endcode
-# In a single source file containing your initialization routine, define a macro to force instantiation of relevant templates.
   \code
   #define ARCHIVE_REGISTER_TYPE_INSTANTIATE_HERE
   \endcode
   Then, in the initialization routine, register the names of your types as follows
   \code
   ARCHIVE_REGISTER_TYPE_AND_PTR_NAMES(Foo);
   ARCHIVE_REGISTER_TYPE_AND_PTR_NAMES(Bar);
   \endcode
Have a look at the test in \c madness/world/test_ar.cc to see things in action.

\par Types of archive

Presently provided are
- madness/world/text_fstream_archive.h --- (text \c std::fstream) a file in text (XML).
- madness/world/binary_fstream_archive.h --- (binary \c std::fstream) a file in binary.
- madness/world/vector_archive.h --- binary in memory using a `std::vector<unsigned char>`.
- madness/world/buffer_archive.h --- binary in memory buffer (this is rather heavily specialized for internal use, so applications should use a vector instead).
- madness/world/mpi_archive.h --- binary stream for point-to-point communication using MPI (non-typesafe for efficiency).
- madness/world/parallel_archive.h --- parallel archive to binary file with multiple readers/writers. This is here mostly to support efficient transfer of large \c WorldContainer (madness/world/worlddc.h) and MADNESS \c Function (mra/mra.h) objects, though any serializable object can employ it.
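
As a rough illustration of the in-memory case, here is a minimal, self-contained sketch of a byte-vector archive pair. It is \em not the MADNESS implementation: the names \c ToyVectorOutputArchive, \c ToyVectorInputArchive, and \c toy_round_trip are hypothetical, there is no type checking, and only trivially copyable values are handled. It is meant only to show how chaining with \c & can move data into and out of a `std::vector<unsigned char>`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Toy output archive (hypothetical, not a MADNESS class): appends the raw
// bytes of trivially copyable values to a caller-owned byte vector.
class ToyVectorOutputArchive {
    std::vector<unsigned char>& buf;

public:
    explicit ToyVectorOutputArchive(std::vector<unsigned char>& buf) : buf(buf) {}

    template <class T>
    void store(const T* t, std::size_t n) const {
        static_assert(std::is_trivially_copyable<T>::value, "raw bytes only");
        const unsigned char* p = reinterpret_cast<const unsigned char*>(t);
        buf.insert(buf.end(), p, p + n * sizeof(T));
    }
};

// Toy input archive (hypothetical): reads values back in the order written.
class ToyVectorInputArchive {
    const std::vector<unsigned char>& buf;
    mutable std::size_t pos;

public:
    explicit ToyVectorInputArchive(const std::vector<unsigned char>& buf)
        : buf(buf), pos(0) {}

    template <class T>
    void load(T* t, std::size_t n) const {
        static_assert(std::is_trivially_copyable<T>::value, "raw bytes only");
        const std::size_t nbytes = n * sizeof(T);
        if (nbytes > buf.size() - pos)
            throw std::runtime_error("toy archive: read past end of buffer");
        std::memcpy(t, buf.data() + pos, nbytes);
        pos += nbytes;
    }
};

// Chaining operators: each returns the archive so that calls can be strung together.
template <class T>
const ToyVectorOutputArchive& operator&(const ToyVectorOutputArchive& ar, const T& t) {
    ar.store<T>(&t, 1);
    return ar;
}

template <class T>
const ToyVectorInputArchive& operator&(const ToyVectorInputArchive& ar, T& t) {
    ar.load<T>(&t, 1);
    return ar;
}

// Writes a flag and a fixed-size array, reads them back, and compares.
bool toy_round_trip() {
    std::vector<unsigned char> bytes;
    bool finished = true;
    int info[3] = {1, 33, 2};

    ToyVectorOutputArchive out(bytes);
    out & finished & info;
    assert(bytes.size() == sizeof(bool) + sizeof(info));

    bool finished2 = false;
    int info2[3] = {0, 0, 0};
    ToyVectorInputArchive in(bytes);
    in & finished2 & info2;

    return finished2 && info2[0] == 1 && info2[1] == 33 && info2[2] == 2;
}
```

Calling \c toy_round_trip() writes the values into the vector and reads them back in the same order. Real applications should use the archives declared in madness/world/vector_archive.h, which add the type handling this sketch omits.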

The buffer and \c vector archives are bitwise identical to the binary file archive.

\par Implementing a new archive

Minimally, an archive must derive from either \c BaseInputArchive or \c BaseOutputArchive and define for arrays of fundamental types either a \c load or \c store method, as appropriate. Additional methods can be provided to manipulate the target stream. Here is a simple, but functional, implementation of a binary file archive.
\code
    #include <fstream>
    #include <madness/world/archive.h>
    using namespace std;

    class OutputArchive : public BaseOutputArchive {
        mutable ofstream os;

    public:
        OutputArchive(const char* filename)
            : os(filename, ios_base::binary | ios_base::out | ios_base::trunc)
        {}

        // Write n items of type T as raw bytes.
        template <class T>
        void store(const T* t, long n) const {
            os.write((const char *) t, n*sizeof(T));
        }
    };

    class InputArchive : public BaseInputArchive {
        mutable ifstream is;

    public:
        InputArchive(const char* filename)
            : is(filename, ios_base::binary | ios_base::in)
        {}

        // Read n items of type T as raw bytes into preallocated storage.
        template <class T>
        void load(T* t, long n) const {
            is.read((char *) t, n*sizeof(T));
        }
    };
\endcode
*/