1% ----------------------------------------------------------------
2% AMS-LaTeX Paper ************************************************
3% **** -----------------------------------------------------------
4\documentclass[12pt,a4paper,pdftex,nofootinbib]{article}
5\usepackage[cp1252]{inputenc}
6\usepackage{amssymb,amsmath}
7\usepackage[pdftex]{graphicx}
8\usepackage{epstopdf}
9\usepackage{natbib}
10\usepackage{verbatim}
11\usepackage[pdftex]{color}
12\usepackage{psfrag}
13\usepackage{setspace}
14\usepackage{rotating}
15\usepackage{epsf}
16\usepackage{epsfig}
17
18\newcounter{exmpl}
19\def\etal{{\em et al}.}
20\def\bfp{{\bf p}}
21\def\bfz{{\bf z}}
22\def\bfU{{\bf U}}
23\def\hbfx{{\hat{\bf x}}}
24\def\be{{\begin{equation}}}
25\def\ee{{\end{equation}}}
26\def\bfF{{\bf F}}
27\def\bfP{{\bf P}}
28\def\hbfP{{\hat{\bf P}}}
29\def\bfH{{\bf H}}
30\def\bfG{{\bf G}}
31\def\bfQ{{\bf Q}}
32\def\bfL{{\bf L}}
33\def\bfI{{\bf I}}
34\def\vare{\varepsilon}
35\def\mbfx{{\bf x}}
36\def\mbfH{{\bf H}}
37\def\mbfP{{\bf P}}
38\def\mbfchi{{\mbox{\boldmath$\chi$}}}
39\def\mbfzeta{{\mbox{\boldmath$\zeta$}}}
40\def\mbfeta{{\mbox{\boldmath$\eta$}}}
41\def\ni{{\noindent}}
42\def\mbfx{{\mbox{\boldmath$x$}}}
43
44
45%\bibpunct{(}{)}{;}{a}{,}{,}
46\bibpunct[, ]{(}{)}{;}{a}{,}{,}
47%\pagestyle{headings}
48
49\def \supp{{\rm supp}}
50\def \var{{\rm var}}
51
52% ----------------------------------------------------------------
53\begin{document}
54
55% ----------------------------------------------------------------
56\title{Parallel DYNARE Toolbox\\FP7 Funded \\ Project MONFISPOL Grant no.: 225149}
57
58\author{Marco Ratto\\
59European Commission, Joint Research Centre, Ispra, ITALY
60}
61%%% To have the current date inserted, use \date{\today}:
62%%% To insert a footnote, add thanks in the date/title/author fields:
63\date{\today}
64%\date{\today \thanks{Authors gratefully acknowledge the
65%contribution by ... for ...}}
66\newpage
67\singlespacing
68{\footnotesize
69\maketitle \tableofcontents
70}
71\newpage
72\doublespacing
73%-----------------------------------------------------------------------
74\begin{abstract}
In this document, we describe the basic ideas and the methodology used to realize the parallel package within the DYNARE project (hereafter ``Parallel DYNARE''), together with its algorithmic performance.
The parallel methodology has been developed taking into account two different perspectives: the ``User perspective'' and the ``Developers perspective''. The fundamental requirement of the ``User perspective'' is to allow DYNARE users to use the parallel routines easily, quickly and appropriately. Under the ``Developers perspective'', on the other hand, we need to build a core of parallelizing routines that are sufficiently abstract and modular to allow DYNARE software developers to use them easily as a sort of `parallel paradigm', for application to any DYNARE routine or portion of code containing computationally intensive loops suitable for parallelization.
Finally, we present tests demonstrating the effectiveness of the parallel implementation.
78\end{abstract}
79% ----------------------------------------------------------------
80\newpage
81\section{The ideas implemented in Parallel DYNARE}
The basic idea behind ``Parallel Dynare'' is to build a framework to parallelize portions of code that require minimal (i.e. start/end) or no communication between different processes, denoted in the literature as ``embarrassingly parallel'' \citep{GoffeCreel_Grid_2008,Barney_2009}.  In more complicated cases there are different and more sophisticated solutions to write (or re-write) parallel codes using, for example, OpenMP or MPI.
Within DYNARE, we can find many portions of code with the above features: loops of computational sequences with no interdependency that are coded sequentially. Clearly, this does not make optimal use of computers having 2, 4, 8 or more cores or CPUs.
The basic idea is to assign the different and independent computational sequences to different cores, CPUs or computers and to coordinate this new distributed computational environment according to the following criteria:
85
86\begin{itemize}
87\item provide the necessary input data to any sequence, possibly including results obtained from previous DYNARE sessions (e.g. a first batch of Metropolis iterations);
88\item distribute the workload, automatically balancing between the computational resources;
89\item collect the output data;
90\item ensure the coherence of the results with the original sequential execution.
91\end{itemize}
92
Generally, during a program execution, the largest share of computational time is spent executing nested cycles. For simplicity and without loss of generality we can consider here only \verb"for" cycles (it is possible to demonstrate that any \verb"while" cycle admits an equivalent \verb"for" cycle).
Then, after identifying the most computationally expensive \verb"for" cycles, we can split their execution (i.e. the number of iterations) between different cores, CPUs or computers. For example, consider the following simple MATLAB piece of code:
95
96\singlespacing
97
98%\begin{table}[!ht]
99%{\small
100{\footnotesize
101\hspace{3cm}
102\begin{tabular}[b]{| p{6cm} |}
103  \hline
104  % after \\: \hline or \cline{col1-col2} \cline{col3-col4} ...
105\begin{verbatim}
106...
107n=2;
108m=10^6;
109Matrix= zeros(n,m);
110for i=1:n,
111    Matrix(i,:)=rand(1,m);
112end,
113Mse= Matrix;
114...
115\end{verbatim}
116\\
117Example \refstepcounter{exmpl} \label{ex:serial} \theexmpl
118\\  \hline
119\end{tabular}
120%\vspace*{\baselineskip}
121}
122%\end{table}
123
143\doublespacing
144
With one CPU this cycle is executed in sequence: first for \verb"i=1", and then for \verb"i=2". Nevertheless, these two iterations are completely independent, so, at least in principle, if we have two CPUs (cores) we can rewrite the above code as:
146
147\singlespacing
148{\footnotesize
149\hspace{1cm}\begin{tabular}[b]{| p{10cm} |}
150  \hline
151\begin{verbatim}
152            ...
153            n=2;
154            m=10^6;
155            <provide to CPU1 and CPU2 input data m>
156
157<Execute on CPU1>            <Execute on CPU2>
158Matrix1 = zeros(1,m);         Matrix2 = zeros(1,m);
159Matrix1(1,:)=rand(1,m);       Matrix2(1,:)=rand(1,m);
160save Matrix1                  save Matrix2
161
162            retrieve Matrix1 and Matrix2
163            Mpe(1,:) = Matrix1;
164            Mpe(2,:) = Matrix2;
165\end{verbatim}
166\\
167Example \refstepcounter{exmpl} \label{ex:parallel} \theexmpl\\
168\hline
169\end{tabular}
170}
171\doublespacing
172
173The \verb"for" cycle has disappeared and it has been split into two separated sequences that can be executed in parallel on two CPUs. We have the same result (\verb"Mpa=Mse") but the computational time can be reduced up to 50\%.
174
175\section{The DYNARE environment}
176We have considered the following DYNARE components suitable to be parallelized using the above strategy:
177
178\begin{enumerate}
\item the Random Walk Metropolis-Hastings algorithm (and the analogous Independent Metropolis-Hastings) with multiple chains: the different chains are completely independent and do not require any communication between them, so they can easily be executed on different cores, CPUs or networked computers;
180\item a number of procedures performed after the completion of Metropolis, that use the posterior MC sample:
181\begin{enumerate}
182\item the diagnostic tests for the convergence of the Markov Chain \\(\texttt{McMCDiagnostics.m});
\item the function that computes posterior IRF's (\texttt{posteriorIRF.m});
\item the function that computes posterior statistics for filtered and smoothed variables, forecasts, smoothed shocks, etc. \\ (\verb"prior_posterior_statistics.m");
\item the utility function that loads matrices of results and produces plots for posterior statistics (\texttt{pm3.m}).
186\end{enumerate}
187\end{enumerate}
188
Unfortunately, MATLAB does not provide commands to write parallel code as simply as in Example \ref{ex:parallel} (i.e. the pseudo-commands \texttt{<provide inputs>}, \texttt{<execute on CPU>} and \texttt{<retrieve>}). In other words, MATLAB does not allow concurrent programming: it does not support multi-threading without the use (and purchase) of the MATLAB Distributed Computing Toolbox. Therefore, to obtain the behavior described in Example \ref{ex:parallel}, we had to find an alternative solution.
190
191The solution that we have found can be synthesized as follows:
192
193\begin{quote}
194\emph{When the execution of the code should start in parallel (as in Example \ref{ex:parallel}), instead of running it inside the active MATLAB session, the following steps are performed:
195\begin{enumerate}
196\item the control of the execution is passed to the operating system (Windows/Linux) that allows for multi-threading;
197\item concurrent threads (i.e. MATLAB instances) are launched on different processors/cores/machines;
\item when the parallel computations are concluded the control is given back to the original MATLAB session, which collects the results from all the parallel `agents' involved and coherently continues the sequential computation.
199\end{enumerate}
200}\end{quote}
201
Three core functions have been developed implementing this behavior, namely \verb"masterParallel.m", \verb"slaveParallel.m" and \verb"fParallel.m". The first function (\verb"masterParallel.m") operates at the level of the `master' (original) thread and acts as a wrapper of the portion of code to be distributed in parallel: it distributes the tasks and collects the results from the parallel computation. The other functions (\verb"slaveParallel.m" and \verb"fParallel.m") operate at the level of each individual `slave' thread and collect the jobs distributed by the `master', execute them and make the final results available to the master.
The two different implementations of the slave operation come from the fact that, in a single DYNARE session, a number of parallelized tasks may be launched by the master thread. Therefore, those two routines reflect two different versions of the parallel package:
204\begin{enumerate}
205\item the `slave' MATLAB sessions are closed after completion of each single job, and new instances are called for any subsequent parallelized task (\verb"fParallel.m");
206\item once opened, the `slave' MATLAB sessions are kept open during the DYNARE session, waiting for the jobs to be executed, and are only closed upon completion of the DYNARE session on the `master' (\verb"slaveParallel.m").
207\end{enumerate}
208
We will see that neither of the two options is always superior to the other: which one performs best depends on the model size.
210
211
212\section{Installation and utilization}
Here we describe how to run parallel sessions in DYNARE and, for the developers' community, how to apply the package to parallelize any suitable piece of code.
214
215\subsection{Requirements}
216
217\subsubsection{For a Windows grid}
218\begin{enumerate}
219\item a standard Windows network (SMB) must be in place;
220\item PsTools \citep{PsTools} must be installed in the path of the master Windows machine;
\item the Windows user on the master machine has to be a user of any other slave machine in the cluster, and that user will be used for the remote computations.
222\end{enumerate}
223
224\subsubsection{For a UNIX grid}
225\begin{enumerate}
226\item SSH must be installed on the master and on the slave machines;
\item the UNIX user on the master machine has to be a user of any other slave machine in the cluster, and that user will be used for the remote computations;
228\item SSH keys must be installed so that the SSH connection from the master to the slaves can be done without passwords, or using an SSH agent.
229\end{enumerate}
230
231\subsection{The user perspective}
232We assume here that the reader has some familiarity with DYNARE and its use. For the DYNARE users, the parallel routines are fully integrated and hidden inside the DYNARE environment.
233
234\subsubsection{The interface}
235The general idea is to put all the configuration of the cluster in a config file different from the MOD file, and to trigger the parallel computation with option(s) on the \verb"dynare" command line.
The configuration file is designed to:
237\begin{itemize}
238  \item be in a standard location
239   \begin{itemize}
240   \item {\footnotesize\verb"$HOME/.dynare"} under Unix;
241   \item {\footnotesize\verb"c:\Documents and Setting\<username>\Application Data\dynare.ini"} on Windows;
242   \end{itemize}
  \item have provisions for other Dynare configuration parameters unrelated to parallel computation;
  \item allow the user to specify several clusters, each one associated with a nickname;
  \item for each cluster, specify a list of slaves with a list of options for each slave (if not explicitly specified in the configuration file, the preprocessor sets the options to their default values);
246\end{itemize}
247
248The list of slave options includes:
249\begin{description}
250\item[Name]: name of the node;
\item[CPUnbr]:  this is the number of CPU's to be used on that computer; if \verb"CPUnbr" is a vector of integers, the syntax is \verb"[s:d]", with \verb"d>=s" (\verb"d, s" are integers); the first core has number 1 so that, on a quad-core, use \verb"4" to use all cores, but use \verb"[3:4]" to specify just the last two cores (this is particularly relevant for Windows where it is possible to assign jobs to specific processors);
\item[ComputerName]: computer name on the network or IP address; use the NETBIOS name under Windows\footnote{In Windows XP it is possible to find this name in 'My Computer' $->$ mouse right click $->$ 'Properties' $->$ 'Computer Name'.}, or the DNS name under Unix;
\item[UserName]: required for remote login; in order to assure proper communication between the master and the slave threads, it must be the same user name actually logged in on the `master' machine. On a Windows network, this takes the form \verb"DOMAIN\username", like \verb"DEPT\JohnSmith", i.e. user \verb"JohnSmith" in the Windows group \verb"DEPT";
254\item[Password]: required for remote login (only under Windows): it is the user password on \verb"DOMAIN" and \verb"ComputerName";
255\item[RemoteDrive]: Drive to be used on remote computer (only for Windows, for example the drive \verb"C" or drive \verb"D");
\item[RemoteDirectory]: directory to be used on the remote computer; the parallel toolbox will create a new empty temporary subfolder which will act as the remote working directory;
257\item[DynarePath]: path to matlab directory within the Dynare installation directory;
258\item[MatlabOctavePath]: path to MATLAB or Octave executable;
259\item[SingleCompThread]: disable MATLAB's native multithreading;
260\end{description}
261
262Those options have the following specifications:
263
264\singlespacing  \noindent
265{\footnotesize
266      \begin{tabular}{|l|l|l|l|l|l|l|}
267        \hline
268        % after \\: \hline or \cline{col1-col2} \cline{col3-col4} ...
269 Node Options & type & default & \multicolumn{2}{c|}{Win} & \multicolumn{2}{c|}{Unix} \\
270 & &  & Local & Remote & Local & Remote \\ \hline
271 Name & string & (stop) & *&*&*&*\\
272 CPUnbr & integer & (stop) &*&*&*&*\\
273  & or array & & & & & \\
274 ComputerName & string & (stop) & &*& &*\\
275 UserName & string & empty & &*& &*\\
276 Password & string & empty & &*& & \\
277 RemoteDrive & string & empty & &*& & \\
278 RemoteDirectory & string & empty & &*& &*\\
279 DynarePath & string & empty & & & & \\
280 MatlabOctavePath & string & empty & & & & \\
281 SingleCompThread & boolean & true & & & & \\
282        \hline
283      \end{tabular}
284}
285\doublespacing
286
287\vspace{1cm}
The cluster options are as follows:
289
290\singlespacing \noindent
291{\footnotesize
292      \begin{tabular}{|l|l|l|l|l|}
293        \hline
294 Cluster Options & type & default & Meaning & Required \\ \hline
 Name & string & empty & name of the cluster &*\\
296 Members & string & empty & list of members in this cluster &*\\
297        \hline
298      \end{tabular}
299}
300\doublespacing
301
302\vspace{1cm}
303The syntax of the configuration file will take the following form (the order in which the clusters and nodes are listed is not significant):
304
305\singlespacing
306{\footnotesize
307\hspace{2cm}\begin{tabular}[b]{| p{8cm} |}
308  \hline
309\begin{verbatim}
310[cluster]
311Name = c1
312Members = n1 n2 n3
313
314[cluster]
315Name = c2
316Members = n2 n3
317
318[node]
319Name = n1
320ComputerName = localhost
321CPUnbr = 1
322
323[node]
324Name = n2
325ComputerName = karaba.cepremap.org
326CPUnbr = 5
327UserName = houtanb
328RemoteDirectory = /home/houtanb/Remote
329DynarePath = /home/houtanb/dynare/matlab
330MatlabOctavePath = matlab
331
332[node]
333Name = n3
334ComputerName = hal.cepremap.ens.fr
335CPUnbr = 3
336UserName = houtanb
337RemoteDirectory = /home/houtanb/Remote
338DynarePath = /home/houtanb/dynare/matlab
339MatlabOctavePath = matlab
340 \end{verbatim}
341\\ \hline
342\end{tabular}
343}
344\doublespacing
345
Finally, the DYNARE command line options related to the parallel package are listed below, followed by some example invocations:
347 \begin{itemize}
348  \item \verb"conffile=<path>": specify the location of the configuration file if it is not standard
349  \item \verb"parallel": trigger the parallel computation using the first cluster specified in config file
350  \item \verb"parallel=<clustername>": trigger the parallel computation, using the given cluster
351  \item \verb"parallel_slave_open_mode": use the leaveSlaveOpen mode in the cluster
352  \item \verb"parallel_test": just test the cluster, don�t actually run the MOD file
353
354 \end{itemize}
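
For instance, with the configuration file sketched above and a MOD file named \verb"ls2003.mod" (the file and configuration names are purely illustrative), typical invocations from the MATLAB prompt would look like:

\singlespacing
{\footnotesize
\noindent\begin{tabular}[b]{| p{\linewidth} |}
\hline
\begin{verbatim}
% test the cluster only, without running the MOD file
dynare ls2003 parallel_test
% run using the first cluster listed in the configuration file (c1)
dynare ls2003 parallel
% run on cluster c2, keeping the slave MATLAB sessions always open
dynare ls2003 parallel=c2 parallel_slave_open_mode
% use a non-standard location for the configuration file
dynare ls2003 conffile=myconf.txt parallel
\end{verbatim}
\\ \hline
\end{tabular}
}
\doublespacing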
355
356
357
358\subsubsection{Preprocessing cluster settings}
359The DYNARE pre-processor treats user-defined configurations by filling a new sub-structure in the \verb"options_" structure, named \verb"parallel", with the following fields:
360
361\singlespacing
362{\footnotesize
363\hspace{3cm}\begin{tabular}[b]{| p{7cm} |}
364  \hline
365\begin{verbatim}
366options_.parallel=
367    struct('Local', Value,
368    'ComputerName', Value,
369    'CPUnbr', Value,
370    'UserName', Value,
371    'Password', Value,
372    'RemoteDrive', Value,
373    'RemoteFolder', Value,
374    'MatlabOctavePath', Value,
375    'DynarePath', Value);
376\end{verbatim}
377\\ \hline
378\end{tabular}
379}
380\doublespacing
381
382All these fields correspond to the slave options except \verb"Local", which is set by the pre-processor according to the value of \verb"ComputerName":
383\begin{description}
\item[Local:] the variable \verb"Local" is binary, so it can only take the values 0 and 1. If \verb"ComputerName" is set to \verb"localhost", the preprocessor sets \verb"Local = 1" and the parallel computation is executed on the local machine, i.e. on the same computer (and working directory) where the DYNARE project is placed. For any other value of \verb"ComputerName", we will have \verb"Local = 0" (see the sketch below);
385\end{description}
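
A minimal sketch of this rule (illustrative only, not the actual pre-processor code) is:

\singlespacing
{\footnotesize
\noindent\begin{tabular}[b]{| p{\linewidth} |}
\hline
\begin{verbatim}
% illustrative only: how Local follows from ComputerName
node.ComputerName = 'localhost';   % as read from the config file
if strcmpi(node.ComputerName, 'localhost')
    node.Local = 1;  % run locally, in the current working directory
else
    node.Local = 0;  % remote slave: remote options are needed
end
\end{verbatim}
\\ \hline
\end{tabular}
}
\doublespacing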
386
In addition to the \verb"parallel" structure, which can be in a vector form to allow specific entries for each slave machine in the cluster, there is another \verb"options_" field, called \verb"parallel_info", which stores all the options that are common to the whole cluster. In particular, according to whether \verb"parallel_slave_open_mode" is given on the command line, the \verb"leaveSlaveOpen" field takes the following values:
388\begin{description}
\item[\texttt{leaveSlaveOpen=1}]: with \verb"parallel_slave_open_mode", i.e. the slaves operate `Always-Open';
\item[\texttt{leaveSlaveOpen=0}]: without \verb"parallel_slave_open_mode", i.e. the slaves operate `Open-Close'.
391\end{description}
392
393
394\subsubsection{Example syntax for Windows and Unix, for local parallel runs (assuming quad-core)}
395In this case, the only slave options are \verb"ComputerName" and \verb"CPUnbr".
396
397\singlespacing
398{\footnotesize
399\hspace{2cm}\begin{tabular}[b]{| p{8cm} |}
400  \hline
401\begin{verbatim}
402[cluster]
403Name = local
404Members = n1
405
406[node]
407Name = n1
408ComputerName = localhost
409CPUnbr = 4
410\end{verbatim}
411\\ \hline
412\end{tabular}
413}
414\doublespacing
415
416\subsubsection{Examples of Windows syntax for remote runs}
417\begin{itemize}
418\item the Windows \verb"Password" has to be typed explicitly;
419\item \verb"RemoteDrive" has to be typed explicitly;
420\item for \verb"UserName", ALSO the group has to be specified, like \verb"DEPT\JohnSmith", i.e. user \verb"JohnSmith" in windows group \verb"DEPT";
421\item \verb"ComputerName" is the name of the computer in the windows network, i.e. the output of hostname, or the full IP address.
422\end{itemize}
423
424\begin{description}
\item[Example 1] Parallel codes are run on a remote computer named \verb"vonNeumann" with eight cores, using only cores 4, 5 and 6, working on drive \verb"C" and folder '\verb"dynare_calcs\Remote"'. The computer \verb"vonNeumann" is in the network domain of the CompuTown university, with user \verb"John" logged in with the password \verb"*****":
426
427\singlespacing
428{\footnotesize
429\hspace{2cm}\begin{tabular}[b]{| p{8cm} |}
430  \hline
431\begin{verbatim}
432[cluster]
433Name = vonNeumann
434Members = n2
435
436[node]
437Name = n2
438ComputerName = vonNeumann
439CPUnbr = [4:6]
440UserName = COMPUTOWN\John
441Password = *****
442RemoteDrive = C
443RemoteDirectory = dynare_calcs\Remote
444DynarePath = c:\dynare\matlab
445MatlabOctavePath = matlab
446\end{verbatim}
447\\ \hline
448\end{tabular}
449}
450\doublespacing
451
\item[Example 2] We can build a cluster combining local and remote runs. For example, the following configuration file includes the two previous configurations but also gives the possibility (with cluster name \verb"c2") to build a grid with a total number of 7 CPU's:
453
454\singlespacing
455{\footnotesize
456\hspace{2cm}\begin{tabular}[b]{| p{8cm} |}
457  \hline
458\begin{verbatim}
459[cluster]
460Name = local
461Members = n1
462
463[cluster]
464Name = vonNeumann
465Members = n2
466
467[cluster]
468Name = c2
469Members = n1 n2
470
471[node]
472Name = n1
473ComputerName = localhost
474CPUnbr = 4
475
476[node]
477Name = n2
478ComputerName = vonNeumann
479CPUnbr = [4:6]
480UserName = COMPUTOWN\John
481Password = *****
482RemoteDrive = C
483RemoteDirectory = dynare_calcs\Remote
484DynarePath = c:\dynare\matlab
485MatlabOctavePath = matlab
486\end{verbatim}
487\\ \hline
488\end{tabular}
489}
490\doublespacing
\item[Example 3] We can build a cluster combining many remote machines. For example, the following configuration builds a grid of four machines with a total number of 14 CPU's:
492
493\singlespacing
494{\footnotesize
495\hspace{2cm}\begin{tabular}[b]{| p{8cm} |}
496  \hline
497\begin{verbatim}
498[cluster]
499Name = c4
500Members = n1 n2 n3 n4
501
502[node]
503Name = n1
504ComputerName = vonNeumann1
505CPUnbr = 4
506UserName = COMPUTOWN\John
507Password = *****
508RemoteDrive = C
509RemoteDirectory = dynare_calcs\Remote
510DynarePath = c:\dynare\matlab
511MatlabOctavePath = matlab
512
513[node]
514Name = n2
515ComputerName = vonNeumann2
516CPUnbr = 4
517UserName = COMPUTOWN\John
518Password = *****
519RemoteDrive = C
520RemoteDirectory = dynare_calcs\Remote
521DynarePath = c:\dynare\matlab
522MatlabOctavePath = matlab
523
524[node]
525Name = n3
526ComputerName = vonNeumann3
527CPUnbr = 2
528UserName = COMPUTOWN\John
529Password = *****
530RemoteDrive = D
531RemoteDirectory = dynare_calcs\Remote
532DynarePath = c:\dynare\matlab
533MatlabOctavePath = matlab
534
535[node]
536Name = n4
537ComputerName = vonNeumann4
538CPUnbr = 4
539UserName = COMPUTOWN\John
540Password = *****
541RemoteDrive = C
542RemoteDirectory = John\dynare_calcs\Remote
543DynarePath = c:\dynare\matlab
544MatlabOctavePath = matlab
545
546\end{verbatim}
547\\ \hline
548\end{tabular}
549}
550\doublespacing
551\end{description}
552
553
554
555\subsubsection{Example Unix syntax for remote runs}
556\begin{itemize}
557\item no \verb"Password" and \verb"RemoteDrive" fields are needed;
558\item \verb"ComputerName" is the full IP address or the DNS address.
559\end{itemize}
560
561\begin{description}
\item[One remote slave:] the following configuration defines remote runs on the machine \verb"name.domain.org".\\
563\singlespacing
564{\footnotesize
565\hspace{2cm}\begin{tabular}[b]{| p{8cm} |}
566  \hline
567\begin{verbatim}
568[cluster]
569Name = unix1
570Members = n2
571
572[node]
573Name = n2
574ComputerName = name.domain.org
575CPUnbr = 4
576UserName = JohnSmith
577RemoteDirectory = /home/john/Remote
578DynarePath = /home/john/dynare/matlab
579MatlabOctavePath = matlab
580\end{verbatim}
581\\ \hline
582\end{tabular}
583}
584\doublespacing
\item[Combining local and remote runs:] the following configuration defines a cluster of local and remote CPU's.
586\singlespacing
587{\footnotesize
588\hspace{2cm}\begin{tabular}[b]{| p{8cm} |}
589  \hline
590\begin{verbatim}
591[cluster]
592Name = unix2
593Members = n1 n2
594
595[node]
596Name = n1
597ComputerName = localhost
598CPUnbr = 4
599
600[node]
601Name = n2
602ComputerName = name.domain.org
603CPUnbr = 4
604UserName = JohnSmith
605RemoteDirectory = /home/john/Remote
606DynarePath = /home/john/dynare/matlab
607MatlabOctavePath = matlab
608\end{verbatim}
609\\ \hline
610\end{tabular}
611}
612\doublespacing
613\end{description}
614
615\subsubsection{Testing the cluster}
616
In this section we describe what happens when the user omits a mandatory entry or provides bad values for it, and how DYNARE reacts in these cases. In the parallel package there is a utility (\verb"AnalyseComputationalEnvironment.m") devoted to this task (it is triggered by the command line option \verb"parallel_test"). When necessary during the discussion, we use the \verb"parallel" entries used in the previous examples.
618
620
621\begin{description}
\item[ComputerName:] If \verb"Local=0", DYNARE checks if the computer \verb"vonNeumann" exists and if it is possible to communicate with it. If this is not the case, an error message is generated and the computation is stopped.
\item[CPUnbr:] a value for this variable must be in the form \verb"[s:d]" with \verb"d>=s". If the user types values with \verb"s>d", their order is flipped and a warning message is sent. When the user provides a correct value for this field, DYNARE checks if \verb"d" CPUs (or cores) are available on the computer. Suppose that this check returns an integer \verb"nC". We can have three possibilities (see also the sketch after this list):
625    \begin{enumerate}
626    \item \verb"nC= d;" all the CPU's available are used, no warning message are generated by DYNARE;
627    \item \verb"nC> d;" some CPU's will not be used;
628    \item \verb"nC< d;" DYNARE alerts the user that there are less CPU's than those declared. The parallel tasks would run in any case, but some CPU's will have multiple instances assigned, with no gain in computational time.
629    \end{enumerate}
\item[UserName \& Password:] if \verb"Local = 1", no information about user name and password is necessary: ``I am working on this computer''. When remote computations on a Windows network are required, DYNARE checks if the user name and password are correct, otherwise execution is stopped with an error; for a Unix network, the user name and the proper operation of SSH are checked;
\item[RemoteDrive \& RemoteDirectory:] if \verb"Local = 1", these fields are not required since the working directory of the `slaves' will be the same as that of the `master'. If \verb"Local = 0", DYNARE tries to copy a file (\verb"Tracing.txt") to this remote location. If this operation fails, the DYNARE execution is stopped with an error;
632\item[MatlabOctavePath \& DynarePath:] MATLAB instances are tried on slaves and the DYNARE path is checked.
633\end{description}
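
The following minimal sketch (with hypothetical variable names, not the actual DYNARE code) illustrates the \verb"CPUnbr" check described above:

\singlespacing
{\footnotesize
\noindent\begin{tabular}[b]{| p{\linewidth} |}
\hline
\begin{verbatim}
% hypothetical sketch of the CPUnbr consistency check
CPUnbr = 3:4;      % cores requested in the configuration file, [s:d]
nC     = 2;        % cores actually detected on the node
d = max(CPUnbr);
if nC < d
    warning('%d cores declared but only %d found: %s', ...
        d, nC, 'some cores will get multiple jobs')
elseif nC > d
    fprintf('only %d of %d available cores will be used\n', d, nC)
end
\end{verbatim}
\\ \hline
\end{tabular}
}
\doublespacing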
634
635
636\subsection{The Developers perspective}
638
In this section we describe the DYNARE parallel routines in some detail.
640\begin{description}
641\item[Windows:]
Under the Windows operating system, the parallel package requires the installation of a free software suite called PsTools \citep{PsTools}. PsTools is a resource kit with a number of command line tools that mimic administrative features available under the Unix environment. PsTools can be downloaded from \cite{PsTools} and extracted into a Windows directory on your computer: to make PsTools work properly, it is mandatory to add this directory to the Windows path. After this step it is possible to invoke and use the PsTools commands from any location in the Windows file system. PsTools, MATLAB and DYNARE have to be installed and work properly on all the machines in the grid for parallel computation.
643\item[Unix:]
Under the Unix operating system, SSH must be installed on the master and on the slave machines. Moreover, SSH keys must be installed so that the SSH connections from the master to the slaves can be done without passwords.
645\end{description}
646
647% NO DEFAULT REMOTE FOLDER !!!!!!
648%\item the creation of a directory devoted to the local/remote computation and data exchanges. %We usually create this directory on local drive 'C' within a directory '\verb"dynare_calcs"' and call it 'Remote', i.e. '\verb"C:\dynare_calcs\Remote"'. In this way the default value for RemoteDrive \& RemoteDirectory will be 'C' \& '\verb"C:\dynare_calcs\Remote"'.
649
As soon as the computational environment is set up for working on a grid of CPU's, the parallel package makes it possible to parallelize any computationally expensive loop, following the step-by-step procedure shown in Table \ref{tab:devpar}. This is done using five basic functions: \verb"masterParallel.m", \verb"fParallel.m" or \verb"slaveParallel.m", \verb"fMessageStatus.m", \verb"closeSlave.m".
651
652\begin{description}
653 \item[\texttt{masterParallel}] is the entry point to the parallelization system:
654 \begin{itemize}
655  \item It is called from the master computer, at the point where the parallelization system should be activated. Its main arguments are the name of the function containing the task to be run on every slave computer, inputs to that function stored in two structures (one for local and the other for global variables), and the configuration of the cluster; this function exits when the task has finished on all computers of the cluster, and returns the output in a structure vector (one entry per slave);
  \item all file exchange through the filesystem is concentrated in this \verb"masterParallel" routine: it prepares and sends the input information for the slaves, it retrieves from the slaves the information about the status of the remote computations stored there by the remote processes, and finally it retrieves the outputs stored on the remote machines by the slave processes;
657  \item there are two modes of parallel execution, triggered by option \verb"parallel_slave_open_mode":
658   \begin{itemize}
659   \item when \verb"parallel_slave_open_mode=0", the slave processes are closed after the completion of each task, and new instances are initiated when a new job is required; this mode is managed by \verb"fParallel.m" [`Open-Close'];
660   \item when \verb"parallel_slave_open_mode=1", the slave processes are kept running after the completion of each task, and wait for new jobs to be performed; this mode is managed by \texttt{slaveParallel.m} [`Always-Open'];
661   \end{itemize}
662  \end{itemize}
 \item[\texttt{slaveParallel.m/fParallel.m}:] are the top-level functions to be run on every slave; their main arguments are the name of the function to be run (containing the computing task) and some information identifying the slave; the functions use the input information previously prepared and sent by \verb"masterParallel" through the filesystem, call the computing task and, finally, store the outputs locally on the remote machines, so that \verb"masterParallel" can retrieve them back to the master computer;
 \item[\texttt{fMessageStatus.m}:] provides the core for simple message passing during slave execution: using this routine, slave processes can store, locally on the remote machine, basic information on the progress of the computations; such information is retrieved by the master process (i.e. \verb"masterParallel.m"), allowing it to echo the progress of remote computations on the master; the routine \verb"fMessageStatus.m" is also the entry point where an interruption signal sent by the master can be checked and executed; this routine typically replaces calls to \verb"waitbar.m";
665 \item[\texttt{closeSlave.m}] is the utility that sends a signal to remote slaves to close themselves. In the standard operation, this is only needed with the `Always-Open' mode and it is called when DYNARE computations are completed. At that point, \texttt{slaveParallel.m} will get a signal to terminate and no longer wait for new jobs. However, this utility is also useful in any parallel mode if, for any reason, the master needs to interrupt the remote computations which are running;
666 \end{description}
667
668The parallel toolbox also includes a number of utilities:
669\begin{itemize}
670 \item \verb"AnalyseComputationalEnviroment.m": this a testing utility that checks that the cluster works properly and echoes error messages when problems are detected;
671 \item \verb"InitializeComputationalEnviroment.m" : initializes some internal variables and remote directories;
672 \item \verb"distributeJobs.m": uses a simple algorithm to distribute evenly jobs across the available CPU's;
673 \item a number of generalized routines that properly perform \verb"delete", \verb"copy", \verb"mkdir", \verb"rmdir" commands through the network file-system (i.e. used from the master to operate on slave machines); the routines are adaptive to the actual environment (Windows or Unix);
674 \begin{description}
675  \item[\texttt{dynareParallelDelete.m}]: generalized \verb"delete";
676  \item[\texttt{dynareParallelDir.m}]: generalized \verb"dir";
677  \item[\texttt{dynareParallelGetFiles.m}]: generalized \verb"copy" FROM slaves TO master machine;
678  \item[\texttt{dynareParallelMkDir.m}]: generalized \verb"mkdir" on remote machines;
  \item[\texttt{dynareParallelRmDir.m}]: generalized \verb"rmdir" on remote machines;
680  \item[\texttt{dynareParallelSendFiles.m}]: generalized \verb"copy" TO slaves FROM master machine;
681 \end{description}
682\end{itemize}
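
As an illustration of the even distribution performed by \verb"distributeJobs.m", the following minimal sketch (not the actual implementation) splits \verb"nBlock" iterations across \verb"totCPU" processors:

\singlespacing
{\footnotesize
\noindent\begin{tabular}[b]{| p{\linewidth} |}
\hline
\begin{verbatim}
% minimal sketch: split nBlock iterations across totCPU processors,
% assigning the remainder to the first processors
nBlock = 10; totCPU = 4;          % hypothetical sizes
nBlockPerCPU = floor(nBlock/totCPU)*ones(1,totCPU);
extra = mod(nBlock,totCPU);
nBlockPerCPU(1:extra) = nBlockPerCPU(1:extra) + 1;
% here nBlockPerCPU = [3 3 2 2] and sum(nBlockPerCPU) equals nBlock
\end{verbatim}
\\ \hline
\end{tabular}
}
\doublespacing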
683
684In Table \ref{tab:devpar} we have synthesized the main steps for parallelizing MATLAB codes.
685
686{\small
687\begin{table}
688\begin{tabular}{|p{\linewidth}|}
689\hline
690\begin{enumerate}
\item locate within DYNARE the portion of code suitable to be parallelized, i.e. an expensive \verb"for" cycle;
\item suppose that the function \verb"tuna.m" contains a \verb"for" cycle that is suitable for parallelization: this cycle has to be extracted from \verb"tuna.m" and put in a new MATLAB function named \verb"tuna_core.m";
\item at the point where the expensive cycle should start, the function \verb"tuna.m" invokes the utility \verb"masterParallel.m", passing to it the \verb"options_.parallel" structure, the name of the function to be run in parallel (\verb"tuna_core.m"), the local and global variables needed and all the information about the files (MATLAB functions \verb"*.m"; data files \verb"*.mat") that will be handled by \verb"tuna_core.m";
694\item the function \verb"masterParallel.m" reads the input arguments provided by \verb"tuna.m" and:
695\begin{itemize}
696\item decides how to distribute the task evenly across the available CPU's (using the utility routine \verb"distributeJobs.m"); prepares and initializes the computational environment (i.e. copy files/data) for each slave machine;
\item uses PsTools and operating system commands to launch new MATLAB instances, synchronize the computations, monitor the progress of slave tasks through a simple message passing system (see later) and collect results upon completion of the slave threads;
698\end{itemize}
699\item the slave threads are executed using the MATLAB functions \verb"fParallel.m"/\verb"slaveParallel.m" as wrappers for implementing the tasks sent by the master (i.e. to run the \verb"tuna_core.m" routine);
700\item the utility \verb"fMessageStatus.m" can be used within the core routine \verb"tuna_core.m" to send information to the master regarding the progress of the slave thread;
701\item when all DYNARE computations are completed, \verb"closeSlave.m" closes all open remote MATLAB/OCTAVE instances waiting for new jobs to be run.
702\end{enumerate}
703\\ \hline
704\end{tabular}
705\caption{Procedure for parallelizing portions of codes.}\label{tab:devpar}
706\end{table}
707}
708
709So far, we have parallelized the following functions, by selecting the most computationally intensive loops:
710\begin{enumerate}
\item the cycle looping over the multiple chains of the random walk Metropolis:\\
\verb"random_walk_metropolis_hastings.m", \\
\verb"random_walk_metropolis_hastings_core.m";
\item the cycle looping over the multiple chains of the independent Metropolis:\\
715\verb"independent_metropolis_hastings.m", \\
716\verb"independent_metropolis_hastings_core.m";
717\item the cycle looping over estimated parameters computing univariate diagnostics:\\
718\verb"McMCDiagnostics.m", \\
719\verb"McMCDiagnostics_core.m";
720\item the Monte Carlo cycle looping over posterior parameter subdraws performing the IRF simulations (\verb"<*>_core1") and the cycle looping over exogenous shocks plotting IRF's charts (\verb"<*>_core2"):\\
721\verb"posteriorIRF.m", \\\verb"posteriorIRF_core1.m", \verb"posteriorIRF_core2.m";
722\item the Monte Carlo cycle looping over posterior parameter subdraws, that computes filtered, smoothed, forecasted variables and shocks:\\
723\verb"prior_posterior_statistics.m", \\
724\verb"prior_posterior_statistics_core.m";
725\item the cycle looping over endogenous variables making posterior plots of filter, smoother, forecasts:
726\verb"pm3.m", \verb"pm3_core.m".
727\end{enumerate}
728%A developer can use the existent functions \verb"masterParalle.m" and \verb"fParalell.m"  a couple of functions already parallelized as example and then to add new parallel routine in Dynare.
729%Remember that, from the user side to use the parallel routines it is only required to insert in the .mod file the commands the \verb"options_.parallel" structure as described above (see also [2, 5]). If the file \verb".mod" do not contain this structure (or it is commented) Dynare is executed in a traditional way.
730
731\subsubsection{Write a parallel code: an example}
Using MATLAB pseudo-code (which is nonetheless very close to the actual code), we now describe in detail how to use the above step-by-step procedure to parallelize the random walk Metropolis-Hastings algorithm. Any other function can be parallelized in the same way.
737
It is obvious that most of the computational time spent by the \\ \verb"random_walk_metropolis_hastings.m" function is taken by the cycle looping over the parallel chains performing the Metropolis iterations:
739
740\singlespacing
741{\footnotesize
742\hspace{2cm}\begin{tabular}[b]{| p{9cm} |}
743  \hline
744\begin{verbatim}
745function random_walk_metropolis_hastings
746       (TargetFun, ProposalFun, ..., varargin)
747[...]
748for b = fblck:nblck,
749...
750end
751[...]
752\end{verbatim}
753\\ \hline
754\end{tabular}
755}
756\doublespacing
757
Since those chains are totally independent, the obvious way to reduce the computational time is to parallelize this loop, executing the \verb"(nblck-fblck+1)" chains on different computers/CPUs/cores.
759
760%\singlespacing
761%{\footnotesize
762%\hspace{2cm}\begin{tabular}[b]{| p{9cm} |}
763%  \hline
764%\begin{verbatim}
765%...
766%if (nblck>fblck) & (number of available CPUs >1)
767%execute
768%the	nblck - nblck branch on CPU 1
769%	    	(nblck+1) - (nblck+1) on CPU 2
770%		[...]
771%		Fblck - fblck on CPU F
772%else
773%	for b = fblck:nblck,
774%...
775%end
776%end
777%...
778%\end{verbatim}
779%\\ \hline
780%\end{tabular}
781%}
782%\doublespacing
783
784To do so, we remove the \verb"for" cycle and put it in a new function named \verb"<*>_core.m":
785
786\singlespacing
787{\footnotesize
788\noindent\begin{tabular}[b]{| p{\linewidth} |}
789\hline
790\begin{verbatim}
791function myoutput =
792    random_walk_metropolis_hastings_core(myinputs,fblck,nblck, ...)
793[...]
794\end{verbatim}
795just list global variables needed (they are set-up properly by \verb"fParallel" or \verb"slaveParallel")
796\begin{verbatim}
797global bayestopt_ estim_params_ options_  M_ oo_
798\end{verbatim}
799here we collect all local variables stored in \verb"myinputs"
800\begin{verbatim}
801TargetFun=myinputs.TargetFun;
802ProposalFun=myinputs.ProposalFun;
803xparam1=myinputs.xparam1;
804[...]
805\end{verbatim}
806here we run the loop
807\begin{verbatim}
808for b = fblck:nblck,
809...
810end
811[...]
812\end{verbatim}
813here we wrap all output arguments needed by the `master' routine
814\begin{verbatim}
815myoutput.record = record;
816[...]
817\end{verbatim}
818\\ \hline
819\end{tabular}
820}
821\doublespacing
822The split of the \verb"for" cycle has to be performed in such a way that the new \verb"<*>_core" function can work in both serial and parallel mode. In the latter case, such a function will be invoked by the slave threads and executed for the number of iterations assigned by \verb"masterParallel.m".
823
824The modified \verb"random_walk_metropolis_hastings.m" is therefore:
825
826\singlespacing
827{\footnotesize
828\noindent\begin{tabular}[b]{| p{\linewidth} |}
829\hline
830\begin{verbatim}
function random_walk_metropolis_hastings(TargetFun,ProposalFun,...,varargin)
832[...]
833% here we wrap all local variables needed by the <*>_core function
834localVars = struct('TargetFun', TargetFun, ...
835[...]
836    'd', d);
837[...]
838% here we put the switch between serial and parallel computation:
839if isnumeric(options_.parallel) || (nblck-fblck)==0,
840% serial computation
841    fout = random_walk_metropolis_hastings_core(localVars, fblck,nblck, 0);
842    record = fout.record;
843
844else
845% parallel computation
846
847    % global variables for parallel routines
848    globalVars = struct('M_',M_, ...
849                       [...]
850                       'oo_', oo_);
851
852    % which files have to be copied to run remotely
853    NamFileInput(1,:) = {'',[ModelName '_static.m']};
854    NamFileInput(2,:) = {'',[ModelName '_dynamic.m']};
855    [ ...]
856
857    % call the master parallelizing utility
858    [fout, nBlockPerCPU, totCPU] = masterParallel(options_.parallel, ...
859    fblck, nblck, NamFileInput, 'random_walk_metropolis_hastings_core',
860    localVars, globalVars, options_.parallel_info);
861
862    % collect output info from parallel tasks provided in fout
863    [ ...]
864end
865
866% collect output info from either serial or parallel tasks
867irun = fout(1).irun;
868NewFile = fout(1).NewFile;
869[...]
870\end{verbatim}
871\\ \hline
872\end{tabular}
873}
874\doublespacing
875
876Finally, in order to allow the master thread to monitor the progress of the slave threads, some message passing elements have to be introduced in the \verb"<*>_core.m" file. The utility function \verb"fMessageStatus.m" has been  designed as an interface for this task, and can be seen as a generalized form of the MATLAB utility \verb"waitbar.m".
877
878In the following example, we show a typical use of this utility, again from the random walk Metropolis routine:
879\singlespacing
880{\footnotesize
881\noindent\begin{tabular}[b]{| p{\linewidth} |}
882\hline
883\begin{verbatim}
884for  j = 1:nruns
885[...]
886% define the progress of the loop:
887prtfrc = j/nruns;
888
% define a running message:
% first indicate which chain is running on the current CPU [b]
% out of the [mh_nblck] chains requested by the DYNARE user,
% then add possible further information, like the acceptance rate
waitbarString = [ '(' int2str(b) '/' int2str(mh_nblck) ') ' ...
    sprintf('%f done, acceptation rate %f',prtfrc,isux/j)];
896
897    if mod(j, 3)==0 & ~whoiam
898        % serial computation
899        waitbar(prtfrc,hh,waitbarString);
900
901    elseif mod(j,50)==0 & whoiam,
902        % parallel computation
903        fMessageStatus(prtfrc, ...
904            whoiam, ...
905            waitbarString, ...
906            waitbarTitle, ...
907            options_.parallel(ThisMatlab))
908
909    end
910    [...]
911end
912\end{verbatim}
913\\ \hline
914\end{tabular}
915}
916\doublespacing
In the previous example, a number of arguments are used to identify which CPU and which computer in the cluster is sending the message, namely:
918\singlespacing
919{\footnotesize
920\noindent\begin{tabular}[b]{| p{\linewidth} |}
921\hline
922\begin{verbatim}
923%  whoiam [int]         index number of this CPU among all CPUs in the
924%                       cluster
925%  ThisMatlab [int]     index number of this slave machine in the cluster
926%                       (entry in options_.parallel)
927\end{verbatim}
928\\ \hline
929\end{tabular}
930}
931\doublespacing
The message is stored as a MATLAB data file (\verb"*.mat") saved in the working directory of the remote slave computer. The master will periodically check for those messages, retrieve the files from the remote computers and produce an advanced monitoring plot.
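
A minimal sketch of this polling mechanism (the status file name below is hypothetical, not the one actually used by the toolbox) could be:

\singlespacing
{\footnotesize
\noindent\begin{tabular}[b]{| p{\linewidth} |}
\hline
\begin{verbatim}
% hypothetical sketch: poll the status files saved by the slaves
totCPU = 2;                 % number of parallel threads
for indPC = 1:totCPU
  statfile = sprintf('comp_status_%d.mat', indPC); % assumed name
  if exist(statfile,'file')
    load(statfile,'prtfrc','waitbarString')
    fprintf('CPU %d: %4.1f%% done - %s\n', ...
        indPC, 100*prtfrc, waitbarString)
  end
end
\end{verbatim}
\\ \hline
\end{tabular}
}
\doublespacing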
933
So, assuming we run two Metropolis chains, under the standard serial implementation a first \verb"waitbar" will pop up in MATLAB, corresponding to the first chain:
935
936\hspace{2cm}\epsfxsize=200pt \epsfbox{waitbars1.pdf}
937
938\ni followed by a second \texttt{waitbar}, when the first chain is completed.
939
940\hspace{2cm}\epsfxsize=200pt \epsfbox{waitbars2.pdf}
941
942On the other hand, under the parallel implementation, a parallel monitoring plot will be produced by \texttt{masterParallel.m}:
943
944\hspace{2cm}\epsfxsize=200pt \epsfbox{waitbarsP.pdf}
945
946
947
948%Finally we describe the masterParallel.m and fParallel.m functions:
949%
950%\begin{verbatim}
951%function Results= masterParallel (DATA, functionName)
952%
953%[...]
954%
955%read  options_.parallel;
956%call the function rACE=AnalyseComputationalEnviroment (options_.parallel);
957%
958%switch rACE
959%	[...]
960%end
961%[...]
962%\end{verbatim}
963
969
970
971\section{Parallel DYNARE: testing}
We checked the new parallel platform for DYNARE by performing a number of tests, using different models and computer architectures. We present here all the tests performed with Windows XP/MATLAB. However, similar tests were performed successfully under the Linux/Ubuntu environment.
In the Bayesian estimation of DSGE models with DYNARE, most of the computing time is devoted to the posterior parameter estimation with the Metropolis algorithm. The first and second tests are therefore focused on the parallelization of the Random Walk Metropolis-Hastings algorithm (Sections \ref{s:test1}-\ref{s:test2}). In addition, further tests (Sections \ref{s:test3}-\ref{s:test4}) are devoted to testing all the parallelized functions in DYNARE. Finally, we compare the two parallel implementations of the Metropolis-Hastings algorithm available in DYNARE: the Independent and the Random Walk (Section \ref{s:test5}).
974
975\subsection{Test 1.}\label{s:test1}
976The main goal here was to evaluate the parallel package on a \emph{fixed hardware platform} and using chains of \emph{variable length}. The model used for testing is a modification of \cite{Hradisky_etal_2006}. This is a small scale open economy DSGE model with 6 observed variables, 6 endogenous variables and 19 parameters to be estimated.
We estimated the model on a bi-processor machine (Fujitsu Siemens, Celsius R630) powered with an Intel(R) Xeon(TM) CPU 2.80GHz with Hyper-Threading Technology; first with the original serial Metropolis and subsequently using the parallel solution, to take advantage of the two processors. We ran chains of increasing length: 2,500, 5,000, 10,000, 50,000, 100,000, 250,000 and 1,000,000.
978
979\begin{figure}[!ht]
980\begin{centering}
981  % Requires \usepackage{graphicx}
982  \epsfxsize=300pt \epsfbox{iVaNo_time_comp.pdf}
983  \caption{Computational time (in minutes) versus chain length for the serial and parallel implementation (Metropolis with two chains).}\label{fig:test_time_comp}
984\end{centering}
985\end{figure}
986\begin{figure}[!ht]
987\begin{centering}
988  % Requires \usepackage{graphicx}
989  \epsfxsize=300pt \epsfbox{iVaNo_gain.pdf}
  \caption{Reduction of computational time (i.e. the `time gain') using the parallel coding versus chain length. The time gain is computed as $(T_s-T_p)/T_s$, where $T_s$ and $T_p$ denote the computing time of the serial and parallel implementations respectively.}\label{fig:test_gain}
991\end{centering}
992\end{figure}
993
Overall results are given in Figure \ref{fig:test_time_comp}, showing the computational time versus chain length, and Figure \ref{fig:test_gain}, showing the reduction of computational time (or the time gain) with respect to the serial implementation provided by the parallel coding. The gain in computing time is about 45\% on this test case, reducing the cost of running 1,000,000 Metropolis iterations from 11.40 hours to about 6 hours (the ideal gain would be 50\% in this case).
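In terms of the serial and parallel computing times $T_s$ and $T_p$, the time gain plotted in Figure \ref{fig:test_gain} and the speed-up rate used in the next test are related by
\[
\mbox{speed-up} = \frac{T_s}{T_p}, \qquad
\mbox{time gain} = \frac{T_s-T_p}{T_s} = 1-\frac{1}{\mbox{speed-up}},
\]
so a time gain of about 45\% corresponds to a speed-up rate of about 1.8, against the ideal speed-up of 2 attainable with two CPUs.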
995
996\subsection{Test 2.}\label{s:test2}
The aim of the second test was to verify whether the results were robust across different hardware platforms.
998We estimated the model with chain lengths of 1,000,000 runs on the following hardware platforms:
999\begin{itemize}
\item Single-processor machine: Intel(R) Pentium4(R) CPU 3.40GHz with Hyper-Threading Technology (Fujitsu-Siemens Scenic Esprimo);
\item Bi-processor machine: two Intel(R) Xeon(TM) 2.80GHz CPU's with Hyper-Threading Technology (Fujitsu-Siemens, Celsius R630);
1002\item Dual core machine: Intel Centrino T2500 2.00GHz Dual Core  (Fujitsu-Siemens, LifeBook S Series).
1003\end{itemize}
1004
We first ran the tests with a normal configuration. However, since (i) a dissimilar software environment on the machine can influence the computation and (ii) Windows services (network, hard disk writing, daemons, software updates, antivirus, etc.) can start during the simulation, we also ran the tests without allowing any other process to start during the estimation. Table \ref{tab:trail} gives the results for the ordinary software environment, with the process priority set to low/normal.
1006
1007\begin{table}
1008\begin{centering}
1009\begin{tabular}{l|l|l|l}
1010  % after \\: \hline or \cline{col1-col2} \cline{col3-col4} ...
1011Machine	& Single-processor	& Bi-processor	& Dual core \\ \hline
1012Parallel & 8:01:21 & 7:02:19 & 5:39:38 \\
1013Serial & 10:12:22 & 13:38:30 & 11:02:14 \\
Speed-up rate & 1.2722 & 1.9381 & 1.9498\\
Ideal speed-up rate &  $\sim$1.5 & 2 & 2 \\
1016  \hline
1017\end{tabular}
1018\caption{Trail results with normal PC operation. Computing time expressed in h:m:s. Speed-up rate is computed as $T_s/T_p$, where $T_s$ and $T_p$ are the computing times for the serial and parallel implementations.}\label{tab:trail}
1019\end{centering}
1020\end{table}
1021
Results showed that dual-core technology provides a gain similar to the bi-processor results, again about 45\%. The striking result was that the dual-core processor clocked at 2.0GHz was about 30\% faster than the bi-processor clocked at 2.8GHz. Interesting gains were also obtained via multi-threading on the single-processor machine, with a speed-up of about 1.27 (i.e. a time gain of about 21\%). However, beware that we burned a number of processors performing tests on single processors with Hyper-Threading and using very long chains (1,000,000 runs)!
We re-ran the tests on the dual-core machine, cleaning the PC operation from any interference by other programs, and show the results in Table \ref{tab:trail2}.
A speed-up rate of 1.06 (i.e. a 5.6\% time gain) can be obtained simply by hiding the MATLAB waitbar. The speed-up rate can be pushed to 1.22 (i.e. an 18\% time gain) by disconnecting the network and setting the priority of the process to real time. Note that, starting from the original serial configuration, which took 11:02 hours to run the two Metropolis chains, the computational time can be reduced to 4:40 hours (i.e. a total time gain of over 60\% with respect to the serial computation) by parallelizing and optimally configuring the operating environment.
These results are somewhat surprising and show that it is possible to reduce the computational time dramatically with slight modifications of the software configuration.
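
As a quick check of how the timings in Table \ref{tab:trail} translate into the speed-up rates and time gains quoted above, consider the dual-core column:

\singlespacing
{\footnotesize
\noindent\begin{tabular}[b]{| p{\linewidth} |}
\hline
\begin{verbatim}
% serial and parallel times on the dual-core machine (h:m:s)
Ts = 11*3600 + 2*60 + 14;    % 11:02:14  ->  39734 s
Tp =  5*3600 + 39*60 + 38;   %  5:39:38  ->  20378 s
speedup  = Ts/Tp             % ~1.95, as reported in the table
timegain = 1 - Tp/Ts         % ~0.49, i.e. a gain of just under 50%
\end{verbatim}
\\ \hline
\end{tabular}
}
\doublespacing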
1026\begin{table}[t]
1027\begin{centering}
1028\begin{tabular}{p{5cm}|l|l}
1029  % after \\: \hline or \cline{col1-col2} \cline{col3-col4} ...
1030Environment	& Computing time & Speed-up rate \\
1031& & w.r.t. Table \ref{tab:trail}\\ \hline
1032Parallel Waitbar Not Visible & 5:06:00 & 1.06 \\ \hline
1033Parallel waitbar Not Visible, Real-time Process priority,
1034Unplugged network cable. &
1035	4:40:49 & 1.22\\
1036  \hline
1037\end{tabular}
1038\caption{Trail results with different software configurations (optimized operating environment for computational requirements).}\label{tab:trail2}
1039\end{centering}
1040\end{table}
1041
Given the excellent results reported above, we have parallelized many other DYNARE functions. This implies that parallel instances can be invoked many times during a single DYNARE session. Under the basic parallel toolbox implementation, which we call the `Open/Close' strategy, this implies that MATLAB instances are opened and closed many times by system calls, possibly slowing down the computation, especially for `entry-level' computer resources. As mentioned before, this suggested implementing an alternative strategy for the parallel toolbox, which we call the `Always-Open' strategy, where the slave MATLAB threads, once opened, stay alive and wait for new tasks assigned by the master until the full DYNARE procedure is completed. We show next the tests of these latest implementations.
1043
1044\subsection{Test 3}\label{s:test3}
In this Section we use the \cite{Lubik2003} model as the test model\footnote{The \cite{Lubik2003} model is also selected as the `official' test model for the parallel toolbox in DYNARE.} and a very simple class of computer, quite widespread nowadays: the netbook personal computer. In particular, we used a Dell Mini 10 with an Intel(R) Atom(TM) Z520 processor (1.33 GHz, 533 MHz) and 1 GB of RAM (with Hyper-Threading). First, we tested the computational gain of running a full Bayesian estimation: Metropolis (two parallel chains), MCMC diagnostics, posterior IRF's, and filtered and smoothed variables, forecasts, etc. In other words, we designed DYNARE sessions that invoke all the parallelized functions. Results are shown in Figures \ref{fig:netbook_complete_openclose}-\ref{fig:netbook_partial_openclose}.
1046\begin{figure}[p]
1047\begin{centering}
1048  % Requires \usepackage{graphicx}
1049  \epsfxsize=300pt \epsfbox{netbook_complete_openclose.pdf}
  \caption{Computational Time (s) versus Metropolis length, running all the parallelized functions in DYNARE with the basic parallel implementation (the `Open/Close' strategy) \citep{Lubik2003}.}\label{fig:netbook_complete_openclose}
1051\end{centering}
1052\end{figure}
1053\begin{figure}[p]
1054\begin{centering}
1055  % Requires \usepackage{graphicx}
1056  \epsfxsize=300pt \epsfbox{netbook_partial_openclose.pdf}
1057  \caption{Computational Time (s) versus Metropolis length, loading previously performed MH runs and running \emph{only} the parallelized functions after Metropolis \citep{Lubik2003}. Basic parallel implementation (the `Open/Close' strategy).}\label{fig:netbook_partial_openclose}
1058\end{centering}
1059\end{figure}
In Figure \ref{fig:netbook_complete_openclose} we show the computational time versus the length of the Metropolis chains in the serial and parallel settings (`Open/Close' strategy). With very short chains, the parallel setting obviously slows down the computations (because of the delays in opening/closing MATLAB sessions and in synchronization), whereas, as the chain length increases, speed-up rates of up to 1.41 can be reached on this `entry-level' portable computer (single processor with Hyper-threading).
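A stylized way to rationalize this pattern (an illustrative decomposition, not a measured one) is to write the parallel wall-clock time as the serial time divided by the number of concurrent MATLAB sessions plus a roughly fixed overhead for opening, closing and synchronizing them:
\begin{equation*}
T_{\mathrm{parallel}}(N) \approx \frac{T_{\mathrm{serial}}(N)}{p} + T_{\mathrm{overhead}},
\end{equation*}
where $N$ is the Metropolis chain length and $p$ the number of concurrent sessions. For small $N$ the fixed overhead dominates and the parallel run is slower than the serial one, while for large $N$ the speed-up rate approaches its hardware-limited ceiling (about 1.41 on this Hyper-threaded single-core machine).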
In order to appreciate the gain of parallelizing all the functions invoked after Metropolis, in Figure \ref{fig:netbook_partial_openclose} we show the results of the same experiment, but without running Metropolis, i.e. we use the DYNARE options \verb"load_mh_files = 1" and \verb"mh_replic = 0" (so that Metropolis and MCMC diagnostics are not invoked); an illustrative call is sketched below. The parallelization of the functions invoked after Metropolis allows speed-up rates of 1.14 to be attained (i.e. a time gain of about 12\%). Note that the computational cost of these functions is proportional to the chain length only when the latter is relatively small. In fact, the number of sub-draws taken by \verb"posteriorIRF.m" or \verb"prior_posterior_statistics.m" is proportional to the total number of MH draws up to a maximum threshold of 500 sub-draws (for IRFs) and 1,200 sub-draws (for the smoother). This is reflected in the shape of the plots, which reach a plateau when these thresholds are attained.
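As an illustration, the DYNARE estimation call for this second experiment might look like the snippet below; only the options \verb"load_mh_files = 1" and \verb"mh_replic = 0" are taken from the text above (their exact names may differ across DYNARE versions), while the data file name and the remaining options are purely illustrative placeholders.
\begin{verbatim}
// Illustrative only: re-use the stored Metropolis draws and run just the
// post-Metropolis functions (posterior IRFs, smoother, forecasts).
estimation(datafile = ls2003_data, load_mh_files = 1, mh_replic = 0,
           bayesian_irf, smoother, forecast = 8);
\end{verbatim}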
1062%\begin{table}
1063%\begin{centering}
1064%\begin{tabular}{l|l|l}
1065%  % after \\: \hline or \cline{col1-col2} \cline{col3-col4} ...
1066%Chain Length & Time Serial (s) & Time Parallel (s) \\ \hline
1067%105 & 85 & 151 \\
1068%1005 & 246 & 287 \\
1069%5005 & 755 & 599 \\
1070%10005 & 1246 & 948 \\
1071%15005 & 1647 & 1250 \\
1072%20005 & 2068 & 1502 \\
1073%25005 & 2366 & 1675 \\
1074% \hline
1075%\end{tabular}
1076%\caption{Trail results for the \cite{Lubik2003} model. Computational Time running all the parallelized functions in DYNARE and the basic parallel implementation (the `Open/Close' strategy).}\label{tab:trail_ls2003}
1077%\end{centering}
1078%\end{table}
1079\begin{figure}
1080\begin{centering}
1081  % Requires \usepackage{graphicx}
1082  \epsfxsize=300pt \epsfbox{netbook_complete_comp.pdf}
1083  \caption{Comparison of the `Open/Close' strategy and the `Always-open' strategy. Computational Time (s) versus Metropolis length, running all the parallelized functions in DYNARE \citep{Lubik2003}.}\label{fig:netbook_complete_comp}
1084\end{centering}
1085\end{figure}
1086\begin{figure}
1087\begin{centering}
1088  % Requires \usepackage{graphicx}
1089  \epsfxsize=300pt \epsfbox{netbook_partial_comp.pdf}
1090  \caption{Comparison of the `Open/Close' strategy and the `Always-open' strategy. Computational Time (s) versus Metropolis length, running only the parallelized functions after Metropolis \citep{Lubik2003}.}\label{fig:netbook_partial_comp}
1091\end{centering}
1092\end{figure}
In Figures \ref{fig:netbook_complete_comp}-\ref{fig:netbook_partial_comp} we plot the results of the same type of tests just described, but comparing the `Open/Close' and the `Always-open' strategies. We can see in both graphs that the more sophisticated `Always-open' approach provides some reduction in computational time. When the entire Bayesian analysis is performed (including Metropolis and MCMC diagnostics, Figure \ref{fig:netbook_complete_comp}), the gain is on average about 5\%, but it can exceed 10\% for short chains. When the Metropolis is not performed, the gain rises to about 10\% on average. As expected, the gain of the `Always-open' strategy is especially visible when the computational time spent in a single parallel session is not too long compared to the cost of opening and closing new MATLAB sessions under the `Open/Close' approach.
1094
1095
1096\subsection{Test 4}\label{s:test4}
Here we increase the dimension of the test model, using the QUEST III model \citep{Ratto_et_al_EconModel2009}, on a more powerful Samsung Q45 notebook with an Intel Centrino dual-core processor. In Figures \ref{fig:quest_complete_openclose}-\ref{fig:quest_partial_openclose} we show the computational gain of the parallel coding under the `Open/Close' strategy. When the Metropolis is included in the analysis (Figure \ref{fig:quest_complete_openclose}), the computational gain increases with the chain length. For 50,000 MH iterations, the speed-up rate is about 1.42 (i.e. a 30\% time gain), but pushing the computation up to 1,000,000 runs provides an almost ideal speed-up rate of 1.9 (i.e. a gain of about 50\%, similar to Figure \ref{fig:test_time_comp}).
It is also interesting to note that, for this medium/large size model, the parallel coding outperforms the serial one even at very short chain lengths. Excluding the Metropolis from the DYNARE execution (Figure \ref{fig:quest_partial_openclose}), we can see that the speed-up rate of running the posterior analysis in parallel on two cores reaches 1.6 (i.e. a 38\% time gain).
1099\begin{figure}[!ht]
1100\begin{centering}
1101  % Requires \usepackage{graphicx}
1102  \epsfxsize=300pt \epsfbox{quest_complete_openclose.pdf}
  \caption{Computational Time (s) versus Metropolis length, running all the parallelized functions in DYNARE with the basic parallel implementation (the `Open/Close' strategy) \citep{Ratto_et_al_EconModel2009}.}\label{fig:quest_complete_openclose}
1104\end{centering}
1105\end{figure}
1106\begin{figure}[!hb]
1107\begin{centering}
1108  % Requires \usepackage{graphicx}
1109  \epsfxsize=300pt \epsfbox{quest_partial_openclose.pdf}
1110  \caption{Computational Time (s) versus Metropolis length, loading previously performed MH runs and running \emph{only} the parallelized functions after Metropolis \citep{Ratto_et_al_EconModel2009}. Basic parallel implementation (the `Open/Close' strategy).}\label{fig:quest_partial_openclose}
1111\end{centering}
1112\end{figure}
1113
We also checked the efficacy of the `Always-open' approach with respect to the `Open/Close' one (Figures \ref{fig:quest_complete_comp} and \ref{fig:quest_partial_comp}). We can see in Figure \ref{fig:quest_complete_comp} that, when the entire Bayesian analysis is run, no appreciable advantage emerges from the more sophisticated `Always-open' approach.
1115\begin{figure}[t]
1116\begin{centering}
1117  % Requires \usepackage{graphicx}
1118  \epsfxsize=300pt \epsfbox{quest_complete_comp.pdf}
1119  \caption{Comparison of the `Open/Close' strategy and the `Always-open' strategy. Computational Time (s) versus Metropolis length, running all the parallelized functions in DYNARE \citep{Ratto_et_al_EconModel2009}.}\label{fig:quest_complete_comp}
1120\end{centering}
1121\end{figure}
1122\begin{figure}[!ht]
1123\begin{centering}
1124  % Requires \usepackage{graphicx}
1125  \epsfxsize=300pt \epsfbox{quest_partial_comp.pdf}
1126  \caption{Comparison of the `Open/Close' strategy and the `Always-open' strategy. Computational Time (s) versus Metropolis length, running only the parallelized functions after Metropolis \citep[QUEST III model][]{Ratto_et_al_EconModel2009}.}\label{fig:quest_partial_comp}
1127\end{centering}
1128\end{figure}
1129
On the other hand, in Figure \ref{fig:quest_partial_comp}, we can see that the `Always-open' approach still provides a small speed-up rate of about 1.03. These results confirm the previous comment that the gain of the `Always-open' strategy is especially visible when the computational time spent in a single parallel session is not too long; therefore, the larger the model, the smaller the advantage of this strategy.
1131
1132
1133\section{Conclusions}
The methodology identified for parallelizing MATLAB codes within DYNARE proved to be effective in reducing the computational time of the most extensive loops. This methodology is suitable for `embarrassingly parallel' codes, requiring only a minimal communication flow between slave and master threads. The parallel DYNARE is built around a few `core' routines that act as a sort of `parallel paradigm'. Based on those routines, the parallelization of expensive loops is made quite simple for DYNARE developers. A basic message-passing system is also provided, which allows the master thread to monitor the progress of slave threads. The test model \verb"ls2003.mod" is available in the folder \verb"\tests\parallel" of the DYNARE distribution and allows parallel examples to be run, as illustrated below.
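For instance, assuming the \verb"parallel" and \verb"conffile" command-line options of recent DYNARE releases and a hypothetical configuration file \verb"myconf.ini" describing the local cluster, the parallel test model could be launched from the MATLAB prompt as follows.
\begin{verbatim}
% Run the official parallel test model with a user-supplied cluster
% configuration file (the file name and path are illustrative).
dynare ls2003 parallel conffile=myconf.ini
\end{verbatim}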
1135
1136% ----------------------------------------------------------------
1137\bibliographystyle{plainnat}
1138%\bibliographystyle{amsplain}
1139%\bibliographystyle{alpha}
1140\bibliography{marco}
1141
1142\newpage
1143\begin{figure}
1144\begin{centering}
1145  % Requires \usepackage{graphicx}
1146  \epsfxsize=250pt \epsfbox{RWMH_quest1_PriorsAndPosteriors1Comp.pdf}
1147  \caption{Prior (grey lines) and posterior density of estimated parameters (black = 100,000 runs; red = 1,000,000 runs) using the RWMH algorithm \citep[QUEST III model][]{Ratto_et_al_EconModel2009}.}\label{fig:quest_RWMH_comp1}
1148\end{centering}
1149\end{figure}
1150\begin{figure}
1151\begin{centering}
1152  % Requires \usepackage{graphicx}
1153  \epsfxsize=250pt \epsfbox{RWMH_quest1_PriorsAndPosteriors2Comp.pdf}
1154  \caption{Prior (grey lines) and posterior density of estimated parameters (black = 100,000 runs; red = 1,000,000 runs) using the RWMH algorithm \citep[QUEST III model][]{Ratto_et_al_EconModel2009}.}\label{fig:quest_RWMH_comp2}
1155\end{centering}
1156\end{figure}
1157\begin{figure}
1158\begin{centering}
1159  % Requires \usepackage{graphicx}
1160  \epsfxsize=250pt \epsfbox{RWMH_quest1_PriorsAndPosteriors3Comp.pdf}
1161  \caption{Prior (grey lines) and posterior density of estimated parameters (black = 100,000 runs; red = 1,000,000 runs) using the RWMH algorithm \citep[QUEST III model][]{Ratto_et_al_EconModel2009}.}\label{fig:quest_RWMH_comp3}
1162\end{centering}
1163\end{figure}
1164\begin{figure}
1165\begin{centering}
1166  % Requires \usepackage{graphicx}
1167  \epsfxsize=250pt \epsfbox{RWMH_quest1_PriorsAndPosteriors4Comp.pdf}
1168  \caption{Prior (grey lines) and posterior density of estimated parameters (black = 100,000 runs; red = 1,000,000 runs) using the RWMH algorithm \citep[QUEST III model][]{Ratto_et_al_EconModel2009}.}\label{fig:quest_RWMH_comp4}
1169\end{centering}
1170\end{figure}
1171\begin{figure}
1172\begin{centering}
1173  % Requires \usepackage{graphicx}
1174  \epsfxsize=250pt \epsfbox{RWMH_quest1_PriorsAndPosteriors5Comp.pdf}
1175  \caption{Prior (grey lines) and posterior density of estimated parameters (black = 100,000 runs; red = 1,000,000 runs) using the RWMH algorithm \citep[QUEST III model][]{Ratto_et_al_EconModel2009}.}\label{fig:quest_RWMH_comp5}
1176\end{centering}
1177\end{figure}
1178\begin{figure}
1179\begin{centering}
1180  % Requires \usepackage{graphicx}
1181  \epsfxsize=250pt \epsfbox{RWMH_quest1_PriorsAndPosteriors6Comp.pdf}
1182  \caption{Prior (grey lines) and posterior density of estimated parameters (black = 100,000 runs; red = 1,000,000 runs) using the RWMH algorithm \citep[QUEST III model][]{Ratto_et_al_EconModel2009}.}\label{fig:quest_RWMH_comp6}
1183\end{centering}
1184\end{figure}
1185\begin{figure}
1186\begin{centering}
1187  % Requires \usepackage{graphicx}
1188  \epsfxsize=250pt \epsfbox{RWMH_quest1_PriorsAndPosteriors7Comp.pdf}
1189  \caption{Prior (grey lines) and posterior density of estimated parameters (black = 100,000 runs; red = 1,000,000 runs) using the RWMH algorithm \citep[QUEST III model][]{Ratto_et_al_EconModel2009}.}\label{fig:quest_RWMH_comp7}
1190\end{centering}
1191\end{figure}
1192
1193\clearpage
1194\newpage
1195
1196\appendix
1197\section{A tale on parallel computing}
This is a general introduction to parallel computing. Readers with a basic knowledge of DYNARE and computer programming can skip it \citep{GoffeCreel_Grid_2008,Azzini_etal_DYNARE_2007,ParallelDYNARE}.
There exists an ample scientific literature about parallel computing, as well as an enormous quantity of information on the Web. This amount of information can sometimes be ambiguous and confusing in its notation and in its description of technologies. The main goal here is therefore to provide a very simple introduction to the subject, referring the reader to \cite{Brookshear} for a more extensive and clear introduction to computer science.
1200
Modern computer systems (hardware and software) are conceptually identical to the first computer developed by J. von Neumann. Nevertheless, over time, hardware, software, and most importantly \emph{hardware \& software together} have acquired an ever-increasing ability to perform incredibly complex and intensive tasks. Given this complexity, we like to explain modern computer systems through an ``avenue paradigm'', which we summarize in the following tale.
1202
Nowadays there is a small but lovely town called ``CompuTown''. In CompuTown there are many roads, which are all very similar to each other, and also many gardens. The most important road in CompuTown is Von Neumann Avenue. The first building in Von Neumann Avenue has three floors (this is a \emph{computer system}: PC, workstation, etc.; see Figure \ref{fig:building} and \cite{Brookshear}). The floors communicate with each other only through a single staircase. On each floor live people coming from the same country, with the same language, culture and customs. The people living, moving and interacting with each other on the first and second floors are the \emph{programs} or software agents or, more generally speaking, the \emph{algorithms} (see chapters 3, 5, 6 and 7 in \cite{Brookshear}). Examples of the latter are the programs MATLAB and Octave, and a particular program called the \emph{operating system} (Windows, Linux, Mac OS, etc.).
1204
1205\begin{figure}
1206  % Requires \usepackage{graphicx}
1207  \hspace{-15pt}
1208  \epsfxsize=400pt \epsfbox{AvenueParadigm.pdf}
1209  \caption{The first building in Von Neumann Avenue: a \emph{Computer System}}\label{fig:building}
1210\end{figure}
1211
1212
The people on the \emph{ground floor} are the transistors, the RAM, the CPU, the hard disk, etc. (i.e. the \emph{Computer Architecture}, see chapters 1 and 2 in \citeauthor{Brookshear}).
The people on the \emph{second floor} communicate with the people on the \emph{first floor} using the only existing staircase (the \emph{pipe}). In these communications, people speak two different languages and therefore do not understand each other. To remove this problem, people define a set of words, fixed and understood by everybody: the \emph{Programming Languages}. More specifically, these languages are called \emph{high-level programming languages} (Java, C/C++, FORTRAN, MATLAB, etc.), because they are related to the people living on the upper floors of the building! Sometimes the people in the building also use pictures to communicate: the \emph{icons} and the \emph{graphical user interface}.
1215
In a similar way, the people on the first floor communicate with the people on the ground floor. Not surprisingly, in this case, people use \emph{low-level programming languages} to communicate with each other (assembler, binary code, machine language, etc.). More importantly, however, the people on the first floor must also manage and coordinate the requests from the people on the second floor to the people on the ground floor, since there is no direct communication between the ground and second floors. For example, they need to translate high-level programming languages into binary code\footnote{The process of transforming a high-level programming language into binary code is called compilation.}: the \emph{Operating System} performs this task.
1217
Sometimes, the people on the second floor try to talk directly with the people on the ground floor, via \emph{system calls}. In the parallelizing software presented in this document, we frequently use these system calls to distribute the jobs among the available hardware resources and to coordinate the overall parallel computational process.
If only a single person without a family lives on the ground floor, say the porter, we have a single-core CPU.
In this case, the porter can only do one task at a time for the people on the first or second floor (the main characteristic of the Von Neumann architecture). For example, in the morning he first collects and sorts the mail for the people in the building, and only after completing this task can he take care of the garden.
If the porter has to do many jobs, he needs to write down on a piece of paper the list of things to do: the \emph{memory} and the \emph{CPU load}. Furthermore, to perform his tasks properly, the porter sometimes has to move objects through the passageways on the ground floor (the \emph{System Bus}). If the passageways have a standard width, we have a 32-bit CPU architecture (or bus); if the passageways are very large, we have, for example, a 64-bit CPU architecture (or bus).
In this scenario, there will be very busy days when many tasks have to be done and many things have to be moved around: the porter will be very tired, although he will be able to `survive'. The most afflicted are always the people on the first floor: every day they receive a lot of new, complex requests from the people on the second floor, and these requests must be correctly translated and passed on to the porter.
The people on the second floor (the top floor) ``live in cloud cuckoo land'': they want everything to be done easily and promptly (artificial intelligence, robotics, etc.).
The activity in the building increases over time, so the porter decides to get some help in order to reduce the execution time of a single job. There are two ways to do this:
1225\begin{itemize}
\item the municipality of CompuTown interconnects all the buildings in the city using roads, so that the porters can share and distribute the jobs (\emph{Computer Networks}): if the porters involved have the same nationality and language, we have a \emph{Computer Cluster}; otherwise we have a \emph{Grid}. In both cases, it is necessary to define a correct way in which the porters can manage, share and complete a common job: the \emph{communication protocol} (TCP/IP, the Internet protocol, etc.);
\item the building administrator employs an additional porter, producing a \emph{bi-processor computer}. Alternatively, the porter may get married, producing a \emph{dual-core CPU}: in this case, the wife can help the porter perform his tasks or even take over some jobs entirely (for example the accounting, taking care of the apartment, etc.). If the couple has children, they have some further little helpers: the \emph{threads} and hence the \emph{Hyper-threading} technology.
1228\end{itemize}
1229
Now a problem arises: who should coordinate the activities between the porters (and their families) and between the different buildings? Or, in other words, should we refurbish the first and second floors to take advantage of the innovations on the ground floor and of the new roads in CompuTown?
First, we can lodge new people on the first floor: operating systems with a set of network tools and multi-processor support, as well as new people on the second floor with new programming paradigms (MPI, OpenMP, Parallel DYNARE, etc.). Second, a more complex communication scheme between the first and ground floors is necessary, i.e. a new set of stairs must be built. So, for example, if we have two staircases between the ground and first floors and two porters, then, using multi-processors and a new parallel programming paradigm, we can assign jobs to each porter directly and independently, and then coordinate the overall work. In Parallel DYNARE we use this kind of `refurbishing' to reduce the computational time and to meet the requests of the people on the second floor.
1232
Unfortunately, this is only an idealized scenario, in which all the citizens of CompuTown live in peace and cooperate with each other. In reality, some building occupants argue with each other, and this can bring their jobs to a halt: these kinds of conflicts may be linked to \emph{software and hardware compatibility} (between the ground and first floors) or to different \emph{software versions} (between the second and first floors). The building administration or the municipality of CompuTown has to take care of these problems and fix them, so that the computer system operates properly.
1234
This tale (which could also be called \emph{The Programs' Society}) has covered in a few pages the fundamental ideas of computer science.
1236
1237
1238\end{document}
1239% ----------------------------------------------------------------
1240