------------------------------------------------------------------------------
--                                                                          --
--                         GNAT COMPILER COMPONENTS                         --
--                                                                          --
--                              G N A T . A W K                             --
--                                                                          --
--                                 S p e c                                  --
--                                                                          --
--                     Copyright (C) 2000-2011, AdaCore                     --
--                                                                          --
-- GNAT is free software;  you can  redistribute it  and/or modify it under --
-- terms of the  GNU General Public License as published  by the Free Soft- --
-- ware  Foundation;  either version 3,  or (at your option) any later ver- --
-- sion.  GNAT is distributed in the hope that it will be useful, but WITH- --
-- OUT ANY WARRANTY;  without even the  implied warranty of MERCHANTABILITY --
-- or FITNESS FOR A PARTICULAR PURPOSE.                                     --
--                                                                          --
-- As a special exception under Section 7 of GPL version 3, you are granted --
-- additional permissions described in the GCC Runtime Library Exception,   --
-- version 3.1, as published by the Free Software Foundation.               --
--                                                                          --
-- You should have received a copy of the GNU General Public License and    --
-- a copy of the GCC Runtime Library Exception along with this program;     --
-- see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see    --
-- <http://www.gnu.org/licenses/>.                                          --
--                                                                          --
-- GNAT was originally developed  by the GNAT team at  New York University. --
-- Extensive contributions were provided by Ada Core Technologies Inc.      --
--                                                                          --
------------------------------------------------------------------------------

--  This is an AWK-like unit. It provides an easy interface for parsing one
--  or more files containing formatted data. The file can be viewed as
--  a database where each record is a line and a field is a data element in
--  this line. In this implementation an AWK record is a line. This means
--  that a record cannot span multiple lines. The operating procedure is to
--  read files line by line, with each line being presented to the user of
--  the package. The interface provides services to access specific fields
--  in the line. Thus it is possible to control actions taken on a line based
--  on values of some fields. This can be achieved directly or by registering
--  callbacks triggered on programmed conditions.
--
--  The state of an AWK run is recorded in an object of type Session_Type.
--  The following is the procedure for using a session to control an
--  AWK run:
--
--     1) Specify which session is to be used. It is possible to use the
--        default session or to create a new one by declaring an object of
--        type Session_Type. For example:
--
--           Computers : Session_Type;
--
--     2) Specify how to cut a line into fields. There are two modes: using
--        character field separators or column widths. This is done by using
--        Set_Field_Separators or Set_Field_Widths. For example by:
--
--           AWK.Set_Field_Separators (";,", Computers);
--
--        or by using the iterators' Separators parameter.
--
--     3) Specify which files to parse. This is done with Add_File/Add_Files
--        services, or by using the iterators' Filename parameter. For
--        example:
--
--           AWK.Add_File ("myfile.db", Computers);
--
--     4) Run the AWK session using one of the provided iterators.
--
--           Parse
--              This is the most automated iterator. You can gain control of
--              the session only by registering one or more callbacks (see
--              Register).
--
--           Get_Line/End_Of_Data
--              This is a manual iterator to be used with a loop. You have
--              complete control of the session. You can use callbacks but
--              this is not required.
--
--           For_Every_Line
--              This provides a mixture of manual/automated iterator action.
--
--        Examples of these three approaches appear below.
--
--  There are many ways to use this package. The following discussion shows
--  three of them, one for each of the three iterator forms. All examples
--  use the following file (computer.db):
--
--     Pluton;Windows-NT;Pentium III
--     Mars;Linux;Pentium Pro
--     Venus;Solaris;Sparc
--     Saturn;OS/2;i486
--     Jupiter;MacOS;PPC
--
--  1) Using Parse iterator
--
--     Here the first step is to register some action associated with a
--     pattern and then to call the Parse iterator (this is the simplest way
--     to use this unit). The default session is used here. For example, to
--     output the second field (the OS) of computer "Saturn":
--
--           procedure Action is
--           begin
--              Put_Line (AWK.Field (2));
--           end Action;
--
--        begin
--           AWK.Register (1, "Saturn", Action'Access);
--           AWK.Parse (";", "computer.db");
--
--
--  2) Using the Get_Line/End_Of_Data iterator
--
--     Here you have full control. For example, to do the same as
--     above but using a specific session, you could write:
--
--           Computer_File : Session_Type;
--
--        begin
--           AWK.Set_Current (Computer_File);
--           AWK.Open (Separators => ";",
--                     Filename   => "computer.db");
--
--           --  Display Saturn OS
--
--           while not AWK.End_Of_Data loop
--              AWK.Get_Line;
--
--              if AWK.Field (1) = "Saturn" then
--                 Put_Line (AWK.Field (2));
--              end if;
--           end loop;
--
--           AWK.Close (Computer_File);
--
--
--  3) Using For_Every_Line iterator
--
--     In this case you use a provided iterator and you pass the procedure
--     that must be called for each record. The previous example could be
--     coded as follows (using the iterator quick interface but without
--     using the current session):
--
--           Computer_File : Session_Type;
--
--           procedure Action (Quit : in out Boolean) is
--           begin
--              if AWK.Field (1, Computer_File) = "Saturn" then
--                 Put_Line (AWK.Field (2, Computer_File));
--              end if;
--           end Action;
--
--           procedure Look_For_Saturn is
--              new AWK.For_Every_Line (Action);
--
--        begin
--           Look_For_Saturn (Separators => ";",
--                            Filename   => "computer.db",
--                            Session    => Computer_File);
--
--           Integer_Text_IO.Put
--             (Integer (AWK.NR (Session => Computer_File)));
--           Put_Line (" line(s) have been processed.");
--
--  You can also use a regular expression for the pattern. Let us output
--  the computer name for all computers for which the OS has a character
--  O in its name.
--
--           Regexp   : String := ".*O.*";
--
--           Matcher  : Regpat.Pattern_Matcher := Regpat.Compile (Regexp);
--
--           procedure Action is
--           begin
--              Text_IO.Put_Line (AWK.Field (1));
--           end Action;
--
--        begin
--           AWK.Register (2, Matcher, Action'Unrestricted_Access);
--           AWK.Parse (";", "computer.db");
--

with Ada.Finalization;
with GNAT.Regpat;

package GNAT.AWK is

   Session_Error : exception;
   --  Raised when a Session is reused but is not closed

   File_Error : exception;
   --  Raised when there is a file problem (see below)

   End_Error : exception;
   --  Raised when an attempt is made to read beyond the end of the last
   --  file of a session.

   Field_Error : exception;
   --  Raised when accessing a field value which does not exist

   Data_Error : exception;
   --  Raised when it is impossible to convert a field value to a specific type

   type Count is new Natural;

   type Widths_Set is array (Positive range <>) of Positive;
   --  Used to store a set of column widths

   Default_Separators : constant String := " " & ASCII.HT;

   Use_Current : constant String := "";
   --  Value used when no separator or filename is specified in iterators

   type Session_Type is limited private;
   --  This is the main exported type. A session is used to keep the state of
   --  a full AWK run. The state comprises a list of files, the current file,
   --  the number of lines processed, the current line, the number of fields
   --  in the current line... A default session is provided (see Set_Current,
   --  Current_Session and Default_Session below).

   ----------------------------
   -- Package initialization --
   ----------------------------

   --  To be thread safe it is not possible to use the default provided
   --  session. Each task must use a specific session and specify it
   --  explicitly for every service.
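   --
   --  A minimal sketch of this rule, assuming a hypothetical task Worker and
   --  the computer.db file used in the examples at the top of this file.
   --  The task owns its session and names it in every call:
   --
   --     task body Worker is
   --        Local : AWK.Session_Type;
   --     begin
   --        AWK.Open (Separators => ";",
   --                  Filename   => "computer.db",
   --                  Session    => Local);
   --
   --        while not AWK.End_Of_Data (Local) loop
   --           AWK.Get_Line (Session => Local);
   --           Text_IO.Put_Line (AWK.Field (1, Local));
   --        end loop;
   --
   --        AWK.Close (Local);
   --     end Worker;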

   procedure Set_Current (Session : Session_Type);
   --  Set the session to be used by default. This session will be used when
   --  the Session parameter in following services is not specified.

   function Current_Session return not null access Session_Type;
   --  Returns the session used by default by all services. This is the
   --  latest session specified by Set_Current service or the session
   --  provided by default with this implementation.

   function Default_Session return not null access Session_Type;
   --  Returns the default session provided by this package. Note that this is
   --  the session returned by Current_Session if Set_Current has not been
   --  used.

   procedure Set_Field_Separators
     (Separators : String       := Default_Separators;
      Session    : Session_Type);
   procedure Set_Field_Separators
     (Separators : String       := Default_Separators);
   --  Set the field separators. Each character in the string is a field
   --  separator. When a line is read it will be split by field using the
   --  separators set here. Separators can be changed at any point and in this
   --  case the current line is split according to the new separators. In the
   --  special case that Separators is a space and a tab
   --  (Default_Separators), fields are separated by runs of spaces and/or
   --  tabs.

   procedure Set_FS
     (Separators : String       := Default_Separators;
      Session    : Session_Type)
     renames Set_Field_Separators;
   procedure Set_FS
     (Separators : String       := Default_Separators)
     renames Set_Field_Separators;
   --  FS is the AWK abbreviation for above service

   procedure Set_Field_Widths
     (Field_Widths : Widths_Set;
      Session      : Session_Type);
   procedure Set_Field_Widths
     (Field_Widths : Widths_Set);
   --  This is another way to split a line by giving the length (in number of
   --  characters) of each field in a line. Field widths can be changed at any
   --  point and in this case the current line is split according to the new
   --  field lengths. A line split with this method must have a length greater
   --  than or equal to the total of the field widths. All characters
   --  remaining on the line after the last field are added to a new
   --  automatically created field.
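   --
   --  As an illustration only (the widths and the input line are made up,
   --  Computers is the session declared in the example at the top of this
   --  file):
   --
   --     AWK.Set_Field_Widths ((4, 2, 2), Computers);
   --
   --  With these widths, a line such as "20110315 nightly build" is expected
   --  to be split into "2011", "03", "15" and, as the automatically created
   --  fourth field, " nightly build".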

   procedure Add_File
     (Filename : String;
      Session  : Session_Type);
   procedure Add_File
     (Filename : String);
   --  Add Filename to the list of files to be processed. There is no limit on
   --  the number of files that can be added. Files are processed in the order
   --  they have been added (i.e. the filename list is FIFO). If Filename does
   --  not exist or if it is not readable, File_Error is raised.

   procedure Add_Files
     (Directory             : String;
      Filenames             : String;
      Number_Of_Files_Added : out Natural;
      Session               : Session_Type);
   procedure Add_Files
     (Directory             : String;
      Filenames             : String;
      Number_Of_Files_Added : out Natural);
   --  Add all files matching the regular expression Filenames in the specified
   --  directory to the list of files to be processed. There is no limit on
   --  the number of files that can be added. Files are processed in the
   --  order they have been added (i.e. the filename list is FIFO).
   --  The number of files (possibly 0) added is returned in
   --  Number_Of_Files_Added.

   -------------------------------------
   -- Information about current state --
   -------------------------------------

   function Number_Of_Fields
     (Session : Session_Type) return Count;
   function Number_Of_Fields
     return Count;
   pragma Inline (Number_Of_Fields);
   --  Returns the number of fields in the current record. It returns 0 when
   --  no file is being processed.

   function NF
     (Session : Session_Type) return Count
     renames Number_Of_Fields;
   function NF
     return Count
     renames Number_Of_Fields;
   --  AWK abbreviation for above service

   function Number_Of_File_Lines
     (Session : Session_Type) return Count;
   function Number_Of_File_Lines
     return Count;
   pragma Inline (Number_Of_File_Lines);
   --  Returns the current line number in the processed file. It returns 0 when
   --  no file is being processed.

   function FNR (Session : Session_Type) return Count
     renames Number_Of_File_Lines;
   function FNR return Count
     renames Number_Of_File_Lines;
   --  AWK abbreviation for above service

   function Number_Of_Lines
     (Session : Session_Type) return Count;
   function Number_Of_Lines
     return Count;
   pragma Inline (Number_Of_Lines);
   --  Returns the number of lines processed until now. This is equal to the
   --  number of lines in each already processed file plus FNR. It returns 0
   --  when no file is being processed.

   function NR (Session : Session_Type) return Count
     renames Number_Of_Lines;
   function NR return Count
     renames Number_Of_Lines;
   --  AWK abbreviation for above service

   function Number_Of_Files
     (Session : Session_Type) return Natural;
   function Number_Of_Files
     return Natural;
   pragma Inline (Number_Of_Files);
   --  Returns the number of files associated with Session. This is the total
   --  number of files added with Add_File and Add_Files services.

   function File (Session : Session_Type) return String;
   function File return String;
   --  Returns the name of the file being processed. It returns the empty
   --  string when no file is being processed.
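   --
   --  A short sketch of how the services above relate when a session holds
   --  more than one file. The file names a.db and b.db and the session Logs
   --  are hypothetical:
   --
   --        Logs : AWK.Session_Type;
   --
   --     begin
   --        AWK.Add_File ("a.db", Logs);
   --        AWK.Add_File ("b.db", Logs);
   --        AWK.Open (Separators => ";", Session => Logs);
   --
   --        while not AWK.End_Of_Data (Logs) loop
   --           AWK.Get_Line (Session => Logs);
   --
   --           --  FNR counts lines in the current file only, NR counts
   --           --  lines over all files read so far.
   --
   --           Text_IO.Put_Line
   --             (AWK.File (Logs)
   --              & AWK.Count'Image (AWK.FNR (Logs))
   --              & AWK.Count'Image (AWK.NR (Logs)));
   --        end loop;
   --
   --        AWK.Close (Logs);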

   ---------------------
   -- Field accessors --
   ---------------------

   function Field
     (Rank    : Count;
      Session : Session_Type) return String;
   function Field
     (Rank    : Count) return String;
   --  Returns field number Rank value of the current record. If Rank = 0 it
   --  returns the current record (i.e. the line as read in the file). It
   --  raises Field_Error if Rank > NF or if Session is not open.

   function Field
     (Rank    : Count;
      Session : Session_Type) return Integer;
   function Field
     (Rank    : Count) return Integer;
   --  Returns field number Rank value of the current record as an integer. It
   --  raises Field_Error if Rank > NF or if Session is not open. It
   --  raises Data_Error if the field value cannot be converted to an integer.

   function Field
     (Rank    : Count;
      Session : Session_Type) return Float;
   function Field
     (Rank    : Count) return Float;
   --  Returns field number Rank value of the current record as a float. It
   --  raises Field_Error if Rank > NF or if Session is not open. It
   --  raises Data_Error if the field value cannot be converted to a float.

   generic
      type Discrete is (<>);
   function Discrete_Field
     (Rank    : Count;
      Session : Session_Type) return Discrete;
   generic
      type Discrete is (<>);
   function Discrete_Field_Current_Session
     (Rank    : Count) return Discrete;
   --  Returns field number Rank value of the current record as a value of
   --  type Discrete. It raises Field_Error if Rank > NF. It raises Data_Error
   --  if the field value cannot be converted to type Discrete.
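   --
   --  A sketch of the typed accessors, assuming the current record is the
   --  hypothetical line "Mars;Linux;4;2.5" split on ";". OS_Kind and OS_Field
   --  are illustrative names, not part of this package:
   --
   --        type OS_Kind is (Linux, Solaris, MacOS);
   --
   --        function OS_Field is
   --           new AWK.Discrete_Field_Current_Session (OS_Kind);
   --
   --        OS   : OS_Kind;
   --        CPUs : Integer;
   --        Load : Float;
   --
   --     begin
   --        OS   := OS_Field (2);
   --        CPUs := AWK.Field (3);  --  Integer version, selected by context
   --        Load := AWK.Field (4);  --  Float version, selected by context
   --     exception
   --        when AWK.Data_Error =>
   --           Text_IO.Put_Line ("field cannot be converted");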

   --------------------
   -- Pattern/Action --
   --------------------

   --  AWK defines rules like "PATTERN { ACTION }", which means that ACTION
   --  will be executed if PATTERN matches. A pattern in this implementation
   --  can be a simple string (match function is equality), a regular
   --  expression, or a function returning a boolean. An action is associated
   --  with a pattern using the Register services.
   --
   --  Each procedure Register will add a rule to the set of rules for the
   --  session. Rules are examined in the order they have been added.

   type Pattern_Callback is access function return Boolean;
   --  This is a pattern function pointer. When it returns True the associated
   --  action will be called.

   type Action_Callback is access procedure;
   --  A simple action pointer

   type Match_Action_Callback is
     access procedure (Matches : GNAT.Regpat.Match_Array);
   --  An advanced action pointer used with a regular expression pattern. It
   --  receives an array of all the matches. See GNAT.Regpat for further
   --  information.

   procedure Register
     (Field   : Count;
      Pattern : String;
      Action  : Action_Callback;
      Session : Session_Type);
   procedure Register
     (Field   : Count;
      Pattern : String;
      Action  : Action_Callback);
   --  Register an Action associated with a Pattern. The pattern here is a
   --  simple string that must match exactly the field number specified.

   procedure Register
     (Field   : Count;
      Pattern : GNAT.Regpat.Pattern_Matcher;
      Action  : Action_Callback;
      Session : Session_Type);
   procedure Register
     (Field   : Count;
      Pattern : GNAT.Regpat.Pattern_Matcher;
      Action  : Action_Callback);
   --  Register an Action associated with a Pattern. The pattern here is a
   --  simple regular expression which must match the field number specified.

   procedure Register
     (Field   : Count;
      Pattern : GNAT.Regpat.Pattern_Matcher;
      Action  : Match_Action_Callback;
      Session : Session_Type);
   procedure Register
     (Field   : Count;
      Pattern : GNAT.Regpat.Pattern_Matcher;
      Action  : Match_Action_Callback);
   --  Same as above but it passes the set of matches to the action
   --  procedure. This is useful to analyse further why and where a regular
   --  expression did match.
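   --
   --  A sketch of such an action, assuming the computer.db file from the
   --  top of this file, a use clause for GNAT, and a library-level procedure
   --  (Show_Match and Matcher are hypothetical names). Matches (0) gives the
   --  bounds of the whole match; the sketch assumes these bounds index the
   --  string returned by Field for the same field:
   --
   --        Matcher : constant Regpat.Pattern_Matcher :=
   --                    Regpat.Compile ("Pent[a-z]+");
   --
   --        procedure Show_Match (Matches : Regpat.Match_Array) is
   --           CPU : constant String := AWK.Field (3);
   --        begin
   --           Text_IO.Put_Line
   --             (CPU (Matches (0).First .. Matches (0).Last));
   --        end Show_Match;
   --
   --     begin
   --        AWK.Register (3, Matcher, Show_Match'Access);
   --        AWK.Parse (";", "computer.db");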

   procedure Register
     (Pattern : Pattern_Callback;
      Action  : Action_Callback;
      Session : Session_Type);
   procedure Register
     (Pattern : Pattern_Callback;
      Action  : Action_Callback);
   --  Register an Action associated with a Pattern. The pattern here is a
   --  function that must return a boolean. Action callback will be called if
   --  the pattern callback returns True and nothing will happen if it is
   --  False. This version is more general; the field-based Register services
   --  trigger an action based on the value of a single field only.
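   --
   --  A sketch of a pattern function that tests two fields at once, assuming
   --  the computer.db file from the top of this file. The names are
   --  hypothetical and both subprograms are assumed to be declared at
   --  library level (otherwise 'Access is rejected and the GNAT-specific
   --  'Unrestricted_Access would be needed):
   --
   --        function Is_Linux_On_Pentium_Pro return Boolean is
   --        begin
   --           return AWK.Field (2) = "Linux"
   --             and then AWK.Field (3) = "Pentium Pro";
   --        end Is_Linux_On_Pentium_Pro;
   --
   --        procedure Show_Name is
   --        begin
   --           Text_IO.Put_Line (AWK.Field (1));
   --        end Show_Name;
   --
   --     begin
   --        AWK.Register (Is_Linux_On_Pentium_Pro'Access, Show_Name'Access);
   --        AWK.Parse (";", "computer.db");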

   procedure Register
     (Action  : Action_Callback;
      Session : Session_Type);
   procedure Register
     (Action  : Action_Callback);
   --  Register an Action that will be called for every line. This is
   --  equivalent to a Pattern_Callback function always returning True.

   --------------------
   -- Parse iterator --
   --------------------

   procedure Parse
     (Separators : String := Use_Current;
      Filename   : String := Use_Current;
      Session    : Session_Type);
   procedure Parse
     (Separators : String := Use_Current;
      Filename   : String := Use_Current);
   --  Launch the iterator; it will read every line in all of the session's
   --  files. Registered callbacks are then called if the associated pattern
   --  matches. It is possible to specify a filename and a set of
   --  separators directly. This offers a quick way to parse a single
   --  file. These parameters will override those specified by Set_FS and
   --  Add_File. The Session will be opened and closed automatically.
   --  File_Error is raised if there is no file associated with Session, or if
   --  a file associated with Session is no longer readable. It raises
   --  Session_Error if Session is already open.

   -----------------------------------
   -- Get_Line/End_Of_Data Iterator --
   -----------------------------------

   type Callback_Mode is (None, Only, Pass_Through);
   --  These modes are used for the Get_Line/End_Of_Data and For_Every_Line
   --  iterators. The associated semantics are:
   --
   --    None
   --       callbacks are not active. This is the default mode for
   --       Get_Line/End_Of_Data and For_Every_Line iterators.
   --
   --    Only
   --       callbacks are active. If at least one pattern matches, the
   --       associated action is called and this line will not be passed to
   --       the user. In the Get_Line case the next line will be read (if
   --       there is some line remaining), in the For_Every_Line case Action
   --       will not be called for this line.
   --
   --    Pass_Through
   --       callbacks are active. For patterns which match, the associated
   --       action is called. Then the line is passed to the user. It means
   --       that Action procedure is called in the For_Every_Line case and
   --       that Get_Line returns with the current line active.
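   --
   --  A sketch of the difference as seen from Get_Line, assuming the
   --  computer.db file from the top of this file and a hypothetical
   --  library-level procedure Action registered for the "Saturn" line:
   --
   --        AWK.Register (1, "Saturn", Action'Access);
   --        AWK.Open (Separators => ";", Filename => "computer.db");
   --
   --        while not AWK.End_Of_Data loop
   --           AWK.Get_Line (Callbacks => AWK.Pass_Through);
   --
   --           --  Action has already run if this is the Saturn line. With
   --           --  Callbacks => AWK.Only that line would not be seen here.
   --
   --           Text_IO.Put_Line (AWK.Field (1));
   --        end loop;
   --
   --        AWK.Close (AWK.Current_Session.all);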
   --

   procedure Open
     (Separators : String := Use_Current;
      Filename   : String := Use_Current;
      Session    : Session_Type);
   procedure Open
     (Separators : String := Use_Current;
      Filename   : String := Use_Current);
   --  Open the first file and initialize the unit. This must be called once
   --  before using Get_Line. It is possible to specify a filename and a set of
   --  separators directly. This offers a quick way to parse a single file.
   --  These parameters will override those specified by Set_FS and Add_File.
   --  File_Error is raised if there is no file associated with Session, or if
   --  the first file associated with Session is no longer readable. It raises
   --  Session_Error if Session is already open.

   procedure Get_Line
     (Callbacks : Callback_Mode := None;
      Session   : Session_Type);
   procedure Get_Line
     (Callbacks : Callback_Mode := None);
   --  Read a line from the current input file. If the file index is at the
   --  end of the current input file (i.e. End_Of_File is True) then the
   --  following file is opened. If there are no more files to be processed,
   --  exception End_Error will be raised. File_Error will be raised if Open
   --  has not been called. The next call to Get_Line will return the following
   --  line in the file. By default the registered callbacks are not called by
   --  Get_Line; this can be activated by setting Callbacks (see Callback_Mode
   --  description above). File_Error may be raised if a file associated with
   --  Session is not readable.
   --
   --  When Callbacks is not None, it is possible to exhaust all the lines
   --  of all the files associated with Session. In this case, File_Error
   --  is not raised.
   --
   --  This procedure can be used from a subprogram called by procedure Parse
   --  or by an instantiation of For_Every_Line (see below).

   function End_Of_Data
     (Session : Session_Type) return Boolean;
   function End_Of_Data
     return Boolean;
   pragma Inline (End_Of_Data);
   --  Returns True if there is no more data to be processed in Session. It
   --  means that the last file of the session is being processed and that
   --  there is no more data to be read in this file (End_Of_File is True).

   function End_Of_File
     (Session : Session_Type) return Boolean;
   function End_Of_File
     return Boolean;
   pragma Inline (End_Of_File);
   --  Returns True when there is no more data to be processed on the current
   --  session's file.

   procedure Close (Session : Session_Type);
   --  Release all data associated with Session. All memory allocated will
   --  be freed, the current file will be closed if needed, the callbacks
   --  will be unregistered. Close is convenient in reestablishing a session
   --  for new use. Get_Line is no longer usable (will raise File_Error)
   --  except after a successful call to Open, Parse or an instantiation
   --  of For_Every_Line.

   -----------------------------
   -- For_Every_Line iterator --
   -----------------------------

   generic
      with procedure Action (Quit : in out Boolean);
   procedure For_Every_Line
     (Separators : String := Use_Current;
      Filename   : String := Use_Current;
      Callbacks  : Callback_Mode := None;
      Session    : Session_Type);
   generic
      with procedure Action (Quit : in out Boolean);
   procedure For_Every_Line_Current_Session
     (Separators : String := Use_Current;
      Filename   : String := Use_Current;
      Callbacks  : Callback_Mode := None);
   --  This is another iterator. Action will be called for each new
   --  record. The iterator's termination can be controlled by setting Quit
   --  to True. It is by default set to False. It is possible to specify a
   --  filename and a set of separators directly. This offers a quick way to
   --  parse a single file. These parameters will override those specified by
   --  Set_FS and Add_File. By default the registered callbacks are not called
   --  by For_Every_Line; this can be activated by setting Callbacks (see
   --  Callback_Mode description above). The Session will be opened and
   --  closed automatically. File_Error is raised if there is no file
   --  associated with Session. It raises Session_Error if Session is already
   --  open.
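   --
   --  A sketch of early termination through Quit, assuming the computer.db
   --  file from the top of this file (Computer_File, Action and Find_Mars
   --  are hypothetical names):
   --
   --        Computer_File : AWK.Session_Type;
   --
   --        procedure Action (Quit : in out Boolean) is
   --        begin
   --           if AWK.Field (1, Computer_File) = "Mars" then
   --              Text_IO.Put_Line (AWK.Field (2, Computer_File));
   --              Quit := True;  --  ask the iterator to stop after this line
   --           end if;
   --        end Action;
   --
   --        procedure Find_Mars is new AWK.For_Every_Line (Action);
   --
   --     begin
   --        Find_Mars (Separators => ";",
   --                   Filename   => "computer.db",
   --                   Session    => Computer_File);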

private
   type Session_Data;
   type Session_Data_Access is access Session_Data;

   type Session_Type is new Ada.Finalization.Limited_Controlled with record
      Data : Session_Data_Access;
      Self : not null access Session_Type := Session_Type'Unchecked_Access;
   end record;

   procedure Initialize (Session : in out Session_Type);
   procedure Finalize   (Session : in out Session_Type);

end GNAT.AWK;