------------------------------------------------------------------------------
--                                                                          --
--                         GNAT COMPILER COMPONENTS                         --
--                                                                          --
--                             G N A T . A W K                              --
--                                                                          --
--                                 S p e c                                  --
--                                                                          --
--                     Copyright (C) 2000-2011, AdaCore                     --
--                                                                          --
-- GNAT is free software; you can redistribute it and/or modify it under    --
-- terms of the GNU General Public License as published by the Free Soft-   --
-- ware Foundation; either version 3, or (at your option) any later ver-    --
-- sion. GNAT is distributed in the hope that it will be useful, but WITH-  --
-- OUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY   --
-- or FITNESS FOR A PARTICULAR PURPOSE.                                     --
--                                                                          --
-- As a special exception under Section 7 of GPL version 3, you are granted --
-- additional permissions described in the GCC Runtime Library Exception,   --
-- version 3.1, as published by the Free Software Foundation.               --
--                                                                          --
-- You should have received a copy of the GNU General Public License and    --
-- a copy of the GCC Runtime Library Exception along with this program;     --
-- see the files COPYING3 and COPYING.RUNTIME respectively. If not, see     --
-- <http://www.gnu.org/licenses/>.                                          --
--                                                                          --
-- GNAT was originally developed by the GNAT team at New York University.   --
-- Extensive contributions were provided by Ada Core Technologies Inc.      --
--                                                                          --
------------------------------------------------------------------------------

--  This is an AWK-like unit. It provides an easy interface for parsing one
--  or more files containing formatted data. The file can be viewed as
--  a database where each record is a line and a field is a data element in
--  this line. In this implementation an AWK record is a line. This means
--  that a record cannot span multiple lines. The operating procedure is to
--  read files line by line, with each line being presented to the user of
--  the package. The interface provides services to access specific fields
--  in the line.
--  Thus it is possible to control actions taken on a line based
--  on values of some fields. This can be achieved directly or by registering
--  callbacks triggered on programmed conditions.
--
--  The state of an AWK run is recorded in an object of type Session_Type.
--  The following is the procedure for using a session to control an
--  AWK run:
--
--     1) Specify which session is to be used. It is possible to use the
--        default session or to create a new one by declaring an object of
--        type Session_Type. For example:
--
--           Computers : Session_Type;
--
--     2) Specify how to cut a line into fields. There are two modes: using
--        character field separators or column widths. This is done by using
--        Set_Field_Separators or Set_Field_Widths. For example by:
--
--           AWK.Set_Field_Separators (";,", Computers);
--
--        or by using the iterators' Separators parameter.
--
--     3) Specify which files to parse. This is done with Add_File/Add_Files
--        services, or by using the iterators' Filename parameter. For
--        example:
--
--           AWK.Add_File ("myfile.db", Computers);
--
--     4) Run the AWK session using one of the provided iterators.
--
--           Parse
--              This is the most automated iterator. You can gain control of
--              the session only by registering one or more callbacks (see
--              Register).
--
--           Get_Line/End_Of_Data
--              This is a manual iterator to be used with a loop. You have
--              complete control over the session. You can use callbacks but
--              this is not required.
--
--           For_Every_Line
--              This provides a mixture of manual/automated iterator action.
--
--        Examples of these three approaches appear below.
--
--  There are many ways to use this package. The following discussion shows
--  three approaches to using this package, using the three iterator forms.
--  All examples will use the following file (computer.db):
--
--     Pluton;Windows-NT;Pentium III
--     Mars;Linux;Pentium Pro
--     Venus;Solaris;Sparc
--     Saturn;OS/2;i486
--     Jupiter;MacOS;PPC
--
--  1) Using Parse iterator
--
--     Here the first step is to register some action associated with a
--     pattern and then to call the Parse iterator (this is the simplest way
--     to use this unit). The default session is used here. For example, to
--     output the second field (the OS) of computer "Saturn":
--
--           procedure Action is
--           begin
--              Put_Line (AWK.Field (2));
--           end Action;
--
--        begin
--           AWK.Register (1, "Saturn", Action'Access);
--           AWK.Parse (";", "computer.db");
--
--
--  2) Using the Get_Line/End_Of_Data iterator
--
--     Here you have full control. For example, to do the same as
--     above but using a specific session, you could write:
--
--           Computer_File : Session_Type;
--
--        begin
--           AWK.Set_Current (Computer_File);
--           AWK.Open (Separators => ";",
--                     Filename   => "computer.db");
--
--           --  Display Saturn OS
--
--           while not AWK.End_Of_Data loop
--              AWK.Get_Line;
--
--              if AWK.Field (1) = "Saturn" then
--                 Put_Line (AWK.Field (2));
--              end if;
--           end loop;
--
--           AWK.Close (Computer_File);
--
--
--  3) Using For_Every_Line iterator
--
--     In this case you use a provided iterator and you pass the procedure
--     that must be called for each record.
--     The previous example could be coded as follows (using the iterator
--     quick interface but without using the current session):
--
--           Computer_File : Session_Type;
--
--           procedure Action (Quit : in out Boolean) is
--           begin
--              if AWK.Field (1, Computer_File) = "Saturn" then
--                 Put_Line (AWK.Field (2, Computer_File));
--              end if;
--           end Action;
--
--           procedure Look_For_Saturn is
--              new AWK.For_Every_Line (Action);
--
--        begin
--           Look_For_Saturn (Separators => ";",
--                            Filename   => "computer.db",
--                            Session    => Computer_File);
--
--           Integer_Text_IO.Put
--             (Integer (AWK.NR (Session => Computer_File)));
--           Put_Line (" line(s) have been processed.");
--
--  You can also use a regular expression for the pattern. Let us output
--  the computer name for all computers for which the OS has a character
--  O in its name.
--
--           Regexp  : String := ".*O.*";
--
--           Matcher : Regpat.Pattern_Matcher := Regpat.Compile (Regexp);
--
--           procedure Action is
--           begin
--              Text_IO.Put_Line (AWK.Field (1));
--           end Action;
--
--        begin
--           AWK.Register (2, Matcher, Action'Unrestricted_Access);
--           AWK.Parse (";", "computer.db");
--

with Ada.Finalization;
with GNAT.Regpat;

package GNAT.AWK is

   Session_Error : exception;
   --  Raised when a Session is reused but is not closed

   File_Error : exception;
   --  Raised when there is a file problem (see below)

   End_Error : exception;
   --  Raised when an attempt is made to read beyond the end of the last
   --  file of a session.

   Field_Error : exception;
   --  Raised when accessing a field value which does not exist

   Data_Error : exception;
   --  Raised when it is impossible to convert a field value to a specific type

   type Count is new Natural;

   type Widths_Set is array (Positive range <>) of Positive;
   --  Used to store a set of column widths

   Default_Separators : constant String := " " & ASCII.HT;

   Use_Current : constant String := "";
   --  Value used when no separator or filename is specified in iterators

   type Session_Type is limited private;
   --  This is the main exported type. A session is used to keep the state of
   --  a full AWK run. The state comprises a list of files, the current file,
   --  the number of lines processed, the current line, the number of fields
   --  in the current line... A default session is provided (see Set_Current,
   --  Current_Session and Default_Session below).

   ----------------------------
   -- Package initialization --
   ----------------------------

   --  To be thread safe it is not possible to use the default provided
   --  session. Each task must use a specific session and specify it
   --  explicitly for every service.

   procedure Set_Current (Session : Session_Type);
   --  Set the session to be used by default. This session will be used when
   --  the Session parameter in following services is not specified.

   function Current_Session return not null access Session_Type;
   --  Returns the session used by default by all services. This is the
   --  latest session specified by Set_Current service or the session
   --  provided by default with this implementation.

   function Default_Session return not null access Session_Type;
   --  Returns the default session provided by this package. Note that this is
   --  the session returned by Current_Session if Set_Current has not been
   --  used.
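   --
   --  A minimal sketch of the two ways to designate a session (not part of
   --  the original unit documentation). The session name Logs and the file
   --  name "app.log" are hypothetical, and the fragment assumes "use GNAT;":
   --
   --        Logs : AWK.Session_Type;
   --
   --     begin
   --        --  Explicit form: name the session in every call. This is the
   --        --  form to use from tasks.
   --
   --        AWK.Add_File ("app.log", Logs);
   --
   --        --  Implicit form: make Logs the current session, after which the
   --        --  services without a Session parameter operate on it.
   --
   --        AWK.Set_Current (Logs);
   --        AWK.Set_Field_Separators (";");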

   procedure Set_Field_Separators
     (Separators : String := Default_Separators;
      Session    : Session_Type);
   procedure Set_Field_Separators
     (Separators : String := Default_Separators);
   --  Set the field separators. Each character in the string is a field
   --  separator. When a line is read it will be split by field using the
   --  separators set here. Separators can be changed at any point and in this
   --  case the current line is split according to the new separators. In the
   --  special case that Separators is a space and a tabulation
   --  (Default_Separators), fields are separated by runs of spaces and/or
   --  tabs.

   procedure Set_FS
     (Separators : String := Default_Separators;
      Session    : Session_Type)
      renames Set_Field_Separators;
   procedure Set_FS
     (Separators : String := Default_Separators)
      renames Set_Field_Separators;
   --  FS is the AWK abbreviation for above service

   procedure Set_Field_Widths
     (Field_Widths : Widths_Set;
      Session      : Session_Type);
   procedure Set_Field_Widths
     (Field_Widths : Widths_Set);
   --  This is another way to split a line by giving the length (in number of
   --  characters) of each field in a line. Field widths can be changed at any
   --  point and in this case the current line is split according to the new
   --  field lengths. A line split with this method must have a length equal
   --  to or greater than the total of the field widths. All characters
   --  remaining on the line after the last field are added to a new
   --  automatically created field.

   procedure Add_File
     (Filename : String;
      Session  : Session_Type);
   procedure Add_File
     (Filename : String);
   --  Add Filename to the list of files to be processed. There is no limit on
   --  the number of files that can be added. Files are processed in the order
   --  they have been added (i.e. the filename list is FIFO).
   --  If Filename does not exist or if it is not readable, File_Error is
   --  raised.

   procedure Add_Files
     (Directory             : String;
      Filenames             : String;
      Number_Of_Files_Added : out Natural;
      Session               : Session_Type);
   procedure Add_Files
     (Directory             : String;
      Filenames             : String;
      Number_Of_Files_Added : out Natural);
   --  Add all files matching the regular expression Filenames in the specified
   --  directory to the list of files to be processed. There is no limit on
   --  the number of files that can be added. Files are processed in the
   --  order they have been added (i.e. the filename list is FIFO).
   --  The number of files (possibly 0) added is returned in
   --  Number_Of_Files_Added.

   -------------------------------------
   -- Information about current state --
   -------------------------------------

   function Number_Of_Fields
     (Session : Session_Type) return Count;
   function Number_Of_Fields
     return Count;
   pragma Inline (Number_Of_Fields);
   --  Returns the number of fields in the current record. It returns 0 when
   --  no file is being processed.

   function NF
     (Session : Session_Type) return Count
      renames Number_Of_Fields;
   function NF
     return Count
      renames Number_Of_Fields;
   --  AWK abbreviation for above service

   function Number_Of_File_Lines
     (Session : Session_Type) return Count;
   function Number_Of_File_Lines
     return Count;
   pragma Inline (Number_Of_File_Lines);
   --  Returns the current line number in the processed file. It returns 0 when
   --  no file is being processed.
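   --
   --  A minimal sketch (not part of the original unit documentation) that
   --  reports the position and field count of every record on the default
   --  session. The procedure name Report is hypothetical. The fragment
   --  assumes "with Ada.Text_IO; use Ada;" and "use GNAT;", and that Report
   --  is declared at library level so that 'Access is legal:
   --
   --        procedure Report is
   --        begin
   --           Text_IO.Put_Line
   --             (AWK.File
   --              & ", line" & AWK.Count'Image (AWK.Number_Of_File_Lines)
   --              & ":" & AWK.Count'Image (AWK.Number_Of_Fields)
   --              & " field(s)");
   --        end Report;
   --
   --     begin
   --        AWK.Register (Report'Access);
   --        AWK.Parse (";", "computer.db");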

   function FNR (Session : Session_Type) return Count
     renames Number_Of_File_Lines;
   function FNR return Count
     renames Number_Of_File_Lines;
   --  AWK abbreviation for above service

   function Number_Of_Lines
     (Session : Session_Type) return Count;
   function Number_Of_Lines
     return Count;
   pragma Inline (Number_Of_Lines);
   --  Returns the number of lines processed so far. This is equal to the
   --  number of lines in each already processed file plus FNR. It returns 0
   --  when no file is being processed.

   function NR (Session : Session_Type) return Count
     renames Number_Of_Lines;
   function NR return Count
     renames Number_Of_Lines;
   --  AWK abbreviation for above service

   function Number_Of_Files
     (Session : Session_Type) return Natural;
   function Number_Of_Files
     return Natural;
   pragma Inline (Number_Of_Files);
   --  Returns the number of files associated with Session. This is the total
   --  number of files added with Add_File and Add_Files services.

   function File (Session : Session_Type) return String;
   function File return String;
   --  Returns the name of the file being processed. It returns the empty
   --  string when no file is being processed.

   ---------------------
   -- Field accessors --
   ---------------------

   function Field
     (Rank    : Count;
      Session : Session_Type) return String;
   function Field
     (Rank : Count) return String;
   --  Returns the value of field number Rank of the current record. If
   --  Rank = 0 it returns the current record (i.e. the line as read in the
   --  file). It raises Field_Error if Rank > NF or if Session is not open.

   function Field
     (Rank    : Count;
      Session : Session_Type) return Integer;
   function Field
     (Rank : Count) return Integer;
   --  Returns the value of field number Rank of the current record as an
   --  integer. It raises Field_Error if Rank > NF or if Session is not open.
   --  It raises Data_Error if the field value cannot be converted to an
   --  integer.

   function Field
     (Rank    : Count;
      Session : Session_Type) return Float;
   function Field
     (Rank : Count) return Float;
   --  Returns the value of field number Rank of the current record as a
   --  float. It raises Field_Error if Rank > NF or if Session is not open. It
   --  raises Data_Error if the field value cannot be converted to a float.

   generic
      type Discrete is (<>);
   function Discrete_Field
     (Rank    : Count;
      Session : Session_Type) return Discrete;
   generic
      type Discrete is (<>);
   function Discrete_Field_Current_Session
     (Rank : Count) return Discrete;
   --  Returns the value of field number Rank of the current record as a
   --  value of type Discrete. It raises Field_Error if Rank > NF. It raises
   --  Data_Error if the field value cannot be converted to type Discrete.

   --------------------
   -- Pattern/Action --
   --------------------

   --  AWK defines rules like "PATTERN { ACTION }", which means that ACTION
   --  will be executed if PATTERN matches. A pattern in this implementation
   --  can be a simple string (match function is equality), a regular
   --  expression, or a function returning a boolean. An action is associated
   --  with a pattern using the Register services.
   --
   --  Each procedure Register will add a rule to the set of rules for the
   --  session. Rules are examined in the order they have been added.

   type Pattern_Callback is access function return Boolean;
   --  This is a pattern function pointer. When it returns True the associated
   --  action will be called.

   type Action_Callback is access procedure;
   --  A simple action pointer

   type Match_Action_Callback is
     access procedure (Matches : GNAT.Regpat.Match_Array);
   --  An advanced action pointer used with a regular expression pattern. It
   --  receives an array of all the matches.
   --  See GNAT.Regpat for further information.

   procedure Register
     (Field   : Count;
      Pattern : String;
      Action  : Action_Callback;
      Session : Session_Type);
   procedure Register
     (Field   : Count;
      Pattern : String;
      Action  : Action_Callback);
   --  Register an Action associated with a Pattern. The pattern here is a
   --  simple string that must match exactly the field number specified.

   procedure Register
     (Field   : Count;
      Pattern : GNAT.Regpat.Pattern_Matcher;
      Action  : Action_Callback;
      Session : Session_Type);
   procedure Register
     (Field   : Count;
      Pattern : GNAT.Regpat.Pattern_Matcher;
      Action  : Action_Callback);
   --  Register an Action associated with a Pattern. The pattern here is a
   --  simple regular expression which must match the field number specified.

   procedure Register
     (Field   : Count;
      Pattern : GNAT.Regpat.Pattern_Matcher;
      Action  : Match_Action_Callback;
      Session : Session_Type);
   procedure Register
     (Field   : Count;
      Pattern : GNAT.Regpat.Pattern_Matcher;
      Action  : Match_Action_Callback);
   --  Same as above but it passes the set of matches to the action
   --  procedure. This is useful to analyse further why and where a regular
   --  expression did match.

   procedure Register
     (Pattern : Pattern_Callback;
      Action  : Action_Callback;
      Session : Session_Type);
   procedure Register
     (Pattern : Pattern_Callback;
      Action  : Action_Callback);
   --  Register an Action associated with a Pattern. The pattern here is a
   --  function that must return a boolean. Action callback will be called if
   --  the pattern callback returns True and nothing will happen if it is
   --  False. This version is more general; the Register services above
   --  trigger an action based on the value of a single field only.
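   --
   --  A minimal sketch of this form (not part of the original unit
   --  documentation). The names Has_Three_Fields and Show_Record are
   --  hypothetical. The fragment assumes "with Ada.Text_IO; use Ada;" and
   --  "use GNAT;", and that both subprograms are declared at library level
   --  so that 'Access is legal:
   --
   --        use type AWK.Count;
   --
   --        function Has_Three_Fields return Boolean is
   --        begin
   --           return AWK.NF = 3;
   --        end Has_Three_Fields;
   --
   --        procedure Show_Record is
   --        begin
   --           Text_IO.Put_Line (AWK.Field (0));
   --        end Show_Record;
   --
   --     begin
   --        AWK.Register (Has_Three_Fields'Access, Show_Record'Access);
   --        AWK.Parse (";", "computer.db");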

   procedure Register
     (Action  : Action_Callback;
      Session : Session_Type);
   procedure Register
     (Action : Action_Callback);
   --  Register an Action that will be called for every line. This is
   --  equivalent to a Pattern_Callback function always returning True.

   --------------------
   -- Parse iterator --
   --------------------

   procedure Parse
     (Separators : String := Use_Current;
      Filename   : String := Use_Current;
      Session    : Session_Type);
   procedure Parse
     (Separators : String := Use_Current;
      Filename   : String := Use_Current);
   --  Launch the iterator. It will read every line in all of the session's
   --  files. Registered callbacks are then called if the associated
   --  pattern matches. It is possible to specify a filename and a set of
   --  separators directly. This offers a quick way to parse a single
   --  file. These parameters will override those specified by Set_FS and
   --  Add_File. The Session will be opened and closed automatically.
   --  File_Error is raised if there is no file associated with Session, or if
   --  a file associated with Session is no longer readable. It raises
   --  Session_Error if Session is already open.

   -----------------------------------
   -- Get_Line/End_Of_Data Iterator --
   -----------------------------------

   type Callback_Mode is (None, Only, Pass_Through);
   --  These modes are used for Get_Line/End_Of_Data and For_Every_Line
   --  iterators. The associated semantics are:
   --
   --    None
   --       Callbacks are not active. This is the default mode for
   --       Get_Line/End_Of_Data and For_Every_Line iterators.
   --
   --    Only
   --       Callbacks are active. If at least one pattern matches, the
   --       associated action is called and this line will not be passed to
   --       the user.
   --       In the Get_Line case the next line will be read (if there is some
   --       line remaining); in the For_Every_Line case Action will
   --       not be called for this line.
   --
   --    Pass_Through
   --       Callbacks are active. For each pattern that matches, the
   --       associated action is called. Then the line is passed to the user.
   --       It means that the Action procedure is called in the
   --       For_Every_Line case and that Get_Line returns with the current
   --       line active.
   --

   procedure Open
     (Separators : String := Use_Current;
      Filename   : String := Use_Current;
      Session    : Session_Type);
   procedure Open
     (Separators : String := Use_Current;
      Filename   : String := Use_Current);
   --  Open the first file and initialize the unit. This must be called once
   --  before using Get_Line. It is possible to specify a filename and a set of
   --  separators directly. This offers a quick way to parse a single file.
   --  These parameters will override those specified by Set_FS and Add_File.
   --  File_Error is raised if there is no file associated with Session, or if
   --  the first file associated with Session is no longer readable. It raises
   --  Session_Error if Session is already open.

   procedure Get_Line
     (Callbacks : Callback_Mode := None;
      Session   : Session_Type);
   procedure Get_Line
     (Callbacks : Callback_Mode := None);
   --  Read a line from the current input file. If the file index is at the
   --  end of the current input file (i.e. End_Of_File is True) then the
   --  following file is opened. If there are no more files to be processed,
   --  exception End_Error will be raised. File_Error will be raised if Open
   --  has not been called. Next call to Get_Line will return the following
   --  line in the file. By default the registered callbacks are not called by
   --  Get_Line; this can be activated by setting Callbacks (see Callback_Mode
   --  description above).
   --  File_Error may be raised if a file associated with Session is not
   --  readable.
   --
   --  When Callbacks is not None, it is possible to exhaust all the lines
   --  of all the files associated with Session. In this case, File_Error
   --  is not raised.
   --
   --  This procedure can be used from a subprogram called by procedure Parse
   --  or by an instantiation of For_Every_Line (see below).

   function End_Of_Data
     (Session : Session_Type) return Boolean;
   function End_Of_Data
     return Boolean;
   pragma Inline (End_Of_Data);
   --  Returns True if there is no more data to be processed in Session. It
   --  means that the last file of the session is being processed and that
   --  there is no more data to be read in this file (End_Of_File is True).

   function End_Of_File
     (Session : Session_Type) return Boolean;
   function End_Of_File
     return Boolean;
   pragma Inline (End_Of_File);
   --  Returns True when there is no more data to be processed on the current
   --  session's file.

   procedure Close (Session : Session_Type);
   --  Release all data associated with Session. All memory allocated will
   --  be freed, the current file will be closed if needed, the callbacks
   --  will be unregistered. Close is convenient in reestablishing a session
   --  for new use. Get_Line is no longer usable (will raise File_Error)
   --  except after a successful call to Open, Parse or an instantiation
   --  of For_Every_Line.
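   --
   --  A minimal sketch of a complete manual run over two files with an
   --  explicit session (not part of the original unit documentation). The
   --  session name Logs and the file names are hypothetical, and the
   --  fragment assumes "with Ada.Text_IO; use Ada;" and "use GNAT;":
   --
   --        Logs : AWK.Session_Type;
   --
   --     begin
   --        AWK.Set_Field_Separators (";", Logs);
   --        AWK.Add_File ("first.db", Logs);
   --        AWK.Add_File ("second.db", Logs);
   --        AWK.Open (Session => Logs);
   --
   --        --  End_Of_Data, unlike End_Of_File, only becomes True once the
   --        --  last file of the session has been read completely.
   --
   --        while not AWK.End_Of_Data (Logs) loop
   --           AWK.Get_Line (Session => Logs);
   --           Text_IO.Put_Line
   --             (AWK.File (Logs) & ": " & AWK.Field (0, Logs));
   --        end loop;
   --
   --        AWK.Close (Logs);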

   -----------------------------
   -- For_Every_Line iterator --
   -----------------------------

   generic
      with procedure Action (Quit : in out Boolean);
   procedure For_Every_Line
     (Separators : String        := Use_Current;
      Filename   : String        := Use_Current;
      Callbacks  : Callback_Mode := None;
      Session    : Session_Type);
   generic
      with procedure Action (Quit : in out Boolean);
   procedure For_Every_Line_Current_Session
     (Separators : String        := Use_Current;
      Filename   : String        := Use_Current;
      Callbacks  : Callback_Mode := None);
   --  This is another iterator. Action will be called for each new
   --  record. The iterator's termination can be controlled by setting Quit
   --  to True. It is by default set to False. It is possible to specify a
   --  filename and a set of separators directly. This offers a quick way to
   --  parse a single file. These parameters will override those specified by
   --  Set_FS and Add_File. By default the registered callbacks are not called
   --  by For_Every_Line; this can be activated by setting Callbacks (see
   --  Callback_Mode description above). The Session will be opened and
   --  closed automatically. File_Error is raised if there is no file
   --  associated with Session. It raises Session_Error if Session is already
   --  open.

private
   type Session_Data;
   type Session_Data_Access is access Session_Data;

   type Session_Type is new Ada.Finalization.Limited_Controlled with record
      Data : Session_Data_Access;
      Self : not null access Session_Type := Session_Type'Unchecked_Access;
   end record;

   procedure Initialize (Session : in out Session_Type);
   procedure Finalize   (Session : in out Session_Type);

end GNAT.AWK;