\input texinfo   @c -*-texinfo-*-
@c %**start of header (This is for running Texinfo on a region.)
@setfilename gawkinet.info
@settitle TCP/IP Internetworking With @command{gawk}
@c %**end of header (This is for running Texinfo on a region.)
@c FIXME: web vs. Web
@c Correct spelling of web is still under discussion.
@c https://english.stackexchange.com/questions/120869/should-i-capitalize-the-word-web-in-this-sentence
@c We leave the many occurrences of web in this file as they are.

@dircategory Network applications
@direntry
* awkinet: (gawkinet).          TCP/IP Internetworking With `gawk'.
@end direntry

@iftex
@set DOCUMENT book
@set CHAPTER chapter
@set SECTION section
@set DARKCORNER @inmargin{@image{lflashlight,1cm}, @image{rflashlight,1cm}}
@end iftex
@ifinfo
@set DOCUMENT Info file
@set CHAPTER major node
@set SECTION node
@set DARKCORNER (d.c.)
@end ifinfo
@ifhtml
@set DOCUMENT web page
@set CHAPTER chapter
@set SECTION section
@set DARKCORNER (d.c.)
@end ifhtml

@set FSF

@set FN file name
@set FFN File Name

@c merge the function and variable indexes into the concept index
@ifinfo
@synindex fn cp
@synindex vr cp
@end ifinfo
@iftex
@syncodeindex fn cp
@syncodeindex vr cp
@end iftex

@c If "finalout" is commented out, the printed output will show
@c black boxes that mark lines that are too long.  Thus, it is
@c unwise to comment it out when running a master in case there are
@c overfulls which are deemed okay.

@iftex
@finalout
@end iftex

@smallbook

@c Special files are described in chapter 6 Printing Output under
@c 6.7 Special File Names in gawk. I think the networking does not
@c fit into that chapter, thus this separate document. At over 50
@c pages, I think this is the right decision. ADR.

@set TITLE TCP/IP Internetworking with @command{gawk}
@set EDITION 1.6
@set UPDATE-MONTH November, 2020
@c gawk versions:
@set VERSION 5.1
@set PATCHLEVEL 0

@copying
This is Edition @value{EDITION} of @cite{@value{TITLE}},
for the @value{VERSION}.@value{PATCHLEVEL} (or later) version of the GNU
implementation of AWK.
@sp 2
Copyright (C) 2000, 2001, 2002, 2004, 2009, 2010, 2016, 2019, 2020, 2021
Free Software Foundation, Inc.
@sp 2
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``GNU General Public License'', the Front-Cover
texts being (a) (see below), and with the Back-Cover Texts being (b)
(see below).  A copy of the license is included in the section entitled
``GNU Free Documentation License''.

@enumerate a
@item
``A GNU Manual''

@item
``You have the freedom to
copy and modify this GNU manual.  Buying copies from the FSF
supports it in developing GNU and promoting software freedom.''
@end enumerate
@end copying

@setchapternewpage odd

@titlepage
@title @value{TITLE}
@subtitle Edition @value{EDITION}
@subtitle @value{UPDATE-MONTH}
@author J@"urgen Kahrs
@author with Arnold D. Robbins

@c Include the Distribution inside the titlepage environment so
@c that headings are turned off.  Headings on and off do not work.

@page
@vskip 0pt plus 1filll
@sp 2
Published by:
@sp 1

Free Software Foundation @*
51 Franklin Street, Fifth Floor @*
Boston, MA 02110-1301 USA @*
Phone: +1-617-542-5942 @*
Fax: +1-617-542-2652 @*
Email: @email{gnu@@gnu.org} @*
URL: @uref{http://www.gnu.org/} @*

ISBN 1-882114-93-0 @*

@insertcopying

@c @sp 2
@c Cover art by ?????.
@end titlepage

@iftex
@headings off
@evenheading @thispage@ @ @ @strong{@value{TITLE}} @| @|
@oddheading  @| @| @strong{@thischapter}@ @ @ @thispage
@end iftex

@ifnottex
@node Top, Preface, (dir), (dir)
@top General Introduction
@comment node-name, next,          previous, up

This file documents the networking features in GNU Awk (@command{gawk})
version 4.0 and later.

@insertcopying
@end ifnottex

@menu
* Preface::                        About this document.
* Introduction::                   About networking.
* Using Networking::               Some examples.
* Some Applications and Techniques:: More extended examples.
* Links::                          Where to find the stuff mentioned in this
                                   document.
* GNU Free Documentation License:: The license for this document.
* Index::                          The index.

@detailmenu
* Stream Communications::          Sending data streams.
* Datagram Communications::        Sending self-contained messages.
* The TCP/IP Protocols::           How these models work in the Internet.
* Basic Protocols::                The basic protocols.
* Ports::                          The idea behind ports.
* Making Connections::             Making TCP/IP connections.
* Gawk Special Files::             How to do @command{gawk} networking.
* Special File Fields::            The fields in the special file name.
* Comparing Protocols::            Differences between the protocols.
* File /inet/tcp::                 The TCP special file.
* File /inet/udp::                 The UDP special file.
* TCP Connecting::                 Making a TCP connection.
* Troubleshooting::                Troubleshooting TCP/IP connections.
* Interacting::                    Interacting with a service.
* Setting Up::                     Setting up a service.
* Email::                          Reading email.
* Web page::                       Reading a Web page.
* Primitive Service::              A primitive Web service.
* Interacting Service::            A Web service with interaction.
* CGI Lib::                        A simple CGI library.
* Simple Server::                  A simple Web server.
* Caveats::                        Network programming caveats.
* Challenges::                     Where to go from here.
* PANIC::                          An Emergency Web Server.
* GETURL::                         Retrieving Web Pages.
* REMCONF::                        Remote Configuration Of Embedded Systems.
* URLCHK::                         Look For Changed Web Pages.
* WEBGRAB::                        Extract Links From A Page.
* STATIST::                        Graphing A Statistical Distribution.
* MAZE::                           Walking Through A Maze In Virtual Reality.
* MOBAGWHO::                       A Simple Mobile Agent.
* STOXPRED::                       Stock Market Prediction As A Service.
* PROTBASE::                       Searching Through A Protein Database.
@end detailmenu
@end menu

@contents

@node Preface, Introduction, Top, Top
@unnumbered Preface

In May of 1997, J@"urgen Kahrs felt the need for network access
from @command{awk}, and, with a little help from me, set about adding
features to do this for @command{gawk}.  At that time, he
wrote the bulk of this @value{DOCUMENT}.

The code and documentation were added to the @command{gawk} 3.1 development
tree, and languished somewhat until I could finally get
down to some serious work on that version of @command{gawk}.
This finally happened in the middle of 2000.

Meantime, J@"urgen wrote an article about the Internet special
files and @samp{|&} operator for @cite{Linux Journal}, and made a
networking patch for the production versions of @command{gawk}
available from his home page.
In August of 2000 (for @command{gawk} 3.0.6), this patch
also made it to the main GNU @command{ftp} distribution site.

For release with @command{gawk}, I edited J@"urgen's prose
for English grammar and style, as he is not a native English
speaker.  I also
rearranged the material somewhat for what I felt was a better order of
presentation, and (re)wrote some of the introductory material.

The majority of this document and the code are his work, and the
high quality and interesting ideas speak for themselves.  It is my
hope that these features will be of significant value to the @command{awk}
community.

@sp 1
@noindent
Arnold Robbins @*
Nof Ayalon, ISRAEL @*
March, 2001

@c system if test ! -d eg      ; then mkdir eg      ; fi
@c system if test ! -d eg/network ; then mkdir eg/network ; fi
@node Introduction, Using Networking, Preface, Top
@chapter Networking Concepts

This @value{CHAPTER} provides a (necessarily) brief introduction to
computer networking concepts.  For many applications of @command{gawk}
to TCP/IP networking, we hope that this is enough.  For more
advanced tasks, you will need deeper background, and it may be necessary
to switch to lower-level programming in C or C++.

There are two real-life models for the way computers send messages
to each other over a network.  While the analogies are not perfect,
they are close enough to convey the major concepts.
These two models are the phone system (reliable byte-stream communications),
and the postal system (best-effort datagrams).

@menu
* Stream Communications::       Sending data streams.
* Datagram Communications::     Sending self-contained messages.
* The TCP/IP Protocols::        How these models work in the Internet.
* Making Connections::          Making TCP/IP connections.
@end menu

@node Stream Communications, Datagram Communications, Introduction, Introduction
@section Reliable Byte-streams (Phone Calls)

When you make a phone call, the following steps occur:

@enumerate
@item
You dial a number.

@item
The phone system connects to the called party, telling
them there is an incoming call. (Their phone rings.)

@item
The other party answers the call, or, in the case of a
computer network, refuses to answer the call.

@item
Assuming the other party answers, the connection between
you is now a @dfn{duplex} (two-way), @dfn{reliable} (no data lost),
sequenced (data comes out in the order sent) data stream.

@item
You and your friend may now talk freely, with the phone system
moving the data (your voices) from one end to the other.
From your point of view, you have a direct end-to-end
connection with the person on the other end.
@end enumerate

The same steps occur in a duplex reliable computer networking connection.
There is considerably more overhead in setting up the communications,
but once it's done, data moves in both directions, reliably, in sequence.

@node Datagram Communications, The TCP/IP Protocols, Stream Communications, Introduction
@section Best-effort Datagrams (Mailed Letters)

Suppose you mail three different documents to your office on the
other side of the country on two different days.  Doing so
entails the following.

@enumerate
@item
Each document travels in its own envelope.

@item
Each envelope contains both the sender and the
recipient address.

@item
Each envelope may travel a different route to its destination.

@item
The envelopes may arrive in a different order from the one
in which they were sent.

@item
One or more may get lost in the mail.
(Although, fortunately, this does not occur very often.)

@item
In a computer network, one or more @dfn{packets}
may also arrive multiple times.  (This doesn't happen
with the postal system!)

@end enumerate

The important characteristics of datagram communications, like
those of the postal system are thus:

@itemize @bullet
@item
Delivery is ``best effort;'' the data may never get there.

@item
Each message is self-contained, including the source and
destination addresses.

@item
Delivery is @emph{not} sequenced; packets may arrive out
of order, and/or multiple times.

@item
Unlike the phone system, overhead is considerably lower.
It is not necessary to set up the call first.
@end itemize

The price the user pays for the lower overhead of datagram communications
is exactly the lower reliability; it is often necessary for user-level
protocols that use datagram communications to add their own reliability
features on top of the basic communications.

@node The TCP/IP Protocols, Making Connections, Datagram Communications, Introduction
@section The Internet Protocols

The Internet Protocol Suite (usually referred to as just TCP/IP)@footnote{It
should be noted that although the Internet seems to have conquered the
world, there are other networking protocol suites in existence and in use.}
consists of a number of different protocols at different levels or ``layers.''
For our purposes, three protocols provide the fundamental communications
mechanisms.  All other defined protocols are referred to as user-level
protocols (e.g., HTTP, used later in this @value{DOCUMENT}).

@menu
* Basic Protocols::             The basic protocols.
* Ports::                       The idea behind ports.
@end menu

@node Basic Protocols, Ports, The TCP/IP Protocols, The TCP/IP Protocols
@subsection The Basic Internet Protocols

@table @asis
@item IP
The Internet Protocol.  This protocol is almost never used directly by
applications.  It provides the basic packet delivery and routing infrastructure
of the Internet.  Much like the phone company's switching centers or the Post
Office's trucks, it is not of much day-to-day interest to the regular user
(or programmer).
It happens to be a best effort datagram protocol.
In the early twenty-first century, there are two versions of this protocol
in use:

@table @asis
@item IPv4
The original version of the Internet Protocol, with 32-bit addresses, on which
most of the current Internet is based.

@item IPv6
The ``next generation'' of the Internet Protocol, with 128-bit addresses.
This protocol is in wide use in certain parts of the world, but has not
yet replaced IPv4.@footnote{There isn't an IPv5.}
@end table

Versions of the other protocols that sit ``atop'' IP exist for both
IPv4 and IPv6. However, as the IPv6 versions are fundamentally the same
as the original IPv4 versions, we will not distinguish further between them.

@item UDP
The User Datagram Protocol.  This is a best effort datagram protocol.
It provides a small amount of extra reliability over IP, and adds
the notion of @dfn{ports}, described in @ref{Ports, ,TCP and UDP Ports}.

@item TCP
The Transmission Control Protocol.  This is a duplex, reliable, sequenced
byte-stream protocol, again layered on top of IP, and also providing the
notion of ports.  This is the protocol that you will most likely use
when using @command{gawk} for network programming.
@end table

All other user-level protocols use either TCP or UDP to do their basic
communications.  Examples are SMTP (Simple Mail Transfer Protocol),
FTP (File Transfer Protocol), and HTTP (HyperText Transfer Protocol).
@cindex SMTP (Simple Mail Transfer Protocol)
@cindex Simple Mail Transfer Protocol (SMTP)
@cindex FTP (File Transfer Protocol)
@cindex HTTP (Hypertext Transfer Protocol)

@node Ports,  , Basic Protocols, The TCP/IP Protocols
@subsection TCP and UDP Ports

In the postal system, the address on an envelope indicates a physical
location, such as a residence or office building.  But there may be
more than one person at the location; thus you have to further qualify
the recipient by putting a person or company name on the envelope.

In the phone system, one phone number may represent an entire company,
in which case you need a person's extension number in order to
reach that individual directly.  Or, when you call a home, you have to
say, ``May I please speak to ...'' before talking to the person directly.

IP networking provides the concept of addressing.  An IP address represents
a particular computer, but no more.  In order to reach the mail service
on a system, or the FTP or WWW service on a system, you must have some
way to further specify which service you want.  In the Internet Protocol suite,
this is done with @dfn{port numbers}, which represent the services, much
like an extension number used with a phone number.

Port numbers are 16-bit integers.  Unix and Unix-like systems reserve ports
below 1024 for ``well known'' services, such as SMTP, FTP, and HTTP.
Numbers 1024 and above may be used by any application, although there is no
promise made that a particular port number is always available.

@node Making Connections,  , The TCP/IP Protocols, Introduction
@section Making TCP/IP Connections (And Some Terminology)

Two terms come up repeatedly when discussing networking:
@dfn{client} and @dfn{server}.  For now, we'll discuss these terms
at the @dfn{connection level}, when first establishing connections
between two processes on different systems over a network.
(Once the connection is established, the higher level, or
@dfn{application level} protocols,
such as HTTP or FTP, determine who is the client and who is the
server.  Often, it turns out that the client and server are the
same in both roles.)

@cindex servers
The @dfn{server} is the system providing the service, such as the
web server or email server.  It is the @dfn{host} (system) which
is @emph{connected to} in a transaction.
For this to work though, the server must be expecting connections.
Much as there has to be someone at the office building to answer
the phone,@footnote{In the days before voice mail systems!} the
server process (usually) has to be started first and be waiting
for a connection.

@cindex clients
The @dfn{client} is the system requesting the service.
It is the system @emph{initiating the connection} in a transaction.
(Just as when you pick up the phone to call an office or store.)

In the TCP/IP framework, each end of a connection is represented by an
(@var{address}, @var{port}) pair.  For the duration of the connection,
the ports in use at each end are unique, and cannot be used simultaneously
by other processes on the same system.  (Only after closing a connection
can a new one be built up on the same port. This is contrary to the usual
behavior of fully developed web servers which have to avoid situations
in which they are not reachable. We have to pay this price in order to
enjoy the benefits of a simple communication paradigm in @command{gawk}.)

@cindex blocking
@cindex synchronous communications
Furthermore, once the connection is established, communications are
@dfn{synchronous}.@footnote{For the technically savvy, data reads
block---if there's no incoming data, the program is made to wait until
there is, instead of receiving a ``there's no data'' error return.} I.e.,
each end waits on the other to finish transmitting, before replying.  This
is much like two people in a phone conversation.  While both could talk
simultaneously, doing so usually doesn't work too well.

In the case of TCP, the synchronicity is enforced by the protocol when
sending data.  Data writes @dfn{block} until the data have been received on the
other end.  For both TCP and UDP, data reads block until there is incoming
data waiting to be read.  This is summarized in the following table,
where an ``x'' indicates that the given action blocks.

@ifnottex
@multitable {Protocol} {Reads} {Writes}
@headitem Protocol @tab Reads @tab Writes
@item TCP @tab x @tab x
@item UDP @tab x @tab
@end multitable
@end ifnottex
@tex
\centerline{
\vbox{\bigskip % space above the table (about 1 linespace)
% Because we have vertical rules, we can't let TeX insert interline space
% in its usual way.
\offinterlineskip
\halign{\hfil\strut# &\vrule #&  \hfil#\hfil& \hfil#\hfil\cr
Protocol&&\quad Reads\quad &Writes\cr
\noalign{\hrule}
\omit&height 2pt\cr
\noalign{\hrule height0pt}% without this the rule does not extend; why?
TCP&&X&X\cr
UDP&&X&\cr
}}}
@end tex

@node Using Networking, Some Applications and Techniques, Introduction, Top
@comment node-name, next,          previous, up
@chapter Networking With @command{gawk}

@cindex networks @subentry @command{gawk} and
@cindex @command{gawk} @subentry networking
The @command{awk} programming language was originally developed as a
pattern-matching language for writing short programs to perform
data manipulation tasks.
@command{awk}'s strength is the manipulation of textual data
that is stored in files.
It was never meant to be used for networking purposes.
To exploit its features in a
networking context, it's necessary to use an access mode for network connections
that resembles the access of files as closely as possible.

@cindex Perl
@cindex Python
@cindex Tcl/Tk
@command{awk} is also meant to be a prototyping language. It is used
to demonstrate feasibility and to play with features and user interfaces.
This can be done with file-like handling of network
connections.
@command{gawk} trades the lack
of many of the advanced features of the TCP/IP family of protocols
for the convenience of simple connection handling.
The advanced
features are available when programming in C or Perl.
In fact, the
network programming
in this @value{CHAPTER}
is very similar to what is described in books such as
@cite{Internet Programming with Python},
@cite{Advanced Perl Programming},
or
@cite{Web Client Programming with Perl}.

@cindex Perl @subentry @command{gawk} networking and
@cindex Python @subentry @command{gawk} networking and
@cindex Tcl/Tk @subentry @command{gawk} and
However, you can do the programming here without first having to learn object-oriented
ideology; underlying languages such as Tcl/Tk, Perl, Python; or all of
the libraries necessary to extend these languages before they are ready for the Internet.

@cindex Transmission Control Protocol @seeentry{TCP}
@cindex TCP (Transmission Control Protocol)
This @value{CHAPTER} demonstrates how to use the TCP protocol. The
UDP protocol is much less important for most users.

@menu
* Gawk Special Files::          How to do @command{gawk} networking.
* TCP Connecting::              Making a TCP connection.
* Troubleshooting::             Troubleshooting TCP/IP connections.
* Interacting::                 Interacting with a service.
* Setting Up::                  Setting up a service.
* Email::                       Reading email.
* Web page::                    Reading a Web page.
* Primitive Service::           A primitive Web service.
* Interacting Service::         A Web service with interaction.
* Simple Server::               A simple Web server.
* Caveats::                     Network programming caveats.
* Challenges::                  Where to go from here.
@end menu

@node Gawk Special Files, TCP Connecting, Using Networking, Using Networking
@comment node-name, next,          previous, up
@section @command{gawk}'s Networking Mechanisms

The @samp{|&} operator for use in
communicating with a @dfn{coprocess} is described in
@ref{Two-way I/O, ,Two-way Communications With Another Process, gawk, GAWK: Effective AWK Programming}.
It shows how to do two-way I/O to a
separate process, sending it data with @code{print} or @code{printf} and
reading data with @code{getline}.  If you haven't read it already, you should
detour there to do so.

@command{gawk} transparently extends the two-way I/O mechanism to simple networking through
the use of special @value{FN}s.  When a ``coprocess'' that matches
the special files we are about to describe
is started, @command{gawk} creates the appropriate network
connection, and then two-way I/O proceeds as usual.

@c last comma is part of see-also
@cindex input/output, two-way, @seealso{@command{gawk}, networking}
@cindex TCP/IP @subentry sockets and
At the C, C++, and Perl level, networking is accomplished
via @dfn{sockets}, an Application Programming Interface (API) originally
developed at the University of California at Berkeley that is now used
almost universally for TCP/IP networking.
Socket level programming, while fairly straightforward, requires paying
attention to a number of details, as well as using binary data.  It is not
well-suited for use from a high-level language like @command{awk}.
The special files provided in @command{gawk} hide the details from
the programmer, making things much simpler and easier to use.
@c Who sez we can't toot our own horn occasionally?

@cindex filenames, for network access
@cindex @command{gawk} @subentry networking @subentry filenames
@cindex networks @subentry @command{gawk} and @subentry filenames
The special @value{FN} for network access is made up of several fields, all
of which are mandatory:

@example
/@var{net-type}/@var{protocol}/@var{localport}/@var{hostname}/@var{remoteport}
@end example

@cindex @code{/inet/} files (@command{gawk})
@cindex files @subentry @code{/inet/} (@command{gawk})
@cindex localport field
@cindex remoteport field
The @var{net-type} field lets you specify IPv4 versus IPv6, or lets
you allow the system to choose.

@menu
* Special File Fields::         The fields in the special file name.
* Comparing Protocols::         Differences between the protocols.
@end menu

@node Special File Fields, Comparing Protocols, Gawk Special Files, Gawk Special Files
@subsection The Fields of the Special @value{FFN}
This @value{SECTION} explains the meaning of all of the fields,
as well as the range of values and the defaults.
All of the fields are mandatory.  To let the system pick a value,
or if the field doesn't apply to the protocol, specify it as @samp{0} (zero):

@table @var
@cindex network type field
@c last comma is part of secondary
@cindex TCP/IP @subentry network type, selecting
@item net-type
This is one of @samp{inet4} for IPv4, @samp{inet6} for IPv6,
or @samp{inet} to use the system default (which is likely to be IPv4).
For the rest of this document, we will use the generic @samp{/inet}
in our descriptions of how @command{gawk}'s networking works.

@cindex protocol field
@c last comma is part of secondary
@cindex TCP/IP @subentry protocols, selecting
@item protocol
Determines which member of the TCP/IP
family of protocols is selected to transport the data across the
network.
There are two possible values (always written in lowercase):
@samp{tcp} and @samp{udp}. The exact meaning of each is
explained later in this @value{SECTION}.

@item localport
@cindex networks @subentry ports @subentry specifying
Determines which port on the local
machine is used to communicate across the network.  Application-level clients
usually use @samp{0} to indicate they do not care which local port is
used---instead they specify a remote port to connect to.

It is vital for
application-level servers to use a number different from @samp{0} here
because their service has to be available at a specific publicly known
port number.  It is possible to use a name from @file{/etc/services} here.

@item hostname
@cindex hostname field
@cindex servers @subentry as hosts
Determines which remote host is to
be at the other end of the connection.
Application-level clients must enter a name different from @samp{0}.
The name can be either symbolic
(e.g., @samp{jpl-devvax.jpl.nasa.gov}) or numeric (e.g., @samp{128.149.1.143}).

Application-level servers must fill
this field with a @samp{0} to indicate their being open for all other hosts
to connect to them and enforce connection level server behavior this way.
It is not possible for an application-level server to restrict its
availability to one remote host by entering a host name here.

@item remoteport
Determines which port on the remote
machine is used to communicate across the network.
For @file{/inet/tcp} and @file{/inet/udp},
application-level clients @emph{must} use a number
other than @samp{0} to indicate to which port on the remote machine
they want to connect.

Application-level servers must fill this field with
a @samp{0}.  Instead they specify a local port to which clients connect.
It is possible to use a name from @file{/etc/services} here.
@end table

@cindex networks @subentry @command{gawk} and @subentry connections
@cindex @command{gawk} @subentry networking @subentry connections
Experts in network programming will notice that the usual
client/server asymmetry found at the level of the socket API is not visible
here. This is for the sake of simplicity of the high-level concept.  If this
asymmetry is necessary for your application,
use another language.
For @command{gawk}, it is
more important to enable users to write a client program with a minimum
of code. What happens when first accessing a network connection is seen
in the following pseudocode:

@smallexample
if ((name of remote host given) && (other side accepts connection)) @{
  rendez-vous successful; transmit with getline or print
@} else @{
  if ((other side did not accept) && (localport == 0))
    exit unsuccessful
  if (TCP) @{
    set up a server accepting connections
    this means waiting for the client on the other side to connect
  @} else
    ready
@}
@end smallexample

The exact behavior of this algorithm depends on the values of the
fields of the special @value{FN}. When in doubt, @ref{table-inet-components}
gives you the combinations of values and their meaning. If this
table is too complicated, focus on the three lines printed in
@strong{bold}. All the examples in
@ref{Using Networking, ,Networking With @command{gawk}},
use only the
patterns printed in bold letters.
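
As a minimal sketch, the three patterns printed in bold in
@ref{table-inet-components} correspond to special @value{FN}s like the ones
built below.  The host name @samp{localhost} and the port number 8888 are
assumptions chosen only for illustration; the program merely prints the
names and does not open any connection:

@example
BEGIN @{
  host = "localhost"    # assumed host name, for illustration only
  port = 8888           # assumed port number, for illustration only
  # tcp, local port 0, host and remote port given: dedicated client
  print "/inet/tcp/0/" host "/" port
  # local port, host, and remote port all given:
  # client that switches to a dedicated server if necessary
  print "/inet/tcp/" port "/" host "/" port
  # local port given, host and remote port 0: dedicated server
  print "/inet/tcp/" port "/0/0"
@}
@end example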

@float Table,table-inet-components
@caption{@code{/inet} Special File Components}
@multitable @columnfractions .15 .15 .15 .15 .40
@headitem @sc{protocol} @tab @sc{local port} @tab @sc{host name}
@tab @sc{remote port} @tab @sc{Resulting connection-level behavior}
@item @strong{tcp} @tab @strong{0} @tab @strong{x} @tab @strong{x} @tab
  @strong{Dedicated client, fails if immediately connecting to a
  server on the other side fails}
@item udp @tab 0 @tab x @tab x @tab Dedicated client
@item @strong{tcp, udp} @tab @strong{x} @tab @strong{x} @tab @strong{x} @tab
  @strong{Client, switches to dedicated server if necessary}
@item @strong{tcp, udp} @tab @strong{x} @tab @strong{0} @tab @strong{0} @tab
  @strong{Dedicated server}
@item tcp, udp @tab x @tab x @tab 0 @tab Invalid
@item tcp, udp @tab 0 @tab 0 @tab x @tab Invalid
@item tcp, udp @tab x @tab 0 @tab x @tab Invalid
@item tcp, udp @tab 0 @tab 0 @tab 0 @tab Invalid
@item tcp, udp @tab 0 @tab x @tab 0 @tab Invalid
@end multitable
@end float

In general, TCP is the preferred mechanism to use.  It is the simplest
protocol to understand and to use.  Use UDP only if circumstances
demand low overhead.

@node Comparing Protocols,  , Special File Fields, Gawk Special Files
@subsection Comparing Protocols

This @value{SECTION} develops a pair of programs (sender and receiver)
that do nothing but send a timestamp from one machine to another. The
sender and the receiver are implemented with each of the two protocols
available and demonstrate the differences between them.

@menu
* File /inet/tcp::              The TCP special file.
* File /inet/udp::              The UDP special file.
@end menu

@node File /inet/tcp, File /inet/udp, Comparing Protocols, Comparing Protocols
@subsubsection @file{/inet/tcp}
@cindex @code{/inet/tcp} special files (@command{gawk})
@cindex files @subentry @code{/inet/tcp} (@command{gawk})
@cindex TCP (Transmission Control Protocol)
Once again, always use TCP.
(Use UDP when low overhead is a necessity.)
The first example is the sender
program:

@example
# Server
BEGIN @{
  print strftime() |& "/inet/tcp/8888/0/0"
  close("/inet/tcp/8888/0/0")
@}
@end example

The receiver is very simple:

@example
# Client
BEGIN @{
  "/inet/tcp/0/localhost/8888" |& getline
  print $0
  close("/inet/tcp/0/localhost/8888")
@}
@end example

TCP guarantees that the bytes arrive at the receiving end in exactly
the same order that they were sent. No byte is lost
(except for broken connections), doubled, or out of order. Some
overhead is necessary to accomplish this, but this is the price to pay for
a reliable service.
It does matter which side starts first. The sender/server has to be started
first, and it waits for the receiver to read a line.

@node File /inet/udp,  , File /inet/tcp, Comparing Protocols
@subsubsection @file{/inet/udp}
@cindex @code{/inet/udp} special files (@command{gawk})
@cindex files @subentry @code{/inet/udp} (@command{gawk})
@cindex UDP (User Datagram Protocol)
@cindex User Datagram Protocol @seeentry{UDP}
The server and client programs that use UDP are almost identical to their TCP counterparts;
only the @var{protocol} has changed. As before, it does matter which side
starts first. The receiving side blocks and waits for the sender.
In this case, the receiver/client has to be started first:

@example
# Server
BEGIN @{
  print strftime() |& "/inet/udp/8888/0/0"
  close("/inet/udp/8888/0/0")
@}
@end example

The receiver is almost identical to the TCP receiver:

@example
# Client
BEGIN @{
  print "hi!" |& "/inet/udp/0/localhost/8888"
  "/inet/udp/0/localhost/8888" |& getline
  print $0
  close("/inet/udp/0/localhost/8888")
@}
@end example

In the case of UDP, the initial @code{print} command is the one
that actually sends data so that there is a connection.
Speaking of UDP and a ``connection'' in the same breath sounds strange to anyone
who has learned that UDP is a connectionless protocol.
Here, ``connection'' means that the @code{connect()} system call
has done its work and completed the ``association''
between a certain socket and an IP address. Thus there are
subtle differences between @code{connect()} for TCP and UDP;
see the man page for details.@footnote{This subtlety
is just one of many details that are hidden in the socket
API, invisible and intractable for the @command{gawk} user.
The developers are currently considering how to rework the
network facilities to make them easier to understand and use.}

UDP cannot guarantee that the datagrams at the receiving end will arrive in exactly
the same order they were sent. Some datagrams could be
lost, some doubled, and some could arrive out of order.
But no overhead is necessary to
accomplish this. This unreliable behavior is good enough for tasks
such as data acquisition, logging, and even stateless services like
the original versions of NFS.
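
Because a datagram may be lost, the UDP receiver shown earlier can block
forever in @code{getline}. The following is a minimal sketch of one way
to guard against that, assuming a @command{gawk} version that supports
the @code{PROCINFO[@var{name}, "READ_TIMEOUT"]} element. The variable
name @code{Service}, the two-second limit, and the wording of the
message are illustrative choices, not part of the original programs:

@example
# Client with a read timeout (sketch)
BEGIN @{
  Service = "/inet/udp/0/localhost/8888"
  # give up after 2000 milliseconds without data
  PROCINFO[Service, "READ_TIMEOUT"] = 2000
  print "hi!" |& Service
  if ((Service |& getline) > 0)
    print $0
  else
    print "no reply (timeout or error): " ERRNO > "/dev/stderr"
  close(Service)
@}
@end example

Start this client with and without the UDP sender running. Without the
sender, the client should report a failure rather than hang, although
the exact text in @code{ERRNO} depends on the operating system.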

@node TCP Connecting, Troubleshooting, Gawk Special Files, Using Networking
@section Establishing a TCP Connection

@cindex TCP (Transmission Control Protocol) @subentry connection, establishing
@cindex networks @subentry @command{gawk} and @subentry connections
@cindex @command{gawk} @subentry networking @subentry connections
Let's observe a network connection at work. Type in the following program
and watch the output. Within a second, it connects via TCP (@file{/inet/tcp})
to a remote server and asks the service
@samp{daytime} on the machine what time it is:

@cindex @code{getline} command
@example
@c file eg/network/daytimeclient.awk
BEGIN @{
  daytime_server = "time-a-g.nist.gov"
  daytime_connection = "/inet/tcp/0/" daytime_server "/daytime"
  daytime_connection |& getline
  print $0
  daytime_connection |& getline
  print $0
  close(daytime_connection)
@}
@c endfile
@end example

Even experienced @command{awk} users will find the fourth and sixth lines
strange in two respects:

@itemize @bullet
@item
A string containing the name of a special file is used as a shell command that pipes its output
into @code{getline}. One would rather expect to see the special file
being read like any other file (@samp{getline <
"/inet/tcp/0/time-a-g.nist.gov/daytime"}).

@item
@cindex @code{|} (vertical bar), @code{|&} operator (I/O)
@cindex vertical bar (@code{|}), @code{|&} operator (I/O)
The operator @samp{|&} has not been part of any @command{awk}
implementation (until now).
It is actually the only extension of the @command{awk}
language needed (apart from the special files) to introduce network access.
@end itemize

@cindex pipes, networking and
The @samp{|&} operator was introduced in @command{gawk} 3.1 in order to
overcome the crucial restriction that access to files and pipes in
@command{awk} is always unidirectional.
It was formerly impossible to use
both access modes on the same file or pipe. Instead of changing the whole
concept of file access, the @samp{|&} operator
behaves exactly like the usual pipe operator except for two additions:

@itemize @bullet
@item
Normal shell commands connected to their @command{gawk} program with a @samp{|&}
pipe can be accessed bidirectionally. The @samp{|&} turns out to be a quite
general, useful, and natural extension of @command{awk}.

@item
Pipes that consist of a special @value{FN} for network connections are not
executed as shell commands. Instead, they can be read and written to, just
like a full-duplex network connection.
@end itemize

In the earlier example, the @samp{|&} operator tells @code{getline}
to read a line from the special file @file{/inet/tcp/0/time-a-g.nist.gov/daytime}.
We could also have printed a line into the special file. But instead we just
consumed an empty leading line, printed it, then read a line with the time,
printed that, and closed the connection.
(While we could just let @command{gawk} close the connection by finishing
the program, in this @value{DOCUMENT}
we are pedantic and always explicitly close the connections.)

Network services like @file{daytime} are not really useful because
there are so many better ways to print the current time.
In the early days of TCP networking, such a service may have looked
like a good idea for testing purposes. Later, simple TCP services
like these were used to teach TCP/IP networking, and therefore
you can still find much educational material of good quality on the
Internet about such outdated services. The
@uref{https://tf.nist.gov/tf-cgi/servers.cgi, list of servers}
that still support the legacy service
@uref{https://en.wikipedia.org/wiki/Daytime_Protocol, daytime}
can be found at Wikipedia.
We hesitated to use this service in
this manual because it is hard to find servers that still support
services like @file{daytime} openly to the Internet.
Later on we will see that some of these nostalgic
protocols have turned into security risks.

@node Troubleshooting, Interacting, TCP Connecting, Using Networking
@section Troubleshooting Connection Problems
@cindex advanced features, network connections
@c last comma is part of secondary
@cindex troubleshooting @subentry networks @subentry connections
It may well be that for some reason the program shown in the previous example does not run on your
machine. When looking at possible reasons for this, you will learn much
about typical problems that arise in network programming.
@ignore
First of all,
your implementation of @command{gawk} may not support network access
because it is
a pre-3.1 version or you do not have a network interface in your machine.
Perhaps your machine uses some other protocol, such as
DECnet or Novell's IPX.
@end ignore

For the rest of this @value{CHAPTER}, we will assume you work on a POSIX-style
system that supports TCP/IP. If the previous example program does not
run on your machine, it may help to replace the value assigned to the variable
@samp{daytime_server} with the name (or the IP address) of another server
from the list mentioned above.
Now you should see the date and time being printed by the program;
otherwise, you may have run out of servers that support the @samp{daytime} service.

Try changing the service to @samp{chargen} or @samp{ftp}. This way, the program
connects to other services that should give you some response. If you are
curious, you should have a look at your @file{/etc/services} file.
It could
look like this:

@smallexample
# /etc/services:
#
# Network services, Internet style
#
# Name     Number/Protocol  Alternate name # Comments

echo        7/tcp
echo        7/udp
discard     9/tcp         sink null
discard     9/udp         sink null
daytime     13/tcp
daytime     13/udp
chargen     19/tcp        ttytst source
chargen     19/udp        ttytst source
ftp         21/tcp
telnet      23/tcp
smtp        25/tcp        mail
finger      79/tcp
www         80/tcp        http      # WorldWideWeb HTTP
www         80/udp        # HyperText Transfer Protocol
pop-2       109/tcp       postoffice    # POP version 2
pop-2       109/udp
pop-3       110/tcp       # POP version 3
pop-3       110/udp
nntp        119/tcp       readnews untp  # USENET News
irc         194/tcp       # Internet Relay Chat
irc         194/udp
@dots{}
@end smallexample

@cindex Linux
@cindex GNU/Linux
@cindex Microsoft Windows @subentry networking
Here, you find a list of services that traditional Unix machines usually
support. If your GNU/Linux machine does not do so, it may be that these
services are switched off in some startup script. Systems running some
flavor of Microsoft Windows usually do @emph{not} support these services.
Nevertheless, it @emph{is} possible to do networking with @command{gawk} on
Microsoft
Windows.@footnote{Microsoft preferred to ignore the TCP/IP
family of protocols until 1995. Then came the rise of the Netscape browser
as a landmark ``killer application.'' Microsoft added TCP/IP support and
their own browser to Microsoft Windows 95 at the last minute. They even back-ported
their TCP/IP implementation to Microsoft Windows for Workgroups 3.11, but it was
a rather rudimentary and half-hearted implementation. Nevertheless,
the equivalent of @file{/etc/services} resides under
@file{C:\WINNT\system32\drivers\etc\services} on Microsoft Windows 2000
and Microsoft Windows XP.
On Microsoft Windows 7, 8 and 10 there is a directory
@file{%WinDir%\System32\Drivers\Etc}
that holds the
@uref{https://support.microsoft.com/en-us/help/972034/how-to-reset-the-hosts-file-back-to-the-default, @file{hosts} file}
and probably also a
@uref{https://www.ibm.com/support/knowledgecenter/SSRNYG_7.2.1/com.ibm.rational.synergy.install.win.doc/topics/sg_r_igw_services_file.html, @file{services} file}.}
The first column of the file gives the name of the service, and
the second column gives a unique number and the protocol that one can use to connect to
this service.
The rest of the line is treated as a comment.
You see that some services (@samp{echo}) support TCP as
well as UDP.

@node Interacting, Setting Up, Troubleshooting, Using Networking
@section Interacting with a Network Service

The next program begins really interacting with a
network service by printing something into the special file. It asks the
so-called @command{finger} service if a user of the machine is logged in. When
testing this program, try changing the variable @samp{finger_server}
to some other machine name in your local network:
@c This really worked in 2020.
@c Thanks to some people at cmu.edu who keep this service alive.
@c https://www.techrepublic.com/article/everything-you-need-to-know-about-tcp-ips-finger-utility/

@example
@c file eg/network/fingerclient.awk
BEGIN @{
  finger_server = "andrew.cmu.edu"
  finger_connection = "/inet/tcp/0/" finger_server "/finger"
  print "wnace" |& finger_connection
  while ((finger_connection |& getline) > 0)
    print $0
  close(finger_connection)
@}
@c endfile
@end example

After telling the service on the machine which user to look for,
the program repeatedly reads lines that come as a reply.
When no more
lines are available (because the service has closed the connection), the
program also closes the connection. If you tried to replace @samp{finger_server}
with some other server name, the script probably reported being unable to
open the connection, because most servers today no longer support this
service. Try replacing the login name of Professor Nace (@code{wnace})
with another login name (like @code{help}). You will receive a list of
login names similar to the one you asked for. In the 1980s you could get
a list of all users currently logged in by asking for an empty string (@code{""}).

@cindex Linux
@cindex GNU/Linux
The final @code{close()} call could be safely deleted from
the above script, because the operating system closes any open connection
by default when a script reaches the end of execution. But, in order to avoid
portability problems, it is best to always close connections explicitly.
@c FIXME: This following statement isn't really true; gawk flushes
@c and closes all open files before exiting.
With the Linux kernel,
for example, proper closing results in flushing of buffers. Letting
the close happen by default may result in discarding buffers.

When looking at @file{/etc/services} you may have noticed that the
@samp{daytime} service is also available with @samp{udp}. In the earlier
examples, change @samp{tcp} to @samp{udp} and see whether the @samp{finger} and @samp{daytime}
clients still work as expected. They probably will not respond because
a wise administrator switched off these services.
But if they do, you may see the expected day and time message.
The program then hangs, because it waits for more lines to come from the
service. However, they never do. This behavior is a consequence of the
differences between TCP and UDP.
When using UDP, neither party is
automatically informed about the other closing the connection.
Continuing to experiment this way reveals many other subtle
differences between TCP and UDP. To avoid such trouble, you should always
remember the advice Douglas E.@: Comer and David Stevens give in
Volume III of their series @cite{Internetworking With TCP}
(page 14):

@cindex TCP (Transmission Control Protocol) @subentry UDP and
@cindex UDP (User Datagram Protocol) @subentry TCP and
@cindex Internet @seeentry{networks}
@quotation
When designing client-server applications, beginners are strongly
advised to use TCP because it provides reliable, connection-oriented
communication. Programs only use UDP if the application protocol handles
reliability, the application requires hardware broadcast or multicast,
or the application cannot tolerate virtual circuit overhead.
@end quotation

This advice is actually quite dated and we hesitated to repeat it here.
But we left it in because we are still observing beginners running
into this pitfall. While this advice has aged quite well, some other
ideas from the 1980s have not. The @samp{finger} service may still be
available in Microsoft
@uref{https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/finger, Windows Server 2019},
but it turned out to be a never-ending cause of trouble. First of all,
it is now obvious that a server should never reveal personal data about
its users to anonymous client software that connects over the wild wild Internet.
So every server on the Internet should reject @samp{finger} requests
(by disabling the port and by disabling the software serving this port).
But things got even worse in 2020 when it turned out that even the client
software (the @samp{finger} command documented in the link above) is a
security problem.
A tool called
@uref{https://seclists.org/fulldisclosure/2020/Sep/30, DarkFinger}
makes it possible to use the Microsoft Windows @samp{finger.exe} as a file downloader
and to help evade network security devices.

@node Setting Up, Email, Interacting, Using Networking
@section Setting Up a Service
@c last comma is part of tertiary
@cindex networks @subentry @command{gawk} and @subentry service@comma{} establishing
@c last comma is part of tertiary
@cindex @command{gawk} @subentry networking @subentry service@comma{} establishing
The preceding programs behaved as clients that connect to a server somewhere
on the Internet and request a particular service. Now we set up such a
service to mimic the behavior of the @samp{daytime} service.
Such a server does not know in advance who is going to connect to it over
the network. Therefore, we cannot insert a name for the host to connect to
in our special @value{FN}.

Start the following program in one window. Notice that the service does
not have the name @samp{daytime}, but the number @samp{8888}.
From looking at @file{/etc/services}, you know that names like @samp{daytime}
are just mnemonics for predetermined 16-bit integers.
Only the system administrator (@code{root}) could enter
our new service into @file{/etc/services} with an appropriate name.
Also notice that the service name has to be entered into a different field
of the special @value{FN} because we are setting up a server, not a client:

@cindex @command{finger} utility
@cindex servers
@example
@c file eg/network/daytimeserver.awk
BEGIN @{
  print strftime() |& "/inet/tcp/8888/0/0"
  close("/inet/tcp/8888/0/0")
@}
@c endfile
@end example

Now open another window on the same machine.
Copy the client program given as the first example
(@pxref{TCP Connecting, ,Establishing a TCP Connection})
to a new file and edit it, changing the variable @samp{daytime_server} to
@samp{localhost} and the port name @samp{daytime} to @samp{8888}.
Then start the modified client. You should get a reply like this:

@example
$ @kbd{gawk -f awklib/eg/network/daytimeclient.awk}
@print{} Sun Dec 27 17:33:57 CET 2020
@print{} Sun Dec 27 17:33:57 CET 2020
@end example

@noindent
Both programs explicitly close the connection.

@c first comma is part of primary
@cindex Microsoft Windows @subentry networking @subentry ports
@cindex networks @subentry ports @subentry reserved
@cindex Unix, network ports and
Now we will intentionally make a mistake to see what happens when the name
@samp{8888} (the port) is already used by another service.
Start the server
program in both windows. The first one works, but the second one
complains that it could not open the connection. Each port on a single
machine can only be used by one server program at a time. Now terminate the
server program and change the name @samp{8888} to @samp{echo}. After restarting it,
the server program does not run any more, and you know why: there is already
an @samp{echo} service running on your machine. But even if this isn't true,
you would not get
your own @samp{echo} server running on a Unix machine,
because the ports with numbers smaller
than 1024 (@samp{echo} is at port 7) are reserved for @code{root}.
On machines running some flavor of Microsoft Windows, there is no restriction
that reserves the ports below 1024 for a privileged user; hence, you can start
an @samp{echo} server there.
Even in later versions of Microsoft Windows, this restriction of
the Unix world seems never to have been adopted; see
@uref{https://social.technet.microsoft.com/Forums/windowsserver/en-US/334f0770-eda9-475a-a27f-46b80ab7e872/does-windows10server2016-have-privileged-ports-?forum=ws2016,
@cite{Does windows(10/server-2016) have privileged ports?}}.
In Microsoft Windows it is the level of the firewall that handles
port access restrictions, not the level of the operating system's kernel.

Turning this short server program into something really useful is simple.
Imagine a server that first reads a @value{FN} from the client through the
network connection, then does something with the file and
sends a result back to the client. The server-side processing
could be:

@example
@c file eg/network/catpipeserver.awk
BEGIN @{
  NetService = "/inet/tcp/8888/0/0"
  NetService |& getline       # sets $0 and the fields
  CatPipe = ("cat " $1)
  while ((CatPipe | getline) > 0)
    print $0 |& NetService
  close(NetService)
@}
@c endfile
@end example

@noindent
and we would
have a remote copying facility. Such a server reads the name of a file
from any client that connects to it and transmits the contents of the
named file across the net. The server-side processing could also be
the execution of a command that is transmitted across the network. From this
example, you can see how simple it is to open up a security hole on your
machine. If you allow clients to connect to your machine and
execute arbitrary commands, anyone would be free to do @samp{rm -rf *}.

The client side connects to port number 8888 on the server side and
sends the name of the desired file to be sent across the same TCP
connection. The main loop reads all content coming in from the TCP
connection line-wise and prints it.

@example
@c file eg/network/catpipeclient.awk
BEGIN @{
  NetService = "/inet/tcp/0/localhost/8888"
  print "README" |& NetService
  while ((NetService |& getline) > 0)
    print $0
  close(NetService)
@}
@c endfile
@end example

@node Email, Web page, Setting Up, Using Networking
@section Reading Email
@cindex RFC 1939
@cindex RFC 821
@cindex @command{gawk} @subentry networking @subentry email
@cindex networks @subentry @command{gawk} and @subentry email
@cindex POP (Post Office Protocol)
@cindex SMTP (Simple Mail Transfer Protocol)
@cindex Post Office Protocol (POP)
@cindex Simple Mail Transfer Protocol (SMTP)
The distribution of email is usually done by dedicated email servers that
communicate with your machine using special protocols.
In this @value{SECTION} we show how simple the basic steps are.@footnote{No,
things are @emph{not} that simple any more. Things @emph{were} that simple
when email was young in the 20th century. These days, unencrypted plaintext
authentication is usually disallowed on non-secure connections.
Since encryption of network connections is not supported in @command{gawk},
you should not use @command{gawk} to write such scripts.
We left this @value{SECTION} as it is because it demonstrates how
application level protocols work in principle (a command being issued
by the client followed by a reply coming back). Unfortunately, modern
application level protocols are much more flexible in the sequence of
actions. For example, modern POP3 servers may introduce themselves
with an unprompted initial line that arrives before the initial command.
Dealing with such variance is not worth the effort in @command{gawk}.}
@c FIXME: This would be the proper place to refer to Arnold's work on
@c writing SMTP client and server.

To receive email, we use the Post Office Protocol (POP).
Sending can
be done with the much older Simple Mail Transfer Protocol (SMTP).

@cindex email
When you type in the following program, replace @var{emailhost} with the
name of your local email server. Ask your administrator if the server has a
POP service, and then use its name or number in the program below.
Now the program is ready to connect to your email server, but it will not
succeed in retrieving your mail because it does not yet know your login
name or password. Replace them in the program and it
shows you the first email the server has in store:

@example
@c file eg/network/mailpopclient.awk
BEGIN @{
  POPService = "/inet/tcp/0/@var{emailhost}/pop3"
  RS = ORS = "\r\n"
  print "user @var{name}" |& POPService
  POPService |& getline
  print "pass @var{password}" |& POPService
  POPService |& getline
  print "retr 1" |& POPService
  POPService |& getline
  if ($1 != "+OK") exit
  print "quit" |& POPService
  RS = "\r\n\\.\r\n"
  POPService |& getline
  print $0
  close(POPService)
@}
@c endfile
@end example

@cindex RFC 1939
@cindex record separators @subentry POP and
@cindex @code{RS} variable @subentry POP and
@cindex @code{ORS} variable @subentry POP and
@cindex POP (Post Office Protocol)
We redefine the record separators @code{RS} and @code{ORS} because the
protocol (POP) requires CR-LF to separate lines. After identifying
yourself to the email service, the command @samp{retr 1} instructs the
service to send the first of all your email messages in line. If the service
replies with something other than @samp{+OK}, the program exits; maybe there
is no email. Otherwise, the program first announces that it intends to finish
reading email, and then redefines @code{RS} in order to read the entire
email as multiline input in one record.
From the POP RFC, we know that the body
of the email always ends with a single line containing a single dot.
The program looks for this using @samp{RS = "\r\n\\.\r\n"}.
When it finds this sequence in the mail message, it quits.
You can invoke this program as often as you like; it does not delete the
message it reads, but instead leaves it on the server.

@node Web page, Primitive Service, Email, Using Networking
@section Reading a Web Page
@cindex web pages
@cindex HTTP (Hypertext Transfer Protocol)
@cindex Hypertext Transfer Protocol @seeentry{HTTP}
@cindex RFC 2068
@cindex RFC 2616

Retrieving a web page from a web server is as simple as
retrieving email from an email server. We only have to use a
similar, but not identical, protocol and a different port. The name of the
protocol is HyperText Transfer Protocol (HTTP) and the port number is usually
80. As in the preceding @value{SECTION}, ask your administrator about the
name of your local web server or proxy web server and its port number
for HTTP requests.

The following program employs a rather crude approach toward retrieving a
web page. It uses the prehistoric syntax of HTTP 0.9, which almost all
web servers still support.
The most noticeable thing about it is that the
program directs the request to the local proxy server whose name you insert
in the special @value{FN} (which in turn calls @samp{www.yahoo.com}):

@example
BEGIN @{
  RS = ORS = "\r\n"
  HttpService = "/inet/tcp/0/@var{proxy}/80"
  print "GET http://www.yahoo.com" |& HttpService
  while ((HttpService |& getline) > 0)
    print $0
  close(HttpService)
@}
@end example

@cindex RFC 1945
@cindex record separators @subentry HTTP and
@cindex @code{RS} variable @subentry HTTP and
@cindex @code{ORS} variable @subentry HTTP and
@cindex HTTP (Hypertext Transfer Protocol) @subentry record separators and
@cindex HTML (Hypertext Markup Language)
@cindex Hypertext Markup Language (HTML)
Again, lines are separated by a redefined @code{RS} and @code{ORS}.
The @code{GET} request that we send to the server is the only kind of
HTTP request that existed when the web was created in the early 1990s.
HTTP calls this @code{GET} request a ``method,'' which tells the
service to transmit a web page (here the home page of the Yahoo! search
engine). Version 1.0 added the request methods @code{HEAD} and
@code{POST}. The current version of HTTP is 1.1,@footnote{Version 1.0 of
HTTP was defined in RFC 1945. HTTP 1.1 was initially specified in RFC
2068. In June 1999, RFC 2068 was made obsolete by RFC 2616, an update
without any substantial changes.}@footnote{@uref{https://en.wikipedia.org/wiki/HTTP/2,
Version 2.0 of HTTP}
was defined in
@uref{https://tools.ietf.org/html/rfc7540,RFC7540}
and was derived from Google's
@uref{https://en.wikipedia.org/wiki/SPDY,SPDY}
protocol. It is said to be widely supported. As of 2020 the most popular
web sites still identify themselves as supporting HTTP/1.1.
@uref{https://en.wikipedia.org/wiki/HTTP/3, Version 3.0 of HTTP}
is still a draft and was derived from Google's
@uref{https://en.wikipedia.org/wiki/QUIC,QUIC} protocol.}
and knows the additional request
methods @code{OPTIONS}, @code{PUT}, @code{DELETE}, and @code{TRACE}.
You can fill in any valid web address, and the program prints the
HTML code of that page to your screen.

Notice the similarity between the responses of the POP and HTTP
services. First, you get a header that is terminated by an empty line, and
then you get the body of the page in HTML. The lines of the headers also
have the same form as in POP. There is the name of a parameter,
then a colon, and finally the value of that parameter.

@cindex CGI (Common Gateway Interface) @subentry dynamic web pages and
@cindex Common Gateway Interface @seeentry{CGI}
@cindex GIF image format
@cindex PNG image format
@cindex images @subentry retrieving over networks
Images (@file{.png} or @file{.gif} files) can also be retrieved this way,
but then you
get binary data that should be redirected into a file. Another
application is calling a CGI (Common Gateway Interface) script on some
server. CGI scripts are used when the contents of a web page are not
constant, but generated on demand at the moment you send a request
for the page. For example, to get a detailed report about the current
quotes of Motorola stock shares, call a CGI script at Yahoo! with
the following:

@example
get = "GET http://quote.yahoo.com/q?s=MOT&d=t"
print get |& HttpService
@end example

You can also request weather reports this way.
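
The header and body structure described above can be made visible with a
slightly more modern request. The following is only a sketch under
stated assumptions: it talks to the web server directly instead of
through a proxy, it uses the HTTP/1.0 request syntax with a @code{Host}
header, and the host name @samp{www.example.org}, the variable
@code{Host}, and the @samp{header: } prefix are placeholders chosen for
illustration. Many sites today answer plain HTTP requests only with a
redirect to an encrypted connection, which @command{gawk} cannot follow:

@example
BEGIN @{
  RS = ORS = "\r\n"
  Host = "www.example.org"        # placeholder, replace as needed
  HttpService = "/inet/tcp/0/" Host "/80"
  print "GET / HTTP/1.0" |& HttpService
  # the extra ORS produces the empty line that ends the request
  print "Host: " Host ORS |& HttpService
  # header lines: everything up to the first empty line
  while ((HttpService |& getline) > 0 && $0 != "")
    print "header: " $0
  # the rest is the body
  while ((HttpService |& getline) > 0)
    print $0
  close(HttpService)
@}
@end example

Each header line is printed with a prefix so that it is easy to see
where the header ends and the body begins. Because the body of a page
often uses a bare linefeed instead of CR-LF, it may arrive as a few very
long records.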

@node Primitive Service, Interacting Service, Web page, Using Networking
@section A Primitive Web Service
@cindex web service
Now we know enough about HTTP to set up a primitive web service that just
says @code{"Hello, world"} when someone connects to it with a browser.
Compared
to the situation in the preceding @value{SECTION}, our program switches roles. It
tries to behave just like the server we have observed. Since we are setting
up a server here, we have to insert the port number in the @samp{localport}
field of the special @value{FN}. The other two fields (@var{hostname} and
@var{remoteport}) have to contain a @samp{0} because we do not know in
advance which host will connect to our service.

In the early 1990s, all a server had to do was send an HTML document and
close the connection. Here, we adhere to the modern syntax of HTTP.
The steps are as follows:

@enumerate 1
@item
Send a status line telling the web browser that everything
is okay.

@item
Send a line to tell the browser how many bytes follow in the
body of the message. This was not necessary earlier because both
parties knew that the document ended when the connection closed. Nowadays
it is possible to stay connected after the transmission of one web page.
This avoids the network traffic necessary for repeatedly establishing
TCP connections for requesting several images. Thus, it is necessary to tell
the receiving party how many bytes will be sent. The header is terminated
as usual with an empty line.

@item
Send the @code{"Hello, world"} body
in HTML.
The useless @code{while} loop swallows the request of the browser.
We could actually omit the loop, and on most machines the program would still
work.
First, start the following program:
@end enumerate

@example
@c file eg/network/hello-serv.awk
BEGIN @{
  RS = ORS = "\r\n"
  HttpService = "/inet/tcp/8080/0/0"
  Hello = "<HTML><HEAD>" \
          "<TITLE>A Famous Greeting</TITLE></HEAD>" \
          "<BODY><H1>Hello, world</H1></BODY></HTML>"
  Len = length(Hello) + length(ORS)
  print "HTTP/1.0 200 OK" |& HttpService
  print "Content-Length: " Len ORS |& HttpService
  print Hello |& HttpService
  while ((HttpService |& getline) > 0)
    continue;
  close(HttpService)
@}
@c endfile
@end example

Now, on the same machine, start your favorite browser and point it to
@uref{http://localhost:8080} (the browser needs to know on which port
our server is listening for requests). If this does not work, the browser
probably tries to connect to a proxy server that does not know your machine.
If so, change the browser's configuration so that the browser does not try to
use a proxy to connect to your machine.

@node Interacting Service, Simple Server, Primitive Service, Using Networking
@section A Web Service with Interaction
@cindex @command{gawk} @subentry web and @seeentry{web service}
@cindex web browsers, @seeentry{web service}
@c comma is part of primary
@cindex HTTP server, core logic
@cindex servers @subentry HTTP
@ifinfo
This node shows how to set up a simple web server.
The subnode is a library file that we will use with all the examples in
@ref{Some Applications and Techniques}.
@end ifinfo

@menu
* CGI Lib::                     A simple CGI library.
@end menu

Setting up a web service that allows user interaction is more difficult and
shows us the limits of network access in @command{gawk}.
In this @value{SECTION},
we develop a main program (a @code{BEGIN} pattern and its action)
that will become the core of event-driven execution controlled by a
graphical user interface (GUI).
Each HTTP event that the user triggers by some action within the browser
is received in this central procedure. Parameters and menu choices are
extracted from this request, and an appropriate measure is taken according to
the user's choice:

@cindex HTTP server, core logic
@example
BEGIN @{
  if (MyHost == "") @{
     "uname -n" | getline MyHost
     close("uname -n")
  @}
  if (MyPort == 0) MyPort = 8080
  HttpService = "/inet/tcp/" MyPort "/0/0"
  MyPrefix    = "http://" MyHost ":" MyPort
  SetUpServer()
  while ("awk" != "complex") @{
    # header lines are terminated this way
    RS = ORS = "\r\n"
    Status   = 200          # this means OK
    Reason   = "OK"
    Header   = TopHeader
    Document = TopDoc
    Footer   = TopFooter
    if        (GETARG["Method"] == "GET") @{
        HandleGET()
    @} else if (GETARG["Method"] == "HEAD") @{
        # not yet implemented
    @} else if (GETARG["Method"] != "") @{
        print "bad method", GETARG["Method"]
    @}
    Prompt = Header Document Footer
    print "HTTP/1.0", Status, Reason       |& HttpService
    print "Connection: Close"              |& HttpService
    print "Pragma: no-cache"               |& HttpService
    len = length(Prompt) + length(ORS)
    print "Content-length:", len           |& HttpService
    print ORS Prompt                       |& HttpService
    # ignore all the header lines
    while ((HttpService |& getline) > 0)
        ;
    # stop talking to this client
    close(HttpService)
    # wait for new client request
    HttpService |& getline
    # do some logging
    print systime(), strftime(), $0
    # read request parameters
    CGI_setup($1, $2, $3)
  @}
@}
@end example

This web server presents menu choices in the form of HTML links.
Therefore, it has to tell the browser the name of the host it is
residing on. When starting the server, the user may supply the name
of the host from the command line with @samp{gawk -v MyHost="Rumpelstilzchen"}.
If the user does not do this, the server looks up the name of the host it is
running on for later use as a web address in HTML documents. The same
applies to the port number. These values are inserted later into the
HTML content of the web pages to refer to the home system.

Each server that is built around this core has to initialize some
application-dependent variables (such as the default home page) in a function
@code{SetUpServer()}, which is called immediately before entering the
infinite loop of the server. For now, we will write an instance that
initiates a trivial interaction. With this home page, the client user
can click on two possible choices, and receive the current date either
in human-readable format or in seconds since 1970:

@example
function SetUpServer() @{
  TopHeader = "<HTML><HEAD>"
  TopHeader = TopHeader \
     "<title>My name is GAWK, GNU AWK</title></HEAD>"
  TopDoc    = "<BODY><h2>\
    Do you prefer your date <A HREF=" MyPrefix \
    "/human>human</A> or \
    <A HREF=" MyPrefix "/POSIX>POSIXed</A>?</h2>" ORS ORS
  TopFooter = "</BODY></HTML>"
@}
@end example

On the first run through the main loop, the default line terminators are
set and the default home page is copied to the actual home page. Since this
is the first run, @code{GETARG["Method"]} is not initialized yet, hence the
case selection over the method does nothing. Now that the home page is
initialized, the server can start communicating with a client browser.

@cindex RFC 2068
It does so by printing the HTTP header into the network connection
(@samp{print @dots{} |& HttpService}).
This command blocks execution of
the server script until a client connects.

If you compare this server
script with the primitive one we wrote before, you will notice
two additional lines in the header. The first instructs the browser
to close the connection after each request. The second tells the
browser that it should never try to @emph{remember} earlier requests
that had identical web addresses (no caching). Otherwise, it could happen
that the browser retrieves the time of day in the previous example just once,
and later it takes the web page from the cache, always displaying the same
time of day although time advances each second.

Having supplied the initial home page to the browser with a valid document
stored in the variable @code{Prompt}, the server closes the connection and waits
for the next request. When the request comes, a log line is printed that
allows us to see which request the server receives. The final step in the
loop is to call the function @code{CGI_setup()}, which reads all the lines
of the request (coming from the browser), processes them, and stores the
transmitted parameters in the array @code{PARAM}. The complete
text of these application-independent functions can be found in
@ref{CGI Lib, ,A Simple CGI Library}.
For now, we use a simplified version of @code{CGI_setup()}:

@example
function CGI_setup(   method, uri, version, i) @{
  delete GETARG;         delete MENU;        delete PARAM
  GETARG["Method"]  = $1
  GETARG["URI"]     = $2
  GETARG["Version"] = $3
  i = index($2, "?")
  # is there a "?" indicating a CGI request?
@group
  if (i > 0) @{
    split(substr($2, 1, i-1), MENU, "[/:]")
    split(substr($2, i+1), PARAM, "&")
    for (i in PARAM) @{
      j = index(PARAM[i], "=")
      GETARG[substr(PARAM[i], 1, j-1)] = \
                                  substr(PARAM[i], j+1)
    @}
  @} else @{    # there is no "?", no need for splitting PARAMs
    split($2, MENU, "[/:]")
  @}
@end group
@}
@end example

At first, the function clears all variables used for
global storage of request parameters. The rest of the function serves
the purpose of filling the global parameters with the extracted new values.
To accomplish this, the name of the requested resource is split into
parts and stored for later evaluation. If the request contains a @samp{?},
then the request has CGI variables seamlessly appended to the web address.
Everything in front of the @samp{?} is split up into menu items, and
everything behind the @samp{?} is a list of @samp{@var{variable}=@var{value}} pairs
(separated by @samp{&}) that also need splitting. This way, CGI variables are
isolated and stored. This procedure lacks recognition of special characters
that are transmitted in coded form@footnote{As defined in RFC 2068.}. Here, any
optional request header and body parts are ignored. We do not need
header parameters and the request body. However, when refining our approach or
working with the @code{POST} and @code{PUT} methods, reading the header
and body
becomes inevitable. Header parameters should then be stored in a global
array as well as the body.

On each subsequent run through the main loop, one request from a browser is
received, evaluated, and answered according to the user's choice. This can be
done by letting the value of the HTTP method guide the main loop into
execution of the procedure @code{HandleGET()}, which evaluates the user's
choice.
In this case, we have only one hierarchical level of menus,
but in the general case,
menus are nested.
The menu choices at each level are
separated by @samp{/}, just as in @value{FN}s. Notice how simple it is to
construct menus of arbitrary depth:

@example
function HandleGET() @{
  if (MENU[2] == "human") @{
    Footer = strftime() TopFooter
  @} else if (MENU[2] == "POSIX") @{
    Footer = systime()  TopFooter
  @}
@}
@end example

The disadvantage of this approach is that our server is slow and can
handle only one request at a time. Its main advantage, however, is that
the server
consists of just one @command{gawk} program. No need for installing an
@command{httpd}, and no need for static separate HTML files, CGI scripts, or
@code{root} privileges. This is rapid prototyping.
This program can be started on the same host that runs your browser.
Then let your browser point to @uref{http://localhost:8080}.

@cindex XBM image format
@cindex images @subentry in web pages
@cindex web pages @subentry images in
@cindex GNUPlot utility
It is also possible to include images in the HTML pages.
Most browsers support the not very well-known
@file{.xbm} format,
which may contain only
monochrome pictures but is an ASCII format. Binary images are possible but
not so easy to handle. Another way of including images is to generate them
with a tool such as GNUPlot,
by calling the tool with the @code{system()} function or through a pipe.
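
As a rough sketch of the pipe-based approach, the following program
asks GNUPlot to write an image file. It assumes that a @command{gnuplot}
executable with a PNG terminal is installed and on the search path; the
variable @code{Plot} and the output @value{FN} @file{sine.png} are
placeholders chosen for this illustration only:

@example
BEGIN @{
    Plot = "gnuplot"                  # assumed to be installed
    print "set terminal png"        | Plot
    print "set output \"sine.png\"" | Plot   # hypothetical file name
    print "plot sin(x)"             | Plot
    close(Plot)    # wait until gnuplot has finished
@}
@end example

This only produces the file. Delivering it to a browser would still
require the server to send the binary data with suitable header lines,
which is the part that is not so easy to handle.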

@node CGI Lib, , Interacting Service, Interacting Service
@subsection A Simple CGI Library
@quotation
@i{HTTP is like being married: you have to be able to handle whatever
you're given, while being very careful what you send back.}@*
@author Phil Smith III,@* @uref{http://www.netfunny.com/rhf/jokes/99/Mar/http.html}
@end quotation

@cindex CGI (Common Gateway Interface) @subentry library
In @ref{Interacting Service, ,A Web Service with Interaction},
we saw the function @code{CGI_setup()} as part of the web server
``core logic'' framework. The code presented there handles almost
everything necessary for CGI requests.
One thing it doesn't do is handle encoded characters in the requests.
For example, an @samp{&} is encoded as a percent sign followed by
the hexadecimal value: @samp{%26}. These encoded values should be
decoded.
Following is a simple library to perform these tasks.
This code is used for all web server examples
throughout the rest of this @value{DOCUMENT}.
If you want to use it for your own web server, store the source code
into a file named @file{inetlib.awk}. Then you can include
these functions into your code by placing the following statement
into your program
(on the first line of your script):

@example
@@include "inetlib.awk"
@end example

@noindent
This statement relies on the @code{@@include} directive that has been
built into @command{gawk} since version 4.0; it requires the @value{FN}
to be written as a double-quoted string constant.
But beware, with older versions a similar mechanism (with an unquoted
@value{FN}) is
only possible if you invoke your web server script with @command{igawk}
instead of the usual @command{awk} or @command{gawk}.
Here is the code:

@example
@c file eg/network/coreserv.awk
# CGI Library and core of a web server
@c endfile
@ignore
@c file eg/network/coreserv.awk
#
# Juergen Kahrs, Juergen.Kahrs@@vr-web.de
# with Arnold Robbins, arnold@@skeeve.com
# September 2000

@c endfile
@end ignore
@c file eg/network/coreserv.awk
# Global arrays
#   GETARG --- arguments to CGI GET command
#   MENU   --- menu items (path names)
#   PARAM  --- parameters of form x=y

# Optional variable MyHost contains host address
# Optional variable MyPort contains port number
# Needs TopHeader, TopDoc, TopFooter
# Sets MyPrefix, HttpService, Status, Reason

BEGIN @{
  if (MyHost == "") @{
     "uname -n" | getline MyHost
     close("uname -n")
  @}
  if (MyPort == 0) MyPort = 8080
  HttpService = "/inet/tcp/" MyPort "/0/0"
  MyPrefix    = "http://" MyHost ":" MyPort
  SetUpServer()
  while ("awk" != "complex") @{
    # header lines are terminated this way
    RS = ORS    = "\r\n"
    Status      = 200             # this means OK
    Reason      = "OK"
    Header      = TopHeader
    Document    = TopDoc
    Footer      = TopFooter
    if        (GETARG["Method"] == "GET") @{
        HandleGET()
    @} else if (GETARG["Method"] == "HEAD") @{
        # not yet implemented
    @} else if (GETARG["Method"] != "") @{
        print "bad method", GETARG["Method"]
    @}
    Prompt = Header Document Footer
    print "HTTP/1.0", Status, Reason     |& HttpService
    print "Connection: Close"            |& HttpService
    print "Pragma: no-cache"             |& HttpService
    len = length(Prompt) + length(ORS)
    print "Content-length:", len         |& HttpService
    print ORS Prompt                     |& HttpService
    # ignore all the header lines
    while ((HttpService |& getline) > 0)
        continue
    # stop talking to this client
    close(HttpService)
    # wait for new client request
    HttpService |& getline
    # do some logging
    print systime(), strftime(),
          $0
    CGI_setup($1, $2, $3)
  @}
@}

function CGI_setup(method, uri, version,    i, j)
@{
    delete GETARG
    delete MENU
    delete PARAM
    GETARG["Method"] = method
    GETARG["URI"] = uri
    GETARG["Version"] = version

    i = index(uri, "?")
    if (i > 0) @{  # is there a "?" indicating a CGI request?
        split(substr(uri, 1, i-1), MENU, "[/:]")
        split(substr(uri, i+1), PARAM, "&")
        for (i in PARAM) @{
            PARAM[i] = _CGI_decode(PARAM[i])
            j = index(PARAM[i], "=")
            GETARG[substr(PARAM[i], 1, j-1)] = \
                                         substr(PARAM[i], j+1)
        @}
    @} else @{ # there is no "?", no need for splitting PARAMs
        split(uri, MENU, "[/:]")
    @}
    for (i in MENU)     # decode characters in path
        if (i + 0 > 4)  # but not those in host name (compare numerically)
            MENU[i] = _CGI_decode(MENU[i])
@}
@c endfile
@end example

This isolates details in a single function, @code{CGI_setup()}.
Decoding of encoded characters is pushed off to a helper function,
@code{_CGI_decode()}.
The use of the leading underscore (@samp{_}) in
the function name is intended to indicate that it is an ``internal''
function, although there is nothing to enforce this:

@example
@c file eg/network/coreserv.awk
function _CGI_decode(str,   hexdigs, i, pre, code1, code2,
                            val, result)
@{
   hexdigs = "123456789abcdef"

   i = index(str, "%")
   if (i == 0) # no work to do
      return str

   do @{
      pre = substr(str, 1, i-1)   # part before %xx
      code1 = substr(str, i+1, 1) # first hex digit
      code2 = substr(str, i+2, 1) # second hex digit
      str = substr(str, i+3)      # rest of string

      code1 = tolower(code1)
      code2 = tolower(code2)
      val = index(hexdigs, code1) * 16 \
            + index(hexdigs, code2)

      result = result pre sprintf("%c", val)
      i = index(str, "%")
   @} while (i != 0)
   if (length(str) > 0)
      result = result str
   return result
@}
@c endfile
@end example

This works by splitting the string apart around an encoded character.
The two digits are converted to lowercase characters and looked up in a string
of hex digits. Note that @code{0} is not in the string on purpose;
@code{index()} returns zero when it's not found, automatically giving
the correct value! Once the hexadecimal value is converted from
characters in a string into a numerical value, @code{sprintf()}
converts the value back into a real character.
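
The lookup trick can be tried in isolation. The following stand-alone
sketch is only an illustration and not part of the library; the variable
@code{code} is a name made up for this example. It converts the single
escape @samp{%26} by hand, using the same digit string as
@code{_CGI_decode()}:

@example
BEGIN @{
    hexdigs = "123456789abcdef"   # no "0": index() returns 0 for it
    code = "%26"
    val = index(hexdigs, tolower(substr(code, 2, 1))) * 16 \
          + index(hexdigs, tolower(substr(code, 3, 1)))
    printf "%s -> %d -> %c\n", code, val, val
@}
@end example

@noindent
Run with @command{gawk}, it prints:

@example
@print{} %26 -> 38 -> &
@end example
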
The following is a simple test harness for the above functions:

@example
@c file eg/network/testserv.awk
BEGIN @{
  CGI_setup("GET",
  "http://www.gnu.org/cgi-bin/foo?p1=stuff&p2=stuff%26junk" \
       "&percent=a %25 sign",
  "1.0")
  for (i in MENU)
      printf "MENU[\"%s\"] = %s\n", i, MENU[i]
  for (i in PARAM)
      printf "PARAM[\"%s\"] = %s\n", i, PARAM[i]
  for (i in GETARG)
      printf "GETARG[\"%s\"] = %s\n", i, GETARG[i]
@}
@c endfile
@end example

@c FIXME: Rerun to make sure still correct
And this is the result when we run it:

@c artificial line wrap in last output line
@example
$ gawk -f testserv.awk
@print{} MENU["4"] = www.gnu.org
@print{} MENU["5"] = cgi-bin
@print{} MENU["6"] = foo
@print{} MENU["1"] = http
@print{} MENU["2"] =
@print{} MENU["3"] =
@print{} PARAM["1"] = p1=stuff
@print{} PARAM["2"] = p2=stuff&junk
@print{} PARAM["3"] = percent=a % sign
@print{} GETARG["p1"] = stuff
@print{} GETARG["percent"] = a % sign
@print{} GETARG["p2"] = stuff&junk
@print{} GETARG["Method"] = GET
@print{} GETARG["Version"] = 1.0
@print{} GETARG["URI"] = http://www.gnu.org/cgi-bin/foo?p1=stuff&
p2=stuff%26junk&percent=a %25 sign
@end example

@node Simple Server, Caveats, Interacting Service, Using Networking
@section A Simple Web Server
@cindex web servers
@cindex servers @subentry web
In the preceding @value{SECTION}, we built the core logic for event-driven GUIs.
In this @value{SECTION}, we finally extend the core to a real application.
No one would actually write a commercial web server in @command{gawk}, but
it is instructive to see that it is feasible in principle.

@cindex ELIZA program
@cindex Weizenbaum, Joseph
The application is ELIZA, the famous program by Joseph Weizenbaum that
mimics the behavior of a professional psychotherapist when talking to you.
Weizenbaum would certainly object to this description, but this is part of
the legend around ELIZA.
Take the site-independent core logic and append the following code:

@example
@c file eg/network/eliza.awk
function SetUpServer() @{
  SetUpEliza()
  TopHeader = \
    "<HTML><title>An HTTP-based System with GAWK</title>\
    <HEAD><META HTTP-EQUIV=\"Content-Type\"\
    CONTENT=\"text/html; charset=iso-8859-1\"></HEAD>\
    <BODY BGCOLOR=\"#ffffff\" TEXT=\"#000000\"\
    LINK=\"#0000ff\" VLINK=\"#0000ff\"\
    ALINK=\"#0000ff\"> <A NAME=\"top\">"
  TopDoc    = "\
   <h2>Please choose one of the following actions:</h2>\
   <UL>\
   <LI>\
   <A HREF=" MyPrefix "/AboutServer>About this server</A>\
   </LI><LI>\
   <A HREF=" MyPrefix "/AboutELIZA>About Eliza</A></LI>\
   <LI>\
   <A HREF=" MyPrefix \
      "/StartELIZA>Start talking to Eliza</A></LI></UL>"
  TopFooter = "</BODY></HTML>"
@}
@c endfile
@end example

@code{SetUpServer()} is similar to the previous example,
except for calling another function, @code{SetUpEliza()}.
This approach can be used to implement other kinds of servers.
The only changes needed to do so are hidden in the functions
@code{SetUpServer()} and @code{HandleGET()}. Perhaps it might be necessary to
implement other HTTP methods.
@c FIXME: @include?
The @command{igawk} program that comes with @command{gawk}
may be useful for this process.

When extending this example to a complete application, the first
thing to do is to implement the function @code{SetUpServer()} to
initialize the HTML pages and some variables. These initializations
determine the way your HTML pages look (colors, titles, menu
items, etc.).

The function @code{HandleGET()} is a nested case selection that decides
which page the user wants to see next. Each nesting level refers to a menu
level of the GUI.
Each case implements a certain action of the menu. At the
deepest level of case selection, the handler essentially knows what the
user wants and stores the answer into the variable that holds the HTML
page contents:

@smallexample
@c file eg/network/eliza.awk
function HandleGET() @{
  # A real HTTP server would treat some parts of the URI as a file name.
  # We take parts of the URI as menu choices and go on accordingly.
  if (MENU[2] == "AboutServer") @{
    Document  = "This is not a CGI script.\
      This is an httpd, an HTML file, and a CGI script all \
      in one GAWK script. It needs no separate www-server, \
      no installation, and no root privileges.\
      <p>To run it, do this:</p><ul>\
      <li> start this script with \"gawk -f httpserver.awk\",</li>\
      <li> and on the same host let your www browser open location\
           \"http://localhost:8080\"</li>\
      </ul><p>Details of HTTP come from:</p><ul>\
            <li>Hethmon: Illustrated Guide to HTTP</li>\
            <li>RFC 2068</li></ul><p>JK 14.9.1997</p>"
  @} else if (MENU[2] == "AboutELIZA") @{
    Document  = "This is an implementation of the famous ELIZA\
        program by Joseph Weizenbaum. It is written in GAWK and\
        uses an HTML GUI."
  @} else if (MENU[2] == "StartELIZA") @{
    gsub(/\+/, " ", GETARG["YouSay"])
    # Here we also have to substitute coded special characters
    Document  = "<form method=GET>" \
      "<h3>" ElizaSays(GETARG["YouSay"]) "</h3>\
      <p><input type=text name=YouSay value=\"\" size=60>\
      <br><input type=submit value=\"Tell her about it\"></p></form>"
  @}
@}
@c endfile
@end smallexample

Now we are down to the heart of ELIZA, so you can see how it works.
Initially the user does not say anything; then ELIZA resets its money
counter and asks the user to tell what comes to mind open-heartedly.
The subsequent answers are converted to uppercase characters and stored for
later comparison.
ELIZA presents the bill when confronted with
a sentence that contains the phrase ``shut up.'' Otherwise, it looks for
keywords in the sentence, conjugates the rest of the sentence, remembers
the keyword for later use, and finally selects an answer from the set of
possible answers:

@smallexample
@c file eg/network/eliza.awk
function ElizaSays(YouSay) @{
  if (YouSay == "") @{
    cost = 0
    answer = "HI, IM ELIZA, TELL ME YOUR PROBLEM"
  @} else @{
    q = toupper(YouSay)
    gsub("'", "", q)
    if (q == qold) @{
      answer = "PLEASE DONT REPEAT YOURSELF !"
    @} else @{
      if (index(q, "SHUT UP") > 0) @{
        answer = "WELL, PLEASE PAY YOUR BILL. ITS EXACTLY ... $"\
                 int(100*rand()+30+cost/100)
      @} else @{
        qold = q
        w = "-"                 # no keyword recognized yet
        for (i in k) @{          # search for keywords
          if (index(q, i) > 0) @{
            w = i
            break
          @}
        @}
        if (w == "-") @{         # no keyword, take old subject
          w    = wold
          subj = subjold
        @} else @{               # find subject
          subj = substr(q, index(q, w) + length(w)+1)
          wold = w
          subjold = subj        # remember keyword and subject
        @}
        for (i in conj)
           gsub(i, conj[i], q)  # conjugation
        # from all answers to this keyword, select one randomly
        answer = r[indices[int(split(k[w], indices) * rand()) + 1]]
        # insert subject into answer
        gsub("_", subj, answer)
      @}
    @}
  @}
  cost += length(answer) # for later payment : 1 cent per character
  return answer
@}
@c endfile
@end smallexample

In the long but simple function @code{SetUpEliza()}, you can see tables
for conjugation, keywords, and answers.@footnote{The version shown
here is abbreviated. The full version comes with the @command{gawk}
distribution.} The associative array @code{k}
contains indices into the array of answers @code{r}.
To choose an
answer, ELIZA just picks an index randomly:

@example
@c file eg/network/eliza.awk
function SetUpEliza() @{
  srand()
  wold = "-"
  subjold = " "

  # table for conjugation
  conj[" ARE "     ] = " AM "
  conj["WERE "     ] = "WAS "
  conj[" YOU "     ] = " I "
  conj["YOUR "     ] = "MY "
  conj[" IVE "     ] =\
  conj[" I HAVE "  ] = " YOU HAVE "
  conj[" YOUVE "   ] =\
  conj[" YOU HAVE "] = " I HAVE "
  conj[" IM "      ] =\
  conj[" I AM "    ] = " YOU ARE "
  conj[" YOURE "   ] =\
  conj[" YOU ARE " ] = " I AM "

  # table of all answers
  r[1] = "DONT YOU BELIEVE THAT I CAN _"
  r[2] = "PERHAPS YOU WOULD LIKE TO BE ABLE TO _ ?"
@c endfile
  @dots{}
@end example
@ignore
@c file eg/network/eliza.awk
  r[3] = "YOU WANT ME TO BE ABLE TO _ ?"
  r[4] = "PERHAPS YOU DONT WANT TO _ "
  r[5] = "DO YOU WANT TO BE ABLE TO _ ?"
  r[6] = "WHAT MAKES YOU THINK I AM _ ?"
  r[7] = "DOES IT PLEASE YOU TO BELIEVE I AM _ ?"
  r[8] = "PERHAPS YOU WOULD LIKE TO BE _ ?"
  r[9] = "DO YOU SOMETIMES WISH YOU WERE _ ?"
  r[10] = "DONT YOU REALLY _ ?"
  r[11] = "WHY DONT YOU _ ?"
  r[12] = "DO YOU WISH TO BE ABLE TO _ ?"
  r[13] = "DOES THAT TROUBLE YOU ?"
  r[14] = "TELL ME MORE ABOUT SUCH FEELINGS"
  r[15] = "DO YOU OFTEN FEEL _ ?"
  r[16] = "DO YOU ENJOY FEELING _ ?"
  r[17] = "DO YOU REALLY BELIEVE I DONT _ ?"
  r[18] = "PERHAPS IN GOOD TIME I WILL _ "
  r[19] = "DO YOU WANT ME TO _ ?"
  r[20] = "DO YOU THINK YOU SHOULD BE ABLE TO _ ?"
  r[21] = "WHY CANT YOU _ ?"
  r[22] = "WHY ARE YOU INTERESTED IN WHETHER OR NOT I AM _ ?"
  r[23] = "WOULD YOU PREFER IF I WERE NOT _ ?"
  r[24] = "PERHAPS IN YOUR FANTASIES I AM _ "
  r[25] = "HOW DO YOU KNOW YOU CANT _ ?"
  r[26] = "HAVE YOU TRIED ?"
  r[27] = "PERHAPS YOU CAN NOW _ "
  r[28] = "DID YOU COME TO ME BECAUSE YOU ARE _ ?"
  r[29] = "HOW LONG HAVE YOU BEEN _ ?"
  r[30] = "DO YOU BELIEVE ITS NORMAL TO BE _ ?"
  r[31] = "DO YOU ENJOY BEING _ ?"
  r[32] = "WE WERE DISCUSSING YOU -- NOT ME"
  r[33] = "Oh, I _"
  r[34] = "YOU'RE NOT REALLY TALKING ABOUT ME, ARE YOU ?"
  r[35] = "WHAT WOULD IT MEAN TO YOU, IF YOU GOT _ ?"
  r[36] = "WHY DO YOU WANT _ ?"
  r[37] = "SUPPOSE YOU SOON GOT _"
  r[38] = "WHAT IF YOU NEVER GOT _ ?"
  r[39] = "I SOMETIMES ALSO WANT _"
  r[40] = "WHY DO YOU ASK ?"
  r[41] = "DOES THAT QUESTION INTEREST YOU ?"
  r[42] = "WHAT ANSWER WOULD PLEASE YOU THE MOST ?"
  r[43] = "WHAT DO YOU THINK ?"
  r[44] = "ARE SUCH QUESTIONS IN YOUR MIND OFTEN ?"
  r[45] = "WHAT IS IT THAT YOU REALLY WANT TO KNOW ?"
  r[46] = "HAVE YOU ASKED ANYONE ELSE ?"
  r[47] = "HAVE YOU ASKED SUCH QUESTIONS BEFORE ?"
  r[48] = "WHAT ELSE COMES TO MIND WHEN YOU ASK THAT ?"
  r[49] = "NAMES DON'T INTEREST ME"
  r[50] = "I DONT CARE ABOUT NAMES -- PLEASE GO ON"
  r[51] = "IS THAT THE REAL REASON ?"
  r[52] = "DONT ANY OTHER REASONS COME TO MIND ?"
  r[53] = "DOES THAT REASON EXPLAIN ANYTHING ELSE ?"
  r[54] = "WHAT OTHER REASONS MIGHT THERE BE ?"
  r[55] = "PLEASE DON'T APOLOGIZE !"
  r[56] = "APOLOGIES ARE NOT NECESSARY"
  r[57] = "WHAT FEELINGS DO YOU HAVE WHEN YOU APOLOGIZE ?"
  r[58] = "DON'T BE SO DEFENSIVE"
  r[59] = "WHAT DOES THAT DREAM SUGGEST TO YOU ?"
  r[60] = "DO YOU DREAM OFTEN ?"
  r[61] = "WHAT PERSONS APPEAR IN YOUR DREAMS ?"
  r[62] = "ARE YOU DISTURBED BY YOUR DREAMS ?"
  r[63] = "HOW DO YOU DO ... PLEASE STATE YOUR PROBLEM"
  r[64] = "YOU DON'T SEEM QUITE CERTAIN"
  r[65] = "WHY THE UNCERTAIN TONE ?"
  r[66] = "CAN'T YOU BE MORE POSITIVE ?"
  r[67] = "YOU AREN'T SURE ?"
  r[68] = "DON'T YOU KNOW ?"
  r[69] = "WHY NO _ ?"
  r[70] = "DON'T SAY NO, IT'S ALWAYS SO NEGATIVE"
  r[71] = "WHY NOT ?"
  r[72] = "ARE YOU SURE ?"
  r[73] = "WHY NO ?"
  r[74] = "WHY ARE YOU CONCERNED ABOUT MY _ ?"
  r[75] = "WHAT ABOUT YOUR OWN _ ?"
  r[76] = "CAN'T YOU THINK ABOUT A SPECIFIC EXAMPLE ?"
  r[77] = "WHEN ?"
  r[78] = "WHAT ARE YOU THINKING OF ?"
  r[79] = "REALLY, ALWAYS ?"
  r[80] = "DO YOU REALLY THINK SO ?"
  r[81] = "BUT YOU ARE NOT SURE YOU _ "
  r[82] = "DO YOU DOUBT YOU _ ?"
  r[83] = "IN WHAT WAY ?"
  r[84] = "WHAT RESEMBLANCE DO YOU SEE ?"
  r[85] = "WHAT DOES THE SIMILARITY SUGGEST TO YOU ?"
  r[86] = "WHAT OTHER CONNECTION DO YOU SEE ?"
  r[87] = "COULD THERE REALLY BE SOME CONNECTIONS ?"
  r[88] = "HOW ?"
  r[89] = "YOU SEEM QUITE POSITIVE"
  r[90] = "ARE YOU SURE ?"
  r[91] = "I SEE"
  r[92] = "I UNDERSTAND"
  r[93] = "WHY DO YOU BRING UP THE TOPIC OF FRIENDS ?"
  r[94] = "DO YOUR FRIENDS WORRY YOU ?"
  r[95] = "DO YOUR FRIENDS PICK ON YOU ?"
  r[96] = "ARE YOU SURE YOU HAVE ANY FRIENDS ?"
  r[97] = "DO YOU IMPOSE ON YOUR FRIENDS ?"
  r[98] = "PERHAPS YOUR LOVE FOR FRIENDS WORRIES YOU"
  r[99] = "DO COMPUTERS WORRY YOU ?"
  r[100] = "ARE YOU TALKING ABOUT ME IN PARTICULAR ?"
  r[101] = "ARE YOU FRIGHTENED BY MACHINES ?"
  r[102] = "WHY DO YOU MENTION COMPUTERS ?"
  r[103] = "WHAT DO YOU THINK MACHINES HAVE TO DO WITH YOUR PROBLEMS ?"
  r[104] = "DON'T YOU THINK COMPUTERS CAN HELP PEOPLE ?"
  r[105] = "WHAT IS IT ABOUT MACHINES THAT WORRIES YOU ?"
  r[106] = "SAY, DO YOU HAVE ANY PSYCHOLOGICAL PROBLEMS ?"
  r[107] = "WHAT DOES THAT SUGGEST TO YOU ?"
  r[108] = "I SEE"
  r[109] = "IM NOT SURE I UNDERSTAND YOU FULLY"
  r[110] = "COME COME ELUCIDATE YOUR THOUGHTS"
  r[111] = "CAN YOU ELABORATE ON THAT ?"
  r[112] = "THAT IS QUITE INTERESTING"
  r[113] = "WHY DO YOU HAVE PROBLEMS WITH MONEY ?"
  r[114] = "DO YOU THINK MONEY IS EVERYTHING ?"
  r[115] = "ARE YOU SURE THAT MONEY IS THE PROBLEM ?"
  r[116] = "I THINK WE WANT TO TALK ABOUT YOU, NOT ABOUT ME"
  r[117] = "WHAT'S ABOUT ME ?"
  r[118] = "WHY DO YOU ALWAYS BRING UP MY NAME ?"
@c endfile
@end ignore

@example
@c file eg/network/eliza.awk
  # table for looking up answers that
  # fit to a certain keyword
  k["CAN YOU"] = "1 2 3"
  k["CAN I"] = "4 5"
  k["YOU ARE"] =\
  k["YOURE"] = "6 7 8 9"
@c endfile
  @dots{}
@end example
@ignore
@c file eg/network/eliza.awk
  k["I DONT"] = "10 11 12 13"
  k["I FEEL"] = "14 15 16"
  k["WHY DONT YOU"] = "17 18 19"
  k["WHY CANT I"] = "20 21"
  k["ARE YOU"] = "22 23 24"
  k["I CANT"] = "25 26 27"
  k["I AM"] =\
  k["IM "] = "28 29 30 31"
  k["YOU "] = "32 33 34"
  k["I WANT"] = "35 36 37 38 39"
  k["WHAT"] =\
  k["HOW"] =\
  k["WHO"] =\
  k["WHERE"] =\
  k["WHEN"] =\
  k["WHY"] = "40 41 42 43 44 45 46 47 48"
  k["NAME"] = "49 50"
  k["CAUSE"] = "51 52 53 54"
  k["SORRY"] = "55 56 57 58"
  k["DREAM"] = "59 60 61 62"
  k["HELLO"] =\
  k["HI "] = "63"
  k["MAYBE"] = "64 65 66 67 68"
  k[" NO "] = "69 70 71 72 73"
  k["YOUR"] = "74 75"
  k["ALWAYS"] = "76 77 78 79"
  k["THINK"] = "80 81 82"
  k["LIKE"] = "83 84 85 86 87 88 89"
  k["YES"] = "90 91 92"
  k["FRIEND"] = "93 94 95 96 97 98"
  k["COMPUTER"] = "99 100 101 102 103 104 105"
  k["-"] = "106 107 108 109 110 111 112"
  k["MONEY"] = "113 114 115"
  k["ELIZA"] = "116 117 118"
@c endfile
@end ignore
@example
@c file eg/network/eliza.awk
@}
@c endfile
@end example

@cindex Humphrys, Mark
@cindex ELIZA program
Some interesting remarks and details (including the original source code
of ELIZA) are found on Mark Humphrys's home page
@uref{https://computing.dcu.ie/~humphrys/eliza.html,
@cite{How my program passed the Turing Test}}.
Wikipedia provides much background information about
@uref{https://en.wikipedia.org/wiki/ELIZA, ELIZA},
including the original design of the software and
its early implementations.

@node Caveats, Challenges, Simple Server, Using Networking
@section Network Programming Caveats

@cindex networks @subentry @command{gawk} and @subentry troubleshooting
@cindex @command{gawk} @subentry networking @subentry troubleshooting
@cindex troubleshooting @subentry @command{gawk}, networks
By now it should be clear
that debugging a networked application is more
complicated than debugging a single-process single-hosted application.
The behavior of a networked application sometimes looks noncausal because
it is not reproducible in a strong sense. Whether a network application
works or not sometimes depends on the following:

@itemize @bullet
@item
How crowded the underlying network is

@item
Whether the party at the other end is running

@item
The state of the party at the other end
@end itemize

@cindex troubleshooting @subentry networks @subentry timeouts
The most difficult problems for a beginner arise from the hidden states of the
underlying network. After closing a TCP connection, it's often necessary to wait
a short while before reopening the connection. Even more difficult is the
establishment of a connection that previously ended with a ``broken pipe.''
Those connections have to ``time out'' for a minute or so
before they can reopen.
Check this with the command @samp{netstat -a}, which
provides a list of still-active connections.

@node Challenges, , Caveats, Using Networking
@section Where To Go From Here

@cindex Loebner, Hugh
@cindex contest
Now, you have learned enough to build your own application. You could,
for example, take part in the
Loebner Contest
to win $100,000.
The Loebner Prize is the first formal instantiation of
a Turing Test. Hugh Loebner agreed with The Cambridge Center for Behavioral
Studies to underwrite a contest designed to implement the Turing Test.
Dr.@: Loebner pledged a Grand Prize of $100,000 for the first computer whose
responses were indistinguishable from a human's. Each year, a prize
of $2000 and a bronze medal are awarded to the @emph{most} human computer.
The winner of the annual contest is the best entry relative to other entries
that year, irrespective of how good it is in an absolute sense. Here is
an example of a conversation with the winning program of 1997:

@smallexample
(c)1997 Cambridge Center For Behavioral Studies all rights reserved [CONVERSE]
[Intelligent Research Ltd.] Start at: [1997/04/29 08:59:40]
PROGRAM[10:39:20]Hi, my name is Catherine, what's yours?
JUDGE02[10:39:36]My name is Nathan.
PROGRAM[10:39:37]Well, it's nice to meet you, Nathan.
PROGRAM[10:39:45]Did you see that story on CNN last night about the lesbian
PROGRAM[10:40:03]couple who came out at a White House party on Sunday?
JUDGE02[10:40:45]No. I'm sorry I missed that. How did the Prez take it?
PROGRAM[10:40:47]Ellen Degeneres was one of them - she was kissing her lover
PROGRAM[10:41:08]in front of Bill Clinton. Do you think he was very
PROGRAM[10:41:21]embarrassed?
JUDGE02[10:41:37]You tell me. I didn't see it.
PROGRAM[10:41:38]Are you still a Friend of Bill Clinton?
JUDGE02[10:42:22]Yes. I would like to talk about something else. Do you know anything about sculpture.
PROGRAM[10:42:24]Well, I suppose he's doing his best like most of us, but he
PROGRAM[10:42:35]does seem to attract an UNUSUAL amount of bad publicity,
PROGRAM[10:42:51]wouldnt you say--can it all be bad luck?
JUDGE02[10:43:20]I asked you if you knew anything about sculpture.
PROGRAM[10:43:21]OK, but what happens now--does it all drift on, with
PROGRAM[10:43:36]everyone waiting to see how much his friends have said about
PROGRAM[10:43:47]him under threat of lots and lots of jail-or will something
PROGRAM[10:44:01]dramatic actually happen?
JUDGE02[10:44:19]Do you know what sculpture is?
PROGRAM[10:44:20]Well, youre more a fan of his than I am I guess---though I
PROGRAM[10:44:34]cant SWEAR what Id have done in that locked hotel room, I
PROGRAM[10:44:51]must say.
JUDGE02[10:45:26]Have you ever been to the Salmagundi Club?
PROGRAM[10:45:29]Never. to go back to the original discussion Time to get
PROGRAM[10:45:43]off this one I think and onto something more down to earth!
JUDGE02[10:46:20]What is more down to earth than art?
PROGRAM[10:46:21]I don't know what -- to be honest. -
PROGRAM[10:46:30]I suppose it depends on where you're coming from, but as the
PROGRAM[10:46:46]song goes in My Fair Lady, "Why can't a woman be more like a
PROGRAM[10:47:03]man? "
JUDGE02[10:47:52]I liked that song but I prefer "Why can't children be more like we are"
JUDGE02[10:48:10]from another play the name of which I cannot remember.
PROGRAM[10:48:12]What, if anything, do you enjoy about being a man?
JUDGE02[10:49:01]I would enjoy all aspects of life no matter what my gender.
PROGRAM[10:49:02]Do you think of yourself as being attractive?
@end smallexample

@cindex Clinton, Bill
This program insists on always speaking about the same story around Bill
Clinton. You see, even a program with a rather narrow mind can behave so
much like a human being that it can win this prize. It is quite common to
let these programs talk to each other via network connections. But during the
competition itself, the program and its computer have to be present at the
place the competition is held.
We all would love to see a @command{gawk}
program win in such an event. Maybe it is up to you to accomplish this?

Some other ideas for useful networked applications:
@itemize @bullet
@item
Read the file @file{doc/awkforai.txt} in earlier @command{gawk}
distributions.@footnote{The file is no longer distributed with
@command{gawk}, since the copyright on the file is not clear.}
It was written by Ronald P.@: Loui (at the time, Associate
Professor of Computer Science, at Washington University in St. Louis,
@email{loui@@ai.wustl.edu}) and summarizes why he taught @command{gawk} to
students of Artificial Intelligence. Here are some passages from the text:

@cindex AI
@cindex PROLOG
@cindex Loui, Ronald
@cindex agent
@quotation
The GAWK manual can
be consumed in a single lab session and the language can be mastered by
the next morning by the average student. GAWK's automatic
initialization, implicit coercion, I/O support and lack of pointers
forgive many of the mistakes that young programmers are likely to make.
Those who have seen C but not mastered it are happy to see that GAWK
retains some of the same sensibilities while adding what must be
regarded as spoonsful of syntactic sugar.@*
@dots{}@*
@cindex robot
There are further simple answers. Probably the best is the fact that
increasingly, undergraduate AI programming is involving the Web. Oren
Etzioni (University of Washington, Seattle) has for a while been arguing
that the ``softbot'' is replacing the mechanical engineers' robot as the
most glamorous AI testbed. If the artifact whose behavior needs to be
controlled in an intelligent way is the software agent, then a language
that is well-suited to controlling the software environment is the
appropriate language. That would imply a scripting language.
If the
robot is KAREL, then the right language is ``turn left; turn right.'' If
the robot is Netscape, then the right language is something that can
generate @samp{netscape -remote 'openURL(http://cs.wustl.edu/~loui)'} with
elan.@*
@dots{}@*
AI programming requires high-level thinking. There have always been a few
gifted programmers who can write high-level programs in assembly language.
Most however need the ambient abstraction to have a higher floor.@*
@dots{}@*
Second, inference is merely the expansion of notation. No matter whether
the logic that underlies an AI program is fuzzy, probabilistic, deontic,
defeasible, or deductive, the logic merely defines how strings can be
transformed into other strings. A language that provides the best
support for string processing in the end provides the best support for
logic, for the exploration of various logics, and for most forms of
symbolic processing that AI might choose to call ``reasoning'' instead of
``logic.'' The implication is that PROLOG, which saves the AI programmer
from having to write a unifier, saves perhaps two dozen lines of GAWK
code at the expense of strongly biasing the logic and representational
expressiveness of any approach.
@end quotation

Now that @command{gawk} itself can connect to the Internet, it should be obvious
that it is suitable for writing intelligent web agents.

@item
@command{awk} is strong at pattern recognition and string processing.
So, it is well suited to the classic problem of language translation.
A first try could be a program that knows the 100 most frequent English
words and their counterparts in German or French. The service could be
implemented by regularly reading email with the program above, replacing
each word by its translation and sending the translation back via SMTP.
Users would send English email to their translation service and get
back a translated email message in return. As soon as this works,
more effort can be spent on a real translation program.

@item
Another dialogue-oriented application (on the verge
of ridicule) is the email ``support service.'' Troubled customers write an
email to an automatic @command{gawk} service that reads the email. It looks
for keywords in the mail and assembles a reply email accordingly. By carefully
investigating the email header, and repeating these keywords through the
reply email, it is rather simple to give the customer a feeling that
someone cares. Ideally, such a service would search a database of previous
cases for solutions. If none exists, the database could, for example, consist
of all the newsgroups, mailing lists and FAQs on the Internet.
@end itemize

@node Some Applications and Techniques, Links, Using Networking, Top
@comment  node-name,  next,  previous,  up

@chapter Some Applications and Techniques
In this @value{CHAPTER}, we look at a number of self-contained
scripts, with an emphasis on concise networking. Along the way, we
work towards creating building blocks that encapsulate often-needed
functions of the networking world, show new techniques that
broaden the scope of problems that can be solved with @command{gawk}, and
explore leading edge technology that may shape the future of networking.

We often refer to the site-independent core of the server that
we built in
@ref{Simple Server, ,A Simple Web Server}.
When building new and nontrivial servers, we
always copy this building block and append new instances of the two
functions @code{SetUpServer()} and @code{HandleGET()}.
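
As a rough illustration of this pattern, the following sketch shows what
such a pair of functions might look like for a hypothetical ``echo''
service. It is not one of the original examples: the menu entry
@samp{Echo}, the parameter @samp{Text}, and the helper
@code{HtmlEscape()} are names invented here. The sketch also assumes
that the core fills @code{MENU} and @code{GETARG} and delivers
@code{Document} to the browser, as it does for the other servers in
this @value{CHAPTER}:

@example
# Hypothetical helper: make user-supplied text safe to embed in HTML.
function HtmlEscape(s) @{
  gsub(/&/, "\\&amp;", s)   # "\\&" yields a literal ampersand
  gsub(/</, "\\&lt;", s)
  gsub(/>/, "\\&gt;", s)
  return s
@}

# Called once by the core: set up the top-level HTML texts.
function SetUpServer() @{
  TopHeader = "<HTML><title>Echo service</title>"
  TopDoc = "<BODY><h2>Please choose one of the following actions:</h2>" \
    "<UL>" \
    "<LI><A HREF=" MyPrefix "/AboutServer>About this server</A></LI>" \
    "<LI><A HREF=" MyPrefix "/Echo?Text=hello>Echo a text</A></LI>" \
    "</UL>"
  TopFooter = "</BODY></HTML>"
@}

# Called by the core for each request: one branch per menu entry.
function HandleGET() @{
  if (MENU[2] == "AboutServer") @{
    Document = "This is a hypothetical echo service."
  @} else if (MENU[2] == "Echo") @{
    if ("Text" in GETARG)
      Document = "You sent: " HtmlEscape(GETARG["Text"])
    else
      Document = "Nothing to echo."
  @}
@}
@end example

The escaping step is a precaution added for this sketch, since the text
comes straight from the request. Everything else follows the structure
that the servers in the following sections use: one branch of
@code{HandleGET()} for each link that @code{SetUpServer()} offers.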

This makes a lot of sense, since
this scheme of event-driven
execution provides @command{gawk} with an interface to the most widely
accepted standard for GUIs: the web browser. Now, @command{gawk} can rival even
Tcl/Tk.

@cindex Tcl/Tk @subentry @command{gawk} and
Tcl and @command{gawk} have much in common. Both are simple scripting
languages that allow us to quickly solve problems with short programs. But
Tcl has Tk on top of it, and @command{gawk} had nothing comparable up
to now. While Tcl needs a large and ever-changing library (Tk, which was
originally bound to the X Window System), @command{gawk} needs just the
networking interface
and some kind of browser on the client's side. Besides better portability,
the most important advantage of this approach (embracing well-established
standards such as HTTP and HTML) is that @emph{we do not need to change the
language}. We let others do the work of fighting over protocols and standards.
We can use HTML, JavaScript, VRML, or whatever else comes along to do our work.

@menu
* PANIC::                       An Emergency Web Server.
* GETURL::                      Retrieving Web Pages.
* REMCONF::                     Remote Configuration Of Embedded Systems.
* URLCHK::                      Look For Changed Web Pages.
* WEBGRAB::                     Extract Links From A Page.
* STATIST::                     Graphing A Statistical Distribution.
* MAZE::                        Walking Through A Maze In Virtual Reality.
* MOBAGWHO::                    A Simple Mobile Agent.
* STOXPRED::                    Stock Market Prediction As A Service.
* PROTBASE::                    Searching Through A Protein Database.
@end menu

@node PANIC, GETURL, Some Applications and Techniques, Some Applications and Techniques
@section PANIC: An Emergency Web Server
@cindex PANIC program
@cindex networks @seealso{web pages}
@cindex web service
At first glance, the @code{"Hello, world"} example in
@ref{Primitive Service, ,A Primitive Web Service},
seems useless.
By adding just a few lines, we can turn it into something useful.

The PANIC program tells everyone who connects that the local
site is not working. When a web server breaks down, it makes a difference
if customers get a strange ``network unreachable'' message, or a short message
telling them that the server has a problem. In such an emergency,
the hard disk and everything on it (including the regular web service) may
be unavailable. Rebooting the web server off a USB drive makes sense in this
setting.

To use the PANIC program as an emergency web server, all you need are the
@command{gawk} executable and the program below on a USB drive. By default,
it listens on port 8080. A different value may be supplied on the
command line by setting the variable @code{MyPort}:

@example
@c file eg/network/panic.awk
BEGIN @{
  RS = ORS = "\r\n"
  if (MyPort == 0) MyPort = 8080
  HttpService = "/inet/tcp/" MyPort "/0/0"
  Hello = "<HTML><HEAD><TITLE>Out Of Service</TITLE>" \
     "</HEAD><BODY><H1>" \
     "This site is temporarily out of service." \
     "</H1></BODY></HTML>"
  Len = length(Hello) + length(ORS)
  while ("awk" != "complex") @{
    print "HTTP/1.0 200 OK" |& HttpService
    print "Content-Length: " Len ORS |& HttpService
    print Hello |& HttpService
    while ((HttpService |& getline) > 0)
       continue;
    close(HttpService)
  @}
@}
@c endfile
@end example

@node GETURL, REMCONF, PANIC, Some Applications and Techniques
@section GETURL: Retrieving Web Pages
@cindex GETURL program
@cindex web pages @subentry retrieving
GETURL is a versatile building block for shell scripts that need to retrieve
files from the Internet. It takes a web address as a command-line parameter and
tries to retrieve the contents of this address. The contents are printed
to standard output, while the header is printed to @file{/dev/stderr}.
A surrounding shell script
could analyze the contents and extract the text or the links. An ASCII
browser could be written around GETURL. But more interestingly, web robots are
straightforward to write on top of GETURL. On the Internet, you can find
several programs of the same name that do the same job. They are usually
much more complex internally and at least 10 times as big.

First, GETURL checks whether it was called with exactly one web address.
Then, it checks if the user chose to use a special proxy server whose name
is handed over in a variable. By default, it is assumed that the local
machine serves as proxy. GETURL uses the @code{GET} method by default
to access the web page. By handing over the name of a different method
(such as @code{HEAD}), it is possible to choose a different behavior. With
the @code{HEAD} method, the user does not receive the body of the page
content, but does receive the header:

@example
@c file eg/network/geturl.awk
BEGIN @{
  if (ARGC != 2) @{
    print "GETURL - retrieve Web page via HTTP 1.0"
    print "IN:\n    the URL as a command-line parameter"
    print "PARAM(S):\n    -v Proxy=MyProxy"
    print "OUT:\n    the page content on stdout"
    print "    the page header on stderr"
    print "JK 16.05.1997"
    print "ADR 13.08.2000"
    exit
  @}
  URL = ARGV[1]; ARGV[1] = ""
  if (Proxy == "") Proxy = "127.0.0.1"
  if (ProxyPort == 0) ProxyPort = 80
  if (Method == "") Method = "GET"
  HttpService = "/inet/tcp/0/" Proxy "/" ProxyPort
  ORS = RS = "\r\n\r\n"
  print Method " " URL " HTTP/1.0" |& HttpService
  HttpService |& getline Header
  print Header > "/dev/stderr"
  while ((HttpService |& getline) > 0)
    printf "%s", $0
  close(HttpService)
@}
@c endfile
@end example

This program can be changed as needed, but be careful with the last lines.
Make sure transmission of binary data is not corrupted by additional line
breaks. Even as it is now, the byte sequence @code{"\r\n\r\n"} would
disappear if it were contained in binary data. Don't get caught in a
trap when trying a quick fix on this one.

@node REMCONF, URLCHK, GETURL, Some Applications and Techniques
@section REMCONF: Remote Configuration of Embedded Systems
@cindex REMCONF program
@cindex Linux
@cindex GNU/Linux
@cindex Yahoo!
Today, you often find powerful processors in embedded systems. Dedicated
network routers and controllers for all kinds of machinery are examples
of embedded systems. Processors like the Intel 80x86 or the AMD Elan are
able to run multitasking operating systems, such as XINU or GNU/Linux
in embedded PCs. These systems are small and usually do not have
a keyboard or a display. Therefore it is difficult to set up their
configuration. There are several widespread ways to set them up:

@itemize @bullet
@item
DIP switches

@item
Read Only Memories such as EPROMs

@item
Serial lines or some kind of keyboard

@item
Network connections via @command{telnet} or SNMP

@item
HTTP connections with HTML GUIs
@end itemize

In this @value{SECTION}, we look at a solution that uses HTTP connections
to control variables of an embedded system that are stored in a file.
Since embedded systems have tight limits on resources like memory,
it is difficult to employ advanced techniques such as SNMP and HTTP
servers. @command{gawk} fits in quite nicely with its single executable
which needs just a short script to start working.
The following program stores the variables in a file, and a concurrent
process in the embedded system may read the file.
The program uses the
site-independent part of the simple web server that we developed in
@ref{Interacting Service, ,A Web Service with Interaction}.
As mentioned there, all we have to do is to write two new procedures
@code{SetUpServer()} and @code{HandleGET()}:

@smallexample
@c file eg/network/remconf.awk
function SetUpServer() @{
  TopHeader = "<HTML><title>Remote Configuration</title>"
  TopDoc = "<BODY>\
    <h2>Please choose one of the following actions:</h2>\
    <UL>\
      <LI><A HREF=" MyPrefix "/AboutServer>About this server</A></LI>\
      <LI><A HREF=" MyPrefix "/ReadConfig>Read Configuration</A></LI>\
      <LI><A HREF=" MyPrefix "/CheckConfig>Check Configuration</A></LI>\
      <LI><A HREF=" MyPrefix "/ChangeConfig>Change Configuration</A></LI>\
      <LI><A HREF=" MyPrefix "/SaveConfig>Save Configuration</A></LI>\
    </UL>"
  TopFooter = "</BODY></HTML>"
  if (ConfigFile == "") ConfigFile = "config.asc"
@}
@c endfile
@end smallexample

The function @code{SetUpServer()} initializes the top level HTML texts
as usual. It also initializes the name of the file that contains the
configuration parameters and their values. In case the user supplies
a name from the command line, that name is used. The file is expected to
contain one parameter per line, with the name of the parameter in
column one and the value in column two.

The function @code{HandleGET()} reflects the structure of the menu
tree as usual. The first menu choice tells the user what this is all
about. The second choice reads the configuration file line by line
and stores the parameters and their values. Notice that the record
separator for this file is @code{"\n"}, in contrast to the record separator
for HTTP. The third menu choice builds an HTML table to show
the contents of the configuration file just read.
The fourth choice
does the real work of changing parameters, and the last one just saves
the configuration into a file:

@smallexample
@c file eg/network/remconf.awk
function HandleGET() @{
  if (MENU[2] == "AboutServer") @{
    Document = "This is a GUI for remote configuration of an\
      embedded system. It is implemented as one GAWK script."
  @} else if (MENU[2] == "ReadConfig") @{
    RS = "\n"
    while ((getline < ConfigFile) > 0)
       config[$1] = $2;
    close(ConfigFile)
    RS = "\r\n"
    Document = "Configuration has been read."
  @} else if (MENU[2] == "CheckConfig") @{
    Document = "<TABLE BORDER=1 CELLPADDING=5>"
    for (i in config)
      Document = Document "<TR><TD>" i "</TD>" \
        "<TD>" config[i] "</TD></TR>"
    Document = Document "</TABLE>"
  @} else if (MENU[2] == "ChangeConfig") @{
    if ("Param" in GETARG) @{            # any parameter to set?
      if (GETARG["Param"] in config) @{  # is parameter valid?
        config[GETARG["Param"]] = GETARG["Value"]
        Document = (GETARG["Param"] " = " GETARG["Value"] ".")
      @} else @{
        Document = "Parameter <b>" GETARG["Param"] "</b> is invalid."
      @}
    @} else @{
      Document = "<FORM method=GET><h4>Change one parameter</h4>\
        <TABLE BORDER CELLPADDING=5>\
        <TR><TD>Parameter</TD><TD>Value</TD></TR>\
        <TR><TD><input type=text name=Param value=\"\" size=20></TD>\
            <TD><input type=text name=Value value=\"\" size=40></TD>\
        </TR></TABLE><input type=submit value=\"Set\"></FORM>"
    @}
  @} else if (MENU[2] == "SaveConfig") @{
    for (i in config)
      printf("%s %s\n", i, config[i]) > ConfigFile
    close(ConfigFile)
    Document = "Configuration has been saved."
  @}
@}
@c endfile
@end smallexample

@cindex MiniSQL
We could also view the configuration file as a database. From this
point of view, the previous program acts like a primitive database server.
Real SQL database systems also make a service available by providing
a TCP port that clients can connect to. But the application level protocols
they use are usually proprietary and also change from time to time.
This is also true for the protocol that
MiniSQL uses.

@node URLCHK, WEBGRAB, REMCONF, Some Applications and Techniques
@section URLCHK: Look for Changed Web Pages
@cindex URLCHK program
Most people who make heavy use of Internet resources have a large
bookmark file with pointers to interesting web sites. It is impossible
to regularly check by hand if any of these sites have changed. A program
is needed to automatically look at the headers of web pages and tell
which ones have changed. URLCHK does the comparison after using GETURL
with the @code{HEAD} method to retrieve the header.

Like GETURL, this program first checks that it is called with exactly
one command-line parameter. URLCHK also takes the same command-line variables
@code{Proxy} and @code{ProxyPort} as GETURL,
because these variables are handed over to GETURL for each URL
that gets checked. The one and only parameter is the name of a file that
contains one line for each URL. In the first column, we find the URL, and
the second and third columns hold the length of the URL's body as found
the last two times it was checked. Now, we follow this plan:

@enumerate
@item
Read the URLs from the file and remember their most recent lengths

@item
Delete the contents of the file

@item
For each URL, check its new length and write it into the file

@item
If the most recent and the new length differ, tell the user
@end enumerate

It may seem a bit peculiar to read the URLs from a file together
with their two most recent lengths, but this approach has several
advantages. You can call the program again and again with the same
file.
After running the program, you can list the changed URLs again
by extracting those lines that differ in their second and third columns:

@c inspired by URLCHK in iX 5/97 166.
@smallexample
@c file eg/network/urlchk.awk
BEGIN @{
  if (ARGC != 2) @{
    print "URLCHK - check if URLs have changed"
    print "IN:\n    the file with URLs as a command-line parameter"
    print "    file contains URL, old length, new length"
    print "PARAMS:\n    -v Proxy=MyProxy -v ProxyPort=8080"
    print "OUT:\n    same as file with URLs"
    print "JK 02.03.1998"
    exit
  @}
  URLfile = ARGV[1]; ARGV[1] = ""
  if (Proxy != "") Proxy = " -v Proxy=" Proxy
  if (ProxyPort != "") ProxyPort = " -v ProxyPort=" ProxyPort
  while ((getline < URLfile) > 0)
     Length[$1] = $3 + 0
  close(URLfile)      # now, URLfile is read in and can be updated
  GetHeader = "gawk " Proxy ProxyPort " -v Method=\"HEAD\" -f geturl.awk "
  for (i in Length) @{
    GetThisHeader = GetHeader i " 2>&1"
    while ((GetThisHeader | getline) > 0)
      if (toupper($0) ~ /CONTENT-LENGTH/) NewLength = $2 + 0
    close(GetThisHeader)
    print i, Length[i], NewLength > URLfile
    if (Length[i] != NewLength)  # report only changed URLs
      print i, Length[i], NewLength
  @}
  close(URLfile)
@}
@c endfile
@end smallexample

Another thing that may look strange is the way GETURL is called.
Before calling GETURL, we have to check if the proxy variables need
to be passed on. If so, we prepare strings that will become part
of the command line later. In @code{GetHeader}, we store these strings
together with the longest part of the command line. Later, in the loop
over the URLs, the URL and a redirection operator are appended to
@code{GetHeader} to form the command that reads the URL's header over
the Internet.
GETURL always sends the headers to @file{/dev/stderr}.
That is
the reason why we need the redirection operator to have the header
piped in.

This program is not perfect because it assumes that a changed page
always has a changed length, which is not necessarily true. A more
advanced approach is to look at some other header line that
holds time information. But, as always when things get a bit more
complicated, this is left as an exercise to the reader.

@node WEBGRAB, STATIST, URLCHK, Some Applications and Techniques
@section WEBGRAB: Extract Links from a Page
@cindex WEBGRAB program
@c Inspired by iX 1/98 157.
@cindex robot
Sometimes it is necessary to extract links from web pages.
Browsers do it, web robots do it, and sometimes even humans do it.
Since we have a tool like GETURL at hand, we can solve this problem with
some help from the Bourne shell:

@example
@c file eg/network/webgrab.awk
BEGIN @{ RS = "https?://[#%&\\+\\-\\./0-9\\:;\\?A-Z_a-z\\~]*" @}
RT != "" @{
   command = ("gawk -v Proxy=MyProxy -f geturl.awk " RT \
               " > doc" NR ".html")
   print command
@}
@c endfile
@end example

Notice that the regular expression for URLs is rather crude. A precise
regular expression is much more complex. But this one works
rather well. One problem is that it is unable to find internal links of
an HTML document. Another problem is that
@samp{ftp}, @samp{telnet}, @samp{news}, @samp{mailto}, and other kinds
of links are missing in the regular expression.
However, it is straightforward to add them, if doing so is necessary for other tasks.

This program reads an HTML file and prints all the HTTP links that it finds.
It relies on @command{gawk}'s ability to use regular expressions as the record
separator. With @code{RS} set to a regular expression that matches links,
the second action is executed each time a non-empty link is found.
We can find the matching link itself in @code{RT}.

The action could use the @code{system()} function to let another GETURL
retrieve the page, but here we use a different approach.
This simple program prints shell commands that can be piped into @command{sh}
for execution. This way it is possible to first extract
the links, wrap shell commands around them, and pipe all the shell commands
into a file. After editing the file, execution of the file retrieves
only those files that we really need. In case we do not want to edit,
we can retrieve all the pages like this:

@smallexample
gawk -f geturl.awk http://www.suse.de | gawk -f webgrab.awk | sh
@end smallexample

@cindex Microsoft Windows
After this, you will find the contents of all referenced documents in
files named @file{doc*.html} even if they do not contain HTML code.
The most annoying thing is that we always have to pass the proxy to
GETURL. If you do not like to see the headers of the web pages
appear on the screen, you can redirect them to @file{/dev/null}.
Watching the headers appear can be quite interesting, because
it reveals
details such as which web server the companies use.
Now, it is clear how the clever marketing people
use web robots to determine the
market shares
of Microsoft and Netscape in the web server market.

Port 80 of any web server is like a small hole in a repellent firewall.
After attaching a browser to port 80, we usually catch a glimpse
of the bright side of the server (its home page). With a tool like GETURL
at hand, we are able to discover some of the more concealed
or even ``indecent'' services (i.e., lacking conformity to standards of quality).
It can be exciting to see the fancy CGI scripts that lie
there, revealing the inner workings of the server, ready to be called:

@itemize @bullet
@item
With a command such as:

@example
gawk -f geturl.awk http://any.host.on.the.net/cgi-bin/
@end example

some servers give you a directory listing of the CGI files.
Knowing the names, you can try to call some of them and watch
for useful results. Sometimes there are executables in such directories
(such as Perl interpreters) that you may call remotely. If there are
subdirectories with configuration data of the web server, this can also
be quite interesting to read.

@item
@cindex apache
The well-known Apache web server usually has its CGI files in the
directory @file{/cgi-bin}. There you can often find the scripts
@file{test-cgi} and @file{printenv}. Both tell you some things
about the current connection and the installation of the web server.
Just call:

@smallexample
gawk -f geturl.awk http://any.host.on.the.net/cgi-bin/test-cgi
gawk -f geturl.awk http://any.host.on.the.net/cgi-bin/printenv
@end smallexample

@item
Sometimes it is even possible to retrieve system files like the web
server's log file---possibly containing customer data---or even the file
@file{/etc/passwd}.
(We don't recommend this!)
@end itemize

@strong{Caution:}
Although this may sound funny or simply irrelevant, we are talking about
severe security holes. Try to explore your own system this way and make
sure that none of the above reveals too much information about your system.
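
One way to check your own server is sketched below. Like WEBGRAB, the
script only prints shell commands, so they can be inspected before they
are run. It is an illustration under assumptions, not part of the
original programs: the file name @file{selfcheck.awk} and the variable
@code{Host} are invented here, and the list of paths simply repeats the
ones mentioned above:

@example
# selfcheck.awk (hypothetical): print one GETURL command per path
BEGIN @{
  if (Host == "") Host = "localhost"   # probe only a host you run
  n = split("/cgi-bin/ /cgi-bin/test-cgi /cgi-bin/printenv", Path, " ")
  for (i = 1; i <= n; i++)
    # the HEAD method retrieves just the header, as in URLCHK
    print "gawk -v Method=HEAD -f geturl.awk http://" Host Path[i]
@}
@end example

Running @samp{gawk -f selfcheck.awk | sh} then shows the header that
your server returns for each path. GETURL prints headers to
@file{/dev/stderr}, so they appear on the screen. A status line
reporting success for one of these paths suggests that the
corresponding script or listing is reachable from outside and deserves
a closer look.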

@node STATIST, MAZE, WEBGRAB, Some Applications and Techniques
@section STATIST: Graphing a Statistical Distribution
@cindex STATIST program

@cindex GNUPlot utility
@cindex image format
@cindex GIF image format
@cindex PNG image format
@cindex PS image format
@cindex Boutell, Thomas
@image{statist,3in}

In the HTTP server examples we've shown thus far, we never present an image
to the browser and its user. Presenting images is one task. Generating
images that reflect some user input and presenting these dynamically
generated images is another. In this @value{SECTION}, we use GNUPlot
for generating @file{.png}, @file{.ps}, or @file{.gif}
files.@footnote{Due to licensing problems, the default
installation of GNUPlot disables the generation of @file{.gif} files.
If your installed version does not accept @samp{set term gif},
just download and install the most recent version of GNUPlot and the
@uref{https://libgd.github.io/, GD library}
by Thomas Boutell.
Otherwise you still have the chance to generate some
ASCII-art style images with GNUPlot by using @samp{set term dumb}.
(We tried it and it worked.)}

@cindex Numerical Recipes
The program we develop takes the statistical parameters of two samples
and computes the t-test statistics. As a result, we get the probabilities
that the means and the variances of both samples are the same. In order to
let the user check plausibility, the program presents an image of the
distributions. The statistical computation follows
@cite{Numerical Recipes in C: The Art of Scientific Computing}
by William H.@: Press, Saul A.@: Teukolsky, William T.@: Vetterling, and Brian P.@: Flannery.
Since @command{gawk} does not have a built-in function
for the computation of the beta function, we use the @code{ibeta()} function
of GNUPlot.
As a side effect, we learn how to use GNUPlot as a
sophisticated calculator. The comparison of means is done as in @code{tutest},
paragraph 14.2, page 613, and the comparison of variances is done as in @code{ftest},
page 611 in @cite{Numerical Recipes}.

As usual, we take the site-independent code for servers and append
our own functions @code{SetUpServer()} and @code{HandleGET()}:

@smallexample
@c file eg/network/statist.awk
function SetUpServer() @{
  TopHeader = "<HTML><title>Statistics with GAWK</title>"
  TopDoc = "<BODY>\
   <h2>Please choose one of the following actions:</h2>\
   <UL>\
    <LI><A HREF=" MyPrefix "/AboutServer>About this server</A></LI>\
    <LI><A HREF=" MyPrefix "/EnterParameters>Enter Parameters</A></LI>\
   </UL>"
  TopFooter = "</BODY></HTML>"
  GnuPlot = "gnuplot 2>&1"
  m1=m2=0; v1=v2=1; n1=n2=10
@}
@c endfile
@end smallexample

Here, you see the menu structure that the user sees. Later, we
will see how the program structure of the @code{HandleGET()} function
reflects the menu structure. What is missing here is the link for the
image we generate. In an event-driven environment, request,
generation, and delivery of images are separated.

Notice the way we initialize the @code{GnuPlot} command string for
the pipe. By default,
GNUPlot outputs the generated image via standard output, as well as
the results of @code{print}(ed) calculations via standard error.
The redirection causes standard error to be mixed into standard
output, enabling us to read results of calculations with @code{getline}.
By initializing the statistical parameters with some meaningful
defaults, we make sure the user gets an image the first time
he uses the program.
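
The calculator idea can be tried outside the server. The following is
only a minimal sketch and not part of the server code: it assumes that
@command{gawk} and GNUPlot are installed and found on the search path,
and the variable name @code{answer} and the arguments passed to
@code{ibeta()} are chosen purely for illustration. It sends one
@code{print} command through the two-way pipe and reads the reply, which
arrives on the pipe only because of the @samp{2>&1} redirection:

```shell
# Sketch only: assumes gawk and gnuplot are on PATH.
# The ibeta() arguments are arbitrary illustration values.
result=$(gawk 'BEGIN {
    GnuPlot = "gnuplot 2>&1"    # merge standard error into standard output
    # ask GNUPlot to evaluate an expression and print the value
    print "print ibeta(1.0, 1.0, 0.5)" |& GnuPlot
    # read the single reply line from the coprocess
    if ((GnuPlot |& getline answer) > 0)
        print answer
    close(GnuPlot)              # closing the pipe lets gnuplot see end of input
}')
echo "$result"
```

Without the redirection, the reply would presumably go to the terminal
instead of the pipe, and @code{getline} would wait for a line that never
arrives.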

@cindex JavaScript
Following is the rather long function @code{HandleGET()}, which
implements the contents of this service by reacting to the different
kinds of requests from the browser. Before you start playing with
this script, make sure that your browser supports JavaScript and that it also
has this option switched on. The script uses a short snippet of
JavaScript code for delayed opening of a window with an image.
A more detailed explanation follows:

@smallexample
@c file eg/network/statist.awk
function HandleGET() @{
  if (MENU[2] == "AboutServer") @{
    Document = "This is a GUI for a statistical computation.\
      It compares means and variances of two distributions.\
      It is implemented as one GAWK script and uses GNUPLOT."
  @} else if (MENU[2] == "EnterParameters") @{
    Document = ""
    if ("m1" in GETARG) @{     # are there parameters to compare?
      Document = Document "<SCRIPT LANGUAGE=\"JavaScript\">\
        setTimeout(\"window.open(\\\"" MyPrefix "/Image" systime()\
         "\\\",\\\"dist\\\", \\\"status=no\\\");\", 1000); </SCRIPT>"
      m1 = GETARG["m1"]; v1 = GETARG["v1"]; n1 = GETARG["n1"]
      m2 = GETARG["m2"]; v2 = GETARG["v2"]; n2 = GETARG["n2"]
      t = (m1-m2)/sqrt(v1/n1+v2/n2)
      df = (v1/n1+v2/n2)*(v1/n1+v2/n2)/((v1/n1)*(v1/n1)/(n1-1) \
           + (v2/n2)*(v2/n2) /(n2-1))
      if (v1>v2) @{
          f = v1/v2
          df1 = n1 - 1
          df2 = n2 - 1
      @} else @{
          f = v2/v1
          df1 = n2 - 1
          df2 = n1 - 1
      @}
      print "pt=ibeta(" df/2 ",0.5," df/(df+t*t) ")" |& GnuPlot
      print "pF=2.0*ibeta(" df2/2 "," df1/2 "," \
            df2/(df2+df1*f) ")" |& GnuPlot
      print "print pt, pF" |& GnuPlot
      RS="\n"; GnuPlot |& getline; RS="\r\n"    # $1 is pt, $2 is pF
      print "invsqrt2pi=1.0/sqrt(2.0*pi)" |& GnuPlot
      print "nd(x)=invsqrt2pi/sd*exp(-0.5*((x-mu)/sd)**2)" |& GnuPlot
      print "set term png small color" |& GnuPlot
      #print "set term postscript color" |& GnuPlot
      #print "set term gif medium size 320,240" |& GnuPlot
      print "set yrange[-0.3:]" |& GnuPlot
      print "set label 'p(m1=m2) =" $1 "' at 0,-0.1 left" |& GnuPlot
      print "set label 'p(v1=v2) =" $2 "' at 0,-0.2 left" |& GnuPlot
      print "plot mu=" m1 ",sd=" sqrt(v1) ", nd(x) title 'sample 1',\
        mu=" m2 ",sd=" sqrt(v2) ", nd(x) title 'sample 2'" |& GnuPlot
      print "quit" |& GnuPlot
      GnuPlot |& getline Image
      while ((GnuPlot |& getline) > 0)
          Image = Image RS $0
      close(GnuPlot)
    @}
    Document = Document "\
    <h3>Do these samples have the same Gaussian distribution?</h3>\
    <FORM METHOD=GET> <TABLE BORDER CELLPADDING=5>\
    <TR>\
    <TD>1. Mean </TD>\
    <TD><input type=text name=m1 value=" m1 " size=8></TD>\
    <TD>1. Variance</TD>\
    <TD><input type=text name=v1 value=" v1 " size=8></TD>\
    <TD>1. Count </TD>\
    <TD><input type=text name=n1 value=" n1 " size=8></TD>\
    </TR><TR>\
    <TD>2. Mean </TD>\
    <TD><input type=text name=m2 value=" m2 " size=8></TD>\
    <TD>2. Variance</TD>\
    <TD><input type=text name=v2 value=" v2 " size=8></TD>\
    <TD>2. Count </TD>\
    <TD><input type=text name=n2 value=" n2 " size=8></TD>\
    </TR> <input type=submit value=\"Compute\">\
    </TABLE></FORM><BR>"
  @} else if (MENU[2] ~ "Image") @{
    Reason = "OK" ORS "Content-type: image/png"
    #Reason = "OK" ORS "Content-type: application/x-postscript"
    #Reason = "OK" ORS "Content-type: image/gif"
    Header = Footer = ""
    Document = Image
  @}
@}
@c endfile
@end smallexample

@cindex PostScript
As usual, we give a short description of the service in the first
menu choice. The third menu choice shows us that generation and
presentation of an image are two separate actions. While the latter
takes place quite instantly in the third menu choice, the former
takes place in the much longer second choice.
Image data passes from the
generating action to the presenting action via the variable @code{Image},
which contains a complete @file{.png} image that would otherwise be stored
in a file. If you prefer @file{.ps} or @file{.gif} images over the
default @file{.png} images, you may select these options by uncommenting
the appropriate lines. But remember to do so in two places: when
telling GNUPlot which kind of images to generate, and when transmitting the
image at the end of the program.

Looking at the end of the program,
the way we pass the @samp{Content-type} to the browser is a bit unusual.
It is appended to the @samp{OK} of the first header line
to make sure the type information becomes part of the header.
The other variables that get transmitted across the network are
made empty, because in this case we do not have an HTML document to
transmit, but rather raw image data to place in the body.

Most of the work is done in the second menu choice. It starts with a
strange JavaScript code snippet. When first implementing this server,
we used a short @samp{@w{"<IMG SRC="} MyPrefix "/Image>"} here. But then
browsers got smarter and tried to improve on speed by requesting the
image and the HTML code at the same time. When doing this, the browser
tries to build up a connection for the image request while the request for
the HTML text is not yet completed. The browser tries to connect
to the @command{gawk} server on port 8080 while port 8080 is still in use for
transmission of the HTML text. The connection for the image cannot be
built up, so the image appears as ``broken'' in the browser window.
We solved this problem by telling the browser to open a separate window
for the image, but only after a delay of 1000 milliseconds.
By this time, the server should be ready for serving the next request.

But there is one more subtlety in the JavaScript code.
Each time the JavaScript code opens a window for the image, a
timestamp (@code{systime()}) is appended to the name of the image.
Why this constant change of name for the image? Initially, we always named
the image @code{Image}, but then the Netscape browser noticed the name
had @emph{not} changed since the previous request and displayed the
previous image (caching behavior). The server core
is implemented so that browsers are told @emph{not} to cache anything.
Obviously HTTP requests do not always work as expected. One way to
circumvent the cache of such overly smart browsers is to change the
name of the image with each request. These three lines of JavaScript
caused us a lot of trouble.

The rest can be broken
down into two phases. First, we check whether there are statistical
parameters. When the program is first started, there usually are no
parameters because it enters the page coming from the top menu.
Then, we only have to present the user with a form that he can use to change
statistical parameters and submit them. Subsequently, the submission of
the form causes the execution of the first phase because @emph{now}
there @emph{are} parameters to handle.

Now that we have parameters, we know there will be an image available.
Therefore we insert the JavaScript code here to initiate the opening
of the image in a separate window. Then,
we prepare some variables that will be passed to GNUPlot for calculation
of the probabilities. Prior to reading the results, we must temporarily
change @code{RS} because GNUPlot separates lines with newlines.
After instructing GNUPlot to generate a @file{.png} (or @file{.ps} or
@file{.gif}) image, we initiate the insertion of some text,
explaining the resulting probabilities. The final @samp{plot} command
actually generates the image data.
This raw binary data has to be read in carefully
without adding, changing, or deleting a single byte. Hence the unusual
initialization of @code{Image} and completion with a @code{while} loop.

When using this server, it soon becomes clear that it is far from being
perfect. It mixes source code of six scripting languages or protocols:

@itemize @bullet
@item GNU @command{awk} implements a server for the protocol:
@item HTTP which transmits:
@item HTML text which contains a short piece of:
@item JavaScript code opening a separate window.
@item A Bourne shell script is used for piping commands into:
@item GNUPlot to generate the image to be opened.
@end itemize

After all this work, the GNUPlot image opens in the JavaScript window
where it can be viewed by the user.

It is probably better not to mix up so many different languages.
The result is not very readable. Furthermore, the
statistical part of the server does not take care of invalid input.
Among other things, using negative variances causes invalid results.

@node MAZE, MOBAGWHO, STATIST, Some Applications and Techniques
@section MAZE: Walking Through a Maze In Virtual Reality
@cindex MAZE
@cindex VRML
@c VRML in iX 11/96 134.
@quotation
@cindex Perlis, Alan
@i{In the long run, every program becomes rococo, and then rubble.}@*
@author Alan Perlis
@end quotation

By now, we know how to present arbitrary @samp{Content-type}s to a browser.
In this @value{SECTION}, our server presents a 3D world to our browser.
The 3D world is described in a scene description language (VRML,
Virtual Reality Modeling Language) that allows us to travel through a
perspective view of a 2D maze with our browser. Browsers with a
VRML plugin enable exploration of this technology.
We could do
one of those boring @samp{Hello world} examples here, which are usually
presented when introducing novices to
VRML. If you have never written
any VRML code, have a look at
the VRML FAQ.
Presenting a static VRML scene is a bit trivial; in order to expose
@command{gawk}'s capabilities, we will present a dynamically generated
VRML scene. The function @code{SetUpServer()} is very simple because it
only sets the default HTML page and initializes the random number
generator. As usual, the surrounding server lets you browse the maze.

@smallexample
@c file eg/network/maze.awk
function SetUpServer() @{
  TopHeader = "<HTML><title>Walk through a maze</title>"
  TopDoc = "\
    <h2>Please choose one of the following actions:</h2>\
    <UL>\
      <LI><A HREF=" MyPrefix "/AboutServer>About this server</A>\
      <LI><A HREF=" MyPrefix "/VRMLtest>Watch a simple VRML scene</A>\
    </UL>"
  TopFooter = "</HTML>"
  srand()
@}
@c endfile
@end smallexample

The function @code{HandleGET()} is a bit longer because it first computes
the maze and afterwards generates the VRML code that is sent across
the network. As shown in the STATIST example
(@pxref{STATIST}),
we set the type of the
content to VRML and then store the VRML representation of the maze as the
page content. We assume that the maze is stored in a 2D array. Initially,
the maze consists of walls only. Then, we add an entry and an exit to the
maze and let the rest of the work be done by the function @code{MakeMaze()}.
Now, only the wall fields are left in the maze. By iterating over these
fields, we generate one line of VRML code for each wall field.

@smallexample
@c file eg/network/maze.awk
function HandleGET() @{
  if (MENU[2] == "AboutServer") @{
    Document = "If your browser has a VRML 2 plugin,\
      this server shows you a simple VRML scene."
  @} else if (MENU[2] == "VRMLtest") @{
    XSIZE = YSIZE = 11              # initially, everything is wall
    for (y = 0; y < YSIZE; y++)
       for (x = 0; x < XSIZE; x++)
          Maze[x, y] = "#"
    delete Maze[0, 1]               # entry is not wall
    delete Maze[XSIZE-1, YSIZE-2]   # exit is not wall
    MakeMaze(1, 1)
    Document = "\
#VRML V2.0 utf8\n\
Group @{\n\
  children [\n\
    PointLight @{\n\
      ambientIntensity 0.2\n\
      color 0.7 0.7 0.7\n\
      location 0.0 8.0 10.0\n\
    @}\n\
    DEF B1 Background @{\n\
      skyColor [0 0 0, 1.0 1.0 1.0 ]\n\
      skyAngle 1.6\n\
      groundColor [1 1 1, 0.8 0.8 0.8, 0.2 0.2 0.2 ]\n\
      groundAngle [ 1.2 1.57 ]\n\
    @}\n\
    DEF Wall Shape @{\n\
      geometry Box @{size 1 1 1@}\n\
      appearance Appearance @{ material Material @{ diffuseColor 0 0 1 @} @}\n\
    @}\n\
    DEF Entry Viewpoint @{\n\
      position 0.5 1.0 5.0\n\
      orientation 0.0 0.0 -1.0 0.52\n\
    @}\n"
    for (i in Maze) @{
      split(i, t, SUBSEP)
      Document = Document " Transform @{ translation "
      Document = Document t[1] " 0 -" t[2] " children USE Wall @}\n"
    @}
    Document = Document " ] # end of group for world\n@}"
    Reason = "OK" ORS "Content-type: model/vrml"
    Header = Footer = ""
  @}
@}
@c endfile
@end smallexample

Finally, we have a look at @code{MakeMaze()}, the function that generates
the @code{Maze} array. When entered, this function assumes that the array
has been initialized so that each element represents a wall element and
the maze is initially full of wall elements. Only the entrance and the exit
of the maze should have been left free. The parameters of the function tell
us which element must be marked as not being a wall. After this, we take
a look at the four neighboring elements and remember which we have already
treated. Of all the neighboring elements, we take one at random and
walk in that direction.
Therefore, the wall element in that direction has
to be removed, and then we call the function recursively for that element.
The maze is only completed if we iterate the above procedure for
@emph{all} neighboring elements (in random order) and for our present
element by recursively calling the function for the present element. This
last iteration could have been done in a loop,
but it is much simpler to do recursively.

Notice that elements with coordinates that are both odd are assumed to be
on our way through the maze and the generating process cannot terminate
as long as there is such an element that has not been @code{delete}d. All other
elements are potentially part of the wall.

@smallexample
@c file eg/network/maze.awk
function MakeMaze(x, y) @{
  delete Maze[x, y]     # here we are, we have no wall here
  p = 0                 # count unvisited fields in all directions
  if (x-2 SUBSEP y in Maze) d[p++] = "-x"
  if (x SUBSEP y-2 in Maze) d[p++] = "-y"
  if (x+2 SUBSEP y in Maze) d[p++] = "+x"
  if (x SUBSEP y+2 in Maze) d[p++] = "+y"
  if (p>0) @{            # if there are unvisited fields, go there
    p = int(p*rand())   # choose one unvisited field at random
    if (d[p] == "-x") @{ delete Maze[x - 1, y]; MakeMaze(x - 2, y)
    @} else if (d[p] == "-y") @{ delete Maze[x, y - 1]; MakeMaze(x, y - 2)
    @} else if (d[p] == "+x") @{ delete Maze[x + 1, y]; MakeMaze(x + 2, y)
    @} else if (d[p] == "+y") @{ delete Maze[x, y + 1]; MakeMaze(x, y + 2)
    @}                   # we are back from recursion
    MakeMaze(x, y);     # try again while there are unvisited fields
  @}
@}
@c endfile
@end smallexample

@node MOBAGWHO, STOXPRED, MAZE, Some Applications and Techniques
@section MOBAGWHO: a Simple Mobile Agent
@cindex MOBAGWHO program
@cindex agent
@quotation
@cindex Hoare, C.A.R.
@i{There are two ways of constructing a software design: One way is to
make it so simple that there are obviously no deficiencies, and the
other way is to make it so complicated that there are no obvious
deficiencies.}
@author C.A.R.@: Hoare
@end quotation

A @dfn{mobile agent} is a program that can be dispatched from a computer and
transported to a remote server for execution. This is called @dfn{migration},
which means that a process on another system is started that is independent
from its originator. Ideally, it wanders through
a network while working for its creator or owner. In places like
the UMBC Agent Web,
people are quite confident that (mobile) agents are a software engineering
paradigm that enables us to significantly increase the efficiency
of our work. Mobile agents could become the mediators between users and
the networking world. For an unbiased view of this technology,
see the remarkable paper @cite{Mobile Agents: Are they a good
idea?}.@footnote{@uref{https://link.springer.com/chapter/10.1007/3-540-62852-5_4}}

When trying to migrate a process from one system to another,
a server process is needed on the receiving side.
Several implementations come to mind, depending on the kind of server process:

@itemize @bullet
@item
HTTP can be used as the protocol for delivery of the migrating
process. In this case, we use a common web
server as the receiving server process. A universal CGI script
mediates between migrating process and web server.
Each server willing to accept migrating agents makes this universal
service available. HTTP supplies the @code{POST} method to transfer
some data to a file on the web server.
When a CGI script is called
remotely with the @code{POST} method instead of the usual @code{GET} method,
data is transmitted from the client process to the standard input
of the server's CGI script. So, to implement a mobile agent,
we must not only write the agent program to start on the client
side, but also the CGI script to receive the agent on the server side.

@cindex CGI (Common Gateway Interface)
@cindex apache
@item
The @code{PUT} method can also be used for migration. HTTP does not
require a CGI script for migration via @code{PUT}. However, with common web
servers there is no advantage to this solution, because web servers such as
Apache
require explicit activation of a special @code{PUT} script.

@item
@cite{Agent Tcl} pursues a different course; it relies on a dedicated server
process with a dedicated protocol specialized for receiving mobile agents.
@end itemize

Our agent example abuses a common web server as a migration tool. So, it needs a
universal CGI script on the receiving side (the web server). The receiving script is
activated with a @code{POST} request when placed into a location like
@file{/httpd/cgi-bin/PostAgent.sh}.

@example
@c file eg/network/PostAgent.sh
#!/bin/sh
MobAg=/tmp/MobileAgent.$$
# direct script to mobile agent file
cat > $MobAg
# execute agent concurrently
gawk -f $MobAg $MobAg > /dev/null &
# HTTP header, terminator and body
gawk 'BEGIN @{ print "\r\nAgent started" @}'
rm $MobAg      # delete script file of agent
@c endfile
@end example

By making its process id (@code{$$}) part of the unique @value{FN}, the
script avoids conflicts between concurrent instances of the script.
First, all lines
from standard input (the mobile agent's source code) are copied into
this unique file.
Then, the agent is started as a concurrent process
and a short message reporting this fact is sent to the submitting client.
Finally, the script file of the mobile agent is removed because it is
no longer needed. Although it is a short script, there are several noteworthy
points:

@table @asis
@item Security
@emph{There is none}. In fact, the CGI script should never
be made available on a server that is part of the Internet because everyone
would be allowed to execute arbitrary commands with it. This behavior is
acceptable only when performing rapid prototyping.

@item Self-Reference
Each migrating instance of an agent is started
in a way that enables it to read its own source code as input
(the script file is passed to @command{gawk} both as the program and as the
data file) and use the code for subsequent
migrations. This is necessary because it needs to treat the agent's code
as data to transmit. @command{gawk} is not the ideal language for such
a job. Lisp and Tcl are more suitable because they do not make a distinction
between program code and data.

@item Independence
After migration, the agent is not linked to its
former home in any way. By reporting @samp{Agent started}, it waves
``Goodbye'' to its origin. The originator may choose to terminate or not.
@end table

@cindex Lisp
The originating agent itself is started just like any other command-line
script, and reports the results on standard output. By letting the name
of the original host migrate with the agent, the agent that migrates
to a host far away from its origin can report the result back home.
Having arrived at the end of the journey, the agent establishes
a connection and reports the results. This is the reason for
determining the name of the host with @samp{uname -n} and storing it
in @code{MyOrigin} for later use. We may also set variables with the
@option{-v} option from the command line.
This interactivity is only
of importance in the context of starting a mobile agent; therefore this
@code{BEGIN} pattern and its action do not take part in migration:

@smallexample
@c file eg/network/mobag.awk
BEGIN @{
  if (ARGC != 2) @{
    print "MOBAG - a simple mobile agent"
    print "CALL:\n gawk -f mobag.awk mobag.awk"
    print "IN:\n the name of this script as a command-line parameter"
    print "PARAM:\n -v MyOrigin=myhost.com"
    print "OUT:\n the result on stdout"
    print "JK 29.03.1998 01.04.1998"
    exit
  @}
  if (MyOrigin == "") @{
     "uname -n" | getline MyOrigin
     close("uname -n")
  @}
@}
@c endfile
@end smallexample

Since @command{gawk} cannot manipulate and transmit parts of the program
directly, the source code is read and stored in strings.
Therefore, the program scans itself for
the beginning and the ending of functions.
Each line in between is appended to the code string until the end of
the function has been reached. A special case is this part of the program
itself. It is not a function.
Placing a similar framework around it causes it to be treated
like a function. Notice that this mechanism works for all the
functions of the source code, but it cannot guarantee that the order
of the functions is preserved during migration:

@smallexample
@c file eg/network/mobag.awk
#ReadMySelf
/^function / @{ FUNC = $2 @}
/^END/ || /^#ReadMySelf/ @{ FUNC = $1 @}
FUNC != "" @{ MOBFUN[FUNC] = MOBFUN[FUNC] RS $0 @}
(FUNC != "") && (/^@}/ || /^#EndOfMySelf/) \
                         @{ FUNC = "" @}
#EndOfMySelf
@c endfile
@end smallexample

The web server code in
@ref{Interacting Service, ,A Web Service with Interaction},
was first developed as a site-independent core.
Likewise, the
@command{gawk}-based mobile agent
starts with an agent-independent core, to which can be appended
application-dependent functions. What follows is the only
application-independent function needed for the mobile agent:

@smallexample
@c file eg/network/mobag.awk
function migrate(Destination, MobCode, Label) @{
  MOBVAR["Label"] = Label
  MOBVAR["Destination"] = Destination
  RS = ORS = "\r\n"
  HttpService = "/inet/tcp/0/" Destination
  for (i in MOBFUN)
     MobCode = (MobCode "\n" MOBFUN[i])
  MobCode = MobCode "\n\nBEGIN @{"
  for (i in MOBVAR)
     MobCode = (MobCode "\n MOBVAR[\"" i "\"] = \"" MOBVAR[i] "\"")
  MobCode = MobCode "\n@}\n"
  print "POST /cgi-bin/PostAgent.sh HTTP/1.0" |& HttpService
  print "Content-length:", length(MobCode) ORS |& HttpService
  printf "%s", MobCode |& HttpService
  while ((HttpService |& getline) > 0)
     print $0
  close(HttpService)
@}
@c endfile
@end smallexample

The @code{migrate()} function prepares the
aforementioned strings containing the program code and transmits them to a
server. A consequence of this modular approach is that the @code{migrate()}
function takes some parameters that aren't needed in this application,
but that will be in future ones. Its mandatory parameter @code{Destination} holds the
name (or IP address) of the server that the agent wants as a host for its
code. The optional parameter @code{MobCode} may contain some @command{gawk}
code that is inserted during migration in front of all other code.
The optional parameter @code{Label} may contain
a string that tells the agent what to do in program execution after
arrival at its new home site. One of the serious obstacles in implementing
a framework for mobile agents is that it does not suffice to migrate the
code. It is also necessary to migrate the state of execution of the agent.
In
contrast to @cite{Agent Tcl}, this program does not try to migrate the complete set
of variables. The following conventions apply:

@itemize @bullet
@item
Each variable in an agent program is local to the current host and does
@emph{not} migrate.

@item
The array @code{MOBFUN} shown above is an exception. It is handled
by the function @code{migrate()} and does migrate with the application.

@item
The other exception is the array @code{MOBVAR}. Each variable that
takes part in migration has to be an element of this array.
@code{migrate()} also takes care of this.
@end itemize

Now it's clear what happens to the @code{Label} parameter of the
function @code{migrate()}. It is copied into @code{MOBVAR["Label"]} and
travels alongside the other data. Since traveling takes place via HTTP,
records must be separated with @code{"\r\n"} in @code{RS} and
@code{ORS} as usual. The code assembly for migration takes place in
three steps:

@itemize @bullet
@item
Iterate over @code{MOBFUN} to collect all functions verbatim.

@item
Prepare a @code{BEGIN} pattern and put assignments to mobile
variables into the action part.

@item
Transmission itself resembles GETURL: the header with the request
and the @code{Content-length} is followed by the body. In case there is
any reply over the network, it is read completely and echoed to
standard output to avoid irritating the server.
@end itemize

The application-independent framework is now almost complete. What follows
is the @code{END} pattern which executes when the mobile agent has
finished reading its own code. First, it checks whether it is already
running on a remote host or not. In case initialization has not yet taken
place, it starts @code{MyInit()}.
Otherwise (later, on a remote host), it
starts @code{MyJob()}:

@smallexample
@c file eg/network/mobag.awk
END @{
  if (ARGC != 2) exit    # stop when called with wrong parameters
  if (MyOrigin != "")    # is this the originating host?
    MyInit()             # if so, initialize the application
  else                   # we are on a host with migrated data
    MyJob()              # so we do our job
@}
@c endfile
@end smallexample

All that's left to extend the framework into a complete application
is to write two application-specific functions: @code{MyInit()} and
@code{MyJob()}. Keep in mind that the former is executed once on the
originating host, while the latter is executed after each migration:

@smallexample
@c file eg/network/mobag.awk
function MyInit() @{
  MOBVAR["MyOrigin"] = MyOrigin
  MOBVAR["Machines"] = "localhost/80 max/80 moritz/80 castor/80"
  split(MOBVAR["Machines"], Machines)           # which host is the first?
  migrate(Machines[1], "", "")                  # go to the first host
  while (("/inet/tcp/8080/0/0" |& getline) > 0) # wait for result
    print $0                                    # print result
  close("/inet/tcp/8080/0/0")
@}
@c endfile
@end smallexample

As mentioned earlier, this agent takes the name of its origin
(@code{MyOrigin}) with it. Then, it takes the name of its first
destination and goes there for further work. Notice that this name has
the port number of the web server appended to the name of the server,
because the function @code{migrate()} needs it this way to create
the @code{HttpService} variable. Finally, it waits for the result to arrive.
The @code{MyJob()} function runs on the remote host:

@smallexample
@c file eg/network/mobag.awk
function MyJob() @{
  # forget this host
  sub(MOBVAR["Destination"], "", MOBVAR["Machines"])
  MOBVAR["Result"]=MOBVAR["Result"] SUBSEP SUBSEP MOBVAR["Destination"] ":"
  while (("who" | getline) > 0)               # who is logged in?
    MOBVAR["Result"] = MOBVAR["Result"] SUBSEP $0
  close("who")
  if (index(MOBVAR["Machines"], "/") > 0) @{   # any more machines to visit?
    split(MOBVAR["Machines"], Machines)       # which host is next?
    migrate(Machines[1], "", "")              # go there
  @} else @{                                   # no more machines
    gsub(SUBSEP, "\n", MOBVAR["Result"])      # send result to origin
    print MOBVAR["Result"] |& "/inet/tcp/0/" MOBVAR["MyOrigin"] "/8080"
    close("/inet/tcp/0/" MOBVAR["MyOrigin"] "/8080")
  @}
@}
@c endfile
@end smallexample

After migrating, the first thing to do in @code{MyJob()} is to delete
the name of the current host from the list of hosts to visit. Now, it
is time to start the real work by appending the host's name to the
result string, and reading line by line who is logged in on this host.
A very annoying circumstance is the fact that the elements of
@code{MOBVAR} cannot hold the newline character (@code{"\n"}). If they
did, migration of this string would not work because the string wouldn't
obey the syntax rule for a string in @command{gawk}.
@code{SUBSEP} is used as a temporary replacement.

If the list of hosts to visit holds
at least one more entry, the agent migrates to that place to go on
working there. Otherwise, we replace the @code{SUBSEP}s
with a newline character in the resulting string, and report it to
the originating host, whose name is stored in @code{MOBVAR["MyOrigin"]}.

@node STOXPRED, PROTBASE, MOBAGWHO, Some Applications and Techniques
@section STOXPRED: Stock Market Prediction As A Service
@cindex STOXPRED program
@cindex Yahoo!
@quotation
@i{Far out in the uncharted backwaters of the unfashionable end of
the Western Spiral arm of the Galaxy lies a small unregarded yellow sun.}

@i{Orbiting this at a distance of roughly ninety-two million miles is an
utterly insignificant little blue-green planet whose ape-descendent life
forms are so amazingly primitive that they still think digital watches are
a pretty neat idea.}

@i{This planet has --- or rather had --- a problem, which was this:
most of the people living on it were unhappy for pretty much of the time.
Many solutions were suggested for this problem, but most of these were
largely concerned with the movements of small green pieces of paper,
which is odd because it wasn't the small green pieces of paper that
were unhappy.} @*
@author Douglas Adams, @cite{The Hitch Hiker's Guide to the Galaxy}
@end quotation

@cindex @command{cron} utility
Valuable services on the Internet are usually @emph{not} implemented
as mobile agents. There are much simpler ways of implementing services.
All Unix systems provide, for example, the @command{cron} service.
Unix system users can write a list of tasks to be done each day, each
week, twice a day, or just once. The list is entered into a file named
@file{crontab}. For example, to distribute a newsletter on a daily
basis this way, use @command{cron} for calling a script each day early
in the morning:

@example
# run at 8 am on weekdays, distribute the newsletter
0 8 * * 1-5   $HOME/bin/daily.job >> $HOME/log/newsletter 2>&1
@end example

The script first looks for interesting information on the Internet,
assembles it in a nice form and sends the results via email to
the customers.

The following is an example of a primitive
newsletter on stock market prediction.
It is a report which first
tries to predict the change of each share in the Dow Jones Industrial
Index for the particular day. Then it mentions some especially
promising shares as well as some shares which look remarkably bad
on that day. The report ends with the usual disclaimer which tells
every child @emph{not} to try this at home and hurt anybody.
@cindex Dow Jones Industrial Index

@smallexample
Good morning Uncle Scrooge,

This is your daily stock market report for Monday, October 16, 2000.
Here are the predictions for today:

        AA      neutral
        GE      up
        JNJ     down
        MSFT    neutral
        @dots{}
        UTX     up
        DD      down
        IBM     up
        MO      down
        WMT     up
        DIS     up
        INTC    up
        MRK     down
        XOM     down
        EK      down
        IP      down

The most promising shares for today are these:

        INTC            http://biz.yahoo.com/n/i/intc.html

The stock shares to avoid today are these:

        EK              http://biz.yahoo.com/n/e/ek.html
        IP              http://biz.yahoo.com/n/i/ip.html
        DD              http://biz.yahoo.com/n/d/dd.html
        @dots{}
@end smallexample

The script as a whole is rather long. In order to ease the pain of
studying other people's source code, we have broken the script
up into meaningful parts which are invoked one after the other.
The basic structure of the script is as follows:

@example
@c file eg/network/stoxpred.awk
BEGIN @{
  Init()
  ReadQuotes()
  CleanUp()
  Prediction()
  Report()
  SendMail()
@}
@c endfile
@end example

The earlier parts store data into variables and arrays which are
subsequently used by later parts of the script. The @code{Init()} function
first checks if the script is invoked correctly (without any parameters).
If not, it informs the user of the correct usage. What follows are preparations
for the retrieval of the historical quote data.
The names of the 30 stock
shares are stored in an array @code{name} along with the current date
in @code{day}, @code{month}, and @code{year}.

All users who are separated
from the Internet by a firewall and have to direct their Internet accesses
to a proxy must supply the name of the proxy to this script with the
@samp{-v Proxy=@var{name}} option. For most users, the default proxy and
port number should suffice.

@example
@c file eg/network/stoxpred.awk
function Init() @{
  if (ARGC != 1) @{
    print "STOXPRED - daily stock share prediction"
    print "IN:\n    no parameters, nothing on stdin"
    print "PARAM:\n    -v Proxy=MyProxy -v ProxyPort=80"
    print "OUT:\n    commented predictions as email"
    print "JK 09.10.2000"
    exit
  @}
  # Remember ticker symbols from Dow Jones Industrial Index
  StockCount = split("AA GE JNJ MSFT AXP GM JPM PG BA HD KO \
    SBC C HON MCD T CAT HWP MMM UTX DD IBM MO WMT DIS INTC \
    MRK XOM EK IP", name);
  # Remember the current date as the end of the time series
  day   = strftime("%d")
  month = strftime("%m")
  year  = strftime("%Y")
  if (Proxy     == "")  Proxy     = "chart.yahoo.com"
  if (ProxyPort ==  0)  ProxyPort = 80
  YahooData = "/inet/tcp/0/" Proxy "/" ProxyPort
@}
@c endfile
@end example

@cindex CSV format
There are two really interesting parts in the script. One is the
function which reads the historical stock quotes from an Internet
server. The other is the one that does the actual prediction. In
the following function we see how the quotes are read from the
Yahoo server.
The data which comes from the server is in
CSV format (comma-separated values):

@example
@c file eg/network/stoxdata.txt
Date,Open,High,Low,Close,Volume
9-Oct-00,22.75,22.75,21.375,22.375,7888500
6-Oct-00,23.8125,24.9375,21.5625,22,10701100
5-Oct-00,24.4375,24.625,23.125,23.50,5810300
@c endfile
@end example

Lines contain values of the same time instant, whereas columns are
separated by commas and contain the kind of data that is described
in the header (first) line. At first, @command{gawk} is instructed to
separate columns by commas (@samp{FS = ","}). In the loop that follows,
a connection to the Yahoo server is first opened, then a download takes
place, and finally the connection is closed. All this happens once for
each ticker symbol. In the body of this loop, an Internet address is
built up as a string according to the rules of the Yahoo server. The
starting and ending dates share the same day and month; the starting
date lies exactly one year before the ending date, which is today.
All the action is initiated within the @code{printf}
command which transmits the request for data to the Yahoo server.

In the inner loop, the server's data is first read and then scanned
line by line. Only lines which have six columns and the name of a month
in the first column contain relevant data. This data is stored
in the two-dimensional array @code{quote}; one dimension
being time, the other being the ticker symbol. During retrieval of the
first stock's data, the calendar names of the time instances are stored
in the array @code{days} because we need them later.
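
The filtering rule can be tried without any network access. The following
is a minimal sketch, not part of @file{stoxpred.awk}: it applies the same
field separator and the same test to the sample data shown earlier. It
assumes that the sample has been saved in a file named @file{stoxdata.txt};
the script name @file{csvcheck.awk} is hypothetical, and the array
@code{quote} has only one dimension here because there is only one stock:

@example
# Hypothetical helper script; run as: gawk -f csvcheck.awk stoxdata.txt
BEGIN @{ FS = "," @}                  # columns are separated by commas

# Keep only lines with six columns and a month name in the first column
NF == 6 && $1 ~ /Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec/ @{
    days[++daycount] = $1            # calendar name of the time instance
    quote[$1] = $5                   # closing price of that day
@}

END @{
    for (d = 1; d <= daycount; d++)  # print in the order received
        print days[d], quote[days[d]]
@}
@end example

The header line has six columns too, but its first column (@samp{Date})
contains no month name, so it is skipped. The complete function that
follows applies the same test inside a loop over all ticker symbols.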

@smallexample
@c file eg/network/stoxpred.awk
function ReadQuotes() @{
  # Retrieve historical data for each ticker symbol
  FS = ","
  for (stock = 1; stock <= StockCount; stock++) @{
    URL = "http://chart.yahoo.com/table.csv?s=" name[stock] \
          "&a=" month "&b=" day "&c=" year-1 \
          "&d=" month "&e=" day "&f=" year \
          "&g=d&q=q&y=0&z=" name[stock] "&x=.csv"
    printf("GET " URL " HTTP/1.0\r\n\r\n") |& YahooData
    while ((YahooData |& getline) > 0) @{
      if (NF == 6 && $1 ~ /Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec/) @{
        if (stock == 1)
          days[++daycount] = $1;
        quote[$1, stock] = $5
      @}
    @}
    close(YahooData)
  @}
  FS = " "
@}
@c endfile
@end smallexample

Now that we @emph{have} the data, it can be checked once again to make sure
that no individual stock is missing or invalid, and that all the stock quotes are
aligned correctly. Furthermore, we renumber the time instances. The
most recent day gets day number 1 and all other days get consecutive
numbers. All quotes are rounded toward the nearest whole number in US Dollars.

@smallexample
@c file eg/network/stoxpred.awk
function CleanUp() @{
  # clean up time series; eliminate incomplete data sets
  for (d = 1; d <= daycount; d++) @{
    for (stock = 1; stock <= StockCount; stock++)
      if (! ((days[d], stock) in quote))
          stock = StockCount + 10
    if (stock > StockCount + 1)
        continue
    datacount++
    for (stock = 1; stock <= StockCount; stock++)
      data[datacount, stock] = int(0.5 + quote[days[d], stock])
  @}
  delete quote
  delete days
@}
@c endfile
@end smallexample

Now we have arrived at the second really interesting part of the whole affair.
What we present here is a very primitive prediction algorithm:
@emph{If a stock fell yesterday, assume it will also fall today; if
it rose yesterday, assume it will rise today}. (Feel free to replace this
algorithm with a smarter one.) If a stock changed in the same direction
on two consecutive days, this is an indication which should be highlighted.
Two-day advances are stored in @code{hot} and two-day declines in
@code{avoid}.

The rest of the function is a sanity check. It counts the number of
correct predictions in relation to the total number of predictions
one could have made in the year before.

@smallexample
@c file eg/network/stoxpred.awk
function Prediction() @{
  # Predict each ticker symbol by prolonging yesterday's trend
  for (stock = 1; stock <= StockCount; stock++) @{
    if         (data[1, stock] > data[2, stock]) @{
      predict[stock] = "up"
    @} else if (data[1, stock] < data[2, stock]) @{
      predict[stock] = "down"
    @} else @{
      predict[stock] = "neutral"
    @}
    if ((data[1, stock] > data[2, stock]) && (data[2, stock] > data[3, stock]))
      hot[stock] = 1
    if ((data[1, stock] < data[2, stock]) && (data[2, stock] < data[3, stock]))
      avoid[stock] = 1
  @}
  # Do a plausibility check: how many predictions proved correct?
  for (s = 1; s <= StockCount; s++) @{
    for (d = 1; d <= datacount-2; d++) @{
      if         (data[d+1, s] > data[d+2, s]) @{
        UpCount++
      @} else if (data[d+1, s] < data[d+2, s]) @{
        DownCount++
      @} else @{
        NeutralCount++
      @}
      if (((data[d, s]  > data[d+1, s]) && (data[d+1, s]  > data[d+2, s])) ||
          ((data[d, s]  < data[d+1, s]) && (data[d+1, s]  < data[d+2, s])) ||
          ((data[d, s] == data[d+1, s]) && (data[d+1, s] == data[d+2, s])))
        CorrectCount++
    @}
  @}
@}
@c endfile
@end smallexample

At this point the hard work has been done: the array @code{predict}
contains the predictions for all the ticker symbols. It is up to the
function @code{Report()} to find some nice words to present the
desired information.

@smallexample
@c file eg/network/stoxpred.awk
function Report() @{
  # Generate report
  report =        "\nThis is your daily "
  report = report "stock market report for "strftime("%A, %B %d, %Y")".\n"
  report = report "Here are the predictions for today:\n\n"
  for (stock = 1; stock <= StockCount; stock++)
    report = report "\t" name[stock] "\t" predict[stock] "\n"
  for (stock in hot) @{
    if (HotCount++ == 0)
      report = report "\nThe most promising shares for today are these:\n\n"
    report = report "\t" name[stock] "\t\thttp://biz.yahoo.com/n/" \
      tolower(substr(name[stock], 1, 1)) "/" tolower(name[stock]) ".html\n"
  @}
  for (stock in avoid) @{
    if (AvoidCount++ == 0)
      report = report "\nThe stock shares to avoid today are these:\n\n"
    report = report "\t" name[stock] "\t\thttp://biz.yahoo.com/n/" \
      tolower(substr(name[stock], 1, 1)) "/" tolower(name[stock]) ".html\n"
  @}
  report = report "\nThis sums up to " HotCount+0 " winners and " AvoidCount+0
  report = report " losers. When using this kind\nof prediction scheme for"
  report = report " the 12 months which lie behind us,\nwe get " UpCount
  report = report " 'ups' and " DownCount " 'downs' and " NeutralCount
  report = report " 'neutrals'. Of all\nthese " UpCount+DownCount+NeutralCount
  report = report " predictions " CorrectCount " proved correct next day.\n"
  report = report "A success rate of "\
             int(100*CorrectCount/(UpCount+DownCount+NeutralCount)) "%.\n"
  report = report "Random choice would have produced a 33% success rate.\n"
  report = report "Disclaimer: Like every other prediction of the stock\n"
  report = report "market, this report is, of course, complete nonsense.\n"
  report = report "If you are stupid enough to believe these predictions\n"
  report = report "you should visit a doctor who can treat your ailment."
@}
@c endfile
@end smallexample

The function @code{SendMail()} goes through the list of customers and opens
a pipe to the @command{mail} command for each of them. Each one receives an
email message with a proper subject heading and is addressed by full name.

@smallexample
@c file eg/network/stoxpred.awk
function SendMail() @{
  # send report to customers
  customer["uncle.scrooge@@ducktown.gov"] = "Uncle Scrooge"
  customer["more@@utopia.org"           ] = "Sir Thomas More"
  customer["spinoza@@denhaag.nl"        ] = "Baruch de Spinoza"
  customer["marx@@highgate.uk"          ] = "Karl Marx"
  customer["keynes@@the.long.run"       ] = "John Maynard Keynes"
  customer["bierce@@devil.hell.org"     ] = "Ambrose Bierce"
  customer["laplace@@paris.fr"          ] = "Pierre Simon de Laplace"
  for (c in customer) @{
    # the space before the closing quote separates subject and address
    MailPipe = "mail -s 'Daily Stock Prediction Newsletter' " c
    print "Good morning " customer[c] "," | MailPipe
    print report "\n.\n" | MailPipe
    close(MailPipe)
  @}
@}
@c endfile
@end smallexample

Be patient when running the script by hand.
Retrieving the data for all the ticker symbols and sending the emails
may take several minutes to complete, depending upon network traffic
and the speed of the available Internet link.
The quality of the prediction algorithm is likely to be disappointing.
Try to find a better one.
Should you find one with a success rate of more than 50%, please tell
us about it! It is only for the sake of curiosity, of course. @code{:-)}

@node PROTBASE, , STOXPRED, Some Applications and Techniques
@section PROTBASE: Searching Through A Protein Database
@cindex PROTBASE
@cindex NCBI, National Center for Biotechnology Information
@cindex BLAST, Basic Local Alignment Search Tool
@cindex Hoare, C.A.R.
@quotation
@i{Inside every large problem is a small
problem struggling to get out.}@footnote{What C.A.R.@: Hoare
actually said was ``Inside every large program is a
small program struggling to get out.''}
@author With apologies to C.A.R.@: Hoare
@end quotation

Yahoo's database of stock market data is just one among the many large
databases on the Internet. Another one is located at NCBI
(National Center for Biotechnology
Information). Established in 1988 as a national resource for molecular
biology information, NCBI creates public databases, conducts research
in computational biology, develops software tools for analyzing genome
data, and disseminates biomedical information. In this section, we
look at one of NCBI's public services, which is called BLAST
(Basic Local Alignment Search Tool).

You probably know that the information necessary for reproducing living
cells is encoded in the genetic material of the cells. The genetic material
is a very long chain of four base nucleotides. It is the order of
appearance (the sequence) of nucleotides which contains the information
about the substance to be produced.
Scientists in biotechnology often
find a specific fragment, determine the nucleotide sequence, and need
to know where the sequence at hand comes from.

This is where the large
databases enter the game. At NCBI, databases store the knowledge
about which sequences have ever been found and where they have been found.
When the scientist sends his sequence to the BLAST service, the server
looks for regions of genetic material in its database which
look the most similar to the delivered nucleotide sequence. After a
search time of some seconds or minutes the server sends an answer to
the scientist. In order to make access simple, NCBI chose to offer
their database service through popular Internet protocols. There are
four basic ways to use the so-called BLAST services:

@c FIXME: Is all of this still true?
@itemize @bullet
@item
The easiest way to use BLAST is through the web. Users may simply point
their browsers at the NCBI home page
and link to the BLAST pages.
NCBI provides a stable URL that may be used to perform BLAST searches
without interactive use of a web browser. This is what we will do later
in this section.
A demonstration client
and a @file{README} file demonstrate how to access this URL.

@item
Currently,
@command{blastcl3} is the standard network BLAST client.
You can download @command{blastcl3} from the
anonymous FTP location.

@item
BLAST 2.0 can be run locally as a full executable and can be used to run
BLAST searches against private local databases, or downloaded copies of the
NCBI databases. BLAST 2.0 executables may be found on the NCBI
anonymous FTP server.

@item
The NCBI BLAST Email server is the best option for people without convenient
access to the web.
A similarity search can be performed by sending a properly
formatted mail message containing the nucleotide or protein query sequence to
@email{blast@@ncbi.nlm.nih.gov}. The query sequence is compared against the
specified database using the BLAST algorithm and the results are returned in
an email message. For more information on formulating email BLAST searches,
you can send a message consisting of the word ``HELP'' to the same address,
@email{blast@@ncbi.nlm.nih.gov}.
@end itemize

Our starting point is the demonstration client mentioned in the first option.
The @file{README} file that comes along with the client explains the whole
process in a nutshell. In the rest of this section, we first show
what such requests look like. Then we show how to use @command{gawk} to
implement a client in about 10 lines of code. Finally, we show how to
interpret the result returned from the service.

Sequences are expected to be represented in the standard
IUB/IUPAC amino acid and nucleic acid codes,
with these exceptions: lower-case letters are accepted and are mapped
into upper-case; a single hyphen or dash can be used to represent a gap
of indeterminate length; and in amino acid sequences, @samp{U} and @samp{*}
are acceptable letters (see below). Before submitting a request, any numerical
digits in the query sequence should either be removed or replaced by
appropriate letter codes (e.g., @samp{N} for unknown nucleic acid residue
or @samp{X} for unknown amino acid residue).
The nucleic acid codes supported are:

@example
A --> adenosine               M --> A C (amino)
C --> cytidine                S --> G C (strong)
G --> guanine                 W --> A T (weak)
T --> thymidine               B --> G T C
U --> uridine                 D --> G A T
R --> G A (purine)            H --> A C T
Y --> T C (pyrimidine)        V --> G C A
K --> G T (keto)              N --> A G C T (any)
                              -  gap of indeterminate length
@end example

Now you know the alphabet of nucleotide sequences. The last two lines
of the following example query show such a sequence, which is obviously
made up only of elements of the alphabet just described. Store this example
query into a file named @file{protbase.request}. You are now ready to send
it to the server with the demonstration client.

@example
@c file eg/network/protbase.request
PROGRAM blastn
DATALIB month
EXPECT 0.75
BEGIN
>GAWK310 the gawking gene GNU AWK
tgcttggctgaggagccataggacgagagcttcctggtgaagtgtgtttcttgaaatcat
caccaccatggacagcaaa
@c endfile
@end example

@cindex FASTA/Pearson format
The actual search request begins with the mandatory parameter @samp{PROGRAM}
in the first column followed by the value @samp{blastn} (the name of the
program) for searching nucleic acids. The next line contains the mandatory
search parameter @samp{DATALIB} with the value @samp{month} for the newest
nucleic acid sequences. The third line contains an optional @samp{EXPECT}
parameter and the value desired for it. The fourth line contains the
mandatory @samp{BEGIN} directive, followed by the query sequence in
FASTA/Pearson format.
Each line of information must be less than 80 characters in length.

The ``month'' database contains all new or revised sequences released in the
last 30 days and is useful for searching against new sequences.
There are five different BLAST programs, @command{blastn} being the one that
compares a nucleotide query sequence against a nucleotide sequence database.

The last server directive that must appear in every request is the
@samp{BEGIN} directive. The query sequence should immediately follow the
@samp{BEGIN} directive and must appear in FASTA/Pearson format.
A sequence in
FASTA/Pearson format begins with a single-line description.
The description line, which is required, is distinguished from the lines of
sequence data that follow it by having a greater-than (@samp{>}) symbol
in the first column. For the purposes of the BLAST server, the text of
the description is arbitrary.

If you prefer to use a client written in @command{gawk}, just store the following
10 lines of code into a file named @file{protbase.awk} and use this client
instead. Invoke it with @samp{gawk -f protbase.awk protbase.request}.
Then wait a minute and watch the result coming in. In order to replicate
the demonstration client's behavior as closely as possible, this client
does not use a proxy server. We could also have extended the client program
in @ref{GETURL, ,Retrieving Web Pages}, to implement the client request from
@file{protbase.awk} as a special case.

@smallexample
@c file eg/network/protbase.awk
@{ request = request "\n" $0 @}

END @{
  BLASTService     = "/inet/tcp/0/www.ncbi.nlm.nih.gov/80"
  printf "POST /cgi-bin/BLAST/nph-blast_report HTTP/1.0\n" |& BLASTService
  printf "Content-Length: " length(request) "\n\n"         |& BLASTService
  printf request                                           |& BLASTService
  while ((BLASTService |& getline) > 0)
      print $0
  close(BLASTService)
@}
@c endfile
@end smallexample

The demonstration client from NCBI is 214 lines long (written in C) and
it is not immediately obvious what it does.
Our client is so short that
it @emph{is} obvious what it does. First it loops over all lines of the
query and stores the whole query into a variable. Then the script
establishes an Internet connection to the NCBI server and transmits the
query by framing it with a proper HTTP request. Finally it receives
and prints the complete result coming from the server.

Now, let us look at the result. It begins with an HTTP header, which you
can ignore. Then there are some comments about the query having been
filtered to avoid spuriously high scores. After this, there is a reference
to the paper that describes the software being used for searching the
database. After a repetition of the original query's description we find the
list of significant alignments:

@smallexample
@c file eg/network/protbase.result
Sequences producing significant alignments:                        (bits)  Value

gb|AC021182.14|AC021182 Homo sapiens chromosome 7 clone RP11-733...    38  0.20
gb|AC021056.12|AC021056 Homo sapiens chromosome 3 clone RP11-115...    38  0.20
emb|AL160278.10|AL160278 Homo sapiens chromosome 9 clone RP11-57...    38  0.20
emb|AL391139.11|AL391139 Homo sapiens chromosome X clone RP11-35...    38  0.20
emb|AL365192.6|AL365192 Homo sapiens chromosome 6 clone RP3-421H...    38  0.20
emb|AL138812.9|AL138812 Homo sapiens chromosome 11 clone RP1-276...    38  0.20
gb|AC073881.3|AC073881 Homo sapiens chromosome 15 clone CTD-2169...    38  0.20
@c endfile
@end smallexample

This means that the query sequence was found in seven human chromosomes.
But the value 0.20 in the last column means that the probability of an
accidental match is rather high (20%) in all cases and should be taken
into account.
You may wonder what the first column means. It is a key to the specific
database in which this occurrence was found.
The unique sequence identifiers
reported in the search results can be used as sequence retrieval keys
via the NCBI server. The syntax of sequence header lines used by the NCBI
BLAST server depends on the database from which each sequence was obtained.
The table below lists the identifiers for the databases from which the
sequences were derived.

@ifinfo
@example
Database Name                     Identifier Syntax
============================      ========================
GenBank                           gb|accession|locus
EMBL Data Library                 emb|accession|locus
DDBJ, DNA Database of Japan       dbj|accession|locus
NBRF PIR                          pir||entry
Protein Research Foundation       prf||name
SWISS-PROT                        sp|accession|entry name
Brookhaven Protein Data Bank      pdb|entry|chain
Kabat's Sequences of Immuno@dots{}     gnl|kabat|identifier
Patents                           pat|country|number
GenInfo Backbone Id               bbs|number
@end example
@end ifinfo

@ifnotinfo
@multitable {Kabat's Sequences of Immuno@dots{}} {@code{@w{sp|accession|entry name}}}
@item GenBank @tab @code{gb|accession|locus}
@item EMBL Data Library @tab @code{emb|accession|locus}
@item DDBJ, DNA Database of Japan @tab @code{dbj|accession|locus}
@item NBRF PIR @tab @code{pir||entry}
@item Protein Research Foundation @tab @code{prf||name}
@item SWISS-PROT @tab @code{@w{sp|accession|entry name}}
@item Brookhaven Protein Data Bank @tab @code{pdb|entry|chain}
@item Kabat's Sequences of Immuno@dots{} @tab @code{gnl|kabat|identifier}
@item Patents @tab @code{pat|country|number}
@item GenInfo Backbone Id @tab @code{bbs|number}
@end multitable
@end ifnotinfo


For example, an identifier might be @samp{gb|AC021182.14|AC021182}, where the
@samp{gb} tag indicates that the identifier refers to a GenBank sequence,
@samp{AC021182.14} is its GenBank ACCESSION, and @samp{AC021182} is the GenBank LOCUS.
The identifier contains no spaces, so that a space indicates the end of the
identifier.

Let us continue in the result listing. Each of the seven alignments mentioned
above is subsequently described in detail. We will have a closer look at
the first of them.

@smallexample
>gb|AC021182.14|AC021182 Homo sapiens chromosome 7 clone RP11-733N23, WORKING DRAFT SEQUENCE, 4
             unordered pieces
          Length = 176383

 Score = 38.2 bits (19), Expect = 0.20
 Identities = 19/19 (100%)
 Strand = Plus / Plus

Query: 35    tggtgaagtgtgtttcttg 53
             |||||||||||||||||||
Sbjct: 69786 tggtgaagtgtgtttcttg 69804
@end smallexample

This alignment was located on the human chromosome 7. The fragment on which
part of the query was found had a total length of 176383. Only 19 of the
nucleotides matched and the matching sequence ran from character 35 to 53
in the query sequence and from 69786 to 69804 in the fragment on chromosome 7.
If you are still reading at this point, you are probably interested in finding
out more about Computational Biology and you might appreciate the following
hints.

@cindex Computational Biology
@cindex Bioinformatics
@enumerate
@item
There is a book called @cite{Introduction to Computational Biology}
by Michael S. Waterman, which is worth reading if you are seriously
interested. You can find a good
book review
on the Internet.

@item
While Waterman's book explains the algorithms employed internally
in the database search engines, most practitioners prefer to approach
the subject differently. The applied side of Computational Biology is
called Bioinformatics, and emphasizes the tools available for day-to-day
work as well as how to actually @emph{use} them. One of the very few affordable
books on Bioinformatics is
@cite{Developing Bioinformatics Computer Skills}.

@item
The sequences @emph{gawk} and @emph{gnuawk} are in widespread use in
the genetic material of virtually every earthly living being. Let us
take this as a clear indication that the divine creator has intended
@command{gawk} to prevail over other scripting languages such as @samp{perl},
@samp{tcl}, or @samp{python} which are not even proper sequences. (:-)
@end enumerate

@node Links, GNU Free Documentation License, Some Applications and Techniques, Top
@chapter Related Links

This section lists the URLs for various items discussed in this @value{DOCUMENT}.
They are presented in the order in which they appear.

@table @asis

@item @cite{Internet Programming with Python}
@uref{http://cewing.github.io/training.python_web/html/index.html}

@item @cite{Advanced Perl Programming}
@uref{http://www.oreilly.com/catalog/advperl}

@item @cite{Web Client Programming with Perl}
@uref{http://www.oreilly.com/catalog/webclient}

@item Richard Stevens's home page and book
@uref{http://www.kohala.com/start}

@item Volume III of @cite{Internetworking with TCP/IP}, by Comer and Stevens
@uref{http://www.cs.purdue.edu/homes/dec/tcpip3s.cont.html}

@item XBM Graphics File Format
@uref{https://en.wikipedia.org/wiki/X_BitMap}

@item GNUPlot
@uref{http://www.gnuplot.info}

@item Mark Humphrys' Eliza page
@uref{https://computing.dcu.ie/~humphrys/eliza.html}

@item Eliza on Wikipedia
@uref{https://en.wikipedia.org/wiki/ELIZA}

@item Java versions of Eliza with source code
@uref{https://github.com/codeanticode/eliza}

@item Loebner Contest
@uref{https://www.ocf.berkeley.edu/~arihuang/academic/research/loebner.html}

@item Tcl/Tk Information
@uref{https://en.wikipedia.org/wiki/Tcl}

@item Intel 80x86 Processors
@item Embedded PCs
@uref{https://en.wikipedia.org/wiki/Embedded_system}

@item AMD Elan Processors
@uref{https://en.wikipedia.org/wiki/Am5x86}

@item XINU
@uref{https://xinu.cs.purdue.edu}

@item GNU/Linux
@uref{https://en.wikipedia.org/wiki/GNU/Linux_naming_controversy}

@item MiniSQL
@uref{https://hughestech.com.au/products/msql}

@item Market Share Surveys
@uref{http://www.netcraft.com/survey}

@item @cite{Numerical Recipes in C: The Art of Scientific Computing}
@uref{http://numerical.recipes/}

@item VRML
@uref{https://en.wikipedia.org/wiki/VRML}

@item The UMBC Agent Web
@uref{https://agents.umbc.edu}

@item Apache Web Server
@uref{http://www.apache.org}

@item National Center for Biotechnology Information (NCBI)
@uref{http://www.ncbi.nlm.nih.gov}

@item Basic Local Alignment Search Tool (BLAST)
@uref{https://www.nature.com/scitable/topicpage/basic-local-alignment-search-tool-blast-29096}

@item BLAST Pages
@uref{http://www.ncbi.nlm.nih.gov/BLAST}

@item BLAST Demonstration Client
@uref{http://www.genebee.msu.su/blast/blast_overview.html#Network}

@item BLAST anonymous FTP location
@uref{https://www.ncbi.nlm.nih.gov/books/NBK62345/}

@item BLAST 2.0 Executables
@uref{ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST}

@item IUB/IUPAC Amino Acid and Nucleic Acid Codes
@uref{https://en.wikipedia.org/wiki/Nucleic_acid_notation}

@item FASTA/Pearson Format
@uref{https://de.wikipedia.org/wiki/FASTA-Format}

@item Fasta/Pearson Sequence in Java
@uref{http://www.kazusa.or.jp/java/codon_table_java/}

@item Book Review of @cite{Introduction to Computational Biology}
@uref{https://dl.acm.org/doi/abs/10.1145/332925.332927}

@item @cite{Developing Bioinformatics Computer Skills}
@uref{http://www.oreilly.com/catalog/bioskills/}

@end table

@c The GNU Free Documentation License.
@node GNU Free Documentation License, Index, Links, Top
@unnumbered GNU Free Documentation License
@cindex FDL (Free Documentation License)
@cindex Free Documentation License (FDL)
@cindex GNU Free Documentation License
@center Version 1.3, 3 November 2008

@c This file is intended to be included within another document,
@c hence no sectioning command or @node.

@display
Copyright @copyright{} 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc.
@uref{http://fsf.org/}

Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
@end display

@enumerate 0
@item
PREAMBLE

The purpose of this License is to make a manual, textbook, or other
functional and useful document @dfn{free} in the sense of freedom: to
assure everyone the effective freedom to copy and redistribute it,
with or without modifying it, either commercially or noncommercially.
Secondarily, this License preserves for the author and publisher a way
to get credit for their work, while not being considered responsible
for modifications made by others.

This License is a kind of ``copyleft'', which means that derivative
works of the document must themselves be free in the same sense. It
complements the GNU General Public License, which is a copyleft
license designed for free software.

We have designed this License in order to use it for manuals for free
software, because free software needs free documentation: a free
program should come with manuals providing the same freedoms that the
software does. But this License is not limited to software manuals;
it can be used for any textual work, regardless of subject matter or
whether it is published as a printed book.
We recommend this License
principally for works whose purpose is instruction or reference.

@item
APPLICABILITY AND DEFINITIONS

This License applies to any manual or other work, in any medium, that
contains a notice placed by the copyright holder saying it can be
distributed under the terms of this License. Such a notice grants a
world-wide, royalty-free license, unlimited in duration, to use that
work under the conditions stated herein. The ``Document'', below,
refers to any such manual or work. Any member of the public is a
licensee, and is addressed as ``you''. You accept the license if you
copy, modify or distribute the work in a way requiring permission
under copyright law.

A ``Modified Version'' of the Document means any work containing the
Document or a portion of it, either copied verbatim, or with
modifications and/or translated into another language.

A ``Secondary Section'' is a named appendix or a front-matter section
of the Document that deals exclusively with the relationship of the
publishers or authors of the Document to the Document's overall
subject (or to related matters) and contains nothing that could fall
directly within that overall subject. (Thus, if the Document is in
part a textbook of mathematics, a Secondary Section may not explain
any mathematics.) The relationship could be a matter of historical
connection with the subject or with related matters, or of legal,
commercial, philosophical, ethical or political position regarding
them.

The ``Invariant Sections'' are certain Secondary Sections whose titles
are designated, as being those of Invariant Sections, in the notice
that says that the Document is released under this License. If a
section does not fit the above definition of Secondary then it is not
allowed to be designated as Invariant. The Document may contain zero
Invariant Sections.
If the Document does not identify any Invariant
Sections then there are none.

The ``Cover Texts'' are certain short passages of text that are listed,
as Front-Cover Texts or Back-Cover Texts, in the notice that says that
the Document is released under this License. A Front-Cover Text may
be at most 5 words, and a Back-Cover Text may be at most 25 words.

A ``Transparent'' copy of the Document means a machine-readable copy,
represented in a format whose specification is available to the
general public, that is suitable for revising the document
straightforwardly with generic text editors or (for images composed of
pixels) generic paint programs or (for drawings) some widely available
drawing editor, and that is suitable for input to text formatters or
for automatic translation to a variety of formats suitable for input
to text formatters. A copy made in an otherwise Transparent file
format whose markup, or absence of markup, has been arranged to thwart
or discourage subsequent modification by readers is not Transparent.
An image format is not Transparent if used for any substantial amount
of text. A copy that is not ``Transparent'' is called ``Opaque''.

Examples of suitable formats for Transparent copies include plain
@sc{ascii} without markup, Texinfo input format, La@TeX{} input
format, @acronym{SGML} or @acronym{XML} using a publicly available
@acronym{DTD}, and standard-conforming simple @acronym{HTML},
PostScript or @acronym{PDF} designed for human modification. Examples
of transparent image formats include @acronym{PNG}, @acronym{XCF} and
@acronym{JPG}.
Opaque formats include proprietary formats that can be
read and edited only by proprietary word processors, @acronym{SGML} or
@acronym{XML} for which the @acronym{DTD} and/or processing tools are
not generally available, and the machine-generated @acronym{HTML},
PostScript or @acronym{PDF} produced by some word processors for
output purposes only.

The ``Title Page'' means, for a printed book, the title page itself,
plus such following pages as are needed to hold, legibly, the material
this License requires to appear in the title page. For works in
formats which do not have any title page as such, ``Title Page'' means
the text near the most prominent appearance of the work's title,
preceding the beginning of the body of the text.

The ``publisher'' means any person or entity that distributes copies
of the Document to the public.

A section ``Entitled XYZ'' means a named subunit of the Document whose
title either is precisely XYZ or contains XYZ in parentheses following
text that translates XYZ in another language. (Here XYZ stands for a
specific section name mentioned below, such as ``Acknowledgements'',
``Dedications'', ``Endorsements'', or ``History''.) To ``Preserve the Title''
of such a section when you modify the Document means that it remains a
section ``Entitled XYZ'' according to this definition.

The Document may include Warranty Disclaimers next to the notice which
states that this License applies to the Document. These Warranty
Disclaimers are considered to be included by reference in this
License, but only as regards disclaiming warranties: any other
implication that these Warranty Disclaimers may have is void and has
no effect on the meaning of this License.

@item
VERBATIM COPYING

You may copy and distribute the Document in any medium, either
commercially or noncommercially, provided that this License, the
copyright notices, and the license notice saying this License applies
to the Document are reproduced in all copies, and that you add no other
conditions whatsoever to those of this License. You may not use
technical measures to obstruct or control the reading or further
copying of the copies you make or distribute. However, you may accept
compensation in exchange for copies. If you distribute a large enough
number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and
you may publicly display copies.

@item
COPYING IN QUANTITY

If you publish printed copies (or copies in media that commonly have
printed covers) of the Document, numbering more than 100, and the
Document's license notice requires Cover Texts, you must enclose the
copies in covers that carry, clearly and legibly, all these Cover
Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
the back cover. Both covers must also clearly and legibly identify
you as the publisher of these copies. The front cover must present
the full title with all words of the title equally prominent and
visible. You may add other material on the covers in addition.
Copying with changes limited to the covers, as long as they preserve
the title of the Document and satisfy these conditions, can be treated
as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit
legibly, you should put the first ones listed (as many as fit
reasonably) on the actual cover, and continue the rest onto adjacent
pages.

If you publish or distribute Opaque copies of the Document numbering
more than 100, you must either include a machine-readable Transparent
copy along with each Opaque copy, or state in or with each Opaque copy
a computer-network location from which the general network-using
public has access to download using public-standard network protocols
a complete Transparent copy of the Document, free of added material.
If you use the latter option, you must take reasonably prudent steps,
when you begin distribution of Opaque copies in quantity, to ensure
that this Transparent copy will remain thus accessible at the stated
location until at least one year after the last time you distribute an
Opaque copy (directly or through your agents or retailers) of that
edition to the public.

It is requested, but not required, that you contact the authors of the
Document well before redistributing any large number of copies, to give
them a chance to provide you with an updated version of the Document.

@item
MODIFICATIONS

You may copy and distribute a Modified Version of the Document under
the conditions of sections 2 and 3 above, provided that you release
the Modified Version under precisely this License, with the Modified
Version filling the role of the Document, thus licensing distribution
and modification of the Modified Version to whoever possesses a copy
of it. In addition, you must do these things in the Modified Version:

@enumerate A
@item
Use in the Title Page (and on the covers, if any) a title distinct
from that of the Document, and from those of previous versions
(which should, if there were any, be listed in the History section
of the Document). You may use the same title as a previous version
if the original publisher of that version gives permission.

@item
List on the Title Page, as authors, one or more persons or entities
responsible for authorship of the modifications in the Modified
Version, together with at least five of the principal authors of the
Document (all of its principal authors, if it has fewer than five),
unless they release you from this requirement.

@item
State on the Title page the name of the publisher of the
Modified Version, as the publisher.

@item
Preserve all the copyright notices of the Document.

@item
Add an appropriate copyright notice for your modifications
adjacent to the other copyright notices.

@item
Include, immediately after the copyright notices, a license notice
giving the public permission to use the Modified Version under the
terms of this License, in the form shown in the Addendum below.

@item
Preserve in that license notice the full lists of Invariant Sections
and required Cover Texts given in the Document's license notice.

@item
Include an unaltered copy of this License.

@item
Preserve the section Entitled ``History'', Preserve its Title, and add
to it an item stating at least the title, year, new authors, and
publisher of the Modified Version as given on the Title Page. If
there is no section Entitled ``History'' in the Document, create one
stating the title, year, authors, and publisher of the Document as
given on its Title Page, then add an item describing the Modified
Version as stated in the previous sentence.

@item
Preserve the network location, if any, given in the Document for
public access to a Transparent copy of the Document, and likewise
the network locations given in the Document for previous versions
it was based on. These may be placed in the ``History'' section.
You may omit a network location for a work that was published at
least four years before the Document itself, or if the original
publisher of the version it refers to gives permission.

@item
For any section Entitled ``Acknowledgements'' or ``Dedications'', Preserve
the Title of the section, and preserve in the section all the
substance and tone of each of the contributor acknowledgements and/or
dedications given therein.

@item
Preserve all the Invariant Sections of the Document,
unaltered in their text and in their titles. Section numbers
or the equivalent are not considered part of the section titles.

@item
Delete any section Entitled ``Endorsements''. Such a section
may not be included in the Modified Version.

@item
Do not retitle any existing section to be Entitled ``Endorsements'' or
to conflict in title with any Invariant Section.

@item
Preserve any Warranty Disclaimers.
@end enumerate

If the Modified Version includes new front-matter sections or
appendices that qualify as Secondary Sections and contain no material
copied from the Document, you may at your option designate some or all
of these sections as invariant. To do this, add their titles to the
list of Invariant Sections in the Modified Version's license notice.
These titles must be distinct from any other section titles.

You may add a section Entitled ``Endorsements'', provided it contains
nothing but endorsements of your Modified Version by various
parties---for example, statements of peer review or that the text has
been approved by an organization as the authoritative definition of a
standard.

You may add a passage of up to five words as a Front-Cover Text, and a
passage of up to 25 words as a Back-Cover Text, to the end of the list
of Cover Texts in the Modified Version.
Only one passage of
Front-Cover Text and one of Back-Cover Text may be added by (or
through arrangements made by) any one entity. If the Document already
includes a cover text for the same cover, previously added by you or
by arrangement made by the same entity you are acting on behalf of,
you may not add another; but you may replace the old one, on explicit
permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License
give permission to use their names for publicity for or to assert or
imply endorsement of any Modified Version.

@item
COMBINING DOCUMENTS

You may combine the Document with other documents released under this
License, under the terms defined in section 4 above for modified
versions, provided that you include in the combination all of the
Invariant Sections of all of the original documents, unmodified, and
list them all as Invariant Sections of your combined work in its
license notice, and that you preserve all their Warranty Disclaimers.

The combined work need only contain one copy of this License, and
multiple identical Invariant Sections may be replaced with a single
copy. If there are multiple Invariant Sections with the same name but
different contents, make the title of each such section unique by
adding at the end of it, in parentheses, the name of the original
author or publisher of that section if known, or else a unique number.
Make the same adjustment to the section titles in the list of
Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections Entitled ``History''
in the various original documents, forming one section Entitled
``History''; likewise combine any sections Entitled ``Acknowledgements'',
and any sections Entitled ``Dedications''.
You must delete all
sections Entitled ``Endorsements.''

@item
COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other documents
released under this License, and replace the individual copies of this
License in the various documents with a single copy that is included in
the collection, provided that you follow the rules of this License for
verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute
it individually under this License, provided you insert a copy of this
License into the extracted document, and follow this License in all
other respects regarding verbatim copying of that document.

@item
AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate
and independent documents or works, in or on a volume of a storage or
distribution medium, is called an ``aggregate'' if the copyright
resulting from the compilation is not used to limit the legal rights
of the compilation's users beyond what the individual works permit.
When the Document is included in an aggregate, this License does not
apply to the other works in the aggregate which are not themselves
derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these
copies of the Document, then if the Document is less than one half of
the entire aggregate, the Document's Cover Texts may be placed on
covers that bracket the Document within the aggregate, or the
electronic equivalent of covers if the Document is in electronic form.
Otherwise they must appear on printed covers that bracket the whole
aggregate.

@item
TRANSLATION

Translation is considered a kind of modification, so you may
distribute translations of the Document under the terms of section 4.
Replacing Invariant Sections with translations requires special
permission from their copyright holders, but you may include
translations of some or all Invariant Sections in addition to the
original versions of these Invariant Sections. You may include a
translation of this License, and all the license notices in the
Document, and any Warranty Disclaimers, provided that you also include
the original English version of this License and the original versions
of those notices and disclaimers. In case of a disagreement between
the translation and the original version of this License or a notice
or disclaimer, the original version will prevail.

If a section in the Document is Entitled ``Acknowledgements'',
``Dedications'', or ``History'', the requirement (section 4) to Preserve
its Title (section 1) will typically require changing the actual
title.

@item
TERMINATION

You may not copy, modify, sublicense, or distribute the Document
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense, or distribute it is void, and
will automatically terminate your rights under this License.

However, if you cease all violation of this License, then your license
from a particular copyright holder is reinstated (a) provisionally,
unless and until the copyright holder explicitly and finally
terminates your license, and (b) permanently, if the copyright holder
fails to notify you of the violation by some reasonable means prior to
60 days after the cessation.

Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.

Termination of your rights under this section does not terminate the
licenses of parties who have received copies or rights from you under
this License. If your rights have been terminated and not permanently
reinstated, receipt of a copy of some or all of the same material does
not give you any rights to use it.

@item
FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions
of the GNU Free Documentation License from time to time. Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns. See
@uref{http://www.gnu.org/copyleft/}.

Each version of the License is given a distinguishing version number.
If the Document specifies that a particular numbered version of this
License ``or any later version'' applies to it, you have the option of
following the terms and conditions either of that specified version or
of any later version that has been published (not as a draft) by the
Free Software Foundation. If the Document does not specify a version
number of this License, you may choose any version ever published (not
as a draft) by the Free Software Foundation. If the Document
specifies that a proxy can decide which future versions of this
License can be used, that proxy's public statement of acceptance of a
version permanently authorizes you to choose that version for the
Document.

@item
RELICENSING

``Massive Multiauthor Collaboration Site'' (or ``MMC Site'') means any
World Wide Web server that publishes copyrightable works and also
provides prominent facilities for anybody to edit those works. A
public wiki that anybody can edit is an example of such a server. A
``Massive Multiauthor Collaboration'' (or ``MMC'') contained in the
site means any set of copyrightable works thus published on the MMC
site.

``CC-BY-SA'' means the Creative Commons Attribution-Share Alike 3.0
license published by Creative Commons Corporation, a not-for-profit
corporation with a principal place of business in San Francisco,
California, as well as future copyleft versions of that license
published by that same organization.

``Incorporate'' means to publish or republish a Document, in whole or
in part, as part of another Document.

An MMC is ``eligible for relicensing'' if it is licensed under this
License, and if all works that were first published under this License
somewhere other than this MMC, and subsequently incorporated in whole
or in part into the MMC, (1) had no cover texts or invariant sections,
and (2) were thus incorporated prior to November 1, 2008.

The operator of an MMC Site may republish an MMC contained in the site
under CC-BY-SA on the same site at any time before August 1, 2009,
provided the MMC is eligible for relicensing.

@end enumerate

@c fakenode --- for prepinfo
@unnumberedsec ADDENDUM: How to use this License for your documents

To use this License in a document you have written, include a copy of
the License in the document and put the following copyright and
license notices just after the title page:

@smallexample
@group
  Copyright (C) @var{year} @var{your name}.
  Permission is granted to copy, distribute and/or modify this document
  under the terms of the GNU Free Documentation License, Version 1.3
  or any later version published by the Free Software Foundation;
  with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
  Texts. A copy of the license is included in the section entitled ``GNU
  Free Documentation License''.
@end group
@end smallexample

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
replace the ``with@dots{}Texts.'' line with this:

@smallexample
@group
  with the Invariant Sections being @var{list their titles}, with
  the Front-Cover Texts being @var{list}, and with the Back-Cover Texts
  being @var{list}.
@end group
@end smallexample

If you have Invariant Sections without Cover Texts, or some other
combination of the three, merge those two alternatives to suit the
situation.

If your document contains nontrivial examples of program code, we
recommend releasing these examples in parallel under your choice of
free software license, such as the GNU General Public License,
to permit their use in free software.

@c Local Variables:
@c ispell-local-pdict: "ispell-dict"
@c End:


@node Index, , GNU Free Documentation License, Top
@comment node-name, next, previous, up

@unnumbered Index
@printindex cp
@bye

Conventions:
1. Functions, built-in or otherwise, do have () after them.
2. Gawk built-in vars and functions are in @code. Also program vars and
   functions.
3. HTTP method names are in @code.
4. Protocols such as echo, ftp, etc are in @samp.
5. URLs are in @url.
6. All RFCs are in the index. Put a space between `RFC' and the number.