Overview of innd Internals

Introduction

    innd is in many respects the heart of INN. It is the transit
    component of the news server, the component that accepts new articles
    from peers or from nnrpd on behalf of local readers, stores them, and
    puts information about them in the right places so that other programs
    such as innxmit or innfeed can send them on to other peers.

    innd is structured around channels. With the exception of the active
    file, the history database, the article and overview storage system,
    and a few other things such as logs, everything coming into or going
    out of innd is handled by a channel. Each channel can be waiting to
    read, waiting to write, or sleeping. innd's main loop (in
    CHANreadloop) calls select, passes control to each channel whose file
    descriptor select reports as ready for reading or writing, and takes
    care of other housekeeping (such as finding idle peers or waking up
    sleeping channels at the right time). The core channel routines are
    in chan.c, with major classes of channels handled by cc.c, lc.c, nc.c,
    rc.c, and site.c. See below for more details on the types of
    channels. The routines in proc.c are used to manage processes spawned
    for outgoing channels.

    The storage and overview subsystems are mostly self-contained at this
    point and innd is simply a client of the storage and overview APIs.
    The history database is approaching that state, but some aspects (such
    as the pre-commit cache handled by the WIP* family of routines in
    wip.c) are still handled internally by innd.

    Updates and queries of the active file are handled internally by innd
    in the ICD* and NG* families of routines in icd.c and ng.c.

    innd is configured primarily by incoming.conf (which controls who can
    send articles) and newsfeeds (which controls where the articles should
    go after they're received and stored).
    The former is read in rc.c,
    the file that also contains the RC* family of routines for dealing
    with the remote connection channel (see below). The latter is read by
    newsfeeds.c and is used to set up all of the outgoing channels when
    innd is started or told to re-read the file. Incoming articles are
    parsed and fed to the appropriate places by the routines in art.c.

    Both Perl and Python embedded filters are supported. The glue
    routines to load and run the Perl or Python scripts are in perl.c and
    python.c respectively.

    Finally, keywords.c contains the support for synthesizing keywords
    based on article contents, status.c writes out innd status
    periodically if configured, util.c contains various utility functions
    used by other parts of innd, and innd.c contains the startup,
    initialization, and shutdown code as well as the main routine.

Core Channel Handling

    CHANreadloop is the main processing loop of innd. As long as innd is
    running, it will be inside that function. The core channel code
    maintains a table of channels, which have a one-to-one correspondence
    with open file descriptors, and three file descriptor sets. Each
    channel is generally in one of the three sets (reading, writing, or
    sleeping) at any given time. The states should generally be
    considered mutually exclusive, since NNTP is not asynchronous and a
    channel that's reading and writing at the same time is liable to
    deadlock, but the core code doesn't assume that.

    A channel fundamentally consists of two functions, a reader function
    called whenever data is available for it to read and a write-done
    function called when data it wrote has been completely written out.
    If it is put to sleep, it also needs a function that is called when it
    is woken up again.
    Some channels may only read (such as the channels
    that accept connections) and some channels may only write (such as
    outgoing feeds), or channels may do both (like NNTP channels).

    Reading is handled by the channel itself, since some channels don't
    just read data from their file descriptor, but CHANreadtext is
    provided for channels to call from their reader functions if they want
    to read normally. CHANreadtext puts the data into the channel's input
    buffer and handles resizing and compacting the buffer as needed. To
    register as a reading channel, the channel calls RCHANadd, and then
    its file descriptor will be added to the read set and its reader
    function will be called whenever select indicates data is available.

    Writing is handled by the channel core code; the channel just puts
    data into its output buffer, usually using WCHANset or WCHANappend,
    and then calls WCHANadd to tell the channel code that data is
    available. The data is written out as select indicates the file
    descriptor can take it, and when the write is complete, the channel's
    write-done function is called.

    Channels are put to sleep if there's some reason why they must not be
    allowed to do anything for some time. Sleeping is generally used for
    write channels that have encountered some (hopefully temporary) error
    when writing, or which need to pause and spool output for a while
    before writing it out. It is also used for NNTP channels when the
    server is paused. A sleeping channel has an associated time to wake
    up, an optional event that will wake it up earlier, and a function
    that's called when it's woken up. Sleeping is not used for writing
    channels that just don't have any data at the moment to write; those
    channels are simply in none of the three states (which is also
    allowed).

    The core channel code also supports prioritized channels.
    Normally, after each call to select returns, CHANreadloop walks
    through each channel in turn, doing the appropriate work if the
    channel selected for reading or writing or if it is time to wake it
    up. However, on each pass, the prioritized channels are checked first
    to see if they selected for read, and if so, those reader functions
    are called immediately and the number of other events that will be
    handled that time through is capped (in case more data is available
    from the prioritized channels immediately). Only the control channel
    and the remote connection channels are prioritized.

Channel Types

    The following channel types are implemented in innd:

    Remote connections (CTremconn)

        This is the channel that accepts new connections from remote
        peers. If innd is running in the mode where it accepts and hands
        off reader connections to nnrpd, the remconn channel also does
        this. Its reader function doesn't actually read data, but rather
        accepts the connection and creates a new NNTP channel. These
        channels are always prioritized. The implementation is in rc.c.

    NNTP (CTnntp)

        Channels that speak NNTP to a peer (or to nnrpd or rnews feeding
        articles to innd). These channels are responsible for most of the
        data stored in the channel struct. They are probably the most
        complex channels in innd and use all of the facilities of the
        channel code. The implementation is in nc.c, including all the
        code to handle NNTP commands.

    Reject (CTreject)

        A special type of channel that exists solely to reject an unwanted
        connection. Peers who connect while the server is overloaded, who
        try to open too many connections at once, or who have no access
        (when innd is not handing connections to nnrpd) are handed off to
        this type of channel. A reject channel only writes the rejection
        message and then closes itself.

    Local connections (CTlocalconn)

        innd maintains a separate local Unix domain socket for the use of
        nnrpd and rnews when injecting articles. This channel type
        handles incoming connections on that socket and spawns an NNTP
        channel for them, similar to the remote connections channel.
        These channels are not prioritized (but possibly should be). The
        implementation is in lc.c.

    Control (CTcontrol)

        innd can be given a wide variety of commands by external
        processes, either automated ones like control message handling or
        nightly expiration and log rotation, or manual actions by the news
        administrator. The control channel handles incoming requests on
        the Unix domain socket created for this purpose, runs the command,
        and returns the results. This Unix domain socket is a datagram
        socket rather than a stream socket, so each command and each
        response is a single datagram, which makes the reader function a
        bit different from those of other channels. While the control
        channel writes its response back, it doesn't use the write support
        in the core channel code since it has to send a datagram; instead,
        it sends the response immediately from the reader function. There
        is only one control channel and it is always prioritized. The
        implementation is in cc.c.

    File (CTfile)
    Exploder (CTexploder)
    Process (CTprocess)

        These channels are used to implement different types of outgoing
        sites (outgoing channels configured in newsfeeds). They are
        created as needed by the site code in site.c and get data mostly
        due to the processing of articles by art.c. These channels are
        mostly alike from the perspective of the channel code, but have
        different types so that the site code can easily distinguish
        between them.
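
    As an illustration of how prioritized channels such as the control
    channel are served before ordinary ones after a call to select, here
    is a minimal sketch in C. It is not innd's code: the struct layout,
    toy_reader, toy_pass, and the enum ordering are all assumptions made
    for this example, and it leaves out write sets, sleeping channels,
    and the cap on other events described under Core Channel Handling.

```c
/* Toy sketch only; innd's real loop lives in chan.c and differs in detail. */
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Subset of the channel types named in the text; ordering is assumed. */
enum toy_type { CTfree, CTcontrol, CTnntp };

/* Hypothetical channel record: one per open file descriptor. */
struct toy_channel {
    enum toy_type type;
    int fd;
    int prioritized;                       /* served before other channels */
    void (*reader)(struct toy_channel *);  /* called when fd is readable */
    int reads;                             /* times the reader has run */
};

static int toy_order[8];   /* fds in the order their readers were called */
static int toy_ncalls;

/* Toy reader function: consume one byte and record that we ran. */
static void toy_reader(struct toy_channel *cp)
{
    char c;
    ssize_t n = read(cp->fd, &c, 1);

    assert(n == 1);
    cp->reads++;
    if (toy_ncalls < 8)
        toy_order[toy_ncalls++] = cp->fd;
}

/* One pass of the loop: select on every reading channel, then call the
   readers of prioritized channels first and the remaining ones after.
   Returns the number of reader functions called. */
static int toy_pass(struct toy_channel *table, size_t count)
{
    fd_set rset;
    struct timeval tv = { 0, 0 };  /* poll here; a real loop would block */
    int maxfd = -1, handled = 0, pass;
    size_t i;

    FD_ZERO(&rset);
    for (i = 0; i < count; i++) {
        if (table[i].type == CTfree)
            continue;
        FD_SET(table[i].fd, &rset);
        if (table[i].fd > maxfd)
            maxfd = table[i].fd;
    }
    if (maxfd < 0 || select(maxfd + 1, &rset, NULL, NULL, &tv) <= 0)
        return 0;

    /* pass 0 handles prioritized channels, pass 1 everything else */
    for (pass = 0; pass < 2; pass++) {
        for (i = 0; i < count; i++) {
            struct toy_channel *cp = &table[i];

            if (cp->type == CTfree || !!cp->prioritized != (pass == 0))
                continue;
            if (FD_ISSET(cp->fd, &rset)) {
                cp->reader(cp);
                handled++;
            }
        }
    }
    return handled;
}
```

    To try the sketch, create two pipes, put their read ends in a table
    with one entry marked prioritized, write a byte to each, and call
    toy_pass once; the prioritized entry's reader runs first regardless
    of its position in the table.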

    In addition, the channel type CTany is used as a wildcard in some
    channel operations and the type CTfree is used in the channel table
    for free channels (corresponding to closed file descriptors).

Article Handling
Newsfeeds and Sites
The Active File

    To be written.