                        Overview of innd Internals

Introduction

    innd is in many respects the heart of INN. It is the transit
    component of the news server, the component that accepts new articles
    from peers or from nnrpd on behalf of local readers, stores them, and
    puts information about them in the right places so that other programs
    such as innxmit or innfeed can send them back to other peers.

    innd is structured around channels. With the exception of the active
    file, the history database, the article and overview storage system,
    and a few other things such as logs, everything coming into or going
    out of innd is handled by a channel. Each channel can be waiting to
    read, waiting to write, or sleeping. innd's main loop (in
    CHANreadloop) calls select, passes control to each channel whose file
    descriptor selected ready for reading or writing, and takes care of
    other housekeeping (such as finding idle peers or waking up sleeping
    channels at the right time). The core channel routines are in chan.c,
    with major classes of channels handled by cc.c, lc.c, nc.c, rc.c, and
    site.c. See below for more details on the types of channels. The
    routines in proc.c are used to manage processes spawned for outgoing
    channels.

    The storage and overview subsystems are mostly self-contained at this
    point and innd is simply a client of the storage and overview APIs.
    The history database is approaching that state, but some aspects (such
    as the pre-commit cache handled by the WIP* family of routines in
    wip.c) are still handled internally by innd.

    Updates and queries of the active file are handled internally by innd
    in the ICD* and NG* family of routines in icd.c and ng.c.

    innd is configured primarily by incoming.conf (which controls who can
    send articles) and newsfeeds (which controls where the articles should
    go after they're received and stored). The former is read in rc.c,
    the file that also contains the RC* family of routines for dealing
    with the remote connection channel (see below). The latter is read by
    newsfeeds.c and is used to set up all of the outgoing channels when
    innd is started or told to re-read the file. Incoming articles are
    parsed and fed to the appropriate places by the routines in art.c.

    Both Perl and Python embedded filters are supported. The glue
    routines to load and run the Perl or Python scripts are in perl.c and
    python.c respectively.

    Finally, keywords.c contains the support for synthesizing keywords
    based on article contents, status.c writes out innd status
    periodically if configured, util.c contains various utility functions
    used by other parts of innd, and innd.c contains the startup,
    initialization, and shutdown code as well as the main routine.

Core Channel Handling

    CHANreadloop is the main processing loop of innd. As long as innd is
    running, it will be inside that function. The core channel code
    maintains a table of channels, which have a one-to-one correspondence
    with open file descriptors, and three file descriptor sets. Each
    channel is generally in one of the three sets (reading, writing, or
    sleeping) at any given time. The states should generally be
    considered mutually exclusive, since NNTP is not asynchronous and a
    channel that's reading and writing at the same time is liable to
    deadlock, but the core code doesn't assume that.
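
    As a rough illustration of a channel table indexed by file descriptor
    and driven by select, here is a minimal sketch. It is not the code in
    chan.c: every sk_* name, the table size, and the zero timeout are
    assumptions made for the example, and only the read set is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

#define MAX_CHANNELS 64     /* arbitrary size for this sketch */

/* Hypothetical channel record; innd's real channel struct holds far more. */
struct sk_channel {
    int in_use;                           /* slot corresponds to an open fd */
    int fd;
    void (*reader)(struct sk_channel *);  /* called when fd is readable */
    int reads_seen;                       /* bookkeeping for the example only */
};

static struct sk_channel sk_table[MAX_CHANNELS];  /* indexed by descriptor */
static fd_set sk_read_set;                        /* channels waiting to read */
static int sk_max_fd = -1;

static void sk_init(void)
{
    FD_ZERO(&sk_read_set);
    sk_max_fd = -1;
}

/* Register fd as a reading channel with the given reader function. */
static void sk_add_reader(int fd, void (*reader)(struct sk_channel *))
{
    assert(fd >= 0 && fd < MAX_CHANNELS);
    sk_table[fd].in_use = 1;
    sk_table[fd].fd = fd;
    sk_table[fd].reader = reader;
    sk_table[fd].reads_seen = 0;
    FD_SET(fd, &sk_read_set);
    if (fd > sk_max_fd)
        sk_max_fd = fd;
}

/* One pass of the loop: poll with select, then call the reader of every
   channel whose descriptor is ready.  Returns the number of readers
   called, or -1 if select failed.  A real loop would block in select
   rather than poll with a zero timeout. */
static int sk_loop_once(void)
{
    fd_set ready = sk_read_set;
    struct timeval no_wait = {0, 0};
    int handled = 0;

    if (select(sk_max_fd + 1, &ready, NULL, NULL, &no_wait) < 0)
        return -1;
    for (int fd = 0; fd <= sk_max_fd; fd++)
        if (sk_table[fd].in_use && FD_ISSET(fd, &ready)) {
            sk_table[fd].reader(&sk_table[fd]);
            handled++;
        }
    return handled;
}

/* Example reader: consume whatever is available and count the call. */
static void sk_drain(struct sk_channel *cp)
{
    char buf[256];

    if (read(cp->fd, buf, sizeof(buf)) > 0)
        cp->reads_seen++;
}
```

    To try it, call sk_init, register the read end of a pipe with
    sk_add_reader(fd, sk_drain), write a byte to the other end, and call
    sk_loop_once.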

    A channel fundamentally consists of two functions, a reader function
    called whenever data is available for it to read and a write-done
    function called when data it wrote has been completely written out.
    If it is put to sleep, it also needs a function that is called when it
    is woken up again. Some channels may only read (such as the channels
    that accept connections) and some channels may only write (such as
    outgoing feeds), or channels may do both (like NNTP channels).
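
    One way to picture "a channel is a set of functions" is a struct of
    function pointers, where a missing function simply means the channel
    never takes part in that kind of event. The sketch below assumes
    hypothetical names (sk_chan, sk_dispatch, and the SK_* events); it is
    not the layout of innd's channel struct.

```c
#include <assert.h>
#include <stddef.h>

struct sk_chan;
typedef void (*sk_chan_fn)(struct sk_chan *);

enum sk_event { SK_READABLE, SK_WRITE_DONE, SK_WAKEUP };

/* Hypothetical channel: three optional callbacks plus a counter that
   exists only so the example has something observable. */
struct sk_chan {
    sk_chan_fn reader;      /* NULL for a write-only channel */
    sk_chan_fn write_done;  /* NULL for a read-only channel */
    sk_chan_fn waker;       /* only needed if the channel ever sleeps */
    int calls;
};

/* Call the function matching the event.  Returns 0 if a function was
   called and -1 if the channel has none for that event. */
static int sk_dispatch(struct sk_chan *cp, enum sk_event event)
{
    sk_chan_fn fn = NULL;

    assert(cp != NULL);
    switch (event) {
    case SK_READABLE:
        fn = cp->reader;
        break;
    case SK_WRITE_DONE:
        fn = cp->write_done;
        break;
    case SK_WAKEUP:
        fn = cp->waker;
        break;
    }
    if (fn == NULL)
        return -1;
    fn(cp);
    return 0;
}

/* Example callback that just counts how often it ran. */
static void sk_count_call(struct sk_chan *cp)
{
    cp->calls++;
}
```

    A channel that only accepts connections would set reader and leave the
    other two pointers NULL; an NNTP-style channel would set all three.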

    Reading is handled by the channel itself, since some channels don't
    just read data from their file descriptor, but CHANreadtext is
    provided for channels to call from their reader functions if they want
    to read normally. CHANreadtext puts the data into the channel's input
    buffer and handles resizing and compacting the buffer as needed. To
    register as a reading channel, the channel calls RCHANadd, and then
    its file descriptor will be added to the read set and its reader
    function will be called whenever select indicates data is available.
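
    The buffer handling described above (compact, grow if needed, then
    read) can be sketched as follows. This is an illustration of the
    general technique, not CHANreadtext itself: the sk_inbuf struct, its
    field names, and the doubling growth policy are all assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical input buffer: unread data lives in data[used..used+left). */
struct sk_inbuf {
    char *data;
    size_t size;    /* bytes allocated */
    size_t used;    /* offset of the first unread byte */
    size_t left;    /* number of unread bytes */
};

/* Read once from fd into the buffer.  Returns the result of read(), or
   -1 if the buffer could not be grown. */
static ssize_t sk_inbuf_read(struct sk_inbuf *b, int fd)
{
    ssize_t n;

    assert(b->used + b->left <= b->size);

    /* Compact: slide unread data to the front to reclaim consumed space. */
    if (b->used > 0) {
        memmove(b->data, b->data + b->used, b->left);
        b->used = 0;
    }

    /* Resize: if no room remains, double the allocation. */
    if (b->left == b->size) {
        size_t newsize = b->size > 0 ? b->size * 2 : 64;
        char *p = realloc(b->data, newsize);

        if (p == NULL)
            return -1;
        b->data = p;
        b->size = newsize;
    }

    n = read(fd, b->data + b->left, b->size - b->left);
    if (n > 0)
        b->left += (size_t) n;
    return n;
}
```

    A caller starts with a zeroed struct, consumes parsed data by
    advancing used and shrinking left, and frees data when the channel
    closes.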

    Writing is handled by the channel core code; the channel just puts
    data into its output buffer, usually using WCHANset or WCHANappend,
    and then calls WCHANadd to tell the channel code that data is
    available. The data is written out as select indicates the file
    descriptor can take it, and when the write is complete, the channel's
    write-done function is called.
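
    The write path can be pictured as "write what the descriptor accepts,
    remember the rest, and call the write-done function once nothing is
    left". The sketch below assumes hypothetical names throughout; the
    chunk field is an artificial cap that exists only to make partial
    writes visible in a small example and has no counterpart in the text.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical writing channel. */
struct sk_writer {
    int fd;
    const char *out;    /* next byte to write */
    size_t left;        /* bytes still queued */
    size_t chunk;       /* artificial per-call cap (example only) */
    void (*write_done)(struct sk_writer *);
    int done_calls;     /* bookkeeping for the example only */
};

/* Queue a string for output. */
static void sk_set_output(struct sk_writer *cp, const char *text)
{
    cp->out = text;
    cp->left = strlen(text);
}

/* Called when the descriptor is writable.  Writes part of the queued
   data and calls write_done after the last byte has gone out.  Returns
   0 on success and -1 if write() failed. */
static int sk_write_ready(struct sk_writer *cp)
{
    size_t want = cp->left < cp->chunk ? cp->left : cp->chunk;
    ssize_t n;

    assert(cp->chunk > 0);
    if (want == 0)
        return 0;
    n = write(cp->fd, cp->out, want);
    if (n < 0)
        return -1;
    cp->out += n;
    cp->left -= (size_t) n;
    if (cp->left == 0 && cp->write_done != NULL)
        cp->write_done(cp);
    return 0;
}

/* Example write-done function that just counts completions. */
static void sk_count_done(struct sk_writer *cp)
{
    cp->done_calls++;
}
```

    With a chunk of 4 and nine queued bytes, the write-done function runs
    only on the third call to sk_write_ready.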

    Channels are put to sleep if there's some reason why they must not be
    allowed to do anything for some time. Sleeping is generally used for
    write channels that have encountered some (hopefully temporary) error
    when writing, or which need to pause and spool output for a while
    before writing it out. It is also used for NNTP channels when the
    server is paused. A sleeping channel has an associated time to wake
    up, an optional event that will wake it up earlier, and a function
    that's called when it's woken up. Sleeping is not used for writing
    channels that just don't have any data at the moment to write; those
    channels are just in none of the three states (which is also allowed).
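
    The three pieces of sleep state (a wake-up time, an optional event,
    and a function to call) might be modelled like this. The names are
    hypothetical, and representing an event as an opaque pointer compared
    by address is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical sleeping channel. */
struct sk_sleeper {
    int asleep;
    time_t wake_at;                      /* time at which to wake up */
    const void *event;                   /* optional early-wake event */
    void (*waker)(struct sk_sleeper *);  /* called on wakeup */
    int wakeups;                         /* bookkeeping for the example */
};

/* Put a channel to sleep for `seconds`, optionally waiting on an event. */
static void sk_sleep(struct sk_sleeper *cp, time_t now, int seconds,
                     const void *event, void (*waker)(struct sk_sleeper *))
{
    assert(waker != NULL);
    cp->asleep = 1;
    cp->wake_at = now + seconds;
    cp->event = event;
    cp->waker = waker;
}

static void sk_wake(struct sk_sleeper *cp)
{
    cp->asleep = 0;
    cp->event = NULL;
    cp->waker(cp);
}

/* Wake every sleeper whose time has come.  Returns the number woken. */
static int sk_wake_due(struct sk_sleeper *table, size_t count, time_t now)
{
    int woken = 0;

    for (size_t i = 0; i < count; i++)
        if (table[i].asleep && table[i].wake_at <= now) {
            sk_wake(&table[i]);
            woken++;
        }
    return woken;
}

/* Wake, ahead of schedule, every sleeper waiting on this event. */
static int sk_wake_event(struct sk_sleeper *table, size_t count,
                         const void *event)
{
    int woken = 0;

    assert(event != NULL);
    for (size_t i = 0; i < count; i++)
        if (table[i].asleep && table[i].event == event) {
            sk_wake(&table[i]);
            woken++;
        }
    return woken;
}

/* Example wakeup function that just counts calls. */
static void sk_count_wakeup(struct sk_sleeper *cp)
{
    cp->wakeups++;
}
```

    The housekeeping part of a main loop would call sk_wake_due once per
    pass, while whatever code clears the blocking condition would call
    sk_wake_event.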

    The core channel code also supports prioritized channels. Normally,
    after each call to select returns, CHANreadloop walks through each
    channel in turn, doing the appropriate work if the channel selected
    for reading or writing or if it is time to wake it up. However, on
    each pass, the prioritized channels are checked first to see if they
    selected for read, and if so, those reader functions are called
    immediately and the number of other events that will be handled that
    time through is capped (in case more data is available from the
    prioritized channels immediately). Only the control channel and the
    remote connection channels are prioritized.
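
    The ordering rule for prioritized channels can be modelled without any
    I/O at all. In the sketch below, readiness is a plain flag standing in
    for the result of select, and the names and the cap parameter are
    hypothetical; the text does not say what cap innd actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-channel state for one pass of the loop. */
struct sk_pchan {
    int ready;          /* select reported the descriptor readable */
    int prioritized;
    int order;          /* 1-based position in which it was handled,
                           or 0 if it was not handled this pass */
};

/* Run one pass.  Ready prioritized channels are handled first; if any
   were, at most `cap` other channels are handled afterwards.  Returns
   the total number of channels handled. */
static int sk_priority_pass(struct sk_pchan *table, size_t count, int cap)
{
    int handled = 0;
    int budget = -1;    /* -1 means no cap applies to this pass */
    size_t i;

    assert(cap >= 0);

    /* Prioritized channels first. */
    for (i = 0; i < count; i++) {
        table[i].order = 0;
        if (table[i].prioritized && table[i].ready) {
            table[i].order = ++handled;
            budget = cap;
        }
    }

    /* Then everything else, stopping once the budget is spent. */
    for (i = 0; i < count && budget != 0; i++) {
        if (!table[i].prioritized && table[i].ready) {
            table[i].order = ++handled;
            if (budget > 0)
                budget--;
        }
    }
    return handled;
}
```

    Channels skipped because of the cap are not lost: their descriptors
    are still ready, so the next call to select reports them again.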

Channel Types

    The following channel types are implemented in innd:

    Remote connections (CTremconn)

        This is the channel that accepts new connections from remote
        peers. If innd is running in the mode where it accepts and hands
        off reader connections to nnrpd, the remconn channel also does
        this. Its reader function doesn't actually read data, but rather
        accepts the connection and creates a new NNTP channel. These
        channels are always prioritized. The implementation is in rc.c.

    NNTP (CTnntp)

        Channels that speak NNTP to a peer (or to nnrpd or rnews feeding
        articles to innd). These channels are responsible for most of the
        data stored in the channel struct. They are probably the most
        complex channels in innd and use all of the facilities of the
        channel code. The implementation is in nc.c, including all the
        code to handle NNTP commands.

    Reject (CTreject)

        A special type of channel that exists solely to reject an unwanted
        connection. Peers who connect while the server is overloaded, who
        try to open too many connections at once, or who have no access
        (when innd is not handing connections to nnrpd) are handed off to
        this type of channel. A reject channel just writes the rejection
        message and then closes itself.

    Local connections (CTlocalconn)

        innd maintains a separate local Unix domain socket for the use of
        nnrpd and rnews when injecting articles. This channel type
        handles incoming connections on that socket and spawns an NNTP
        channel for them, similar to the remote connections channel.
        These channels are not prioritized (but possibly should be). The
        implementation is in lc.c.

    Control (CTcontrol)

        innd can be given a wide variety of commands by external
        processes, either automated ones (such as control message
        handling or nightly expiration and log rotation) or manual
        actions by the news administrator. The control channel handles
        incoming requests on the Unix domain socket created for this
        purpose, runs the command, and returns the results. This Unix
        domain socket is a datagram socket rather than a stream socket,
        so each command and each response is a single datagram, which
        makes the reader function a bit different from those of other
        channels. While the control channel writes its response back, it
        doesn't use the write support in the core channel code since it
        has to send a datagram; instead, it sends the response
        immediately from the reader function. There is only one control
        channel and it is always prioritized. The implementation is in
        cc.c.
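
    The "one datagram in, one datagram out, all inside the reader" shape
    can be sketched with a connected pair of Unix domain datagram sockets.
    This is not the code in cc.c: the function name, buffer sizes, and the
    "done: " reply format are invented for the example, and using
    socketpair in place of a named socket is a simplifying assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical reader for a datagram control socket.  Receives exactly
   one command, "runs" it, and sends the reply straight away instead of
   queueing it in an output buffer.  Returns 0 on success, -1 on error. */
static int sk_control_reader(int fd)
{
    char command[256];
    char reply[300];
    ssize_t n;
    int len;

    /* One recv() yields one whole command; there is no stream to parse. */
    n = recv(fd, command, sizeof(command) - 1, 0);
    if (n < 0)
        return -1;
    command[n] = '\0';

    /* Stand-in for command execution: echo the command in the reply. */
    len = snprintf(reply, sizeof(reply), "done: %s", command);
    assert(len > 0 && (size_t) len < sizeof(reply));

    /* The reply is likewise a single datagram, sent immediately. */
    return send(fd, reply, (size_t) len, 0) == len ? 0 : -1;
}
```

    A test client can create the pair with socketpair(AF_UNIX, SOCK_DGRAM,
    0, sv), send a command on one end, call sk_control_reader on the
    other, and then receive the reply on the first end.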

    File (CTfile)
    Exploder (CTexploder)
    Process (CTprocess)

        These channels are used to implement different types of outgoing
        sites (outgoing channels configured in newsfeeds). They are
        created as needed by the site code in site.c and get data mostly
        due to the processing of articles by art.c. These channels are
        mostly alike from the perspective of the channel code, but have
        different types so that the site code can easily distinguish
        between them.

    In addition, the channel type CTany is used as a wildcard in some
    channel operations and the type CTfree is used in the channel table
    for free channels (corresponding to closed file descriptors).
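
    To show how a wildcard type and a free marker might be used together
    when scanning a channel table, here is a small sketch. The type names
    come from the list above, but the enum order, the sk_* names, and the
    choice that CTany never matches a free slot are assumptions rather
    than a description of innd's code.

```c
#include <assert.h>
#include <stddef.h>

/* Type names as listed in the text; order and values are arbitrary. */
enum sk_chantype {
    CTfree, CTany, CTremconn, CTnntp, CTreject,
    CTlocalconn, CTcontrol, CTfile, CTexploder, CTprocess
};

/* Hypothetical table slot holding only the channel type. */
struct sk_slot {
    enum sk_chantype type;
};

/* Find the first slot at or after `start` whose type matches `wanted`.
   CTany matches every open channel; free slots are always skipped.
   Returns the slot index, or -1 if there is no match. */
static int sk_next_of_type(const struct sk_slot *table, size_t count,
                           size_t start, enum sk_chantype wanted)
{
    assert(wanted != CTfree);
    for (size_t i = start; i < count; i++) {
        if (table[i].type == CTfree)
            continue;
        if (wanted == CTany || table[i].type == wanted)
            return (int) i;
    }
    return -1;
}
```

    Calling sk_next_of_type repeatedly, each time starting one past the
    previous result, visits every channel of the requested type.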

Article Handling
Newsfeeds and Sites
The Active File

    To be written.
