| Name | | Date | Size | #Lines | LOC |
| .. | | 03-May-2022 | - | | |
| access/ | H | 06-Jan-2017 | - | 31,774 | 23,339 |
| algo/blast/ | H | 06-Jan-2017 | - | 107,474 | 68,318 |
| api/ | H | 06-Jan-2017 | - | 499,084 | 403,120 |
| asn/ | H | 03-May-2022 | - | 12,790 | 10,869 |
| asnlib/ | H | 06-Jan-2017 | - | 20,105 | 14,371 |
| asnstat/ | H | 06-Jan-2017 | - | 22,422 | 19,816 |
| bin/ | H | 06-Jan-2017 | - | | |
| biostruc/ | H | 06-Jan-2017 | - | 111,588 | 76,766 |
| build/ | H | 06-Jan-2017 | - | | |
| cdromlib/ | H | 06-Jan-2017 | - | 20,376 | 14,133 |
| cn3d/ | H | 06-Jan-2017 | - | 21,420 | 13,278 |
| config/ | H | 06-Jan-2017 | - | 4,715 | 4,295 |
| connect/ | H | 06-Jan-2017 | - | 59,795 | 39,673 |
| corelib/ | H | 06-Jan-2017 | - | 57,880 | 34,429 |
| ctools/ | H | 06-Jan-2017 | - | 346 | 161 |
| data/ | H | 03-May-2022 | - | 171,208 | 166,492 |
| ddv/ | H | 06-Jan-2017 | - | 17,010 | 9,853 |
| demo/ | H | 06-Jan-2017 | - | 121,733 | 90,987 |
| desktop/ | H | 06-Jan-2017 | - | 230,506 | 184,668 |
| doc/ | H | 03-May-2022 | - | 8,652 | 8,003 |
| errmsg/ | H | 06-Jan-2017 | - | 1,995 | 1,457 |
| gif/ | H | 06-Jan-2017 | - | 8,340 | 7,595 |
| include/ | H | 06-Jan-2017 | - | | |
| lib/ | H | 06-Jan-2017 | - | | |
| link/ | H | 06-Jan-2017 | - | 5,268 | 4,353 |
| make/ | H | 03-May-2022 | - | 163,325 | 161,052 |
| network/ | H | 06-Jan-2017 | - | 100,777 | 73,220 |
| object/ | H | 06-Jan-2017 | - | 96,587 | 69,087 |
| platform/ | H | 03-May-2022 | - | 1,556 | 876 |
| regexp/ | H | 06-Jan-2017 | - | 26,458 | 20,087 |
| sequin/ | H | 06-Jan-2017 | - | 230,915 | 193,548 |
| tools/ | H | 06-Jan-2017 | - | 181,184 | 123,598 |
| util/ | H | 06-Jan-2017 | - | 7,663 | 5,540 |
| vibrant/ | H | 06-Jan-2017 | - | 93,667 | 72,786 |
| webdesign/designs/ | H | 06-Jan-2017 | - | 16,319 | 16,052 |
| README | H A D | 16-Jun-2008 | 49.7 KiB | 1,127 | 925 |
| README.htm | H A D | 16-Jun-2008 | 58.2 KiB | 1,246 | 1,161 |
| VERSION | H A D | 06-Jan-2017 | 28 | 2 | 1 |
| build.me | H A D | 16-May-2014 | 1.1 KiB | 44 | 25 |
| build.me64 | H A D | 25-Jan-2008 | 1.9 KiB | 61 | 40 |
| checkout.date | H A D | 06-Jan-2017 | 28 | 2 | 1 |
README
1 NCBI SOFTWARE DEVELOPMENT TOOLKIT
2 National Center for Biotechnology Information
3 Bldg 38A, NIH
4 8600 Rockville Pike
5 Bethesda, MD 20894
6
The NCBI Software Development Toolkit was developed for the production and
distribution of GenBank, Entrez, BLAST, and related services by NCBI. We make
it freely available to the public without restriction to facilitate the
use of NCBI by the scientific community. However, please understand that
while we feel we have done a high quality job, this is not commercial software.
The documentation lags considerably behind the software and we must make any
changes required by our data production needs. Nonetheless, many people have
found it a useful and stable basis for a number of tools and applications.
15
16The toolkit is available by anonymous ftp from ftp.ncbi.nih.gov
17
18cd toolbox
19cd ncbi_tools
20bin
21get ncbi.tar.Z (compressed UNIX tar file)
22quit
23
24In this same directory are also ncbiz.exe (DOS self extracting archive) and
25ncbi.hqx (Mac self extracting archive). All three files contain the same
26source code and will make the toolkit for all platforms.
27
28
29Please feel free to email questions/suggestions to:
30 toolbox@ncbi.nlm.nih.gov
31
32If you would like hardcopy of the current documentation, send your mailing
33address with your request to the email address above.
34
If you are considering a serious development project using this toolkit, please
contact us. We are happy to discuss compatible strategies and inform you of
our longer term plans. There is no limitation on the use of this code, or on
contacting us about its use, for commercial, academic, or government groups.
39
40===========================================================================
41
42 Version 6.1
43 the date of release may be obtained from the file ncbi/VERSION
44
45===========================================================================
46
47 Summary
48
49The procedure of building the toolkit on Unix was slightly changed.
50Now there is no need to download any binary NCBI product for your
51platform to obtain the platform-specific ncbi.mk file.
52
To build the NCBI toolkit, see the platform-dependent instructions:
For UNIX (including Linux and Mac OS X):
 look at the file make/readme.unx
For alternative Mac instructions (using CodeWarrior):
 look at the file make/readme.mac
For Microsoft Windows 95/98/NT:
 look at the file make/readme.dos
Additional information that may be useful for building the NCBI toolkit
is in the file doc/FAQ.txt
62
63This release includes source code for the new (2.0.9) version of BLAST.
64Look at the file doc/README.bls for more detailed documentation on
65stand-alone BLAST.
66
67The file doc/README.pbl has the information about PowerBLAST.
68
A description of Integrating Matrix Profiles And Local Alignments
(IMPALA) is in the file doc/README.imp

The file doc/sequin.htm describes Sequin and its configuration.
73
74If you have problems configuring Entrez with a firewall, look at the
75file doc/firewall.txt
76
77This file has a section called CONFIGURATION OR SETTINGS FILES,
78which explains in detail how our configuration system works. The ncbi
79config file (.ncbirc on UNIX, ncbi.ini on PC/Windows, and ncbi.cnf on
80Macintosh) is needed in order to find data files, such as
81gc.val (the genetic code table), provided in the toolkit or with programs
82like Sequin. (The asnload files containing dynamic versions of the ASN.1
83parse tables are no longer needed, since all platforms can now have large
84static data.)
85
86It has recently become possible to eliminate the need for the ncbi config
87file by calling UseLocalAsnloadDataAndErrMsg () at the beginning of your
88program. This looks for the data directory in the same directory as the
89running program. If it doesn't find it, it looks up one level, in case you
90are compiling programs in the build directory of the toolkit. If it finds
91the data directory in either of these places, it transiently sets the
92location, so code that loads these files is given the correct path.
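
The search order described above (the program's own directory first, then one
level up) can be sketched in a few lines of C. This is only an illustration
under stated assumptions, not the toolkit's implementation of
UseLocalAsnloadDataAndErrMsg (): the names data_dir_candidate and find_data_dir
are hypothetical, the directory is assumed to be called "data", and the
existence check uses POSIX stat().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>

/* Candidate locations, relative to the program's directory, in search order.
   Hypothetical names; the real toolkit may organize this differently. */
const char *const DATA_DIR_SUFFIXES[] = { "/data", "/../data" };
enum { DATA_DIR_CANDIDATES = 2 };

/* Build candidate number `i` into `out`. Returns 1 if it fit, 0 otherwise. */
int data_dir_candidate(const char *prog_dir, int i, char *out, size_t out_len)
{
    size_t a, b;

    assert(prog_dir != NULL && out != NULL);
    if (i < 0 || i >= DATA_DIR_CANDIDATES)
        return 0;
    a = strlen(prog_dir);
    b = strlen(DATA_DIR_SUFFIXES[i]);
    if (a + b + 1 > out_len)
        return 0;                       /* would not fit, including the NUL */
    memcpy(out, prog_dir, a);
    memcpy(out + a, DATA_DIR_SUFFIXES[i], b + 1);
    return 1;
}

/* Try each candidate in order. Returns 1 and leaves the path in `out` if an
   existing directory was found; otherwise returns 0 and empties `out`. */
int find_data_dir(const char *prog_dir, char *out, size_t out_len)
{
    int i;

    for (i = 0; i < DATA_DIR_CANDIDATES; i++) {
        struct stat st;
        if (data_dir_candidate(prog_dir, i, out, out_len) &&
            stat(out, &st) == 0 && S_ISDIR(st.st_mode))
            return 1;
    }
    if (out_len > 0)
        out[0] = '\0';
    return 0;
}
```

A caller would pass the directory containing the running executable and, on
success, use the returned path as the location of files such as gc.val.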
93
94An even more recent change is that copies of several of our data files (gc,
95seqcode, and featdef) are now built into the source code, so if the data
96directory is not found, programs that require only these can still run.
97
98One final improvement is that access to our network services is now much
99simpler than before, so if you are not behind a firewall and have domain
100name server (DNS) available you can connect to our network without needing
101any configuration information in the ncbi config file. Operation behind a
102firewall, or with a proxy, requires very little in the ncbi config file, and
103this is easily created by asking Sequin to configure for network access.
104
105=============================================================================
106 Notes from Previous Releases
107=============================================================================
108
109=============================================================================
110 Version 6.0
111 the date of release may be obtained from the file ncbi/VERSION
112=============================================================================
113
114This release includes source code for the new (2.0) version of BLAST.
115Also included are a small number of incremental changes in the ASN.1
116specification.
117
BLAST 2.0 - BLAST 2.0 can produce gapped alignments and is capable of
position-specific-iterated BLASTp (PSI-BLAST). Compared to the 1.4 release of
BLAST, there are also significant performance enhancements as well as extensive
changes to the text report and the format of the databases. BLAST 2.0
uses threads for multi-processing, using the NCBI threads library.
123Three BLAST programs may be compiled in the demo directory. They are:
124
125formatdb: formats FASTA files as BLAST databases for BLAST 2.0.
126
127blastall: perform all five flavors of blast comparison.
128blastn and blastp offer fully gapped alignments.
129blastx and tblastn have 'in-frame' gapped alignments and use sum
130 statistics to link alignments from different frames.
131tblastx provides only ungapped alignments.
132
133blastpgp: performs gapped blastp searches and can be used to perform
134iterative searches in psi-blast mode.
135
136Additional information may be obtained from the README in the BLAST
137directory of the FTP site and from the NCBI BLAST pages.
138
139ASN.1 Spec Changes for 1997
140
141biblio.asn
142 Cit-pat - some fields made optional to allow patent applications to be legal
143 Cit-pat.number OPTIONAL
144 Cit-pat.date-issue OPTIONAL
145 -- Patent number and date-issue were made optional in 1997 to
146 -- support patent applications being issued from the USPTO
147 -- Semantically a Cit-pat must have either a patent number or
148 -- an application number (or both) to be valid
149
150medline.asn
151 added ML-field to support other MEDLINE line types
152
153Medline-entry ::= SEQUENCE {
154 uid INTEGER OPTIONAL , -- MEDLINE UID, sometimes not yet available if from PubMed
155 em Date , -- Entry Month
156 ... (not shown)
157 pmid PubMedId OPTIONAL , -- MEDLINE records may include the PubMedId
158 pub-type SET OF VisibleString OPTIONAL, -- may show publication types (review, etc)
159 mlfield SET OF Medline-field OPTIONAL } -- additional Medline field types
160
161Medline-field ::= SEQUENCE {
162 type INTEGER { -- Keyed type
163 other (0) , -- look in line code
164 comment (1) , -- comment line
165 erratum (2) } , -- retracted, corrected, etc
166 str VisibleString , -- the text
167 ids SEQUENCE OF DocRef OPTIONAL } -- pointers relevant to this text
168
169DocRef ::= SEQUENCE { -- reference to a document
170 type INTEGER {
171 medline (1) ,
172 pubmed (2) ,
173 ncbigi (3) } ,
174 uid INTEGER }
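
As a rough illustration of how the Medline-field and DocRef definitions above
might look as C memory structures, here is a minimal sketch. The struct and
enum names are hypothetical and are not the ones generated for the toolkit's
own object loaders; only the field names and the integer values are taken
from the ASN.1 above.

```c
#include <assert.h>
#include <stddef.h>

/* DocRef.type values, numbered as in the ASN.1 definition. */
enum docref_type { DOCREF_MEDLINE = 1, DOCREF_PUBMED = 2, DOCREF_NCBIGI = 3 };

/* Medline-field.type values, numbered as in the ASN.1 definition. */
enum mlfield_type { MLFIELD_OTHER = 0, MLFIELD_COMMENT = 1, MLFIELD_ERRATUM = 2 };

/* Hypothetical C view of DocRef; SEQUENCE OF is modelled as a linked list. */
struct docref {
    enum docref_type type;
    long uid;
    struct docref *next;
};

/* Hypothetical C view of Medline-field. */
struct medline_field {
    enum mlfield_type type;
    const char *str;        /* the text */
    struct docref *ids;     /* OPTIONAL: NULL when absent */
};

/* Count the DocRef entries of type `t` attached to a field. */
size_t medline_field_count_refs(const struct medline_field *f,
                                enum docref_type t)
{
    size_t n = 0;
    const struct docref *r;

    assert(f != NULL);
    for (r = f->ids; r != NULL; r = r->next)
        if (r->type == t)
            n++;
    return n;
}
```

Representing the OPTIONAL "ids" slot as a possibly-NULL list head keeps the
absent case and the empty case identical for callers that only iterate.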
175
176
177seq.asn
178 MolInfo.tech - added names for HTG classes already implemented
179 Annotdesc.region - added seqloc. If present, all annots in this SeqAnnot
180 are within this region. Optimization on big seqs.
181
182seqfeat.asn
183 added OrgMod.specimen-voucher - new organism qualifier
184 added OrgMod.old-name - used internally at NCBI
185 added BioSource.is-focus - for distinguishing biological focus of
186 multiple source features.
187 added Seq-feat.pseudo so any feature can be flagged explicitly as
188 belonging to a pseudogene
189 added Seq-feat.except-text for an explanation of the exception when
190 Seq-feat.except is TRUE. Currently this text is in Seq-feat.comment
191 in backbone records and GBQuals in some other genbank records.
192
193
194
195=============================================================================
196 Notes from Previous Releases
197=============================================================================
198
199 Version 5.0
200
201 Summary
202
203This release includes a small number of incremental changes in the ASN.1
204specification. Most significant is the addition of the PubMedID, a
205bibliographic citation identifier similar to a MEDLINE UID. PubMed is a new
citation database being developed at NCBI which is a superset of MEDLINE. It
will be an avenue by which publishers can deposit electronic versions of their
citations and abstracts, allowing timely linking to network Entrez from
the publishers' on-line services. PubMed will route these citations to MEDLINE
and they will appear in MEDLINE (and Entrez) after the usual MEDLINE indexing.
211However, for some period of time, such articles will have only a PubMedID.
212We would like to switch Entrez over to supporting PubMedIDs as early as
213possible. WE STRONGLY ENCOURAGE DEVELOPERS TO RECOMPILE AND RELINK WITH THIS
214VERSION OF THE TOOLKIT AS SOON AS POSSIBLE. The changes in this specification
215should not cause problems with existing software, so a simple compile and
216link should be enough to make you compatible. Details of ASN.1 specification
217changes are listed below.
218
There has been considerable development of the toolkit in other aspects as
well, many of which are embodied in Sequin, the new NCBI direct submission
tool, which is included in the toolkit as well. In the interest of getting the
PubMed changes into the specification and into developers' hands promptly, we
have not included much on that aspect of this toolkit at this time.
224
225
226 Changes in the 1996 NCBI ASN.1 (version 5.0) specification
227
228Once again, there are very few changes to the NCBI ASN.1 specification this
229year. The biggest change is the addition of the PubMed ID to support the new
230NCBI PubMed database. There are also small additions to the medline and
231organism specifications, detailed below. As usual, these changes are also
232backward compatible with old data. However, you should recompile and relink
233your applications as soon as possible, since the old applications will not be
234compatible with the new datatypes.
235
2361) PubMed - NCBI is building a new citation database that is a superset of
237MEDLINE and which will be linked to online journals from publishers. The
238bibliographic components of the specification have had support for PubMed IDs
239added. These include biblio.asn (objbibli.[ch]), pub.asn (objpub.[ch]),
240medline.asn (objmedli.[ch]).
241
2422) pub-type - MEDLINE includes strings indicating the type of a publication.
243The medline definition has had the attribute pub-type added to support these
244strings.
245
246From the 1996 MeSH, here's the list.
247
248Abstract
249Bibliography
250Classical Article
251Clinical Conference
252Clinical Trial
253Clinical Trial, Phase I
254Clinical Trial, Phase II
255Clinical Trial, Phase III
256Clinical Trial, Phase IV
257Comment
258Consensus Development Conference
259Consensus Development Conference, NIH
260Controlled Clinical Trial
261Corrected and Republished Article
262Current Biog-Obit
263Dictionary
264Directory
265Duplicate Publication
266Editorial
267Festschrift
268Guideline
269Historical Article
270Historical Biography
271Interview
272Journal Article
273Legal Brief
274Letter
275Meeting Report
276Meta-Analysis
277Monograph
278Multicenter Study
279News
280Newspaper Article
281Overall
282Periodical Index
283Practice Guideline
284Published Erratum
285Randomized Controlled Trial
286Retracted Publication
287Retraction of Publication
288Review
289Review Literature
290Review of Reported Cases
291Review, Academic
292Review, Multicase
293Review, Tutorial
294Scientific Integrity Review
295Technical Report
296Twin Study
297
2983) virion - the attribute virion has been added to BioSource.genome. It just
299complements proviral which was already there. This will map to a /virion
300qualifier in the new GenBank feature table definition.
301
4) division - OrgName.div can now (optionally) contain the GenBank division code
(e.g., PRI).
304
3055) signal-peptide, transit-peptide - were added to Prot-ref, to support
306annotation of protein features on the protein sequence in a way that could be
307mapped to a GenBank feature table.
308
309That's all. Relevant sections of the asn.1 specification are shown below.
310
311================================================================================
312
313biblio.asn
314
315
316PubMedId ::= INTEGER -- Id from the PubMed database at NCBI
317
318and..
319
320
321Cit-gen ::= SEQUENCE { -- NOT from ANSI, this is a catchall
322 cit VisibleString OPTIONAL , -- anything, not parsable
323 authors Auth-list OPTIONAL ,
324 muid INTEGER OPTIONAL , -- medline uid
325 journal Title OPTIONAL ,
326 volume VisibleString OPTIONAL ,
327 issue VisibleString OPTIONAL ,
328 pages VisibleString OPTIONAL ,
329 date Date OPTIONAL ,
330 serial-number INTEGER OPTIONAL , -- for GenBank style references
331 title VisibleString OPTIONAL , -- eg. cit="unpublished",title="title"
332 pmid PubMedId OPTIONAL } -- PubMed Id
333
334pub.asn
335
336
337Pub ::= CHOICE {
338 gen Cit-gen , -- general or generic unparsed
339 sub Cit-sub , -- submission
340 medline Medline-entry ,
341 muid INTEGER , -- medline uid
342 article Cit-art ,
343 journal Cit-jour ,
344 book Cit-book ,
345 proc Cit-proc , -- proceedings of a meeting
346 patent Cit-pat ,
347 pat-id Id-pat , -- identify a patent
348 man Cit-let , -- manuscript, thesis, or letter
349 equiv Pub-equiv, -- to cite a variety of ways
350 pmid PubMedId } -- PubMedId
351
352medline.asn
353
 -- a MEDLINE or PubMed entry
Medline-entry ::= SEQUENCE {
    uid INTEGER OPTIONAL ,                   -- MEDLINE UID, sometimes not yet available if from PubMed
    em Date ,                                -- Entry Month
    cit Cit-art ,                            -- article citation
    abstract VisibleString OPTIONAL ,
    mesh SET OF Medline-mesh OPTIONAL ,
    substance SET OF Medline-rn OPTIONAL ,
    xref SET OF Medline-si OPTIONAL ,
    idnum SET OF VisibleString OPTIONAL ,    -- ID Number (grants, contracts)
    gene SET OF VisibleString OPTIONAL ,
    pmid PubMedId OPTIONAL ,                 -- MEDLINE records may include the PubMedId
    pub-type SET OF VisibleString OPTIONAL } -- may show publication types (review, etc)
370
371seqfeat.asn
372
373
OrgName ::= SEQUENCE {
    name CHOICE {
        binomial BinomialOrgName ,           -- genus/species type name
        virus VisibleString ,                -- virus names are different
        hybrid MultiOrgName ,                -- hybrid between organisms
        namedhybrid BinomialOrgName ,        -- some hybrids have genus x species name
        partial PartialOrgName } OPTIONAL ,  -- when genus not known
    attrib VisibleString OPTIONAL ,          -- attribution of name
    mod SEQUENCE OF OrgMod OPTIONAL ,
    lineage VisibleString OPTIONAL ,         -- lineage with semicolon separators
    gcode INTEGER OPTIONAL ,                 -- genetic code (see CdRegion)
    mgcode INTEGER OPTIONAL ,                -- mitochondrial genetic code
    div VisibleString OPTIONAL }             -- GenBank division code
388
389BioSource ::= SEQUENCE {
390 genome INTEGER { -- biological context
391 unknown (0) ,
392 genomic (1) ,
393 chloroplast (2) ,
394 chromoplast (3) ,
395 kinetoplast (4) ,
396 mitochondrion (5) ,
397 plastid (6) ,
398 macronuclear (7) ,
399 extrachrom (8) ,
400 plasmid (9) ,
401 transposon (10) ,
402 insertion-seq (11) ,
403 cyanelle (12) ,
404 proviral (13) ,
405 virion (14) } DEFAULT unknown ,
406 origin INTEGER {
407 unknown (0) ,
408 natural (1) , -- normal biological entity
409 natmut (2) , -- naturally occurring mutant
410 mut (3) , -- artificially mutagenized
411 artificial (4) , -- artificially engineered
412 synthetic (5) , -- purely synthetic
413 other (255) } DEFAULT unknown ,
414 org Org-ref ,
415 subtype SEQUENCE OF SubSource OPTIONAL }
416
417Prot-ref ::= SEQUENCE {
418 name SET OF VisibleString OPTIONAL , -- protein name
419 desc VisibleString OPTIONAL , -- description (instead of name)
420 ec SET OF VisibleString OPTIONAL , -- E.C. number(s)
421 activity SET OF VisibleString OPTIONAL , -- activities
422 db SET OF Dbtag OPTIONAL , -- ids in other dbases
423 processed ENUMERATED { -- processing status
424 not-set (0) ,
425 preprotein (1) ,
426 mature (2) ,
427 signal-peptide (3) ,
428 transit-peptide (4) } DEFAULT not-set }
429
430
431=============================================================================
432 Notes from Previous Releases
433=============================================================================
434
435 New Functions in Version 4.0
436
437There are a host of new functions in this release, but as usual we have not
438managed to make time to document them all. Large parts of Sequin are present
439which will be announced and described more fully in the fall. However,
440specific tools of immediate interest are:
441
442blast2 - this is the long awaited BLAST client/server which permits structured
443 interaction with BLAST over the internet. We have provided a basic client
444 that produces the traditional blast output. In addition, the function call
445 interface can be used in more elaborate clients. For more information
446 contact Tom Madden, madden@ncbi.nlm.nih.gov
447
448 WARNING!!! blast2 is the client we plan to support on the longer term.
449 The blast1 client we included for those of you who wanted a head start
450 will NOT be supported in future. Please shift any blast1 clients to the
451 (very similar) blast2 interface as soon as possible.
452
453sim, sim2 - protein and DNA sequence alignments in linear space. This is
454 the function call interface to these valuable tools. Applications have
455 been written which are available by ftp as are published papers. For more
456 information contact Jinghui Zhang, zjing@ncbi.nlm.nih.gov
457
458
459
460
461 Changes in ASN.1 spec 4.0 from 3.0
462
463
464Affil - biblio.asn
465 added the field "postal-code" for Zip code finally.
466
467Contact-info - submit.asn
468 added the field "contact" which is type "Author". The contact info has
469 evolved into a fully structured form, so I just took Author which has
470 structured names and structured address (Affil). We will eventually
471 phase out all the less structured ones in Contact-info.
472
OrgName - seqfeat.asn
474 added "lineage", "gcode", "mgcode" for the lineage, genetic code, and
475 mitochondrial genetic code. This is part of Org-ref, and consolidates
476 all the organism info (except original SOURCE line) out of the
477 GenBank block... and enables us to deliver it nicely from Taxon.
478
479Seq-descr - seq.asn
480 removed the Seq-descr "neighbors" and replaced it with "dbxref", since
481 neighbors has never been used. This is used to add cross-references to
482 the whole entry.
483
484Pubdesc - seq.asn
485 has an added slot, "reftype" which is an integer and is used to
486 indicate the GenBank usage of a reference.
487
 0 - seq - applies to the sequence. This is the default and the way it is
     used now.
 1 - sites - applies to (unspecified) features. Equivalent to a GenBank
     SITES feature. We could switch to this from using the
     Imp-feat we do now.
 2 - feats - applies to specific features. The idea here is to provide a
     place for the full citation, so features need only reference
     it. If no features reference it, it should be removed. This
     would work for checking content when only a part of a sequence
     is copied or pasted. A "sites" ref could not have this check
     since we do not know which features it goes to.
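
The three reftype values and the removal rule for "feats" references can be
written down as a small C sketch. The enum and function names here are
hypothetical, chosen for illustration; only the integer values 0, 1, and 2
and the rule itself come from the description above.

```c
#include <assert.h>

/* Pubdesc.reftype values as listed above (names are illustrative). */
enum pubdesc_reftype {
    REFTYPE_SEQ   = 0,   /* applies to the sequence (default) */
    REFTYPE_SITES = 1,   /* applies to unspecified features */
    REFTYPE_FEATS = 2    /* applies to specific features that cite it */
};

/* A "feats" reference exists only to be cited by features, so it can be
   dropped when nothing cites it. The same check is not possible for
   "sites", because the features it applies to are not recorded. */
int pubdesc_is_removable(enum pubdesc_reftype t, unsigned citing_features)
{
    assert(t == REFTYPE_SEQ || t == REFTYPE_SITES || t == REFTYPE_FEATS);
    return t == REFTYPE_FEATS && citing_features == 0;
}
```

For example, after copying part of a sequence, a tool could count the copied
features that cite each "feats" reference and discard those with a count of
zero.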
499
500Seq-feat - seqfeat.asn
 added a slot called "dbxref" to Seq-feat. This is a SET OF Dbtag. It will
 be for adding the new db_xref qualifiers to features. We already have some
 of these in the xref slots of Gene-ref, Prot-ref, Org-ref. It means we have
 to check two places in these cases. I do not want to retire the slots
 since these were meant to be used in other contexts besides features, and
 Org-ref already is.
507
508
509 added a slot called "anticodon" to the tRNA extension of the RNA feature.
510 This is a Seq-loc that points to the location of the anticodon in a tRNA.
511 We have been populating this data in a User-object, and will have to do
512 a retro to convert it.
513
514 EXPORTED Genetic-code
515
516
517Seq-align - seqalign.asn
518
519 added "bounds" to Seq-align so you can record the regions over which
520 an alignment was computed.. not always included in the resulting alignment
521 itself.
522
523 added two new types:
524 A) Packed-seg -- a denser representation from Colombe and Jinghui
525 B) disc - discontinuous alignments as a SEQUENCE OF Seq-align
526
527
528Seq-annot - seq.asn
529
530 added a field to Seq-annot, Align-def, to discriminate types of
531 alignment sets. This has the advantage of minimal changes as well as
532 separating sets of alignments from conceptually single alignments. I am
533 not sure it is necessary to distinguish "alt" from "blocks" though. Also
534 it means you can attach more info, with other Seq-annot fields and/or by
535 expanding the Align-def. I put in "ids" in Align-def specifically to put
536 the one Seq-id that is the "master" for type "ref". I made it a SET OF
537 so we could use it for other collections where we might want to list
538 more than one.
539
 added "ids" and "locs" as allowed types within Seq-annot. This would
 enable us to pass lists like this around between tools with all the
 additional descriptive information in Annotdesc. I know this will be
 useful.
544
545 added "general" to Annot-id for tracking 3rd party annotations.
546
547
548
549
550
551
552
553 Introduction
554
555 This distribution is release 5.0 of the NCBI core library for building
556portable software, and AsnLib, a collection of routines for handling ASN.1
557data and developing ASN.1 software applications. AsnLib and the asntool
558application are built using the CoreLib routines. In the \doc directory is an
559MS Word file which details the information given below. It is also available
560as hardcopy. See the README in \doc.
561
The lowest layer of code is the CoreLib. These are multi-platform
functions for memory allocation (including byte stores), string
manipulation, file input and output, error and general messages, and
time and date notification. These functions have been written only
where we found that the existing ANSI functions were not sufficiently
multi-platform or well-behaved among all of the platforms that we
support. For each platform (a combination of processor, operating
system, compiler, and windowing system), we supply a specific ncbilcl.h
file, which contains typedefs and defines for multi-platform symbols,
and includes a number of standard header files. (For example,
ncbilcl.msw is used for the Microsoft C compiler under Microsoft Windows
on the PC.) Use of these symbols, and of the functions in the CoreLib,
allows us to write multi-platform source code for a variety of disparate
platforms.
576
577The next layer of code is the AsnLib stream reader. This is
578used in conjunction with a header file and a parse table loader file,
579both of which are produced by processing the formal ASN.1 specification
580with the AsnTool application. The symbolic defines in the
581header file are pointers into the parse table, in which the ASN.1
582specification is represented. To read at the stream reader level, a
583program alternates between calls to AsnReadId and AsnReadVal. AsnReadId
584returns a pointer into the parse table, which can be compared against
585the defines in the AsnTool-generated header. For example, in the
586specification for MEDLINE records, the Medline-entry section has an item
587called "uid", for the unique ID of the record. This is symbolized in
588the header file as MEDLINE_ENTRY_uid. When AsnReadId returns this
589symbol, the program calls AsnReadVal to obtain the uid for that record.
590AsnKillValue is also needed to free any memory allocated by AsnReadVal,
591which occurs when the value is a string and not an integer. The entire
592set of records on the Entrez CD-ROM can be read as a single stream with
593the AsnLib functions.
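
The alternating read-identifier / read-value pattern described above can be
shown with a self-contained toy. This sketch deliberately does not use the
AsnLib API: the toy_ names, the array-backed "stream", and the integer-only
values are all assumptions made for illustration, standing in for AsnReadId,
AsnReadVal, and an AsnTool-generated symbol such as MEDLINE_ENTRY_uid.

```c
#include <assert.h>
#include <stddef.h>

/* Toy identifiers standing in for pointers into a parse table. */
enum toy_id { TOY_END = 0, TOY_MEDLINE_ENTRY_uid, TOY_MEDLINE_ENTRY_em, TOY_OTHER };

struct toy_item { enum toy_id id; long value; };

/* A toy stream: an array of items terminated by TOY_END. */
struct toy_stream { const struct toy_item *items; size_t pos; };

/* Analogue of reading the next identifier: reports which field comes next. */
enum toy_id toy_read_id(const struct toy_stream *s)
{
    assert(s != NULL);
    return s->items[s->pos].id;
}

/* Analogue of reading the value for the identifier just seen.
   Advances the stream to the next item. */
long toy_read_val(struct toy_stream *s)
{
    assert(s != NULL && s->items[s->pos].id != TOY_END);
    return s->items[s->pos++].value;
}

/* Alternate id and value reads, keeping only the uid values.
   Returns how many uids were stored in `out` (at most `max`). */
size_t toy_collect_uids(struct toy_stream *s, long *out, size_t max)
{
    size_t n = 0;
    enum toy_id id;

    while ((id = toy_read_id(s)) != TOY_END) {
        long v = toy_read_val(s);   /* every id is followed by a value read */
        if (id == TOY_MEDLINE_ENTRY_uid && n < max)
            out[n++] = v;
    }
    return n;
}
```

In the real stream reader, string values would also need to be released after
use, which is the role the text above gives to AsnKillValue; the toy avoids
that by using integers only.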
594
595The ASN.1 records may be accessed at a higher level through the object
596loaders, which utilize the stream processing functions to
597load C memory structures with the contents of the ASN.1 objects. For
598each ASN.1 object we specify, we also define an equivalent C memory
599structure. The object loader level of code contains functions to read
600and write each ASN.1 object. These are hierarchical, as are the ASN.1
601specifications. Calling the top level loader, SeqEntryAsnRead, will
602load an entire SeqEntry from an open AsnIo channel, and will return a
603pointer to the loaded memory structure. The read function for an AsnIo
604channel can be swapped to refer to a normal disk file, a network socket,
605or to compressed data, which it automatically decompresses. The object
606loader code can interconvert between the highly-branched memory object
607and a linear ASN.1 message with complete fidelity. The object loaders
608have additional functions, including the ability to explore the
609structure and notify the program when particular data elements are
610encountered. The entire contents of the Entrez CD-ROM can also be
611streamed through the object loaders. However, most calls to the object
612loaders for simply reading a particular record are done via the data
613access functions (see below).
614
615The data access functions allow a program to call the object loaders on
616a sequence or MEDLINE record given the uid of the record.
617This will get the data into memory regardless of whether the data are
618compressed on the Entrez CD-ROM or are obtained through a service over
619the Internet. This means that a detailed understanding of the files and
620formats on the Entrez disc is not needed by application programmers. The
621function to load a sequence record, SeqEntryGet, needs the uid to
622retrieve and a complexity code parameter. A sequence record is in the
623form of a NucProt set. This contains a nucleotide (which may itself be
624composed of segments) and all of the proteins it is known to encode.
625The set of segments is called a SegSet, and the individual sequences are
626called BioSeqs. We have taken the liberty of producing this integrated
627view, but the complexity code parameter allows the record to be easily
628loaded in a simpler, more traditional form, if desired. The accession
629number term list is built to supply the proper uids to support this
630facility. This access library is compatible with Entrez release 1.0 or
631later only.
632
633The sequence utilities and application programmer interface layer
634allows exploration of the loaded memory structures and
635generation of standard literature or sequence reports from those
636objects. For example, a BioSeq can be converted to FASTA or GenBank
637flat file formats and saved to a file, and a MEDLINE record can be saved
638in MEDLARS format, which is suitable for entry into personal
639bibliographic database programs. A sequence port can be opened that
640gives a simple, linear view of a segmented sequence, converting
641alphabets, merging exon segments, and dealing with information on both
642strands of the DNA. This layer also includes some functions to explore
643the NucProt set. The explore functions visit each individual BioSeq in
644the set, calling a callback function for each sequence node so that a
645program can examine feature tables and other information that are
646associated with the NucProt or SegSets or with the individual sequences.
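
The explore pattern described above, visiting every individual sequence in a
nested set and calling a callback for each, can be sketched as a depth-first
walk. The toy_ types and functions below are hypothetical and much simpler
than the toolkit's SeqEntry structures; they only illustrate the callback
mechanism.

```c
#include <assert.h>
#include <stddef.h>

/* Toy entry: either a set (NucProt or SegSet) or a single sequence. */
struct toy_entry {
    int is_set;                   /* nonzero: a set holding child entries */
    const char *label;            /* sequence label when is_set == 0 */
    struct toy_entry *children;   /* first child of a set */
    struct toy_entry *next;       /* next sibling */
};

/* Callback invoked once per individual sequence. */
typedef void (*toy_bioseq_cb)(const struct toy_entry *seq, void *userdata);

/* Depth-first walk over an entry and its siblings.
   Returns the number of sequences visited. */
size_t toy_explore(const struct toy_entry *e, toy_bioseq_cb cb, void *userdata)
{
    size_t n = 0;

    assert(cb != NULL);
    for (; e != NULL; e = e->next) {
        if (e->is_set) {
            n += toy_explore(e->children, cb, userdata);
        } else {
            cb(e, userdata);
            n++;
        }
    }
    return n;
}

/* Example callback: counts the sequences it is shown. */
void toy_count_cb(const struct toy_entry *seq, void *userdata)
{
    (void)seq;
    ++*(size_t *)userdata;
}
```

The userdata pointer lets a program carry its own state (a counter here, or a
list of features in a fuller program) without global variables.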
647
648Vibrant is a multi-platform user interface development library that runs
649on the Macintosh, Microsoft Windows on the PC, or X11 and OSF/Motif on
650UNIX and VAX computers [separate documentation]. It is used to build
651the graphical interface for the Entrez application (whose source code is
652in the browser directory). The philosophy behind Vibrant is that
653everything in the published user interface guidelines (the generic
654behavior of windows, menus, buttons, etc.), as well as positioning and
655sizing of graphical control objects, is taken care of automatically.
656The program provides callback functions that are notified when the user
657has manipulated an object. Vibrant and Entrez code are not supported,
658but are provided on an as-is basis.
659
660The advantage of using AsnLib and the object loaders, as they are
661implemented, is that application program developers merely need to
662recompile their programs with the new (AsnTool-generated) header files
663and load the new parse tables (included with the Entrez software) in
664order to be able to read the new data. This process is straightforward,
665and will not break existing program code. The application is free to
666ignore new fields if it does not choose to take advantage of the new
667kinds of information.
668
669When developing new ASN.1 specifications, as of June 1994 it is possible to
670automatically generate the object loaders and header files for those
671specifications, using the AsnCode utility. For some complex ASN.1
672specifications, however, AsnCode may fail to generate the correct source code.
673
674The documentation is currently being brought up to date. The programs
675in the demo directory are designed to teach the proper use of many of
676the functions discussed above. Many of these programs are not yet
677documented. The simplest is testcore.c, which tests various functions
678in the CoreLib. The most complex is getfeat.c, which takes an accession
679number or locus name, determines the unique seq ID, retrieves the entry
680from the Entrez CD-ROM using the data access library, locates all coding
681region features using the explore functions, and prints the DNA
682sequences of all exons using sequence port functions. If you cannot
683extract and print the doc.tar.Z file, please send an email message with
684your land mailing address and phone number to toolbox@ncbi.nlm.nih.gov,
685and we will mail a copy to you.
686
687The contents of the ncbi directory (the highest level, containing the
688NCBI Software Development Kit source code in several subdirectories) are
689shown below. The readme file contains instructions on copying the
690appropriate make files to be built in the build directory. The makeall
691file copies headers to the include directory and builds four libraries
692(ncbi, ncbiobj, ncbicdr and vibrant), copying them to the lib directory.
693The makedemo file builds the demo programs and the Entrez application:
694
695 api Application Programmer Interface, Sequence Utilities
696 asn ASN.1 specifications for publications and sequences
697 asnlib Source code for AsnLib and asntool
698 asnload AsnLib headers and dynamic parse tables (Mac and PC)
699 asnstat AsnLib headers that use static memory (UNIX and VMS)
700 bin Asntool executable copied here
701 biostruc Source code for Molecular Modelling DataBase functions
702 browser Source code for Entrez application
703 build Empty directory for building tools and libraries
704 cdromlib Access routines for data on the Entrez CD-ROM
705 cn3d Source code for Vibrant-based 3D structure viewer
706 config Configuration files for NCBI software:
707 mac
708 unix
709 vms
710 win
711 corelib Source code for NCBI Core Software Library
712 data Data files used for sequence conversion
713 demo AsnLib and sequence utility demonstration programs
714 desktop Source code for Vibrant-based viewers and editors
715 doc Documentation in Microsoft Word file
716 include Include files required by applications are copied here
717 lib Libraries copied here
718 link Contains several subdirectories with build accessory files:
719 macmet Macintosh Metrowerks/CodeWarrior
720 macmpw Macintosh MPW C
721 mswin Microsoft C and Borland C for Windows
722 make Make files for various systems
723 network Network version of data access
724 apple
725 blast2
726 encrypt
727 entrez
728 netmanag
729 nsclilib
730 object Functions for reading and writing complex objects
731 sequin Source code for Sequin application
732 tools Source code for alignment and other contributed utilities
733 readme File that contains important building instructions
734 vibrant Source code for Vibrant portable interface package
735
736The platforms that are supported (as indicated by the suffix on the
737relevant ncbilcl.h file) are shown below. Those marked with an asterisk
738(*) are available as-is:
739
740 370* IBM 370
741 acc SUN acc compiler
742 alf DEC Alpha under OSF/1
743 aov DEC Alpha under AXP/OpenVMS
744 aux* Macintosh A/UX
745 bor Borland for DOS
746 bwn Borland for Microsoft Windows
747 ccr CenterLine CodeCenter
748 cpp SUN C++
749 cra* Cray
750 cvx* Convex
751 gcc Gnu gcc (under SunOS, not Solaris)
752 hp * Hewlett Packard
753 lna* Linux on DEC Alpha
754 lnx Linux (RedHat Linux release 5.2 with kernel 2.0.36)
755 met Macintosh Metrowerks compiler
756 mpw Macintosh Programmer's Workshop
757 msc Microsoft C for DOS
758 msw Microsoft for Windows
759 nxt* NeXT
760 r6k* IBM RS 6000
761 scr CodeCenter under Sun Solaris
762 sgi Silicon Graphics
763 sin Sun Solaris on Intel processors
764 sol Sun Solaris (for cc and gcc)
765 thc THINK C on Macintosh
766 ult DEC ULTRIX
767 vms DEC VAX/VMS
768
769Questions or comments can be directed to toolbox@ncbi.nlm.nih.gov.
770
771ANSI C:
772
773 This software requires an ANSI C compiler. This will be no problem at
774all except to people on Sun machines, where the bundled C compiler, cc, is
775non-ansi. However, you can use the Sun unbundled compiler, acc, or the Gnu
776compiler, gcc (which is free) and that works just fine. If you have written
777applications on the Sun with non-ANSI functions, the ANSI compilers will
778complain. See the notes below if this is a problem.
779
780
781 Installation
782
783To build the NCBI toolkit you need to look for platform-dependent instructions:
784For UNIX:
785 look at the file make/readme.unx
786For Mac:
787 look at the file make/readme.mac
788For Microsoft Windows95/98/NT:
789 look at the file make/readme.dos
790
791There is some information which may be useful for NCBI tookit building
792in the file doc/FAQ.txt
793
794ALL -
795 change to the directory above ncbi subdirectory
796
797Unix
798 tested on Sun Sparc (Solaris 2.6, Sunos 4.1.3),
799 Silicon Graphics IRIX 5.* and 6.*, DEC Alpha with OSF/1 V5.1,
800 Linux (Red Hat Linux release 6.2 with kernel 2.2.16) on Intel,
801 Sun Solaris for Intel (Solaris 2.7).
802
803 Run the script ncbi/make/makedis.csh keeping its output in a
804 separate file:
805 for sh or bash:
806 ncbi/make/makedis.csh 2>&1 | tee out.makedis.csh
807 for csh or tcsh:
808 ncbi/make/makedis.csh |& tee out.makedis.csh
809 If that script gives you an error like this:
810 Your platform is not supported.
811 To port ncbi toolkit to your platform consult
812 the files platform/*.ncbi.mk
813 then you should check the script ncbi/make/makedis.csh and
814 add proper platform-dependent ncbi.mk file in ncbi/platform
815 directory.
816
817 Other UNIX: AIX, ULTRIX, NeXt, Sun acc,
818 Follows models above. Read header in makeall.unx and makedemo.unx
819 for details.
820
821 for all UNIX, edit .ncbirc as described in section "CONFIGURATION OR
822 SETTINGS FILES".
823 optionally edit .login to "setenv NCBI [path to .ncbirc file]"
824
825MS-DOS
826 look at the file make/readme.dos
827
828Mac
829 tested on CodeWarrior IDE 2.1, MacOS 8.0
830 All - copy config:mac:ncbi.cnf to your System Folder, or to the
831 System Folder:Preferences subfolder
832 edit the "ASNLOAD" line in "ncbi.cnf" to point to the
833 ncbi:asnload directory in this release
834 edit the "DATA" line to point to the ncbi/data directory
835 CodeWarrior - raise Preferred Size of Script Editor from 700 to 3000,
836 and raise Preferred Size of CodeWarrior IDE 2.1 by
837 2000 (e.g., from 8206 to 10206), using Get Info from
838 the Finder.
839 to compile for MC680x0 platform (default is PowerPC),
840 change property MASTER from "PPC" to "68K".
841 run copyhdrs.met
842 run makeall.met
843 run makenet.met
844 run makedemo.met
845 Think C - no longer supported
846 MPW C - no longer supported
847
848Changes to VMS make file naming conventions:
849
850 The old .dcl suffix (last character is a lower case L) was changed
851to .dc1 (last character is the numeral 1) to allow for different make files
852for DecWindows 1.1 and DecWindows 1.2. Several new .dc2 files were
853contributed by David Mathog of CalTech. A synopsis of his additional
854instructions:
855
856 VAX C   DecWindows 1.1   Use .dc1 files.
857 DEC C   DecWindows 1.1   Use .dc1 files,
858                          but change cc to cc/standard=vaxc
859 VAX C   DecWindows 1.2   This combination has not been tested.
860 DEC C   DecWindows 1.2   Use .dc2 files.
861
862VMS (without Vibrant) on VAX
863 $set def [ncbi.build]
864 $copy [-.make]*.dc1 *.com
865 $@makeall
866
867 check ncbi.cfg as described in section "CONFIGURATION OR SETTINGS FILES".
868 edit LOGIN.COM to "define NCBI [path to ncbi.cfg file]"
869
870 To make demos:
871 $@makedemo
872
873VMS (with Vibrant) on VAX
874 $set def [ncbi.build]
875 $copy [-.make]*.dc1 *.com
876 $@viball
877
878 check ncbi.cfg as described in section "CONFIGURATION OR SETTINGS FILES".
879 edit LOGIN.COM to "define NCBI [path to ncbi.cfg file]"
880
881 To make demos:
882 $@vibdemo
883
884 Testing
885
886VMS only: look in rundemo.dc1 in [make] to see how to give command
887 line arguments. Not all demo programs are shown. Run at least testcore.
888
889All else:
890
891 In the build directory there should be a program called testcore. Type "testcore -" and
892it should show you some default arguments. Type "testcore" and it will
893run through a variety of functions in CoreLib, prompting you for responses
894along the way. It should run without a crash or error report. If you made
895Vibrant versions all demos will have startup dialog boxes. If not, they
896take command line arguments.
897
898 If testcore runs, read the documentation for CoreLib and for AsnLib.
899In the AsnLib documentation are instructions for running asntool itself and
900for running a few of the demo programs. There are a large number of demo
901programs now (including Entrez itself, if you made the Vibrant versions).
902
903
904
905CONFIGURATION OR SETTINGS FILES:
906
907 One of the fundamental problems in writing portable software concerns
908configuration issues. Each individual user's computer will have its own
909particular hardware and software environment, and each machine will have
910its disk file hierarchy set up in a unique manner. A program that needs
911accessory information, such as help files, parse tables, or format
912converters, must be given a means of finding the data regardless of where
913the user has placed the files. The difficulty is compounded by the different
914conventions for naming files and specifying paths on each class of machine.
915For example, the name of a CD-ROM on the Macintosh is fixed, determined by
916information on the CD itself, whereas on the PC it is addressed by a drive
917letter, which can be assigned by the user, but which cannot be reconciled
918with the name the Macintosh sees.
919
920 An associated problem is that many programs will want to allow the user
921to make persistent changes to parameters. These parameters typically involve
922numbers or font specifications, but may also include paths to data files.
923Some platforms supply such configuration information in preferences files,
924others in environment variables. Manipulating these settings is platform
925dependent, as is the format in which the preference is specified.
926
927 The NCBI Software Toolkit core library addresses these problems by
928providing configuration or settings files. These are modeled after the .INI
929files used by Microsoft Windows. Settings files are plain ASCII text files
930that may be edited by the user or modified by the program. They are divided
931into sections, each of which is headed by the section name enclosed in square
932brackets. Below each section heading is a series of key=value strings, somewhat
933analogous to the environment variables that are used on many platforms.
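
To make the layout concrete, here is a minimal sketch of a lookup over such a
file held in memory. It is an illustration only, not the core library's
implementation: demo_get_setting is a hypothetical helper, and it assumes
exact-case matching and no whitespace around the '=' sign.

```c
#include <string.h>

/* Look up `key` in [section] of a settings text held in memory.
   Copies the value into buf and returns 1, or returns 0 if not found.
   Hypothetical helper for illustration; not a toolkit function. */
static int demo_get_setting(const char *text, const char *section,
                            const char *key, char *buf, size_t buflen)
{
    char line[256];
    int in_section = 0;
    size_t keylen = strlen(key);
    size_t seclen = strlen(section);

    while (*text != '\0') {
        /* Copy one line (without its newline) into line[]. */
        size_t n = strcspn(text, "\n");
        size_t len = n < sizeof line - 1 ? n : sizeof line - 1;
        memcpy(line, text, len);
        line[len] = '\0';
        text += n;
        if (*text == '\n')
            text++;
        /* Tolerate DOS line endings. */
        if (len > 0 && line[len - 1] == '\r')
            line[--len] = '\0';

        if (line[0] == '[') {
            /* Section heading: compare the name between the brackets. */
            char *end = strchr(line, ']');
            in_section = end != NULL &&
                         (size_t)(end - line - 1) == seclen &&
                         strncmp(line + 1, section, seclen) == 0;
        } else if (in_section && strncmp(line, key, keylen) == 0 &&
                   line[keylen] == '=') {
            if (buflen == 0)
                return 0;
            strncpy(buf, line + keylen + 1, buflen - 1);
            buf[buflen - 1] = '\0';
            return 1;
        }
    }
    return 0;
}
```

The same key may appear in several sections; only the entry under the
requested section heading is returned.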
934
935 The ncbi configuration file supplies general purpose configuration
936information on paths for commonly used data files. The typical file set up for
937the Entrez application running on the PC under Microsoft Windows is shown below:
938
939[NCBI]
940ROOT=D:
941ASNLOAD=C:\ENTREZ\ASNLOAD\
942DATA=C:\ENTREZ\DATA
943
944 The only section is entitled NCBI. The ROOT entry refers to the path to
945the Entrez CD-ROM. In this example, the user has configured the machine to
946use drive letter D. (On the Macintosh, the name of the disc is SEQDATA, which
947cannot be changed by the user.) The ASNLOAD specifies the path to the ASN.1
948parse tables. These files are required by the AsnLib functions, and all
949higher-level procedures that call them, including the Object Loader, Sequence
950Utility, and Data Access functions. Files pointed to by the DATA entry contain
951information necessary to convert biomolecule sequence data into different
952alphabets (e.g., unpacking the 2-bit nucleotide code stored on the Entrez CD
953into standard IUPAC letters).
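
As an illustration of the kind of conversion meant here, the sketch below
unpacks a 2-bit nucleotide code into letters. It assumes four bases per byte
with the first base in the two high-order bits and the codes 0=A, 1=C, 2=G,
3=T; demo_unpack_2bit is a hypothetical name, and the toolkit itself drives
such conversions from the tables in the DATA directory rather than from a
hard-coded array.

```c
#include <stddef.h>

/* Unpack `count` bases from 2-bit packed data into letters.
   Assumes four bases per byte, first base in the two high-order bits,
   with codes 0=A, 1=C, 2=G, 3=T. `out` must hold count + 1 chars. */
static void demo_unpack_2bit(const unsigned char *packed, size_t count,
                             char *out)
{
    static const char letters[4] = { 'A', 'C', 'G', 'T' };
    size_t i;

    for (i = 0; i < count; i++) {
        /* Shift is 6, 4, 2, 0 for the four bases within a byte. */
        unsigned shift = 6u - 2u * (unsigned)(i % 4);
        out[i] = letters[(packed[i / 4] >> shift) & 0x3u];
    }
    out[count] = '\0';
}
```

Because the last byte may be only partly used, the caller supplies the number
of bases rather than the number of bytes.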
954
955 Although the contents of a configuration file are similar regardless of
956platform, the name of the file and its location are platform dependent. If the
957base name of the configuration file is xxx, then the actual file name is shown
958below for each platform:
959
960Macintosh xxx.cnf
961Microsoft Windows xxx.INI
962MS-DOS (without Windows) xxx.CFG
963UNIX .xxxrc
964VMS xxx.cfg
965
966 Samples of such files are in subdirectories of \config. The UNIX version
967does not have the leading '.' in filename so you can see it.
968
969 The location in which these files must reside is also platform dependent,
970and the functions that manipulate the contents may look in several places to
971find these files.
972
973On the Macintosh, the function first looks in the System Folder, then in the
974Preferences folder within the System Folder. (See the Mac OS X addendum in the
975next paragraph). Under Microsoft Windows, the file must be in the Windows
976directory, along with all of the other .INI files. Under DOS without Windows,
977the function first looks in the current working directory, then in the directory
978whose path is specified in the NCBI environment variable. Under UNIX and VMS,
979the current working directory is first checked, then the user's home directory,
980and finally the directory specified by the NCBI environment variable. (Under
981UNIX, when it uses the environment variable, it will check for configuration
982files first without and then with the initial dot.) On the multi-user
983platforms (UNIX and VMS), the use of the NCBI environment variable allows a
984common settings file to be used as the default by multiple users. If such a
985settings file is changed under program control, it is copied over into the
986user's home directory, and the new copy is modified. The order of searching
987for settings files ensures that this new copy is used in all subsequent
988operations.
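
The UNIX search order can be sketched as a function that lists the candidate
paths in the order they would be tried. This is a sketch under stated
assumptions, not the library's code: demo_unix_candidates and DEMO_MAX_PATH
are hypothetical names, and the dotless name is assumed to be "xxxrc".

```c
#include <stdio.h>

#define DEMO_MAX_PATH 512

/* Fill paths[] with the UNIX candidates for base name `base`, in the
   order described above. `home` or `ncbi_dir` may be NULL when unset.
   Returns the number of candidates written (at most 4). */
static int demo_unix_candidates(const char *base, const char *home,
                                const char *ncbi_dir,
                                char paths[][DEMO_MAX_PATH])
{
    int n = 0;

    /* 1. current working directory */
    snprintf(paths[n++], DEMO_MAX_PATH, "./.%src", base);
    /* 2. user's home directory */
    if (home != NULL)
        snprintf(paths[n++], DEMO_MAX_PATH, "%s/.%src", home, base);
    /* 3. directory named by NCBI: first without, then with the dot
          (assumption: the dotless file name is "<base>rc") */
    if (ncbi_dir != NULL) {
        snprintf(paths[n++], DEMO_MAX_PATH, "%s/%src", ncbi_dir, base);
        snprintf(paths[n++], DEMO_MAX_PATH, "%s/.%src", ncbi_dir, base);
    }
    return n;
}
```

A caller would typically pass getenv("HOME") and getenv("NCBI") and open the
first candidate that exists. Because the home directory precedes the shared
NCBI directory, a per-user copy takes priority over the common default.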
989
990 On Mac OS X, it first looks for xxx.cnf in username/Library/Preferences,
991then in package/Contents/Resources, where username is the user's home directory
992and package is the application package. If it does not find the configuration
993file, it then switches to UNIX style, looking for .xxxrc in the home directory
994and then in the current directory. This way Mac OS X applications retain the
995traditional Mac behavior but can also use UNIX style configuration files.
996
997
998contents of ASNLOAD are in ncbi/asnload
999contents of DATA are in ncbi/data
1000
1001Automatic Generation of code to read and write new ASN.1 messages.
1002(Previously, ASNCODE USAGE)
1003
1004'asntool' can now generate code for use as ASN.1 readers and writers.
1005This functionality used to be in the program called 'asncode'. There
1006is thus no longer any need for the *.l* files. An example of how
1007to generate this code follows:
1008
1009
1010 asntool -m YOURSPEC.asn -G -B genYOURSPEC
1011
1012Both genYOURSPEC.h and genYOURSPEC.c will be generated.
1013
1014Within ASN.1 definitions, types can be EXPORTed and IMPORTed.
1015If YOURSPEC.asn imports definitions from otherspec.asn then it has
1016to be added to the -m parameter as below. Note that code is only
1017generated for the first file.
1018
1019 asntool -m YOURSPEC.asn,otherspec.asn -G -B genYOURSPEC
1020                        ^
1021
1022Notice the lack of a blank at the caret (^), above. This is important.
1023
1024
1025MAJOR CHANGES FROM DOCUMENTATION:
1026
1027 AsnNode structures have proved to be generally useful and have been moved from AsnLib
1028to ncbimisc. In addition, some elements of structs used in the object loaders
1029were called "class" to match the ASN.1 names. Class is a C++ reserved word,
1030so all instances of "class" have been changed to "_class".
1031
1032 To conform to our naming conventions, we have changed the names appropriately:
1033
1034AsnValue = DataVal
1035AsnNode = ValNode
1036class = _class
1037
1038 A global search and replace of your code with these strings (not restricted
1039to words... we want to change AsnNodePtr = ValNodePtr as well) should fix
1040any problems. Field names within structures have not been changed. If your
1041code uses only the object loaders, you may not find these strings in your
1042code at all.
1043
1044DATA ACCESS LIBRARIES
1045
1046 cdromlib contains data access routines compatible with release 1.0-6.0
1047of the Entrez CDROM. The documentation for these functions is out of
1048date. The routines in cdromlib have been split into entrez, sequence, and
1049medline access functions. The interface you should normally program to is
1050defined in accentr.[ch]. The form of these calls has been changed to make
1051them compatible with the NCBI network server, a client/server version of
1052data access. A program written to use these calls can access the cdrom
1053data, the network data, a combination, or that plus a local database by just
1054fiddling with defines. The form of the api for these functions has also
1055been changed to hide the details of storage and caching more so that the
1056different optimizations done to support cdrom and network access are
1057transparent to the application programmer. The end user tool called
1058"Entrez" now uses these libraries as its only means of data access (i.e.,
1059you can write an application of your own with any or all of Entrez's
1060functionality using just these routines).
1061
1062NETWORK LIBRARIES
1063
1064 The toolbox now includes NCBI "Network Services". This includes
1065everything which you need to build your own "Network Entrez" client software.
1066The network libraries include a generic network services library (nsclilib),
1067which is used to contact the network services dispatcher and connect to a
1068desired server. Note that some development platforms require that you obtain
1069a few source modules from external vendors. Look at the README files
1070contained in the network directory (network/*/README) for more details.
1071
1072
1073DOCUMENTATION
1074
1075 We are rewriting the documentation to conform with all the new features
1076contained in this software. We will add it to the package as soon as possible.
1077
1078DEMO PROGRAMS
1079
1080 As in the tools, there are a number of undocumented programs in the demo
1081directory as well, that use a number of the utility functions in api. There
1082is also a demo program called "getseq" in the cdromlib directory which
1083retrieves a sequence from the cdrom given any valid sequence id. These will
1084be described in more detail in the next set of documentation. Briefly:
1085
1086asn2ff.c converts ASN.1 to GenBank flatfile
1087asn2rpt.c converts ASN.1 to human readable report
1088dosimple.c converts ASN.1 to a "simple sequence"
1089getseq.c gets sequence from Entrez Cdrom using data access library,
1090 writes to disk
1091getfeat.c ditto, but writes sequence of any CdRegion features to
1092 "test.out"
1093getmesh.c documented
1094getpub.c documented
1095indexpub.c documented
1096seqtest.c reads ASN.1 sequence, converts to iupac, reports segmented
1097 sequences, outputs fasta format to seqtest.out
1098testcore.c documented
1099testobj.c tests Medline object loader, demonstrates error checking using
1100 NULL asnio stream.
1101entrez If Vibrant is installed, the full Entrez program is made.
1102asndhuff Demonstrates streaming ASN.1 data from the huffman compressed
1103 Entrez CDROM (only works on release 1.0 or later).
1104entrcmd Standalone non-interactive tool for accessing Entrez data.
1105 Entrcmd is the search engine used for NCBI's Entrez WWW server.
1106asncode Tool for generating object loader source code given a .l
1107 file which is the output of AsnTool.
1108cdscan scans entrez cdrom, makes GenBank, GenPept, or FASTA format
1109 output. Also has a slot for a replaceable CustomRoutine
1110 supplied by you. Has two examples of such routines.
1111
1112CALLBACK CONVENTIONS
1113
1114 The CoreLib, AsnLib, and Object Loader routines have been converted to use
1115the LIBCALL and LIBCALLBACK symbols (FAR PASCAL) on the PC for Windows. This will
1116allow us to build dynamic link libraries (DLLs) so that the code can be accessed
1117from languages other than C. Callback functions you write that are of types
1118AsnOptFreeFunc, AsnExpOptFunc, IoFuncType, AsnReadFunc, AsnWriteFunc, and
1119SeqEntryFunc, should be declared using the LIBCALLBACK macro. For example, a
1120callback used as an AsnOptFreeFunc should be declared as follows:
1121
1122static Pointer LIBCALLBACK MyOptFreeFunc (Pointer);
1123
1124The SeqEntryFunc callback used by SeqEntryExplore has not yet been modified to
1125use the LIBCALLBACK type. This will be added in the near future.
1126
1127
README.htm
1<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
2<html>
3<head>
4</head>
5<body text="#000000" bgcolor="#FFFFFF" link="#0000EE" vlink="#551A8B" alink="#FF0000">
6
7<center><b><font color="#3366FF"><font size=+2>NCBI SOFTWARE DEVELOPMENT
8TOOLKIT</font></font></b>
9<br><b><font color="#3366FF"><font size=+2>National Center for Biotechnology
10Information</font></font></b>
11<br><b><font color="#3366FF"><font size=+2>Bldg 38A, NIH</font></font></b>
12<br><b><font color="#3366FF"><font size=+2>8600 Rockville Pike</font></font></b>
13<br><b><font color="#3366FF"><font size=+2>Bethesda, MD 20894</font></font></b></center>
14
15<p>The NCBI Software Development Toolkit was developed for the production
16and distribution of GenBank, Entrez, BLAST, and related services by NCBI.
17We make it freely available to the public without restriction to facilitate
18the use of NCBI by the scientific community. However, please understand
19that while we feel we have done a high quality job, this is not commercial
20software. The documentation lags considerably behind the software and we
21must make any changes required by our data production needs. Nonetheless,
22many people have found it a useful and stable basis for a number of tools
23and applications.
24<p>The toolkit is available by anonymous ftp from <a href="ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/">ftp.ncbi.nih.gov</a>
25<blockquote><tt>cd toolbox</tt>
26<br><tt>cd ncbi_tools</tt>
27<br><tt>bin</tt>
28<br><tt>get ncbi.tar.Z (compressed UNIX tar file)</tt>
29<br><tt>quit</tt></blockquote>
30
31<p><br>In this same directory are also ncbiz.exe (DOS self extracting archive)
32and ncbi.hqx (Mac self extracting archive). All three files contain the
33same source code and will make the toolkit for all platforms.
34<p>Please feel free to email questions/suggestions to: <a href="mailto:toolbox@ncbi.nlm.nih.gov">toolbox@ncbi.nlm.nih.gov</a>
35<p>If you would like hardcopy of the current documentation, send your mailing
36address with your request to the email address above.
37<p>If you are considering a serious development project using this toolkit,
38please contact us. We are happy to discuss compatible strategies and inform
39you of our longer term plans. There is no limitation on the use of this
40code or in contacting us about its use for commercial, academic, or government
41groups.
42<br>
43<hr WIDTH="100%">
44<center><b><font size=+1>Version 6.1</font></b>
45<br><i> the date of release may be obtained from the file <b>ncbi/VERSION</b></i></center>
46
47<hr WIDTH="100%">
48<center><b>Summary</b></center>
49
50<p>The procedure of building the toolkit on Unix was slightly changed.
51Now there is no need to download any binary NCBI product for your platform
52to obtain the platform-specific ncbi.mk file.
53<p>To build the NCBI toolkit you need to look for platform-dependent instructions:
54<br>For UNIX (including Linux and Mac OS X):
55<br> look at the file <b>make/readme.unx</b>
56<br>For alternative Mac instructions (using CodeWarrior):
57<br> look at the file <b>make/readme.mac</b>
58<br>For Microsoft Windows95/98/NT:
59<br> look at the file <b>make/readme.dos</b>
60<br>There is some information which may be useful for NCBI toolkit building
61in the file <b>doc/FAQ.txt</b>
62<p>Documentation relevant to BLAST may be found in the <b>doc/blast</b> subdirectory.
63<p>The file <b>doc/sequin.htm</b> describes SEQUIN and its configuration.
64<p>If you have problems configuring Entrez with a firewall, look at the
65file <b>doc/firewall.txt</b>
66<p>This file has a section called <b>CONFIGURATION OR SETTINGS FILES,</b>
67which explains in detail how our configuration system works. The ncbi config
68file (<b>.ncbirc </b>on UNIX, <b>ncbi.ini </b>on PC/Windows, and <b>ncbi.cnf
69</b>on
70Macintosh) is needed in order to find data files, such as <b>gc.val
71</b>(the
72genetic code table), provided in the toolkit or with programs like Sequin.
73(The <b>asnload</b> files containing dynamic versions of the ASN.1 parse
74tables are no longer needed, since all platforms can now have large static
75data.)
76<p>It has recently become possible to eliminate the need for the ncbi config
77file by calling <b>UseLocalAsnloadDataAndErrMsg ()</b> at the beginning
78of your program. This looks for the data directory in the same directory
79as the running program. If it doesn't find it, it looks up one level,
80in case you are compiling programs in the build directory of the toolkit.
81If it finds the data directory in either of these places, it transiently
82sets the location, so code that loads these files is given the correct
83path.
84<p>An even more recent change is that copies of several of our data files
85(gc, seqcode, and featdef) are now built into the source code, so if the
86data directory is not found, programs that require only these can still
87run.
88<p>One final improvement is that access to our network services is now
89much simpler than before, so if you are not behind a firewall and have
90domain name server (DNS) available you can connect to our network without
91needing any configuration information in the ncbi config file. Operation
92behind a firewall, or with a proxy, requires very little in the ncbi config
93file, and this is easily created by asking Sequin to configure for network
94access.
95<br>
96<hr WIDTH="100%">
97<center><b>Notes from Previous Releases</b>
98<br><b><font size=+1>Version 6.0</font></b>
99<br><i>the date of release may be obtained from the file <b>ncbi/VERSION</b></i></center>
100
101<hr WIDTH="100%">
102<br>This release includes source code for the new (2.0) version of BLAST.
103Also included are a small number of incremental changes in the ASN.1 specification.
104<p>BLAST 2.0 - BLAST 2.0 can produce gapped alignments and is capable of
105position-specific-iterated BLASTp (PSI-BLAST). Compared to the 1.4
106release of BLAST, there are also significant performance enhancements as
107well as extensive changes to the text report and the format of the databases.
108BLAST 2.0 uses threads for multi-processing, using the NCBI threads library.
109Three BLAST programs may be compiled in the demo directory. They are:
110<br>
111<ul>
112<li>
113<b>formatdb</b>: formats FASTA files as BLAST databases for BLAST 2.0.</li>
114
115<li>
116<b>blastall</b>: perform all five flavors of blast comparison.</li>
117
118<li>
119<b>blastn</b> and <b>blastp</b> offer fully gapped alignments.</li>
120
121<li>
122<b>blastx</b> and <b>tblastn</b> have 'in-frame' gapped alignments and
123use sum statistics to link alignments from different frames.</li>
124
125<li>
126<b>tblastx</b> provides only ungapped alignments.</li>
127
128<li>
129<b>blastpgp</b>: performs gapped blastp searches and can be used to perform
130iterative searches in psi-blast mode.</li>
131</ul>
132Additional information may be obtained from the README in the BLAST
133<br>directory of the FTP site and from the NCBI BLAST pages.
134<p><b>ASN.1 Spec Changes for 1997</b>
135<p><tt>biblio.asn</tt>
136<blockquote><tt>Cit-pat - some fields made optional to allow patent applications
137to be legal</tt>
138<blockquote><tt>Cit-pat.number OPTIONAL</tt>
139<br><tt>Cit-pat.date-issue OPTIONAL</tt></blockquote>
140<tt> -- Patent number and date-issue were made optional in 1997 to</tt>
141<br><tt> -- support patent applications being issued from the
142USPTO</tt>
143<br><tt> -- Semantically a Cit-pat must have either a patent
144number or</tt>
145<br><tt> -- an application number (or both) to be valid</tt>
146<br> </blockquote>
147<tt>medline.asn</tt>
148<blockquote><tt>added ML-field to support other MEDLINE line types</tt></blockquote>
149
150<p><br><tt>Medline-entry ::= SEQUENCE {</tt>
151<br><tt> uid INTEGER OPTIONAL , -- MEDLINE UID, sometimes
152not yet available if from PubMed</tt>
153<br><tt> em Date ,
154-- Entry Month</tt>
155<br><tt> ... (not shown)</tt>
156<br><tt> pmid PubMedId OPTIONAL , -- MEDLINE records may include
157the PubMedId</tt>
158<br><tt> pub-type SET OF VisibleString OPTIONAL, -- may show
159publication types (review, etc)</tt>
160<br><tt> mlfield SET OF Medline-field OPTIONAL } -- additional
161Medline field types</tt>
162<p><tt>Medline-field ::= SEQUENCE {</tt>
163<br><tt> type INTEGER { -- Keyed type</tt>
164<br><tt> other (0) ,
165-- look in line code</tt>
166<br><tt> comment (1) , -- comment
167line</tt>
168<br><tt> erratum (2) } , -- retracted, corrected,
169etc</tt>
170<br><tt> str VisibleString , -- the text</tt>
171<br><tt> ids SEQUENCE OF DocRef OPTIONAL } -- pointers relevant
172to this text</tt>
173<p><tt>DocRef ::= SEQUENCE { -- reference to a document</tt>
174<br><tt> type INTEGER {</tt>
175<br><tt> medline (1) ,</tt>
176<br><tt> pubmed (2) ,</tt>
177<br><tt> ncbigi (3) } ,</tt>
178<br><tt> uid INTEGER }</tt>
179<br>
180<p><tt>seq.asn</tt>
181<blockquote><tt>MolInfo.tech - added names for HTG classes already implemented</tt>
182<br><tt>Annotdesc.region - added seqloc. If present, all annots in this
183SeqAnnot are within this region. Optimization on big seqs.</tt></blockquote>
184
185<p><br><tt>seqfeat.asn</tt>
186<blockquote><tt>added OrgMod.specimen-voucher - new organism qualifier</tt>
187<br><tt>added OrgMod.old-name - used internally at NCBI</tt>
188<br><tt>added BioSource.is-focus - for distinguishing biological focus
189of multiple source features.</tt>
190<br><tt>added Seq-feat.pseudo so any feature can be flagged explicitly
191as belonging to a pseudogene</tt>
192<br><tt>added Seq-feat.except-text for an explanation of the exception
193when Seq-feat.except is TRUE. Currently this text is in Seq-feat.comment
194in backbone records and GBQuals in some other genbank records.</tt>
195<br> </blockquote>
196
197<p><br>
198<hr WIDTH="100%">
199<center><b>Notes from Previous Releases</b>
200<br><b><font size=+1>Version 5.0</font></b>
201<br>
202<hr WIDTH="100%"><b>Summary</b></center>
203
204<p>This release includes a small number of incremental changes in the ASN.1
205specification. Most significant is the addition of the PubMedID, a bibliographic
206citation identifier similar to a MEDLINE UID. PubMed is a new citation
207database being developed at NCBI which is a superset of MEDLINE. It will
208be an avenue by which publishers can deposit electronic versions of their
209citations and abstracts to allow them timely linking to network entrez
210from the publishers' on-line services. PubMed will route these citations
211to MEDLINE and they will appear in MEDLINE (and Entrez) after the usual
212MEDLINE indexing. However, for some period of time, such articles will
213have only a PubMedID. We would like to switch Entrez over to supporting
214PubMedIDs as early as possible. WE STRONGLY ENCOURAGE DEVELOPERS TO RECOMPILE
215AND RELINK WITH THIS VERSION OF THE TOOLKIT AS SOON AS POSSIBLE. The changes
216in this specification should not cause problems with existing software,
217so a simple compile and link should be enough to make you compatible. Details
218of ASN.1 specification changes are listed below.
219<p>There has been considerable development of the toolkit in other aspects
as well, many of which are embodied in Sequin, the new NCBI direct submission
tool, which is also included in the toolkit. In the interest of getting
the PubMed changes into the specification and into developers' hands promptly,
223we have not included much on that aspect of this toolkit at this time.
224<p>
225<hr WIDTH="100%">
226<center><b> Changes in the 1996 NCBI ASN.1 (version 5.0) specification</b></center>
227
228<hr WIDTH="100%">
229<br>Once again, there are very few changes to the NCBI ASN.1 specification
230this year. The biggest change is the addition of the PubMed ID to support
231the new NCBI PubMed database. There are also small additions to the
232medline and organism specifications, detailed below. As usual, these
233changes are also backward compatible with old data. However, you
should recompile and relink your applications as soon as possible, since
235the old applications will not be compatible with the new datatypes.
236<p>1) PubMed - NCBI is building a new citation database that is a superset
237of MEDLINE and which will be linked to online journals from publishers.
238The bibliographic components of the specification have had support for
239PubMed IDs added. These include biblio.asn (objbibli.[ch]), pub.asn
240(objpub.[ch]), medline.asn (objmedli.[ch]).
241<p>2) pub-type - MEDLINE includes strings indicating the type of a publication.
242The medline definition has had the attribute pub-type added to support
243these strings.
244<p>From the 1996 MeSH, here's the list.
245<br>
246<blockquote><tt>Abstract</tt>
247<br><tt>Bibliography</tt>
248<br><tt>Classical Article</tt>
249<br><tt>Clinical Conference</tt>
250<br><tt>Clinical Trial</tt>
251<br><tt>Clinical Trial, Phase I</tt>
252<br><tt>Clinical Trial, Phase II</tt>
253<br><tt>Clinical Trial, Phase III</tt>
254<br><tt>Clinical Trial, Phase IV</tt>
255<br><tt>Comment</tt>
256<br><tt>Consensus Development Conference</tt>
257<br><tt>Consensus Development Conference, NIH</tt>
258<br><tt>Controlled Clinical Trial</tt>
259<br><tt>Corrected and Republished Article</tt>
260<br><tt>Current Biog-Obit</tt>
261<br><tt>Dictionary</tt>
262<br><tt>Directory</tt>
263<br><tt>Duplicate Publication</tt>
264<br><tt>Editorial</tt>
265<br><tt>Festschrift</tt>
266<br><tt>Guideline</tt>
267<br><tt>Historical Article</tt>
268<br><tt>Historical Biography</tt>
269<br><tt>Interview</tt>
270<br><tt>Journal Article</tt>
271<br><tt>Legal Brief</tt>
272<br><tt>Letter</tt>
273<br><tt>Meeting Report</tt>
274<br><tt>Meta-Analysis</tt>
275<br><tt>Monograph</tt>
276<br><tt>Multicenter Study</tt>
277<br><tt>News</tt>
278<br><tt>Newspaper Article</tt>
279<br><tt>Overall</tt>
280<br><tt>Periodical Index</tt>
281<br><tt>Practice Guideline</tt>
282<br><tt>Published Erratum</tt>
283<br><tt>Randomized Controlled Trial</tt>
284<br><tt>Retracted Publication</tt>
285<br><tt>Retraction of Publication</tt>
286<br><tt>Review</tt>
287<br><tt>Review Literature</tt>
288<br><tt>Review of Reported Cases</tt>
289<br><tt>Review, Academic</tt>
290<br><tt>Review, Multicase</tt>
291<br><tt>Review, Tutorial</tt>
292<br><tt>Scientific Integrity Review</tt>
293<br><tt>Technical Report</tt>
294<br><tt>Twin Study</tt></blockquote>
295
296<p><br>3) virion - the attribute virion has been added to BioSource.genome.
297It just complements proviral which was already there. This will map
298to a /virion qualifier in the new GenBank feature table definition.
299<p>4) division - OrgName.div now (optionally) can contain the GenBank division
code (e.g., PRI).
301<p>5) signal-peptide, transit-peptide - were added to Prot-ref, to support
302annotation of protein features on the protein sequence in a way that could
303be mapped to a GenBank feature table.
304<p>That's all. Relevant sections of the asn.1 specification are shown below.
305<br>
306<hr WIDTH="100%">
307<br><tt>biblio.asn</tt>
308<p><tt>PubMedId ::= INTEGER -- Id from the PubMed database at NCBI</tt>
309<p><tt>and..</tt>
310<br>
311<p><tt>Cit-gen ::= SEQUENCE {
312-- NOT from ANSI, this is a catchall</tt>
313<br><tt> cit VisibleString OPTIONAL , -- anything, not parsable</tt>
314<br><tt> authors Auth-list OPTIONAL ,</tt>
315<br><tt> muid INTEGER OPTIONAL , --
316medline uid</tt>
317<br><tt> journal Title OPTIONAL ,</tt>
318<br><tt> volume VisibleString OPTIONAL ,</tt>
319<br><tt> issue VisibleString OPTIONAL ,</tt>
320<br><tt> pages VisibleString OPTIONAL ,</tt>
321<br><tt> date Date OPTIONAL ,</tt>
322<br><tt> serial-number INTEGER OPTIONAL , -- for GenBank style
323references</tt>
324<br><tt> title VisibleString OPTIONAL , -- eg.
325cit="unpublished",title="title"</tt>
326<br><tt> pmid PubMedId OPTIONAL }
327-- PubMed Id</tt>
328<br> <tt></tt>
329<p><tt>pub.asn</tt>
330<p><tt>Pub ::= CHOICE {</tt>
331<br><tt> gen Cit-gen ,
332-- general or generic unparsed</tt>
333<br><tt> sub Cit-sub ,
334-- submission</tt>
335<br><tt> medline Medline-entry ,</tt>
336<br><tt> muid INTEGER ,
337-- medline uid</tt>
338<br><tt> article Cit-art ,</tt>
339<br><tt> journal Cit-jour ,</tt>
340<br><tt> book Cit-book ,</tt>
341<br><tt> proc Cit-proc , -- proceedings
342of a meeting</tt>
343<br><tt> patent Cit-pat ,</tt>
344<br><tt> pat-id Id-pat , -- identify
345a patent</tt>
346<br><tt> man Cit-let ,
347-- manuscript, thesis, or letter</tt>
348<br><tt> equiv Pub-equiv, -- to cite
349a variety of ways</tt>
350<br><tt> pmid PubMedId } -- PubMedId</tt>
351<p><tt>medline.asn</tt>
352<p><tt>
353-- a MEDLINE or PubMed entry</tt>
354<br><tt>Medline-entry ::= SEQUENCE {</tt>
355<br><tt> uid INTEGER OPTIONAL , -- MEDLINE UID,
356sometimes not yet available if from PubMed</tt>
357<br><tt> em Date ,
358-- Entry Month</tt>
359<br><tt> cit Cit-art ,
360-- article citation</tt>
361<br><tt> abstract VisibleString OPTIONAL ,</tt>
362<br><tt> mesh SET OF Medline-mesh OPTIONAL ,</tt>
363<br><tt> substance SET OF Medline-rn OPTIONAL ,</tt>
364<br><tt> xref SET OF Medline-si OPTIONAL ,</tt>
365<br><tt> idnum SET OF VisibleString OPTIONAL ,
366-- ID Number (grants, contracts)</tt>
367<br><tt> gene SET OF VisibleString OPTIONAL ,</tt>
368<br><tt> pmid PubMedId OPTIONAL ,
369-- MEDLINE records may include the PubMedId</tt>
370<br><tt> pub-type SET OF VisibleString OPTIONAL } -- may show publication
371types (review, etc)</tt>
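<p>Because a record may carry a MEDLINE UID, a PubMedId, or both, code that
labels a citation has to decide which identifier to use. The following is a
minimal sketch under stated assumptions, not toolkit code: the ToyCitationIds
struct, the toy_ function names, the use of 0 to mean "field absent", and the
choice to prefer the PubMedId are all hypothetical.

```c
#include <stdio.h>

/* Hypothetical, simplified stand-in for the identifier fields of a
 * Medline-entry. 0 means the OPTIONAL field is absent. */
typedef struct {
    long uid;   /* MEDLINE UID; may be missing for new PubMed records */
    long pmid;  /* PubMedId; may be missing in older records */
} ToyCitationIds;

typedef enum { TOY_ID_NONE, TOY_ID_MUID, TOY_ID_PMID } ToyIdKind;

/* Stores the chosen identifier in *out_id and reports which kind it is.
 * Assumption: the PubMedId is preferred when both are present. */
ToyIdKind toy_best_id(const ToyCitationIds *ids, long *out_id)
{
    if (ids->pmid > 0) {
        *out_id = ids->pmid;
        return TOY_ID_PMID;
    }
    if (ids->uid > 0) {
        *out_id = ids->uid;
        return TOY_ID_MUID;
    }
    *out_id = 0;
    return TOY_ID_NONE;
}

/* Prints a one-line label for the citation. */
void toy_print_id(const ToyCitationIds *ids)
{
    long id;
    switch (toy_best_id(ids, &id)) {
    case TOY_ID_PMID: printf("PMID %ld\n", id); break;
    case TOY_ID_MUID: printf("MUID %ld\n", id); break;
    default:          printf("no identifier\n"); break;
    }
}
```

<p>An application that previously assumed every citation had a MEDLINE UID
would need a fallback of this kind for records that have only a PubMedId.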
372<p><tt>seqfeat.asn</tt>
373<p><tt>OrgName ::= SEQUENCE {</tt>
374<br><tt> name CHOICE {</tt>
375<br><tt> binomial BinomialOrgName ,
376-- genus/species type name</tt>
377<br><tt> virus VisibleString ,
378-- virus names are different</tt>
379<br><tt> hybrid MultiOrgName ,
380-- hybrid between organisms</tt>
381<br><tt> namedhybrid BinomialOrgName , --
382some hybrids have genus x species name</tt>
383<br><tt> partial PartialOrgName } OPTIONAL
384, -- when genus not known</tt>
385<br><tt> attrib VisibleString OPTIONAL , -- attribution
386of name</tt>
387<br><tt> mod SEQUENCE OF OrgMod OPTIONAL ,</tt>
388<br><tt> lineage VisibleString OPTIONAL , -- lineage with semicolon
389separators</tt>
390<br><tt> gcode INTEGER OPTIONAL ,
391-- genetic code (see CdRegion)</tt>
392<br><tt> mgcode INTEGER OPTIONAL ,
393-- mitochondrial genetic code</tt>
394<br><tt> div VisibleString OPTIONAL }
395-- GenBank division code</tt>
396<p><tt>BioSource ::= SEQUENCE {</tt>
397<br><tt> genome INTEGER { -- biological context</tt>
398<br><tt> unknown (0) ,</tt>
399<br><tt> genomic (1) ,</tt>
400<br><tt> chloroplast (2) ,</tt>
401<br><tt> chromoplast (3) ,</tt>
402<br><tt> kinetoplast (4) ,</tt>
403<br><tt> mitochondrion (5) ,</tt>
404<br><tt> plastid (6) ,</tt>
405<br><tt> macronuclear (7) ,</tt>
406<br><tt> extrachrom (8) ,</tt>
407<br><tt> plasmid (9) ,</tt>
408<br><tt> transposon (10) ,</tt>
409<br><tt> insertion-seq (11) ,</tt>
410<br><tt> cyanelle (12) ,</tt>
411<br><tt> proviral (13) ,</tt>
412<br><tt> virion (14) } DEFAULT unknown ,</tt>
413<br><tt> origin INTEGER {</tt>
414<br><tt> unknown (0) ,</tt>
415<br><tt> natural (1) ,
416-- normal biological entity</tt>
417<br><tt> natmut (2) ,
418-- naturally occurring mutant</tt>
419<br><tt> mut (3) ,
420-- artificially mutagenized</tt>
421<br><tt> artificial (4) ,
422-- artificially engineered</tt>
423<br><tt> synthetic (5) ,
424-- purely synthetic</tt>
425<br><tt> other (255) } DEFAULT unknown ,</tt>
426<br><tt> org Org-ref ,</tt>
427<br><tt> subtype SEQUENCE OF SubSource OPTIONAL }</tt>
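<p>The named values of BioSource.genome can be mirrored in a small lookup
table, which makes the new virion value easy to handle next to proviral.
This is an illustrative sketch only: the numeric values and names are copied
from the specification above, but the ToyGenome enum and the toy_ functions
are hypothetical and are not the toolkit's generated code.

```c
#include <stddef.h>
#include <string.h>

/* Values copied from BioSource.genome in the specification. */
typedef enum {
    TOY_GENOME_unknown = 0,
    TOY_GENOME_genomic = 1,
    TOY_GENOME_proviral = 13,
    TOY_GENOME_virion = 14      /* new in this release */
} ToyGenome;

static const char *const toy_genome_names[] = {
    "unknown", "genomic", "chloroplast", "chromoplast", "kinetoplast",
    "mitochondrion", "plastid", "macronuclear", "extrachrom", "plasmid",
    "transposon", "insertion-seq", "cyanelle", "proviral", "virion"
};

#define TOY_GENOME_COUNT \
    ((int)(sizeof toy_genome_names / sizeof toy_genome_names[0]))

/* Returns the specification name for a code, or NULL if out of range. */
const char *toy_genome_name(int code)
{
    if (code < 0 || code >= TOY_GENOME_COUNT)
        return NULL;
    return toy_genome_names[code];
}

/* Returns the code for a specification name, or -1 if it is not known. */
int toy_genome_code(const char *name)
{
    int i;
    for (i = 0; i < TOY_GENOME_COUNT; i++) {
        if (strcmp(toy_genome_names[i], name) == 0)
            return i;
    }
    return -1;
}
```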
428<p><tt>Prot-ref ::= SEQUENCE {</tt>
429<br><tt> name SET OF VisibleString OPTIONAL , -- protein name</tt>
430<br><tt> desc VisibleString OPTIONAL ,
431-- description (instead of name)</tt>
432<br><tt> ec SET OF VisibleString OPTIONAL , --
433E.C. number(s)</tt>
434<br><tt> activity SET OF VisibleString OPTIONAL , -- activities</tt>
435<br><tt> db SET OF Dbtag OPTIONAL ,
436-- ids in other dbases</tt>
437<br><tt> processed ENUMERATED {
438-- processing status</tt>
439<br><tt> not-set (0) ,</tt>
440<br><tt> preprotein (1) ,</tt>
441<br><tt> mature (2) ,</tt>
442<br><tt> signal-peptide (3) ,</tt>
443<br><tt> transit-peptide (4) } DEFAULT not-set
444}</tt>
445<p>
446<hr WIDTH="100%">
447<center><b>Notes from Previous Releases</b>
448<br><b><font size=+1>New Functions in Version 4.0</font></b></center>
449
450<hr WIDTH="100%">
451<br>There are a host of new functions in this release, but as usual we
452have not managed to make time to document them all. Large parts of Sequin
453are present which will be announced and described more fully in the fall.
454However, specific tools of immediate interest are:
455<p>blast2 - this is the long awaited BLAST client/server which permits
456structured interaction with BLAST over the internet. We have provided a
457basic client that produces the traditional blast output. In addition, the
458function call interface can be used in more elaborate clients. For more
459information contact Tom Madden, <a href="mailto:madden@ncbi.nlm.nih.gov">madden@ncbi.nlm.nih.gov</a>
<p>WARNING!!! blast2 is the client we plan to support in the longer term.
The blast1 client, which we included for those of you who wanted a head start,
will NOT be supported in the future. Please shift any blast1 clients to the
(very similar) blast2 interface as soon as possible.
464<p>sim, sim2 - protein and DNA sequence alignments in linear space. This
465is the function call interface to these valuable tools. Applications have
466been written which are available by ftp as are published papers. For more
467information contact Jinghui Zhang, <a href="mailto:zjing@ncbi.nlm.nih.gov">zjing@ncbi.nlm.nih.gov</a>
468<br>
469<br>
470<p><b>Changes in ASN.1 spec 4.0 from 3.0</b>
471<p>Affil - biblio.asn
472<br>added the field "postal-code" for Zip code finally.
473<p>Contact-info - submit.asn
474<br>added the field "contact" which is type "Author". The contact info
475has evolved into a fully structured form, so I just took Author which has
476structured names and structured address (Affil). We will eventually phase
477out all the less structured ones in Contact-info.
<p>OrgName - seqfeat.asn
479<br>added "lineage", "gcode", "mgcode" for the lineage, genetic code, and
480mitochondrial genetic code. This is part of Org-ref, and consolidates all
481the organism info (except original SOURCE line) out of the GenBank block...
482and enables us to deliver it nicely from Taxon.
483<p>Seq-descr - seq.asn
484<br>removed the Seq-descr "neighbors" and replaced it with "dbxref", since
485neighbors has never been used. This is used to add cross-references to
486the whole entry.
487<p>Pubdesc - seq.asn
488<br>has an added slot, "reftype" which is an integer and is used to indicate
489the GenBank usage of a reference.
<p>0 - seq - applies to the sequence. This is the default and the way it is
used now.
<br>1 - sites - applies to (unspecified) features. Equivalent to a GenBank
SITES feature. We could switch to this from using the Imp-feat we do now.
<br>2 - feats - applies to specific features. The idea here is to provide
a place for the full citation, so features need only reference it. If no
features reference it, it should be removed. This would work for checking content
when only a part of a sequence is copied or pasted. A "sites" ref could
not have this check since we do not know which features it goes to.
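<p>The content check described for reftype can be sketched as a small
decision function. This is a hedged illustration, not toolkit code: the
ToyRefType enum and toy_keep_pub_after_copy are hypothetical names, and the
count of citing features is assumed to be computed by the caller.

```c
/* Values of Pubdesc.reftype as listed in the text. */
typedef enum {
    TOY_REFTYPE_seq = 0,    /* applies to the sequence (default) */
    TOY_REFTYPE_sites = 1,  /* applies to unspecified features */
    TOY_REFTYPE_feats = 2   /* applies to specific features */
} ToyRefType;

/* Decides whether a publication should be kept after part of a sequence
 * has been copied. n_citing_features is the number of features in the
 * copied region that reference this publication.
 * Returns 1 to keep the publication, 0 to remove it. */
int toy_keep_pub_after_copy(ToyRefType reftype, int n_citing_features)
{
    switch (reftype) {
    case TOY_REFTYPE_feats:
        /* Only this type can be checked: drop it when nothing cites it. */
        return n_citing_features > 0;
    case TOY_REFTYPE_sites:
        /* The cited features are not known, so no check is possible. */
        return 1;
    case TOY_REFTYPE_seq:
    default:
        return 1;
    }
}
```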
499<p>Seq-feat - seqfeat.asn
500<br>added a slot called "dbxref" to Seq-feat. This is a SET OF Dbtag. It
501will be for adding the new db_xref qualifiers to features. We already have
502some of these in the xref slots of Gene-ref, Prot-ref, Org-ref. It means
503we have to check two places in these cases. I do not want to retire the
slots since these were meant to be used in other contexts besides features,
and Org-ref already is.
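<p>Checking "two places" for a cross-reference might look like the sketch
below. The ToyDbtag and ToyFeature types and the toy_find_xref function are
hypothetical simplifications, and searching the feature-level dbxref before
the xref slot of the referenced object is an assumption, not something the
specification prescribes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified cross-reference: database name plus tag. */
typedef struct {
    const char *db;
    const char *tag;
} ToyDbtag;

/* Hypothetical feature holding both places a cross-reference may live. */
typedef struct {
    const ToyDbtag *feat_dbxref;  /* the new Seq-feat.dbxref slot */
    size_t n_feat;
    const ToyDbtag *ref_xref;     /* xref slot of a Gene-ref, Prot-ref or Org-ref */
    size_t n_ref;
} ToyFeature;

/* Returns the tag of the first entry for db in one list, or NULL. */
static const char *toy_find_in(const ToyDbtag *tags, size_t n, const char *db)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (strcmp(tags[i].db, db) == 0)
            return tags[i].tag;
    }
    return NULL;
}

/* Looks in the feature-level list first, then in the referenced object. */
const char *toy_find_xref(const ToyFeature *f, const char *db)
{
    const char *hit = toy_find_in(f->feat_dbxref, f->n_feat, db);
    return hit != NULL ? hit : toy_find_in(f->ref_xref, f->n_ref, db);
}
```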
506<p>added a slot called "anticodon" to the tRNA extension of the RNA feature.
507This is a Seq-loc that points to the location of the anticodon in a tRNA.
508We have been populating this data in a User-object, and will have to do
509a retro to convert it.
510<p><b>EXPORTED Genetic-code</b>
511<p>Seq-align - seqalign.asn
<br>added "bounds" to Seq-align so you can record the regions over which
an alignment was computed, which are not always included in the resulting
alignment itself.
515<p>added two new types:
516<br> A) Packed-seg -- a denser representation from Colombe and Jinghui
517<br> B) disc - discontinuous alignments as a SEQUENCE OF Seq-align
518<p>Seq-annot - seq.asn
519<p>added a field to Seq-annot, Align-def, to discriminate types of alignment
520sets. This has the advantage of minimal changes as well as separating sets
521of alignments from conceptually single alignments. I am not sure it is
522necessary to distinguish "alt" from "blocks" though. Also it means you
523can attach more info, with other Seq-annot fields and/or by expanding the
524Align-def. I put in "ids" in Align-def specifically to put the one Seq-id
525that is the "master" for type "ref". I made it a SET OF so we could use
526it for other collections where we might want to list more than one.
527<p>added "ids" and "locs" as allowed types within Seq-annot. This would
enable us to pass lists like this around between tools with all the additional
529descriptive information in Annotdesc. I know this will be useful.
530<p>added "general" to Annot-id for tracking 3rd party annotations.
531<br>
532<br>
533<br>
534<br>
535<br>
536<center>
537<p><b><font size=+1>INTRODUCTION</font></b></center>
538
539<p>This distribution is release 5.0 of the NCBI core library for building
540portable software, and AsnLib, a collection of routines for handling ASN.1
541data and developing ASN.1 software applications. AsnLib and the asntool
542application are built using the CoreLib routines. In the ./doc directory
543is an MS Word file which details the information given below. It is also
544available as hardcopy. See the README in ./doc.
545<p>The lowest layer of code is the CoreLib. These are multiplatform
546functions for memory allocation (including byte stores), string manipulation,
547file input and output, error and general messages, and time and date notification.
548These functions have been written only where we found that the existing
ANSI functions were not sufficiently multi-platform or well behaved among
all of the platforms that we support. For each platform (a combination
of processor, operating system, compiler, and windowing system), we supply
a specific ncbilcl.h file, which contains typedefs and defines for multi-platform
symbols, and includes a number of standard header files. (For example,
ncbilcl.msw is used for the Microsoft C compiler under Microsoft Windows
on the PC.)
<br>Use of these symbols, and of the functions in the CoreLib, allows us
to write multi-platform source code for a variety of disparate platforms.
558<p>The next layer of code is the AsnLib stream reader. This is used
559in conjunction with a header file and a parse table loader file, both of
560which are produced by processing the formal ASN.1 specification with the
561AsnTool application. The symbolic defines in the header file are pointers
562into the parse table, in which the ASN.1 specification is represented.
563To read at the stream reader level, a program alternates between calls
564to AsnReadId and AsnReadVal. AsnReadId returns a pointer into the parse
565table, which can be compared against the defines in the AsnTool-generated
566header. For example, in the specification for MEDLINE records, the
567Medline-entry section has an item called "uid", for the unique ID of the
568record. This is symbolized in the header file as MEDLINE_ENTRY_uid.
569When AsnReadId returns this symbol, the program calls AsnReadVal to obtain
570the uid for that record. AsnKillValue is also needed to free any memory
571allocated by AsnReadVal, which occurs when the value is a string and not
572an integer. The entire set of records on the Entrez CD-ROM can be
573read as a single stream with the AsnLib functions.
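<p>The alternating read-id / read-value pattern can be illustrated with a
toy stream. This is deliberately not AsnLib: every toy_ name, the ToyType
parse table and the ToyStream type are invented for this sketch, and the
signatures of the real AsnReadId, AsnReadVal and AsnKillValue are not
reproduced here. It only shows the control flow: compare the returned
parse-table pointer against a symbol, read or skip the value, and free any
value that allocated memory.

```c
#include <stdlib.h>
#include <string.h>

/* Toy parse table entry. */
typedef struct { const char *name; int is_string; } ToyType;

static const ToyType toy_table[] = {
    { "Medline-entry.uid", 0 },
    { "Medline-entry.abstract", 1 }
};
/* These play the role of the symbols in a generated header. */
#define TOY_MEDLINE_ENTRY_uid      (&toy_table[0])
#define TOY_MEDLINE_ENTRY_abstract (&toy_table[1])

typedef struct { const ToyType *type; long intvalue; const char *strvalue; } ToyItem;
typedef struct { const ToyItem *items; size_t count; size_t pos; } ToyStream;
typedef union { long intvalue; char *ptrvalue; } ToyValue;

/* Returns the parse-table pointer for the next item, or NULL at the end. */
const ToyType *toy_read_id(ToyStream *s)
{
    return s->pos < s->count ? s->items[s->pos].type : NULL;
}

/* Reads the next value and advances. Strings are copied into fresh memory.
 * Passing out == NULL skips the value. Returns 1 on success, 0 on failure. */
int toy_read_val(ToyStream *s, const ToyType *t, ToyValue *out)
{
    const ToyItem *it;
    if (s->pos >= s->count)
        return 0;
    it = &s->items[s->pos++];
    if (out == NULL)
        return 1;
    if (t->is_string) {
        size_t n = strlen(it->strvalue) + 1;
        out->ptrvalue = malloc(n);
        if (out->ptrvalue == NULL)
            return 0;
        memcpy(out->ptrvalue, it->strvalue, n);
    } else {
        out->intvalue = it->intvalue;
    }
    return 1;
}

/* Frees memory held by a value; only string values own memory here. */
void toy_kill_value(const ToyType *t, ToyValue *v)
{
    if (t->is_string) {
        free(v->ptrvalue);
        v->ptrvalue = NULL;
    }
}

/* Walks the stream and collects every uid, skipping all other items. */
size_t toy_collect_uids(ToyStream *s, long *uids, size_t max)
{
    const ToyType *t;
    ToyValue v;
    size_t n = 0;
    while ((t = toy_read_id(s)) != NULL) {
        if (t == TOY_MEDLINE_ENTRY_uid) {
            if (!toy_read_val(s, t, &v))
                break;
            if (n < max)
                uids[n++] = v.intvalue;
            toy_kill_value(t, &v);   /* no-op for integers, kept for symmetry */
        } else if (!toy_read_val(s, t, NULL)) {
            break;
        }
    }
    return n;
}
```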
574<p>The ASN.1 records may be accessed at a higher level through the object
575loaders, which utilize the stream processing functions to load C memory
576structures with the contents of the ASN.1 objects. For each ASN.1 object
577we specify, we also define an equivalent C memory structure. The
578object loader level of code contains functions to read and write each ASN.1
579object. These are hierarchical, as are the ASN.1 specifications.
580Calling the top level loader, SeqEntryAsnRead, will load an entire SeqEntry
from an open AsnIo channel, and will return a pointer to the loaded memory
582structure. The read function for an AsnIo channel can be swapped
583to refer to a normal disk file, a network socket, or to compressed data,
584which it automatically decompresses. The object loader code can interconvert
585between the highly-branched memory object and a linear ASN.1 message with
586complete fidelity. The object loaders have additional functions,
587including the ability to explore the structure and notify the program when
588particular data elements are encountered. The entire contents of
589the Entrez CD-ROM can also be streamed through the object loaders.
590However, most calls to the object loaders for simply reading a particular
591record are done via the data access functions (see below).
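<p>The idea of swapping a channel's read function can be shown with a
function pointer. The sketch below is an assumption-laden illustration, not
the AsnIo implementation: ToyChannel, the two source types and the simple
run-length "compression" (pairs of count and byte) are all hypothetical.
The point is that the consumer is unchanged whichever reader is installed.

```c
#include <stddef.h>
#include <string.h>

/* A reader fills buf with up to len bytes and returns how many it wrote. */
typedef size_t (*ToyReadFn)(void *source, char *buf, size_t len);

typedef struct { ToyReadFn read; void *source; } ToyChannel;

/* Plain in-memory source, standing in for a disk file or socket. */
typedef struct { const char *data; size_t len; size_t pos; } ToyMemSource;

size_t toy_mem_read(void *source, char *buf, size_t len)
{
    ToyMemSource *m = source;
    size_t left = m->len - m->pos;
    if (len > left)
        len = left;
    memcpy(buf, m->data + m->pos, len);
    m->pos += len;
    return len;
}

/* Run-length encoded source: the data is a series of (count, byte) pairs. */
typedef struct {
    const unsigned char *data;
    size_t len;
    size_t pos;
    unsigned char pending_byte;
    size_t pending;             /* copies of pending_byte still to emit */
} ToyRleSource;

size_t toy_rle_read(void *source, char *buf, size_t len)
{
    ToyRleSource *r = source;
    size_t n = 0;
    while (n < len) {
        if (r->pending == 0) {
            if (r->pos + 1 >= r->len)
                break;                      /* no complete pair left */
            r->pending = r->data[r->pos];
            r->pending_byte = r->data[r->pos + 1];
            r->pos += 2;
            continue;
        }
        buf[n++] = (char)r->pending_byte;
        r->pending--;
    }
    return n;
}

/* A consumer that does not know or care how the bytes are produced. */
size_t toy_count_byte(ToyChannel *ch, char wanted)
{
    char buf[8];
    size_t got, i, total = 0;
    while ((got = ch->read(ch->source, buf, sizeof buf)) > 0) {
        for (i = 0; i < got; i++) {
            if (buf[i] == wanted)
                total++;
        }
    }
    return total;
}
```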
592<p>The data access functions allow a program to call the object loaders
593on a sequence or MEDLINE record given the uid of the record. This will
594get the data into memory regardless of whether the data are compressed
595on the Entrez CD-ROM or are obtained through a service over the Internet.
596This means that a detailed understanding of the files and formats on the
597Entrez disc is not needed by application programmers. The function to load
598a sequence record, SeqEntryGet, needs the uid to retrieve and a complexity
599code parameter. A sequence record is in the form of a NucProt set.
600This contains a nucleotide (which may itself be composed of segments) and
601all of the proteins it is known to encode. The set of segments is called
602a SegSet, and the individual sequences are called BioSeqs. We have
603taken the liberty of producing this integrated view, but the complexity
604code parameter allows the record to be easily loaded in a simpler, more
605traditional form, if desired. The accession number term list is built
606to supply the proper uids to support this facility. This access library
607is compatible with Entrez release 1.0 or later only.
608<p>The sequence utilities and application programmer interface layer allows
609exploration of the loaded memory structures and generation of standard
610literature or sequence reports from those objects. For example, a
611BioSeq can be converted to FASTA or GenBank flat file formats and saved
612to a file, and a MEDLINE record can be saved in MEDLARS format, which is
613suitable for entry into personal bibliographic database programs.
614A sequence port can be opened that gives a simple, linear view of a segmented
615sequence, converting alphabets, merging exon segments, and dealing with
616information on both strands of the DNA. This layer also includes
617some functions to explore the NucProt set. The explore functions
618visit each individual BioSeq in the set, calling a callback function for
619each sequence node so that a program can examine feature tables and other
620information that are associated with the NucProt or SegSets or with the
621individual sequences.
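<p>The callback style of the explore functions can be sketched with a small
recursive walk. This is a toy model under stated assumptions: ToyEntry,
ToyVisitFn and toy_explore are hypothetical names, and the callback
signature is invented for illustration rather than taken from the toolkit
headers. A set (such as a NucProt or SegSet) holds children, and the
callback is invoked once for each individual sequence.

```c
#include <stddef.h>

/* Toy entry: either a set with children or a single sequence. */
typedef struct ToyEntry {
    int is_set;                 /* nonzero for a set, zero for a sequence */
    const char *label;
    long length;                /* sequence length; unused for sets */
    struct ToyEntry *children;
    size_t n_children;
} ToyEntry;

/* Called once per sequence; depth is the nesting level within the sets. */
typedef void (*ToyVisitFn)(const ToyEntry *seq, void *userdata, int depth);

static size_t toy_explore_at(const ToyEntry *e, void *userdata,
                             ToyVisitFn fn, int depth)
{
    size_t i, visited = 0;
    if (!e->is_set) {
        fn(e, userdata, depth);
        return 1;
    }
    for (i = 0; i < e->n_children; i++)
        visited += toy_explore_at(&e->children[i], userdata, fn, depth + 1);
    return visited;
}

/* Visits every sequence below e and returns how many were visited. */
size_t toy_explore(const ToyEntry *e, void *userdata, ToyVisitFn fn)
{
    return toy_explore_at(e, userdata, fn, 0);
}

/* Example callback: adds each sequence length to a long total. */
void toy_sum_length(const ToyEntry *seq, void *userdata, int depth)
{
    (void)depth;
    *(long *)userdata += seq->length;
}
```

<p>The userdata pointer lets the callback accumulate results without global
variables, which is the usual reason for this design.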
622<p>Vibrant is a multi-platform user interface development library that
623runs on the Macintosh, Microsoft Windows on the PC, or X11 and OSF/Motif
624on UNIX and VAX computers [separate documentation]. It is used to
625build the graphical interface for the Entrez application (whose source
626code is in the browser directory). The philosophy behind Vibrant is that
627everything in the published user interface guidelines (the generic behavior
628of windows, menus, buttons, etc.), as well as positioning and sizing of
629graphical control objects, is taken care of automatically. The program
630provides callback functions that are notified when the user has manipulated
631an object. Vibrant and Entrez code are not supported, but are provided
632on an as-is basis.
633<p>The advantage of using AsnLib and the object loaders, as they are implemented,
634is that application program developers merely need to recompile their programs
635with the new (AsnTool-generated) header files and load the new parse tables
636(included with the Entrez software) in order to be able to read the new
637data. This process is straightforward, and will not break existing
638program code. The application is free to ignore new fields if it
639does not choose to take advantage of the new kinds of information.
640<p>When developing new ASN.1 specifications, as of June 1994 it is possible
641to automatically generate the object loaders and header files for those
642specifications, using the AsnCode utility. For some complex ASN.1
643specifications, however, AsnCode may fail to generate the correct source
644code.
645<p>The documentation is currently being brought up to date. The programs
646in the demo directory are designed to teach the proper use of many of the
647functions discussed above. Many of these programs are not yet documented.
The simplest is testcore.c, which tests various functions in the CoreLib.
The most complex is getfeat.c, which takes an accession number or locus
name, determines the unique seq ID, retrieves the entry from the Entrez
CD-ROM using the data access library, locates all coding region features
using the explore functions, and prints the DNA sequences of all exons
using sequence port functions. If you cannot extract and print the
654doc.tar.Z file, please send an email message with your land mailing address
655and phone number to <a href="mailto:toolbox@ncbi.nlm.nih.gov">toolbox@ncbi.nlm.nih.gov</a>,
656and we will mail a copy to you.
<p>The contents of the ncbi directory (the highest level, containing the
NCBI Software Development Kit source code in several subdirectories) are
shown below. The readme file contains instructions on copying the
appropriate make files to be built in the build directory. The makeall file
copies headers to the include directory and builds four libraries (ncbi, ncbiobj,
ncbicdr and vibrant), copying them to the lib directory. The makedemo file
builds the demo programs and the Entrez application:
664<br>
665<ul>
666<li>
667api Application Programmer Interface, Sequence Utilities</li>
668
669<li>
670asn ASN.1 specifications for publications and sequences</li>
671
672<li>
673asnlib Source code for AsnLib and asntool</li>
674
675<li>
676asnload AsnLib headers and dynamic parse tables (Mac and PC)</li>
677
678<li>
679asnstat AsnLib headers that use static memory (UNIX and VMS)</li>
680
681<li>
682bin Asntool executable copied here</li>
683
684<li>
685biostruc Source code for Molecular Modelling DataBase functions</li>
686
687<li>
688browser Source code for Entrez application</li>
689
690<li>
691build Empty directory for building tools and libraries</li>
692
693<li>
694cdromlib Access routines for data on the Entrez CD-ROM</li>
695
696<li>
697cn3d Source code for Vibrant-based 3D structure viewer</li>
698
699<li>
700config Configuration files for NCBI software:</li>
701
702<ul>
703<li>
704dos</li>
705
706<li>
707mac</li>
708
709<li>
710unix</li>
711
712<li>
713vms</li>
714
715<li>
716win</li>
717</ul>
718
719<li>
720corelib Source code for NCBI Core Software Library</li>
721
722<li>
723data Data files used for sequence conversion</li>
724
725<li>
726demo AsnLib and sequence utility demonstration programs</li>
727
728<li>
729desktop Source code for Vibrant-based viewers and editors</li>
730
731<li>
732doc Documentation in Microsoft Word file</li>
733
734<li>
735include Include files required by applications are copied here</li>
736
737<li>
738lib Libraries copied here</li>
739
740<li>
741link Contains several subdirectories with build accessory files:</li>
742
743<ul>
744<li>
745macmet Macintosh Metrowerks/CodeWarrior</li>
746
747<li>
748macmpw Macintosh MPW C</li>
749
750<li>
751msdos Microsoft C and Borland C for DOS</li>
752
753<li>
754mswin Microsoft C and Borland C for Windows</li>
755</ul>
756
757<li>
758make Make files for various systems</li>
759
760<li>
761network Network version of data access</li>
762
763<ul>
764<li>
765apple</li>
766
767<li>
768blast2</li>
769
770<li>
771encrypt</li>
772
773<li>
774entrez</li>
775
776<li>
777netmanag</li>
778
779<li>
780nsclilib</li>
781</ul>
782
783<li>
784object Functions for reading and writing complex objects</li>
785
786<li>
787sequin Source code for Sequin application</li>
788
789<li>
790tools Source code for alignment and other contributed utilities</li>
791
792<li>
793readme File that contains important building instructions</li>
794
795<li>
796vibrant Source code for Vibrant portable interface package</li>
797</ul>
798
799<p><br>The platforms that are supported (as indicated by the suffix on
800the relevant ncbilcl.h file) are shown below. Those marked with an asterisk
801(*) are available as-is:
802<p>370* IBM 370
803<br>acc SUN acc compiler
804<br>alf DEC Alpha under OSF/1
805<br>aov DEC Alpha under AXP/OpenVMS
806<br>aux* Macintosh A/UX
807<br>bor Borland for DOS
808<br>bwn Borland for Microsoft Windows
809<br>ccr CenterLine CodeCenter
810<br>cpp SUN C++
811<br>cra* Cray
812<br>cvx* Convex
813<br>gcc Gnu gcc (under SunOS, not Solaris)
814<br>hp * Hewlett Packard
815<br>lna* Linux on DEC Alpha
816<br>lnx Linux (Red Hat Linux release 5.2 with kernel 2.0.36)
817<br>met Macintosh Metrowerks compiler
818<br>mpw Macintosh Programmer's Workshop
819<br>msc Microsoft C for DOS
820<br>msw Microsoft for Windows
821<br>nxt* NeXT
822<br>r6k* IBM RS 6000
823<br>scr CodeCenter under Sun Solaris
824<br>sgi Silicon Graphics
825<br>sin Sun Solaris on Intel processors
826<br>sol Sun Solaris (for cc and gcc)
827<br>thc THINK C on Macintosh
828<br>ult DEC ULTRIX
829<br>vms DEC VAX/VMS
<p>Questions or comments can be directed to <a href="mailto:toolbox@ncbi.nlm.nih.gov">toolbox@ncbi.nlm.nih.gov</a>.
831<p><b>ANSI C:</b>
<p>This software requires an ANSI C compiler. This is no problem
except for people on Sun machines, where the bundled C compiler, cc, is
non-ANSI. However, you can use the Sun unbundled compiler, acc, or the Gnu
compiler, gcc (which is free), and either works fine. If you have written
applications on the Sun with non-ANSI functions, the ANSI compilers will
complain. See the notes below if this is a problem.
843<center>
844<p><b><font size=+1>INSTALLATION</font></b></center>
845
846<p>To build the NCBI toolkit you need to look for platform-dependent instructions:
847<br>For UNIX:
848<br> look at the file make/readme.unx
849<br>For Mac:
850<br> look at the file make/readme.mac
851<br>For Microsoft Windows95/98/NT:
852<br> look at the file make/readme.dos
<p>There is some information which may be useful for building the NCBI toolkit
<br>in the file doc/FAQ.txt
855<p><b>ALL</b>
856<br> change to the directory above the ncbi subdirectory
857<p><b>Unix</b>
858<br> tested on Sun Sparc (Solaris 2.6, Sunos 4.1.3),
859<br> Silicon Graphics IRIX 5.* and 6.*, DEC Alpha with OSF/1 V4.0,
860<br> Linux (Red Hat Linux release 5.2 with kernel 2.0.33) on Intel,
861<br> Sun Solaris for Intel (Solaris 2.7).
<p> Run the script ncbi/make/makedis.csh, keeping its output in a
<br> separate file:
864<br> for sh or bash:
865<blockquote><tt>ncbi/make/makedis.csh 2>&1 | tee out.makedis.csh</tt></blockquote>
866 for csh or tcsh:
867<blockquote><tt>ncbi/make/makedis.csh |& tee out.makedis.csh</tt></blockquote>
868 If that script gives you an error like this:
869<blockquote><tt>Your platform is not supported.</tt>
870<br><tt>To port ncbi toolkit to your platform consult</tt>
871<br><tt>the files platform/*.ncbi.mk</tt></blockquote>
 then you should check the script ncbi/make/makedis.csh and
<br> add a suitable platform-dependent ncbi.mk file to the ncbi/platform
<br> directory.
<p> Other UNIX: AIX, ULTRIX, NeXT, Sun acc.
<br> Follow the models above. Read the headers in makeall.unx and makedemo.unx
<br> for details.
878<p> for all UNIX, edit .ncbirc as described in section "CONFIGURATION
879OR
880<br> SETTINGS FILES".
<br> optionally, edit .login to add "setenv NCBI [path to .ncbirc file]"
882<br>
883<p><b>MS-DOS</b>
884<br>(Also see NEW MAKEFILES, below)
885<br><u>Microsoft C version 7.00</u>
886<blockquote><tt>copy ..\make\*.dos</tt>
887<br><tt>ren makeall.dos makefile</tt>
888<br><tt>nmake MSC=1 [note: nmake requires windows or DPMI]</tt>
889<br><tt>copy ..\config\ncbi.dos ncbi.cfg</tt></blockquote>
890check paths in ncbi.cfg file [see section on CONFIGURATION]
891<p>Optional:
892<br>edit AUTOEXEC.BAT with "set NCBI=[path to directory containing ncbi.cfg]".
893<br>reboot to activate
894<p> To make demo programs:
895<blockquote><tt>nmake -f makedemo.dos MSC=1</tt></blockquote>
896<u>Microsoft Windows version 7.00</u>
897<blockquote><tt>copy ..\make\*.dos</tt>
898<br><tt>ren makeall.dos makefile</tt>
899<br><tt>nmake MSW=1 [note: nmake requires windows or DPMI]</tt></blockquote>
900 check paths in "ncbi.ini" as above
901<br> copy ncbi.ini to your windows directory
902<br> To make demos:
903<blockquote><tt>nmake -f makedemo.dos MSW=1</tt></blockquote>
904<u>Borland C++ 3.1</u>
905<blockquote><tt>copy ..\make\*.dos</tt>
906<br><tt>ren makeall.dos makefile</tt>
907<br><tt>make -DBOR</tt></blockquote>
908then set paths as in Microsoft C, above.
909<p>To make demos:
910<blockquote><tt>make -f makedemo.dos -DBOR</tt></blockquote>
911
912<p><br><u>Borland C++ 3.1 for Windows</u>
913<blockquote><tt>copy ..\make\*.dos</tt>
914<br><tt>ren makeall.dos makefile</tt>
915<br><tt>make -DBWN</tt></blockquote>
916then set paths as in Microsoft Windows, above.
917<br>To make demos:
918<blockquote><tt>make -f makedemo.dos -DBWN</tt></blockquote>
919
<p><br><b>Mac</b>
921<p>tested on <u>CodeWarrior IDE 2.1, MacOS 8.0</u>
922<p><u>All</u>
923<blockquote>copy <b>config:mac:ncbi.cnf </b>to your System Folder, or to
924the <b>System Folder:Preferences</b> subfolder
925<br>edit the "<b>ASNLOAD</b>" line in <i>"ncbi.cnf" </i>to point to the
926<b>ncbi:asnload</b> directory in this release
927<br>edit the "<b>DATA</b>" line to point to the <b>ncbi/data </b>directory
928<br> </blockquote>
929<u>CodeWarrior</u>
930<blockquote>raise Preferred Size of Script Editor from 700 to 3000, and
931raise Preferred Size of CodeWarrior IDE 2.1 by 2000 (e.g., from 8206 to
93210206), using Get Info from the Finder.
933<br>to compile for MC680x0 platform (default is PowerPC), change property
934MASTER from "PPC" to "68K".
935<br>run copyhdrs.met
936<br>run makeall.met
937<br>run makenet.met
938<br>run makedemo.met</blockquote>
939<u>Think C</u> - no longer supported
940<br><u>MPW C</u> - no longer supported
941<br>
942<p><b>VMS</b>
943<p><u>Changes to VMS make file naming conventions:</u>
<p> The old .dcl suffix (last character is a lower case L) was changed
<br>to .dc1 (last character is the numeral 1) to allow for different make files
<br>for DecWindows 1.1 and DecWindows 1.2. Several new .dc2 files were
<br>contributed by David Mathog of CalTech. A synopsis of his additional
<br>instructions:
<p> VAX C DecWindows 1.1 Use .dc1 files.
<br> DEC C DecWindows 1.1 Use .dc1 files, but change cc to cc/standard=vaxc
<br> VAX C DecWindows 1.2 This combination has not been tested.
<br> DEC C DecWindows 1.2 Use .dc2 files.
956<p><u>VMS (without Vibrant) on VAX</u>
957<br><tt> $set def [ncbi.build]</tt>
958<br><tt> $copy [-.make]*.dc1 *.com</tt>
959<br><tt> $@makeall</tt><tt></tt>
960<p> check ncbi.cfg as described in section "CONFIGURATION OR SETTINGS
961FILES".
962<br> edit LOGIN.COM to "define NCBI [path to ncbi.cfg file]"
963<p> To make demos:
964<br><tt> $@makedemo</tt>
965<p><u>VMS (with Vibrant) on VAX</u>
966<br><tt> $set def [ncbi.build]</tt>
967<br><tt> $copy [-.make]*.dc1 *.com</tt>
968<br><tt> $@viball</tt>
969<p> check ncbi.cfg as described in section "CONFIGURATION OR SETTINGS
970FILES".
971<br> edit LOGIN.COM to "define NCBI [path to ncbi.cfg file]"
972<p> To make demos:
973<br><tt> $@vibdemo</tt>
974<p><b>Testing</b>
975<p><u>VMS</u> only: look in rundemo.dc1 in [make] to see how to give
976command line arguments. Not all demo programs are shown. Run at least testcore.
977<p><u>All</u> else:
978<br>The <b>build</b> directory should contain a program called <b>testcore</b>.
979Type "<tt>testcore -</tt>" and it should show you some default arguments.
980Type "<b>testcore</b>" and it will run through a variety of functions in
981CoreLib, prompting you for responses along the way. It should run
982without a crash or error report. If you made Vibrant versions, all demos
983will have startup dialog boxes. If not, they take command line arguments.
984<p>If testcore runs, read the documentation for CoreLib and for AsnLib.
985The AsnLib documentation contains instructions for running asntool itself
986and for running a few of the demo programs. There are a large number
987of demo programs now (including Entrez itself, if you made the Vibrant
988versions).
989<br>
990<br>
991<br>
992<br>
993<center>
994<p><b><font size=+1>CONFIGURATION OR SETTINGS FILES</font></b></center>
995
996<p>One of the fundamental problems in writing portable software concerns
997configuration issues. Each individual user's computer will have its
998own particular hardware and software environment, and each machine will
999have its disk file hierarchy set up in a unique manner. A program
1000that needs accessory information, such as help files, parse tables, or
1001format converters, must be given a means of finding the data regardless
1002of where the user has placed the files. The difficulty is compounded
1003by the different conventions for naming files and specifying paths on each
1004class of machine. For example, the name of a CD-ROM on the Macintosh is
1005fixed, determined by information on the CD itself, whereas on the PC it
1006is addressed by a drive letter, which can be assigned by the user, but
1007which cannot be reconciled with the name the Macintosh sees.
1008<p>An associated problem is that many programs will want to allow the user
1009to make persistent changes to parameters. These parameters typically
1010involve numbers or font specifications, but may also include paths to data
1011files. Some platforms supply such configuration information in preferences
1012files, others in environment variables. Manipulating these settings
1013is platform dependent, as is the format in which the preference is specified.
1014<p>The NCBI Software Toolkit core library addresses these problems by providing
1015configuration or settings files. These are modeled after the .INI
1016files used by Microsoft Windows. Settings files are plain ASCII text
1017files that may be edited by the user or modified by the program.
1018They are divided into sections, each of which is headed by the section name
1019enclosed in square brackets. Below each section heading is a series
1020of key=value strings, somewhat analogous to the environment variables that
1021are used on many platforms.
1022<p>The ncbi configuration file supplies general purpose configuration information
1023on paths for commonly used data files. The typical file set up for
1024the Entrez application running on the PC under Microsoft Windows is shown
1025below:
1026<br>
1027<blockquote><tt>[NCBI]</tt>
1028<br><tt>ROOT=D:</tt>
1029<br><tt>ASNLOAD=C:\ENTREZ\ASNLOAD\</tt>
1030<br><tt>DATA=C:\ENTREZ\DATA</tt>
1031<br> </blockquote>
1032The only section is entitled NCBI. The ROOT entry refers to the path
1033to the Entrez CD-ROM. In this example, the user has configured the
1034machine to use drive letter D. (On the Macintosh, the name of the
1035disc is SEQDATA, which cannot be changed by the user.) The ASNLOAD
1036entry specifies the path to the ASN.1 parse tables. These files are required
1037by the AsnLib functions, and all higher-level procedures that call them,
1038including the Object Loader, Sequence Utility, and Data Access functions.
1039Files pointed to by the DATA entry contain information necessary to convert
1040biomolecule sequence data into different alphabets (e.g., unpacking the 2-bit
1041nucleotide code stored on the Entrez CD into standard IUPAC letters).
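<p>As a rough illustration of the file format described above, the following
C sketch looks up one key in one section of a settings text held in memory.
It is not the toolkit's own code: <tt>settings_lookup</tt> is a hypothetical
helper written for this example, and it assumes exact, case-sensitive matching
of section and key names with no surrounding blanks, which may differ from what
the CoreLib functions actually do.

```c
#include <string.h>

/* Hypothetical helper (not part of the toolkit): find "key=value" inside
 * "[section]" of an INI-style settings text. On success, copies the value
 * into out (truncating if necessary) and returns 1; otherwise returns 0. */
static int settings_lookup(const char *text, const char *section,
                           const char *key, char *out, size_t outlen)
{
    size_t seclen = strlen(section);
    size_t keylen = strlen(key);
    const char *line = text;
    int in_section = 0;

    if (outlen == 0)
        return 0;

    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        const char *next = end ? end + 1 : line + len;

        /* Ignore the carriage return of a DOS-style line ending. */
        if (len > 0 && line[len - 1] == '\r')
            len--;

        if (len > 0 && line[0] == '[') {
            /* Section heading: we are inside only if it is exactly [section]. */
            in_section = (len == seclen + 2 &&
                          strncmp(line + 1, section, seclen) == 0 &&
                          line[len - 1] == ']');
        } else if (in_section && len > keylen &&
                   strncmp(line, key, keylen) == 0 && line[keylen] == '=') {
            /* Found key=value: copy everything after the '='. */
            size_t vlen = len - keylen - 1;
            if (vlen >= outlen)
                vlen = outlen - 1;
            memcpy(out, line + keylen + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
        line = next;
    }
    return 0;
}
```

<p>Given the sample file above, a call with section "NCBI" and key "ROOT"
would place "D:" in the output buffer. A key that is absent, or that appears
only under a different section heading, makes the helper return 0, so the
caller can fall back to a default.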
1042<p>Although the contents of a configuration file are similar regardless
1043of platform, the name of the file and its location are platform dependent.
1044If the base name of the configuration file is xxx, then the actual file
1045name is shown below for each platform:
1046<br>
1047<table BORDER COLS=2 WIDTH="30%" NOSAVE >
1048<tr>
1049<td>Macintosh</td>
1050
1051<td>xxx.cnf</td>
1052</tr>
1053
1054<tr>
1055<td>Microsoft Windows</td>
1056
1057<td>xxx.INI</td>
1058</tr>
1059
1060<tr>
1061<td>MS-DOS (without Windows) </td>
1062
1063<td>xxx.CFG</td>
1064</tr>
1065
1066<tr>
1067<td>UNIX</td>
1068
1069<td>.xxxrc</td>
1070</tr>
1071
1072<tr>
1073<td>VMS</td>
1074
1075<td>xxx.cfg</td>
1076</tr>
1077</table>
1078
1079<p>
1080<br>Samples of such files are in subdirectories of \config. The UNIX
1081version does not have the leading '.' in its filename, so that you can see it. Since
1082VMS and DOS both use the same file name (ncbi.cfg), the DOS version was
1083called ncbi.dos. You will have to rename it. Remember these are just models.
1084You will have to set the paths appropriately for your machine yourself.
1085<p>The location in which these files must reside is also platform dependent,
1086and the functions that manipulate the contents may look in several places
1087to find these files.
1088<p>On the Macintosh, the function first looks in the System Folder, then in the
1089Preferences folder within the System Folder. (See the Mac OS X addendum in the
1090next paragraph). Under Microsoft Windows, the file must be in the Windows
1091directory, along with all of the other .INI files. Under DOS without Windows,
1092the function first looks in the current working directory, then in the directory
1093whose path is specified in the NCBI environment variable. Under UNIX and VMS,
1094the current working directory is first checked, then the user's home directory,
1095and finally the directory specified by the NCBI environment variable. (Under
1096UNIX, when it uses the environment variable, it will check for configuration
1097files first without and then with the initial dot.) On the multi-user
1098platforms (UNIX and VMS), the use of the NCBI environment variable allows a
1099common settings file to be used as the default by multiple users. If such a
1100settings file is changed under program control, it is copied over into the
1101user's home directory, and the new copy is modified. The order of searching
1102for settings files ensures that this new copy is used in all subsequent
1103operations.
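<p>The UNIX search order described above can be sketched in C as a list of
candidate paths, tried in turn. This is only one reading of the paragraph,
not the toolkit's implementation: <tt>unix_settings_candidates</tt>, its
parameters, and the two size constants are hypothetical names, and the exact
file name used "without the initial dot" in the NCBI directory (taken here to
be xxxrc) is an assumption.

```c
#include <stdio.h>

#define MAX_CANDIDATES 4
#define PATH_LEN 256

/* Hypothetical helper: fill paths[] with the places a UNIX settings file
 * with base name `base` would be looked for, in search order. `home` and
 * `ncbi_dir` stand for the home directory and the value of the NCBI
 * environment variable; either may be NULL if unknown or unset.
 * Returns the number of candidates written. */
static int unix_settings_candidates(const char *base, const char *home,
                                    const char *ncbi_dir,
                                    char paths[MAX_CANDIDATES][PATH_LEN])
{
    int n = 0;

    /* 1. current working directory */
    snprintf(paths[n++], PATH_LEN, "./.%src", base);

    /* 2. user's home directory */
    if (home != NULL)
        snprintf(paths[n++], PATH_LEN, "%s/.%src", home, base);

    /* 3. NCBI directory: first without, then with, the initial dot */
    if (ncbi_dir != NULL) {
        snprintf(paths[n++], PATH_LEN, "%s/%src", ncbi_dir, base);
        snprintf(paths[n++], PATH_LEN, "%s/.%src", ncbi_dir, base);
    }
    return n;
}
```

<p>A caller would typically pass the results of <tt>getenv("HOME")</tt> and
<tt>getenv("NCBI")</tt> and open the first candidate that exists. Because the
home directory comes before the shared NCBI directory, a per-user copy made
after a program-controlled change takes precedence over the common default.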
1104<p>On Mac OS X, it first looks for xxx.cnf in username/Library/Preferences,
1105then in package/Contents/Resources, where username is the user's home directory
1106and package is the application package. If it does not find the configuration
1107file, it then switches to UNIX style, looking for .xxxrc in the home directory
1108and then in the current directory. This way Mac OS X applications retain the
1109traditional Mac behavior but can also use UNIX-style configuration files.
1110<p>contents of <b>ASNLOAD</b> are in <b>ncbi/asnload</b>
1111<br>contents of <b>DATA </b>are in <b>ncbi/data</b>
1112<p>Automatic Generation of code to read and write new ASN.1 messages.
1113<br>(Previously, <b>ASNCODE USAGE</b>)
1114<p>'asntool' can now generate code for use as ASN.1 readers and writers.
1115<br>This functionality used to be in the program called 'asncode'. There
1116<br>is thus no longer any need for the *.l* files. An example of
1117how
1118<br>to generate this code follows:
1119<br>
1120<blockquote><tt>asntool -m YOURSPEC.asn -G -B genYOURSPEC</tt></blockquote>
1121
1122<p><br>Both genYOURSPEC.h and genYOURSPEC.c will be generated.
1123<p>Within ASN.1 definitions, types can be EXPORTed and IMPORTed.
1124<br>If YOURSPEC.asn imports definitions from otherspec.asn, then otherspec.asn
1125<br>has to be added to the -m parameter as below. Note that code is only
1126<br>generated for the first file.
1127<br>
1128<blockquote><tt>asntool -m YOURSPEC.asn<u>,</u>otherspec.asn -G -B genYOURSPEC</tt>
1129<br><tt>
1130^</tt></blockquote>
1131
1132<p><br>Notice the lack of a blank on either side of the underlined comma,
1133marked by the caret (^) above. This is important.
1134<br>
1135<p><br>
1136<center>
1137<p><b><font size=+1>MAJOR CHANGES FROM DOCUMENTATION</font></b></center>
1138
1139<p>AsnNode structures have proved to be generally useful and have been moved from
1140AsnLib to ncbimisc. In addition, some elements of structs used in
1141the object loaders were called "class" to match the ASN.1 names.
1142Because "class" is a C++ reserved word, all instances of "class" have been changed
1143to "_class".
1144<p>To conform to our naming conventions, we have changed the names appropriately:
1145<p><tt>AsnValue = DataVal</tt>
1146<br><tt>AsnNode = ValNode</tt>
1147<br><tt>class = _class</tt>
1148<p>A global search and replace of your code with these strings (not restricted
1149to whole words, since we want to change AsnNodePtr to ValNodePtr as well) should fix
1150any problems. Field names within structures have not been changed.
1151If your code uses only the object loaders, you may not find these strings
1152in your code at all.
1153<center>
1154<p><b><font size=+1>DATA ACCESS LIBRARIES</font></b></center>
1155
1156<p>cdromlib contains data access routines compatible with releases 1.0-6.0
1157of the Entrez CDROM. The documentation for these functions is out
1158of date. The routines in cdromlib have been split into entrez, sequence,
1159and medline access functions. The interface you should normally program
1160to is defined in accentr.[ch]. The form of these calls has been changed
1161to make them compatible with the NCBI network server, a client/server version
1162of data access. A program written to use these calls can access
1163the cdrom data, the network data, a combination, or that plus a local database
1164just by changing defines. The form of the api for these functions
1165has also been changed to better hide the details of storage and caching, so
1166that the different optimizations done to support cdrom and network access
1167are transparent to the application programmer. The end user tool
1168called "Entrez" now uses these libraries as its only means of data access
1169(i.e., you can write an application of your own with any or all of Entrez's
1170functionality using just these routines).
1171<br>
1172<p><br>
1173<center>
1174<p><b><font size=+1>NETWORK LIBRARIES</font></b></center>
1175
1176<p>The toolbox now includes NCBI "Network Services". This includes
1177everything which you need to build your own "Network Entrez" client software.
1178The network libraries include a generic network services library (nsclilib),
1179which is used to contact the network services dispatcher and connect to a
1180desired server. Note that some development platforms require that you
1181obtain a few source modules from external vendors. Look at the README
1182files contained in the network directory (network/*/README) for more details.
1183<br> <p><br> <center> <p><b><font size=+1>DOCUMENTATION</font></b></center>
1185
1186<p>We are rewriting the documentation to conform with all the new features
1187contained in this software. We will add it to the package as soon
1188as possible.
1189<br>
1190<p><br>
1191<center>
1192<p><b><font size=+1>DEMO PROGRAMS</font></b></center>
1193
1194<p>As with the tools, the demo directory contains a number of undocumented
1195programs that use a number of the utility functions in api.
1196There is also a demo program called "getseq" in the cdromlib directory
1197which retrieves a sequence from the cdrom given any valid sequence id.
1198These will be described in more detail in the next set of documentation.
1199<p>Briefly:
1200<p>asn2ff.c converts ASN.1 to GenBank flatfile
1201<br>asn2rpt.c converts ASN.1 to human readable report
1202<br>dosimple.c converts ASN.1 to a "simple sequence"
1203<br>getseq.c gets sequence from Entrez Cdrom using data access library,
1204writes to disk
1205<br>getfeat.c ditto, but writes sequence of any CdRegion features
1206to "test.out"
1207<br>getmesh.c documented
1208<br>getpub.c documented
1209<br>indexpub.c documented
1210<br>seqtest.c reads ASN.1 sequence, converts to iupac, reports segmented
1211sequences, outputs fasta format to seqtest.out
1212<br>testcore.c documented
1213<br>testobj.c tests Medline object loader, demonstrates error checking
1214using NULL asnio stream.
1215<br>entrez If Vibrant is installed, the full Entrez program is made.
1216<br>asndhuff Demonstrates streaming ASN.1 data from the huffman compressed
1217Entrez CDROM (only works on release 1.0 or later).
1218<br>entrcmd Standalone non-interactive tool for accessing Entrez
1219data.
1220<br>Entrcmd is the search engine used for NCBI's Entrez WWW server.
1221<br>asncode Tool for generating object loader source code given a
1222.l file which is the output of AsnTool.
1223<br>cdscan scans entrez cdrom, makes GenBank, GenPept, or FASTA format
1224output. Also has a slot for a replaceable CustomRoutine supplied by you.
1225Has two examples of such routines.
1226<center>
1227<p><b><font size=+1>CALLBACK CONVENTIONS</font></b></center>
1228
1229<p>The CoreLib, AsnLib, and Object Loader routines have been converted
1230to use the LIBCALL and LIBCALLBACK symbols (FAR PASCAL) on the PC for Windows.
1231This will allow us to build dynamic link libraries (DLLs) so that the code
1232can be accessed from languages other than C. Callback functions you
1233write that are of types AsnOptFreeFunc, AsnExpOptFunc, IoFuncType, AsnReadFunc,
1234AsnWriteFunc, and SeqEntryFunc, should be declared using the LIBCALLBACK
1235macro. For example, a callback used as an AsnOptFreeFunc should be declared
1236as follows:
1237<br>
1238<blockquote><tt>static Pointer LIBCALLBACK MyOptFreeFunc (Pointer);</tt></blockquote>
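<p>To show how such a declaration pairs with a definition, here is a small C
sketch that compiles without the toolkit. The <tt>LIBCALLBACK</tt> macro,
<tt>Pointer</tt>, and the <tt>AsnOptFreeFunc</tt> typedef below are stand-ins
inferred from the prototype above; in real code they come from the toolkit
headers and may differ in detail. The body of <tt>MyOptFreeFunc</tt> is purely
illustrative.

```c
#include <stdlib.h>

/* Stand-ins so the sketch compiles on its own. In real code these come
 * from the toolkit headers; the empty expansion is an assumption for
 * platforms other than Windows. */
#ifndef LIBCALLBACK
#define LIBCALLBACK
#endif
typedef void *Pointer;
typedef Pointer (LIBCALLBACK *AsnOptFreeFunc)(Pointer);

/* Prototype, declared with LIBCALLBACK as the text recommends. */
static Pointer LIBCALLBACK MyOptFreeFunc(Pointer data);

/* Illustrative body: release the data and return NULL to the caller.
 * The definition must repeat LIBCALLBACK so it matches the prototype. */
static Pointer LIBCALLBACK MyOptFreeFunc(Pointer data)
{
    free(data);
    return NULL;
}
```

<p>The point of the macro is that the calling convention travels with the
function type: a function defined without <tt>LIBCALLBACK</tt> would not be
assignable to an <tt>AsnOptFreeFunc</tt> variable on a platform where the
macro expands to a non-default convention.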
1239
1240<p><br>The SeqEntryFunc callback used by SeqEntryExplore has not yet been
1241modified to use the LIBCALLBACK type. This will be added in the near
1242future.
1243<br>
1244</body>
1245</html>
1246