| Name | Date | Size | #Lines | LOC |
|---|---|---|---|---|
| css/ | 03-May-2022 | - | 7,115 | 5,603 |
| html/ | 03-May-2022 | - | 6,701 | 5,551 |
| m4/ | 03-May-2022 | - | 9,063 | 8,191 |
| ASF-2.0 | 29-Dec-2018 | 11.1 KiB | 203 | 169 |
| AUTHORS | 29-Dec-2018 | 101 B | 3 | 2 |
| COPYING | 29-Dec-2018 | 453 B | 7 | 6 |
| ChangeLog | 29-Dec-2018 | 1 KiB | 70 | 35 |
| INSTALL | 29-Dec-2018 | 15.4 KiB | 369 | 287 |
| LGPL_V2 | 29-Dec-2018 | 24.7 KiB | 483 | 400 |
| Makefile.am | 29-Dec-2018 | 356 B | 14 | 8 |
| Makefile.in | 03-May-2022 | 33.7 KiB | 1,032 | 920 |
| NEWS | 29-Dec-2018 | 0 B | | |
| README | 29-Dec-2018 | 4.2 KiB | 154 | 110 |
| aclocal.m4 | 29-Dec-2018 | 41.6 KiB | 1,160 | 1,053 |
| compile | 29-Dec-2018 | 7.2 KiB | 349 | 259 |
| config.guess | 29-Dec-2018 | 43.1 KiB | 1,477 | 1,284 |
| config.h.in | 29-Dec-2018 | 2.4 KiB | 93 | 63 |
| config.sub | 29-Dec-2018 | 35.3 KiB | 1,802 | 1,661 |
| configure | 03-May-2022 | 577.5 KiB | 18,999 | 16,039 |
| configure.ac | 29-Dec-2018 | 937 B | 44 | 34 |
| depcomp | 29-Dec-2018 | 23 KiB | 792 | 502 |
| htmlcxx.cc | 29-Dec-2018 | 4.1 KiB | 207 | 179 |
| htmlcxx.pc.in | 29-Dec-2018 | 285 B | 13 | 11 |
| htmlcxx.spec | 29-Dec-2018 | 1.2 KiB | 51 | 40 |
| htmlcxx.vcproj | 29-Dec-2018 | 4 KiB | 171 | 170 |
| htmlcxxapp.vcproj | 29-Dec-2018 | 3.6 KiB | 145 | 144 |
| install-sh | 29-Dec-2018 | 15 KiB | 519 | 337 |
| ltmain.sh | 29-Dec-2018 | 316.5 KiB | 11,148 | 7,979 |
| missing | 29-Dec-2018 | 6.7 KiB | 216 | 143 |
| test-driver | 29-Dec-2018 | 4.5 KiB | 149 | 87 |
| wingetopt.c | 29-Dec-2018 | 8.3 KiB | 179 | 69 |
| wingetopt.h | 29-Dec-2018 | 590 B | 26 | 19 |
| ylwrap | 29-Dec-2018 | 6.7 KiB | 248 | 143 |
README
htmlcxx - html and css APIs for C++

---------------------------------------------

Preamble
========

This project was hosted on SourceForge for a decade and had previously been
copied elsewhere by hand. We finally moved it to GitHub and collected (some of)
the relevant patches and fixes that were floating around. Thanks a lot to those
who kept the project alive during this time.

Also, the library is considered frozen, and we only make occasional bug fixes.


Description
===========

htmlcxx is a simple non-validating css1 and html parser for C++.
Although there are several other html parsers available, htmlcxx has some
characteristics that make it unique:

- STL-like navigation of the DOM tree, using the excellent tree.hh library
  from Kasper Peeters
- It is possible to reproduce exactly, character by character, the
  original document from the parse tree
- Bundled css parser
- Optional parsing of attributes
- C++ code that looks like C++ (not so true anymore)
- Offsets of tags/elements in the original document are stored in the
  nodes of the DOM tree

The parsing policy of htmlcxx was designed to mimic the behavior of Mozilla
Firefox (http://www.mozilla.org), so you should expect parse trees similar to
those created by Firefox. Unlike Firefox, however, htmlcxx does not insert
elements that are not present in your html. Therefore, serializing the DOM
tree gives exactly the same bytes contained in the original HTML document.
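
To illustrate why storing offsets makes byte-exact reproduction possible,
here is a minimal, self-contained sketch. It does not use htmlcxx at all:
Span, split_spans and rebuild are hypothetical names invented for this
illustration, and the splitting rule (cut only at '<' and '>') is far cruder
than the real parser. The point is only that when every node records where
it starts and how long it is, and nothing is inserted or dropped, copying
the spans back out returns the original bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a parse-tree node (NOT htmlcxx's HTML::Node).
struct Span {
    std::size_t offset;  // where the piece starts in the original document
    std::size_t length;  // how many bytes it covers
    bool is_tag;         // true for "<...>" pieces, false for text
};

// Split a document into consecutive tag and text spans.
// Assumption: pieces are delimited only by '<' and '>'; no repair is done.
std::vector<Span> split_spans(const std::string& doc) {
    std::vector<Span> spans;
    std::size_t pos = 0;
    while (pos < doc.size()) {
        const bool is_tag = (doc[pos] == '<');
        std::size_t end = doc.find(is_tag ? '>' : '<', pos);
        if (end == std::string::npos) {
            end = doc.size();      // unterminated piece runs to the end
        } else if (is_tag) {
            ++end;                 // include the closing '>'
        }
        spans.push_back({pos, end - pos, is_tag});
        pos = end;
    }
    return spans;
}

// Rebuild the document by copying each span out of the original bytes.
std::string rebuild(const std::string& doc, const std::vector<Span>& spans) {
    std::string out;
    for (const Span& s : spans) {
        assert(s.offset + s.length <= doc.size());
        out.append(doc, s.offset, s.length);
    }
    return out;
}
```

Because the spans cover the input without gaps or overlaps, rebuild(doc,
split_spans(doc)) equals doc even for malformed input such as "a < b".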


News for version 0.87
=====================

Cherry-picked the non-Visual Studio fixes from
https://github.com/dhoerl/htmlcxx. Kudos to David Hoerl for the fixes and
for babysitting the code for so long.

News for version 0.86
=====================

Fixed a few compilation problems.

News for version 0.85
=====================

Fixed gcc 4.3 compiler errors, several minor bug fixes, improved distribution
of the css library.


News for version 0.7.3
======================

Added utility code to escape/decode urls as defined by RFC 2396.
Added new SAX interface. The API was slightly broken to support the new
SAX interface :-(.
Added Visual Studio 2003 projects for the WIN32 port.


Examples
========

Using htmlcxx is quite simple. Take a look
at this example.

-----------------------------------------------------------------------

  #include <htmlcxx/html/ParserDom.h>
  #include <iostream>
  #include <string>
  #include <strings.h>  // strcasecmp (POSIX)
  ...
  using namespace std;
  using namespace htmlcxx;

  //Parse some html code
  string html = "<html><body>hey</body></html>";
  HTML::ParserDom parser;
  tree<HTML::Node> dom = parser.parseTree(html);

  //Print whole DOM tree
  cout << dom << endl;

  //Dump all links in the tree
  tree<HTML::Node>::iterator it = dom.begin();
  tree<HTML::Node>::iterator end = dom.end();
  for (; it != end; ++it)
  {
    if (strcasecmp(it->tagName().c_str(), "A") == 0)
    {
      //Attributes are only parsed on request
      it->parseAttributes();
      cout << it->attribute("href").second << endl;
    }
  }

  //Dump all text of the document
  it = dom.begin();
  end = dom.end();
  for (; it != end; ++it)
  {
    if ((!it->isTag()) && (!it->isComment()))
    {
      cout << it->text();
    }
  }
  cout << endl;

-------------------------------------------------
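
The link loop above compares tag names with strcasecmp, which is a POSIX
function and may not be available on every compiler. As a hedged sketch,
a portable replacement could look like the following. iequals is a
hypothetical helper written for this illustration; it is not part of
htmlcxx, and it only handles ASCII case folding.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not an htmlcxx function): ASCII case-insensitive
// equality, usable where strcasecmp(...) == 0 is not available.
bool iequals(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // Cast through unsigned char: std::tolower is undefined for
        // negative values other than EOF.
        const int ca = std::tolower(static_cast<unsigned char>(a[i]));
        const int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb) return false;
    }
    return true;
}
```

With this helper, the test in the loop would read
iequals(it->tagName(), "a") instead of the strcasecmp call.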


The htmlcxx application
=======================

htmlcxx is the name of both the library and the utility application that
comes with this package. Although htmlcxx (the application) is mostly useless
for programming, you can use it to easily see how htmlcxx (the library) would
parse your html code. Just install it and try htmlcxx -h.


Downloads
=========

Use the project page at SourceForge: http://sf.net/projects/htmlcxx


License Stuff
=============

Code is now under the LGPL. This was our initial intention, and is
now possible thanks to the author of tree.hh, who allowed us to use it
under the LGPL only for HTML::Node template instances. Check
http://www.fsf.org or the COPYING file in the distribution for details
about the LGPL. The uri parsing code is a derivative work of the
Apache web server uri parsing routines. Check
www.apache.org/licenses/LICENSE-2.0 or the ASF-2.0 file in the
distribution for details.

----------------------------------------

Enjoy!

Davi de Castro Reis - <davi (a) users sf net>

Robson Braga Araújo - <braga (a) users sf net>

Last Updated: Tue Dec 8 20:37:41 BRST 2015
