Name               Date         Size       #Lines  LOC
..                 03-May-2022  -
css/               03-May-2022  -          7,115   5,603
html/              03-May-2022  -          6,701   5,551
m4/                03-May-2022  -          9,063   8,191
ASF-2.0            29-Dec-2018  11.1 KiB   203     169
AUTHORS            29-Dec-2018  101        3       2
COPYING            29-Dec-2018  453        7       6
ChangeLog          29-Dec-2018  1 KiB      70      35
INSTALL            29-Dec-2018  15.4 KiB   369     287
LGPL_V2            29-Dec-2018  24.7 KiB   483     400
Makefile.am        29-Dec-2018  356        14      8
Makefile.in        03-May-2022  33.7 KiB   1,032   920
NEWS               29-Dec-2018  0
README             29-Dec-2018  4.2 KiB    154     110
aclocal.m4         29-Dec-2018  41.6 KiB   1,160   1,053
compile            29-Dec-2018  7.2 KiB    349     259
config.guess       29-Dec-2018  43.1 KiB   1,477   1,284
config.h.in        29-Dec-2018  2.4 KiB    93      63
config.sub         29-Dec-2018  35.3 KiB   1,802   1,661
configure          03-May-2022  577.5 KiB  18,999  16,039
configure.ac       29-Dec-2018  937        44      34
depcomp            29-Dec-2018  23 KiB     792     502
htmlcxx.cc         29-Dec-2018  4.1 KiB    207     179
htmlcxx.pc.in      29-Dec-2018  285        13      11
htmlcxx.spec       29-Dec-2018  1.2 KiB    51      40
htmlcxx.vcproj     29-Dec-2018  4 KiB      171     170
htmlcxxapp.vcproj  29-Dec-2018  3.6 KiB    145     144
install-sh         29-Dec-2018  15 KiB     519     337
ltmain.sh          29-Dec-2018  316.5 KiB  11,148  7,979
missing            29-Dec-2018  6.7 KiB    216     143
test-driver        29-Dec-2018  4.5 KiB    149     87
wingetopt.c        29-Dec-2018  8.3 KiB    179     69
wingetopt.h        29-Dec-2018  590        26      19
ylwrap             29-Dec-2018  6.7 KiB    248     143

README

htmlcxx - html and css APIs for C++

---------------------------------------------

	Preamble
	========

This project was hosted on SourceForge for a decade and had previously been
copied elsewhere by hand. We finally moved it to GitHub and collected (some
of) the relevant patches and fixes that were floating around. Thanks a lot to
those who kept the project alive during this time.

The library is now considered frozen, and we only make occasional bug fixes.


	Description
	===========

htmlcxx is a simple non-validating CSS1 and HTML parser for C++.
Although several other HTML parsers are available, htmlcxx has some
characteristics that make it unique:

- STL-like navigation of the DOM tree, using the excellent tree.hh library
  by Kasper Peeters
- It is possible to reproduce exactly, character by character, the
  original document from the parse tree
- Bundled CSS parser
- Optional parsing of attributes
- C++ code that looks like C++ (not so true anymore)
- Offsets of tags/elements in the original document are stored in the
  nodes of the DOM tree

The parsing policies of htmlcxx were designed to mimic the behavior of
Mozilla Firefox (http://www.mozilla.org), so you should expect parse trees
similar to those created by Firefox. Unlike Firefox, however, htmlcxx does
not insert non-existent elements into your HTML. Therefore, serializing
the DOM tree yields exactly the same bytes as the original HTML document.


	News for version 0.87
	=====================

Cherry-picked the non-Visual Studio fixes from
https://github.com/dhoerl/htmlcxx. Kudos to David Hoerl for the fixes and
for babysitting the code for so long.

	News for version 0.86
	=====================

Fixed a few compilation problems.

	News for version 0.85
	=====================

Fixed gcc 4.3 compiler errors, fixed several minor bugs, and improved the
distribution of the css library.


	News for version 0.7.3
	======================

Added utility code to escape/decode URLs as defined by RFC 2396.
Added a new SAX interface. The API was changed slightly, breaking
compatibility, to support it :-(.
Added Visual Studio 2003 projects for the WIN32 port.


	Examples
	========

Using htmlcxx is quite simple. Take a look at this example.

-----------------------------------------------------------------------

  #include <htmlcxx/html/ParserDom.h>
  #include <strings.h>  // strcasecmp (POSIX)
  #include <iostream>
  #include <string>
  #include <utility>

  using namespace std;
  using namespace htmlcxx;

  int main()
  {
    //Parse some html code
    string html = "<html><body>hey</body></html>";
    HTML::ParserDom parser;
    tree<HTML::Node> dom = parser.parseTree(html);

    //Print whole DOM tree
    cout << dom << endl;

    //Dump all links in the tree
    tree<HTML::Node>::iterator it = dom.begin();
    tree<HTML::Node>::iterator end = dom.end();
    for (; it != end; ++it)
    {
      if (strcasecmp(it->tagName().c_str(), "A") == 0)
      {
        //Attributes are only parsed on request
        it->parseAttributes();
        //attribute() returns a (found, value) pair
        pair<bool, string> href = it->attribute("href");
        if (href.first) cout << href.second << endl;
      }
    }

    //Dump all text of the document
    for (it = dom.begin(); it != end; ++it)
    {
      if ((!it->isTag()) && (!it->isComment()))
      {
        cout << it->text();
      }
    }
    cout << endl;
    return 0;
  }

-----------------------------------------------------------------------


	The htmlcxx application
	=======================

htmlcxx is the name of both the library and the utility application that
comes with this package. Although htmlcxx (the application) is mostly
useless for programming, you can use it to easily see how htmlcxx (the
library) would parse your HTML code. Just install it and try htmlcxx -h.


	Downloads
	=========

Use the project page at SourceForge: http://sf.net/projects/htmlcxx


	License Stuff
	=============

The code is now under the LGPL. This was our initial intention, and is
now possible thanks to the author of tree.hh, who allowed us to use it
under the LGPL only for HTML::Node template instances. Check
http://www.fsf.org or the COPYING file in the distribution for details
about the LGPL license. The URI parsing code is a derivative work of the
Apache web server URI parsing routines. Check
www.apache.org/licenses/LICENSE-2.0 or the ASF-2.0 file in the
distribution for details.

----------------------------------------

Enjoy!

Davi de Castro Reis - <davi (a) users sf net>

Robson Braga Araújo - <braga (a) users sf net>

Last Updated: Tue Dec  8 20:37:41 BRST 2015