# quick-xml

[![Build Status](https://travis-ci.org/tafia/quick-xml.svg?branch=master)](https://travis-ci.org/tafia/quick-xml)
[![Crate](http://meritbadge.herokuapp.com/quick-xml)](https://crates.io/crates/quick-xml)

High-performance XML pull reader/writer.

The reader:
- is almost zero-copy (it uses `Cow` whenever possible)
- is easy on memory allocation (the API provides a way to reuse buffers)
- supports various encodings (with the `encoding` feature), namespace resolution and special characters.
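
The "almost zero-copy" point above relies on `std::borrow::Cow`: data that needs no transformation is handed back as a borrowed slice of the input, and a new `String` is allocated only when something actually has to change. The sketch below illustrates the idea with a hypothetical `unescape_minimal` helper that handles only the five predefined XML entities. It is not quick-xml's API, and a real parser would report unknown entities instead of passing them through.

```rust
use std::borrow::Cow;

/// The five entities predefined by XML and the character each one stands for.
const ENTITIES: [(&str, char); 5] = [
    ("&amp;", '&'),
    ("&lt;", '<'),
    ("&gt;", '>'),
    ("&quot;", '"'),
    ("&apos;", '\''),
];

/// Hypothetical helper (not quick-xml's API): replaces predefined entities.
/// Input without any `&` is returned borrowed, so no allocation happens.
fn unescape_minimal(input: &str) -> Cow<'_, str> {
    if !input.contains('&') {
        // Fast path: nothing to replace, hand back a view of the input.
        return Cow::Borrowed(input);
    }
    // Slow path: build a new string only because something must change.
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(pos) = rest.find('&') {
        out.push_str(&rest[..pos]);
        let tail = &rest[pos..];
        match ENTITIES.iter().find(|&&(name, _)| tail.starts_with(name)) {
            Some(&(name, ch)) => {
                out.push(ch);
                rest = &tail[name.len()..];
            }
            None => {
                // Unknown entity: keep the `&` as-is (a real parser would error).
                out.push('&');
                rest = &tail[1..];
            }
        }
    }
    out.push_str(rest);
    Cow::Owned(out)
}
```

For example, `unescape_minimal("Test 2")` returns `Cow::Borrowed("Test 2")` without allocating, while `unescape_minimal("a &lt; b")` returns an owned `"a < b"`.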

[docs.rs](https://docs.rs/quick-xml)

Syntax is inspired by [xml-rs](https://github.com/netvl/xml-rs).

## Example

### Reader

```rust
use quick_xml::Reader;
use quick_xml::events::Event;

let xml = r#"<tag1 att1 = "test">
                <tag2><!--Test comment-->Test</tag2>
                <tag2>
                    Test 2
                </tag2>
            </tag1>"#;

let mut reader = Reader::from_str(xml);
reader.trim_text(true);

let mut count = 0;
let mut txt = Vec::new();
let mut buf = Vec::new();

// The `Reader` does not implement `Iterator` because it outputs borrowed data (`Cow`s)
loop {
    match reader.read_event(&mut buf) {
        Ok(Event::Start(ref e)) => {
            match e.name() {
                b"tag1" => println!("attributes values: {:?}",
                                    e.attributes().map(|a| a.unwrap().value).collect::<Vec<_>>()),
                b"tag2" => count += 1,
                _ => (),
            }
        },
        Ok(Event::Text(e)) => txt.push(e.unescape_and_decode(&reader).unwrap()),
        Ok(Event::Eof) => break, // exits the loop when reaching end of file
        Err(e) => panic!("Error at position {}: {:?}", reader.buffer_position(), e),
        _ => (), // There are several other `Event`s we do not consider here
    }

    // if we don't keep a borrow elsewhere, we can clear the buffer to keep memory usage low
    buf.clear();
}
```
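
The `buf.clear()` at the end of the loop is what keeps memory usage low: one `Vec<u8>` is reused across iterations instead of accumulating the whole input. The same pattern can be shown with the standard library alone. The sketch below is a hypothetical illustration, not how quick-xml is implemented: `count_tag2` splits the input naively at every `>` (so it would be fooled by a `>` inside an attribute value or a comment) and only serves to show one buffer being reused.

```rust
use std::io::{self, BufRead, Cursor};

/// Hypothetical illustration (not quick-xml's implementation): counts `<tag2`
/// openings while reading the input in `>`-terminated chunks into ONE buffer.
/// Returns the count and the largest number of bytes the buffer ever held.
fn count_tag2(xml: &str) -> io::Result<(usize, usize)> {
    let mut input = Cursor::new(xml.as_bytes());
    let mut buf = Vec::new();
    let (mut count, mut peak) = (0, 0);
    loop {
        // `read_until` appends the next chunk (up to and including `>`) to `buf`.
        if input.read_until(b'>', &mut buf)? == 0 {
            break; // end of input
        }
        peak = peak.max(buf.len());
        // Anything borrowed from `buf` must be used before the `clear()` below.
        if buf.windows(5).any(|w| w == b"<tag2") {
            count += 1;
        }
        // Drop the contents but keep the allocation for the next chunk.
        buf.clear();
    }
    Ok((count, peak))
}
```

`count_tag2("<a><tag2>x</tag2><tag2/></a>")` returns `Ok((2, 8))`: two `<tag2` openings, and the buffer never held more than the 8-byte chunk `x</tag2>`, even though the input is 28 bytes long.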

### Writer

```rust
use quick_xml::Writer;
use quick_xml::Reader;
use quick_xml::events::{Event, BytesEnd, BytesStart};
use std::io::Cursor;

let xml = r#"<this_tag k1="v1" k2="v2"><child>text</child></this_tag>"#;
let mut reader = Reader::from_str(xml);
reader.trim_text(true);
let mut writer = Writer::new(Cursor::new(Vec::new()));
let mut buf = Vec::new();
loop {
    match reader.read_event(&mut buf) {
        Ok(Event::Start(ref e)) if e.name() == b"this_tag" => {
            // creates a new element ... alternatively we could reuse `e` by calling
            // `e.into_owned()`
            let mut elem = BytesStart::owned(b"my_elem".to_vec(), "my_elem".len());

            // copy the existing attributes
            elem.extend_attributes(e.attributes().map(|attr| attr.unwrap()));

            // add a new my-key="some value" attribute
            elem.push_attribute(("my-key", "some value"));

            // write the event to the writer
            assert!(writer.write_event(Event::Start(elem)).is_ok());
        },
        Ok(Event::End(ref e)) if e.name() == b"this_tag" => {
            assert!(writer.write_event(Event::End(BytesEnd::borrowed(b"my_elem"))).is_ok());
        },
        Ok(Event::Eof) => break,
        // you can use either `e` or `&e` if you don't want to move the event
        Ok(e) => assert!(writer.write_event(&e).is_ok()),
        Err(e) => panic!("Error at position {}: {:?}", reader.buffer_position(), e),
    }
    buf.clear();
}

let result = writer.into_inner().into_inner();
let expected = r#"<my_elem k1="v1" k2="v2" my-key="some value"><child>text</child></my_elem>"#;
assert_eq!(result, expected.as_bytes());
```

## Serde

When the `serialize` feature is enabled, quick-xml can be used with serde's `Serialize`/`Deserialize` traits.

Here is an example that deserializes the HTML source of crates.io:

```rust
// Cargo.toml
// [dependencies]
// serde = { version = "1.0", features = [ "derive" ] }
// quick-xml = { version = "0.21", features = [ "serialize" ] }
extern crate serde;
extern crate quick_xml;

use serde::Deserialize;
use quick_xml::de::{from_str, DeError};

#[derive(Debug, Deserialize, PartialEq)]
struct Link {
    rel: String,
    href: String,
    sizes: Option<String>,
}

#[derive(Debug, Deserialize, PartialEq)]
#[serde(rename_all = "lowercase")]
enum Lang {
    En,
    Fr,
    De,
}

#[derive(Debug, Deserialize, PartialEq)]
struct Head {
    title: String,
    #[serde(rename = "link", default)]
    links: Vec<Link>,
}

#[derive(Debug, Deserialize, PartialEq)]
struct Script {
    src: String,
    integrity: String,
}

#[derive(Debug, Deserialize, PartialEq)]
struct Body {
    #[serde(rename = "script", default)]
    scripts: Vec<Script>,
}

#[derive(Debug, Deserialize, PartialEq)]
struct Html {
    lang: Option<String>,
    head: Head,
    body: Body,
}

fn crates_io() -> Result<Html, DeError> {
    let xml = "<!DOCTYPE html>
        <html lang=\"en\">
          <head>
            <meta charset=\"utf-8\">
            <meta http-equiv=\"X-UA-Compatible\" content=\"IE=edge\">
            <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">

            <title>crates.io: Rust Package Registry</title>


        <!-- EMBER_CLI_FASTBOOT_TITLE --><!-- EMBER_CLI_FASTBOOT_HEAD -->
            <link rel=\"manifest\" href=\"/manifest.webmanifest\">
            <link rel=\"apple-touch-icon\" href=\"/cargo-835dd6a18132048a52ac569f2615b59d.png\" sizes=\"227x227\">

            <link rel=\"stylesheet\" href=\"/assets/vendor-8d023d47762d5431764f589a6012123e.css\" integrity=\"sha256-EoB7fsYkdS7BZba47+C/9D7yxwPZojsE4pO7RIuUXdE= sha512-/SzGQGR0yj5AG6YPehZB3b6MjpnuNCTOGREQTStETobVRrpYPZKneJwcL/14B8ufcvobJGFDvnTKdcDDxbh6/A==\" >
            <link rel=\"stylesheet\" href=\"/assets/cargo-cedb8082b232ce89dd449d869fb54b98.css\" integrity=\"sha256-S9K9jZr6nSyYicYad3JdiTKrvsstXZrvYqmLUX9i3tc= sha512-CDGjy3xeyiqBgUMa+GelihW394pqAARXwsU+HIiOotlnp1sLBVgO6v2ZszL0arwKU8CpvL9wHyLYBIdfX92YbQ==\" >


            <link rel=\"shortcut icon\" href=\"/favicon.ico\" type=\"image/x-icon\">
            <link rel=\"icon\" href=\"/cargo-835dd6a18132048a52ac569f2615b59d.png\" type=\"image/png\">
            <link rel=\"search\" href=\"/opensearch.xml\" type=\"application/opensearchdescription+xml\" title=\"Cargo\">
          </head>
          <body>
            <!-- EMBER_CLI_FASTBOOT_BODY -->
            <noscript>
              <div id=\"main\">
                <div class='noscript'>
                  This site requires JavaScript to be enabled.
                </div>
              </div>
            </noscript>

            <script src=\"/assets/vendor-bfe89101b20262535de5a5ccdc276965.js\" integrity=\"sha256-U12Xuwhz1bhJXWyFW/hRr+Wa8B6FFDheTowik5VLkbw= sha512-J/cUUuUN55TrdG8P6Zk3/slI0nTgzYb8pOQlrXfaLgzr9aEumr9D1EzmFyLy1nrhaDGpRN1T8EQrU21Jl81pJQ==\" ></script>
            <script src=\"/assets/cargo-4023b68501b7b3e17b2bb31f50f5eeea.js\" integrity=\"sha256-9atimKc1KC6HMJF/B07lP3Cjtgr2tmET8Vau0Re5mVI= sha512-XJyBDQU4wtA1aPyPXaFzTE5Wh/mYJwkKHqZ/Fn4p/ezgdKzSCFu6FYn81raBCnCBNsihfhrkb88uF6H5VraHMA==\" ></script>

          </body>
        </html>";
    let html: Html = from_str(xml)?;
    assert_eq!(&html.head.title, "crates.io: Rust Package Registry");
    Ok(html)
}
```

### Credits

This has largely been inspired by [serde-xml-rs](https://github.com/RReverser/serde-xml-rs).
quick-xml follows its convention for deserialization, including the
[`$value`](https://github.com/RReverser/serde-xml-rs#parsing-the-value-of-a-tag) special name.

### Parsing the "value" of a tag

If you have an input of the form `<foo abc="xyz">bar</foo>`, and you want to get at the `bar`, you can use the special name `$value`:

```rust,ignore
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Foo {
    pub abc: String,
    #[serde(rename = "$value")]
    pub body: String,
}
```

### Performance

Note that despite not focusing on performance (there are several unnecessary copies), it remains about 10x faster than serde-xml-rs.

## Features

- `encoding`: supports non-UTF-8 XML documents
- `serialize`: supports serde `Serialize`/`Deserialize`

## Performance

Benchmarking is hard and the results depend on your input file and your machine.

On my particular file, quick-xml is around **50 times faster** than the [xml-rs](https://crates.io/crates/xml-rs) crate.

```
// quick-xml benches
test bench_quick_xml                ... bench:     198,866 ns/iter (+/- 9,663)
test bench_quick_xml_escaped        ... bench:     282,740 ns/iter (+/- 61,625)
test bench_quick_xml_namespaced     ... bench:     389,977 ns/iter (+/- 32,045)

// same bench with xml-rs
test bench_xml_rs                   ... bench:  14,468,930 ns/iter (+/- 321,171)

// serde-xml-rs vs serialize feature
test bench_serde_quick_xml          ... bench:   1,181,198 ns/iter (+/- 138,290)
test bench_serde_xml_rs             ... bench:  15,039,564 ns/iter (+/- 783,485)
```
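
The speed-up factors implied by these results can be recomputed from the mean timings. This is only arithmetic on the numbers above (the `+/-` spreads are ignored), and the names `speedup` and `print_speedups` are illustrative, not part of the benchmark suite.

```rust
/// Mean timings in ns/iter, copied from the benchmark output above.
const XML_RS_NS: f64 = 14_468_930.0;
const QUICK_XML_NS: [(&str, f64); 3] = [
    ("bench_quick_xml", 198_866.0),
    ("bench_quick_xml_escaped", 282_740.0),
    ("bench_quick_xml_namespaced", 389_977.0),
];
const SERDE_XML_RS_NS: f64 = 15_039_564.0;
const SERDE_QUICK_XML_NS: f64 = 1_181_198.0;

/// How many times faster the `fast` run is than the `slow` one
/// (a plain ratio of means; the `+/-` spreads are ignored).
fn speedup(slow_ns: f64, fast_ns: f64) -> f64 {
    slow_ns / fast_ns
}

/// Prints 72.8x, 51.2x and 37.1x for the three reader benches, then 12.7x for serde.
fn print_speedups() {
    for &(name, ns) in QUICK_XML_NS.iter() {
        println!("{}: {:.1}x faster than xml-rs", name, speedup(XML_RS_NS, ns));
    }
    println!(
        "serde: {:.1}x faster than serde-xml-rs",
        speedup(SERDE_XML_RS_NS, SERDE_QUICK_XML_NS)
    );
}
```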

For a feature and performance comparison, you can also have a look at RazrFalcon's [parser comparison table](https://github.com/RazrFalcon/roxmltree#parsing).

## Contribute

Any PR is welcome!

## License

MIT