SweetXml
========

`SweetXml` is a thin wrapper around `:xmerl`. It allows you to convert a
`string` or an `xmlElement` record, as defined in `:xmerl`, to an Elixir value
such as a `map`, `list`, `char_list`, or any combination of these.


## Examples

Given an XML document such as the one below

```xml
<?xml version="1.05" encoding="UTF-8"?>
<game>
  <matchups>
    <matchup winner-id="1">
      <name>Match One</name>
      <teams>
        <team>
          <id>1</id>
          <name>Team One</name>
        </team>
        <team>
          <id>2</id>
          <name>Team Two</name>
        </team>
      </teams>
    </matchup>
    <matchup winner-id="2">
      <name>Match Two</name>
      <teams>
        <team>
          <id>2</id>
          <name>Team Two</name>
        </team>
        <team>
          <id>3</id>
          <name>Team Three</name>
        </team>
      </teams>
    </matchup>
    <matchup winner-id="1">
      <name>Match Three</name>
      <teams>
        <team>
          <id>1</id>
          <name>Team One</name>
        </team>
        <team>
          <id>3</id>
          <name>Team Three</name>
        </team>
      </teams>
    </matchup>
  </matchups>
</game>
```
We can do the following:

```elixir
import SweetXml
doc = "..." # as above
```

Get the name of the first match:

```elixir
result = doc |> xpath(~x"//matchup/name/text()") # `sigil_x` for (x)path
assert result == 'Match One'
```

Get the XML record of the name of the first match:

```elixir
result = doc |> xpath(~x"//matchup/name"e) # `e` is the modifier for (e)ntity
assert result == {:xmlElement, :name, :name, [], {:xmlNamespace, [], []},
        [matchup: 2, matchups: 2, game: 1], 2, [],
        [{:xmlText, [name: 2, matchup: 2, matchups: 2, game: 1], 1, [],
          'Match One', :text}], [],
        ...}
```

Get the full list of matchup names:

```elixir
result = doc |> xpath(~x"//matchup/name/text()"l) # `l` stands for (l)ist
assert result == ['Match One', 'Match Two', 'Match Three']
```

Get a list of matchups with a different map structure:

```elixir
result = doc |> xpath(
  ~x"//matchups/matchup"l,
  name: ~x"./name/text()",
  winner: [
    ~x".//team/id[.=ancestor::matchup/@winner-id]/..",
    name: ~x"./name/text()"
  ]
)
assert result == [
  %{name: 'Match One', winner: %{name: 'Team One'}},
  %{name: 'Match Two', winner: %{name: 'Team Two'}},
  %{name: 'Match Three', winner: %{name: 'Team One'}}
]
```

Or use `xmap` to return a map shaped to your liking directly:

```elixir
result = doc |> xmap(
  matchups: [
    ~x"//matchups/matchup"l,
    name: ~x"./name/text()",
    winner: [
      ~x".//team/id[.=ancestor::matchup/@winner-id]/..",
      name: ~x"./name/text()"
    ]
  ],
  last_matchup: [
    ~x"//matchups/matchup[last()]",
    name: ~x"./name/text()",
    winner: [
      ~x".//team/id[.=ancestor::matchup/@winner-id]/..",
      name: ~x"./name/text()"
    ]
  ]
)
assert result == %{
  matchups: [
    %{name: 'Match One', winner: %{name: 'Team One'}},
    %{name: 'Match Two', winner: %{name: 'Team Two'}},
    %{name: 'Match Three', winner: %{name: 'Team One'}}
  ],
  last_matchup: %{name: 'Match Three', winner: %{name: 'Team One'}}
}
```

## The ~x Sigil

In the above examples, we used the expression `~x"//some/path"` to define the
path. The sigil lets us specify more precisely what is returned.

 * `~x"//some/path"`

    Without any modifiers, `xpath/2` returns the value of the entity if
    the entity is of type `xmlText`, `xmlAttribute`, `xmlPI`, or `xmlComment`,
    as defined in `:xmerl`.

 * `~x"//some/path"e`

    `e` stands for (e)ntity. This forces `xpath/2` to return the entity,
    which you can pass to further `xpath/2` calls.

 * `~x"//some/path"l`

    `l` stands for (l)ist. This forces `xpath/2` to return a list. Without
    `l`, `xpath/2` returns only the first element of the match.

 * `~x"//some/path"el` - mix of the above

 * `~x"//some/path"s`

    `s` stands for (s)tring. This forces `xpath/2` to return the value as a
    string instead of a char list.

 * `~x"//some/path"sl` - string list.
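As a rough illustration of the modifiers listed above, the sketch below runs similar paths with different modifiers against a trimmed-down version of the document from the Examples section. The two-matchup `doc` is made up for this example, and the snippet assumes `:sweet_xml` is already a dependency of your project.

```elixir
# Assumes :sweet_xml is already a dependency of the project.
import SweetXml

# Hypothetical, trimmed-down document used only for this illustration.
doc = """
<game>
  <matchups>
    <matchup winner-id="1"><name>Match One</name></matchup>
    <matchup winner-id="2"><name>Match Two</name></matchup>
  </matchups>
</game>
"""

# No modifier: value of the first match, as a char list.
first_name = doc |> xpath(~x"//matchup/name/text()")

# `s`: value of the first match, as a string.
first_name_s = doc |> xpath(~x"//matchup/name/text()"s)

# `sl`: values of all matches, as a list of strings.
all_names = doc |> xpath(~x"//matchup/name/text()"sl)

# `e`: the first `matchup` entity, which is then passed to another `xpath/2` call.
winner_id =
  doc
  |> xpath(~x"//matchup"e)
  |> xpath(~x"./@winner-id"s)
```

Here `first_name` is `'Match One'`, `first_name_s` is `"Match One"`, `all_names` is `["Match One", "Match Two"]`, and `winner_id` is `"1"`.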

Also, in the examples section, we always import `SweetXml` first. This
makes `sigil_x` available in the current scope. Without the import, instead of
using `~x`, you can use the `%SweetXpath{}` struct:

```elixir
assert ~x"//some/path"e == %SweetXpath{path: '//some/path', is_value: false, is_string: false, is_list: false}
```

Note the use of a char_list in the path definition.
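A minimal sketch of using the struct without importing `SweetXml`. It assumes that the fields left unset keep their default values, and in particular that `is_value` defaults to `true`; that is an assumption about the library, not something stated above. The small `doc` is hypothetical.

```elixir
# Assumes :sweet_xml is already a dependency of the project.
# There is no `import SweetXml` here, so `~x` is not available and the
# function is called with its module name.

# Hypothetical document used only for this illustration.
doc = "<list><li>One</li><li>Two</li></list>"

# Build the path by hand; note the char list (single quotes) for `path`.
# Fields that are not set are assumed to keep their defaults.
spec = %SweetXpath{path: '//li/text()', is_list: true}

items = SweetXml.xpath(doc, spec)
```

Under that assumption, `items` should be the text of every `li` as a list of char lists, the same as `~x"//li/text()"l` would return.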

## From Chaining to Nesting

Here's a brief explanation of how nesting came about.

### Chaining

Both `xpath` and `xmap` can take an `:xmerl` XML record as the first argument.
Therefore you can chain calls to these functions like below:

```elixir
doc
|> xpath(~x"//li"l)
|> Enum.map(fn li_node ->
  %{
    name: li_node |> xpath(~x"./name/text()"),
    age: li_node |> xpath(~x"./age/text()")
  }
end)
```

### Mapping to a structure

Since the previous example is such a common use case, SweetXml lets you
simply do the following:

```elixir
doc
|> xpath(
  ~x"//li"l,
  name: ~x"./name/text()",
  age: ~x"./age/text()"
)
```
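To make the equivalence concrete, the sketch below runs both the chained form and the mapping form against a small made-up document and collects the two results. The `people` XML and its values are hypothetical, and the snippet assumes `:sweet_xml` is already a dependency of your project.

```elixir
# Assumes :sweet_xml is already a dependency of the project.
import SweetXml

# Hypothetical document used only for this illustration.
doc = """
<people>
  <li><name>Ann</name><age>31</age></li>
  <li><name>Bob</name><age>27</age></li>
</people>
"""

# Chaining: select the `li` entities, then query each one by hand.
chained =
  doc
  |> xpath(~x"//li"l)
  |> Enum.map(fn li_node ->
    %{
      name: li_node |> xpath(~x"./name/text()"),
      age: li_node |> xpath(~x"./age/text()")
    }
  end)

# Mapping: let `xpath/3` build the same maps from a keyword list of sub-paths.
mapped =
  doc
  |> xpath(
    ~x"//li"l,
    name: ~x"./name/text()",
    age: ~x"./age/text()"
  )
```

Both `chained` and `mapped` evaluate to `[%{name: 'Ann', age: '31'}, %{name: 'Bob', age: '27'}]`.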

### Nesting

Sometimes what you want is more complex than that, so SweetXml also allows
nesting:

```elixir
doc
|> xpath(
  ~x"//li"l,
  name: [
    ~x"./name",
    first: ~x"./first/text()",
    last: ~x"./last/text()"
  ],
  age: ~x"./age/text()"
)
```
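For a sense of what the nested form returns, here is a small sketch run against a made-up document. The `people` XML is hypothetical, and the `s` modifier is used only so the values come back as strings.

```elixir
# Assumes :sweet_xml is already a dependency of the project.
import SweetXml

# Hypothetical document used only for this illustration.
doc = """
<people>
  <li>
    <name><first>Ann</first><last>Lee</last></name>
    <age>31</age>
  </li>
  <li>
    <name><first>Bob</first><last>Ray</last></name>
    <age>27</age>
  </li>
</people>
"""

people =
  doc
  |> xpath(
    ~x"//li"l,
    # The nested keyword list is evaluated relative to each `./name` entity.
    name: [
      ~x"./name",
      first: ~x"./first/text()"s,
      last: ~x"./last/text()"s
    ],
    age: ~x"./age/text()"s
  )
```

Each `li` becomes one map, with the nested spec producing a nested map: `people` is `[%{name: %{first: "Ann", last: "Lee"}, age: "31"}, %{name: %{first: "Bob", last: "Ray"}, age: "27"}]`.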

For more examples, please take a look at the tests and help.

## Streaming

`SweetXml` now also supports streaming in various forms. Here's a sample XML doc.
Notice that some XML tags span multiple lines.

```xml
<?xml version="1.05" encoding="UTF-8"?>
<html>
  <head>
    <title>XML Parsing</title>
    <head><title>Nested Head</title></head>
  </head>
  <body>
    <p>Neato €</p><ul>
      <li class="first star" data-index="1">
        First</li><li class="second">Second
      </li><li
            class="third">Third</li>
    </ul>
    <div>
      <ul>
        <li>Forth</li>
      </ul>
    </div>
    <special_match_key>first star</special_match_key>
  </body>
</html>
```

### Working with `File.stream!`

Working with streams is exactly the same as working with binaries.

```elixir
File.stream!("file_above.xml") |> xpath(...)
```

### `SweetXml` element streaming

Once you have a file stream, you may want to save memory by not working with
the entire document at once.

```elixir
file_stream = File.stream!("file_above.xml")

result = file_stream
|> stream_tags([:li, :special_match_key])
|> Stream.map(fn {_, doc} ->
  # each emitted element can be queried with `xpath/2` on its own
  xpath(doc, ~x"./text()")
end)
|> Enum.to_list()

assert result == ['\n        First', 'Second\n      ', 'Third', 'Forth', 'first star']
```
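The example above ignores the first element of each emitted tuple. The sketch below keeps it, assuming it is the name of the matched tag as an atom. The temporary file and its contents are made up for this illustration, and the snippet assumes `:sweet_xml` is already a dependency of your project.

```elixir
# Assumes :sweet_xml is already a dependency of the project.
import SweetXml

# Hypothetical file written only for this illustration.
path = Path.join(System.tmp_dir!(), "sweet_xml_stream_example.xml")

File.write!(path, """
<html>
  <body>
    <ul>
      <li>First</li>
      <li>Second</li>
    </ul>
    <special_match_key>first star</special_match_key>
  </body>
</html>
""")

tagged =
  path
  |> File.stream!()
  |> stream_tags([:li, :special_match_key])
  |> Stream.map(fn {tag, element} ->
    # `tag` is assumed to be the matched element's name;
    # the `s` modifier returns its text as a string.
    {tag, xpath(element, ~x"./text()"s)}
  end)
  |> Enum.to_list()

# Clean up the temporary file.
File.rm!(path)
```

Under that assumption, `tagged` should be a keyword list in document order, pairing each tag name with its text, which makes it easy to tell `li` items apart from `special_match_key` when several tags are streamed together.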
290