| Name | Date | Size |
| --- | --- | --- |
| .. | 03-May-2022 | - |
| bench/ | 03-May-2022 | - |
| doc/ | 04-Feb-2020 | - |
| misc/git/ | 04-Feb-2020 | - |
| testdata/ | 03-May-2022 | - |
| .gitattributes | 04-Feb-2020 | 29 |
| .gitignore | 04-Feb-2020 | 151 |
| .travis.yml | 04-Feb-2020 | 177 |
| LICENSE | 04-Feb-2020 | 1.5 KiB |
| README.md | 04-Feb-2020 | 12.6 KiB |
| array.go | 04-Feb-2020 | 3.8 KiB |
| array_test.go | 04-Feb-2020 | 5.3 KiB |
| bench_array_test.go | 04-Feb-2020 | 1.9 KiB |
| bench_example_test.go | 04-Feb-2020 | 922 |
| bench_expand_test.go | 04-Feb-2020 | 1.6 KiB |
| bench_filter_test.go | 04-Feb-2020 | 3.8 KiB |
| bench_iteration_test.go | 04-Feb-2020 | 959 |
| bench_property_test.go | 04-Feb-2020 | 763 |
| bench_query_test.go | 04-Feb-2020 | 1.7 KiB |
| bench_traversal_test.go | 04-Feb-2020 | 13.7 KiB |
| doc.go | 04-Feb-2020 | 4.6 KiB |
| example_test.go | 04-Feb-2020 | 1.9 KiB |
| expand.go | 04-Feb-2020 | 2.7 KiB |
| expand_test.go | 04-Feb-2020 | 2.8 KiB |
| filter.go | 04-Feb-2020 | 5.8 KiB |
| filter_test.go | 04-Feb-2020 | 4.9 KiB |
| go.mod | 04-Feb-2020 | 153 |
| go.sum | 04-Feb-2020 | 792 |
| iteration.go | 04-Feb-2020 | 1.4 KiB |
| iteration_test.go | 04-Feb-2020 | 1.9 KiB |
| manipulation.go | 04-Feb-2020 | 17.9 KiB |
| manipulation_test.go | 04-Feb-2020 | 12.6 KiB |
| property.go | 04-Feb-2020 | 6.4 KiB |
| property_test.go | 04-Feb-2020 | 5.9 KiB |
| query.go | 04-Feb-2020 | 1.7 KiB |
| query_test.go | 04-Feb-2020 | 2.3 KiB |
| traversal.go | 04-Feb-2020 | 27.6 KiB |
| traversal_test.go | 04-Feb-2020 | 20.4 KiB |
| type.go | 04-Feb-2020 | 4.4 KiB |
| type_test.go | 04-Feb-2020 | 3.6 KiB |
| utilities.go | 04-Feb-2020 | 4.6 KiB |
| utilities_test.go | 04-Feb-2020 | 2.6 KiB |

README.md

# goquery - a little like that j-thing, only in Go
[![build status](https://secure.travis-ci.org/PuerkitoBio/goquery.svg?branch=master)](http://travis-ci.org/PuerkitoBio/goquery) [![GoDoc](https://godoc.org/github.com/PuerkitoBio/goquery?status.png)](http://godoc.org/github.com/PuerkitoBio/goquery) [![Sourcegraph Badge](https://sourcegraph.com/github.com/PuerkitoBio/goquery/-/badge.svg)](https://sourcegraph.com/github.com/PuerkitoBio/goquery?badge)

goquery brings a syntax and a set of features similar to [jQuery][] to the [Go language][go]. It is based on Go's [net/html package][html] and the CSS Selector library [cascadia][]. Since the net/html parser returns nodes, and not a full-featured DOM tree, jQuery's stateful manipulation functions (like height(), css(), detach()) have been left off.

Also, because the net/html parser requires UTF-8 encoding, so does goquery: it is the caller's responsibility to ensure that the source document provides UTF-8 encoded HTML. See the [wiki][] for various options to do this.

Syntax-wise, it is as close as possible to jQuery, with the same function names when possible, and that warm and fuzzy chainable interface. jQuery being the ultra-popular library that it is, I felt that a similar HTML-manipulating library was better off following its API than starting anew (in the same spirit as Go's `fmt` package), even though some of its methods are less than intuitive (looking at you, [index()][index]...).

## Table of Contents

* [Installation](#installation)
* [Changelog](#changelog)
* [API](#api)
* [Examples](#examples)
* [Related Projects](#related-projects)
* [Support](#support)
* [License](#license)

## Installation

Please note that because of the net/html dependency, goquery requires Go1.1+.

    $ go get github.com/PuerkitoBio/goquery

(optional) To run unit tests:

    $ cd $GOPATH/src/github.com/PuerkitoBio/goquery
    $ go test

(optional) To run benchmarks (warning: it runs for a few minutes):

    $ cd $GOPATH/src/github.com/PuerkitoBio/goquery
    $ go test -bench=".*"

## Changelog

**Note that goquery's API is now stable, and will not break.**

*    **2020-02-04 (v1.5.1)** : Update module dependencies.
*    **2018-11-15 (v1.5.0)** : Go module support (thanks @Zaba505).
*    **2018-06-07 (v1.4.1)** : Add `NewDocumentFromReader` examples.
*    **2018-03-24 (v1.4.0)** : Deprecate `NewDocument(url)` and `NewDocumentFromResponse(response)`.
*    **2018-01-28 (v1.3.0)** : Add `ToEnd` constant to `Slice` until the end of the selection (thanks to @davidjwilkins for raising the issue).
*    **2018-01-11 (v1.2.0)** : Add `AddBack*` and deprecate `AndSelf` (thanks to @davidjwilkins).
*    **2017-02-12 (v1.1.0)** : Add `SetHtml` and `SetText` (thanks to @glebtv).
*    **2016-12-29 (v1.0.2)** : Optimize allocations for `Selection.Text` (thanks to @radovskyb).
*    **2016-08-28 (v1.0.1)** : Optimize performance for large documents.
*    **2016-07-27 (v1.0.0)** : Tag version 1.0.0.
*    **2016-06-15** : Invalid selector strings internally compile to a `Matcher` implementation that never matches any node (instead of a panic). So for example, `doc.Find("~")` returns an empty `*Selection` object.
*    **2016-02-02** : Add `NodeName` utility function similar to the DOM's `nodeName` property. It returns the tag name of the first element in a selection, and other relevant values of non-element nodes (see godoc for details). Add `OuterHtml` utility function similar to the DOM's `outerHTML` property (named `OuterHtml` in small caps for consistency with the existing `Html` method on the `Selection`).
*    **2015-04-20** : Add `AttrOr` helper method to return the attribute's value or a default value if absent. Thanks to [piotrkowalczuk][piotr].
*    **2015-02-04** : Add more manipulation functions - Prepend* - thanks again to [Andrew Stone][thatguystone].
*    **2014-11-28** : Add more manipulation functions - ReplaceWith*, Wrap* and Unwrap - thanks again to [Andrew Stone][thatguystone].
*    **2014-11-07** : Add manipulation functions (thanks to [Andrew Stone][thatguystone]) and `*Matcher` functions, that receive compiled cascadia selectors instead of selector strings, thus avoiding potential panics thrown by goquery via `cascadia.MustCompile` calls. This results in better performance (selectors can be compiled once and reused) and more idiomatic error handling (you can handle cascadia's compilation errors, instead of recovering from panics, which had been bugging me for a long time). Note that the actual type expected is a `Matcher` interface, that `cascadia.Selector` implements. Other matcher implementations could be used.
*    **2014-11-06** : Change import paths of net/html to golang.org/x/net/html (see https://groups.google.com/forum/#!topic/golang-nuts/eD8dh3T9yyA). Make sure to update your code to use the new import path too when you call goquery with `html.Node`s.
*    **v0.3.2** : Add `NewDocumentFromReader()` (thanks jweir) which allows creating a goquery document from an io.Reader.
*    **v0.3.1** : Add `NewDocumentFromResponse()` (thanks assassingj) which allows creating a goquery document from an http response.
*    **v0.3.0** : Add `EachWithBreak()` which allows breaking out of an `Each()` loop by returning false. This function was added instead of changing the existing `Each()` to avoid breaking compatibility.
*    **v0.2.1** : Make go-getable, now that [go.net/html is Go1.0-compatible][gonet] (thanks to @matrixik for pointing this out).
*    **v0.2.0** : Add support for negative indices in Slice(). **BREAKING CHANGE** `Document.Root` is removed, `Document` is now a `Selection` itself (a selection of one, the root element, just like `Document.Root` was before). Add jQuery's Closest() method.
*    **v0.1.1** : Add benchmarks to use as baseline for refactorings, refactor Next...() and Prev...() methods to use the new html package's linked list features (Next/PrevSibling, FirstChild). Good performance boost (40+% in some cases).
*    **v0.1.0** : Initial release.

## API

goquery exposes two structs, `Document` and `Selection`, and the `Matcher` interface. Unlike jQuery, which is loaded as part of a DOM document, and thus acts on its containing document, goquery doesn't know which HTML document to act upon. So it needs to be told, and that's what the `Document` type is for. It holds the root document node as the initial Selection value to manipulate.

jQuery often has many variants for the same function (no argument, a selector string argument, a jQuery object argument, a DOM element argument, ...). Instead of exposing the same features in goquery as a single method with variadic empty interface arguments, statically-typed signatures are used following this naming convention:

*    When the jQuery equivalent can be called with no argument, it has the same name as jQuery for the no argument signature (e.g.: `Prev()`), and the version with a selector string argument is called `XxxFiltered()` (e.g.: `PrevFiltered()`)
*    When the jQuery equivalent **requires** one argument, the same name as jQuery is used for the selector string version (e.g.: `Is()`)
*    The signatures accepting a jQuery object as argument are defined in goquery as `XxxSelection()` and take a `*Selection` object as argument (e.g.: `FilterSelection()`)
*    The signatures accepting a DOM element as argument in jQuery are defined in goquery as `XxxNodes()` and take a variadic argument of type `*html.Node` (e.g.: `FilterNodes()`)
*    The signatures accepting a function as argument in jQuery are defined in goquery as `XxxFunction()` and take a function as argument (e.g.: `FilterFunction()`)
*    The goquery methods that can be called with a selector string have a corresponding version that takes a `Matcher` interface and is defined as `XxxMatcher()` (e.g.: `IsMatcher()`)

Utility functions that are not in jQuery but are useful in Go are implemented as functions (that take a `*Selection` as parameter), to avoid a potential naming clash on the `*Selection`'s methods (reserved for jQuery-equivalent behaviour).

The complete [godoc reference documentation can be found here][doc].

Please note that Cascadia's selectors do not necessarily match all supported selectors of jQuery (Sizzle). See the [cascadia project][cascadia] for details. Invalid selector strings compile to a `Matcher` that fails to match any node. Behaviour of the various functions that take a selector string as argument follows from that fact, e.g. (where `~` is an invalid selector string):

* `Find("~")` returns an empty selection because the selector string doesn't match anything.
* `Add("~")` returns a new selection that holds the same nodes as the original selection, because it didn't add any node (selector string didn't match anything).
* `ParentsFiltered("~")` returns an empty selection because the selector string doesn't match anything.
* `ParentsUntil("~")` returns all parents of the selection because the selector string didn't match any element to stop before the top element.

## Examples

See some tips and tricks in the [wiki][].

Adapted from example_test.go:

```Go
package main

import (
  "fmt"
  "log"
  "net/http"

  "github.com/PuerkitoBio/goquery"
)

func ExampleScrape() {
  // Request the HTML page.
  res, err := http.Get("http://metalsucks.net")
  if err != nil {
    log.Fatal(err)
  }
  defer res.Body.Close()
  if res.StatusCode != 200 {
    log.Fatalf("status code error: %d %s", res.StatusCode, res.Status)
  }

  // Load the HTML document
  doc, err := goquery.NewDocumentFromReader(res.Body)
  if err != nil {
    log.Fatal(err)
  }

  // Find the review items
  doc.Find(".sidebar-reviews article .content-block").Each(func(i int, s *goquery.Selection) {
    // For each item found, get the band and title
    band := s.Find("a").Text()
    title := s.Find("i").Text()
    fmt.Printf("Review %d: %s - %s\n", i, band, title)
  })
}

func main() {
  ExampleScrape()
}
```

## Related Projects

- [Goq][goq], an HTML deserialization and scraping library based on goquery and struct tags.
- [andybalholm/cascadia][cascadia], the CSS selector library used by goquery.
- [suntong/cascadia][cascadiacli], a command-line interface to the cascadia CSS selector library, useful to test selectors.
- [gocolly/colly](https://github.com/gocolly/colly), a lightning fast and elegant Scraping Framework
- [gnulnx/goperf](https://github.com/gnulnx/goperf), a website performance test tool that also fetches static assets.
- [MontFerret/ferret](https://github.com/MontFerret/ferret), declarative web scraping.
- [tacusci/berrycms](https://github.com/tacusci/berrycms), a modern simple to use CMS with easy to write plugins
- [Dataflow kit](https://github.com/slotix/dataflowkit), Web Scraping framework for Gophers.
- [Geziyor](https://github.com/geziyor/geziyor), a fast web crawling & scraping framework for Go. Supports JS rendering.

## Support

There are a number of ways you can support the project:

* Use it, star it, build something with it, spread the word!
  - If you do build something open-source or otherwise publicly-visible, let me know so I can add it to the [Related Projects](#related-projects) section!
* Raise issues to improve the project (note: doc typos and clarifications are issues too!)
  - Please search existing issues before opening a new one - it may have already been addressed.
* Pull requests: please discuss new code in an issue first, unless the fix is really trivial.
  - Make sure new code is tested.
  - Be mindful of existing code - PRs that break existing code have a high probability of being declined, unless they fix a serious issue.

If you desperately want to send money my way, I have a BuyMeACoffee.com page:

<a href="https://www.buymeacoffee.com/mna" target="_blank"><img src="https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png" alt="Buy Me A Coffee" style="height: 41px !important;width: 174px !important;box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;-webkit-box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;" ></a>

## License

The [BSD 3-Clause license][bsd], the same as the [Go language][golic]. Cascadia's license is [here][caslic].

[jquery]: http://jquery.com/
[go]: http://golang.org/
[cascadia]: https://github.com/andybalholm/cascadia
[cascadiacli]: https://github.com/suntong/cascadia
[bsd]: http://opensource.org/licenses/BSD-3-Clause
[golic]: http://golang.org/LICENSE
[caslic]: https://github.com/andybalholm/cascadia/blob/master/LICENSE
[doc]: http://godoc.org/github.com/PuerkitoBio/goquery
[index]: http://api.jquery.com/index/
[gonet]: https://github.com/golang/net/
[html]: http://godoc.org/golang.org/x/net/html
[wiki]: https://github.com/PuerkitoBio/goquery/wiki/Tips-and-tricks
[thatguystone]: https://github.com/thatguystone
[piotr]: https://github.com/piotrkowalczuk
[goq]: https://github.com/andrewstuart/goq