# soup
[![Build Status](https://travis-ci.org/anaskhan96/soup.svg?branch=master)](https://travis-ci.org/anaskhan96/soup)
[![GoDoc](https://godoc.org/github.com/anaskhan96/soup?status.svg)](https://godoc.org/github.com/anaskhan96/soup)
[![Go Report Card](https://goreportcard.com/badge/github.com/anaskhan96/soup)](https://goreportcard.com/report/github.com/anaskhan96/soup)

**Web Scraper in Go, similar to BeautifulSoup**

*soup* is a small web scraping package for Go, with an interface very similar to that of BeautifulSoup.

Exported variables and functions implemented so far:
```go
var Headers map[string]string // Set headers as a map of key-value pairs, an alternative to calling Header() individually
var Cookies map[string]string // Set cookies as a map of key-value pairs, an alternative to calling Cookie() individually
func Get(string) (string, error) {} // Takes a URL as an argument, returns the HTML string
func GetWithClient(string, *http.Client) (string, error) {} // Takes a URL and a custom HTTP client as arguments, returns the HTML string
func Header(string, string) {} // Takes a key-value pair to set as a header for the HTTP request made in Get()
func Cookie(string, string) {} // Takes a key-value pair to set as a cookie to be sent with the HTTP request in Get()
func HTMLParse(string) Root {} // Takes the HTML string as an argument, returns the root of the constructed DOM
func Find([]string) Root {} // Takes an element tag and optional attribute key-value pair as arguments; returns the first occurrence matched
func FindAll([]string) []Root {} // Same as Find(), but returns all occurrences
func FindStrict([]string) Root {} // Same as Find(), but attribute values must match exactly
func FindAllStrict([]string) []Root {} // Same as FindStrict(), but returns all occurrences
func FindNextSibling() Root {} // Returns the next sibling of the element in the DOM
func FindNextElementSibling() Root {} // Returns the next element sibling of the element in the DOM
func FindPrevSibling() Root {} // Returns the previous sibling of the element in the DOM
func FindPrevElementSibling() Root {} // Returns the previous element sibling of the element in the DOM
func Children() []Root {} // Returns all direct children of this DOM element
func Attrs() map[string]string {} // Returns a map of all the element's attributes to their respective values
func Text() string {} // Returns the full text inside a non-nested tag, or only the first part of the text in a nested one
func FullText() string {} // Returns the full text inside a nested or non-nested tag
func SetDebug(bool) {} // Sets the debug mode to true or false; false by default
```

`Root` is a struct containing three fields:
* `Pointer`, a pointer to the current HTML node
* `NodeValue`, the current HTML node's value, i.e. the tag name for an ElementNode, or the text in the case of a TextNode
* `Error`, the error if one occurs, else `nil`

## Installation
Install the package with:
```bash
go get github.com/anaskhan96/soup
```

## Example
The example below scrapes the "Comics I Enjoy" section (text and links) from [xkcd](https://xkcd.com).

[More Examples](https://github.com/anaskhan96/soup/tree/master/examples)
```go
package main

import (
	"fmt"
	"os"

	"github.com/anaskhan96/soup"
)

func main() {
	resp, err := soup.Get("https://xkcd.com")
	if err != nil {
		os.Exit(1)
	}
	doc := soup.HTMLParse(resp)
	links := doc.Find("div", "id", "comicLinks").FindAll("a")
	for _, link := range links {
		fmt.Println(link.Text(), "| Link :", link.Attrs()["href"])
	}
}
```

## Contributions
This package was developed in my free time; however, contributions from everybody in the community are welcome to make it a better web scraper. If you think a particular feature or function should be included in the package, feel free to open an issue or a pull request.