HDFS for Go
===========

[![GoDoc](https://godoc.org/github.com/colinmarc/hdfs/web?status.svg)](https://godoc.org/github.com/colinmarc/hdfs) [![build](https://travis-ci.org/colinmarc/hdfs.svg?branch=master)](https://travis-ci.org/colinmarc/hdfs)

This is a native Go client for HDFS. It connects directly to the namenode using
the protocol buffers API.

It tries to be idiomatic by aping the stdlib `os` package where possible, and
it uses that package's types, including `os.FileInfo` and `os.PathError`.

Here's what it looks like in action:

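```go
client, err := hdfs.New("namenode:8020")
if err != nil {
	log.Fatal(err)
}

file, err := client.Open("/mobydick.txt")
if err != nil {
	log.Fatal(err)
}
defer file.Close()

// Read 59 bytes starting at byte offset 48847.
buf := make([]byte, 59)
if _, err := file.ReadAt(buf, 48847); err != nil {
	log.Fatal(err)
}

fmt.Println(string(buf))
// => Abominable are the tumblers into which he pours his poison.
```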

For complete documentation, check out the [Godoc][1].

The `hdfs` Binary
-----------------

Along with the library, this repo contains a command-line client for HDFS. Like
the library, its primary aim is to be idiomatic, by enabling your favorite Unix
verbs:

    $ hdfs --help
    Usage: hdfs COMMAND
    The flags available are a subset of the POSIX ones, but should behave similarly.

    Valid commands:
      ls [-lah] [FILE]...
      rm [-rf] FILE...
      mv [-fT] SOURCE... DEST
      mkdir [-p] FILE...
      touch [-amc] FILE...
      chmod [-R] OCTAL-MODE FILE...
      chown [-R] OWNER[:GROUP] FILE...
      cat SOURCE...
      head [-n LINES | -c BYTES] SOURCE...
      tail [-n LINES | -c BYTES] SOURCE...
      du [-sh] FILE...
      checksum FILE...
      get SOURCE [DEST]
      getmerge SOURCE DEST
      put SOURCE DEST

Since it doesn't have to wait for the JVM to start up, it's also a lot faster
than `hadoop fs`:

    $ time hadoop fs -ls / > /dev/null

    real    0m2.218s
    user    0m2.500s
    sys     0m0.376s

    $ time hdfs ls / > /dev/null

    real    0m0.015s
    user    0m0.004s
    sys     0m0.004s

Best of all, it comes with bash tab completion for paths!

Installing the command-line client
----------------------------------

Grab a tarball from the [releases page](https://github.com/colinmarc/hdfs/releases)
and unpack it wherever you like.

To configure the client, make sure one or both of these environment variables
point to your Hadoop configuration (`core-site.xml` and `hdfs-site.xml`). On
systems with Hadoop installed, they should already be set.

    $ export HADOOP_HOME="/etc/hadoop"
    $ export HADOOP_CONF_DIR="/etc/hadoop/conf"

To install tab completion globally on Linux, copy or link the `bash_completion`
file that comes with the tarball into the right place:

    $ ln -sT bash_completion /etc/bash_completion.d/gohdfs

By default on non-kerberized clusters, the HDFS user is set to the
currently-logged-in user. You can override this with another environment
variable:

    $ export HADOOP_USER_NAME=username
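
The override rule described above can be sketched in a few lines. This is only an illustration of the stated behavior, not the client's actual code; `effectiveUser` and `loginName` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"os/user"
)

// effectiveUser returns HADOOP_USER_NAME when it is set and non-empty,
// and otherwise falls back to whatever current() reports.
func effectiveUser(getenv func(string) string, current func() (string, error)) (string, error) {
	if name := getenv("HADOOP_USER_NAME"); name != "" {
		return name, nil
	}
	return current()
}

// loginName reports the username of the currently logged-in user.
func loginName() (string, error) {
	u, err := user.Current()
	if err != nil {
		return "", err
	}
	return u.Username, nil
}

func main() {
	name, err := effectiveUser(os.Getenv, loginName)
	if err != nil {
		fmt.Println("could not determine user:", err)
		return
	}
	fmt.Println("HDFS user:", name)
}
```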

Using the command-line client with Kerberos authentication
----------------------------------------------------------

Like `hadoop fs`, the command-line client expects a `ccache` file in the default
location: `/tmp/krb5cc_<uid>`. That means it should 'just work' to use `kinit`:

    $ kinit bob@EXAMPLE.com
    $ hdfs ls /

If that doesn't work, try setting the `KRB5CCNAME` environment variable to
wherever you have the `ccache` saved.
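
If you want to check which credential cache would be picked up, the rule above can be approximated as follows. This is a sketch under assumptions: `ccachePath` is a hypothetical helper, and stripping a leading `FILE:` type prefix reflects the MIT Kerberos naming convention, not anything confirmed about this client.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// ccachePath returns the credential cache file to look for: the value of
// KRB5CCNAME when it is set, otherwise the default /tmp/krb5cc_<uid>.
func ccachePath(krb5ccname string, uid int) string {
	if krb5ccname != "" {
		// MIT Kerberos allows a "FILE:" type prefix on the cache name.
		return strings.TrimPrefix(krb5ccname, "FILE:")
	}
	return fmt.Sprintf("/tmp/krb5cc_%d", uid)
}

func main() {
	path := ccachePath(os.Getenv("KRB5CCNAME"), os.Getuid())
	if _, err := os.Stat(path); err != nil {
		fmt.Println("no credential cache found; try running kinit:", err)
		return
	}
	fmt.Println("using credential cache", path)
}
```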

Compatibility
-------------

This library uses "Version 9" of the HDFS protocol, which means it should work
with Hadoop distributions based on 2.2.x and above. The tests run against CDH
5.x and HDP 2.x.

Acknowledgements
----------------

This library is heavily indebted to [snakebite][3].

[1]: https://godoc.org/github.com/colinmarc/hdfs
[2]: https://golang.org/doc/install
[3]: https://github.com/spotify/snakebite
