.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements.  See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership.  The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License.  You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied.  See the License for the
.. specific language governing permissions and limitations
.. under the License.

Filesystem Interface (legacy)
=============================

.. warning::
   This section documents the deprecated filesystem layer.  You should
   use the :ref:`new filesystem layer <filesystem>` instead.

.. _hdfs:

Hadoop File System (HDFS)
-------------------------

PyArrow comes with bindings to a C++-based interface to the Hadoop File
System. You connect like so:

.. code-block:: python

   import pyarrow as pa

   # host, port, user and ticket_cache_path are placeholders for your
   # cluster's connection settings
   fs = pa.hdfs.connect(host, port, user=user, kerb_ticket=ticket_cache_path)
   with fs.open(path, 'rb') as f:
       data = f.read()  # do something with the file contents

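Once connected, the filesystem handle can be used directly or passed to
other PyArrow readers. The following is a minimal sketch, assuming a
reachable cluster; the paths and the Parquet file are placeholders:

.. code-block:: python

   import pyarrow as pa
   import pyarrow.parquet as pq

   # 'default' resolves the host from fs.defaultFS in core-site.xml
   fs = pa.hdfs.connect('default', 0)

   # write a small file and read it back
   with fs.open('/tmp/example.txt', 'wb') as f:
       f.write(b'hello from pyarrow')
   with fs.open('/tmp/example.txt', 'rb') as f:
       print(f.read())

   # Parquet files on HDFS can be read through the same handle
   table = pq.read_table('/dataset/data.parquet', filesystem=fs)
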
By default, ``pyarrow.hdfs.HadoopFileSystem`` uses libhdfs, a JNI-based
interface to the Java Hadoop client. This library is loaded **at runtime**
(rather than at link / library load time, since the library may not be in your
``LD_LIBRARY_PATH``), and relies on some environment variables:

* ``HADOOP_HOME``: the root of your installed Hadoop distribution. Often has
  ``lib/native/libhdfs.so``.

* ``JAVA_HOME``: the location of your Java SDK installation.

* ``ARROW_LIBHDFS_DIR`` (optional): explicit location of ``libhdfs.so`` if it is
  installed somewhere other than ``$HADOOP_HOME/lib/native``.

* ``CLASSPATH``: must contain the Hadoop jars. You can set these using:

.. code-block:: shell

    export CLASSPATH=`$HADOOP_HOME/bin/hdfs classpath --glob`

If ``CLASSPATH`` is not set, then it will be set automatically if the
``hadoop`` executable is in your system path, or if ``HADOOP_HOME`` is set.

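Because these variables only need to be visible to the Python process that
loads libhdfs, you can also set them programmatically before the first call
to ``pyarrow.hdfs.connect``. A minimal sketch; the install locations below
are placeholders for your machine:

.. code-block:: python

   import os
   import subprocess

   # placeholder install locations -- adjust for your system
   os.environ['HADOOP_HOME'] = '/opt/hadoop'
   os.environ['JAVA_HOME'] = '/usr/lib/jvm/java-8-openjdk-amd64'

   # build CLASSPATH the same way as the shell snippet above
   hdfs_bin = os.path.join(os.environ['HADOOP_HOME'], 'bin', 'hdfs')
   classpath = subprocess.check_output([hdfs_bin, 'classpath', '--glob'])
   os.environ['CLASSPATH'] = classpath.decode('utf-8').strip()
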
You can also use libhdfs3, a third-party C++ library for HDFS from Pivotal Labs:

.. code-block:: python

   fs = pa.hdfs.connect(host, port, user=user, kerb_ticket=ticket_cache_path,
                        driver='libhdfs3')

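The connected filesystem also exposes the usual file management operations
(listed in the API summary below). A brief sketch with placeholder paths,
assuming ``fs`` was created as above and a local file ``local.csv`` exists:

.. code-block:: python

   fs.mkdir('/tmp/demo')

   # copy a local file to HDFS and back
   with open('local.csv', 'rb') as source:
       fs.upload('/tmp/demo/data.csv', source)
   with open('copy.csv', 'wb') as sink:
       fs.download('/tmp/demo/data.csv', sink)

   print(fs.ls('/tmp/demo'))             # directory listing
   print(fs.info('/tmp/demo/data.csv'))  # dict with size, kind, ...
   fs.delete('/tmp/demo', recursive=True)
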
HDFS API
~~~~~~~~

.. currentmodule:: pyarrow

.. autosummary::
   :toctree: generated/

   hdfs.connect
   HadoopFileSystem.cat
   HadoopFileSystem.chmod
   HadoopFileSystem.chown
   HadoopFileSystem.delete
   HadoopFileSystem.df
   HadoopFileSystem.disk_usage
   HadoopFileSystem.download
   HadoopFileSystem.exists
   HadoopFileSystem.get_capacity
   HadoopFileSystem.get_space_used
   HadoopFileSystem.info
   HadoopFileSystem.ls
   HadoopFileSystem.mkdir
   HadoopFileSystem.open
   HadoopFileSystem.rename
   HadoopFileSystem.rm
   HadoopFileSystem.upload
   HdfsFile