.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License. You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied. See the License for the
.. specific language governing permissions and limitations
.. under the License.

Filesystem Interface (legacy)
=============================

.. warning::
   This section documents the deprecated filesystem layer. You should
   use the :ref:`new filesystem layer <filesystem>` instead.

.. _hdfs:

Hadoop File System (HDFS)
-------------------------

PyArrow comes with bindings to a C++-based interface to the Hadoop File
System. You connect like so:

.. code-block:: python

   import pyarrow as pa
   fs = pa.hdfs.connect(host, port, user=user, kerb_ticket=ticket_cache_path)
   with fs.open(path, 'rb') as f:
       # Do something with f

By default, ``pyarrow.hdfs.HadoopFileSystem`` uses libhdfs, a JNI-based
interface to the Java Hadoop client. This library is loaded **at runtime**
(rather than at link / library load time, since the library may not be in your
``LD_LIBRARY_PATH``), and relies on the following environment variables:

* ``HADOOP_HOME``: the root of your installed Hadoop distribution. Often
  contains ``lib/native/libhdfs.so``.

* ``JAVA_HOME``: the location of your Java SDK installation.

* ``ARROW_LIBHDFS_DIR`` (optional): explicit location of ``libhdfs.so`` if it is
  installed somewhere other than ``$HADOOP_HOME/lib/native``.

* ``CLASSPATH``: must contain the Hadoop jars. You can set it using:

.. code-block:: shell

   export CLASSPATH=`$HADOOP_HOME/bin/hdfs classpath --glob`

If ``CLASSPATH`` is not set, it will be set automatically if the
``hadoop`` executable is in your system path or if ``HADOOP_HOME`` is set.

You can also use libhdfs3, a third-party C++ library for HDFS from Pivotal Labs:

.. code-block:: python

   fs = pa.hdfs.connect(host, port, user=user, kerb_ticket=ticket_cache_path,
                        driver='libhdfs3')

HDFS API
~~~~~~~~

.. currentmodule:: pyarrow

.. autosummary::
   :toctree: generated/

   hdfs.connect
   HadoopFileSystem.cat
   HadoopFileSystem.chmod
   HadoopFileSystem.chown
   HadoopFileSystem.delete
   HadoopFileSystem.df
   HadoopFileSystem.disk_usage
   HadoopFileSystem.download
   HadoopFileSystem.exists
   HadoopFileSystem.get_capacity
   HadoopFileSystem.get_space_used
   HadoopFileSystem.info
   HadoopFileSystem.ls
   HadoopFileSystem.mkdir
   HadoopFileSystem.open
   HadoopFileSystem.rename
   HadoopFileSystem.rm
   HadoopFileSystem.upload
   HdfsFile
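
For illustration, here is a minimal sketch exercising a few of the calls listed
above. The host, port, user, and paths are hypothetical placeholders; adapt
them to your own cluster.

.. code-block:: python

   import pyarrow as pa

   # Hypothetical connection details; replace with your own cluster settings.
   fs = pa.hdfs.connect('namenode', 8020, user='hdfsuser')

   # Create a working directory and upload a local file into it.
   fs.mkdir('/tmp/pyarrow-example')
   with open('local-data.csv', 'rb') as f:
       fs.upload('/tmp/pyarrow-example/data.csv', f)

   # Inspect what was written.
   print(fs.ls('/tmp/pyarrow-example'))
   print(fs.info('/tmp/pyarrow-example/data.csv'))

   # Download the file back to local disk, then clean up.
   with open('roundtrip.csv', 'wb') as f:
       fs.download('/tmp/pyarrow-example/data.csv', f)
   fs.delete('/tmp/pyarrow-example', recursive=True)

These calls go through the deprecated ``pyarrow.hdfs`` layer; prefer the
:ref:`new filesystem layer <filesystem>` for new code.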