| Name | Date | Size | #Lines | LOC |
| --- | --- | --- | --- | --- |
| `..` | 03-May-2022 | - | | |
| `R/` | 23-Feb-2021 | - | 9,819 | 5,235 |
| `data-raw/` | 23-Feb-2021 | - | 248 | 169 |
| `extra-tests/` | 23-Feb-2021 | - | 278 | 176 |
| `inst/` | 03-May-2022 | - | 199 | 130 |
| `man/` | 23-Feb-2021 | - | 3,485 | 3,023 |
| `src/` | 23-Feb-2021 | - | 12,031 | 9,034 |
| `tests/` | 23-Feb-2021 | - | 8,463 | 6,220 |
| `tools/` | 23-Feb-2021 | - | 646 | 487 |
| `vignettes/` | 23-Feb-2021 | - | 1,276 | 1,015 |
| `.Rbuildignore` | 23-Feb-2021 | 331 | 27 | 26 |
| `.gitignore` | 23-Feb-2021 | 196 | 21 | 20 |
| `DESCRIPTION` | 23-Feb-2021 | 2.5 KiB | 100 | 99 |
| `Makefile` | 23-Feb-2021 | 2.1 KiB | 53 | 28 |
| `NAMESPACE` | 23-Feb-2021 | 7.7 KiB | 316 | 314 |
| `NEWS.md` | 23-Feb-2021 | 21.4 KiB | 312 | 210 |
| `README.md` | 23-Feb-2021 | 9.7 KiB | 248 | 192 |
| `_pkgdown.yml` | 23-Feb-2021 | 3.6 KiB | 162 | 159 |
| `arrow.Rproj` | 23-Feb-2021 | 386 | 22 | 16 |
| `cleanup` | 23-Feb-2021 | 817 | 22 | 1 |
| `configure` | 23-Feb-2021 | 8.7 KiB | 227 | 136 |
| `configure.win` | 23-Feb-2021 | 3.1 KiB | 74 | 32 |
| `cran-comments.md` | 23-Feb-2021 | 336 | 11 | 8 |
| `lint.sh` | 23-Feb-2021 | 1.5 KiB | 42 | 13 |

README.md

# arrow

[![cran](https://www.r-pkg.org/badges/version-last-release/arrow)](https://cran.r-project.org/package=arrow)
[![CI](https://github.com/apache/arrow/workflows/R/badge.svg?event=push)](https://github.com/apache/arrow/actions?query=workflow%3AR+branch%3Amaster+event%3Apush)
[![conda-forge](https://img.shields.io/conda/vn/conda-forge/r-arrow.svg)](https://anaconda.org/conda-forge/r-arrow)

[Apache Arrow](https://arrow.apache.org/) is a cross-language
development platform for in-memory data. It specifies a standardized
language-independent columnar memory format for flat and hierarchical
data, organized for efficient analytic operations on modern hardware. It
also provides computational libraries and zero-copy streaming messaging
and interprocess communication.

The `arrow` package exposes an interface to the Arrow C++ library to
access many of its features in R. This includes support for analyzing
large, multi-file datasets (`open_dataset()`), working with individual
Parquet (`read_parquet()`, `write_parquet()`) and Feather
(`read_feather()`, `write_feather()`) files, as well as lower-level
access to Arrow memory and messages.

## Installation

Install the latest release of `arrow` from CRAN with

```r
install.packages("arrow")
```

Conda users can install `arrow` from conda-forge with

```
conda install -c conda-forge --strict-channel-priority r-arrow
```

Installing a released version of the `arrow` package requires no
additional system dependencies. For macOS and Windows, CRAN hosts binary
packages that contain the Arrow C++ library. On Linux, source package
installation will also build necessary C++ dependencies. For a faster,
more complete installation, set the environment variable `NOT_CRAN=true`.
See `vignette("install", package = "arrow")` for details.
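
As a rough sketch of the Linux source install described above, the variable can be exported in the shell before R starts, so that the R process inherits it. The function name `install_arrow_from_cran` and the CRAN mirror URL are illustrative assumptions, not part of the package. The function is only defined here, not run.

```shell
# Opt in to the faster, more complete build described above.
export NOT_CRAN=true

# Hypothetical helper: install the released package from a CRAN mirror.
# The mirror URL is an assumption; any CRAN mirror should work.
install_arrow_from_cran() {
  Rscript -e 'install.packages("arrow", repos = "https://cloud.r-project.org")'
}
```

Calling `install_arrow_from_cran` afterwards starts R with `NOT_CRAN` already set in its environment.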

## Installing a development version

Development versions of the package (binary and source) are built daily and hosted at
<https://arrow-r-nightly.s3.amazonaws.com>. To install from there:

``` r
install.packages("arrow", repos = "https://arrow-r-nightly.s3.amazonaws.com")
```

Or

```r
arrow::install_arrow(nightly = TRUE)
```

Conda users can install `arrow` nightlies from our nightlies channel using:

```
conda install -c arrow-nightlies -c conda-forge --strict-channel-priority r-arrow
```

These daily package builds are not official Apache releases and are not
recommended for production use. They may be useful for testing bug fixes
and new features under active development.

## Developing

Windows and macOS users who wish to contribute to the R package and
don’t need to alter the Arrow C++ library may be able to obtain a
recent version of the library without building from source. On macOS,
you may install the C++ library using [Homebrew](https://brew.sh/):

``` shell
# For the released version:
brew install apache-arrow
# Or for a development version, you can try:
brew install apache-arrow --HEAD
```

On Windows, you can download a .zip file with the arrow dependencies from the
[nightly repository](https://dl.bintray.com/ursalabs/arrow-r/libarrow/bin/windows/),
and then set the `RWINLIB_LOCAL` environment variable to point to that
zip file before installing the `arrow` R package. Version numbers in that
repository correspond to dates, and you will likely want the most recent.

If you need to alter both the Arrow C++ library and the R package code,
or if you can’t get a binary version of the latest C++ library
elsewhere, you’ll need to build it from source too.

First, install the C++ library. See the [developer
guide](https://arrow.apache.org/docs/developers/cpp/building.html) for details.
It's recommended to make a `build` directory inside of the `cpp` directory of
the Arrow git repository (it is git-ignored). Assuming you are inside `cpp/build`,
you'll first call `cmake` to configure the build and then `make install`.
For the R package, you'll need to enable several features in the C++ library
using `-D` flags:

```
cmake \
  -DARROW_COMPUTE=ON \
  -DARROW_CSV=ON \
  -DARROW_DATASET=ON \
  -DARROW_FILESYSTEM=ON \
  -DARROW_JEMALLOC=ON \
  -DARROW_JSON=ON \
  -DARROW_PARQUET=ON \
  -DCMAKE_BUILD_TYPE=release \
  -DARROW_INSTALL_NAME_RPATH=OFF \
  ..
```

where `..` is the path to the `cpp/` directory when you're in `cpp/build`.

To enable optional features including S3 support, an alternative memory allocator, and additional compression libraries, add some or all of these flags:

```
  -DARROW_S3=ON \
  -DARROW_MIMALLOC=ON \
  -DARROW_WITH_BROTLI=ON \
  -DARROW_WITH_BZ2=ON \
  -DARROW_WITH_LZ4=ON \
  -DARROW_WITH_SNAPPY=ON \
  -DARROW_WITH_ZLIB=ON \
  -DARROW_WITH_ZSTD=ON \
```
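
The optional flags go before the final `..` of the `cmake` call. The sketch below shows one way to assemble the required flags and an arbitrary subset of the optional ones into a single command. The variable and function names are hypothetical, and the command is printed rather than executed so it can be checked first.

```shell
# Flags the README lists as needed for the R package.
REQUIRED_FLAGS="-DARROW_COMPUTE=ON -DARROW_CSV=ON -DARROW_DATASET=ON \
-DARROW_FILESYSTEM=ON -DARROW_JEMALLOC=ON -DARROW_JSON=ON -DARROW_PARQUET=ON \
-DCMAKE_BUILD_TYPE=release -DARROW_INSTALL_NAME_RPATH=OFF"

# An arbitrary selection of the optional flags (S3 plus two codecs).
OPTIONAL_FLAGS="-DARROW_S3=ON -DARROW_WITH_SNAPPY=ON -DARROW_WITH_ZSTD=ON"

# Print the combined configure command instead of running it.
# The trailing `..` assumes the current directory is cpp/build.
print_cmake_command() {
  printf 'cmake %s %s ..\n' "$REQUIRED_FLAGS" "$OPTIONAL_FLAGS"
}

print_cmake_command
```

To run the command instead of printing it, something like `eval "$(print_cmake_command)"` from inside `cpp/build` should work, followed by `make install`.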

Other flags that may be useful:

* `-DARROW_EXTRA_ERROR_CONTEXT=ON` makes errors coming from the C++ library point to files and line numbers
* `-DBOOST_SOURCE=BUNDLED`, for example, or any other dependency `*_SOURCE`, if you have a system version of a C++ dependency that doesn't work correctly with Arrow. This tells the build to compile its own version of the dependency from source.

Note that after any change to the C++ library, you must reinstall it and
run `make clean` or `git clean -fdx .` to remove any cached object code
in the `r/src/` directory before reinstalling the R package. This is
only necessary if you make changes to the C++ library source; you do not
need to manually purge object files if you are only editing R or C++
code inside `r/`.
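
The order of those steps matters, so here is a minimal sketch that prints them in sequence. It assumes the checkout lives at a hypothetical `ARROW_REPO` path and that `make clean` is run from `r/`, using the package `Makefile`. Nothing is executed.

```shell
# Hypothetical location of the Arrow git checkout; adjust to your setup.
ARROW_REPO="${ARROW_REPO:-$HOME/arrow}"

# Print the rebuild steps in order for review; nothing is executed here.
print_rebuild_steps() {
  echo "cd $ARROW_REPO/cpp/build && make install"   # reinstall the C++ library
  echo "cd $ARROW_REPO/r && make clean"             # drop cached object code in r/src/
  echo "cd $ARROW_REPO/r && R CMD INSTALL ."        # reinstall the R package
}

print_rebuild_steps
```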

Once you’ve built the C++ library, you can install the R package and its
dependencies, along with additional dev dependencies, from the git
checkout:

``` shell
cd ../../r
R -e 'install.packages(c("devtools", "roxygen2", "pkgdown", "covr")); devtools::install_dev_deps()'
R CMD INSTALL .
```

If you need to set any compilation flags while building the C++
extensions, you can use the `ARROW_R_CXXFLAGS` environment variable. For
example, if you are using `perf` to profile the R extensions, you may
need to set

``` shell
export ARROW_R_CXXFLAGS=-fno-omit-frame-pointer
```

If the package fails to install/load with an error like this:

    ** testing if installed package can be loaded from temporary location
    Error: package or namespace load failed for 'arrow' in dyn.load(file, DLLpath = DLLpath, ...):
    unable to load shared object '/Users/you/R/00LOCK-r/00new/arrow/libs/arrow.so':
    dlopen(/Users/you/R/00LOCK-r/00new/arrow/libs/arrow.so, 6): Library not loaded: @rpath/libarrow.14.dylib

ensure that `-DARROW_INSTALL_NAME_RPATH=OFF` was passed (this is important on
macOS to prevent problems at link time and is a no-op on other platforms).
Alternatively, try setting the environment variable `R_LD_LIBRARY_PATH` to
wherever Arrow C++ was put in `make install`, e.g. `export
R_LD_LIBRARY_PATH=/usr/local/lib`, and retry installing the R package.

When installing from source, if the R and C++ library versions do not
match, installation may fail. If you’ve previously installed the
libraries and want to upgrade the R package, you’ll need to update the
Arrow C++ library first.

For any other build/configuration challenges, see the [C++ developer
guide](https://arrow.apache.org/docs/developers/cpp/building.html) and
`vignette("install", package = "arrow")`.

### Editing C++ code

The `arrow` package uses some customized tools on top of `cpp11` to
prepare its C++ code in `src/`. If you change C++ code in the R package,
you will need to set the `ARROW_R_DEV` environment variable to `TRUE`
(optionally, add it to your `~/.Renviron` file to persist across
sessions) so that the `data-raw/codegen.R` file is used for code
generation.
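
One possible way to persist the setting is sketched below. The helper name `persist_arrow_r_dev` is hypothetical. It appends the line to an Renviron file only if no `ARROW_R_DEV=` entry is there yet, so running it twice does not duplicate the entry.

```shell
# Append ARROW_R_DEV=TRUE to an Renviron file unless an entry already exists.
# Defaults to the user-level ~/.Renviron mentioned above.
persist_arrow_r_dev() {
  renviron="${1:-$HOME/.Renviron}"
  if ! grep -q '^ARROW_R_DEV=' "$renviron" 2>/dev/null; then
    echo 'ARROW_R_DEV=TRUE' >> "$renviron"
  fi
}

# Set it for the current shell session as well.
export ARROW_R_DEV=TRUE
```

Run `persist_arrow_r_dev` with no arguments to update `~/.Renviron`, or pass another path to try it on a scratch file first.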

We use Google C++ style in our C++ code. Check for style errors with

    ./lint.sh

Fix any style issues before committing with

    ./lint.sh --fix

The lint script requires Python 3 and `clang-format-8`. If the command
isn’t found, you can explicitly provide the path to it like
`CLANG_FORMAT=$(which clang-format-8) ./lint.sh`. On macOS, you can get
this by installing LLVM via Homebrew and running the script as
`CLANG_FORMAT=$(brew --prefix llvm@8)/bin/clang-format ./lint.sh`.
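
The two lookups above can be combined into one fallback chain. This is a sketch under the assumption that only those two locations matter. The function name `find_clang_format` is hypothetical and not part of `lint.sh`.

```shell
# Print the path of a clang-format 8 binary: try the versioned name on PATH
# first, then the Homebrew llvm@8 location mentioned above.
find_clang_format() {
  if command -v clang-format-8 >/dev/null 2>&1; then
    command -v clang-format-8
  elif command -v brew >/dev/null 2>&1 &&
       [ -x "$(brew --prefix llvm@8 2>/dev/null)/bin/clang-format" ]; then
    echo "$(brew --prefix llvm@8)/bin/clang-format"
  else
    echo "clang-format 8 not found" >&2
    return 1
  fi
}
```

It could then be used as `CLANG_FORMAT=$(find_clang_format) ./lint.sh`.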

### Running tests

Some tests are conditionally enabled based on the availability of certain
features in the package build (S3 support, compression libraries, etc.).
Others are generally skipped by default but can be enabled with environment
variables or other settings:

* All tests are skipped on Linux if the package builds without the C++ libarrow.
  To make the build fail if libarrow is not available (as in, to test that
  the C++ build was successful), set `TEST_R_WITH_ARROW=TRUE`
* Some tests are disabled unless `ARROW_R_DEV=TRUE`
* Tests that require allocating >2GB of memory to test Large types are disabled
  unless `ARROW_LARGE_MEMORY_TESTS=TRUE`
* Integration tests against a real S3 bucket are disabled unless credentials
  are set in `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`; these are available
  on request
* S3 tests using [MinIO](https://min.io/) locally are enabled if the
  `minio server` process is found running. If you're running MinIO with custom
  settings, you can set `MINIO_ACCESS_KEY`, `MINIO_SECRET_KEY`, and
  `MINIO_PORT` to override the defaults.
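
Before a test run, it can help to see which of these opt-in groups the current environment would switch on. The sketch below only mirrors the list above. The package's own checks live in its R test code, and the function name `arrow_optional_tests` is hypothetical. MinIO detection is left out.

```shell
# Report which optional test groups the current environment would enable.
arrow_optional_tests() {
  [ "${ARROW_R_DEV:-}" = "TRUE" ] && echo "dev-only tests"
  [ "${ARROW_LARGE_MEMORY_TESTS:-}" = "TRUE" ] && echo "large-memory tests"
  [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ] &&
    echo "S3 integration tests"
  return 0
}
```

For example, after `export ARROW_R_DEV=TRUE`, calling `arrow_optional_tests` lists the dev-only group, and `R -e 'devtools::test()'` started from the same shell inherits the setting.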

### Useful functions

Within an R session, these can help with package development:

``` r
devtools::load_all() # Load the dev package
devtools::test(filter="^regexp$") # Run the test suite, optionally filtering file names
devtools::document() # Update roxygen documentation
pkgdown::build_site() # To preview the documentation website
devtools::check() # All package checks; see also below
covr::package_coverage() # See test coverage statistics
```

Any of those can be run from the command line by wrapping them in `R -e
'$COMMAND'`. There’s also a `Makefile` to help with some common tasks
from the command line (`make test`, `make doc`, `make clean`, etc.)

### Full package validation

``` shell
R CMD build .
R CMD check arrow_*.tar.gz --as-cran
```