<!---
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements.  See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership.  The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing,
  software distributed under the License is distributed on an
  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  KIND, either express or implied.  See the License for the
  specific language governing permissions and limitations
  under the License.
-->

# Apache Arrow

An implementation of Arrow targeting .NET Standard.

This implementation is under development and may not be suitable for use in production environments.

# Implementation

- Arrow 0.11 (specification)
- C# 7.2
- .NET Standard 1.3
- Asynchronous I/O
- Uses modern .NET runtime features such as **Span&lt;T&gt;**, **Memory&lt;T&gt;**, **MemoryManager&lt;T&gt;**, and **System.Buffers** primitives for memory allocation, memory storage, and fast serialization.
- Uses **Acyclic Visitor Pattern** for array types and arrays to facilitate serialization, record batch traversal, and format growth.

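The Acyclic Visitor pattern named in the list above can be sketched as follows. This is a minimal, self-contained illustration and not the library's code: every type name in it (`IVisitor`, `IVisitable`, `Int32Column`, `StringColumn`, `Int32SumVisitor`) is hypothetical and chosen only for this example. The idea is that the base visitor interface is empty, each visitable type has its own small visitor interface, and `Accept` dispatches only when the visitor has opted in to that type.

```csharp
using System.Collections.Generic;

// Hypothetical names throughout; this does not reproduce the Apache.Arrow API.

// Degenerate (marker) visitor: it declares no methods, so adding a new
// visitable type never forces a change here or in existing visitors.
public interface IVisitor { }

// One small visitor interface per visitable type.
public interface IVisitor<in T> : IVisitor
{
    void Visit(T target);
}

public interface IVisitable
{
    void Accept(IVisitor visitor);
}

public sealed class Int32Column : IVisitable
{
    public Int32Column(params int[] values) { Values = values; }

    public IReadOnlyList<int> Values { get; }

    public void Accept(IVisitor visitor)
    {
        // Dispatch only if the visitor supports this concrete type.
        if (visitor is IVisitor<Int32Column> typed) typed.Visit(this);
    }
}

public sealed class StringColumn : IVisitable
{
    public StringColumn(params string[] values) { Values = values; }

    public IReadOnlyList<string> Values { get; }

    public void Accept(IVisitor visitor)
    {
        if (visitor is IVisitor<StringColumn> typed) typed.Visit(this);
    }
}

// Implements only the Int32Column visitor interface; any other
// visitable type is skipped by its own Accept method.
public sealed class Int32SumVisitor : IVisitor<Int32Column>
{
    public long Sum { get; private set; }

    public void Visit(Int32Column target)
    {
        foreach (int value in target.Values) Sum += value;
    }
}
```

Passing an `Int32SumVisitor` to a mixed collection of `Int32Column` and `StringColumn` instances accumulates only the integer values. A new column type can be added later with its own `IVisitor<T>` implementation without touching `Int32SumVisitor`, which is the "format growth" property the pattern is usually chosen for.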
# Known Issues

- Cannot read Arrow files containing dictionary batches, tensors, or tables.
- Cannot easily modify the allocation strategy without implementing a custom memory pool. All allocations are currently 64-byte aligned and padded to 8 bytes.
- The default memory allocation strategy over-allocates and then fixes up the pointer, which results in significant memory overhead for small buffers. A buffer that requires a single byte for storage may be backed by an allocation of up to 64 bytes to satisfy alignment requirements.
- There are currently few builder APIs available for specific array types. Arrays must be built manually with an arrow buffer builder abstraction.
- FlatBuffer code generation is not included in the build process.
- The serialization implementation does not perform exhaustive validation checks during deserialization in every scenario.
- Throws exceptions with vague, inconsistent, or non-localized messages in many situations.
- Throws exceptions that are not specific to the Arrow implementation in some circumstances where an Arrow-specific exception would be more appropriate (e.g. does not throw `ArrowException`).
- Lack of code documentation
- Lack of usage examples
- Lack of comprehensive unit tests
- Lack of comprehensive benchmarks

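The arithmetic behind "64-byte aligned and padded to 8 bytes" and "over-allocation with pointer fixing" can be illustrated with a small sketch. It is not taken from the library: the class `AlignmentSketch` and its members are hypothetical, and the amount of extra space reserved (`Alignment - 1` bytes, the minimum that guarantees an aligned start for any raw address) is an assumption, so the totals it produces may differ from what the actual allocator reserves.

```csharp
using System;

// Hypothetical helper; only the constants 64 and 8 come from the text above.
public static class AlignmentSketch
{
    public const int Alignment = 64; // start address must be a multiple of this
    public const int Padding = 8;    // usable length is rounded up to a multiple of this

    // Rounds a requested length up to the next multiple of Padding.
    public static long PaddedLength(long length)
    {
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        return (length + Padding - 1) / Padding * Padding;
    }

    // "Pointer fixing": moves a raw address forward to the next multiple of Alignment.
    public static long AlignUp(long address)
    {
        if (address < 0) throw new ArgumentOutOfRangeException(nameof(address));
        return (address + Alignment - 1) / Alignment * Alignment;
    }

    // Assumed over-allocation: padded length plus enough slack to shift the
    // start forward by up to Alignment - 1 bytes.
    public static long BackingSize(long length)
    {
        return PaddedLength(length) + Alignment - 1;
    }
}
```

Under these assumptions a 1-byte request yields 8 usable bytes, and most of the backing allocation is slack reserved for alignment. That is why the overhead matters for small buffers and becomes negligible for large ones.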
# Usage

    using System.Diagnostics;
    using System.IO;
    using System.Threading.Tasks;
    using Apache.Arrow;
    using Apache.Arrow.Ipc;

    // Reads the first record batch from the Arrow file at the given path.
    public static async Task<RecordBatch> ReadArrowAsync(string filename)
    {
        // Open the file named by the caller rather than a hard-coded path.
        using (var stream = File.OpenRead(filename))
        using (var reader = new ArrowFileReader(stream))
        {
            var recordBatch = await reader.ReadNextRecordBatchAsync();
            Debug.WriteLine("Read record batch with {0} column(s)", recordBatch.ColumnCount);
            return recordBatch;
        }
    }

# Status

## Memory Management

- Allocations are 64-byte aligned and padded to 8 bytes.
- Allocations are automatically garbage collected.

## Arrays

### Primitive Types

- Int8, Int16, Int32, Int64
- UInt8, UInt16, UInt32, UInt64
- Float, Double
- Binary (variable-length)
- String (utf-8)
- Null

### Parametric Types

- Timestamp
- Date32
- Date64
- Decimal
- Time32
- Time64
- Binary (fixed-length)
- List
- Struct

### Type Metadata

- Data Types
- Fields
- Schema

### Serialization

- File
- Stream

## Not Implemented

- Serialization
    - Exhaustive validation
    - Dictionary Batch
        - Cannot serialize or deserialize files or streams containing dictionary batches
    - Dictionary Encoding
    - Schema Metadata
    - Schema Field Metadata
- Types
    - Tensor
    - Table
- Arrays
    - Union
        - Dense
        - Sparse
    - Half-Float
    - Dictionary
- Array Operations
    - Equality / Comparison
    - Casting
    - Builders
- Compute
    - There is currently no API available for a compute / kernel abstraction.

# Build

Install the latest `.NET Core SDK` from https://dotnet.microsoft.com/download, then run:

    dotnet build

## NuGet Build

To build a debug-flavor preview NuGet package into the **artifacts** folder, run:

    dotnet pack

When building the officially released version (see the note below about the `git` repository), run:

    dotnet pack -c Release

This builds the final/stable package.

NOTE: When building the officially released version, ensure that your `git` repository has the `origin` remote set to `https://github.com/apache/arrow.git`, which will ensure Source Link is set correctly. See https://github.com/dotnet/sourcelink/blob/master/docs/README.md for more information.

There are two output artifacts:

1. `Apache.Arrow.<version>.nupkg` - this contains the executable assemblies
2. `Apache.Arrow.<version>.snupkg` - this contains the debug symbols files

Both of these artifacts can then be uploaded to https://www.nuget.org/packages/manage/upload.

## Docker Build

Build from the Apache Arrow project root.

    docker build -f csharp/build/docker/Dockerfile .

## Testing

    dotnet test

All build artifacts are placed in the **artifacts** folder in the project root.

# Coding Style

This project follows the coding style specified in [Coding Style](https://github.com/dotnet/runtime/blob/master/docs/coding-guidelines/coding-style.md).

# Updating FlatBuffers code

See https://google.github.io/flatbuffers/flatbuffers_guide_use_java_c-sharp.html for how to get the `flatc` executable.

Run `flatc --csharp` on each `.fbs` file in the [format](../format) folder, and replace the checked-in `.cs` files under [FlatBuf](src/Apache.Arrow/Flatbuf) with the generated files.

Update the non-generated [FlatBuffers](src/Apache.Arrow/Flatbuf/FlatBuffers) `.cs` files with the files from the [google/flatbuffers repo](https://github.com/google/flatbuffers/tree/master/net/FlatBuffers).
185