Apache Arrow 0.1.0 (10 October 2016)
Download
- Source Release: [apache-arrow-0.1.0.tar.gz][6]
- Verification: [md5][3], [asc][7]
Changelog
Contributors
$ git shortlog -sn d5aa7c46..apache-arrow-0.1.0
49 Wes McKinney
27 Uwe L. Korn
25 Julien Le Dem
13 Micah Kornfield
11 Steven Phillips
6 Jihoon Son
5 Laurent Goujon
5 adeneche
4 Dan Robinson
4 proflin
2 Jacques Nadeau
1 Christopher C. Aycock
1 Edmon Begoli
1 Kai Zheng
1 MechCoder
1 Minji Kim
1 Philipp Moritz
1 Smyatkin Maxim
1 fengguangyuan
1 hyukjinkwon
1 hzhang2
1 lfzCarlosC
New Features and Improvements
- ARROW-1 - Import Initial Codebase
- ARROW-10 - Fix mismatch of javadoc names and method parameters
- ARROW-100 - [C++] Computing RowBatch size
- ARROW-101 - Fix java warnings emitted by java compiler
- ARROW-102 - travis-ci support for java project
- ARROW-106 - Add IPC round trip for string types (string, char, varchar, binary)
- ARROW-107 - [C++] add ipc round trip for struct types
- ARROW-11 - Mirror JIRA activity to dev@arrow.apache.org
- ARROW-13 - Add PR merge tool similar to that used in Parquet
- ARROW-14 - Add JIRA components
- ARROW-15 - Fix a naming typo for memory.AllocationManager.AllocationOutcome
- ARROW-19 - C++: Externalize memory allocations and add a MemoryPool abstract interface to builder classes
- ARROW-190 - Python: Provide installable sdist builds
- ARROW-197 - [Python] Add conda dev recipe for pyarrow
- ARROW-199 - [C++] Refine third party dependency
- ARROW-2 - Post Simple Website
- ARROW-20 - C++: Add null count member to Array containers, remove nullable member
- ARROW-201 - C++: Initial ParquetWriter implementation
- ARROW-203 - Python: Basic filename based Parquet read/write
- ARROW-204 - [Python] Automate uploading conda build artifacts for libarrow and pyarrow
- ARROW-206 - [C++] Expose an equality API for arrays that compares a range of slots on two arrays
- ARROW-21 - C++: Add in-memory schema metadata container
- ARROW-212 - [C++] Clarify the fact that PrimitiveArray is now abstract class
- ARROW-213 - Exposing static arrow build
- ARROW-214 - C++: Add String support to Parquet I/O
- ARROW-215 - C++: Support other integer types in Parquet I/O
- ARROW-218 - Add option to use GitHub API token via environment variable when merging PRs
- ARROW-22 - C++: Add schema adapter routines for converting flat Parquet schemas to in-memory Arrow schemas
- ARROW-222 - [C++] Create prototype file-like interface to HDFS (via libhdfs) and begin defining more general IO interface for Arrow data adapters
- ARROW-23 - C++: Add logical “Column” container for chunked data
- ARROW-233 - [C++] Add visibility defines for limiting shared library symbol visibility
- ARROW-234 - [C++] Build with libhdfs support in arrow_io in conda builds
- ARROW-236 - [Python] Enable Parquet read/write to work with HDFS file objects
- ARROW-237 - [C++] Create Arrow specializations of Parquet allocator and read interfaces
- ARROW-238 - C++: InternalMemoryPool::Free() should throw an error when there is insufficient allocated memory
- ARROW-24 - C++: Add logical “Table” container
- ARROW-242 - C++/Python: Support Timestamp Data Type
- ARROW-245 - [Format] Clarify Arrow’s relationship with big endian platforms
- ARROW-251 - [C++] Expose APIs for getting code and message of the status
- ARROW-252 - Add implementation guidelines to the documentation
- ARROW-253 - Int types should only have width of 8*2^n (8, 16, 32, 64)
- ARROW-254 - Remove Bit type as it is redundant with boolean
- ARROW-255 - Finalize Dictionary representation
- ARROW-256 - Add versioning to the arrow spec.
- ARROW-257 - Add a typeids Vector to Union type
- ARROW-26 - C++: Add developer instructions for building parquet-cpp integration
- ARROW-260 - TestValueVector.testFixedVectorReallocation and testVariableVectorReallocation are flaky
- ARROW-262 - [Format] Add a new format document for metadata and logical types for messaging and IPC / on-wire/file representations
- ARROW-264 - Create an Arrow File format
- ARROW-267 - [C++] C++ implementation of file-like layout for RPC / IPC
- ARROW-270 - [Format] Define more generic Interval logical type
- ARROW-271 - Update Field structure to be more explicit
- ARROW-272 - Arrow release 0.1
- ARROW-279 - rename vector module to arrow-vector for consistency
- ARROW-28 - C++: Add google/benchmark to the 3rd-party build toolchain
- ARROW-280 - [C++] Consolidate file and shared memory IO interfaces
- ARROW-285 - Allow for custom flatc compiler
- ARROW-286 - Build thirdparty dependencies in parallel
- ARROW-289 - Install test-util.h
- ARROW-290 - Specialize alloc() in ArrowBuf
- ARROW-292 - [Java] Upgrade Netty to 4.041
- ARROW-293 - [C++] Implementations of IO interfaces for operating system files
- ARROW-296 - [C++] Remove arrow_parquet C++ module and related parts of build system
- ARROW-298 - create release scripts
- ARROW-299 - Use absolute namespace in macros
- ARROW-3 - Post Initial Arrow Format Spec
- ARROW-30 - Python: pandas/NumPy to/from Arrow conversion routines
- ARROW-301 - [Format] Add some form of user field metadata to IPC schemas
- ARROW-302 - [Python] Add support to use the Arrow file format with file-like objects
- ARROW-305 - Add compression and use_dictionary options to Parquet interface
- ARROW-306 - Add option to pass cmake arguments via environment variable
- ARROW-31 - Python: basic PyList <-> Arrow marshaling code
- ARROW-315 - Finalize timestamp type
- ARROW-318 - [Python] Revise README to reflect current state of project
- ARROW-319 - Add canonical Arrow Schema json representation
- ARROW-324 - Update arrow metadata diagram
- ARROW-325 - make TestArrowFile not dependent on timezone
- ARROW-35 - Add a short call-to-action / how-to-get-involved to the main README.md
- ARROW-37 - C++: Represent boolean array data in bit-packed form
- ARROW-4 - Initial Arrow CPP Implementation
- ARROW-42 - Python: Add to Travis CI build
- ARROW-43 - Python: Add rudimentary console repr for array types
- ARROW-44 - Python: Implement basic object model for scalar values (i.e. results of arrow_arr[i])
- ARROW-48 - Python: Add Schema object wrapper
- ARROW-49 - Python: Add Column and Table wrapper interface
- ARROW-50 - C++: Enable library builds for 3rd-party users without having to build thirdparty googletest
- ARROW-53 - Python: Fix RPATH and add source installation instructions
- ARROW-54 - Python: rename package to “pyarrow”
- ARROW-56 - Format: Specify LSB bit ordering in bit arrays
- ARROW-57 - Format: Draft data headers IDL for data interchange
- ARROW-58 - Format: Draft type metadata (“schemas”) IDL
- ARROW-59 - Python: Boolean data support for builtin data structures
- ARROW-60 - C++: Struct type builder API
- ARROW-64 - Add zsh support to C++ build scripts
- ARROW-66 - Maybe some missing steps in installation guide
- ARROW-67 - C++: Draft type metadata conversion to/from IPC representation
- ARROW-68 - Update setup_build_env and third-party script to be more userfriendly
- ARROW-7 - Add Python library build toolchain
- ARROW-70 - C++: Add “lite” DCHECK macros used in parquet-cpp
- ARROW-71 - C++: Add script to run clang-tidy on codebase
- ARROW-73 - Support CMake 2.8
- ARROW-76 - Revise format document to include null count, defer non-nullable arrays to the domain of metadata
- ARROW-78 - C++: Add constructor for DecimalType
- ARROW-79 - Python: Add benchmarks
- ARROW-8 - Set up Travis CI
- ARROW-82 - C++: Implement IPC exchange for List types
- ARROW-83 - Add basic test infrastructure for DecimalType
- ARROW-85 - C++: memcmp can be avoided in Equal when comparing with the same Buffer
- ARROW-86 - Python: Implement zero-copy Arrow-to-Pandas conversion
- ARROW-87 - Implement Decimal schema conversion for all ways supported in Parquet
- ARROW-89 - Python: Add benchmarks for Arrow<->Pandas conversion
- ARROW-9 - Rename some unchanged “Drill” to “Arrow”
- ARROW-90 - Apache Arrow cpp code does not support power architecture
- ARROW-91 - C++: First draft of an adapter class for parquet-cpp’s ParquetFileReader that produces Arrow table/row batch objects
- ARROW-92 - C++: Arrow to Parquet Schema conversion
Bug Fixes
- ARROW-103 - Missing patterns from .gitignore
- ARROW-104 - Update Layout.md based on discussion on the mailing list
- ARROW-105 - Unit tests fail if assertions are disabled
- ARROW-113 - TestValueVector test fails if cannot allocate 2GB of memory
- ARROW-16 - Building cpp issues on XCode 7.2.1
- ARROW-17 - Set some vector fields to default access level for Drill compatibility
- ARROW-18 - Fix bug with decimal precision and scale
- ARROW-185 - [C++] Make sure alignment and memory padding conform to spec
- ARROW-188 - Python: Add numpy as install requirement
- ARROW-193 - For the instruction, typos “int his” should be “in this”
- ARROW-194 - C++: Allow read-only memory mapped source
- ARROW-200 - [Python] Convert Values String looks like it has incorrect error handling
- ARROW-209 - [C++] Broken builds: llvm.org apt repos are unavailable
- ARROW-210 - [C++] Tidy up the type system a little bit
- ARROW-211 - Several typos/errors in Layout.md examples
- ARROW-217 - Fix Travis w.r.t conda 4.1.0 changes
- ARROW-219 - [C++] Passed CMAKE_CXX_FLAGS are being dropped, fix compiler warnings
- ARROW-223 - Do not link against libpython
- ARROW-225 - [C++/Python] master Travis CI build is broken
- ARROW-244 - [C++] Some global APIs of IPC module should be visible to the outside
- ARROW-246 - [Java] UnionVector doesn’t call allocateNew() when creating it’s vectorType
- ARROW-247 - [C++] Missing explicit destructor in RowBatchReader causes an incomplete type error
- ARROW-250 - Fix for ARROW-246 may cause memory leaks
- ARROW-259 - Use flatbuffer fields in java implementation
- ARROW-265 - Negative decimal values have wrong padding
- ARROW-266 - [C++] Fix the broken build
- ARROW-274 - Make the MapVector nullable
- ARROW-278 - [Format] Struct type name consistency in implementations and metadata
- ARROW-283 - [C++] Update arrow_parquet to account for API changes in PARQUET-573
- ARROW-284 - [C++] Triage builds by disabling Arrow-Parquet module
- ARROW-287 - [java] Make nullable vectors use a BitVecor instead of UInt1Vector for bits
- ARROW-297 - Fix Arrow pom for release
- ARROW-304 - NullableMapReaderImpl.isSet() always returns true
- ARROW-308 - UnionListWriter.setPosition() should not call startList()
- ARROW-309 - Types.getMinorTypeForArrowType() does not work for Union type
- ARROW-313 - XCode 8.0 breaks builds
- ARROW-314 - JSONScalar is unnecessary and unused.
- ARROW-320 - ComplexCopier.copy(FieldReader, FieldWriter) should not start a list if reader is not set
- ARROW-321 - Fix Arrow licences
- ARROW-36 - Remove fixVersions from patch tool (until we have them)
- ARROW-46 - Port DRILL-4410 to Arrow
- ARROW-5 - Error when run maven install
- ARROW-51 - Move ValueVector test from Drill project
- ARROW-55 - Python: fix legacy Python (2.7) tests and add to Travis CI
- ARROW-62 - Format: Are the nulls bits 0 or 1 for null values?
- ARROW-63 - C++: ctest fails if Python 3 is the active Python interpreter
- ARROW-65 - Python: FindPythonLibsNew does not work in a virtualenv
- ARROW-69 - Change permissions for assignable users
- ARROW-72 - FindParquet searches for non-existent header
- ARROW-75 - C++: Fix handling of empty strings
- ARROW-77 - C++: conform null bit interpretation to match ARROW-62
- ARROW-80 - Segmentation fault on len(Array) for empty arrays
- ARROW-88 - C++: Refactor given PARQUET-572
- ARROW-93 - XCode 7.3 breaks builds
- ARROW-94 - Expand list example to clarify null vs empty list