Apache Arrow 0.6.0 (14 August 2017)
This is a major release. Read more in the release blog post.
Download
Contributors
$ git shortlog -sn apache-arrow-0.5.0..apache-arrow-0.6.0
48 Wes McKinney
7 siddharth
5 Matt Darwin
5 Max Risuhin
5 Philipp Moritz
4 Kouhei Sutou
3 Bryan Cutler
2 Emilio Lahr-Vivaz
2 Li Jin
2 Robert Nishihara
1 Antony Mayi
1 Marco Neumann
1 Stepan Kadlec
1 Steven Phillips
1 Yeolar
1 fjetter
1 rendel
Changelog
New Features and Improvements
- ARROW-1076 - [Python] Handle nanosecond timestamps more gracefully when writing to Parquet format
- ARROW-1093 - [Python] Fail Python builds if flake8 yields warnings
- ARROW-1104 - Integrate in-memory object store from Ray
- ARROW-1121 - [C++] Improve error message when opening OS file fails
- ARROW-1140 - [C++] Allow optional build of plasma
- ARROW-1149 - [Plasma] Create Cython client library for Plasma
- ARROW-1173 - [Plasma] Blog post for Plasma
- ARROW-1211 - [C++] Consider making default_memory_pool() the default for builder classes
- ARROW-1213 - [Python] Enable s3fs to be used with ParquetDataset and reader/writer functions
- ARROW-1219 - [C++] Use more vanilla Google C++ formatting
- ARROW-1224 - [Format] Clarify language around buffer padding and alignment in IPC
- ARROW-1230 - [Plasma] Install libraries and headers
- ARROW-1241 - [C++] Visual Studio 2017 Appveyor build job
- ARROW-1243 - [Java] security: upgrade all libraries to latest stable versions
- ARROW-1246 - [Format] Add Map logical type to metadata
- ARROW-1251 - [Python/C++] Revise build documentation to account for latest build toolchain
- ARROW-1253 - [C++] Use pre-built toolchain libraries where prudent to speed up CI builds
- ARROW-1255 - [Plasma] Check plasma flatbuffer messages with the flatbuffer verifier
- ARROW-1257 - [Plasma] Plasma documentation
- ARROW-1258 - [C++] Suppress dlmalloc warnings on Clang
- ARROW-1259 - [Plasma] Speed up Plasma tests
- ARROW-1260 - [Plasma] Use factory method to create Python PlasmaClient
- ARROW-1264 - [Plasma] Don’t exit the Python interpreter if the plasma client can’t connect to the store
- ARROW-1268 - [Website] Blog post on Arrow integration with Spark
- ARROW-1270 - [Packaging] Add Python wheel build scripts for macOS to arrow-dist
- ARROW-1272 - [Python] Add script to arrow-dist to generate and upload manylinux1 Python wheels
- ARROW-1273 - [Python] Add convenience functions for reading only Parquet metadata or effective Arrow schema from a particular Parquet file
- ARROW-1274 - [C++] add_compiler_export_flags() throws warning with CMake >= 3.3
- ARROW-1281 - [C++/Python] Add Docker setup for running HDFS tests and other tests we may not run in Travis CI
- ARROW-1288 - Clean up many ASF license headers
- ARROW-1289 - [Python] Add PYARROW_BUILD_PLASMA option like Parquet
- ARROW-1297 - 0.6.0 Release
- ARROW-1301 - [C++/Python] Add remaining supported libhdfs UNIX-like filesystem APIs
- ARROW-1303 - [C++] Support downloading Boost
- ARROW-1304 - [Java] Fix checkstyle checks warning
- ARROW-1305 - [GLib] Add GArrowIntArrayBuilder
- ARROW-1315 - [GLib] Status check of arrow::ArrayBuilder::Finish() is missing
- ARROW-1323 - [GLib] Add garrow_boolean_array_get_values()
- ARROW-1333 - [Plasma] Sorting example for DataFrames in plasma
- ARROW-1334 - [C++] Instantiate arrow::Table from vector of Array objects (instead of Columns)
- ARROW-1336 - [C++] Add arrow::schema factory function
- ARROW-439 - [Python] Add option in “to_pandas” conversions to yield Categorical from String/Binary arrays
- ARROW-622 - [Python] Investigate alternatives to timestamps_to_ms argument in pandas conversion
Bug Fixes
- ARROW-1192 - [JAVA] Improve splitAndTransfer performance for List and Union vectors
- ARROW-1195 - [C++] CpuInfo doesn’t get cache size on Windows
- ARROW-1204 - [C++] lz4 ExternalProject fails in Visual Studio 2015
- ARROW-1225 - [Python] pyarrow.array does not attempt to convert bytes to UTF8 when passed a StringType
- ARROW-1237 - [JAVA] Expose the ability to set lastSet
- ARROW-1239 - issue with current version of git-commit-id-plugin
- ARROW-1240 - security: upgrade logback to address CVE-2017-5929
- ARROW-1242 - [Java] security - upgrade Jackson to mitigate 3 CVE vulnerabilities
- ARROW-1245 - [Integration] Java Integration Tests Disabled
- ARROW-1248 - [Python] C linkage warnings in Clang with public Cython API
- ARROW-1249 - [JAVA] Expose the fillEmpties function from Nullable
Vector.mutator - ARROW-1263 - [C++] CpuInfo should be able to get CPU features on Windows
- ARROW-1265 - [Plasma] Plasma store memory leak warnings in Python test suite
- ARROW-1267 - [Java] Handle zero length case in BitVector.splitAndTransfer
- ARROW-1269 - [Packaging] Add Windows wheel build scripts from ARROW-1068 to arrow-dist
- ARROW-1275 - [C++] Default static library prefix for Snappy should be “_static”
- ARROW-1276 - Cannot serializer empty DataFrame to parquet
- ARROW-1283 - [Java] VectorSchemaRoot should be able to be closed() more than once
- ARROW-1285 - PYTHON: NotImplemented exception creates empty parquet file
- ARROW-1287 - [Python] Emulate “whence” argument of seek in NativeFile
- ARROW-1290 - [C++] Use array capacity doubling in arrow::BufferBuilder
- ARROW-1291 - [Python] pa.RecordBatch.from_pandas doesn’t accept DataFrame with numeric column names
- ARROW-1294 - [C++] New Appveyor build failures
- ARROW-1296 - [Java] templates/FixValueVectors reset() method doesn’t set allocationSizeInBytes correctly
- ARROW-1300 - [JAVA] Fix ListVector Tests
- ARROW-1306 - [Python] Encoding? issue with error reporting for parquet.read_table
- ARROW-1308 - [C++] ld tries to link ‘arrow_static’ even when -DARROW_BUILD_STATIC=off
- ARROW-1309 - [Python] Error inferring List type in Array.from_pandas when inner values are all None
- ARROW-1310 - [JAVA] Revert ARROW-886
- ARROW-1312 - [C++] Set default value to ARROW_JEMALLOC to OFF until ARROW-1282 is resolved
- ARROW-1326 - [Python] Fix Sphinx build in Travis CI
- ARROW-1327 - [Python] Failing to release GIL in MemoryMappedFile._open causes deadlock
- ARROW-1328 - [Python] pyarrow.Table.from_pandas option timestamps_to_ms changes column values
- ARROW-1330 - [Plasma] Turn on plasma tests on manylinux1
- ARROW-1335 - [C++] PrimitiveArray::raw_values has inconsistent semantics re: offsets compared with subclasses
- ARROW-1338 - [Python] Investigate non-deterministic core dump on Python 2.7, Travis CI builds
- ARROW-1340 - [Java] NullableMapVector field doesn’t maintain metadata
- ARROW-1342 - [Python] Support strided array of lists
- ARROW-1343 - [Format/Java/C++] Ensuring encapsulated stream / IPC message sizes are always a multiple of 8
- ARROW-1350 - [C++] Include Plasma source tree in source distribution
- ARROW-187 - [C++] Decide on how pedantic we want to be about exceptions
- ARROW-276 - [JAVA] Nullable Value Vectors should extend BaseValueVector instead of BaseDataValueVector
- ARROW-573 - [Python/C++] Support ordered dictionaries data, pandas Categorical
- ARROW-884 - [C++] Exclude internal classes from documentation
- ARROW-932 - [Python] Fix compiler warnings on MSVC
- ARROW-968 - [Python] RecordBatch [i:j] syntax is incomplete