Bug reports and feature requests¶
Arrow relies upon user feedback to identify defects and improvement opportunities. All users are encouraged to participate by creating bug reports and feature requests or commenting on existing issues. Even if you cannot contribute solutions to the issues yourself, your feedback helps us understand problems and prioritize work to improve the libraries.
Apache Arrow Jira¶
The Arrow project uses Jira to track issues - both bug reports and feature requests. No account or permissions are required to view or search Jira issues. The Jira server hosts issue tracking for multiple Apache projects. The Jira project name for Arrow is “ARROW”.
Required permissions¶
Any registered Apache Software Foundation (ASF) Jira account may create or assign Jira issues in the Apache Arrow project without additional permissions. Individuals may create an ASF Jira account here.
Creating issues¶
Apache Arrow relies upon community contributions to address reported bugs and feature requests. As with most software projects, contributor time and resources are finite. The following guidelines aim to produce high-quality bug reports and feature requests, enabling community contributors to respond to more issues, faster:
Check existing issues¶
Before you create a new issue, we recommend you first search for unresolved existing issues identifying the same problem or feature request.
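One way to run such a search is with a JQL query restricted to unresolved issues in the ARROW project. The sketch below only builds a search URL and makes no network request. The helper name `build_search_url` is hypothetical, and the URL layout (a `jql` query parameter on the ASF Jira issue search page) is an assumption that should be checked against the live site.

```python
from urllib.parse import urlencode

# Assumption: the ASF Jira issue search page accepts a `jql` query parameter.
JIRA_SEARCH_URL = "https://issues.apache.org/jira/issues/"


def build_search_url(keywords):
    """Return a search URL for unresolved ARROW issues mentioning `keywords`."""
    # Escape backslashes and double quotes so the keywords stay inside
    # the JQL string literal.
    escaped = keywords.replace("\\", "\\\\").replace('"', '\\"')
    jql = 'project = ARROW AND resolution = Unresolved AND text ~ "' + escaped + '"'
    query = urlencode({"jql": jql})
    return JIRA_SEARCH_URL + "?" + query


print(build_search_url("timestamp timezone"))
```

Pasting the resulting URL into a browser should list candidate duplicates to review before filing a new issue.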
Issue description¶
A clear description of the problem or requested feature is the most important element of any issue. An effective description helps developers understand and efficiently engage on reported issues, and may include the following:
Clear, minimal steps to reproduce the issue, with as few non-Arrow dependencies as possible. If there’s a problem reading a file, try to provide as small an example file as possible, or code to create one. If your bug report says “it crashes trying to read my file, but I can’t share it with you,” it’s really hard for us to debug.
Any relevant operating system, language, and library version information.
If it isn’t obvious, clearly state the expected behavior and what actually happened.
Avoid overloading a single issue with multiple problems or feature requests. Each issue should deal with a single bug or feature.
If a developer can’t reproduce the problem as a failing unit test, they can’t confirm that the issue has been identified, and they won’t know when it has been fixed. Try to anticipate the questions you might be asked by someone working to understand the issue and provide those supporting details up front.
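For Python users, the version details can be collected with a few lines of standard-library code and pasted into the issue description. This is a minimal sketch: the function name `environment_report` and the default package list are illustrative choices, not part of any Arrow tooling.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("pyarrow", "numpy", "pandas")):
    """Return a short text block describing the OS, Python, and package versions."""
    lines = [
        "OS: " + platform.platform(),
        "Python: " + sys.version.split()[0],
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages explicitly instead of failing.
            version = "not installed"
        lines.append(name + ": " + version)
    return "\n".join(lines)


print(environment_report())
```

Listing only the packages that the reproduction actually imports keeps the report short and relevant.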
Examples of good bug reports are found below:
The print method of a timestamp with timezone errors:
import pyarrow as pa
a = pa.array([0], pa.timestamp('s', tz='+02:00'))
print(a) # representation not correct?
# <pyarrow.lib.TimestampArray object at 0x7f834c7cb9a8>
# [
# 1970-01-01 00:00:00
# ]
print(a[0])
#Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
# File "pyarrow/scalar.pxi", line 80, in pyarrow.lib.Scalar.__repr__
# File "pyarrow/scalar.pxi", line 463, in pyarrow.lib.TimestampScalar.as_py
# File "pyarrow/scalar.pxi", line 393, in pyarrow.lib._datetime_from_int
#ValueError: fromutc: dt.tzinfo is not self
Error when reading a CSV file with col_types option "T" or "t" when source data is in millisecond precision:
library(arrow, warn.conflicts = FALSE)
tf <- tempfile()
write.csv(data.frame(x = '2018-10-07 19:04:05.005'), tf, row.names = FALSE)
# successfully read in file
read_csv_arrow(tf, as_data_frame = TRUE)
#> # A tibble: 1 × 1
#>   x
#>   <dttm>
#> 1 2018-10-07 20:04:05
# the unit here is seconds - doesn't work
read_csv_arrow(
  tf,
  col_names = "x",
  col_types = "T",
  skip = 1
)
#> Error in `handle_csv_read_error()`:
#> ! Invalid: In CSV column #0: CSV conversion error to timestamp[s]: invalid value '2018-10-07 19:04:05.005'
# the unit here is ms - doesn't work
read_csv_arrow(
  tf,
  col_names = "x",
  col_types = "t",
  skip = 1
)
#> Error in `handle_csv_read_error()`:
#> ! Invalid: In CSV column #0: CSV conversion error to time32[ms]: invalid value '2018-10-07 19:04:05.005'
# the unit here is inferred as ns - does work!
read_csv_arrow(
  tf,
  col_names = "x",
  col_types = "?",
  skip = 1,
  as_data_frame = FALSE
)
#> Table
#> 1 rows x 1 columns
#> $x <timestamp[ns]>
Other resources for producing useful bug reports:
Identify Arrow component¶
Arrow is an expansive project supporting many languages and organized into a number of components. Identifying the affected component(s) helps new issues get attention from appropriate contributors.
Use the Component field to indicate the area of the project that your issue pertains to (for example “Python” or “C++”).
Also prefix the issue title with the component name in brackets, for example [Python] issue summary; this helps when navigating lists of open issues, and it also makes our changelogs more readable. Most prefixes are exactly the same as the Component name, with the following exceptions:
Component: Continuous Integration, Summary prefix: [CI]
Component: Developer Tools, Summary prefix: [Dev]
Component: Documentation, Summary prefix: [Docs]
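The prefix rule can be expressed as a small lookup with a fallback to the component name itself. The sketch below is illustrative only; `prefixed_summary` and `SUMMARY_PREFIX_EXCEPTIONS` are hypothetical names, not part of Jira or any Arrow tooling.

```python
# The three exceptions listed above; every other component uses its own name.
SUMMARY_PREFIX_EXCEPTIONS = {
    "Continuous Integration": "CI",
    "Developer Tools": "Dev",
    "Documentation": "Docs",
}


def prefixed_summary(component, summary):
    """Return the issue title with its bracketed component prefix."""
    prefix = SUMMARY_PREFIX_EXCEPTIONS.get(component, component)
    return "[" + prefix + "] " + summary.strip()


print(prefixed_summary("Python", "issue summary"))
print(prefixed_summary("Documentation", "Fix broken link in CSV guide"))
```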
Identify affected version¶
If you’re reporting something that worked in a previous version but doesn’t work in the current release, use the Affects version field to identify the earliest known version in which the bug is observed. For feature requests and other proposals, leave Affects version empty, as it does not apply.
Issue lifecycle¶
Both bug reports and feature requests follow a defined lifecycle. The issue Status field is used to document the current state of the issue, while the Resolution field indicates the outcome of issues that have reached terminal status.
Issue Status¶
The Arrow project uses the following statuses in Jira to indicate what has been done, and what will be done, on an issue:
Open - This is the initial issue state, prior to a contributor assigning the issue and starting progress. Issues in this state should be unassigned.
In progress - When a contributor self-assigns an issue, they should set the status to In progress by clicking the Start progress button. All issues in this status should have an assignee - unassigned issues will be set back to a status of Open. Issues remain “in progress” until resolved or closed, including during review of pull requests.
Resolved - This is a terminal status indicating action has been taken on the issue, which is now considered completed. Issues in a resolved status should have a resolution code set to Fixed.
Closed - Another terminal status, Closed indicates the issue is complete, but without action being taken. The following resolution codes apply to issues in Closed status:
Won’t Fix
Duplicate
Invalid
Incomplete
Cannot Reproduce
Not a Problem
Not a Bug
Workaround
Information Provided
Works for Me
Won’t Do
Abandoned
Reopened - When an issue has been closed or resolved, but additional attention is needed, it may be reopened.
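The status rules above can be read as a set of consistency checks on an issue. The following sketch encodes them under the assumption that status and resolution values are plain strings spelled as in this section; `lifecycle_problems` is a hypothetical helper, and Jira itself may enforce these rules differently or not at all.

```python
# Resolution codes listed above for issues in Closed status.
CLOSED_RESOLUTIONS = {
    "Won't Fix", "Duplicate", "Invalid", "Incomplete", "Cannot Reproduce",
    "Not a Problem", "Not a Bug", "Workaround", "Information Provided",
    "Works for Me", "Won't Do", "Abandoned",
}

# Terminal statuses and the resolution codes each one accepts.
ALLOWED_RESOLUTIONS = {"Resolved": {"Fixed"}, "Closed": CLOSED_RESOLUTIONS}


def lifecycle_problems(status, assignee=None, resolution=None):
    """Return a list of rule violations for one issue (empty if consistent)."""
    problems = []
    if status == "Open" and assignee is not None:
        problems.append("Open issues should be unassigned")
    if status == "In progress" and assignee is None:
        problems.append("In progress issues should have an assignee")
    allowed = ALLOWED_RESOLUTIONS.get(status)
    if allowed is not None and resolution not in allowed:
        problems.append(status + " issues need one of: " + ", ".join(sorted(allowed)))
    return problems


print(lifecycle_problems("Resolved", assignee="alice", resolution="Fixed"))
print(lifecycle_problems("In progress"))
```

No check is applied to Reopened, since this section states no assignee or resolution rule for that status.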
Issue assignment¶
Assignment signals commitment to work on an issue, and contributors should self-assign issues when that work starts. When assigning an issue, also update the Status field to In progress.
The Arrow project relies upon community contributors to resolve issues. We recognize that priorities and plans may change, resulting in an issue assigned to an individual who cannot attend to it. Assigned issues without updates in the past 90 days will be unassigned and set to Open status.
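The 90-day rule amounts to a simple date comparison. The sketch below assumes timezone-aware timestamps and treats an issue as stale only when its last update is more than 90 days old; the function name `should_unassign` and the handling of the exact 90-day boundary are assumptions, not a description of how the project automates this.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=90)


def should_unassign(assignee, last_updated, now=None):
    """Return True if an assigned issue has had no update for over 90 days.

    `last_updated` and `now` must both be timezone-aware datetimes.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return assignee is not None and (now - last_updated) > STALE_AFTER


now = datetime(2023, 6, 1, tzinfo=timezone.utc)
print(should_unassign("alice", now - timedelta(days=120), now))
print(should_unassign("alice", now - timedelta(days=10), now))
```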