A record batch is a collection of equal-length arrays matching a particular Schema. It is a table-like data structure that is semantically a sequence of fields, each a contiguous Arrow Array.
record_batch(..., schema = NULL)
Argument | Description |
---|---|
... | A |
schema | a Schema, or |
Record batches are data-frame-like, and many methods you expect to work on a data.frame are implemented for RecordBatch. This includes [, [[, $, names, dim, nrow, ncol, head, and tail. You can also pull the data from an Arrow record batch into R with as.data.frame(). See the examples.
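As a rough illustration of the data-frame-like methods listed above, the sketch below builds a small batch and inspects it. The column names x and y and their values are made up for illustration, and the code assumes the arrow package is installed.

```r
library(arrow)

# Hypothetical example data: two columns of equal length
batch <- record_batch(x = 1:5, y = c("a", "b", "c", "d", "e"))

dim(batch)    # number of rows, then number of columns
names(batch)  # column names

# head() returns another RecordBatch, not an R data.frame
first_two <- head(batch, 2)

# Extract a single column by name as an Arrow Array
x_array <- batch[["x"]]

# Pull the whole batch into R
df <- as.data.frame(batch)
```

Only as.data.frame() brings the data into R here. The other calls return Arrow objects or plain R vectors that describe the batch.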
A caveat about the $ method: because RecordBatch is an R6 object, $ is also used to access the object's methods (see below). Methods take precedence over the table's columns. So, batch$Slice would return the "Slice" method function even if there were a column in the table called "Slice".
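The precedence rule can be checked with a small sketch. The batch below is hypothetical: its first column is deliberately named after the Slice method, and the code assumes the arrow package is installed.

```r
library(arrow)

# Hypothetical batch whose first column shares its name with the $Slice method
batch <- record_batch(Slice = 1:3, other = c("a", "b", "c"))

# `$` finds the R6 method first, so this is a function, not the column
slice_method <- batch$Slice

# `[[` and $GetColumnByName() look only at columns
slice_column <- batch[["Slice"]]
same_column <- batch$GetColumnByName("Slice")
```

When a column name might collide with a method name, [[ or $GetColumnByName() is the safer way to reach the column.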
In addition to the more R-friendly S3 methods, a RecordBatch object has the following R6 methods that map onto the underlying C++ methods:
$Equals(other): Returns TRUE if the other record batch is equal
$column(i): Extract an Array by integer position from the batch
$column_name(i): Get a column's name by integer position
$names(): Get all column names (called by names(batch))
$GetColumnByName(name): Extract an Array by string name
$RemoveColumn(i): Drops a column from the batch by integer position
$selectColumns(indices): Return a new record batch with a selection of columns, expressed as 0-based integers.
$Slice(offset, length = NULL): Create a zero-copy view starting at the indicated integer offset and going for the given length, or to the end of the table if NULL, the default.
$Take(i): Return a RecordBatch with rows at positions given by integers (R vector or Arrow Array) i.
$Filter(i, keep_na = TRUE): Return a RecordBatch with rows at positions where logical vector (or Arrow boolean Array) i is TRUE.
$serialize(): Returns a raw vector suitable for interprocess communication
$cast(target_schema, safe = TRUE, options = cast_options(safe)): Alter the schema of the record batch.
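A few of the R6 methods above can be sketched as follows. The batch contents are hypothetical, and the code assumes the arrow package is installed. It also assumes that positions passed to these methods are 0-based, which is what you would expect from methods that map onto the C++ API but differs from the 1-based S3 methods.

```r
library(arrow)

# Hypothetical example data
batch <- record_batch(x = 1:5, y = c("a", "b", "c", "d", "e"))

# Positions are 0-based here (assumption noted above)
first_col <- batch$column(0L)
second_name <- batch$column_name(1L)

# Zero-copy view of 3 rows starting at offset 1 (the second row)
middle <- batch$Slice(1L, 3L)

# Rows at 0-based positions 0 and 4
ends <- batch$Take(c(0L, 4L))

# Rows where the logical vector is TRUE
odd_rows <- batch$Filter(c(TRUE, FALSE, TRUE, FALSE, TRUE))

# A new batch without the first column; the original is unchanged
y_only <- batch$RemoveColumn(0L)
```

Each call returns a new Arrow object, so the results can be chained or converted with as.data.frame() as needed.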
There are also some active bindings:
$num_columns
$num_rows
$schema
$metadata: Returns the key-value metadata of the Schema as a named list. Modify or replace by assigning in (batch$metadata <- new_metadata). All list elements are coerced to string.
$columns: Returns a list of Arrays
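The active bindings are read like fields rather than called like functions. The sketch below uses a hypothetical batch and made-up metadata keys (source, version) to show reading the bindings and assigning metadata. It assumes the arrow package is installed.

```r
library(arrow)

# Hypothetical example data
batch <- record_batch(x = 1:3, y = c(1.5, 2.5, 3.5))

n_rows <- batch$num_rows
n_cols <- batch$num_columns
cols <- batch$columns  # list of Arrays, one per column

# Assign hypothetical metadata; the numeric value is stored as a string
batch$metadata <- list(source = "example", version = 1)
meta <- batch$metadata
```

After the assignment, meta$version should come back as a character value rather than a number, consistent with the coercion described above.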
#> [1] 32 12
#> [1]  6 12
#>  [1] "name" "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"
#> [11] "gear" "carb"
batch$mpg
#> Array
#> <double>
#> [
#>   21,
#>   21,
#>   22.8,
#>   21.4,
#>   18.7,
#>   18.1,
#>   14.3,
#>   24.4,
#>   22.8,
#>   19.2,
#>   ...
#>   15.2,
#>   13.3,
#>   19.2,
#>   27.3,
#>   26,
#>   30.4,
#>   15.8,
#>   19.7,
#>   15,
#>   21.4
#> ]
batch[["cyl"]]
#> Array
#> <double>
#> [
#>   6,
#>   6,
#>   4,
#>   6,
#>   8,
#>   6,
#>   8,
#>   4,
#>   4,
#>   6,
#>   ...
#>   8,
#>   8,
#>   8,
#>   4,
#>   4,
#>   4,
#>   8,
#>   6,
#>   8,
#>   4
#> ]
#> # A tibble: 5 x 3
#>    gear    hp    wt
#>   <dbl> <dbl> <dbl>
#> 1     3   110  3.22
#> 2     3   175  3.44
#> 3     3   105  3.46
#> 4     3   245  3.57
#> 5     4    62  3.19