A Table is a sequence of chunked arrays. Tables have a similar interface to record batches, but they can be composed from multiple record batches or chunked arrays.
Arguments
- ...: A data.frame or a named set of Arrays or vectors. If given a mixture of data.frames and named vectors, the inputs will be autospliced together (see examples). Alternatively, you can provide a single Arrow IPC InputStream, Message, Buffer, or R raw object containing a Buffer.
- schema: a Schema, or NULL (the default) to infer the schema from the data passed via the ... argument. When providing an Arrow IPC buffer, schema is required.
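The two arguments described above can be sketched as follows. This is a minimal illustration, not a definitive recipe: the column names and values are made up, and the explicit schema simply restates the types that would be inferred for these columns anyway.

```r
library(arrow)

# A data.frame and a named vector in one call; the inputs are spliced
# into a single Table with columns id, x and y
df <- data.frame(x = c(1.5, 2.5, 3.5), y = c("a", "b", "c"))
tab <- arrow_table(id = 1:3, df)

# An explicit schema instead of inferred types (illustrative only: these
# types match what would be inferred for a double and a character column)
tab2 <- arrow_table(df, schema = schema(x = float64(), y = utf8()))

names(tab)
names(tab2)
```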
S3 Methods and Usage
Tables are data-frame-like, and many methods you expect to work on a data.frame are implemented for Table. This includes [, [[, $, names, dim, nrow, ncol, head, and tail. You can also pull the data from an Arrow table into R with as.data.frame(). See the examples.
A caveat about the $ method: because Table is an R6 object, $ is also used to access the object's methods (see below). Methods take precedence over the table's columns. So, tab$Slice would return the "Slice" method function even if there were a column in the table called "Slice".
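A small sketch of this caveat, using a hypothetical one-column table whose column is deliberately named "Slice". It assumes that [[ and $GetColumnByName() look up columns only, which is how they are described elsewhere on this page.

```r
library(arrow)

# Hypothetical table whose only column shares its name with an R6 method
tab <- arrow_table(Slice = c(10L, 20L, 30L))

# `$` finds the R6 method first, so this is a function, not the column
is.function(tab$Slice)

# `[[` and $GetColumnByName() still reach the column itself
col <- tab[["Slice"]]
same_col <- tab$GetColumnByName("Slice")
```

When a column name might collide with a method name, extracting it with [[ avoids the ambiguity.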
R6 Methods
In addition to the more R-friendly S3 methods, a Table object has the following R6 methods that map onto the underlying C++ methods:
- $column(i): Extract a ChunkedArray by integer position from the table
- $ColumnNames(): Get all column names (called by names(tab))
- $nbytes(): Total number of bytes consumed by the elements of the table
- $RenameColumns(value): Set all column names (called by names(tab) <- value)
- $GetColumnByName(name): Extract a ChunkedArray by string name
- $field(i): Extract a Field from the table schema by integer position
- $SelectColumns(indices): Return a new Table with the specified columns, expressed as 0-based integers.
- $Slice(offset, length = NULL): Create a zero-copy view starting at the indicated integer offset and going for the given length, or to the end of the table if NULL, the default.
- $Take(i): Return a Table with rows at positions given by integers i. If i is an Arrow Array or ChunkedArray, it will be coerced to an R vector before taking.
- $Filter(i, keep_na = TRUE): Return a Table with rows at positions where logical vector or Arrow boolean-type (Chunked)Array i is TRUE.
- $SortIndices(names, descending = FALSE): Return an Array of integer row positions that can be used to rearrange the Table in ascending or descending order by the first named column, breaking ties with further named columns. descending can be a logical vector of length one or of the same length as names.
- $serialize(output_stream, ...): Write the table to the given OutputStream
- $cast(target_schema, safe = TRUE, options = cast_options(safe)): Alter the schema of the table.
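A few of the row- and column-oriented methods above can be combined as in the following sketch. The table contents are invented for illustration, and the example assumes that row positions passed to $Take() and returned by $SortIndices() are 0-based, like the column indices for $SelectColumns().

```r
library(arrow)

tab <- arrow_table(id = c(3L, 1L, 2L), label = c("c", "a", "b"))

# Zero-copy view: skip the first row, then keep two rows
sliced <- tab$Slice(1, 2)

# Rows at 0-based positions 0 and 2 (assumed 0-based, see lead-in)
taken <- tab$Take(c(0L, 2L))

# Row order that sorts by "id", applied with $Take()
sorted <- tab$Take(tab$SortIndices("id"))

# Columns are selected by 0-based position: 1 is the second column
labels_only <- tab$SelectColumns(1L)
```

For everyday use, the S3 methods ([, head, and so on) are usually more convenient; the R6 methods are mainly useful when the 0-based, C++-style semantics are wanted explicitly.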
There are also some active bindings:
- $num_columns
- $num_rows
- $schema
- $metadata: Returns the key-value metadata of the Schema as a named list. Modify or replace by assigning in (tab$metadata <- new_metadata). All list elements are coerced to string. See schema() for more information.
- $columns: Returns a list of ChunkedArrays
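The active bindings are read like fields rather than called like functions. The sketch below reads the dimensions and replaces the schema metadata; the metadata keys and values are made up for illustration.

```r
library(arrow)

tab <- arrow_table(x = 1:3, y = c("a", "b", "c"))

# Dimensions are available without calling a method
tab$num_rows
tab$num_columns

# Replace the schema-level metadata by assignment
# (hypothetical keys; the numeric value is coerced to a string)
tab$metadata <- list(source = "example", version = 2)

# Read the metadata back as a named list
tab$metadata$source
```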
Examples
tbl <- arrow_table(name = rownames(mtcars), mtcars)
dim(tbl)
#> [1] 32 12
dim(head(tbl))
#> [1] 6 12
names(tbl)
#> [1] "name" "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am"
#> [11] "gear" "carb"
tbl$mpg
#> ChunkedArray
#> <double>
#> [
#> [
#> 21,
#> 21,
#> 22.8,
#> 21.4,
#> 18.7,
#> 18.1,
#> 14.3,
#> 24.4,
#> 22.8,
#> 19.2,
#> ...
#> 15.2,
#> 13.3,
#> 19.2,
#> 27.3,
#> 26,
#> 30.4,
#> 15.8,
#> 19.7,
#> 15,
#> 21.4
#> ]
#> ]
tbl[["cyl"]]
#> ChunkedArray
#> <double>
#> [
#> [
#> 6,
#> 6,
#> 4,
#> 6,
#> 8,
#> 6,
#> 8,
#> 4,
#> 4,
#> 6,
#> ...
#> 8,
#> 8,
#> 8,
#> 4,
#> 4,
#> 4,
#> 8,
#> 6,
#> 8,
#> 4
#> ]
#> ]
as.data.frame(tbl[4:8, c("gear", "hp", "wt")])
#>   gear  hp    wt
#> 1    3 110 3.215
#> 2    3 175 3.440
#> 3    3 105 3.460
#> 4    3 245 3.570
#> 5    4  62 3.190