A Table is a sequence of chunked arrays. It has a similar interface to a record batch, but it can be composed from multiple record batches or chunked arrays.
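The following is a minimal sketch of that composition. It assumes a recent arrow release in which arrow_table() accepts RecordBatch objects, or a named ChunkedArray, in its ... argument. The variable names and data are illustrative only. Each input batch or chunk is expected to become its own chunk in the resulting columns.

```r
library(arrow)

# Two record batches that share the same schema (illustrative data)
rb1 <- record_batch(x = 1:2, y = c("a", "b"))
rb2 <- record_batch(x = 3:5, y = c("c", "d", "e"))

# Assumption: passing only RecordBatch objects builds a Table from them,
# keeping each batch as a separate chunk of every column
tab_from_batches <- arrow_table(rb1, rb2)
tab_from_batches$num_rows       # 5 rows in total
tab_from_batches$x$num_chunks   # one chunk per input batch

# A named ChunkedArray can also supply a column directly
tab_from_chunks <- arrow_table(z = chunked_array(1:3, 4:6))
tab_from_chunks$z$num_chunks
```

No data is concatenated here. The Table simply references the existing arrays as chunks.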
Arguments
- ...: A data.frame or a named set of Arrays or vectors. If given a mixture of data.frames and named vectors, the inputs will be autospliced together (see examples). Alternatively, you can provide a single Arrow IPC InputStream, Message, Buffer, or R raw object containing a Buffer.
- schema: A Schema, or NULL (the default) to infer the schema from the data supplied in the ... argument. When providing an Arrow IPC buffer, schema is required.
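The sketch below shows both arguments in use. It mixes a data.frame with a named vector, which are then autospliced, and separately passes an explicit schema instead of relying on inference. The column names are hypothetical. The schema case assumes that arrow converts the R integer vector to the declared float64 type.

```r
library(arrow)

# A data.frame and a named vector are autospliced into one table:
# the data.frame contributes its columns and the vector adds one more
df <- data.frame(a = 1:3)
tab <- arrow_table(df, b = c("x", "y", "z"))
names(tab)

# Supplying a schema instead of letting arrow infer one
# (assumption: the integer vector is converted to the declared type)
tab_typed <- arrow_table(a = 1:3, schema = schema(a = float64()))
tab_typed$a$type$ToString()
```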
S3 Methods and Usage
Tables are data-frame-like, and many methods you expect to work on
a data.frame are implemented for Table. This includes [, [[,
$, names, dim, nrow, ncol, head, and tail. You can also pull
the data from an Arrow table into R with as.data.frame(). See the
examples.
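Here is a short sketch of the S3 methods listed above, run against a small illustrative table. It mainly covers nrow(), ncol(), and tail(), which the main examples do not show. Note that [ and tail() return Tables, while [[ and $ return ChunkedArrays.

```r
library(arrow)

tab <- arrow_table(x = c(3, 1, 2), y = c("a", "b", "c"))

dim(tab)            # rows, then columns
nrow(tab); ncol(tab)
names(tab)

# [ subsets rows and columns and returns another Table
sub <- tab[2:3, "y"]

# [[ and $ both extract a column as a ChunkedArray
y_col <- tab[["y"]]
x_col <- tab$x

# tail() returns a Table; as.data.frame() pulls the data into R
last_row <- as.data.frame(tail(tab, 1))
```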
A caveat about the $ method: because Table is an R6 object,
$ is also used to access the object's methods (see below). Methods take
precedence over the table's columns, so tab$Slice returns the "Slice"
method function even if the table has a column called "Slice". To
extract such a column, use tab[["Slice"]] instead.
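The following sketch shows this name clash in practice. The table deliberately uses a column named "Slice", which matches an R6 method, alongside a hypothetical "price" column that does not.

```r
library(arrow)

# A column deliberately named like an R6 method
tab <- arrow_table(Slice = 1:3, price = c(10, 20, 30))

# $ finds the R6 method first, so this is a function, not the column
slice_method <- tab$Slice

# [[ with a string always looks the name up as a column
slice_col <- tab[["Slice"]]

# Names that do not clash with a method still work with $
price_col <- tab$price
```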
R6 Methods
In addition to the more R-friendly S3 methods, a Table object has
the following R6 methods that map onto the underlying C++ methods:
- $column(i): Extract a ChunkedArray by integer position from the table.
- $ColumnNames(): Get all column names (called by names(tab)).
- $nbytes(): Total number of bytes consumed by the elements of the table.
- $RenameColumns(value): Set all column names (called by names(tab) <- value).
- $GetColumnByName(name): Extract a ChunkedArray by string name.
- $field(i): Extract a Field from the table schema by integer position.
- $SelectColumns(indices): Return a new Table with the specified columns, expressed as 0-based integers.
- $Slice(offset, length = NULL): Create a zero-copy view starting at the indicated integer offset and going for the given length, or to the end of the table if length is NULL (the default).
- $Take(i): Return a Table with rows at positions given by integers i. If i is an Arrow Array or ChunkedArray, it will be coerced to an R vector before taking.
- $Filter(i, keep_na = TRUE): Return a Table with rows at positions where the logical vector or Arrow boolean-type (Chunked)Array i is TRUE.
- $SortIndices(names, descending = FALSE): Return an Array of integer row positions that can be used to rearrange the Table in ascending or descending order by the first named column, breaking ties with further named columns. descending can be a logical vector of length one or of the same length as names.
- $serialize(output_stream, ...): Write the table to the given OutputStream.
- $cast(target_schema, safe = TRUE, options = cast_options(safe)): Alter the schema of the table.
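A few of these methods are sketched below on a small illustrative table. The sketch assumes that Slice offsets and Take positions are 0-based, matching the 0-based row positions that SortIndices returns. Treat it as an illustration, not a specification of each method's edge cases.

```r
library(arrow)

tab <- arrow_table(x = c(3, 1, 2), y = c("a", "b", "c"))

# Zero-copy view: assumed 0-based offset 1, length 2 (second and third rows)
sliced <- tab$Slice(1, 2)

# Keep rows where the logical vector is TRUE
kept <- tab$Filter(c(TRUE, FALSE, TRUE))

# Row positions (0-based) that order the table by x, largest first
idx <- tab$SortIndices("x", descending = TRUE)

# Take those positions to reorder the rows
sorted <- tab$Take(idx)

# Select columns by 0-based index: 1 is the second column, "y"
only_y <- tab$SelectColumns(1L)

# Cast x from double to int32; safe = TRUE is fine because the values are whole
tab_int <- tab$cast(schema(x = int32(), y = utf8()))
```

Pairing SortIndices with Take is one way to sort a Table without converting it to a data.frame first.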
There are also some active bindings:
- $num_columns
- $num_rows
- $schema
- $metadata: Returns the key-value metadata of the Schema as a named list. Modify or replace by assigning in (tab$metadata <- new_metadata). All list elements are coerced to string. See schema() for more information.
- $columns: Returns a list of ChunkedArrays.
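The next sketch reads the active bindings and assigns new metadata. The metadata keys "source" and "version" are hypothetical. Because list elements are coerced to string, the numeric 2 is expected to come back as "2".

```r
library(arrow)

tab <- arrow_table(x = c(3, 1, 2), y = c("a", "b", "c"))

n_cols <- tab$num_columns
n_rows <- tab$num_rows
tab$schema                  # prints the field names and types
cols <- tab$columns         # list of ChunkedArrays, one per column

# Replace the schema metadata; values are coerced to character
tab$metadata <- list(source = "example", version = 2)
meta <- tab$metadata
```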
Examples
tbl <- arrow_table(name = rownames(mtcars), mtcars)
dim(tbl)
#> [1] 32 12
dim(head(tbl))
#> [1] 6 12
names(tbl)
#> [1] "name" "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am"
#> [11] "gear" "carb"
tbl$mpg
#> ChunkedArray
#> <double>
#> [
#> [
#> 21,
#> 21,
#> 22.8,
#> 21.4,
#> 18.7,
#> 18.1,
#> 14.3,
#> 24.4,
#> 22.8,
#> 19.2,
#> ...
#> 15.2,
#> 13.3,
#> 19.2,
#> 27.3,
#> 26,
#> 30.4,
#> 15.8,
#> 19.7,
#> 15,
#> 21.4
#> ]
#> ]
tbl[["cyl"]]
#> ChunkedArray
#> <double>
#> [
#> [
#> 6,
#> 6,
#> 4,
#> 6,
#> 8,
#> 6,
#> 8,
#> 4,
#> 4,
#> 6,
#> ...
#> 8,
#> 8,
#> 8,
#> 4,
#> 4,
#> 4,
#> 8,
#> 6,
#> 8,
#> 4
#> ]
#> ]
as.data.frame(tbl[4:8, c("gear", "hp", "wt")])
#> # A tibble: 5 x 3
#> gear hp wt
#> <dbl> <dbl> <dbl>
#> 1 3 110 3.22
#> 2 3 175 3.44
#> 3 3 105 3.46
#> 4 3 245 3.57
#> 5 4 62 3.19