Apache Arrow defines two formats for serializing data for interprocess communication (IPC):
a "stream" format and a "file" format, known as Feather.
RecordBatchStreamWriter and RecordBatchFileWriter are interfaces for writing
record batches to those formats, respectively.
For guidance on how to use these classes, see the examples section.
The RecordBatchFileWriter$create() and RecordBatchStreamWriter$create()
factory methods instantiate the object and take the following arguments:

- sink: An OutputStream to write to
- schema: A Schema for the data to be written
- use_legacy_format: logical: write data formatted so that Arrow libraries
  versions 0.14 and lower can read it. Default is FALSE. You can also
  enable this by setting the environment variable ARROW_PRE_0_15_IPC_FORMAT=1.
- metadata_version: A string like "V5" or the equivalent integer indicating
  the Arrow IPC MetadataVersion. Default (NULL) will use the latest version,
  unless the environment variable ARROW_PRE_1_0_METADATA_VERSION=1 is set, in
  which case it will be V4.
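As a minimal sketch of how these factory arguments fit together (assuming the
arrow package is attached; the temp-file sink and the tiny data frame are
illustrative, and the explicit use_legacy_format and metadata_version values
shown are just the defaults):

```r
library(arrow)

# Any OutputStream works as the sink; here, a temp file
tf <- tempfile()
sink <- FileOutputStream$create(tf)

# The schema comes from the data we plan to write
batch <- record_batch(data.frame(x = 1:3))

writer <- RecordBatchFileWriter$create(
  sink, batch$schema,
  use_legacy_format = FALSE,
  metadata_version = "V5"
)
writer$write(batch)
writer$close()
sink$close()
unlink(tf)
```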
The writer object has the following methods:

- $write_batch(batch): Write a RecordBatch to the stream
- $write_table(table): Write a Table to the stream
- $close(): Close the stream. Note that this indicates end-of-file or
  end-of-stream--it does not close the connection to the sink. That needs
  to be closed separately.
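The methods above can be sketched with the stream-format writer, which follows
the same open/write/close pattern (a sketch assuming arrow is attached; the
temp file stands in for any sink):

```r
library(arrow)

tf <- tempfile()
tab <- Table$create(data.frame(x = 1:5))

# Open a sink, create a writer for it, write, then close both
sink <- FileOutputStream$create(tf)
writer <- RecordBatchStreamWriter$create(sink, tab$schema)
writer$write_table(tab)   # $write_table() writes a whole Table
writer$close()            # marks end-of-stream
sink$close()              # the sink must be closed separately

# Read the stream back with the corresponding reader
src <- ReadableFile$create(tf)
reader <- RecordBatchStreamReader$create(src)
tab2 <- reader$read_table()
src$close()
unlink(tf)
```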
write_ipc_stream() and write_feather() provide a much simpler interface for
writing data to these formats and are sufficient for many use cases.
write_to_raw() is a version that serializes data to a buffer.
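A sketch of the simpler interface (assuming arrow is attached; the temp file is
illustrative). These one-shot functions handle opening, writing, and closing
for you:

```r
library(arrow)

tf <- tempfile()

# Write a data frame to the file (Feather) format and read it back
write_feather(chickwts, tf)
df <- read_feather(tf)

# write_to_raw() serializes to a raw vector in memory instead of a file
bytes <- write_to_raw(chickwts)

unlink(tf)
```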
tf <- tempfile()
on.exit(unlink(tf))

batch <- record_batch(chickwts)

# This opens a connection to the file in Arrow
file_obj <- FileOutputStream$create(tf)
# Pass that to a RecordBatchWriter to write data conforming to a schema
writer <- RecordBatchFileWriter$create(file_obj, batch$schema)
writer$write(batch)
# You may write additional batches to the stream, provided that they have
# the same schema.
# Call "close" on the writer to indicate end-of-file/stream
writer$close()
# Then, close the connection--closing the IPC message does not close the file
file_obj$close()

# Now, we have a file we can read from. Same pattern: open file connection,
# then pass it to a RecordBatchReader
read_file_obj <- ReadableFile$create(tf)
reader <- RecordBatchFileReader$create(read_file_obj)
# RecordBatchFileReader knows how many batches it has (StreamReader does not)
reader$num_record_batches
#> [1] 1
# We could consume the Reader by calling $read_next_batch() until all are
# consumed, or we can call $read_table() to pull them all into a Table
tab <- reader$read_table()
# Call as.data.frame to turn that Table into an R data.frame
df <- as.data.frame(tab)
# This should be the same data we sent
all.equal(df, chickwts, check.attributes = FALSE)
#> [1] TRUE
# Unlike the Writers, we don't have to close RecordBatchReaders,
# but we do still need to close the file connection
read_file_obj$close()