Extension arrays are wrappers around regular Arrow Array objects that provide some customized behaviour and/or storage. A common use-case for extension types is to define a customized conversion between an Arrow Array and an R object when the default conversion is slow or loses metadata important to the interpretation of values in the array. For most types, the built-in vctrs extension type is probably sufficient.
Usage
new_extension_type(
storage_type,
extension_name,
extension_metadata = raw(),
type_class = ExtensionType
)
new_extension_array(storage_array, extension_type)
register_extension_type(extension_type)
reregister_extension_type(extension_type)
unregister_extension_type(extension_name)
Arguments
- storage_type
The data type of the underlying storage array.
- extension_name
The extension name. This should be namespaced using "dot" syntax (i.e., "some_package.some_type"). The namespace "arrow" is reserved for extension types defined by the Apache Arrow libraries.
- extension_metadata
A raw() or character() vector containing the serialized version of the type. Character vectors must be length 1 and are converted to UTF-8 before converting to raw().
- type_class
An R6::R6Class whose $new() class method will be used to construct a new instance of the type.
- storage_array
An Array object of the underlying storage.
- extension_type
An ExtensionType instance.
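As a rough illustration of how extension_metadata is handled, the sketch below passes the same metadata once as a string and once as raw bytes and compares the results. The extension name "example_pkg.metadata_demo" and the "20;2" payload are made up for illustration; $extension_metadata() and $extension_metadata_utf8() are the ExtensionType accessors for the serialized metadata.

```r
library(arrow)

# Hypothetical extension name and metadata payload, for illustration only
from_chr <- new_extension_type(
  storage_type = int32(),
  extension_name = "example_pkg.metadata_demo",
  extension_metadata = "20;2"
)
from_raw <- new_extension_type(
  storage_type = int32(),
  extension_name = "example_pkg.metadata_demo",
  extension_metadata = charToRaw("20;2")
)

# Both forms should end up as the same serialized bytes
same_bytes <- identical(
  from_chr$extension_metadata(),
  from_raw$extension_metadata()
)

# The serialized metadata can be read back as a string
metadata_string <- from_chr$extension_metadata_utf8()
```

Neither type is registered here; creating a type instance does not require registration, as the Examples section also relies on.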
Value
- new_extension_type() returns an ExtensionType instance according to the type_class specified.
- new_extension_array() returns an ExtensionArray whose $type corresponds to extension_type.
- register_extension_type(), unregister_extension_type() and reregister_extension_type() return NULL, invisibly.
Details
These functions create, register, and unregister ExtensionType and ExtensionArray objects. To use an extension type you will have to:
- Define an R6::R6Class that inherits from ExtensionType and reimplement one or more methods (e.g., deserialize_instance()).
- Make a type constructor function (e.g., my_extension_type()) that calls new_extension_type() to create an R6 instance that can be used as a data type elsewhere in the package.
- Make an array constructor function (e.g., my_extension_array()) that calls new_extension_array() to create an Array instance of your extension type.
- Register a dummy instance of your extension type, created with your constructor function, using register_extension_type().
If defining an extension type in an R package, you will probably want to use reregister_extension_type() in that package's .onLoad() hook, since your package will probably get reloaded in the same R session during its development and register_extension_type() will error if called twice for the same extension_name. For an example of an extension type that uses most of these features, see vctrs_extension_type().
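The package registration pattern described above might look roughly like the following. This is a sketch only: my_extension_type() and the name "some_package.some_type" are hypothetical stand-ins for a package's own constructor and extension name, and in a real package the .onLoad() definition would typically sit in one of the package's R source files.

```r
# Hypothetical type constructor for a package called "some_package";
# it uses the default ExtensionType class and empty metadata
my_extension_type <- function() {
  arrow::new_extension_type(
    storage_type = arrow::int32(),
    extension_name = "some_package.some_type"
  )
}

# reregister_extension_type() is used instead of register_extension_type()
# so that reloading the package in the same session does not raise an error
.onLoad <- function(libname, pkgname) {
  arrow::reregister_extension_type(my_extension_type())
}
```

With this in place, loading the package more than once in a session should leave a single registration for the extension name.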
Examples
# Create the R6 type whose methods control how Array objects are
# converted to R objects, how equality between types is computed,
# and how types are printed.
QuantizedType <- R6::R6Class(
  "QuantizedType",
  inherit = ExtensionType,
  public = list(
    # methods to access the custom metadata fields
    center = function() private$.center,
    scale = function() private$.scale,
    # called when an Array of this type is converted to an R vector
    as_vector = function(extension_array) {
      if (inherits(extension_array, "ExtensionArray")) {
        unquantized_arrow <-
          (extension_array$storage()$cast(float64()) / private$.scale) +
          private$.center
        as.vector(unquantized_arrow)
      } else {
        super$as_vector(extension_array)
      }
    },
    # populate the custom metadata fields from the serialized metadata
    deserialize_instance = function() {
      vals <- as.numeric(strsplit(self$extension_metadata_utf8(), ";")[[1]])
      private$.center <- vals[1]
      private$.scale <- vals[2]
    }
  ),
  private = list(
    .center = NULL,
    .scale = NULL
  )
)
# Create a helper type constructor that calls new_extension_type()
quantized <- function(center = 0, scale = 1, storage_type = int32()) {
  new_extension_type(
    storage_type = storage_type,
    extension_name = "arrow.example.quantized",
    extension_metadata = paste(center, scale, sep = ";"),
    type_class = QuantizedType
  )
}
# Create a helper array constructor that calls new_extension_array()
quantized_array <- function(x, center = 0, scale = 1,
                            storage_type = int32()) {
  type <- quantized(center, scale, storage_type)
  new_extension_array(
    Array$create((x - center) * scale, type = storage_type),
    type
  )
}
# Register the extension type so that Arrow knows what to do when
# it encounters this extension type
reregister_extension_type(quantized())
# Create Array objects and use them!
(vals <- runif(5, min = 19, max = 21))
#> [1] 19.63750 20.85706 20.72989 20.76217 20.67601
(array <- quantized_array(
  vals,
  center = 20,
  scale = 2^15 - 1,
  storage_type = int16()
)
)
#> ExtensionArray
#> <QuantizedType <20;32767>>
#> [
#> -11877,
#> 28083,
#> 23916,
#> 24974,
#> 22150
#> ]
array$type$center()
#> [1] 20
array$type$scale()
#> [1] 32767
as.vector(array)
#> [1] 19.63753 20.85705 20.72988 20.76217 20.67598
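The example above only uses reregister_extension_type(). As a hedged sketch of the rest of the registration lifecycle described in Details, the following uses a made-up extension name, "example_pkg.lifecycle_demo", with the default ExtensionType class. A second register_extension_type() call for the same name is expected to error, and unregister_extension_type() frees the name again.

```r
library(arrow)

# Hypothetical extension name, for illustration only
demo_type <- new_extension_type(int32(), "example_pkg.lifecycle_demo")

# Start from a known state: this works whether or not the name is taken
reregister_extension_type(demo_type)

# Registering the same name a second time is expected to fail
second_attempt <- tryCatch(
  {
    register_extension_type(demo_type)
    "registered"
  },
  error = function(e) "error"
)

# After unregistering, the name can be registered again
unregister_extension_type("example_pkg.lifecycle_demo")
third_attempt <- tryCatch(
  {
    register_extension_type(demo_type)
    "registered"
  },
  error = function(e) "error"
)

# Clean up so the demo leaves no registration behind
unregister_extension_type("example_pkg.lifecycle_demo")
```

Wrapping the calls in tryCatch() keeps the script running past the expected error, so the two outcomes can be inspected afterwards in second_attempt and third_attempt.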