# 6 Manipulating Data - Arrays

## 6.1 Introduction

An Arrow Array is roughly equivalent to an R vector: it represents a single column of data in which all values share the same data type.

Several base R functions that dispatch S3 methods, such as `mean()`, `min()`, and `max()`, have been implemented to work on Arrow Arrays.
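As a minimal sketch of the vector analogy, the following creates two Arrays and inspects their type, length, and null count. The variable names `int_array` and `dbl_array` are illustrative only.

```r
library(arrow)

# An integer vector with a missing value, and a double vector
int_array <- Array$create(c(1:5, NA))
dbl_array <- Array$create(c(1.5, 2.5))

# Every value in an Array shares one Arrow data type
int_array$type
dbl_array$type

# length() counts null entries too; null_count reports how many there are
length(int_array)
int_array$null_count

# as.vector() converts back to a base R vector, with null becoming NA
as.vector(int_array)
```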

## 6.2 Filter by values matching a predicate or mask

You want to keep only the values in an Array that match a predicate condition.

### 6.2.1 Solution

```r
library(arrow)

my_values <- Array$create(c(1:5, NA))
my_values[my_values > 3]
```

```
## Array
## <int32>
## [
##   4,
##   5,
##   null
## ]
```

### 6.2.2 Discussion

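You can subset an Array with square brackets `[]`, just as you can with an R vector. The comparison `my_values > 3` evaluates to null for the missing value, and that null is kept in the result, which is why the output ends with `null`.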
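The recipe title also mentions masks. As a hedged sketch, the example below subsets with an explicit logical vector, and uses the Array's `Filter()` method with `keep_na = FALSE` to drop positions where the predicate is null. The names `mask`, `masked`, and `no_nulls` are illustrative.

```r
library(arrow)

my_values <- Array$create(c(1:5, NA))

# A logical R vector acts as a mask, one element per value in the Array
mask <- c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)
masked <- my_values[mask]
masked

# Filter() with keep_na = FALSE drops positions where the predicate is null
no_nulls <- my_values$Filter(my_values > 3, keep_na = FALSE)
no_nulls
```

Whether to keep or drop nulls depends on whether missing values should count as "possibly matching" in your analysis; the square-bracket form keeps them, mirroring base R subsetting.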

## 6.3 Compute the mean, min, max, etc. of an Array

You want to calculate the mean, minimum, or maximum of the values in an Array.

### 6.3.1 Solution

```r
my_values <- Array$create(c(1:5, NA))
mean(my_values, na.rm = TRUE)
```

```
## Scalar
## 3
```

### 6.3.2 Discussion

Many base R generic functions such as `mean()`, `min()`, and `max()` have been mapped to their Arrow equivalents, so they can be called on Arrow Array objects in the same way as on R vectors. They return Arrow objects rather than R values.
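Because the result is an Arrow Scalar, it may need converting before it is passed to ordinary R code. A small sketch, with illustrative variable names, that collects the minimum and maximum into a plain R vector:

```r
library(arrow)

my_values <- Array$create(c(1:5, NA))

# Each summary returns an Arrow Scalar, not a base R value
smallest <- min(my_values, na.rm = TRUE)
largest <- max(my_values, na.rm = TRUE)
class(smallest)

# as.vector() turns each Scalar into a length-one R vector
value_range <- c(as.vector(smallest), as.vector(largest))
value_range
```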

If you want to use an R function which does not have an Arrow mapping, you can use `as.vector()` to convert Arrow objects to base R vectors.

```r
arrow_array <- Array$create(1:100)
# get Tukey's five-number summary
fivenum(as.vector(arrow_array))
```

```
## [1]   1.0  25.5  50.5  75.5 100.0
```

You can tell whether a function is a standard S3 generic by looking at its body: standard S3 generics call `UseMethod()` to select the appropriate method for the object.

```r
mean
```

```
## function (x, ...)
## UseMethod("mean")
## <bytecode: 0x564a10424388>
## <environment: namespace:base>
```

You can also use `isS3stdGeneric()` to determine if a function is an S3 generic.

```r
isS3stdGeneric("mean")
```

```
## mean
## TRUE
```
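The check can be run over several functions at once. The sketch below uses only base R; the list `candidates` is an illustrative choice. Note that primitives such as `min()` dispatch internally rather than through `UseMethod()`, so `isS3stdGeneric()` reports `FALSE` for them even though methods can be defined for them.

```r
# Base R only; no Arrow objects are needed for this check
candidates <- list(mean = mean, table = table, fivenum = fivenum, min = min)

# isS3stdGeneric() returns a named logical, so strip the name before collecting
is_std_generic <- vapply(
  candidates,
  function(f) unname(isS3stdGeneric(f)),
  logical(1)
)
is_std_generic
```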

If you find an S3 generic function that isn’t implemented for Arrow objects and you would like to be able to use it, please open an issue on the project JIRA.

## 6.4 Count occurrences of elements in an Array

You want to count repeated values in an Array.

### 6.4.1 Solution

```r
repeated_vals <- Array$create(c(1, 1, 2, 3, 3, 3, 3, 3))
value_counts(repeated_vals)
```

```
## StructArray
## <struct<values: double, counts: int64>>
## -- is_valid: all not null
## -- child 0 type: double
##   [
##     1,
##     2,
##     3
##   ]
## -- child 1 type: int64
##   [
##     2,
##     1,
##     5
##   ]
```

### 6.4.2 Discussion

Some functions in the Arrow R package do not have base R equivalents. In other cases, the base R equivalents are not generic functions so they cannot be called directly on Arrow Array objects.

For example, the `value_counts()` function in the Arrow R package is loosely equivalent to the base R function `table()`, which is not a generic function.
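To make the comparison with `table()` concrete, here is a hedged sketch that converts the StructArray to R and sets it beside the base R result. It assumes `as.vector()` converts a StructArray into a data frame with one column per field, which is the behavior in recent versions of the package; `counts_df` is an illustrative name.

```r
library(arrow)

repeated_vals <- c(1, 1, 2, 3, 3, 3, 3, 3)

# Assumption: as.vector() on the StructArray yields a data frame
# with a `values` column and a `counts` column
counts_df <- as.vector(value_counts(Array$create(repeated_vals)))
counts_df

# The base R counterpart works on the plain vector
table(repeated_vals)
```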

## 6.5 Apply arithmetic functions to Arrays

You want to use the various arithmetic operators on Array objects.

### 6.5.1 Solution

```r
num_array <- Array$create(1:10)
num_array + 10
```

```
## Array
## <double>
## [
##   11,
##   12,
##   13,
##   14,
##   15,
##   16,
##   17,
##   18,
##   19,
##   20
## ]
```

### 6.5.2 Discussion

You will get the same result if you pass in the value you’re adding as an Arrow object.

```r
num_array + Scalar$create(10)
```

```
## Array
## <double>
## [
##   11,
##   12,
##   13,
##   14,
##   15,
##   16,
##   17,
##   18,
##   19,
##   20
## ]
```
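Other operators follow the same pattern. The sketch below, with illustrative variable names, multiplies by an R number, adds two Arrays element-wise, and applies a comparison, then converts each result back to R for inspection.

```r
library(arrow)

num_array <- Array$create(1:10)

# Array combined with a plain R number
doubled <- num_array * 2

# Element-wise arithmetic between two Arrays of equal length
summed <- num_array + num_array

# Comparison operators return a boolean Array
is_large <- num_array > 5

as.vector(doubled)
as.vector(summed)
as.vector(is_large)
```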

## 6.6 Call Arrow compute functions directly on Arrays

You want to call an Arrow compute function directly on an Array.

### 6.6.1 Solution

```r
first_100_numbers <- Array$create(1:100)

# Calculate the variance of 1 to 100, setting the delta degrees of freedom to 0.
call_function("variance", first_100_numbers, options = list(ddof = 0))
```

```
## Scalar
## 833.25
```

### 6.6.2 Discussion

You can use `call_function()` to call Arrow compute functions directly on Scalar, Array, and ChunkedArray objects. The returned object will be an Arrow object.
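As a further sketch, `list_compute_functions()` can be used to look up function names, and the same `"variance"` call can be made on a ChunkedArray. Setting `ddof = 1` gives the sample variance, which should agree with base R's `var()`. The variable names here are illustrative.

```r
library(arrow)

first_100_numbers <- Array$create(1:100)

# Look up available compute function names matching a pattern
variance_fns <- list_compute_functions(pattern = "variance")
variance_fns

# The same compute function called on a ChunkedArray built from two chunks
chunked <- chunked_array(1:50, 51:100)
chunked_var <- call_function("variance", chunked, options = list(ddof = 0))
as.vector(chunked_var)

# ddof = 1 gives the sample variance, comparable with base R's var()
sample_var <- call_function("variance", first_100_numbers, options = list(ddof = 1))
as.vector(sample_var)
var(1:100)
```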