.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements.  See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership.  The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License.  You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied.  See the License for the
.. specific language governing permissions and limitations
.. under the License.

=================
Quick Start Guide
=================

Arrow Java provides several building blocks. Data types describe the types of values;
ValueVectors are sequences of typed values; fields describe the types of columns in
tabular data; schemas describe a sequence of columns in tabular data, and
VectorSchemaRoot represents tabular data. Arrow also provides readers and
writers for loading data from and persisting data to storage.

Create a ValueVector
********************

**ValueVectors** represent a sequence of values of the same type.
They are also known as "arrays" in the columnar format.

Example: create a vector of 32-bit integers representing ``[1, null, 2]``:

.. code-block:: Java

    import org.apache.arrow.memory.BufferAllocator;
    import org.apache.arrow.memory.RootAllocator;
    import org.apache.arrow.vector.IntVector;

    try(
        BufferAllocator allocator = new RootAllocator();
        IntVector intVector = new IntVector("fixed-size-primitive-layout", allocator);
    ){
        intVector.allocateNew(3);
        intVector.set(0,1);
        intVector.setNull(1);
        intVector.set(2,2);
        intVector.setValueCount(3);
        System.out.println("Vector created in memory: " + intVector);
    }

.. code-block:: shell

    Vector created in memory: [1, null, 2]


Example: create a vector of UTF-8 encoded strings representing ``["one", "two", "three"]``:

.. code-block:: Java

    import org.apache.arrow.memory.BufferAllocator;
    import org.apache.arrow.memory.RootAllocator;
    import org.apache.arrow.vector.VarCharVector;

    try(
        BufferAllocator allocator = new RootAllocator();
        VarCharVector varCharVector = new VarCharVector("variable-size-primitive-layout", allocator);
    ){
        varCharVector.allocateNew(3);
        varCharVector.set(0, "one".getBytes());
        varCharVector.set(1, "two".getBytes());
        varCharVector.set(2, "three".getBytes());
        varCharVector.setValueCount(3);
        System.out.println("Vector created in memory: " + varCharVector);
    }

.. code-block:: shell

    Vector created in memory: [one, two, three]

Create a Field
**************

**Fields** are used to denote the particular columns of tabular data.
They consist of a name, a data type, a flag indicating whether the column can have null values,
and optional key-value metadata.

Example: create a field named "document" of string type:

.. code-block:: Java

    import org.apache.arrow.vector.types.pojo.ArrowType;
    import org.apache.arrow.vector.types.pojo.Field;
    import org.apache.arrow.vector.types.pojo.FieldType;
    import java.util.HashMap;
    import java.util.Map;

    Map<String, String> metadata = new HashMap<>();
    metadata.put("A", "Id card");
    metadata.put("B", "Passport");
    metadata.put("C", "Visa");
    Field document = new Field("document",
            new FieldType(true, new ArrowType.Utf8(), /*dictionary*/ null, metadata),
            /*children*/ null);
    System.out.println("Field created: " + document + ", Metadata: " + document.getMetadata());

.. code-block:: shell

    Field created: document: Utf8, Metadata: {A=Id card, B=Passport, C=Visa}

Create a Schema
***************

**Schemas** hold a sequence of fields together with some optional metadata.

Example: Create a schema describing datasets with two columns:
an int32 column "A" and a UTF8-encoded string column "B"

.. code-block:: Java

    import org.apache.arrow.vector.types.pojo.ArrowType;
    import org.apache.arrow.vector.types.pojo.Field;
    import org.apache.arrow.vector.types.pojo.FieldType;
    import org.apache.arrow.vector.types.pojo.Schema;
    import java.util.HashMap;
    import java.util.Map;
    import static java.util.Arrays.asList;

    Map<String, String> metadata = new HashMap<>();
    metadata.put("K1", "V1");
    metadata.put("K2", "V2");
    Field a = new Field("A", FieldType.nullable(new ArrowType.Int(32, true)), /*children*/ null);
    Field b = new Field("B", FieldType.nullable(new ArrowType.Utf8()), /*children*/ null);
    Schema schema = new Schema(asList(a, b), metadata);
    System.out.println("Schema created: " + schema);

.. code-block:: shell

    Schema created: Schema<A: Int(32, true), B: Utf8>(metadata: {K1=V1, K2=V2})

Create a VectorSchemaRoot
*************************

A **VectorSchemaRoot** combines ValueVectors with a Schema to represent tabular data.

Example: Create a dataset of names (strings) and ages (32-bit signed integers).

.. code-block:: Java

    import org.apache.arrow.memory.BufferAllocator;
    import org.apache.arrow.memory.RootAllocator;
    import org.apache.arrow.vector.IntVector;
    import org.apache.arrow.vector.VarCharVector;
    import org.apache.arrow.vector.VectorSchemaRoot;
    import org.apache.arrow.vector.types.pojo.ArrowType;
    import org.apache.arrow.vector.types.pojo.Field;
    import org.apache.arrow.vector.types.pojo.FieldType;
    import org.apache.arrow.vector.types.pojo.Schema;
    import java.nio.charset.StandardCharsets;
    import java.util.HashMap;
    import java.util.Map;
    import static java.util.Arrays.asList;

    Field age = new Field("age",
            FieldType.nullable(new ArrowType.Int(32, true)),
            /*children*/null
    );
    Field name = new Field("name",
            FieldType.nullable(new ArrowType.Utf8()),
            /*children*/null
    );
    Schema schema = new Schema(asList(age, name), /*metadata*/ null);
    try(
        BufferAllocator allocator = new RootAllocator();
        VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator);
        IntVector ageVector = (IntVector) root.getVector("age");
        VarCharVector nameVector = (VarCharVector) root.getVector("name");
    ){
        ageVector.allocateNew(3);
        ageVector.set(0, 10);
        ageVector.set(1, 20);
        ageVector.set(2, 30);
        nameVector.allocateNew(3);
        nameVector.set(0, "Dave".getBytes(StandardCharsets.UTF_8));
        nameVector.set(1, "Peter".getBytes(StandardCharsets.UTF_8));
        nameVector.set(2, "Mary".getBytes(StandardCharsets.UTF_8));
        root.setRowCount(3);
        System.out.println("VectorSchemaRoot created: \n" + root.contentToTSVString());
    }

.. code-block:: shell

    VectorSchemaRoot created:
    age      name
    10      Dave
    20      Peter
    30      Mary


Interprocess Communication (IPC)
********************************

Arrow data can be written to and read from disk, and both of these can be done in
a streaming and/or random-access fashion depending on application requirements.

**Write data to an arrow file**

Example: Write the dataset from the previous example to an Arrow IPC file (random-access).

.. code-block:: Java

    import org.apache.arrow.memory.BufferAllocator;
    import org.apache.arrow.memory.RootAllocator;
    import org.apache.arrow.vector.IntVector;
    import org.apache.arrow.vector.VarCharVector;
    import org.apache.arrow.vector.VectorSchemaRoot;
    import org.apache.arrow.vector.ipc.ArrowFileWriter;
    import org.apache.arrow.vector.types.pojo.ArrowType;
    import org.apache.arrow.vector.types.pojo.Field;
    import org.apache.arrow.vector.types.pojo.FieldType;
    import org.apache.arrow.vector.types.pojo.Schema;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.util.HashMap;
    import java.util.Map;
    import static java.util.Arrays.asList;

    Field age = new Field("age",
            FieldType.nullable(new ArrowType.Int(32, true)),
            /*children*/ null);
    Field name = new Field("name",
            FieldType.nullable(new ArrowType.Utf8()),
            /*children*/ null);
    Schema schema = new Schema(asList(age, name));
    try(
        BufferAllocator allocator = new RootAllocator();
        VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator);
        IntVector ageVector = (IntVector) root.getVector("age");
        VarCharVector nameVector = (VarCharVector) root.getVector("name");
    ){
        ageVector.allocateNew(3);
        ageVector.set(0, 10);
        ageVector.set(1, 20);
        ageVector.set(2, 30);
        nameVector.allocateNew(3);
        nameVector.set(0, "Dave".getBytes(StandardCharsets.UTF_8));
        nameVector.set(1, "Peter".getBytes(StandardCharsets.UTF_8));
        nameVector.set(2, "Mary".getBytes(StandardCharsets.UTF_8));
        root.setRowCount(3);
        File file = new File("random_access_file.arrow");
        try (
            FileOutputStream fileOutputStream = new FileOutputStream(file);
            ArrowFileWriter writer = new ArrowFileWriter(root, /*provider*/ null, fileOutputStream.getChannel());
        ) {
            writer.start();
            writer.writeBatch();
            writer.end();
            System.out.println("Record batches written: " + writer.getRecordBlocks().size()
                    + ". Number of rows written: " + root.getRowCount());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

.. code-block:: shell

    Record batches written: 1. Number of rows written: 3

**Read data from an arrow file**

Example: Read the dataset from the previous example from an Arrow IPC file (random-access).

.. code-block:: Java

    import org.apache.arrow.memory.RootAllocator;
    import org.apache.arrow.vector.ipc.ArrowFileReader;
    import org.apache.arrow.vector.ipc.message.ArrowBlock;
    import org.apache.arrow.vector.VectorSchemaRoot;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;

    try(
        BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
        FileInputStream fileInputStream = new FileInputStream(new File("random_access_file.arrow"));
        ArrowFileReader reader = new ArrowFileReader(fileInputStream.getChannel(), allocator);
    ){
        System.out.println("Record batches in file: " + reader.getRecordBlocks().size());
        for (ArrowBlock arrowBlock : reader.getRecordBlocks()) {
            reader.loadRecordBatch(arrowBlock);
            VectorSchemaRoot root = reader.getVectorSchemaRoot();
            System.out.println("VectorSchemaRoot read: \n" + root.contentToTSVString());
        }
    } catch (IOException e) {
        e.printStackTrace();
    }

.. code-block:: shell

    Record batches in file: 1
    VectorSchemaRoot read:
    age      name
    10       Dave
    20       Peter
    30       Mary

More examples available at `Arrow Java Cookbook`_.

.. _`Arrow Java Cookbook`: https://arrow.apache.org/cookbook/java