Class ArrowToAvroUtils

java.lang.Object
org.apache.arrow.adapter.avro.ArrowToAvroUtils

public class ArrowToAvroUtils extends Object
  • Field Details

  • Constructor Details

    • ArrowToAvroUtils

      public ArrowToAvroUtils()
  • Method Details

    • createAvroSchema

      public static org.apache.avro.Schema createAvroSchema(List<Field> arrowFields, String typeName, String namespace)
      Create an Avro record schema for a given list of Arrow fields.

      This method currently performs the following mapping from Arrow data types to the corresponding Avro data types.

      Arrow type                                                Avro encoding
      ArrowType.Null                                            NULL
      ArrowType.Bool                                            BOOLEAN
      ArrowType.Int (64 bit, unsigned 32 bit)                   LONG
      ArrowType.Int (signed 32 bit, < 32 bit)                   INT
      ArrowType.FloatingPoint (double)                          DOUBLE
      ArrowType.FloatingPoint (single, half)                    FLOAT
      ArrowType.Utf8                                            STRING
      ArrowType.LargeUtf8                                       STRING
      ArrowType.Binary                                          BYTES
      ArrowType.LargeBinary                                     BYTES
      ArrowType.FixedSizeBinary                                 FIXED
      ArrowType.Decimal                                         decimal (FIXED)
      ArrowType.Date                                            date (INT)
      ArrowType.Time (SEC | MILLI)                              time-millis (INT)
      ArrowType.Time (MICRO | NANO)                             time-micros (LONG)
      ArrowType.Timestamp (NANOSECONDS, TZ != NULL)             timestamp-nanos (LONG)
      ArrowType.Timestamp (MICROSECONDS, TZ != NULL)            timestamp-micros (LONG)
      ArrowType.Timestamp (MILLISECONDS | SECONDS, TZ != NULL)  timestamp-millis (LONG)
      ArrowType.Timestamp (NANOSECONDS, TZ == NULL)             local-timestamp-nanos (LONG)
      ArrowType.Timestamp (MICROSECONDS, TZ == NULL)            local-timestamp-micros (LONG)
      ArrowType.Timestamp (MILLISECONDS | SECONDS, TZ == NULL)  local-timestamp-millis (LONG)
      ArrowType.Duration                                        duration (FIXED)
      ArrowType.Interval                                        duration (FIXED)
      ArrowType.Struct                                          record
      ArrowType.List                                            array
      ArrowType.LargeList                                       array
      ArrowType.FixedSizeList                                   array
      ArrowType.Map                                             map
      ArrowType.Union                                           union

      Nullable fields are represented as a union of [base-type | null]. Unions receive special treatment: a union is considered nullable if any of its child fields is nullable. The schema for a nullable union always contains a null type as its first member, and none of its child types is nullable.
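
      The nullability rule can be illustrated with a minimal sketch. It assumes the Arrow Java vector module, this adapter, and Avro are on the classpath. The class name NullableFieldExample, the field names, the type name "Person", and the namespace "com.example" are hypothetical and chosen only for illustration.

```java
import java.util.List;
import org.apache.arrow.adapter.avro.ArrowToAvroUtils;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.avro.Schema;

public class NullableFieldExample {

  // Builds a record schema from one required and one nullable Arrow field.
  public static Schema buildSchema() {
    List<Field> fields = List.of(
        // 64-bit signed integer, not nullable: expected to map to Avro LONG
        Field.notNullable("id", new ArrowType.Int(64, true)),
        // Nullable Utf8: expected to map to a union of STRING and NULL
        Field.nullable("name", ArrowType.Utf8.INSTANCE));
    return ArrowToAvroUtils.createAvroSchema(fields, "Person", "com.example");
  }

  public static void main(String[] args) {
    // Pretty-print the generated schema for inspection.
    System.out.println(buildSchema().toString(true));
  }
}
```

      Inspecting the printed schema is a quick way to confirm how a given Arrow field was mapped before any data is written.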

      List fields must contain exactly one child field, which may be nullable. Map fields are represented as a list of structs, where the struct fields are "key" and "value". The key field must always be of type STRING (Utf8) and cannot be nullable. The value can be of any type and may be nullable. Record types must contain at least one child field and cannot contain multiple fields with the same name.
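
      The structural rules for list and map fields can be sketched as follows. This is an illustration under assumptions, not a definitive recipe: the class name NestedFieldExample and all field names are hypothetical, and the struct child is named "entries" only because that is the conventional name Arrow uses for map entries.

```java
import java.util.List;
import org.apache.arrow.adapter.avro.ArrowToAvroUtils;
import org.apache.arrow.vector.types.FloatingPointPrecision;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
import org.apache.avro.Schema;

public class NestedFieldExample {

  public static Schema buildSchema() {
    // List field: exactly one child field, which may be nullable.
    Field item = Field.nullable("item", new ArrowType.Int(32, true));
    Field scores =
        new Field("scores", FieldType.notNullable(ArrowType.List.INSTANCE), List.of(item));

    // Map field: one struct child with a non-nullable Utf8 "key"
    // and a "value" of any type, which may be nullable.
    Field key = Field.notNullable("key", ArrowType.Utf8.INSTANCE);
    Field value =
        Field.nullable("value", new ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE));
    Field entries =
        new Field("entries", FieldType.notNullable(ArrowType.Struct.INSTANCE), List.of(key, value));
    Field attributes =
        new Field("attributes", FieldType.notNullable(new ArrowType.Map(false)), List.of(entries));

    // Two distinct top-level names, so the record rule on duplicate names is satisfied.
    return ArrowToAvroUtils.createAvroSchema(List.of(scores, attributes), "Sample");
  }
}
```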

      Parameters:
      arrowFields - The Arrow fields used to generate the Avro schema
      typeName - Name of the top level Avro record type
      namespace - Namespace of the top level Avro record type
      Returns:
      An Avro record schema for the given list of fields, with the specified name and namespace
    • createAvroSchema

      public static org.apache.avro.Schema createAvroSchema(List<Field> arrowFields, String typeName)
      Convenience overload that sets namespace = null.
    • createAvroSchema

      public static org.apache.avro.Schema createAvroSchema(List<Field> arrowFields)
      Convenience overload that sets typeName = GENERIC_RECORD_TYPE_NAME.
    • createCompositeProducer

      public static CompositeAvroProducer createCompositeProducer(List<FieldVector> vectors)
      Create a composite Avro producer for a set of field vectors (typically the root set of a VectorSchemaRoot).
      Parameters:
      vectors - The vectors that will be used to produce Avro data
      Returns:
      The resulting composite Avro producer
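
      A minimal sketch of how the producer might be driven is shown below. It relies on assumptions not documented on this page: that CompositeAvroProducer lives in org.apache.arrow.adapter.avro.producers and exposes produce(Encoder), called once per row. The class name ProducerExample and the vector name "id" are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.List;
import org.apache.arrow.adapter.avro.ArrowToAvroUtils;
import org.apache.arrow.adapter.avro.producers.CompositeAvroProducer;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.FieldVector;
import org.apache.arrow.vector.IntVector;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class ProducerExample {

  // Encodes a three-row integer vector as raw Avro binary (no container file header).
  public static byte[] encode() throws IOException {
    try (BufferAllocator allocator = new RootAllocator();
        IntVector ids = new IntVector("id", allocator)) {
      int rowCount = 3;
      ids.allocateNew(rowCount);
      for (int i = 0; i < rowCount; i++) {
        ids.set(i, i + 1);
      }
      ids.setValueCount(rowCount);

      List<FieldVector> vectors = List.of(ids);
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      BinaryEncoder encoder = EncoderFactory.get().directBinaryEncoder(out, null);

      CompositeAvroProducer producer = ArrowToAvroUtils.createCompositeProducer(vectors);
      // Assumption: each produce() call writes one row and advances every child producer.
      for (int row = 0; row < rowCount; row++) {
        producer.produce(encoder);
      }
      encoder.flush();
      return out.toByteArray();
    }
  }
}
```

      The bytes written this way carry no schema, so a reader would need the schema returned by createAvroSchema for the same fields to decode them.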