Class ArrowToAvroUtils

java.lang.Object
org.apache.arrow.adapter.avro.ArrowToAvroUtils

public class ArrowToAvroUtils extends Object
  • Field Details

  • Constructor Details

    • ArrowToAvroUtils

      public ArrowToAvroUtils()
  • Method Details

    • createAvroSchema

      public static org.apache.avro.Schema createAvroSchema(List<Field> arrowFields, String typeName, String namespace, DictionaryProvider dictionaries)
      Create an Avro record schema for a given list of Arrow fields.

      This method currently performs the following mapping from Arrow data types to the corresponding Avro data types.

      Arrow type                                                Avro encoding
      ArrowType.Null                                            NULL
      ArrowType.Bool                                            BOOLEAN
      ArrowType.Int (64 bit, unsigned 32 bit)                   LONG
      ArrowType.Int (signed 32 bit, < 32 bit)                   INT
      ArrowType.FloatingPoint (double)                          DOUBLE
      ArrowType.FloatingPoint (single, half)                    FLOAT
      ArrowType.Utf8                                            STRING
      ArrowType.LargeUtf8                                       STRING
      ArrowType.Binary                                          BYTES
      ArrowType.LargeBinary                                     BYTES
      ArrowType.FixedSizeBinary                                 FIXED
      ArrowType.Decimal                                         decimal (FIXED)
      ArrowType.Date                                            date (INT)
      ArrowType.Time (SEC | MILLI)                              time-millis (INT)
      ArrowType.Time (MICRO | NANO)                             time-micros (LONG)
      ArrowType.Timestamp (NANOSECONDS, TZ != NULL)             timestamp-nanos (LONG)
      ArrowType.Timestamp (MICROSECONDS, TZ != NULL)            timestamp-micros (LONG)
      ArrowType.Timestamp (MILLISECONDS | SECONDS, TZ != NULL)  timestamp-millis (LONG)
      ArrowType.Timestamp (NANOSECONDS, TZ == NULL)             local-timestamp-nanos (LONG)
      ArrowType.Timestamp (MICROSECONDS, TZ == NULL)            local-timestamp-micros (LONG)
      ArrowType.Timestamp (MILLISECONDS | SECONDS, TZ == NULL)  local-timestamp-millis (LONG)
      ArrowType.Duration                                        duration (FIXED)
      ArrowType.Interval                                        duration (FIXED)
      ArrowType.Struct                                          record
      ArrowType.List                                            array
      ArrowType.LargeList                                       array
      ArrowType.FixedSizeList                                   array
      ArrowType.Map                                             map
      ArrowType.Union                                           union
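
      As an illustration of how the two ArrowType.Int rows can be read, the following is a minimal sketch using only the JDK. IntMappingSketch and avroTypeForInt are hypothetical names invented for this example; they are not part of ArrowToAvroUtils, and the actual implementation may be structured differently.

```java
// Hypothetical helper, NOT part of ArrowToAvroUtils: it only restates the
// two ArrowType.Int rows of the mapping table as code.
final class IntMappingSketch {

    /** Returns "LONG" or "INT" for an Arrow integer of the given width and signedness. */
    static String avroTypeForInt(int bitWidth, boolean signed) {
        if (bitWidth != 8 && bitWidth != 16 && bitWidth != 32 && bitWidth != 64) {
            throw new IllegalArgumentException("Unsupported bit width: " + bitWidth);
        }
        // 64-bit integers, and unsigned 32-bit integers (which can exceed
        // Integer.MAX_VALUE), are mapped to the 64-bit Avro LONG.
        if (bitWidth == 64 || (bitWidth == 32 && !signed)) {
            return "LONG";
        }
        // Signed 32-bit integers and everything narrower fit in Avro INT.
        return "INT";
    }

    public static void main(String[] args) {
        System.out.println(avroTypeForInt(32, true));   // INT
        System.out.println(avroTypeForInt(32, false));  // LONG
        System.out.println(avroTypeForInt(16, false));  // INT
        System.out.println(avroTypeForInt(64, false));  // LONG
    }
}
```

      The unsigned 32-bit case is widened because Avro has no unsigned integer types, so a value above the signed 32-bit range needs LONG to be represented without loss.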

      Nullable fields are represented as a union of [base-type | null]. Special treatment is given to the nullability of unions: a union is considered nullable if any of its child fields is nullable. The schema for a nullable union will always contain a null type as its first member, with none of the child types being nullable.
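
      The union rule can be sketched with a simplified model that uses only the JDK. UnionNullabilitySketch, Child, and unionMembers are hypothetical names for this illustration, and Avro type names are plain strings here rather than org.apache.avro.Schema objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, NOT the ArrowToAvroUtils implementation: it only
// demonstrates the "null first, children non-nullable" rule for unions.
final class UnionNullabilitySketch {

    /** Simplified stand-in for an Arrow child field. */
    static final class Child {
        final String avroType;
        final boolean nullable;

        Child(String avroType, boolean nullable) {
            this.avroType = avroType;
            this.nullable = nullable;
        }
    }

    /** Returns the member types of the Avro union, in order. */
    static List<String> unionMembers(List<Child> children) {
        // The union as a whole is nullable if any child is nullable.
        boolean anyNullable = false;
        for (Child child : children) {
            anyNullable |= child.nullable;
        }
        List<String> members = new ArrayList<>();
        if (anyNullable) {
            members.add("null"); // a single null member, always first
        }
        // Each child contributes its base type only, never [type | null].
        for (Child child : children) {
            members.add(child.avroType);
        }
        return members;
    }

    public static void main(String[] args) {
        List<Child> children = new ArrayList<>();
        children.add(new Child("int", false));
        children.add(new Child("string", true));
        System.out.println(unionMembers(children)); // [null, int, string]
    }
}
```

      Hoisting the null into a single leading member avoids nesting one union inside another, which Avro does not allow.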

      List fields must contain precisely one child field, which may be nullable. Map fields are represented as a list of structs, where the struct fields are "key" and "value". The key field must always be of type STRING (Utf8) and cannot be nullable. The value can be of any type and may be nullable. Record types must contain at least one child field and cannot contain multiple fields with the same name.
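
      The two constraints on record types can be expressed as a small pre-check. This is a sketch under assumptions: RecordFieldCheckSketch and checkRecordFields are hypothetical names, field names are passed as plain strings, and the exception type and messages are chosen for the example rather than taken from the library.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-check, NOT part of ArrowToAvroUtils: it validates the
// two record constraints stated above on a plain list of field names.
final class RecordFieldCheckSketch {

    /** Throws IllegalArgumentException if the names cannot form a record. */
    static void checkRecordFields(List<String> fieldNames) {
        if (fieldNames.isEmpty()) {
            throw new IllegalArgumentException(
                "Record types must contain at least one child field");
        }
        Set<String> seen = new HashSet<>();
        for (String name : fieldNames) {
            // Set.add returns false when the name was already present.
            if (!seen.add(name)) {
                throw new IllegalArgumentException("Duplicate field name: " + name);
            }
        }
    }

    public static void main(String[] args) {
        checkRecordFields(java.util.Arrays.asList("id", "name")); // passes
        try {
            checkRecordFields(java.util.Arrays.asList("id", "id"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Duplicate field name: id
        }
    }
}
```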

      String fields that are dictionary-encoded will be represented as an Avro enum, so long as all the values meet the restrictions on Avro enums (non-null, valid identifiers). Other data types that are dictionary-encoded, or string fields whose values do not meet the Avro enum requirements, will be output as their decoded type.
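
      A minimal sketch of the enum eligibility test, using only the JDK. EnumEligibilitySketch and canEncodeAsEnum are hypothetical names; the pattern follows the Avro specification's rule for names, and the real implementation may apply additional checks not shown here.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical check, NOT part of ArrowToAvroUtils: decides whether a
// string dictionary could be written as an Avro enum.
final class EnumEligibilitySketch {

    // Avro names start with a letter or underscore, followed by letters,
    // digits, or underscores.
    private static final Pattern AVRO_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Returns true if every dictionary value is non-null and a valid identifier. */
    static boolean canEncodeAsEnum(List<String> dictionaryValues) {
        for (String value : dictionaryValues) {
            if (value == null || !AVRO_NAME.matcher(value).matches()) {
                return false; // fall back to the decoded STRING type
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(canEncodeAsEnum(java.util.Arrays.asList("RED", "GREEN")));    // true
        System.out.println(canEncodeAsEnum(java.util.Arrays.asList("RED", "dark red"))); // false
    }
}
```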

      Parameters:
      arrowFields - The arrow fields used to generate the Avro schema
      typeName - Name of the top level Avro record type
      namespace - Namespace of the top level Avro record type
      dictionaries - A dictionary provider is required if any fields use dictionary encoding
      Returns:
      An Avro record schema for the given list of fields, with the specified name and namespace
    • createAvroSchema

      public static org.apache.avro.Schema createAvroSchema(List<Field> arrowFields, String typeName, String namespace)
      Overload provided for convenience, sets dictionaries = null.
    • createAvroSchema

      public static org.apache.avro.Schema createAvroSchema(List<Field> arrowFields, String typeName)
      Overload provided for convenience, sets namespace = null.
    • createAvroSchema

      public static org.apache.avro.Schema createAvroSchema(List<Field> arrowFields)
      Overload provided for convenience, sets typeName = GENERIC_RECORD_TYPE_NAME.
    • createAvroSchema

      public static org.apache.avro.Schema createAvroSchema(List<Field> arrowFields, DictionaryProvider dictionaries)
      Overload provided for convenience, sets typeName = GENERIC_RECORD_TYPE_NAME and namespace = null.
    • createCompositeProducer

      public static CompositeAvroProducer createCompositeProducer(List<FieldVector> vectors, DictionaryProvider dictionaries)
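      Create a composite Avro producer for a set of field vectors (typically the root set of a VectorSchemaRoot).
      Parameters:
      vectors - The vectors that will be used to produce Avro data
      dictionaries - A dictionary provider is required if any vectors use dictionary encoding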
      Returns:
      The resulting composite Avro producer
    • createCompositeProducer

      public static CompositeAvroProducer createCompositeProducer(List<FieldVector> vectors)
      Overload provided for convenience, sets dictionaries = null.