-
Field Summary
Fields -
Constructor Summary
Constructors -
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| static org.apache.avro.Schema | createAvroSchema(List<Field> arrowFields) | Overload provided for convenience, sets name = GENERIC_RECORD_TYPE_NAME. |
| static org.apache.avro.Schema | createAvroSchema(List<Field> arrowFields, String typeName) | Overload provided for convenience, sets namespace = null. |
| static org.apache.avro.Schema | createAvroSchema(List<Field> arrowFields, String typeName, String namespace) | Overload provided for convenience, sets dictionaries = null. |
| static org.apache.avro.Schema | createAvroSchema(List<Field> arrowFields, String typeName, String namespace, DictionaryProvider dictionaries) | Create an Avro record schema for a given list of Arrow fields. |
| static org.apache.avro.Schema | createAvroSchema(List<Field> arrowFields, DictionaryProvider dictionaries) | Overload provided for convenience, sets name = GENERIC_RECORD_TYPE_NAME and namespace = null. |
| static CompositeAvroProducer | createCompositeProducer(List<FieldVector> vectors) | Overload provided for convenience, sets dictionaries = null. |
| static CompositeAvroProducer | createCompositeProducer(List<FieldVector> vectors, DictionaryProvider dictionaries) | Create a composite Avro producer for a set of field vectors (typically the root set of a VSR). |
-
Field Details
-
GENERIC_RECORD_TYPE_NAME
-
Constructor Details
-
ArrowToAvroUtils
public ArrowToAvroUtils()
-
-
Method Details
-
createAvroSchema
public static org.apache.avro.Schema createAvroSchema(List<Field> arrowFields, String typeName, String namespace, DictionaryProvider dictionaries)
Create an Avro record schema for a given list of Arrow fields. This method currently maps Arrow data types to the corresponding Avro data types as follows.
| Arrow type | Avro encoding |
| --- | --- |
| ArrowType.Null | NULL |
| ArrowType.Bool | BOOLEAN |
| ArrowType.Int (64 bit, unsigned 32 bit) | LONG |
| ArrowType.Int (signed 32 bit, < 32 bit) | INT |
| ArrowType.FloatingPoint (double) | DOUBLE |
| ArrowType.FloatingPoint (single, half) | FLOAT |
| ArrowType.Utf8 | STRING |
| ArrowType.LargeUtf8 | STRING |
| ArrowType.Binary | BYTES |
| ArrowType.LargeBinary | BYTES |
| ArrowType.FixedSizeBinary | FIXED |
| ArrowType.Decimal | decimal (FIXED) |
| ArrowType.Date | date (INT) |
| ArrowType.Time (SEC or MILLI) | time-millis (INT) |
| ArrowType.Time (MICRO or NANO) | time-micros (LONG) |
| ArrowType.Timestamp (NANOSECONDS, TZ != NULL) | timestamp-nanos (LONG) |
| ArrowType.Timestamp (MICROSECONDS, TZ != NULL) | timestamp-micros (LONG) |
| ArrowType.Timestamp (MILLISECONDS or SECONDS, TZ != NULL) | timestamp-millis (LONG) |
| ArrowType.Timestamp (NANOSECONDS, TZ == NULL) | local-timestamp-nanos (LONG) |
| ArrowType.Timestamp (MICROSECONDS, TZ == NULL) | local-timestamp-micros (LONG) |
| ArrowType.Timestamp (MILLISECONDS or SECONDS, TZ == NULL) | local-timestamp-millis (LONG) |
| ArrowType.Duration | duration (FIXED) |
| ArrowType.Interval | duration (FIXED) |
| ArrowType.Struct | record |
| ArrowType.List | array |
| ArrowType.LargeList | array |
| ArrowType.FixedSizeList | array |
| ArrowType.Map | map |
| ArrowType.Union | union |

Nullable fields are represented as a union of [base-type | null]. Special treatment is given to the nullability of unions: a union is considered nullable if any of its child fields are nullable. The schema for a nullable union always contains a null type as its first member, and none of its child types are nullable.
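The union nullability rule can be illustrated with a small standalone sketch. It does not use the Arrow or Avro libraries: `NullableUnionSketch`, `Child` and `unionMembers` are hypothetical names introduced here, and Avro types are modelled as plain strings. It only shows the ordering rule described above, not how the library builds the schema internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NullableUnionSketch {

    /** Hypothetical stand-in for one child field of an Arrow union. */
    static final class Child {
        final String avroType;
        final boolean nullable;

        Child(String avroType, boolean nullable) {
            this.avroType = avroType;
            this.nullable = nullable;
        }
    }

    /** Returns the Avro union member types, in order, for the given children. */
    static List<String> unionMembers(List<Child> children) {
        boolean anyNullable = false;
        for (Child child : children) {
            anyNullable |= child.nullable;
        }
        List<String> members = new ArrayList<>();
        if (anyNullable) {
            members.add("null"); // null is placed first, and only once
        }
        for (Child child : children) {
            members.add(child.avroType); // child types themselves stay non-nullable
        }
        return members;
    }

    public static void main(String[] args) {
        List<Child> children = Arrays.asList(
            new Child("int", false),
            new Child("string", true));
        System.out.println(unionMembers(children)); // prints [null, int, string]
    }
}
```

If no child is nullable, the sketch emits the child types only, with no null member.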
List fields must contain precisely one child field, which may be nullable. Map fields are represented as a list of structs, where the struct fields are "key" and "value". The key field must always be of type STRING (Utf8) and cannot be nullable. The value can be of any type and may be nullable. Record types must contain at least one child field and cannot contain multiple fields with the same name.
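The record constraint (at least one child, no duplicate names) amounts to a simple check, sketched below in plain Java. `RecordFieldCheckSketch` and `checkRecordFieldNames` are hypothetical names, and throwing `IllegalArgumentException` is an assumption: the documentation above does not say how the library reports a violation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RecordFieldCheckSketch {

    /**
     * Checks the child field names of a record (Arrow struct).
     * The exception type is an assumption made for this sketch.
     */
    static void checkRecordFieldNames(List<String> childNames) {
        if (childNames.isEmpty()) {
            throw new IllegalArgumentException("Record must contain at least one child field");
        }
        Set<String> seen = new HashSet<>();
        for (String name : childNames) {
            if (!seen.add(name)) { // add() returns false when the name was already present
                throw new IllegalArgumentException("Duplicate field name: " + name);
            }
        }
    }

    public static void main(String[] args) {
        checkRecordFieldNames(Arrays.asList("id", "name")); // passes silently
        try {
            checkRecordFieldNames(Arrays.asList("id", "id"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints Duplicate field name: id
        }
        try {
            checkRecordFieldNames(Collections.<String>emptyList());
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints Record must contain at least one child field
        }
    }
}
```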
String fields that are dictionary-encoded will be represented as an Avro enum, so long as all the values meet the restrictions on Avro enums (non-null, valid identifiers). Other data types that are dictionary-encoded, or string fields that do not meet the Avro requirements, will be output as their decoded type.
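As a rough illustration of the enum restriction, the sketch below tests whether a set of dictionary values could serve as Avro enum symbols. It assumes the Avro naming rule that a name starts with a letter or underscore and continues with letters, digits or underscores. `EnumSymbolSketch` and `canEncodeAsEnum` are hypothetical names, and the library's own check may differ in detail (for example in how it treats an empty dictionary).

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class EnumSymbolSketch {

    // Avro name rule: [A-Za-z_] followed by any number of [A-Za-z0-9_]
    private static final Pattern AVRO_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Returns true if every dictionary value is non-null and a valid Avro identifier. */
    static boolean canEncodeAsEnum(List<String> dictionaryValues) {
        for (String value : dictionaryValues) {
            if (value == null || !AVRO_NAME.matcher(value).matches()) {
                return false; // the field would fall back to its decoded STRING type
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(canEncodeAsEnum(Arrays.asList("RED", "GREEN", "BLUE"))); // prints true
        System.out.println(canEncodeAsEnum(Arrays.asList("red", "dark blue")));     // prints false
        System.out.println(canEncodeAsEnum(Arrays.asList("red", null)));            // prints false
    }
}
```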
- Parameters:
arrowFields - The arrow fields used to generate the Avro schema
typeName - Name of the top level Avro record type
namespace - Namespace of the top level Avro record type
dictionaries - A dictionary provider is required if any fields use dictionary encoding
- Returns:
- An Avro record schema for the given list of fields, with the specified name and namespace
-
createAvroSchema
public static org.apache.avro.Schema createAvroSchema(List<Field> arrowFields, String typeName, String namespace)
Overload provided for convenience, sets dictionaries = null.
-
createAvroSchema
public static org.apache.avro.Schema createAvroSchema(List<Field> arrowFields, String typeName)
Overload provided for convenience, sets namespace = null.
-
createAvroSchema
public static org.apache.avro.Schema createAvroSchema(List<Field> arrowFields)
Overload provided for convenience, sets name = GENERIC_RECORD_TYPE_NAME.
-
createAvroSchema
public static org.apache.avro.Schema createAvroSchema(List<Field> arrowFields, DictionaryProvider dictionaries)
Overload provided for convenience, sets name = GENERIC_RECORD_TYPE_NAME and namespace = null.
-
createCompositeProducer
public static CompositeAvroProducer createCompositeProducer(List<FieldVector> vectors, DictionaryProvider dictionaries)
Create a composite Avro producer for a set of field vectors (typically the root set of a VectorSchemaRoot, or VSR).
- Parameters:
vectors - The vectors that will be used to produce Avro data
- Returns:
- The resulting composite Avro producer
-
createCompositeProducer
public static CompositeAvroProducer createCompositeProducer(List<FieldVector> vectors)
Overload provided for convenience, sets dictionaries = null.
-