Crate arrow_json

Transfer data between the Arrow memory format and line-delimited JSON records.

See the module-level documentation for the reader and writer for usage examples.

§Binary Data uses Base16 Encoding

As per RFC 7159, JSON cannot encode arbitrary binary data. This crate works around that limitation by encoding and decoding binary data as hexadecimal strings (i.e., Base16 encoding).
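To make the encoding concrete, here is a minimal sketch of Base16 that uses only the Rust standard library. The helpers `hex_encode` and `hex_decode` are hypothetical names chosen for illustration; they are not part of arrow_json. The lowercase digits are also an assumption, and the crate's own implementation may differ.

```rust
/// Encode bytes as a lowercase hexadecimal (Base16) string.
/// Hypothetical helper for illustration; not part of arrow_json.
fn hex_encode(bytes: &[u8]) -> String {
    const DIGITS: &[u8; 16] = b"0123456789abcdef";
    let mut out = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        // Each byte becomes two hex digits: high nibble, then low nibble.
        out.push(DIGITS[(b >> 4) as usize] as char);
        out.push(DIGITS[(b & 0x0f) as usize] as char);
    }
    out
}

/// Decode a hexadecimal string back into bytes.
/// Returns None for odd-length input or non-hex characters.
fn hex_decode(s: &str) -> Option<Vec<u8>> {
    let s = s.as_bytes();
    if s.len() % 2 != 0 {
        return None;
    }
    s.chunks(2)
        .map(|pair| {
            let hi = (pair[0] as char).to_digit(16)?;
            let lo = (pair[1] as char).to_digit(16)?;
            Some((hi * 16 + lo) as u8)
        })
        .collect()
}
```

With these helpers, the three bytes `[0xDE, 0x00, 0xFF]` encode to the six-character string `de00ff`, and decoding that string yields the original bytes again.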

Note that Base16 has only 50% space efficiency (i.e., the encoded data is twice as large as the original). If that is an issue, we recommend converting binary data to and from a more compact encoding such as Base64 instead. See the Base64 example below for details.

§Base64 Encoding Example

Base64 is a common binary-to-text encoding scheme with a space efficiency of 75%. The following example shows how to use the arrow_cast crate to encode binary data as Base64 before converting it to JSON, and how to decode it back after reading.

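use std::io::Cursor;
use std::sync::Arc;

use arrow_array::cast::AsArray;
use arrow_array::{BinaryArray, RecordBatch, StringArray};
use arrow_cast::base64::{b64_decode, b64_encode, BASE64_STANDARD};
use arrow_json::{LineDelimitedWriter, ReaderBuilder};

// The data we want to write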
let input = BinaryArray::from(vec![b"\xDE\x00\xFF".as_ref()]);

// Base64 encode it to a string
let encoded: StringArray = b64_encode(&BASE64_STANDARD, &input);

// Write the StringArray to JSON
let batch = RecordBatch::try_from_iter([("col", Arc::new(encoded) as _)]).unwrap();
let mut buf = Vec::with_capacity(1024);
let mut writer = LineDelimitedWriter::new(&mut buf);
writer.write(&batch).unwrap();
writer.finish().unwrap();

// Read the JSON data
let cursor = Cursor::new(buf);
let mut reader = ReaderBuilder::new(batch.schema()).build(cursor).unwrap();
let batch = reader.next().unwrap().unwrap();

// Reverse the base64 encoding
let col: BinaryArray = batch.column(0).as_string::<i32>().clone().into();
let output = b64_decode(&BASE64_STANDARD, &col).unwrap();

assert_eq!(input, output);
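The 50% and 75% space-efficiency figures quoted above follow from simple arithmetic, sketched below. The function names are hypothetical, and the Base64 formula assumes the padded standard alphabet, where every group of up to three input bytes becomes four output characters.

```rust
/// Length of the Base16 (hex) encoding of `n` bytes: two characters per byte.
/// Hypothetical helper for illustration.
fn base16_len(n: usize) -> usize {
    n * 2
}

/// Length of the padded Base64 encoding of `n` bytes:
/// every group of up to 3 bytes becomes 4 characters.
/// Hypothetical helper for illustration.
fn base64_padded_len(n: usize) -> usize {
    (n + 2) / 3 * 4
}
```

For a 3000-byte value, `base16_len` gives 6000 characters (3000 / 6000 = 50%) and `base64_padded_len` gives 4000 characters (3000 / 4000 = 75%). For very short values, padding makes Base64 slightly less efficient than 75%.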

Re-exports§

pub use self::reader::Reader;
pub use self::reader::ReaderBuilder;
pub use self::writer::ArrayWriter;
pub use self::writer::Encoder;
pub use self::writer::EncoderFactory;
pub use self::writer::EncoderOptions;
pub use self::writer::LineDelimitedWriter;
pub use self::writer::Writer;
pub use self::writer::WriterBuilder;

Modules§

reader
JSON reader
writer
JSON writer

Macros§

json_serializable 🔒

Enums§

StructMode
Specifies what is considered valid JSON when reading or writing RecordBatches or StructArrays.

Traits§

JsonSerializable
Trait declaring any type that is serializable to JSON. This includes all primitive types (bool, i32, etc.).