Class JniWrapper

java.lang.Object
org.apache.arrow.dataset.jni.JniWrapper

public class JniWrapper extends Object
JNI wrapper for Dataset API's native implementation.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    closeDataset(long datasetId)
    Release the Dataset by destroying its reference held by JNI wrapper.
    void
    closeDatasetFactory(long datasetFactoryId)
    Release the DatasetFactory by destroying its reference held by JNI wrapper.
    void
    closeScanner(long scannerId)
    Release the Scanner by destroying its reference held by JNI wrapper.
    long
    createDataset(long datasetFactoryId, byte[] schema)
    Create Dataset from a DatasetFactory and get the native pointer of the Dataset.
    long
    createScanner(long datasetId, String[] columns, ByteBuffer substraitProjection, ByteBuffer substraitFilter, long batchSize, int fileFormat, String[] serializedFragmentScanOptions, long memoryPool)
    Create Scanner from a Dataset and get the native pointer of the Scanner.
    void
    ensureS3Finalized()
    Ensure the S3 APIs are shut down, but only if not already done.
    static JniWrapper
    get()
     
    byte[]
    getSchemaFromScanner(long scannerId)
    Get a serialized schema from native instance of a Scanner.
    byte[]
    inspectSchema(long datasetFactoryId)
    Get a serialized schema from native instance of a DatasetFactory.
    boolean
    nextRecordBatch(long scannerId, long arrowArray)
    Read next record batch from the specified scanner.
    void
    releaseBuffer(long bufferId)
    Release the Buffer by destroying its reference held by JNI wrapper.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Method Details

    • get

      public static JniWrapper get()
    • closeDatasetFactory

      public void closeDatasetFactory(long datasetFactoryId)
      Release the DatasetFactory by destroying its reference held by JNI wrapper.
      Parameters:
      datasetFactoryId - the native pointer of the arrow::dataset::DatasetFactory instance.
    • inspectSchema

      public byte[] inspectSchema(long datasetFactoryId)
      Get a serialized schema from native instance of a DatasetFactory.
      Parameters:
      datasetFactoryId - the native pointer of the arrow::dataset::DatasetFactory instance.
      Returns:
      the serialized schema
    • createDataset

      public long createDataset(long datasetFactoryId, byte[] schema)
      Create Dataset from a DatasetFactory and get the native pointer of the Dataset.
      Parameters:
      datasetFactoryId - the native pointer of the arrow::dataset::DatasetFactory instance.
      schema - the predefined schema of the resulting Dataset.
      Returns:
      the native pointer of the arrow::dataset::Dataset instance.
    • closeDataset

      public void closeDataset(long datasetId)
      Release the Dataset by destroying its reference held by JNI wrapper.
      Parameters:
      datasetId - the native pointer of the arrow::dataset::Dataset instance.
    • createScanner

      public long createScanner(long datasetId, String[] columns, ByteBuffer substraitProjection, ByteBuffer substraitFilter, long batchSize, int fileFormat, String[] serializedFragmentScanOptions, long memoryPool)
      Create Scanner from a Dataset and get the native pointer of the Scanner.
      Parameters:
      datasetId - the native pointer of the arrow::dataset::Dataset instance.
      columns - desired column names. Columns not in this list are not emitted when the scan is performed. Null means "all columns".
      substraitProjection - Substrait extended expression evaluated to project new columns.
      substraitFilter - Substrait extended expression evaluated to filter rows.
      batchSize - batch size of scanned record batches.
      fileFormat - file format ID.
      serializedFragmentScanOptions - serialized FragmentScanOptions.
      memoryPool - identifier of memory pool used in the native scanner.
      Returns:
      the native pointer of the arrow::dataset::Scanner instance.
    • getSchemaFromScanner

      public byte[] getSchemaFromScanner(long scannerId)
      Get a serialized schema from native instance of a Scanner.
      Parameters:
      scannerId - the native pointer of the arrow::dataset::Scanner instance.
      Returns:
      the serialized schema
    • closeScanner

      public void closeScanner(long scannerId)
      Release the Scanner by destroying its reference held by JNI wrapper.
      Parameters:
      scannerId - the native pointer of the arrow::dataset::Scanner instance.
    • nextRecordBatch

      public boolean nextRecordBatch(long scannerId, long arrowArray)
      Read next record batch from the specified scanner.
      Parameters:
      scannerId - the native pointer of the arrow::dataset::Scanner instance.
      arrowArray - pointer to an empty ArrowArray struct that receives the C++-side record batch, in conformance with the C data interface.
      Returns:
      true if valid record batch is returned; false if stream ended.
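The return contract above suggests a simple pull loop: call nextRecordBatch until it returns false. The sketch below is illustrative only. BatchSource is a hypothetical stand-in interface that mirrors the documented method signature so the example compiles without the native library, and the LongSupplier that hands out the address of a fresh, empty ArrowArray struct for each call is likewise an assumption about how a caller might allocate those structs.

```java
import java.util.function.LongSupplier;

// Hypothetical stand-in for the single JniWrapper method used below.
// It mirrors the documented signature but is not part of Arrow.
interface BatchSource {
    boolean nextRecordBatch(long scannerId, long arrowArray);
}

final class ScanLoop {
    private ScanLoop() {}

    // Pulls batches until the source reports end of stream and returns
    // how many batches were delivered.
    static int drain(BatchSource source, long scannerId, LongSupplier emptyStruct) {
        int batches = 0;
        // Every call receives the address of a fresh, empty struct,
        // matching the arrowArray parameter description.
        while (source.nextRecordBatch(scannerId, emptyStruct.getAsLong())) {
            // A real caller would import the filled struct here.
            batches++;
        }
        return batches;
    }
}
```

With the real class, the first argument could plausibly be the method reference JniWrapper.get()::nextRecordBatch, since its signature matches BatchSource.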
    • releaseBuffer

      public void releaseBuffer(long bufferId)
      Release the Buffer by destroying its reference held by JNI wrapper.
      Parameters:
      bufferId - the native pointer of the arrow::Buffer instance.
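Each close and release method above takes a raw native pointer, so callers have to make sure every pointer is released exactly once. One possible way to do that is to pair the pointer with its release method in an AutoCloseable. NativeHandle below is a hypothetical helper, not part of this class or of Arrow, shown only as a sketch of the pattern.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.LongConsumer;

// Hypothetical helper: ties a native pointer to the method that releases
// it, so try-with-resources can free it.
final class NativeHandle implements AutoCloseable {
    private final long id;
    private final LongConsumer closer;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    NativeHandle(long id, LongConsumer closer) {
        this.id = id;
        this.closer = closer;
    }

    // Returns the native pointer, refusing to hand it out after release.
    long id() {
        if (closed.get()) {
            throw new IllegalStateException("handle already released: " + id);
        }
        return id;
    }

    @Override
    public void close() {
        // compareAndSet ensures the release callback runs at most once,
        // even if close() is called repeatedly.
        if (closed.compareAndSet(false, true)) {
            closer.accept(id);
        }
    }
}
```

A caller might then write try (NativeHandle scanner = new NativeHandle(scannerId, JniWrapper.get()::closeScanner)) { ... } so the scanner is released even when the body throws.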
    • ensureS3Finalized

      public void ensureS3Finalized()
      Ensure the S3 APIs are shut down, but only if not already done. If the S3 APIs are uninitialized, then this is a no-op.
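The contract described for ensureS3Finalized (shut down once, do nothing if uninitialized or already shut down) can be modeled as a small state machine. The class below is a hypothetical illustration of that contract in plain Java. The real state is held in native code, and nothing here claims to reflect how Arrow implements it.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the documented contract; names are illustrative.
final class S3Lifecycle {
    private static final int UNINITIALIZED = 0;
    private static final int INITIALIZED = 1;
    private static final int FINALIZED = 2;

    private final AtomicInteger state = new AtomicInteger(UNINITIALIZED);
    private final AtomicInteger shutdowns = new AtomicInteger(0);

    // Marks the APIs as initialized; has no effect in any other state.
    void initialize() {
        state.compareAndSet(UNINITIALIZED, INITIALIZED);
    }

    void ensureFinalized() {
        // Only the INITIALIZED -> FINALIZED transition performs a shutdown.
        // Uninitialized or already finalized states make this a no-op.
        if (state.compareAndSet(INITIALIZED, FINALIZED)) {
            shutdowns.incrementAndGet();
        }
    }

    // Number of shutdowns actually performed, exposed for demonstration.
    int shutdownCount() {
        return shutdowns.get();
    }
}
```

Using a compare-and-set on a single state field keeps the call safe to repeat from several threads, which is the property the description implies.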