java.lang.Object
org.apache.arrow.dataset.jni.JniWrapper
JNI wrapper for Dataset API's native implementation.
-
Method Summary
Modifier and Type    Method    Description
void    closeDataset(long datasetId)    Release the Dataset by destroying its reference held by JNI wrapper.
void    closeDatasetFactory(long datasetFactoryId)    Release the DatasetFactory by destroying its reference held by JNI wrapper.
void    closeScanner(long scannerId)    Release the Scanner by destroying its reference held by JNI wrapper.
long    createDataset(long datasetFactoryId, byte[] schema)    Create Dataset from a DatasetFactory and get the native pointer of the Dataset.
long    createScanner(long datasetId, String[] columns, ByteBuffer substraitProjection, ByteBuffer substraitFilter, long batchSize, int fileFormat, String[] serializedFragmentScanOptions, long memoryPool)    Create Scanner from a Dataset and get the native pointer of the Scanner.
void    ensureS3Finalized()    Ensure the S3 APIs are shut down, but only if not already done.
static JniWrapper    get()
byte[]    getSchemaFromScanner(long scannerId)    Get a serialized schema from native instance of a Scanner.
byte[]    inspectSchema(long datasetFactoryId)    Get a serialized schema from native instance of a DatasetFactory.
boolean    nextRecordBatch(long scannerId, long arrowArray)    Read next record batch from the specified scanner.
void    releaseBuffer(long bufferId)    Release the Buffer by destroying its reference held by JNI wrapper.
-
Method Details
-
get
public static JniWrapper get()
-
closeDatasetFactory
public void closeDatasetFactory(long datasetFactoryId)
Release the DatasetFactory by destroying its reference held by JNI wrapper.
Parameters:
datasetFactoryId - the native pointer of the arrow::dataset::DatasetFactory instance.
-
inspectSchema
public byte[] inspectSchema(long datasetFactoryId)
Get a serialized schema from native instance of a DatasetFactory.
Parameters:
datasetFactoryId - the native pointer of the arrow::dataset::DatasetFactory instance.
Returns:
the serialized schema
-
createDataset
public long createDataset(long datasetFactoryId, byte[] schema)
Create Dataset from a DatasetFactory and get the native pointer of the Dataset.
Parameters:
datasetFactoryId - the native pointer of the arrow::dataset::DatasetFactory instance.
schema - the predefined schema of the resulting Dataset.
Returns:
the native pointer of the arrow::dataset::Dataset instance.
-
closeDataset
public void closeDataset(long datasetId)
Release the Dataset by destroying its reference held by JNI wrapper.
Parameters:
datasetId - the native pointer of the arrow::dataset::Dataset instance.
-
createScanner
public long createScanner(long datasetId, String[] columns, ByteBuffer substraitProjection, ByteBuffer substraitFilter, long batchSize, int fileFormat, String[] serializedFragmentScanOptions, long memoryPool)
Create Scanner from a Dataset and get the native pointer of the Scanner.
Parameters:
datasetId - the native pointer of the arrow::dataset::Dataset instance.
columns - desired column names. Columns not in this list will not be emitted when performing a scan operation. Null means "all columns".
substraitProjection - Substrait extended expression evaluated to project new columns.
substraitFilter - Substrait extended expression evaluated to apply a filter.
batchSize - batch size of scanned record batches.
fileFormat - file format ID.
serializedFragmentScanOptions - serialized FragmentScanOptions.
memoryPool - identifier of memory pool used in the native scanner.
Returns:
the native pointer of the arrow::dataset::Scanner instance.
-
getSchemaFromScanner
public byte[] getSchemaFromScanner(long scannerId)
Get a serialized schema from native instance of a Scanner.
Parameters:
scannerId - the native pointer of the arrow::dataset::Scanner instance.
Returns:
the serialized schema
-
closeScanner
public void closeScanner(long scannerId)
Release the Scanner by destroying its reference held by JNI wrapper.
Parameters:
scannerId - the native pointer of the arrow::dataset::Scanner instance.
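The create and close methods above come in pairs, which suggests a handle lifecycle in which every native pointer obtained from a create call is released by its matching close call. The following is a minimal sketch of that pattern, not the library's own usage code. It assumes that handles are released in reverse order of creation and that the schema passed to createDataset comes from inspectSchema; neither is stated above. `NativeDatasetApi` is a hypothetical interface that mirrors a subset of the JniWrapper signatures (with a shortened createScanner parameter list), and `RecordingApi` is an in-memory stand-in so the sketch runs without the native library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface mirroring a subset of the JniWrapper methods.
// createScanner is simplified here; the real method takes more parameters.
interface NativeDatasetApi {
    byte[] inspectSchema(long datasetFactoryId);
    long createDataset(long datasetFactoryId, byte[] schema);
    long createScanner(long datasetId, String[] columns, long batchSize);
    void closeScanner(long scannerId);
    void closeDataset(long datasetId);
    void closeDatasetFactory(long datasetFactoryId);
}

// In-memory stand-in that only records calls; it never touches native code.
class RecordingApi implements NativeDatasetApi {
    final List<String> calls = new ArrayList<>();
    private long nextId = 100;

    public byte[] inspectSchema(long datasetFactoryId) {
        calls.add("inspectSchema");
        return new byte[0];
    }
    public long createDataset(long datasetFactoryId, byte[] schema) {
        calls.add("createDataset");
        return nextId++;
    }
    public long createScanner(long datasetId, String[] columns, long batchSize) {
        calls.add("createScanner");
        return nextId++;
    }
    public void closeScanner(long scannerId) { calls.add("closeScanner"); }
    public void closeDataset(long datasetId) { calls.add("closeDataset"); }
    public void closeDatasetFactory(long datasetFactoryId) { calls.add("closeDatasetFactory"); }
}

class HandleLifecycle {
    // Takes ownership of the factory handle, opens a dataset and a scanner,
    // and releases every handle in reverse order even if a step throws.
    static void scanOnce(NativeDatasetApi api, long datasetFactoryId) {
        try {
            byte[] schema = api.inspectSchema(datasetFactoryId);
            long datasetId = api.createDataset(datasetFactoryId, schema);
            try {
                // null columns means "all columns"; 1024 is an arbitrary batch size.
                long scannerId = api.createScanner(datasetId, null, 1024L);
                try {
                    // A real caller would read record batches here.
                } finally {
                    api.closeScanner(scannerId);
                }
            } finally {
                api.closeDataset(datasetId);
            }
        } finally {
            api.closeDatasetFactory(datasetFactoryId);
        }
    }
}
```

Nesting try/finally blocks keeps each close call next to the create call it pairs with, so a failure in createScanner still releases the dataset and the factory.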
-
nextRecordBatch
public boolean nextRecordBatch(long scannerId, long arrowArray)
Read next record batch from the specified scanner.
Parameters:
scannerId - the native pointer of the arrow::dataset::Scanner instance.
arrowArray - pointer to an empty ArrowArray struct in which to store the C++ side record batch; the struct conforms to the C data interface.
Returns:
true if a valid record batch is returned; false if the stream ended.
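Because nextRecordBatch returns false once the stream has ended, a caller can drain a scanner with a plain while loop. The sketch below illustrates only that control flow. `BatchSource` is a hypothetical interface exposing the same signature, and `CountingSource` is an in-memory stand-in that reports a fixed number of batches; no ArrowArray struct is actually allocated or filled.

```java
// Hypothetical interface exposing only the nextRecordBatch signature.
interface BatchSource {
    boolean nextRecordBatch(long scannerId, long arrowArray);
}

// In-memory stand-in: reports a fixed number of batches, then end of stream.
class CountingSource implements BatchSource {
    private int remaining;

    CountingSource(int batches) {
        this.remaining = batches;
    }

    public boolean nextRecordBatch(long scannerId, long arrowArray) {
        if (remaining == 0) {
            return false; // stream ended
        }
        remaining--;
        return true; // a batch was "written" to arrowArray
    }
}

class ScanLoop {
    // Calls nextRecordBatch until it returns false; returns the number of batches seen.
    static int drain(BatchSource source, long scannerId, long arrowArray) {
        int batches = 0;
        while (source.nextRecordBatch(scannerId, arrowArray)) {
            // A real caller would consume the struct at arrowArray here
            // before requesting the next batch.
            batches++;
        }
        return batches;
    }
}
```

For example, `ScanLoop.drain(new CountingSource(3), 1L, 0L)` makes four calls in total: three that return true and a final one that returns false.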
-
releaseBuffer
public void releaseBuffer(long bufferId)
Release the Buffer by destroying its reference held by JNI wrapper.
Parameters:
bufferId - the native pointer of the arrow::Buffer instance.
-
ensureS3Finalized
public void ensureS3Finalized()
Ensure the S3 APIs are shut down, but only if not already done. If the S3 APIs are uninitialized, then this is a no-op.
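The behavior described for ensureS3Finalized is idempotent: the shutdown happens at most once, and nothing happens if the S3 APIs were never initialized. How the native side tracks this is not described here, so the following is only a model of the documented behavior. `S3Lifecycle`, its `State` enum, and the shutdown counter are all hypothetical names introduced for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of "shut down once, and only if initialized".
class S3Lifecycle {
    enum State { UNINITIALIZED, INITIALIZED, FINALIZED }

    private final AtomicReference<State> state =
            new AtomicReference<>(State.UNINITIALIZED);
    private final AtomicInteger shutdowns = new AtomicInteger();

    // Marks the APIs as initialized; has no effect if already initialized or finalized.
    void initialize() {
        state.compareAndSet(State.UNINITIALIZED, State.INITIALIZED);
    }

    // Performs the shutdown only on the INITIALIZED -> FINALIZED transition.
    // Calls made while UNINITIALIZED or already FINALIZED are no-ops.
    void ensureFinalized() {
        if (state.compareAndSet(State.INITIALIZED, State.FINALIZED)) {
            shutdowns.incrementAndGet(); // stand-in for the real shutdown work
        }
    }

    int shutdownCount() {
        return shutdowns.get();
    }

    State state() {
        return state.get();
    }
}
```

Using a compare-and-set on the state means concurrent callers cannot run the shutdown step twice, which matches the "only if not already done" wording.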
-