Class JniWrapper

java.lang.Object
org.apache.arrow.dataset.file.JniWrapper

public class JniWrapper extends Object
JniWrapper for filesystem-based Dataset implementations.
  • Method Details

    • get

      public static JniWrapper get()
    • makeFileSystemDatasetFactory

      public long makeFileSystemDatasetFactory(String uri, int fileFormat, String[] serializedFragmentScanOptions)
      Create a FileSystemDatasetFactory and return its native pointer. The pointer points to an intermediate shared_ptr of the factory instance.
      Parameters:
      uri - file URI to read; may point to either a file or a directory
      fileFormat - file format ID.
      serializedFragmentScanOptions - serialized FragmentScanOptions.
      Returns:
      the native pointer of the arrow::dataset::FileSystemDatasetFactory instance.
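Because the factory is returned as a raw long address rather than a Java object, callers typically want to guard that value against reuse after release. The following is a minimal sketch of one way to do that in plain Java. The class name NativeHandleSketch and the releaser callback are hypothetical and are not part of JniWrapper; the actual release call depends on the native library and is not shown here.

```java
import java.util.function.LongConsumer;

// Hypothetical helper, not part of Arrow: wraps a native address so that it
// is released at most once and cannot be read after release.
final class NativeHandleSketch implements AutoCloseable {
    private final long address;
    private final LongConsumer releaser; // assumed callback that frees the native object
    private boolean closed;

    NativeHandleSketch(long address, LongConsumer releaser) {
        if (address == 0L) {
            throw new IllegalArgumentException("null native pointer");
        }
        this.address = address;
        this.releaser = releaser;
    }

    /** Returns the raw address, failing fast if the handle was already released. */
    long address() {
        if (closed) {
            throw new IllegalStateException("handle already released");
        }
        return address;
    }

    /** Releases the native object once; later calls are no-ops. */
    @Override
    public void close() {
        if (!closed) {
            closed = true;
            releaser.accept(address);
        }
    }

    public static void main(String[] args) {
        // 42L stands in for a value returned by a factory method.
        try (NativeHandleSketch handle =
                 new NativeHandleSketch(42L, addr -> System.out.println("released " + addr))) {
            System.out.println("using " + handle.address());
        }
    }
}
```

Using try-with-resources keeps the release on every exit path, including exceptions thrown while the handle is in use.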
    • makeFileSystemDatasetFactoryWithFiles

      public long makeFileSystemDatasetFactoryWithFiles(String[] uris, int fileFormat, String[] serializedFragmentScanOptions)
      Create a FileSystemDatasetFactory and return its native pointer. The pointer points to an intermediate shared_ptr of the factory instance.
      Parameters:
      uris - list of file URIs to read, each pointing to an individual file
      fileFormat - file format ID.
      serializedFragmentScanOptions - serialized FragmentScanOptions.
      Returns:
      the native pointer of the arrow::dataset::FileSystemDatasetFactory instance.
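The uris parameter expects one URI string per file. As a rough illustration, local paths can be turned into such strings with the standard java.nio.file API. The class FileUriSketch and the sample paths below are hypothetical; whether a given URI scheme is accepted depends on the native filesystem support, which this sketch does not check.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Arrow: builds a String[] of "file:" URIs
// from local paths, one entry per individual file.
final class FileUriSketch {

    /** Converts each path to an absolute, normalized URI string. */
    static String[] toUris(List<Path> files) {
        return files.stream()
                .map(p -> p.toAbsolutePath().normalize().toUri().toString())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Sample paths are placeholders; the files do not need to exist
        // for the conversion itself.
        String[] uris = toUris(Arrays.asList(
                Paths.get("data/part-0.parquet"),
                Paths.get("data/part-1.parquet")));
        for (String uri : uris) {
            System.out.println(uri);
        }
    }
}
```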
    • writeFromScannerToFile

      public void writeFromScannerToFile(long streamAddress, long fileFormat, String uri, String[] partitionColumns, int maxPartitions, String baseNameTemplate)
      Write the content of an ArrowArrayStream into files. Internally, this relies on the C++ write API: FileSystemDataset::Write.
      Parameters:
      streamAddress - the ArrowArrayStream address
      fileFormat - target file format (ID)
      uri - target file URI
      partitionColumns - columns used to partition output files
      maxPartitions - maximum partitions to be included in written files
      baseNameTemplate - file name template used to make partitions, e.g. "dat_{i}", where {i} is replaced by the current partition ID, counted across all written files.
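To make the baseNameTemplate substitution concrete, here is a minimal sketch that mimics it in plain Java. The real expansion happens inside the native writer, not in Java; the class BaseNameTemplateSketch and its expand method are hypothetical, and the requirement that the template contain "{i}" is an assumption of this sketch.

```java
// Hypothetical illustration, not part of Arrow: reproduces the "{i}"
// placeholder substitution described for baseNameTemplate.
final class BaseNameTemplateSketch {

    /** Replaces every "{i}" placeholder with the given partition ID. */
    static String expand(String baseNameTemplate, int partitionId) {
        // Assumption of this sketch: a template without "{i}" is rejected,
        // since every written file would otherwise get the same name.
        if (!baseNameTemplate.contains("{i}")) {
            throw new IllegalArgumentException(
                    "template must contain \"{i}\": " + baseNameTemplate);
        }
        return baseNameTemplate.replace("{i}", Integer.toString(partitionId));
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println(expand("dat_{i}", i));
        }
    }
}
```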