Implement ORC chunked reader #15094

ttnghia · 2024-02-20T19:19:50Z

This implements an ORC chunked reader, which supports reading ORC files such that:

- The output is multiple tables instead of one. Each table is returned by a call to read_chunk(), and its size stays within a given output_limit parameter.
- Temporary device memory usage can be capped by a soft limit, the data_read_limit parameter, which allows very large ORC files to be read without OOM.
- ORC files containing many billions of rows can be read chunk by chunk without hitting the size overflow that occurs when the number of rows exceeds the cudf size limit (2^31 rows).
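As a rough illustration of the behavior described above, the sketch below uses a toy in-memory reader. The names `toy_chunked_reader`, `has_next()`, and `chunk_sizes` are hypothetical and are not the cudf API; the sketch only shows how a caller might loop over read_chunk() while each chunk stays within a byte limit, with a limit of 0 meaning no limit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a chunked reader; this is NOT the cudf API.
// It serves rows from an in-memory vector so that each chunk stays within
// a byte budget, mirroring the role of the output limit in this PR.
class toy_chunked_reader {
 public:
  toy_chunked_reader(std::size_t output_limit_bytes, std::vector<std::int32_t> rows)
    : output_limit_bytes_{output_limit_bytes}, rows_{std::move(rows)}
  {
  }

  // True while there are rows that have not been returned yet.
  bool has_next() const { return next_row_ < rows_.size(); }

  // Return the next chunk of rows, no larger than the byte budget.
  std::vector<std::int32_t> read_chunk()
  {
    assert(has_next());
    std::size_t const remaining = rows_.size() - next_row_;
    std::size_t count           = remaining;  // a limit of 0 means "no limit"
    if (output_limit_bytes_ != 0) {
      // Always make progress: emit at least one row even if it exceeds the budget.
      std::size_t const rows_in_budget =
        std::max<std::size_t>(1, output_limit_bytes_ / sizeof(std::int32_t));
      count = std::min(rows_in_budget, remaining);
    }
    auto const first = rows_.begin() + static_cast<std::ptrdiff_t>(next_row_);
    std::vector<std::int32_t> chunk(first, first + static_cast<std::ptrdiff_t>(count));
    next_row_ += count;
    return chunk;
  }

 private:
  std::size_t output_limit_bytes_;
  std::vector<std::int32_t> rows_;
  std::size_t next_row_{0};
};

// Drain the reader and record how many rows each chunk contained.
inline std::vector<std::size_t> chunk_sizes(toy_chunked_reader reader)
{
  std::vector<std::size_t> sizes;
  while (reader.has_next()) {
    sizes.push_back(reader.read_chunk().size());
  }
  return sizes;
}
```

With ten 4-byte rows and a 16-byte output limit, the loop yields chunks of 4, 4, and 2 rows; with a limit of 0 it yields a single chunk of 10 rows.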

Depends on:

Partially contributes to #12228.

Benchmarks

Due to some small optimizations in the ORC reader, reading ORC files all at once (reading the entire file into a single output table) can be a little faster. For example, with the benchmark orc_read_io_compression:

## [0] Quadro RTX 6000

|      io       |  compression  |  cardinality  |  run_length  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |          Diff |   %Diff |  Status  |
|---------------|---------------|---------------|--------------|------------|-------------|------------|-------------|---------------|---------|----------|
|   FILEPATH    |    SNAPPY     |       0       |      1       | 183.027 ms |       7.45% | 157.293 ms |       4.72% | -25733.837 us | -14.06% |   FAIL   |
|   FILEPATH    |    SNAPPY     |     1000      |      1       | 198.228 ms |       6.43% | 164.395 ms |       4.14% | -33833.020 us | -17.07% |   FAIL   |
|   FILEPATH    |    SNAPPY     |       0       |      32      |  96.676 ms |       6.19% |  82.522 ms |       1.36% | -14153.945 us | -14.64% |   FAIL   |
|   FILEPATH    |    SNAPPY     |     1000      |      32      |  94.508 ms |       4.80% |  81.078 ms |       0.48% | -13429.672 us | -14.21% |   FAIL   |
|   FILEPATH    |     NONE      |       0       |      1       | 161.868 ms |       5.40% | 139.849 ms |       2.44% | -22018.910 us | -13.60% |   FAIL   |
|   FILEPATH    |     NONE      |     1000      |      1       | 164.902 ms |       5.80% | 142.041 ms |       3.43% | -22861.258 us | -13.86% |   FAIL   |
|   FILEPATH    |     NONE      |       0       |      32      |  88.298 ms |       5.15% |  74.924 ms |       1.97% | -13374.607 us | -15.15% |   FAIL   |
|   FILEPATH    |     NONE      |     1000      |      32      |  87.147 ms |       5.61% |  72.502 ms |       0.50% | -14645.122 us | -16.81% |   FAIL   |
|  HOST_BUFFER  |    SNAPPY     |       0       |      1       | 124.990 ms |       0.39% | 111.670 ms |       2.13% | -13320.483 us | -10.66% |   FAIL   |
|  HOST_BUFFER  |    SNAPPY     |     1000      |      1       | 149.858 ms |       4.10% | 126.266 ms |       0.48% | -23591.543 us | -15.74% |   FAIL   |
|  HOST_BUFFER  |    SNAPPY     |       0       |      32      |  92.499 ms |       4.46% |  77.653 ms |       1.58% | -14846.471 us | -16.05% |   FAIL   |
|  HOST_BUFFER  |    SNAPPY     |     1000      |      32      |  93.373 ms |       4.14% |  80.033 ms |       3.19% | -13340.002 us | -14.29% |   FAIL   |
|  HOST_BUFFER  |     NONE      |       0       |      1       | 111.792 ms |       0.50% |  97.083 ms |       0.50% | -14709.530 us | -13.16% |   FAIL   |
|  HOST_BUFFER  |     NONE      |     1000      |      1       | 117.646 ms |       5.60% |  97.634 ms |       0.44% | -20012.301 us | -17.01% |   FAIL   |
|  HOST_BUFFER  |     NONE      |       0       |      32      |  84.983 ms |       4.96% |  66.975 ms |       0.50% | -18007.403 us | -21.19% |   FAIL   |
|  HOST_BUFFER  |     NONE      |     1000      |      32      |  82.648 ms |       4.42% |  65.510 ms |       0.91% | -17137.910 us | -20.74% |   FAIL   |
| DEVICE_BUFFER |    SNAPPY     |       0       |      1       |  65.538 ms |       4.02% |  59.399 ms |       2.54% |  -6138.560 us |  -9.37% |   FAIL   |
| DEVICE_BUFFER |    SNAPPY     |     1000      |      1       | 101.427 ms |       4.10% |  92.276 ms |       3.30% |  -9150.278 us |  -9.02% |   FAIL   |
| DEVICE_BUFFER |    SNAPPY     |       0       |      32      |  80.133 ms |       4.64% |  73.959 ms |       3.50% |  -6173.818 us |  -7.70% |   FAIL   |
| DEVICE_BUFFER |    SNAPPY     |     1000      |      32      |  86.232 ms |       4.71% |  77.446 ms |       3.32% |  -8786.606 us | -10.19% |   FAIL   |
| DEVICE_BUFFER |     NONE      |       0       |      1       |  52.189 ms |       6.62% |  45.018 ms |       4.11% |  -7171.043 us | -13.74% |   FAIL   |
| DEVICE_BUFFER |     NONE      |     1000      |      1       |  54.664 ms |       6.76% |  46.855 ms |       3.35% |  -7809.803 us | -14.29% |   FAIL   |
| DEVICE_BUFFER |     NONE      |       0       |      32      |  67.975 ms |       5.12% |  60.553 ms |       4.22% |  -7422.279 us | -10.92% |   FAIL   |
| DEVICE_BUFFER |     NONE      |     1000      |      32      |  68.485 ms |       4.86% |  62.253 ms |       6.23% |  -6232.340 us |  -9.10% |   FAIL   |
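
For readers checking the numbers, the Diff and %Diff columns are consistent with a simple relative change of the compared time against the reference time. The helper below is a minimal sketch under that assumption; `percent_diff` is a hypothetical name and is not part of the benchmark tooling.

```cpp
#include <cassert>

// Relative change of the compared run against the reference run, in percent.
// Assumes %Diff = (Cmp Time - Ref Time) / Ref Time * 100, which matches the
// rows of the table above. Negative values mean the compared run is faster.
inline double percent_diff(double ref_time_ms, double cmp_time_ms)
{
  assert(ref_time_ms > 0.0);
  return (cmp_time_ms - ref_time_ms) / ref_time_ms * 100.0;
}
```

For the first row, `percent_diff(183.027, 157.293)` gives roughly -14.06, matching the table.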

When memory is limited, chunked reads can help avoid OOM, at the cost of some performance. For example, reading a 500 MB table from file with a 64 MB output limit and a 640 MB data read limit:

|      io       |  compression  |  cardinality  |  run_length  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |       Diff |   %Diff |  Status  |
|---------------|---------------|---------------|--------------|------------|-------------|------------|-------------|------------|---------|----------|
|   FILEPATH    |    SNAPPY     |       0       |      1       | 183.027 ms |       7.45% | 350.824 ms |       2.74% | 167.796 ms |  91.68% |   FAIL   |
|   FILEPATH    |    SNAPPY     |     1000      |      1       | 198.228 ms |       6.43% | 322.414 ms |       3.46% | 124.186 ms |  62.65% |   FAIL   |
|   FILEPATH    |    SNAPPY     |       0       |      32      |  96.676 ms |       6.19% | 133.363 ms |       4.78% |  36.686 ms |  37.95% |   FAIL   |
|   FILEPATH    |    SNAPPY     |     1000      |      32      |  94.508 ms |       4.80% | 128.897 ms |       0.37% |  34.389 ms |  36.39% |   FAIL   |
|   FILEPATH    |     NONE      |       0       |      1       | 161.868 ms |       5.40% | 316.637 ms |       4.21% | 154.769 ms |  95.61% |   FAIL   |
|   FILEPATH    |     NONE      |     1000      |      1       | 164.902 ms |       5.80% | 326.043 ms |       3.06% | 161.141 ms |  97.72% |   FAIL   |
|   FILEPATH    |     NONE      |       0       |      32      |  88.298 ms |       5.15% | 124.819 ms |       5.17% |  36.520 ms |  41.36% |   FAIL   |
|   FILEPATH    |     NONE      |     1000      |      32      |  87.147 ms |       5.61% | 123.047 ms |       5.82% |  35.900 ms |  41.19% |   FAIL   |
|  HOST_BUFFER  |    SNAPPY     |       0       |      1       | 124.990 ms |       0.39% | 285.718 ms |       0.78% | 160.728 ms | 128.59% |   FAIL   |
|  HOST_BUFFER  |    SNAPPY     |     1000      |      1       | 149.858 ms |       4.10% | 263.491 ms |       2.89% | 113.633 ms |  75.83% |   FAIL   |
|  HOST_BUFFER  |    SNAPPY     |       0       |      32      |  92.499 ms |       4.46% | 127.881 ms |       0.86% |  35.382 ms |  38.25% |   FAIL   |
|  HOST_BUFFER  |    SNAPPY     |     1000      |      32      |  93.373 ms |       4.14% | 128.022 ms |       0.98% |  34.650 ms |  37.11% |   FAIL   |
|  HOST_BUFFER  |     NONE      |       0       |      1       | 111.792 ms |       0.50% | 241.064 ms |       1.89% | 129.271 ms | 115.64% |   FAIL   |
|  HOST_BUFFER  |     NONE      |     1000      |      1       | 117.646 ms |       5.60% | 248.134 ms |       3.08% | 130.488 ms | 110.92% |   FAIL   |
|  HOST_BUFFER  |     NONE      |       0       |      32      |  84.983 ms |       4.96% | 118.049 ms |       5.99% |  33.066 ms |  38.91% |   FAIL   |
|  HOST_BUFFER  |     NONE      |     1000      |      32      |  82.648 ms |       4.42% | 114.577 ms |       2.34% |  31.929 ms |  38.63% |   FAIL   |
| DEVICE_BUFFER |    SNAPPY     |       0       |      1       |  65.538 ms |       4.02% | 232.466 ms |       3.28% | 166.928 ms | 254.71% |   FAIL   |
| DEVICE_BUFFER |    SNAPPY     |     1000      |      1       | 101.427 ms |       4.10% | 221.578 ms |       1.43% | 120.152 ms | 118.46% |   FAIL   |
| DEVICE_BUFFER |    SNAPPY     |       0       |      32      |  80.133 ms |       4.64% | 120.604 ms |       0.35% |  40.471 ms |  50.50% |   FAIL   |
| DEVICE_BUFFER |    SNAPPY     |     1000      |      32      |  86.232 ms |       4.71% | 125.521 ms |       3.93% |  39.289 ms |  45.56% |   FAIL   |
| DEVICE_BUFFER |     NONE      |       0       |      1       |  52.189 ms |       6.62% | 182.943 ms |       0.29% | 130.754 ms | 250.54% |   FAIL   |
| DEVICE_BUFFER |     NONE      |     1000      |      1       |  54.664 ms |       6.76% | 190.501 ms |       0.49% | 135.836 ms | 248.49% |   FAIL   |
| DEVICE_BUFFER |     NONE      |       0       |      32      |  67.975 ms |       5.12% | 107.172 ms |       3.56% |  39.197 ms |  57.66% |   FAIL   |
| DEVICE_BUFFER |     NONE      |     1000      |      32      |  68.485 ms |       4.86% | 108.097 ms |       2.92% |  39.611 ms |  57.84% |   FAIL   |

And when memory is even more limited, a chunked read with an 8 MB output limit and an 80 MB data read limit:

|      io       |  compression  |  cardinality  |  run_length  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |       Diff |   %Diff |  Status  |
|---------------|---------------|---------------|--------------|------------|-------------|------------|-------------|------------|---------|----------|
|   FILEPATH    |    SNAPPY     |       0       |      1       | 183.027 ms |       7.45% | 732.926 ms |       1.98% | 549.899 ms | 300.45% |   FAIL   |
|   FILEPATH    |    SNAPPY     |     1000      |      1       | 198.228 ms |       6.43% | 834.309 ms |       4.21% | 636.081 ms | 320.88% |   FAIL   |
|   FILEPATH    |    SNAPPY     |       0       |      32      |  96.676 ms |       6.19% | 363.033 ms |       1.66% | 266.356 ms | 275.51% |   FAIL   |
|   FILEPATH    |    SNAPPY     |     1000      |      32      |  94.508 ms |       4.80% | 313.813 ms |       1.28% | 219.305 ms | 232.05% |   FAIL   |
|   FILEPATH    |     NONE      |       0       |      1       | 161.868 ms |       5.40% | 607.700 ms |       2.90% | 445.832 ms | 275.43% |   FAIL   |
|   FILEPATH    |     NONE      |     1000      |      1       | 164.902 ms |       5.80% | 616.101 ms |       3.46% | 451.199 ms | 273.62% |   FAIL   |
|   FILEPATH    |     NONE      |       0       |      32      |  88.298 ms |       5.15% | 267.703 ms |       0.46% | 179.405 ms | 203.18% |   FAIL   |
|   FILEPATH    |     NONE      |     1000      |      32      |  87.147 ms |       5.61% | 250.528 ms |       0.43% | 163.381 ms | 187.48% |   FAIL   |
|  HOST_BUFFER  |    SNAPPY     |       0       |      1       | 124.990 ms |       0.39% | 636.270 ms |       0.44% | 511.280 ms | 409.06% |   FAIL   |
|  HOST_BUFFER  |    SNAPPY     |     1000      |      1       | 149.858 ms |       4.10% | 747.264 ms |       0.50% | 597.406 ms | 398.65% |   FAIL   |
|  HOST_BUFFER  |    SNAPPY     |       0       |      32      |  92.499 ms |       4.46% | 359.660 ms |       0.19% | 267.161 ms | 288.82% |   FAIL   |
|  HOST_BUFFER  |    SNAPPY     |     1000      |      32      |  93.373 ms |       4.14% | 311.608 ms |       0.43% | 218.235 ms | 233.73% |   FAIL   |
|  HOST_BUFFER  |     NONE      |       0       |      1       | 111.792 ms |       0.50% | 493.797 ms |       0.13% | 382.005 ms | 341.71% |   FAIL   |
|  HOST_BUFFER  |     NONE      |     1000      |      1       | 117.646 ms |       5.60% | 516.706 ms |       0.12% | 399.060 ms | 339.20% |   FAIL   |
|  HOST_BUFFER  |     NONE      |       0       |      32      |  84.983 ms |       4.96% | 258.477 ms |       0.46% | 173.495 ms | 204.15% |   FAIL   |
|  HOST_BUFFER  |     NONE      |     1000      |      32      |  82.648 ms |       4.42% | 248.028 ms |       5.30% | 165.380 ms | 200.10% |   FAIL   |
| DEVICE_BUFFER |    SNAPPY     |       0       |      1       |  65.538 ms |       4.02% | 606.010 ms |       3.76% | 540.472 ms | 824.68% |   FAIL   |
| DEVICE_BUFFER |    SNAPPY     |     1000      |      1       | 101.427 ms |       4.10% | 742.774 ms |       4.64% | 641.347 ms | 632.33% |   FAIL   |
| DEVICE_BUFFER |    SNAPPY     |       0       |      32      |  80.133 ms |       4.64% | 364.701 ms |       2.70% | 284.568 ms | 355.12% |   FAIL   |
| DEVICE_BUFFER |    SNAPPY     |     1000      |      32      |  86.232 ms |       4.71% | 320.387 ms |       2.80% | 234.155 ms | 271.54% |   FAIL   |
| DEVICE_BUFFER |     NONE      |       0       |      1       |  52.189 ms |       6.62% | 458.100 ms |       2.15% | 405.912 ms | 777.78% |   FAIL   |
| DEVICE_BUFFER |     NONE      |     1000      |      1       |  54.664 ms |       6.76% | 478.527 ms |       1.41% | 423.862 ms | 775.39% |   FAIL   |
| DEVICE_BUFFER |     NONE      |       0       |      32      |  67.975 ms |       5.12% | 260.009 ms |       3.71% | 192.034 ms | 282.51% |   FAIL   |
| DEVICE_BUFFER |     NONE      |     1000      |      32      |  68.485 ms |       4.86% | 243.705 ms |       2.09% | 175.220 ms | 255.85% |   FAIL   |

copy-pr-bot · 2024-02-20T19:19:54Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

mythrocks

Sorry, this is going to take me a little while to traverse. :]

mythrocks · 2024-02-28T19:13:00Z

cpp/benchmarks/io/orc/orc_reader_input.cpp

+// NVBENCH_BENCH_TYPES(BM_orc_read_data,
+//                     NVBENCH_TYPE_AXES(d_type_list,
+//                                       nvbench::enum_type_list<cudf::io::io_type::DEVICE_BUFFER>))
+//   .set_name("orc_read_decode")
+//   .set_type_axes_names({"data_type", "io"})
+//   .set_min_samples(4)
+//   .add_int64_axis("cardinality", {0, 1000})
+//   .add_int64_axis("run_length", {1, 32});



Do we intend to remove this, or un-comment it?

Sorry, I still need to clean this up, and much more 😄

mythrocks · 2024-02-28T22:06:08Z

cpp/include/cudf/io/detail/orc.hpp

@@ -62,7 +67,7 @@ class reader {
  /**
   * @brief Destructor explicitly declared to avoid inlining in header
   */
-  ~reader();
+  virtual ~reader();


mythrocks · 2024-02-28T22:08:08Z

cpp/include/cudf/io/detail/orc.hpp

+   *
+   * @param output_size_limit Limit on total number of bytes to be returned per read,
+   *        or `0` if there is no limit
+   * @param data_read_limit Limit on memory usage for the purposes of decompression and processing


If this is in bytes, data_read_limit_bytes might be a more descriptive name.

The doxygen should clarify its meaning, so I'm inclined to keep the name shorter. I'd even prefer read_limit, but that seems too short and less descriptive, so data_read_limit is probably better.

mythrocks · 2024-02-28T22:11:20Z

cpp/include/cudf/io/detail/orc.hpp

+   * @param stream CUDA stream used for device memory operations and kernel launches
+   * @param mr Device memory resource to use for device memory allocation
+   */
+  explicit chunked_reader(std::size_t output_size_limit,


Silly question: Why is this ctor explicit? This isn't a single argument ctor, so I don't know if we run the risk of accidental conversion into a chunked_reader.

I take my question back. From scripture, this might be to prevent construction like:

chunked_reader const my_reader = { 10, 20, std::vector{...}, orc_reader_options{...} };

Do I have this right?

Exactly, the explicit keyword here is just to prevent such construction. Technically we would not have any issue with it; it's just a matter of code cleanliness.
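
To make the point about `explicit` on a multi-argument constructor concrete, here is a minimal sketch. The types `explicit_limits` and `implicit_limits` are hypothetical and exist only for illustration; they are not the classes from this PR.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical type whose two-argument constructor is explicit.
struct explicit_limits {
  explicit explicit_limits(std::size_t output, std::size_t read)
    : output_limit{output}, read_limit{read}
  {
  }
  std::size_t output_limit;
  std::size_t read_limit;
};

// Same shape, but the constructor is not explicit.
struct implicit_limits {
  implicit_limits(std::size_t output, std::size_t read) : output_limit{output}, read_limit{read} {}
  std::size_t output_limit;
  std::size_t read_limit;
};

// Direct-list-initialization names the type, so an explicit constructor is fine.
inline explicit_limits make_explicit() { return explicit_limits{10, 20}; }

// Copy-list-initialization from a bare braced list needs a non-explicit constructor.
inline implicit_limits make_implicit() { return {10, 20}; }

// These would NOT compile, because copy-list-initialization may not select
// an explicit constructor:
//   explicit_limits bad = {10, 20};
//   explicit_limits make_bad() { return {10, 20}; }

inline void demo()
{
  explicit_limits const a{10, 20};     // direct-list-initialization
  implicit_limits const b = {10, 20};  // copy-list-initialization
  assert(a.output_limit == b.output_limit);
  assert(a.read_limit == b.read_limit);
}
```

Marking the constructor explicit therefore forces callers to spell out the type name at the construction site, which is the readability benefit discussed above.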

cpp/include/cudf/io/orc.hpp

cpp/src/io/orc/aggregate_orc_metadata.cpp

cpp/src/io/orc/reader_impl.hpp

cpp/src/io/orc/reader_impl_chunking.hpp

ttnghia · 2024-04-26T03:54:36Z

/ok to test

vuule

Another set of small suggestions
Looks pretty good overall

cpp/src/io/orc/aggregate_orc_metadata.cpp

cpp/src/io/orc/reader_impl_chunking.hpp

cpp/src/io/orc/reader_impl.hpp

cpp/src/io/orc/reader_impl_chunking.hpp

ttnghia · 2024-04-29T20:36:47Z

/ok to test

vuule

finally finished a full pass
bunch of small questions/suggestions, nothing really blocking

cpp/src/io/orc/reader_impl_decode.cu

cpp/src/io/orc/reader_impl.cu

cpp/src/io/orc/reader_impl_chunking.cu

cpp/src/io/orc/reader_impl_chunking.hpp

cpp/src/io/orc/reader_impl_chunking.cu

ttnghia · 2024-05-02T21:54:38Z

/ok to test

ttnghia · 2024-05-02T22:10:24Z

/merge

This adds a JNI implementation for the chunked ORC reader, allowing ORC files to be read iteratively.

Depends on:
* #15094

Closes #12228.

Authors:
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- Jason Lowe (https://github.com/jlowe)

URL: #15446

ttnghia added the feature request, 2 - In Progress, libcudf, Spark, and non-breaking labels Feb 20, 2024

ttnghia self-assigned this Feb 20, 2024

github-actions bot added the CMake label Feb 20, 2024

ttnghia force-pushed the chunked_orc_reader branch from c0ed5c3 to c75daeb February 20, 2024 19:30

ttnghia added 18 commits February 24, 2024 08:50

Fix stripe lookup bug

a38b115

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

Fix a bug

75cec9b

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

Fix another bug

a7bd47a

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

Debugging

db768fb

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

All tests pass

f8652d7

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

Reverse tests

537ea0c

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

Fix for temp concatenation

24e1552

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

Turn off debug printing

df8d9b3

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

Merge branch 'branch-24.04' into chunked_orc_reader

4e1614e

# Conflicts:
#   cpp/src/io/orc/aggregate_orc_metadata.hpp
#   cpp/src/io/orc/reader_impl.cu
#   cpp/src/io/orc/reader_impl.hpp
#   cpp/src/io/orc/reader_impl_preprocess.cu

Some fixes

12dff3b

Fix host memory issue

5401826

Some cleanup

e53cf56

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

Compute table row size

f2ec94c

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

Compute column row size

fd325b6

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

Test column size

416d810

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

Test column sizes using segmented_bit_count

b745787

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

Compute table sizes using segmented_bit_count

d0ed05a

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

Temporary store multiple decoded tables

ae06017

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

mythrocks reviewed Feb 29, 2024

ttnghia added 2 commits February 28, 2024 19:25

Add test file

6cccca3

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

Add comment

2488cb2

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

Change variable order

3ed1e2e

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

vuule self-requested a review April 26, 2024 20:27

vuule reviewed Apr 26, 2024

ttnghia added 10 commits April 29, 2024 13:05

Move data to output

6335586

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

Rename function

437c9c0

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

Change has_no_data into has_data

cc174bb

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

Rename variable

1e69335

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

Implement size for range

ad69236

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

Change docs

890abb4

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

Rename _config into _options

cb21a6d

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

Change comments

4e64eb7

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

Change cumulative_size_and_row to subclass cumulative_size

d42ed14

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

Merge branch 'branch-24.06' into chunked_orc_reader

bd5290c

vuule self-requested a review April 30, 2024 23:25

nvdbaranec approved these changes May 1, 2024

vuule reviewed May 2, 2024

ttnghia and others added 5 commits May 1, 2024 21:34

Address some review comments

a0ca333

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

Merge branch 'branch-24.06' into chunked_orc_reader

536078b

Remove handling for READ_ALL when number of rows exceed 2B rows

cf18e4c

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

Merge branch 'branch-24.06' into chunked_orc_reader

7b8a96f

Rename parameter

42601b2

Signed-off-by: Nghia Truong <nghiat@nvidia.com>

vuule approved these changes May 2, 2024

rapids-bot bot merged commit 81f8cdf into rapidsai:branch-24.06 May 2, 2024
69 checks passed

ttnghia deleted the chunked_orc_reader branch May 2, 2024 23:27
