
Cleanup hostdevice_vector and add more APIs #15252

Merged
merged 17 commits into from Mar 11, 2024

Conversation

ttnghia
Contributor

@ttnghia ttnghia commented Mar 8, 2024

This work includes:

  • Fix a bug in hostdevice_vector where the host buffer does not grow when new elements are appended; instead, the new elements are written directly into raw memory (out of bounds). Previously this never triggered an issue, because the host buffer reserves plenty of memory upon construction; it only surfaced when I attempted to access its front() and back() elements.
  • Add front() and back() accessors, which return the first and last elements of the host buffer. A minimal sketch of both changes follows below.
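
For illustration, here is a minimal host-only sketch of the bug and the new accessors, using a plain std::vector. The class and member names are stand-ins, not the actual libcudf hostdevice_vector implementation (which also manages a mirrored device allocation):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for the host side of hostdevice_vector (illustrative names only).
template <typename T>
class toy_hostdevice_vector {
 public:
  explicit toy_hostdevice_vector(std::size_t max_size) { h_data.reserve(max_size); }

  // Buggy pattern (before the fix): writing into reserved-but-unused memory,
  //   h_data.data()[h_data.size()] = value;  // out-of-bounds write, size() never grows
  // Fixed pattern: actually grow the host buffer when appending.
  void push_back(T const& value) { h_data.push_back(value); }

  // New accessors added in this PR: first and last elements of the host buffer.
  [[nodiscard]] T& front() { return h_data.front(); }
  [[nodiscard]] T& back() { return h_data.back(); }

  [[nodiscard]] std::size_t size() const { return h_data.size(); }

 private:
  std::vector<T> h_data;
};

int main()
{
  toy_hostdevice_vector<int> v(/*max_size=*/4);
  v.push_back(10);
  v.push_back(20);
  assert(v.front() == 10 && v.back() == 20 && v.size() == 2);
  return 0;
}
```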

Signed-off-by: Nghia Truong <nghiat@nvidia.com>
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
@ttnghia ttnghia added the bug (Something isn't working), 3 - Ready for Review (Ready for review by team), libcudf (Affects libcudf (C++/CUDA) code), Spark (Functionality that helps Spark RAPIDS), and non-breaking (Non-breaking change) labels Mar 8, 2024
@ttnghia ttnghia requested review from vuule and nvdbaranec March 8, 2024 00:24
@ttnghia ttnghia self-assigned this Mar 8, 2024
@ttnghia ttnghia requested a review from a team as a code owner March 8, 2024 00:24
@vuule
Contributor

vuule commented Mar 8, 2024

How did you manage to open this hours after #15079 merged?!

@vuule
Contributor

vuule commented Mar 8, 2024

How did this come up?
I just talked to @nvdbaranec about removing push_back from hostdevice_vector, but it sounds like you are actively using that feature.

@ttnghia
Contributor Author

ttnghia commented Mar 8, 2024

I've just merged #15079 into my chunked ORC reader and all the tests crashed, so that's how I found this bug.

@ttnghia
Contributor Author

ttnghia commented Mar 8, 2024

How did this come up? I just talked to @nvdbaranec about removing push_back from hostdevice_vector, but it sounds like you are actively using that feature.

Indeed I'm not using push_back. Sorry, I just checked my code and saw that I actually do use push_back 😄

I'm using front() and back() in my ORC chunked reader. Removing push_back would be fine (the code can be adapted easily) if that is necessary.

@ttnghia
Contributor Author

ttnghia commented Mar 8, 2024

I've just tried to remove push_back and found that only one place in the ORC reader uses it, so I think we can remove it quickly in this PR too. Please let me know if you want that.

Note that this will change the interface of hostdevice_vector, removing the initial_size parameter completely. From now on, we will always construct the host/device vector at its maximum size.
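
For illustration, a quick stand-in sketch of the two construction patterns, using std::vector as a proxy for the host-side buffer (the real hostdevice_vector constructor signatures may differ):

```cpp
#include <cstddef>
#include <vector>

int main()
{
  std::size_t const n = 8;

  // Current interface (proxy): start at initial_size = 0 with room reserved
  // up to max_size = n, then grow via push_back.
  std::vector<int> grows;
  grows.reserve(n);
  for (std::size_t i = 0; i < n; ++i) { grows.push_back(static_cast<int>(i)); }

  // Proposed interface (proxy): construct at the maximum size up front and write by index.
  std::vector<int> full(n);
  for (std::size_t i = 0; i < n; ++i) { full[i] = static_cast<int>(i); }

  return 0;
}
```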

@vuule
Contributor

vuule commented Mar 8, 2024

I've just tried to remove push_back and found that only one place in the ORC reader uses it, so I think we can remove it quickly in this PR too. Please let me know if you want that.

Note that this will change the interface of hostdevice_vector, removing the initial_size parameter completely. From now on, we will always construct the host/device vector at its maximum size.

If you think you can quickly remove it, please go ahead. I'll be happy to see initial_size and current_size go away.

ttnghia and others added 5 commits March 7, 2024 17:39
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
@ttnghia ttnghia changed the title from "Fix bug in hostdevice_vector and add more APIs" to "Cleanup hostdevice_vector and add more APIs" Mar 8, 2024
@bdice
Contributor

bdice commented Mar 8, 2024

No push_back… not resizable… hmm. Hey, that’s not a vector at all!

[meme image]

That’s an array!

(It’s a bit oversimplified, with dynamic vs static allocation and such. But this does make me question the name.)

@ttnghia
Contributor Author

ttnghia commented Mar 8, 2024

hostdevice_array LGTM :)

@vuule
Contributor

vuule commented Mar 8, 2024

@bdice first of all, 😂
I guess it is more of a buffer/array (array to me implies compile-time size). Too bad thrust's buffers are untyped; hostdevice_buffer sounds great to me.

Comment on lines 183 to 187
```cpp
cudf::detail::hostdevice_vector<gpu::CompressedStreamInfo> compinfo(stream_info.size(), stream);
for (std::size_t idx = 0; idx < stream_info.size(); ++idx) {
  auto const& info = stream_info[idx];
  compinfo[idx]    = gpu::CompressedStreamInfo(
    static_cast<uint8_t const*>(stripe_data[info.stripe_idx].data()) + info.dst_pos, info.length);
```
Contributor

Are you kidding me, this was it?!

Contributor Author

What's wrong with it?

Contributor

Nothing, I just thought (feared) bigger changes were needed to remove push_back.

@vuule
Contributor

vuule commented Mar 8, 2024

Apparently the span tests also use push_back.

@ttnghia
Contributor Author

ttnghia commented Mar 8, 2024

So far we have these candidate names:

  • hostdevice_vector (current)
  • hostdevice_array
  • hostdevice_buffer

Any other candidate? @nvdbaranec?

@vuule
Contributor

vuule commented Mar 8, 2024

So far we have these candidate names:

  • hostdevice_vector (current)
  • hostdevice_array
  • hostdevice_buffer

Any other candidate? @nvdbaranec?

Not sure if this is clear: IMO we should not rename it in this PR. Maybe that's something we can evaluate for 24.06.

Signed-off-by: Nghia Truong <nghiat@nvidia.com>
@nvdbaranec
Contributor

nvdbaranec commented Mar 8, 2024

I'd point out: rmm::device_uvector doesn't support push_back either. It does support resize though. I would probably lean towards no change, but if I had to choose, I'd go with array.

@nvdbaranec
Contributor

@bdice first of all, 😂 I guess it is more of a buffer/array (array to me implies compile-time size). Too bad thrust's buffers are untyped; hostdevice_buffer sounds great to me.

For me "buffer" implies typeless, ie rmm::device_buffer. Just bytes.

@vuule
Contributor

vuule commented Mar 8, 2024

We could add a resize if that helps the argument for not renaming the class 🤷

@bdice
Contributor

bdice commented Mar 8, 2024

I don't want to force a name change in this PR. Just had to make a meme to share the idea. Let's discuss in an issue, as it seems this PR isn't 100% settled on the API yet.

@ttnghia
Contributor Author

ttnghia commented Mar 10, 2024

I think we'd better keep the push_back API, as it is more convenient to use. My upcoming chunked ORC reader uses it in more places. Without it, we have to create and track an index variable to write elements iteratively.

So, keep push_back, and keep the name hostdevice_vector unchanged?

@vuule
Contributor

vuule commented Mar 11, 2024

So, keep push_back, and keep the name hostdevice_vector unchanged?

Feel free to limit this PR to its original scope.

I always felt like push_back was half-baked in this class and thus should be removed. Maybe instead we need to make sure it's fully baked. Either way, nothing urgent for this PR/release.

ttnghia and others added 6 commits March 11, 2024 10:15
This reverts commit 0c149c3.
This reverts commit f931d43.

# Conflicts:
#	cpp/src/io/utilities/hostdevice_vector.hpp
This reverts commit eac0d51.

# Conflicts:
#	cpp/src/io/utilities/hostdevice_vector.hpp
Signed-off-by: Nghia Truong <nghiat@nvidia.com>
@ttnghia
Contributor Author

ttnghia commented Mar 11, 2024

/merge

@rapids-bot rapids-bot bot merged commit 8ed3e20 into rapidsai:branch-24.04 Mar 11, 2024
73 checks passed
@ttnghia ttnghia deleted the bug_hd_vector branch March 11, 2024 22:01
rapids-bot bot pushed a commit that referenced this pull request May 2, 2024
This implements the ORC chunked reader, supporting reading ORC such that:
 * The output is multiple tables instead of a single one; each table is produced by a call to `read_chunk()` and has a limited size that stays within a given `output_limit` parameter (see the usage sketch after this list).
 * Temporary device memory usage can be limited by a soft `data_read_limit` parameter, allowing very large ORC files to be read without OOM.
 * ORC files containing many billions of rows can be read chunk-by-chunk without hitting the size overflow issue when the number of rows exceeds the cudf size limit (`2^31` rows).
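
A hedged usage sketch of the chunked reader described above; the class name `chunked_orc_reader`, its constructor parameters, and `has_next()`/`read_chunk()` are assumed to mirror the existing chunked Parquet reader, so treat this as illustrative rather than the exact libcudf API:

```cpp
#include <cudf/io/orc.hpp>

#include <string>
#include <vector>

// Read an ORC file chunk-by-chunk, keeping each output table within ~64 MB and
// temporary device memory within a ~640 MB soft limit (the limits discussed in this PR).
std::vector<cudf::io::table_with_metadata> read_orc_in_chunks(std::string const& path)
{
  auto const options =
    cudf::io::orc_reader_options::builder(cudf::io::source_info{path}).build();

  auto reader = cudf::io::chunked_orc_reader(
    64 * 1024 * 1024 /*output_limit*/, 640 * 1024 * 1024 /*data_read_limit*/, options);

  std::vector<cudf::io::table_with_metadata> chunks;
  while (reader.has_next()) {  // keep calling read_chunk() until the file is exhausted
    chunks.emplace_back(reader.read_chunk());
  }
  return chunks;
}
```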

Depends on:
 * #14911
 * #15008
 * #15169
 * #15252

Partially contribute to #12228.

---

## Benchmarks

Due to some small optimizations in the ORC reader, reading ORC files all at once (reading the entire file into just one output table) can be a little faster. For example, with the benchmark `orc_read_io_compression`:
```
## [0] Quadro RTX 6000

|      io       |  compression  |  cardinality  |  run_length  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |          Diff |   %Diff |  Status  |
|---------------|---------------|---------------|--------------|------------|-------------|------------|-------------|---------------|---------|----------|
|   FILEPATH    |    SNAPPY     |       0       |      1       | 183.027 ms |       7.45% | 157.293 ms |       4.72% | -25733.837 us | -14.06% |   FAIL   |
|   FILEPATH    |    SNAPPY     |     1000      |      1       | 198.228 ms |       6.43% | 164.395 ms |       4.14% | -33833.020 us | -17.07% |   FAIL   |
|   FILEPATH    |    SNAPPY     |       0       |      32      |  96.676 ms |       6.19% |  82.522 ms |       1.36% | -14153.945 us | -14.64% |   FAIL   |
|   FILEPATH    |    SNAPPY     |     1000      |      32      |  94.508 ms |       4.80% |  81.078 ms |       0.48% | -13429.672 us | -14.21% |   FAIL   |
|   FILEPATH    |     NONE      |       0       |      1       | 161.868 ms |       5.40% | 139.849 ms |       2.44% | -22018.910 us | -13.60% |   FAIL   |
|   FILEPATH    |     NONE      |     1000      |      1       | 164.902 ms |       5.80% | 142.041 ms |       3.43% | -22861.258 us | -13.86% |   FAIL   |
|   FILEPATH    |     NONE      |       0       |      32      |  88.298 ms |       5.15% |  74.924 ms |       1.97% | -13374.607 us | -15.15% |   FAIL   |
|   FILEPATH    |     NONE      |     1000      |      32      |  87.147 ms |       5.61% |  72.502 ms |       0.50% | -14645.122 us | -16.81% |   FAIL   |
|  HOST_BUFFER  |    SNAPPY     |       0       |      1       | 124.990 ms |       0.39% | 111.670 ms |       2.13% | -13320.483 us | -10.66% |   FAIL   |
|  HOST_BUFFER  |    SNAPPY     |     1000      |      1       | 149.858 ms |       4.10% | 126.266 ms |       0.48% | -23591.543 us | -15.74% |   FAIL   |
|  HOST_BUFFER  |    SNAPPY     |       0       |      32      |  92.499 ms |       4.46% |  77.653 ms |       1.58% | -14846.471 us | -16.05% |   FAIL   |
|  HOST_BUFFER  |    SNAPPY     |     1000      |      32      |  93.373 ms |       4.14% |  80.033 ms |       3.19% | -13340.002 us | -14.29% |   FAIL   |
|  HOST_BUFFER  |     NONE      |       0       |      1       | 111.792 ms |       0.50% |  97.083 ms |       0.50% | -14709.530 us | -13.16% |   FAIL   |
|  HOST_BUFFER  |     NONE      |     1000      |      1       | 117.646 ms |       5.60% |  97.634 ms |       0.44% | -20012.301 us | -17.01% |   FAIL   |
|  HOST_BUFFER  |     NONE      |       0       |      32      |  84.983 ms |       4.96% |  66.975 ms |       0.50% | -18007.403 us | -21.19% |   FAIL   |
|  HOST_BUFFER  |     NONE      |     1000      |      32      |  82.648 ms |       4.42% |  65.510 ms |       0.91% | -17137.910 us | -20.74% |   FAIL   |
| DEVICE_BUFFER |    SNAPPY     |       0       |      1       |  65.538 ms |       4.02% |  59.399 ms |       2.54% |  -6138.560 us |  -9.37% |   FAIL   |
| DEVICE_BUFFER |    SNAPPY     |     1000      |      1       | 101.427 ms |       4.10% |  92.276 ms |       3.30% |  -9150.278 us |  -9.02% |   FAIL   |
| DEVICE_BUFFER |    SNAPPY     |       0       |      32      |  80.133 ms |       4.64% |  73.959 ms |       3.50% |  -6173.818 us |  -7.70% |   FAIL   |
| DEVICE_BUFFER |    SNAPPY     |     1000      |      32      |  86.232 ms |       4.71% |  77.446 ms |       3.32% |  -8786.606 us | -10.19% |   FAIL   |
| DEVICE_BUFFER |     NONE      |       0       |      1       |  52.189 ms |       6.62% |  45.018 ms |       4.11% |  -7171.043 us | -13.74% |   FAIL   |
| DEVICE_BUFFER |     NONE      |     1000      |      1       |  54.664 ms |       6.76% |  46.855 ms |       3.35% |  -7809.803 us | -14.29% |   FAIL   |
| DEVICE_BUFFER |     NONE      |       0       |      32      |  67.975 ms |       5.12% |  60.553 ms |       4.22% |  -7422.279 us | -10.92% |   FAIL   |
| DEVICE_BUFFER |     NONE      |     1000      |      32      |  68.485 ms |       4.86% |  62.253 ms |       6.23% |  -6232.340 us |  -9.10% |   FAIL   |

```


When memory is limited, chunked reading can help avoid OOM, at the cost of some performance. For example, reading a 500 MB table from file using a 64 MB output limit and a 640 MB data read limit:
```
|      io       |  compression  |  cardinality  |  run_length  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |       Diff |   %Diff |  Status  |
|---------------|---------------|---------------|--------------|------------|-------------|------------|-------------|------------|---------|----------|
|   FILEPATH    |    SNAPPY     |       0       |      1       | 183.027 ms |       7.45% | 350.824 ms |       2.74% | 167.796 ms |  91.68% |   FAIL   |
|   FILEPATH    |    SNAPPY     |     1000      |      1       | 198.228 ms |       6.43% | 322.414 ms |       3.46% | 124.186 ms |  62.65% |   FAIL   |
|   FILEPATH    |    SNAPPY     |       0       |      32      |  96.676 ms |       6.19% | 133.363 ms |       4.78% |  36.686 ms |  37.95% |   FAIL   |
|   FILEPATH    |    SNAPPY     |     1000      |      32      |  94.508 ms |       4.80% | 128.897 ms |       0.37% |  34.389 ms |  36.39% |   FAIL   |
|   FILEPATH    |     NONE      |       0       |      1       | 161.868 ms |       5.40% | 316.637 ms |       4.21% | 154.769 ms |  95.61% |   FAIL   |
|   FILEPATH    |     NONE      |     1000      |      1       | 164.902 ms |       5.80% | 326.043 ms |       3.06% | 161.141 ms |  97.72% |   FAIL   |
|   FILEPATH    |     NONE      |       0       |      32      |  88.298 ms |       5.15% | 124.819 ms |       5.17% |  36.520 ms |  41.36% |   FAIL   |
|   FILEPATH    |     NONE      |     1000      |      32      |  87.147 ms |       5.61% | 123.047 ms |       5.82% |  35.900 ms |  41.19% |   FAIL   |
|  HOST_BUFFER  |    SNAPPY     |       0       |      1       | 124.990 ms |       0.39% | 285.718 ms |       0.78% | 160.728 ms | 128.59% |   FAIL   |
|  HOST_BUFFER  |    SNAPPY     |     1000      |      1       | 149.858 ms |       4.10% | 263.491 ms |       2.89% | 113.633 ms |  75.83% |   FAIL   |
|  HOST_BUFFER  |    SNAPPY     |       0       |      32      |  92.499 ms |       4.46% | 127.881 ms |       0.86% |  35.382 ms |  38.25% |   FAIL   |
|  HOST_BUFFER  |    SNAPPY     |     1000      |      32      |  93.373 ms |       4.14% | 128.022 ms |       0.98% |  34.650 ms |  37.11% |   FAIL   |
|  HOST_BUFFER  |     NONE      |       0       |      1       | 111.792 ms |       0.50% | 241.064 ms |       1.89% | 129.271 ms | 115.64% |   FAIL   |
|  HOST_BUFFER  |     NONE      |     1000      |      1       | 117.646 ms |       5.60% | 248.134 ms |       3.08% | 130.488 ms | 110.92% |   FAIL   |
|  HOST_BUFFER  |     NONE      |       0       |      32      |  84.983 ms |       4.96% | 118.049 ms |       5.99% |  33.066 ms |  38.91% |   FAIL   |
|  HOST_BUFFER  |     NONE      |     1000      |      32      |  82.648 ms |       4.42% | 114.577 ms |       2.34% |  31.929 ms |  38.63% |   FAIL   |
| DEVICE_BUFFER |    SNAPPY     |       0       |      1       |  65.538 ms |       4.02% | 232.466 ms |       3.28% | 166.928 ms | 254.71% |   FAIL   |
| DEVICE_BUFFER |    SNAPPY     |     1000      |      1       | 101.427 ms |       4.10% | 221.578 ms |       1.43% | 120.152 ms | 118.46% |   FAIL   |
| DEVICE_BUFFER |    SNAPPY     |       0       |      32      |  80.133 ms |       4.64% | 120.604 ms |       0.35% |  40.471 ms |  50.50% |   FAIL   |
| DEVICE_BUFFER |    SNAPPY     |     1000      |      32      |  86.232 ms |       4.71% | 125.521 ms |       3.93% |  39.289 ms |  45.56% |   FAIL   |
| DEVICE_BUFFER |     NONE      |       0       |      1       |  52.189 ms |       6.62% | 182.943 ms |       0.29% | 130.754 ms | 250.54% |   FAIL   |
| DEVICE_BUFFER |     NONE      |     1000      |      1       |  54.664 ms |       6.76% | 190.501 ms |       0.49% | 135.836 ms | 248.49% |   FAIL   |
| DEVICE_BUFFER |     NONE      |       0       |      32      |  67.975 ms |       5.12% | 107.172 ms |       3.56% |  39.197 ms |  57.66% |   FAIL   |
| DEVICE_BUFFER |     NONE      |     1000      |      32      |  68.485 ms |       4.86% | 108.097 ms |       2.92% |  39.611 ms |  57.84% |   FAIL   |

```
And if memory is very limited, chunked reading with an 8 MB output limit / 80 MB data read limit:
```
|      io       |  compression  |  cardinality  |  run_length  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |       Diff |   %Diff |  Status  |
|---------------|---------------|---------------|--------------|------------|-------------|------------|-------------|------------|---------|----------|
|   FILEPATH    |    SNAPPY     |       0       |      1       | 183.027 ms |       7.45% | 732.926 ms |       1.98% | 549.899 ms | 300.45% |   FAIL   |
|   FILEPATH    |    SNAPPY     |     1000      |      1       | 198.228 ms |       6.43% | 834.309 ms |       4.21% | 636.081 ms | 320.88% |   FAIL   |
|   FILEPATH    |    SNAPPY     |       0       |      32      |  96.676 ms |       6.19% | 363.033 ms |       1.66% | 266.356 ms | 275.51% |   FAIL   |
|   FILEPATH    |    SNAPPY     |     1000      |      32      |  94.508 ms |       4.80% | 313.813 ms |       1.28% | 219.305 ms | 232.05% |   FAIL   |
|   FILEPATH    |     NONE      |       0       |      1       | 161.868 ms |       5.40% | 607.700 ms |       2.90% | 445.832 ms | 275.43% |   FAIL   |
|   FILEPATH    |     NONE      |     1000      |      1       | 164.902 ms |       5.80% | 616.101 ms |       3.46% | 451.199 ms | 273.62% |   FAIL   |
|   FILEPATH    |     NONE      |       0       |      32      |  88.298 ms |       5.15% | 267.703 ms |       0.46% | 179.405 ms | 203.18% |   FAIL   |
|   FILEPATH    |     NONE      |     1000      |      32      |  87.147 ms |       5.61% | 250.528 ms |       0.43% | 163.381 ms | 187.48% |   FAIL   |
|  HOST_BUFFER  |    SNAPPY     |       0       |      1       | 124.990 ms |       0.39% | 636.270 ms |       0.44% | 511.280 ms | 409.06% |   FAIL   |
|  HOST_BUFFER  |    SNAPPY     |     1000      |      1       | 149.858 ms |       4.10% | 747.264 ms |       0.50% | 597.406 ms | 398.65% |   FAIL   |
|  HOST_BUFFER  |    SNAPPY     |       0       |      32      |  92.499 ms |       4.46% | 359.660 ms |       0.19% | 267.161 ms | 288.82% |   FAIL   |
|  HOST_BUFFER  |    SNAPPY     |     1000      |      32      |  93.373 ms |       4.14% | 311.608 ms |       0.43% | 218.235 ms | 233.73% |   FAIL   |
|  HOST_BUFFER  |     NONE      |       0       |      1       | 111.792 ms |       0.50% | 493.797 ms |       0.13% | 382.005 ms | 341.71% |   FAIL   |
|  HOST_BUFFER  |     NONE      |     1000      |      1       | 117.646 ms |       5.60% | 516.706 ms |       0.12% | 399.060 ms | 339.20% |   FAIL   |
|  HOST_BUFFER  |     NONE      |       0       |      32      |  84.983 ms |       4.96% | 258.477 ms |       0.46% | 173.495 ms | 204.15% |   FAIL   |
|  HOST_BUFFER  |     NONE      |     1000      |      32      |  82.648 ms |       4.42% | 248.028 ms |       5.30% | 165.380 ms | 200.10% |   FAIL   |
| DEVICE_BUFFER |    SNAPPY     |       0       |      1       |  65.538 ms |       4.02% | 606.010 ms |       3.76% | 540.472 ms | 824.68% |   FAIL   |
| DEVICE_BUFFER |    SNAPPY     |     1000      |      1       | 101.427 ms |       4.10% | 742.774 ms |       4.64% | 641.347 ms | 632.33% |   FAIL   |
| DEVICE_BUFFER |    SNAPPY     |       0       |      32      |  80.133 ms |       4.64% | 364.701 ms |       2.70% | 284.568 ms | 355.12% |   FAIL   |
| DEVICE_BUFFER |    SNAPPY     |     1000      |      32      |  86.232 ms |       4.71% | 320.387 ms |       2.80% | 234.155 ms | 271.54% |   FAIL   |
| DEVICE_BUFFER |     NONE      |       0       |      1       |  52.189 ms |       6.62% | 458.100 ms |       2.15% | 405.912 ms | 777.78% |   FAIL   |
| DEVICE_BUFFER |     NONE      |     1000      |      1       |  54.664 ms |       6.76% | 478.527 ms |       1.41% | 423.862 ms | 775.39% |   FAIL   |
| DEVICE_BUFFER |     NONE      |       0       |      32      |  67.975 ms |       5.12% | 260.009 ms |       3.71% | 192.034 ms | 282.51% |   FAIL   |
| DEVICE_BUFFER |     NONE      |     1000      |      32      |  68.485 ms |       4.86% | 243.705 ms |       2.09% | 175.220 ms | 255.85% |   FAIL   |

```

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - https://github.com/nvdbaranec
  - Vukasin Milovanovic (https://github.com/vuule)

URL: #15094