[BUG] test_hash_reduction_collect_set_on_nested_array_type failed in a distributed environment #10133

Closed
sameerz opened this issue Dec 31, 2023 · 9 comments
Labels: bug (Something isn't working), cudf_dependency (An issue or PR with this label depends on a new feature in cudf)

sameerz (Collaborator) commented Dec 31, 2023

Describe the bug

test_hash_reduction_collect_set_on_nested_array_type failed in a distributed environment with the error "Type conversion is not allowed..."

[2023-12-30T22:21:16.115Z] E                   Caused by: java.lang.AssertionError: Type conversion is not allowed from LIST(STRUCT(INT8,INT16,INT32,INT64,FLOAT32,FLOAT64,STRING,BOOL8,TIMESTAMP_DAYS,TIMESTAMP_MICROSECONDS,INT8)) to ArrayType(ArrayType(StructType(StructField(child0,ByteType,true),StructField(child1,ShortType,true),StructField(child2,IntegerType,true),StructField(child3,LongType,true),StructField(child4,FloatType,true),StructField(child5,DoubleType,true),StructField(child6,StringType,true),StructField(child7,BooleanType,true),StructField(child8,DateType,true),StructField(child9,TimestampType,true),StructField(child10,NullType,true)),true),false) expected LIST(LIST(STRUCT(INT8,INT16,INT32,INT64,FLOAT32,FLOAT64,STRING,BOOL8,TIMESTAMP_DAYS,TIMESTAMP_MICROSECONDS,INT8)))
Detailed output
[2023-12-30T22:21:16.114Z] _ test_hash_reduction_collect_set_on_nested_array_type[[('a', RepeatSeq(Long)), ('b', RepeatSeq(Array(Struct(['child0', Byte],['child1', Short],['child2', Integer],['child3', Long],['child4', Float],['child5', Double],['child6', String],['child7', Boolean],['child8', Date],['child9', Timestamp],['child10', Null]))))]] _
[2023-12-30T22:21:16.114Z] 
[2023-12-30T22:21:16.114Z] data_gen = [('a', RepeatSeq(Long)), ('b', RepeatSeq(Array(Struct(['child0', Byte],['child1', Short],['child2', Integer],['child3'...['child5', Double],['child6', String],['child7', Boolean],['child8', Date],['child9', Timestamp],['child10', Null]))))]
[2023-12-30T22:21:16.114Z] 
[2023-12-30T22:21:16.114Z]     @ignore_order(local=True, arrays=["collect_set"])
[2023-12-30T22:21:16.114Z]     @allow_non_gpu("ProjectExec", *non_utc_allow)
[2023-12-30T22:21:16.114Z]     @pytest.mark.parametrize('data_gen', _gen_data_for_collect_set_op_nested, ids=idfn)
[2023-12-30T22:21:16.114Z]     def test_hash_reduction_collect_set_on_nested_array_type(data_gen):
[2023-12-30T22:21:16.114Z]         conf = copy_and_update(_float_conf, {
[2023-12-30T22:21:16.114Z]             "spark.rapids.sql.castFloatToString.enabled": "true",
[2023-12-30T22:21:16.114Z]         })
[2023-12-30T22:21:16.114Z]     
[2023-12-30T22:21:16.114Z]         def do_it(spark):
[2023-12-30T22:21:16.114Z]             return gen_df(spark, data_gen, length=100)\
[2023-12-30T22:21:16.114Z]                 .agg(f.collect_set('b').alias("collect_set"))
[2023-12-30T22:21:16.114Z]     
[2023-12-30T22:21:16.114Z] >       assert_gpu_and_cpu_are_equal_collect(do_it, conf=conf)
[2023-12-30T22:21:16.114Z] 
[2023-12-30T22:21:16.114Z] ../../src/main/python/hash_aggregate_test.py:734: 
[2023-12-30T22:21:16.114Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2023-12-30T22:21:16.114Z] ../../src/main/python/asserts.py:595: in assert_gpu_and_cpu_are_equal_collect
[2023-12-30T22:21:16.114Z]     _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first, result_canonicalize_func_before_compare=result_canonicalize_func_before_compare)
[2023-12-30T22:21:16.114Z] ../../src/main/python/asserts.py:503: in _assert_gpu_and_cpu_are_equal
[2023-12-30T22:21:16.114Z]     from_gpu = run_on_gpu()
[2023-12-30T22:21:16.114Z] ../../src/main/python/asserts.py:496: in run_on_gpu
[2023-12-30T22:21:16.114Z]     from_gpu = with_gpu_session(bring_back, conf=conf)
[2023-12-30T22:21:16.114Z] ../../src/main/python/spark_session.py:164: in with_gpu_session
[2023-12-30T22:21:16.114Z]     return with_spark_session(func, conf=copy)
[2023-12-30T22:21:16.114Z] /opt/miniconda3/lib/python3.8/contextlib.py:75: in inner
[2023-12-30T22:21:16.114Z]     return func(*args, **kwds)
[2023-12-30T22:21:16.114Z] ../../src/main/python/spark_session.py:131: in with_spark_session
[2023-12-30T22:21:16.114Z]     ret = func(_spark)
[2023-12-30T22:21:16.114Z] ../../src/main/python/asserts.py:205: in <lambda>
[2023-12-30T22:21:16.114Z]     bring_back = lambda spark: limit_func(spark).collect()
[2023-12-30T22:21:16.114Z] /var/lib/jenkins/spark/spark-3.3.0-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/dataframe.py:817: in collect
[2023-12-30T22:21:16.114Z]     sock_info = self._jdf.collectToPython()
[2023-12-30T22:21:16.114Z] /var/lib/jenkins/spark/spark-3.3.0-bin-hadoop3.2/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py:1321: in __call__
[2023-12-30T22:21:16.114Z]     return_value = get_return_value(
[2023-12-30T22:21:16.114Z] /var/lib/jenkins/spark/spark-3.3.0-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/utils.py:190: in deco
[2023-12-30T22:21:16.114Z]     return f(*a, **kw)
[2023-12-30T22:21:16.114Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2023-12-30T22:21:16.114Z] 
[2023-12-30T22:21:16.114Z] answer = 'xro1818544'
[2023-12-30T22:21:16.114Z] gateway_client = 
[2023-12-30T22:21:16.114Z] target_id = 'o1818543', name = 'collectToPython'
[2023-12-30T22:21:16.114Z] 
[2023-12-30T22:21:16.114Z]     def get_return_value(answer, gateway_client, target_id=None, name=None):
[2023-12-30T22:21:16.114Z]         """Converts an answer received from the Java gateway into a Python object.
[2023-12-30T22:21:16.114Z]     
[2023-12-30T22:21:16.114Z]         For example, string representation of integers are converted to Python
[2023-12-30T22:21:16.114Z]         integer, string representation of objects are converted to JavaObject
[2023-12-30T22:21:16.114Z]         instances, etc.
[2023-12-30T22:21:16.114Z]     
[2023-12-30T22:21:16.114Z]         :param answer: the string returned by the Java gateway
[2023-12-30T22:21:16.114Z]         :param gateway_client: the gateway client used to communicate with the Java
[2023-12-30T22:21:16.114Z]             Gateway. Only necessary if the answer is a reference (e.g., object,
[2023-12-30T22:21:16.114Z]             list, map)
[2023-12-30T22:21:16.114Z]         :param target_id: the name of the object from which the answer comes from
[2023-12-30T22:21:16.114Z]             (e.g., *object1* in `object1.hello()`). Optional.
[2023-12-30T22:21:16.114Z]         :param name: the name of the member from which the answer comes from
[2023-12-30T22:21:16.114Z]             (e.g., *hello* in `object1.hello()`). Optional.
[2023-12-30T22:21:16.114Z]         """
[2023-12-30T22:21:16.114Z]         if is_error(answer)[0]:
[2023-12-30T22:21:16.114Z]             if len(answer) > 1:
[2023-12-30T22:21:16.114Z]                 type = answer[1]
[2023-12-30T22:21:16.114Z]                 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
[2023-12-30T22:21:16.114Z]                 if answer[1] == REFERENCE_TYPE:
[2023-12-30T22:21:16.114Z] >                   raise Py4JJavaError(
[2023-12-30T22:21:16.114Z]                         "An error occurred while calling {0}{1}{2}.\n".
[2023-12-30T22:21:16.114Z]                         format(target_id, ".", name), value)
[2023-12-30T22:21:16.114Z] E                   py4j.protocol.Py4JJavaError: An error occurred while calling o1818543.collectToPython.
[2023-12-30T22:21:16.114Z] E                   : org.apache.spark.SparkException: Job aborted due to stage failure: Task 25 in stage 25094.0 failed 1 times, most recent failure: Lost task 25.0 in stage 25094.0 (TID 786354) (10.136.6.4 executor 2): java.lang.AssertionError: Type conversion is not allowed from LIST(STRUCT(INT8,INT16,INT32,INT64,FLOAT32,FLOAT64,STRING,BOOL8,TIMESTAMP_DAYS,TIMESTAMP_MICROSECONDS,INT8)) to ArrayType(ArrayType(StructType(StructField(child0,ByteType,true),StructField(child1,ShortType,true),StructField(child2,IntegerType,true),StructField(child3,LongType,true),StructField(child4,FloatType,true),StructField(child5,DoubleType,true),StructField(child6,StringType,true),StructField(child7,BooleanType,true),StructField(child8,DateType,true),StructField(child9,TimestampType,true),StructField(child10,NullType,true)),true),false) expected LIST(LIST(STRUCT(INT8,INT16,INT32,INT64,FLOAT32,FLOAT64,STRING,BOOL8,TIMESTAMP_DAYS,TIMESTAMP_MICROSECONDS,INT8)))
[2023-12-30T22:21:16.114Z] E                   	at com.nvidia.spark.rapids.GpuColumnVector.from(GpuColumnVector.java:710)
[2023-12-30T22:21:16.114Z] E                   	at com.nvidia.spark.rapids.AggHelper.$anonfun$performReduction$3(GpuAggregateExec.scala:363)
[2023-12-30T22:21:16.114Z] E                   	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
[2023-12-30T22:21:16.114Z] E                   	at com.nvidia.spark.rapids.AggHelper.$anonfun$performReduction$2(GpuAggregateExec.scala:361)
[2023-12-30T22:21:16.114Z] E                   	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
[2023-12-30T22:21:16.114Z] E                   	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
[2023-12-30T22:21:16.114Z] E                   	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
[2023-12-30T22:21:16.114Z] E                   	at com.nvidia.spark.rapids.AggHelper.$anonfun$performReduction$1(GpuAggregateExec.scala:357)
[2023-12-30T22:21:16.114Z] E                   	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
[2023-12-30T22:21:16.114Z] E                   	at com.nvidia.spark.rapids.AggHelper.performReduction(GpuAggregateExec.scala:355)
[2023-12-30T22:21:16.114Z] E                   	at com.nvidia.spark.rapids.AggHelper.aggregate(GpuAggregateExec.scala:294)
[2023-12-30T22:21:16.114Z] E                   	at com.nvidia.spark.rapids.AggHelper.$anonfun$aggregateWithoutCombine$4(GpuAggregateExec.scala:311)
[2023-12-30T22:21:16.114Z] E                   	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
[2023-12-30T22:21:16.114Z] E                   	at com.nvidia.spark.rapids.AggHelper.$anonfun$aggregateWithoutCombine$3(GpuAggregateExec.scala:309)
[2023-12-30T22:21:16.114Z] E                   	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
[2023-12-30T22:21:16.114Z] E                   	at com.nvidia.spark.rapids.AggHelper.$anonfun$aggregateWithoutCombine$2(GpuAggregateExec.scala:308)
[2023-12-30T22:21:16.114Z] E                   	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$AutoCloseableAttemptSpliterator.next(RmmRapidsRetryIterator.scala:477)
[2023-12-30T22:21:16.114Z] E                   	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryIterator.next(RmmRapidsRetryIterator.scala:613)
[2023-12-30T22:21:16.114Z] E                   	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryAutoCloseableIterator.next(RmmRapidsRetryIterator.scala:517)
[2023-12-30T22:21:16.114Z] E                   	at scala.collection.Iterator$$anon$11.next(Iterator.scala:496)
[2023-12-30T22:21:16.114Z] E                   	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
[2023-12-30T22:21:16.114Z] E                   	at com.nvidia.spark.rapids.GpuMergeAggregateIterator.aggregateInputBatches(GpuAggregateExec.scala:795)
[2023-12-30T22:21:16.114Z] E                   	at com.nvidia.spark.rapids.GpuMergeAggregateIterator.$anonfun$next$2(GpuAggregateExec.scala:752)
[2023-12-30T22:21:16.114Z] E                   	at scala.Option.getOrElse(Option.scala:189)
[2023-12-30T22:21:16.114Z] E                   	at com.nvidia.spark.rapids.GpuMergeAggregateIterator.next(GpuAggregateExec.scala:749)
[2023-12-30T22:21:16.114Z] E                   	at com.nvidia.spark.rapids.GpuMergeAggregateIterator.next(GpuAggregateExec.scala:711)
[2023-12-30T22:21:16.114Z] E                   	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
[2023-12-30T22:21:16.114Z] E                   	at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.$anonfun$next$6(GpuAggregateExec.scala:2042)
[2023-12-30T22:21:16.114Z] E                   	at scala.Option.map(Option.scala:230)
[2023-12-30T22:21:16.114Z] E                   	at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.next(GpuAggregateExec.scala:2042)
[2023-12-30T22:21:16.114Z] E                   	at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.next(GpuAggregateExec.scala:1906)
[2023-12-30T22:21:16.114Z] E                   	at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.partNextBatch(GpuShuffleExchangeExecBase.scala:333)
[2023-12-30T22:21:16.114Z] E                   	at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.hasNext(GpuShuffleExchangeExecBase.scala:355)
[2023-12-30T22:21:16.114Z] E                   	at org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.$anonfun$write$2(RapidsShuffleInternalManagerBase.scala:285)
[2023-12-30T22:21:16.114Z] E                   	at org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.$anonfun$write$2$adapted(RapidsShuffleInternalManagerBase.scala:278)
[2023-12-30T22:21:16.114Z] E                   	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
[2023-12-30T22:21:16.114Z] E                   	at org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.$anonfun$write$1(RapidsShuffleInternalManagerBase.scala:278)
[2023-12-30T22:21:16.114Z] E                   	at org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.$anonfun$write$1$adapted(RapidsShuffleInternalManagerBase.scala:277)
[2023-12-30T22:21:16.114Z] E                   	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
[2023-12-30T22:21:16.114Z] E                   	at org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.write(RapidsShuffleInternalManagerBase.scala:277)
[2023-12-30T22:21:16.114Z] E                   	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
[2023-12-30T22:21:16.114Z] E                   	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
[2023-12-30T22:21:16.114Z] E                   	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
[2023-12-30T22:21:16.114Z] E                   	at org.apache.spark.scheduler.Task.run(Task.scala:136)
[2023-12-30T22:21:16.114Z] E                   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
[2023-12-30T22:21:16.114Z] E                   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
[2023-12-30T22:21:16.114Z] E                   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
[2023-12-30T22:21:16.114Z] E                   	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[2023-12-30T22:21:16.114Z] E                   	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[2023-12-30T22:21:16.114Z] E                   	at java.base/java.lang.Thread.run(Thread.java:833)
[2023-12-30T22:21:16.114Z] E                   	Suppressed: com.nvidia.spark.rapids.jni.GpuRetryOOM: injected RetryOOM
[2023-12-30T22:21:16.114Z] E                   		at ai.rapids.cudf.ColumnView.reduce(Native Method)
[2023-12-30T22:21:16.114Z] E                   		at ai.rapids.cudf.ColumnView.reduce(ColumnView.java:1583)
[2023-12-30T22:21:16.114Z] E                   		at org.apache.spark.sql.rapids.aggregate.CudfCollectSet.$anonfun$reductionAggregate$7(aggregateFunctions.scala:140)
[2023-12-30T22:21:16.114Z] E                   		... 47 more
[2023-12-30T22:21:16.114Z] E                   
[2023-12-30T22:21:16.114Z] E                   Driver stacktrace:
[2023-12-30T22:21:16.114Z] E                   	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2672)
[2023-12-30T22:21:16.114Z] E                   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2608)
[2023-12-30T22:21:16.114Z] E                   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2607)
[2023-12-30T22:21:16.114Z] E                   	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
[2023-12-30T22:21:16.114Z] E                   	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
[2023-12-30T22:21:16.114Z] E                   	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
[2023-12-30T22:21:16.114Z] E                   	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2607)
[2023-12-30T22:21:16.114Z] E                   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1182)
[2023-12-30T22:21:16.114Z] E                   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1182)
[2023-12-30T22:21:16.114Z] E                   	at scala.Option.foreach(Option.scala:407)
[2023-12-30T22:21:16.114Z] E                   	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1182)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2860)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2802)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2791)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:952)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2228)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2249)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2268)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2293)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1021)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.rdd.RDD.withScope(RDD.scala:406)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.rdd.RDD.collect(RDD.scala:1020)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:424)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.sql.Dataset.$anonfun$collectToPython$1(Dataset.scala:3688)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:3858)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3856)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:109)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3856)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:3685)
[2023-12-30T22:21:16.115Z] E                   	at jdk.internal.reflect.GeneratedMethodAccessor100.invoke(Unknown Source)
[2023-12-30T22:21:16.115Z] E                   	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2023-12-30T22:21:16.115Z] E                   	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
[2023-12-30T22:21:16.115Z] E                   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
[2023-12-30T22:21:16.115Z] E                   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
[2023-12-30T22:21:16.115Z] E                   	at py4j.Gateway.invoke(Gateway.java:282)
[2023-12-30T22:21:16.115Z] E                   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
[2023-12-30T22:21:16.115Z] E                   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
[2023-12-30T22:21:16.115Z] E                   	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
[2023-12-30T22:21:16.115Z] E                   	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
[2023-12-30T22:21:16.115Z] E                   	at java.base/java.lang.Thread.run(Thread.java:833)
[2023-12-30T22:21:16.115Z] E                   Caused by: java.lang.AssertionError: Type conversion is not allowed from LIST(STRUCT(INT8,INT16,INT32,INT64,FLOAT32,FLOAT64,STRING,BOOL8,TIMESTAMP_DAYS,TIMESTAMP_MICROSECONDS,INT8)) to ArrayType(ArrayType(StructType(StructField(child0,ByteType,true),StructField(child1,ShortType,true),StructField(child2,IntegerType,true),StructField(child3,LongType,true),StructField(child4,FloatType,true),StructField(child5,DoubleType,true),StructField(child6,StringType,true),StructField(child7,BooleanType,true),StructField(child8,DateType,true),StructField(child9,TimestampType,true),StructField(child10,NullType,true)),true),false) expected LIST(LIST(STRUCT(INT8,INT16,INT32,INT64,FLOAT32,FLOAT64,STRING,BOOL8,TIMESTAMP_DAYS,TIMESTAMP_MICROSECONDS,INT8)))
[2023-12-30T22:21:16.115Z] E                   	at com.nvidia.spark.rapids.GpuColumnVector.from(GpuColumnVector.java:710)
[2023-12-30T22:21:16.115Z] E                   	at com.nvidia.spark.rapids.AggHelper.$anonfun$performReduction$3(GpuAggregateExec.scala:363)
[2023-12-30T22:21:16.115Z] E                   	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
[2023-12-30T22:21:16.115Z] E                   	at com.nvidia.spark.rapids.AggHelper.$anonfun$performReduction$2(GpuAggregateExec.scala:361)
[2023-12-30T22:21:16.115Z] E                   	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
[2023-12-30T22:21:16.115Z] E                   	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
[2023-12-30T22:21:16.115Z] E                   	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
[2023-12-30T22:21:16.115Z] E                   	at com.nvidia.spark.rapids.AggHelper.$anonfun$performReduction$1(GpuAggregateExec.scala:357)
[2023-12-30T22:21:16.115Z] E                   	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
[2023-12-30T22:21:16.115Z] E                   	at com.nvidia.spark.rapids.AggHelper.performReduction(GpuAggregateExec.scala:355)
[2023-12-30T22:21:16.115Z] E                   	at com.nvidia.spark.rapids.AggHelper.aggregate(GpuAggregateExec.scala:294)
[2023-12-30T22:21:16.115Z] E                   	at com.nvidia.spark.rapids.AggHelper.$anonfun$aggregateWithoutCombine$4(GpuAggregateExec.scala:311)
[2023-12-30T22:21:16.115Z] E                   	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
[2023-12-30T22:21:16.115Z] E                   	at com.nvidia.spark.rapids.AggHelper.$anonfun$aggregateWithoutCombine$3(GpuAggregateExec.scala:309)
[2023-12-30T22:21:16.115Z] E                   	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
[2023-12-30T22:21:16.115Z] E                   	at com.nvidia.spark.rapids.AggHelper.$anonfun$aggregateWithoutCombine$2(GpuAggregateExec.scala:308)
[2023-12-30T22:21:16.115Z] E                   	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$AutoCloseableAttemptSpliterator.next(RmmRapidsRetryIterator.scala:477)
[2023-12-30T22:21:16.115Z] E                   	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryIterator.next(RmmRapidsRetryIterator.scala:613)
[2023-12-30T22:21:16.115Z] E                   	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryAutoCloseableIterator.next(RmmRapidsRetryIterator.scala:517)
[2023-12-30T22:21:16.115Z] E                   	at scala.collection.Iterator$$anon$11.next(Iterator.scala:496)
[2023-12-30T22:21:16.115Z] E                   	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
[2023-12-30T22:21:16.115Z] E                   	at com.nvidia.spark.rapids.GpuMergeAggregateIterator.aggregateInputBatches(GpuAggregateExec.scala:795)
[2023-12-30T22:21:16.115Z] E                   	at com.nvidia.spark.rapids.GpuMergeAggregateIterator.$anonfun$next$2(GpuAggregateExec.scala:752)
[2023-12-30T22:21:16.115Z] E                   	at scala.Option.getOrElse(Option.scala:189)
[2023-12-30T22:21:16.115Z] E                   	at com.nvidia.spark.rapids.GpuMergeAggregateIterator.next(GpuAggregateExec.scala:749)
[2023-12-30T22:21:16.115Z] E                   	at com.nvidia.spark.rapids.GpuMergeAggregateIterator.next(GpuAggregateExec.scala:711)
[2023-12-30T22:21:16.115Z] E                   	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
[2023-12-30T22:21:16.115Z] E                   	at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.$anonfun$next$6(GpuAggregateExec.scala:2042)
[2023-12-30T22:21:16.115Z] E                   	at scala.Option.map(Option.scala:230)
[2023-12-30T22:21:16.115Z] E                   	at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.next(GpuAggregateExec.scala:2042)
[2023-12-30T22:21:16.115Z] E                   	at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.next(GpuAggregateExec.scala:1906)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.partNextBatch(GpuShuffleExchangeExecBase.scala:333)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.hasNext(GpuShuffleExchangeExecBase.scala:355)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.$anonfun$write$2(RapidsShuffleInternalManagerBase.scala:285)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.$anonfun$write$2$adapted(RapidsShuffleInternalManagerBase.scala:278)
[2023-12-30T22:21:16.115Z] E                   	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.$anonfun$write$1(RapidsShuffleInternalManagerBase.scala:278)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.$anonfun$write$1$adapted(RapidsShuffleInternalManagerBase.scala:277)
[2023-12-30T22:21:16.115Z] E                   	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.sql.rapids.RapidsShuffleThreadedWriterBase.write(RapidsShuffleInternalManagerBase.scala:277)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.scheduler.Task.run(Task.scala:136)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
[2023-12-30T22:21:16.115Z] E                   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
[2023-12-30T22:21:16.115Z] E                   	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[2023-12-30T22:21:16.115Z] E                   	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[2023-12-30T22:21:16.115Z] E                   	... 1 more
[2023-12-30T22:21:16.115Z] E                   	Suppressed: com.nvidia.spark.rapids.jni.GpuRetryOOM: injected RetryOOM
[2023-12-30T22:21:16.115Z] E                   		at ai.rapids.cudf.ColumnView.reduce(Native Method)
[2023-12-30T22:21:16.115Z] E                   		at ai.rapids.cudf.ColumnView.reduce(ColumnView.java:1583)
[2023-12-30T22:21:16.115Z] E                   		at org.apache.spark.sql.rapids.aggregate.CudfCollectSet.$anonfun$reductionAggregate$7(aggregateFunctions.scala:140)
[2023-12-30T22:21:16.115Z] E                   		... 47 more
[2023-12-30T22:21:16.115Z] 
[2023-12-30T22:21:16.115Z] /var/lib/jenkins/spark/spark-3.3.0-bin-hadoop3.2/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py:326: Py4JJavaError
[2023-12-30T22:21:16.115Z] ----------------------------- Captured stdout call -----------------------------
[2023-12-30T22:21:16.115Z] ### CPU RUN ###
[2023-12-30T22:21:16.115Z] ### GPU RUN ###

Steps/Code to reproduce bug
Run integration tests in a distributed environment

Expected behavior
Tests pass

Environment details (please complete the following information)

  • Environment location: YARN
  • Spark configuration settings related to the issue

Additional context

sameerz added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels Dec 31, 2023
mattahrens removed the ? - Needs Triage (Need team to review and classify) label Jan 2, 2024
tgravescs (Collaborator) commented

It looks like this happens when a batch has one row with an empty List inside a List, where the datatype is supposed to be List[List[Something]]. It reproduces very often for me on a single-node standalone cluster with 1 worker. My box has 64 cores.

It's doing a reduction collect_set in this test. The error above, I believe, occurs when the inner type "Something" is a complex type like a struct. If it is something like an INT32 you get a slightly different error:

java.lang.IllegalArgumentException: ArrayType(IntegerType,true) is not supported for GPU processing yet.
        at com.nvidia.spark.rapids.GpuColumnVector.getNonNestedRapidsType(GpuColumnVector.java:429)

So the data type going into the reduction is List[List[INT32]], the inner list is empty, and after the reduction we get back a LIST[INT32], which doesn't match the expected type.

I'm still trying to narrow down exactly where this is happening or where it should be handled.
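
To make that concrete, here is a condensed sketch of the failing shape (hypothetical, distilled from the fuller pyspark repro posted below; assumes a SparkSession with the RAPIDS plugin enabled):

import pyspark.sql.functions as f
from pyspark.sql import SparkSession
from pyspark.sql.types import ArrayType, IntegerType

spark = SparkSession.builder.getOrCreate()

# The column type is array<array<int>>, but the only row is null, so the
# reduction never sees a real inner list. The cudf collect_set reduction then
# returns a LIST(INT32) column instead of the expected LIST(LIST(INT32)).
df = spark.createDataFrame([None], ArrayType(ArrayType(IntegerType())))
df.agg(f.collect_set('value')).show()   # CPU: []; GPU: type-mismatch error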

tgravescs (Collaborator) commented

Looking at java/src/test/java/ai/rapids/cudf/ReductionTest.java in cudf, it doesn't look like it has any tests for nested complex columns.

tgravescs (Collaborator) commented Jan 9, 2024

Ok, so this happens when you have a column type like List[List[INT32]] and the data comes in as List[null].

Smaller manual reproduction steps with pyspark against a standalone cluster with 1 worker that has 64 cores and 1 GPU:

CPU:

>>> spark.conf.set("spark.rapids.sql.enabled", "false")
>>> my_x = [None]
>>> my_df = spark.createDataFrame(my_x, ArrayType(ArrayType(IntegerType())))
>>> my_df.agg(f.collect_set('value')).show()
+------------------+
|collect_set(value)|
+------------------+
|                []|
+------------------+

Another CPU case with some valid data (the GPU also fails on this one):

>>> my_x = [None, [(1,2,3)]]
>>> my_df = spark.createDataFrame(my_x, ArrayType(ArrayType(IntegerType())))
>>> my_df.printSchema()
root
 |-- value: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: integer (containsNull = true)

>>> my_df.show()
+-----------+
|      value|
+-----------+
|       null|
|[[1, 2, 3]]|
+-----------+

>>> my_df.agg(f.collect_set('value')).show()
+------------------+
|collect_set(value)|
+------------------+
|     [[[1, 2, 3]]]|
+------------------+

GPU:

>>> spark.conf.set("spark.rapids.sql.enabled", "true")
>>> my_x = [None]
>>> my_df = spark.createDataFrame(my_x, ArrayType(ArrayType(IntegerType())))
>>> my_df.agg(f.collect_set('value')).show()
24/01/09 13:30:37 WARN TaskSetManager: Lost task 0.0 in stage 37.0 (TID 715) (10.28.9.218 executor 0): java.lang.IllegalArgumentException: Type mismatch at table 63column 0 expected LIST but found INT32

A valid case without the array being null:

>>> my_x = [[[1, 100], [2]], [[3, 2]]]
>>> my_df = spark.createDataFrame(my_x, ArrayType(ArrayType(IntegerType())))
>>> my_df.agg(f.collect_set('value')).show()
+--------------------+
|  collect_set(value)|
+--------------------+
|[[[1, 100], [2]],...|
+--------------------+
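
For regression coverage, the null-row case could also be pinned down as a targeted integration test. A minimal sketch, assuming the spark-rapids python test helpers visible in the tracebacks above (asserts.assert_gpu_and_cpu_are_equal_collect); the test name is hypothetical, and a real test would also want @ignore_order(local=True, arrays=["collect_set"]) as hash_aggregate_test.py uses, since set ordering is not deterministic:

import pyspark.sql.functions as f
from pyspark.sql.types import ArrayType, IntegerType
from asserts import assert_gpu_and_cpu_are_equal_collect

def test_collect_set_on_null_nested_array():
    def do_it(spark):
        # One null row in an array<array<int>> column is enough to trigger
        # the type mismatch in the GPU reduction.
        return spark.createDataFrame([None, [[1, 2, 3]]],
                                     ArrayType(ArrayType(IntegerType()))) \
                    .agg(f.collect_set('value'))

    assert_gpu_and_cpu_are_equal_collect(do_it)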

tgravescs (Collaborator) commented

Note: for the integration tests, it's again easy to reproduce in standalone mode with 1 worker that has 64 cores and 1 GPU:

PYSP_TEST_spark_master=spark://myStandaloneMaster:7077 TEST_PARALLEL=0 ./integration_tests/run_pyspark_from_build.sh --test_oom_injection_mode=never -s -k test_hash_reduction_collect_set_on_nested_array_type

tgravescs (Collaborator) commented

Note the exception stack traces are different between the two examples I gave above:

Manual pyspark repro:

: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 50.0 failed 4 times, most recent failure: Lost task 0.3 in stage 50.0 (TID 980) (10.28.9.218 executor 0): java.lang.IllegalArgumentException: Type mismatch at table 31column 0 expected LIST but found INT32
        at ai.rapids.cudf.JCudfSerialization.checkCompatibleTypes(JCudfSerialization.java:999)
        at ai.rapids.cudf.JCudfSerialization.checkCompatibleTypes(JCudfSerialization.java:1010)
        at ai.rapids.cudf.JCudfSerialization.checkCompatibleTypes(JCudfSerialization.java:1010)
        at ai.rapids.cudf.JCudfSerialization.checkCompatibleTypes(JCudfSerialization.java:989)
        at ai.rapids.cudf.JCudfSerialization.providersFrom(JCudfSerialization.java:954)
        at ai.rapids.cudf.JCudfSerialization.concatToHostBuffer(JCudfSerialization.java:1820)
        at ai.rapids.cudf.JCudfSerialization.concatToHostBuffer(JCudfSerialization.java:1846)
        at com.nvidia.spark.rapids.HostShuffleCoalesceIterator.$anonfun$concatenateTablesInHost$3(GpuShuffleCoalesceExec.scala:123)
        at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:56)
        at com.nvidia.spark.rapids.HostShuffleCoalesceIterator.$anonfun$concatenateTablesInHost$1(GpuShuffleCoalesceExec.scala:117)
        at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
        at com.nvidia.spark.rapids.HostShuffleCoalesceIterator.concatenateTablesInHost(GpuShuffleCoalesceExec.scala:110)
        at com.nvidia.spark.rapids.HostShuffleCoalesceIterator.next(GpuShuffleCoalesceExec.scala:179)
        at com.nvidia.spark.rapids.HostShuffleCoalesceIterator.next(GpuShuffleCoalesceExec.scala:84)
        at com.nvidia.spark.rapids.GpuShuffleCoalesceIterator.$anonfun$next$2(GpuShuffleCoalesceExec.scala:218)
        at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
        at com.nvidia.spark.rapids.GpuShuffleCoalesceIterator.$anonfun$next$1(GpuShuffleCoalesceExec.scala:214)
        at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
        at com.nvidia.spark.rapids.GpuShuffleCoalesceIterator.next(GpuShuffleCoalesceExec.scala:213)
        at com.nvidia.spark.rapids.GpuShuffleCoalesceIterator.next(GpuShuffleCoalesceExec.scala:199)
        at com.nvidia.spark.rapids.AbstractProjectSplitIterator.next(basicPhysicalOperators.scala:247)
        at com.nvidia.spark.rapids.AbstractProjectSplitIterator.next(basicPhysicalOperators.scala:227)
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
        at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at com.nvidia.spark.rapids.GpuMergeAggregateIterator.$anonfun$next$2(GpuAggregateExec.scala:806)
        at scala.Option.getOrElse(Option.scala:189)
        at com.nvidia.spark.rapids.GpuMergeAggregateIterator.next(GpuAggregateExec.scala:804)
        at com.nvidia.spark.rapids.GpuMergeAggregateIterator.next(GpuAggregateExec.scala:766)
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
        at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.$anonfun$next$11(GpuAggregateExec.scala:2105)
        at scala.Option.map(Option.scala:230)
        at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.next(GpuAggregateExec.scala:2105)
        at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.next(GpuAggregateExec.scala:1969)
        at com.nvidia.spark.rapids.ColumnarToRowIterator.$anonfun$fetchNextBatch$3(GpuColumnarToRowExec.scala:290)
        at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
        at com.nvidia.spark.rapids.ColumnarToRowIterator.fetchNextBatch(GpuColumnarToRowExec.scala:287)

Integration test test_hash_reduction_collect_set_on_nested_array_type failure:

E                   Caused by: java.lang.IllegalArgumentException: ArrayType(IntegerType,true) is not supported for GPU processing yet.
E                       at com.nvidia.spark.rapids.GpuColumnVector.getNonNestedRapidsType(GpuColumnVector.java:429)
E                       at com.nvidia.spark.rapids.GpuColumnVector.typeConversionAllowed(GpuColumnVector.java:570)
E                       at com.nvidia.spark.rapids.GpuColumnVector.typeConversionAllowed(GpuColumnVector.java:599)
E                       at com.nvidia.spark.rapids.GpuColumnVector.typeConversionAllowed(GpuColumnVector.java:599)
E                       at com.nvidia.spark.rapids.GpuColumnVector.from(GpuColumnVector.java:717)
E                       at com.nvidia.spark.rapids.AggHelper.$anonfun$performReduction$8(GpuAggregateExec.scala:395)
E                       at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
E                       at com.nvidia.spark.rapids.AggHelper.$anonfun$performReduction$5(GpuAggregateExec.scala:376)
E                       at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
E                       at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
E                       at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
E                       at com.nvidia.spark.rapids.AggHelper.$anonfun$performReduction$1(GpuAggregateExec.scala:367)
E                       at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
E                       at com.nvidia.spark.rapids.AggHelper.performReduction(GpuAggregateExec.scala:361)
E                       at com.nvidia.spark.rapids.AggHelper.aggregate(GpuAggregateExec.scala:300)
E                       at com.nvidia.spark.rapids.AggHelper.$anonfun$aggregateWithoutCombine$4(GpuAggregateExec.scala:317)
E                       at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
E                       at com.nvidia.spark.rapids.AggHelper.$anonfun$aggregateWithoutCombine$3(GpuAggregateExec.scala:315)
E                       at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
E                       at com.nvidia.spark.rapids.AggHelper.$anonfun$aggregateWithoutCombine$2(GpuAggregateExec.scala:314)
E                       at com.nvidia.spark.rapids.RmmRapidsRetryIterator$AutoCloseableAttemptSpliterator.next(RmmRapidsRetryIterator.scala:477)
E                       at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryIterator.next(RmmRapidsRetryIterator.scala:613)
E                       at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryAutoCloseableIterator.next(RmmRapidsRetryIterator.scala:517)
E                       at scala.collection.Iterator$$anon$11.next(Iterator.scala:496)
E                       at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
E                       at com.nvidia.spark.rapids.GpuMergeAggregateIterator.aggregateInputBatches(GpuAggregateExec.scala:858)
E                       at com.nvidia.spark.rapids.GpuMergeAggregateIterator.$anonfun$next$2(GpuAggregateExec.scala:808)
E                       at scala.Option.getOrElse(Option.scala:189)
E                       at com.nvidia.spark.rapids.GpuMergeAggregateIterator.next(GpuAggregateExec.scala:804)
E                       at com.nvidia.spark.rapids.GpuMergeAggregateIterator.next(GpuAggregateExec.scala:766)
E                       at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
E                       at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.$anonfun$next$11(GpuAggregateExec.scala:2105)
E                       at scala.Option.map(Option.scala:230)
E                       at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.next(GpuAggregateExec.scala:2105)
E                       at com.nvidia.spark.rapids.DynamicGpuPartialSortAggregateIterator.next(GpuAggregateExec.scala:1969)
E                       at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.partNextBatch(GpuShuffleExchangeExecBase.scala:333)
E                       at org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase$$anon$1.hasNext(GpuShuffleExchangeExecBase.scala:355)
E                       at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
E                       at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
E                       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
E                       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)

jlowe (Member) commented Jan 29, 2024

Filed rapidsai/cudf#14924.

ttnghia (Collaborator) commented Mar 6, 2024

I can reproduce the bug using the example in #10133 (comment), and have verified that it is fixed by rapidsai/cudf#15243.

ttnghia (Collaborator) commented Mar 15, 2024

This should be closed by rapidsai/cudf#15243.
However, I don't know whether any temporary workaround was added for this issue that now needs to be reverted.

jlowe (Member) commented Mar 15, 2024

No workaround/disable was added for this. Verified that the recent EGX nightly tests, which always failed with this, are now passing.

jlowe closed this as completed Mar 15, 2024