{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":17165658,"defaultBranch":"master","name":"spark","ownerLogin":"apache","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2014-02-25T08:00:08.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/47359?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1718082245.0","currentOid":""},"activityList":{"items":[{"before":"a3625a98e78c43c64cbe4a21f7c70f46307df508","after":"b5e1b7988031044d3cbdb277668b775c08db1a74","ref":"refs/heads/master","pushedAt":"2024-06-12T12:23:09.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"yaooqinn","name":"Kent Yao","path":"/yaooqinn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8326978?s=80&v=4"},"commit":{"message":"[SPARK-48596][SQL] Perf improvement for calculating hex string for long\n\n### What changes were proposed in this pull request?\n\nThis pull request optimizes the `Hex.hex(num: Long)` method by removing leading zeros, thus eliminating the need to copy the array to remove them afterward.\n### Why are the changes needed?\n\n- Unit tests added\n- Did a benchmark locally (30~50% speedup)\n\n```scala\nHex Long Tests:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative\n------------------------------------------------------------------------------------------------------------------------\nLegacy                                             1062           1094          16          9.4         106.2       1.0X\nNew                                                 739            807          26         13.5          73.9       1.4X\n```\n\n```scala\nobject HexBenchmark extends BenchmarkBase {\n  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {\n    val N = 10_000_000\n    runBenchmark(\"Hex\") {\n      val benchmark = new Benchmark(\"Hex Long Tests\", N, 10, output = output)\n      val range = 1 to 12\n      benchmark.addCase(\"Legacy\") { _ =>\n        (1 to N).foreach(x => range.foreach(y => hexLegacy(x - y)))\n      }\n\n      benchmark.addCase(\"New\") { _ =>\n        (1 to N).foreach(x => range.foreach(y => Hex.hex(x - y)))\n      }\n      benchmark.run()\n    }\n  }\n\n  def hexLegacy(num: Long): UTF8String = {\n    // Extract the hex digits of num into value[] from right to left\n    val value = new Array[Byte](16)\n    var numBuf = num\n    var len = 0\n    do {\n      len += 1\n      // Hex.hexDigits need to be seen here\n      value(value.length - len) = Hex.hexDigits((numBuf & 0xF).toInt)\n      numBuf >>>= 4\n    } while (numBuf != 0)\n    UTF8String.fromBytes(java.util.Arrays.copyOfRange(value, value.length - len, value.length))\n  }\n}\n```\n\n### Does this PR introduce _any_ user-facing change?\nno\n\n### How was this patch tested?\n\n### Was this patch authored or co-authored using generative AI tooling?\nno\n\nCloses #46952 from yaooqinn/SPARK-48596.\n\nAuthored-by: Kent Yao <yao@apache.org>\nSigned-off-by: Kent Yao <yao@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48596\">SPARK-48596</a>][SQL] Perf improvement for calculating hex string for long"}},{"before":"da81d8ecb80226fa5fb2b6e50048f05d67fb5904","after":"a3625a98e78c43c64cbe4a21f7c70f46307df508","ref":"refs/heads/master","pushedAt":"2024-06-12T09:11:27.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"yaooqinn","name":"Kent Yao","path":"/yaooqinn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8326978?s=80&v=4"},"commit":{"message":"[SPARK-48595][CORE] Cleanup deprecated api usage related to `commons-compress`\n\n### What changes were proposed in this pull request?\nThis pr use `org.apache.commons.io.output.CountingOutputStream` instead of `org.apache.commons.compress.utils.CountingOutputStream` to fix the following compilation warnings related to 'commons-compress':\n\n```\n[WARNING] [Warn] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/deploy/history/EventLogFileWriters.scala:308: class CountingOutputStream in package utils is deprecated\nApplicable -Wconf / nowarn filters for this warning: msg=<part of the message>, cat=deprecation, site=org.apache.spark.deploy.history.RollingEventLogFilesWriter.countingOutputStream, origin=org.apache.commons.compress.utils.CountingOutputStream\n[WARNING] [Warn] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/deploy/history/EventLogFileWriters.scala:351: class CountingOutputStream in package utils is deprecated\nApplicable -Wconf / nowarn filters for this warning: msg=<part of the message>, cat=deprecation, site=org.apache.spark.deploy.history.RollingEventLogFilesWriter.rollEventLogFile.$anonfun, origin=org.apache.commons.compress.utils.CountingOutputStream\n```\n\nThe fix refers to:\n\nhttps://github.com/apache/commons-compress/blob/95727006cac0892c654951c4e7f1db142462f22a/src/main/java/org/apache/commons/compress/utils/CountingOutputStream.java#L25-L33\n\n```\n/**\n * Stream that tracks the number of bytes read.\n *\n * since 1.3\n * NotThreadSafe\n * deprecated Use {link org.apache.commons.io.output.CountingOutputStream}.\n */\nDeprecated\npublic class CountingOutputStream extends FilterOutputStream {\n```\n\n### Why are the changes needed?\nCleanup deprecated api usage related to `commons-compress`\n\n### Does this PR introduce _any_ user-facing change?\nNo\n\n### How was this patch tested?\nPass GitHub Actions\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46950 from LuciferYang/SPARK-48595.\n\nAuthored-by: yangjie01 <yangjie01@baidu.com>\nSigned-off-by: Kent Yao <yao@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48595\">SPARK-48595</a>][CORE] Cleanup deprecated api usage related to `commons-…"}},{"before":"8870efce19f2abb8419f835d29304ffa7cc53251","after":"da81d8ecb80226fa5fb2b6e50048f05d67fb5904","ref":"refs/heads/master","pushedAt":"2024-06-12T08:39:55.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"yaooqinn","name":"Kent Yao","path":"/yaooqinn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8326978?s=80&v=4"},"commit":{"message":"[SPARK-48584][SQL] Perf improvement for unescapePathName\n\n### What changes were proposed in this pull request?\n\nThis PR improves perf for unescapePathName with algorithms briefly described as:\n- If a path contains no '%' or contains '%' at `position > path.length-2`, we return the original identity instead of creating a new StringBuilder to append char by char\n- Otherwise, we loop with 2 indices, `plaintextStartIdx` which starts from 0 and then points to the next char after resolving `%xx`, and `plaintextEndIdx` which points to the next `'%'`. `plaintextStartIdx` moves to `plaintextEndIdx + 3` if `%xx` is valid, or moves to `plaintextEndIdx + 1` if `%xx` is invalid.\n- Instead of using Integer.parseInt with error capture, we identify the high and low characters manually.\n\n### Why are the changes needed?\n\nperformance improvement for hotspots\n\n### Does this PR introduce _any_ user-facing change?\n\nno\n\n### How was this patch tested?\n\n- new tests in ExternalCatalogUtilsSuite\n- Benchmark results (9-11x faster)\n### Was this patch authored or co-authored using generative AI tooling?\n\nno\n\nCloses #46938 from yaooqinn/SPARK-48584.\n\nAuthored-by: Kent Yao <yao@apache.org>\nSigned-off-by: Kent Yao <yao@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48584\">SPARK-48584</a>][SQL] Perf improvement for unescapePathName"}},{"before":"334816af5f865d0e4c7a8e9c02b30236052d214e","after":"8870efce19f2abb8419f835d29304ffa7cc53251","ref":"refs/heads/master","pushedAt":"2024-06-12T07:41:46.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"yaooqinn","name":"Kent Yao","path":"/yaooqinn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8326978?s=80&v=4"},"commit":{"message":"[SPARK-48581][BUILD] Upgrade dropwizard metrics to 4.2.26\n\n### What changes were proposed in this pull request?\n\nUpgrade dropwizard metrics to 4.2.26.\n\n### Why are the changes needed?\n\nThere are some bug fixes as belows:\n\n- Correction for the Jetty-12 QTP metrics by dkaukov in https://github.com/dropwizard/metrics/pull/4181\n\n- Fix metrics for InstrumentedEE10Handler by zUniQueX in https://github.com/dropwizard/metrics/pull/3928\n\nThe full release notes:\nhttps://github.com/dropwizard/metrics/releases/tag/v4.2.26\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nPassed GA.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46932 from wayneguow/codahale.\n\nAuthored-by: Wei Guo <guow93@gmail.com>\nSigned-off-by: Kent Yao <yao@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48581\">SPARK-48581</a>][BUILD] Upgrade dropwizard metrics to 4.2.26"}},{"before":"72df3cb1a43bd3cc0b20456733228dbb0b403305","after":"334816af5f865d0e4c7a8e9c02b30236052d214e","ref":"refs/heads/master","pushedAt":"2024-06-12T03:45:44.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HeartSaVioR","name":"Jungtaek Lim","path":"/HeartSaVioR","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1317309?s=80&v=4"},"commit":{"message":"[SPARK-48411][SS][PYTHON] Add E2E test for DropDuplicateWithinWatermark\n\n### What changes were proposed in this pull request?\nThis PR adds a test for API DropDuplicateWithinWatermark in Python, which was previously missing.\n\n### Why are the changes needed?\nCheck the correctness of API DropDuplicateWithinWatermark.\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch tested?\nPassed:\n```\npython/run-tests --testnames pyspark.sql.tests.streaming.test_streaming\npython/run-tests --testnames pyspark.sql.tests.connect.streaming.test_parity_streaming\n```\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46740 from eason-yuchen-liu/DropDuplicateWithinWatermark_test.\n\nAuthored-by: Yuchen Liu <yuchen.liu@databricks.com>\nSigned-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48411\">SPARK-48411</a>][SS][PYTHON] Add E2E test for DropDuplicateWithinWatermark"}},{"before":"82a84ede6a47232fe3af86672ceea97f703b3e8a","after":"72df3cb1a43bd3cc0b20456733228dbb0b403305","ref":"refs/heads/master","pushedAt":"2024-06-12T02:14:47.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"yaooqinn","name":"Kent Yao","path":"/yaooqinn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8326978?s=80&v=4"},"commit":{"message":"[SPARK-48582][BUILD] Upgrade `braces` from 3.0.2 to 3.0.3 in ui-test\n\n### What changes were proposed in this pull request?\nThis pr aims to upgrade `braces` from 3.0.2 to 3.0.3 in ui-test.\n\nThe original pr was submitted by `dependabot`: https://github.com/apache/spark/pull/46931\n\n### Why are the changes needed?\nThe new version fix vulnerability https://security.snyk.io/vuln/SNYK-JS-BRACES-6838727\n\n- https://github.com/micromatch/braces/commit/9f5b4cf47329351bcb64287223ffb6ecc9a5e6d3\n\nThe complete list of changes is as follows:\n\n- https://github.com/micromatch/braces/compare/3.0.2...3.0.3\n\n### Does this PR introduce _any_ user-facing change?\nNo\n\n### How was this patch tested?\nPass GitHub Actions\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46933 from LuciferYang/SPARK-48582.\n\nLead-authored-by: yangjie01 <yangjie01@baidu.com>\nCo-authored-by: YangJie <yangjie01@baidu.com>\nSigned-off-by: Kent Yao <yao@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48582\">SPARK-48582</a>][BUILD] Upgrade <code>braces</code> from 3.0.2 to 3.0.3 in ui-test"}},{"before":"61078366b672696244b8cd1922dd52d823b75249","after":"82a84ede6a47232fe3af86672ceea97f703b3e8a","ref":"refs/heads/master","pushedAt":"2024-06-11T19:55:20.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-46937][SQL] Revert \"[] Improve concurrency performance for FunctionRegistry\"\n\n### What changes were proposed in this pull request?\n\nReverts https://github.com/apache/spark/pull/44976 as it breaks thread-safety\n\n### Why are the changes needed?\n\nFix thread-safety\n\n### Does this PR introduce _any_ user-facing change?\n\nno\n\n### How was this patch tested?\n\nN/A\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nno\n\nCloses #46940 from cloud-fan/revert.\n\nAuthored-by: Wenchen Fan <wenchen@databricks.com>\nSigned-off-by: Wenchen Fan <wenchen@databricks.com>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-46937\">SPARK-46937</a>][SQL] Revert \"[] Improve concurrency performance for Fun…"}},{"before":"aad6771aac3d7b2adbdf53c0f4c8b9e52cbcf2f3","after":"61078366b672696244b8cd1922dd52d823b75249","ref":"refs/heads/master","pushedAt":"2024-06-11T17:38:40.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48576][SQL][FOLLOWUP] Rename UTF8_BINARY_LCASE to UTF8_LCASE\n\n### What changes were proposed in this pull request?\nRenaming `UTF8_BINARY_LCASE` collation to `UTF8_LCASE` in leftover tests.\n\n### Why are the changes needed?\nDue to a merge conflict, one additional test was using the old collation name.\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch tested?\nExisting tests.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46939 from uros-db/renaming-fix.\n\nAuthored-by: Uros Bojanic <157381213+uros-db@users.noreply.github.com>\nSigned-off-by: Wenchen Fan <wenchen@databricks.com>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48576\">SPARK-48576</a>][SQL][FOLLOWUP] Rename UTF8_BINARY_LCASE to UTF8_LCASE"}},{"before":"224ba162b5d6e0b8956c423f0cb097d32f1aad4d","after":"aad6771aac3d7b2adbdf53c0f4c8b9e52cbcf2f3","ref":"refs/heads/master","pushedAt":"2024-06-11T17:26:34.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48576][SQL] Rename UTF8_BINARY_LCASE to UTF8_LCASE\n\n### What changes were proposed in this pull request?\nRenaming `UTF8_BINARY_LCASE` collation to `UTF8_LCASE`.\n\n### Why are the changes needed?\nAs part of the collation effort in Spark, we've moved away from byte-by-byte logic towards character-by-character logic, so what we used to call `UTF8_BINARY_LCASE` is now more precisely `UTF8_LCASE`. For example, string searching in UTF8_LCASE now works on character-level (rather than on byte-level), which is reflected in this PRs: https://github.com/apache/spark/pull/46511, https://github.com/apache/spark/pull/46589, https://github.com/apache/spark/pull/46682, https://github.com/apache/spark/pull/46761, https://github.com/apache/spark/pull/46762. In addition, string comparison also works on character-level now, as per the changes introduced in this PR: https://github.com/apache/spark/pull/46700.\n\n### Does this PR introduce _any_ user-facing change?\nYes, what was previously named `UTF8_BINARY_LCASE` collation, will from now on be named `UTF8_LCASE`.\n\n### How was this patch tested?\nExisting tests.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46924 from uros-db/rename-lcase.\n\nAuthored-by: Uros Bojanic <157381213+uros-db@users.noreply.github.com>\nSigned-off-by: Wenchen Fan <wenchen@databricks.com>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48576\">SPARK-48576</a>][SQL] Rename UTF8_BINARY_LCASE to UTF8_LCASE"}},{"before":"583ab0500c79bd3cf0146bc1f05f8a11ec37d9a5","after":"224ba162b5d6e0b8956c423f0cb097d32f1aad4d","ref":"refs/heads/master","pushedAt":"2024-06-11T17:01:35.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48556][SQL] Fix incorrect error message pointing to UNSUPPORTED_GROUPING_EXPRESSION\n\n### What changes were proposed in this pull request?\n\nFollowing sequence of queries produces `UNSUPPORTED_GROUPING_EXPRESSION` error:\n```\ncreate table t1(a int, b int) using parquet;\nselect grouping(a), dummy from t1 group by a with rollup;\n```\nHowever, the appropriate error should point the user to the invalid `dummy` column name.\n\nFix the problem by deprioritizing `Grouping` and `GroupingID` nodes in plan which were not resolved and thus cause the unwanted error.\n\n### Why are the changes needed?\n\nTo fix the described issue.\n\n### Does this PR introduce _any_ user-facing change?\n\nYes, it displays proper error message to user instead of misleading one.\n\n### How was this patch tested?\n\nAdded test to `QueryCompilationErrorsSuite`.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46900 from nikolamand-db/SPARK-48556.\n\nAuthored-by: Nikola Mandic <nikola.mandic@databricks.com>\nSigned-off-by: Wenchen Fan <wenchen@databricks.com>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48556\">SPARK-48556</a>][SQL] Fix incorrect error message pointing to UNSUPPORTE…"}},{"before":"df4156aa3217cf0f58b4c6cbf33c967bb43f7155","after":"583ab0500c79bd3cf0146bc1f05f8a11ec37d9a5","ref":"refs/heads/master","pushedAt":"2024-06-11T16:42:05.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-47415][SQL] Add collation support for Levenshtein expression\n\n### What changes were proposed in this pull request?\nIntroduce collation support for `levenshtein` string expression (pass-through).\n\n### Why are the changes needed?\nAdd collation support for Levenshtein expression in Spark.\n\n### Does this PR introduce _any_ user-facing change?\nYes, users should now be able to use collated strings within arguments for string function: levenshtein.\n\n### How was this patch tested?\nE2e sql tests.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46788 from uros-db/levenshtein.\n\nAuthored-by: Uros Bojanic <157381213+uros-db@users.noreply.github.com>\nSigned-off-by: Wenchen Fan <wenchen@databricks.com>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-47415\">SPARK-47415</a>][SQL] Add collation support for Levenshtein expression"}},{"before":"452c1b64b6252b981d261084d870e65ba6d006c9","after":"df4156aa3217cf0f58b4c6cbf33c967bb43f7155","ref":"refs/heads/master","pushedAt":"2024-06-11T10:45:13.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"zhengruifeng","name":"Ruifeng Zheng","path":"/zhengruifeng","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7322292?s=80&v=4"},"commit":{"message":"[SPARK-48372][SPARK-45716][PYTHON][FOLLOW-UP] Remove unused helper method\n\n### What changes were proposed in this pull request?\nfollowup of https://github.com/apache/spark/pull/46685, to remove unused helper method\n\n### Why are the changes needed?\nmethod `_tree_string` is no longer needed\n\n### Does this PR introduce _any_ user-facing change?\nNo, internal change only\n\n### How was this patch tested?\nCI\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46936 from zhengruifeng/tree_string_followup.\n\nAuthored-by: Ruifeng Zheng <ruifengz@apache.org>\nSigned-off-by: Ruifeng Zheng <ruifengz@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48372\">SPARK-48372</a>][<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-45716\">SPARK-45716</a>][PYTHON][FOLLOW-UP] Remove unused helper me…"}},{"before":"53d65fd12dd9231139188227ef9040d40d759021","after":"452c1b64b6252b981d261084d870e65ba6d006c9","ref":"refs/heads/master","pushedAt":"2024-06-11T07:55:40.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"LuciferYang","name":"YangJie","path":"/LuciferYang","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1475305?s=80&v=4"},"commit":{"message":"[SPARK-48551][SQL] Perf improvement for escapePathName\n\n### What changes were proposed in this pull request?\n\nThis PR improves perf for escapePathName with algorithms briefly described as:\n- If a path contains no special characters, we return the original identity instead of creating a new StringBuilder to append char by char\n- If a path contains special characters, we relocate the IDX of the first special character. Then initialize the StringBuilder with [0, IDX) of the original string, and do heximal padding if necessary starting from IDX.\n- An optimized char-to-hex function replaces the `String.format`\n\nAdd a fast path for storage paths or their parts that do not require escaping to avoid creating a StringBuilder to append per character.\n### Why are the changes needed?\n\nperformance improvement for hotspots\n\n### Does this PR introduce _any_ user-facing change?\n\nno\n\n### How was this patch tested?\n\n- new tests in ExternalCatalogUtilsSuite\n- Benchmark results (9x faster)\n### Was this patch authored or co-authored using generative AI tooling?\n\nno\n\nCloses #46894 from yaooqinn/SPARK-48551.\n\nAuthored-by: Kent Yao <yao@apache.org>\nSigned-off-by: yangjie01 <yangjie01@baidu.com>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48551\">SPARK-48551</a>][SQL] Perf improvement for escapePathName"}},{"before":"e8752784e002d70cf79232f97202586517efa28e","after":null,"ref":"refs/heads/dependabot/npm_and_yarn/ui-test/braces-3.0.3","pushedAt":"2024-06-11T05:04:05.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"dependabot[bot]","name":null,"path":"/apps/dependabot","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/29110?s=80&v=4"}},{"before":"1e4750e63403341331d77a3df4c1dfefbe240b02","after":"53d65fd12dd9231139188227ef9040d40d759021","ref":"refs/heads/master","pushedAt":"2024-06-11T03:28:56.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"yaooqinn","name":"Kent Yao","path":"/yaooqinn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8326978?s=80&v=4"},"commit":{"message":"[SPARK-48565][UI] Fix thread dump display in UI\n\n### What changes were proposed in this pull request?\n\nThread dump display in UI is not pretty as before, this is side-effect introduced by SPARK-44863\n\n### Why are the changes needed?\n\nRestore thread dump display in UI.\n\n### Does this PR introduce _any_ user-facing change?\n\nYes, it only affects UI display.\n\n### How was this patch tested?\n\nCurrent master:\n<img width=\"1545\" alt=\"master-branch\" src=\"https://github.com/apache/spark/assets/26535726/5c6fd770-467f-481c-a635-2855a2853633\">\n\nWith this patch applied:\n<img width=\"1542\" alt=\"Xnip2024-06-07_20-00-38\" src=\"https://github.com/apache/spark/assets/26535726/3998c2aa-671f-4921-8444-b7bca8667202\">\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo\n\nCloses #46916 from pan3793/SPARK-48565.\n\nAuthored-by: Cheng Pan <chengpan@apache.org>\nSigned-off-by: Kent Yao <yao@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48565\">SPARK-48565</a>][UI] Fix thread dump display in UI"}},{"before":"3fe6abde125b7c34437a3f72d17ee97d9653c218","after":"1e4750e63403341331d77a3df4c1dfefbe240b02","ref":"refs/heads/master","pushedAt":"2024-06-11T02:53:23.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-47500][PYTHON][CONNECT][FOLLOWUP] Restore error message for `DataFrame.select(None)`\n\n### What changes were proposed in this pull request?\nthe refactor PR https://github.com/apache/spark/pull/45636 changed the error message of `DataFrame.select(None)` from `PySparkTypeError` to `AssertionError`, this PR restore the previous error message\n\n### Why are the changes needed?\nerror message improvement\n\n### Does this PR introduce _any_ user-facing change?\nyes, error message improvement\n\n### How was this patch tested?\nadded test\n\n### Was this patch authored or co-authored using generative AI tooling?\nno\n\nCloses #46930 from zhengruifeng/py_restore_select_error.\n\nAuthored-by: Ruifeng Zheng <ruifengz@apache.org>\nSigned-off-by: Hyukjin Kwon <gurwls223@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-47500\">SPARK-47500</a>][PYTHON][CONNECT][FOLLOWUP] Restore error message for `D…"}},{"before":null,"after":"e8752784e002d70cf79232f97202586517efa28e","ref":"refs/heads/dependabot/npm_and_yarn/ui-test/braces-3.0.3","pushedAt":"2024-06-11T02:37:22.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"dependabot[bot]","name":null,"path":"/apps/dependabot","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/29110?s=80&v=4"},"commit":{"message":"Bump braces from 3.0.2 to 3.0.3 in /ui-test\n\nBumps [braces](https://github.com/micromatch/braces) from 3.0.2 to 3.0.3.\n- [Changelog](https://github.com/micromatch/braces/blob/master/CHANGELOG.md)\n- [Commits](https://github.com/micromatch/braces/compare/3.0.2...3.0.3)\n\n---\nupdated-dependencies:\n- dependency-name: braces\n  dependency-type: indirect\n...\n\nSigned-off-by: dependabot[bot] <support@github.com>","shortMessageHtmlLink":"Bump braces from 3.0.2 to 3.0.3 in /ui-test"}},{"before":"5a2f374a208f9580ea8d0183d75df6cd2bee8e1f","after":"3fe6abde125b7c34437a3f72d17ee97d9653c218","ref":"refs/heads/master","pushedAt":"2024-06-11T02:36:42.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"LuciferYang","name":"YangJie","path":"/LuciferYang","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1475305?s=80&v=4"},"commit":{"message":"[SPARK-48563][BUILD] Upgrade `pickle` to 1.5\n\n### What changes were proposed in this pull request?\nThis pr aims upgrade `pickle` from 1.3 to 1.5.\n\n### Why are the changes needed?\nThe new version include a new  fix related to [empty bytes object construction](https://github.com/irmen/pickle/commit/badc8fe08c9e47b87df66b8a16c67010e3614e35)\n\nAll changes from 1.3 to 1.5 are as follows:\n\n- https://github.com/irmen/pickle/compare/pickle-1.3...pickle-1.5\n\n### Does this PR introduce _any_ user-facing change?\nNo\n\n### How was this patch tested?\nPass GitHub Actions\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46913 from LuciferYang/pickle-1.5.\n\nAuthored-by: yangjie01 <yangjie01@baidu.com>\nSigned-off-by: yangjie01 <yangjie01@baidu.com>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48563\">SPARK-48563</a>][BUILD] Upgrade <code>pickle</code> to 1.5"}},{"before":"ec6db63ca6acdf7ba32d3ded99ea207dd3823633","after":"5a2f374a208f9580ea8d0183d75df6cd2bee8e1f","ref":"refs/heads/master","pushedAt":"2024-06-10T18:18:15.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"JoshRosen","name":"Josh Rosen","path":"/JoshRosen","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/50748?s=80&v=4"},"commit":{"message":"[SPARK-48544][SQL] Reduce memory pressure of empty TreeNode BitSets\n\n### What changes were proposed in this pull request?\n\n- Changed the `ineffectiveRules` variable of the `TreeNode` class to initialize lazily. This will reduce unnecessary driver memory pressure.\n\n### Why are the changes needed?\n\n- Plans with large expression or operator trees are known to cause driver memory pressure; this is one step in alleviating that issue.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nExisting UT covers behavior. Outwards facing behavior does not change.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo\n\nCloses #46919 from n-young-db/ineffective-rules-lazy.\n\nAuthored-by: Nick Young <nick.young@databricks.com>\nSigned-off-by: Josh Rosen <joshrosen@databricks.com>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48544\">SPARK-48544</a>][SQL] Reduce memory pressure of empty TreeNode BitSets"}},{"before":"3857a9d36d4ae923994ac70a50e5be9c36686836","after":"ec6db63ca6acdf7ba32d3ded99ea207dd3823633","ref":"refs/heads/master","pushedAt":"2024-06-10T18:06:46.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-48569][SS][CONNECT] Handle edge cases in query.name\n\n### What changes were proposed in this pull request?\n\n1. In connect, when a streaming query name is not specified, it's query.name should return None. Currently it returns an empty string without this patch.\n2. In classic spark, one cannot set the streaming query's name to be empty string. This check was missing in Spark Connect. Adding it back.\n\n### Why are the changes needed?\n\nEdge case handling.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo\n\n### How was this patch tested?\n\nAdded unit test.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46920 from WweiL/SPARK-48569-query-name-None.\n\nAuthored-by: Wei Liu <wei.liu@databricks.com>\nSigned-off-by: Hyukjin Kwon <gurwls223@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48569\">SPARK-48569</a>][SS][CONNECT] Handle edge cases in query.name"}},{"before":"61fd936c8939b139f8fa02be8c90e655c2afa355","after":"3857a9d36d4ae923994ac70a50e5be9c36686836","ref":"refs/heads/master","pushedAt":"2024-06-10T16:18:20.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48410][SQL] Fix InitCap expression for UTF8_BINARY_LCASE & ICU collations\n\n### What changes were proposed in this pull request?\nString titlecase conversion under UTF8_BINARY_LCASE and other ICU collations now work using the appropriate ICU default locale for character mapping, and uses ICU BreakIterator.getWordInstance to locate boundaries between words.\n\n### Why are the changes needed?\nSimilar Spark expressions such as Lower & Upper use the same interface (UCharacter) to perform collation-aware string transformation, and InitCap should offer a consistant way to titlecase strings across the collation space.\n\n### Does this PR introduce _any_ user-facing change?\nYes, InitCap should now work properly for all collations other than UTF8_BINARY.\n\n### How was this patch tested?\nNew and existing unit tests, as well as existing e2e sql tests.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46732 from uros-db/initcap-icu.\n\nAuthored-by: Uros Bojanic <157381213+uros-db@users.noreply.github.com>\nSigned-off-by: Wenchen Fan <wenchen@databricks.com>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48410\">SPARK-48410</a>][SQL] Fix InitCap expression for UTF8_BINARY_LCASE &amp; ICU…"}},{"before":"190166908ba03897235ba2591f0f9200e7e80387","after":"61fd936c8939b139f8fa02be8c90e655c2afa355","ref":"refs/heads/master","pushedAt":"2024-06-10T16:14:19.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48403][SQL] Fix Lower & Upper expressions for UTF8_BINARY_LCASE & ICU collations\n\n### What changes were proposed in this pull request?\nString lowercase/uppercase conversion in UTF8_BINARY_LCASE now works using ICU default locale, similar to how other ICU collations currently work in Spark.\n\n### Why are the changes needed?\nAll collations apart from UTF8_BINARY should use the same interface (UCharacter) that utilizes ICU toLowerCase/toUpperCase implementation, rather than mixing JVM & ICU implementations.\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch tested?\nExisting unit tests and e2e sql tests.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46720 from uros-db/lower-upper-initcap.\n\nAuthored-by: Uros Bojanic <157381213+uros-db@users.noreply.github.com>\nSigned-off-by: Wenchen Fan <wenchen@databricks.com>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48403\">SPARK-48403</a>][SQL] Fix Lower &amp; Upper expressions for UTF8_BINARY_LCAS…"}},{"before":"d9394eee5ebbeb695baaec6122da2ed970842dfd","after":"190166908ba03897235ba2591f0f9200e7e80387","ref":"refs/heads/master","pushedAt":"2024-06-10T16:10:25.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-48564][PYTHON][CONNECT] Propagate cached schema in set operations\n\n### What changes were proposed in this pull request?\nPropagate cached schema in set operations\n\n### Why are the changes needed?\nto avoid extra RPC to get the schema of result data frame\n\n### Does this PR introduce _any_ user-facing change?\nNo\n\n### How was this patch tested?\nadded tests\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46915 from zhengruifeng/set_op_schema.\n\nAuthored-by: Ruifeng Zheng <ruifengz@apache.org>\nSigned-off-by: Hyukjin Kwon <gurwls223@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48564\">SPARK-48564</a>][PYTHON][CONNECT] Propagate cached schema in set operations"}},{"before":"24bce72c9065336a962fe76feeb14fa2119ef961","after":"d9394eee5ebbeb695baaec6122da2ed970842dfd","ref":"refs/heads/master","pushedAt":"2024-06-09T15:36:08.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-48560][SS][PYTHON] Make StreamingQueryListener.spark settable\n\n### What changes were proposed in this pull request?\n\nThis PR proposes to make StreamingQueryListener.spark settable\n\n### Why are the changes needed?\n\n```python\nfrom pyspark.sql.streaming.listener import StreamingQueryListener\n\nclass MyListener(StreamingQueryListener):\n  def __init__(self, spark):\n    self.spark = spark\n\n  def onQueryStarted(self, event):\n    pass\n\n  def onQueryProgress(self, event):\n    pass\n\n  def onQueryTerminated(self, event):\n    pass\n\nMyListener(spark)\n```\n\nis broken from 3.5.0 after SPARK-42941.\n\n### Does this PR introduce _any_ user-facing change?\n\nYes, end users who implement `StreamingQueryListener` can add `spark` attribute in their implementation.\n\n### How was this patch tested?\n\nManually tested, and added a unittest.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46909 from HyukjinKwon/compat-spark-prop.\n\nAuthored-by: Hyukjin Kwon <gurwls223@apache.org>\nSigned-off-by: Hyukjin Kwon <gurwls223@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48560\">SPARK-48560</a>][SS][PYTHON] Make StreamingQueryListener.spark settable"}},{"before":"201df0d7ac81f6bd5c39f513b0a06cb659dc9a3f","after":"24bce72c9065336a962fe76feeb14fa2119ef961","ref":"refs/heads/master","pushedAt":"2024-06-09T14:22:30.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"sunchao","name":"Chao Sun","path":"/sunchao","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/506679?s=80&v=4"},"commit":{"message":"[SPARK-48012][SQL] SPJ: Support Transfrom Expressions for One Side Shuffle\n\n### Why are the changes needed?\n\nSupport SPJ one-side shuffle if other side has partition transform expression\n\n  ### How was this patch tested?\n\nNew unit test in KeyGroupedPartitioningSuite\n\n  ### Was this patch authored or co-authored using generative AI tooling?\n\n No.\n\nCloses #46255 from szehon-ho/spj_auto_bucket.\n\nAuthored-by: Szehon Ho <szehon.apache@gmail.com>\nSigned-off-by: Chao Sun <chao@openai.com>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48012\">SPARK-48012</a>][SQL] SPJ: Support Transfrom Expressions for One Side Sh…"}},{"before":"8911d59005e81062c3a515531b03dcf3478db82a","after":"201df0d7ac81f6bd5c39f513b0a06cb659dc9a3f","ref":"refs/heads/master","pushedAt":"2024-06-07T23:49:25.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"zhengruifeng","name":"Ruifeng Zheng","path":"/zhengruifeng","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7322292?s=80&v=4"},"commit":{"message":"[MINOR][PYTHON][TESTS] Move a test out of parity tests\n\n### What changes were proposed in this pull request?\nMove a test out of parity tests\n\n### Why are the changes needed?\nit is not tested in Spark Classic, not a parity test\n\n### Does this PR introduce _any_ user-facing change?\nno\n\n### How was this patch tested?\nci\n\n### Was this patch authored or co-authored using generative AI tooling?\nno\n\nCloses #46914 from zhengruifeng/move_a_non_parity_test.\n\nAuthored-by: Ruifeng Zheng <ruifengz@apache.org>\nSigned-off-by: Ruifeng Zheng <ruifengz@apache.org>","shortMessageHtmlLink":"[MINOR][PYTHON][TESTS] Move a test out of parity tests"}},{"before":"d81b1e3d358c7c9b3992413b6c7078c92ae9072f","after":"8911d59005e81062c3a515531b03dcf3478db82a","ref":"refs/heads/master","pushedAt":"2024-06-07T20:56:43.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable and Fix UT\n\n### What changes were proposed in this pull request?\nThis is a followup of https://github.com/apache/spark/pull/46905, to fix `some UT` on GA.\n\n### Why are the changes needed?\nFix UT.\n\n### Does this PR introduce _any_ user-facing change?\nNo.,\n\n### How was this patch tested?\nManually test.\nPass GA\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46912 from panbingkun/SPARK-46393_FOLLOWUP.\n\nLead-authored-by: panbingkun <panbingkun@baidu.com>\nCo-authored-by: Wenchen Fan <wenchen@databricks.com>\nSigned-off-by: Wenchen Fan <wenchen@databricks.com>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-46393\">SPARK-46393</a>][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.…"}},{"before":"87b0f5995383173f6736695211994a1a26995192","after":"d81b1e3d358c7c9b3992413b6c7078c92ae9072f","ref":"refs/heads/master","pushedAt":"2024-06-07T08:54:01.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48559][SQL] Fetch globalTempDatabase name directly without invoking initialization of GlobalaTempViewManager\n\n### What changes were proposed in this pull request?\n\nIt's not necessary to create `GlobalaTempViewManager` only for getting the global temp db name. This PR updates the code to avoid this, as global temp db name is just a config.\n\n### Why are the changes needed?\n\navoid unnecessary RPC calls to check existence of global temp db\n\n### Does this PR introduce _any_ user-facing change?\n\nno\n\n### How was this patch tested?\n\nexisting tests\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nno\n\nCloses #46907 from willwwt/master.\n\nAuthored-by: Weitao Wen <weitao.wen@databricks.com>\nSigned-off-by: Wenchen Fan <wenchen@databricks.com>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48559\">SPARK-48559</a>][SQL] Fetch globalTempDatabase name directly without inv…"}},{"before":"b7d9c317aa2e4de8024e44db895fa8b0cbbb36db","after":"87b0f5995383173f6736695211994a1a26995192","ref":"refs/heads/master","pushedAt":"2024-06-07T08:37:09.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"zhengruifeng","name":"Ruifeng Zheng","path":"/zhengruifeng","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7322292?s=80&v=4"},"commit":{"message":"[SPARK-48561][PS][CONNECT] Throw `PandasNotImplementedError` for unsupported plotting functions\n\n### What changes were proposed in this pull request?\nThrow `PandasNotImplementedError` for unsupported plotting functions:\n- {Frame, Series}.plot.hist\n- {Frame, Series}.plot.kde\n- {Frame, Series}.plot.density\n- {Frame, Series}.plot(kind=\"hist\", ...)\n- {Frame, Series}.plot(kind=\"hist\", ...)\n- {Frame, Series}.plot(kind=\"density\", ...)\n\n### Why are the changes needed?\nthe previous error message is confusing:\n```\nIn [3]: psdf.plot.hist()\n/Users/ruifeng.zheng/Dev/spark/python/pyspark/pandas/utils.py:1017: PandasAPIOnSparkAdviceWarning: The config 'spark.sql.ansi.enabled' is set to True. This can cause unexpected behavior from pandas API on Spark since pandas API on Spark follows the behavior of pandas, not SQL.\n  warnings.warn(message, PandasAPIOnSparkAdviceWarning)\n[*********************************************-----------------------------------] 57.14% Complete (0 Tasks running, 1s, Scanned[*********************************************-----------------------------------] 57.14% Complete (0 Tasks running, 1s, Scanned[*********************************************-----------------------------------] 57.14% Complete (0 Tasks running, 1s, Scanned                                                                                                                                ---------------------------------------------------------------------------\nPySparkAttributeError                     Traceback (most recent call last)\nCell In[3], line 1\n----> 1 psdf.plot.hist()\n\nFile ~/Dev/spark/python/pyspark/pandas/plot/core.py:951, in PandasOnSparkPlotAccessor.hist(self, bins, **kwds)\n    903 def hist(self, bins=10, **kwds):\n    904     \"\"\"\n    905     Draw one histogram of the DataFrame’s columns.\n    906     A `histogram`_ is a representation of the distribution of data.\n   (...)\n    949         >>> df.plot.hist(bins=12, alpha=0.5)  # doctest: +SKIP\n    950     \"\"\"\n--> 951     return self(kind=\"hist\", bins=bins, **kwds)\n\nFile ~/Dev/spark/python/pyspark/pandas/plot/core.py:580, in PandasOnSparkPlotAccessor.__call__(self, kind, backend, **kwargs)\n    577 kind = {\"density\": \"kde\"}.get(kind, kind)\n    578 if hasattr(plot_backend, \"plot_pandas_on_spark\"):\n    579     # use if there's pandas-on-Spark specific method.\n--> 580     return plot_backend.plot_pandas_on_spark(plot_data, kind=kind, **kwargs)\n    581 else:\n    582     # fallback to use pandas'\n    583     if not PandasOnSparkPlotAccessor.pandas_plot_data_map[kind]:\n\nFile ~/Dev/spark/python/pyspark/pandas/plot/plotly.py:41, in plot_pandas_on_spark(data, kind, **kwargs)\n     39     return plot_pie(data, **kwargs)\n     40 if kind == \"hist\":\n---> 41     return plot_histogram(data, **kwargs)\n     42 if kind == \"box\":\n     43     return plot_box(data, **kwargs)\n\nFile ~/Dev/spark/python/pyspark/pandas/plot/plotly.py:87, in plot_histogram(data, **kwargs)\n     85 psdf, bins = HistogramPlotBase.prepare_hist_data(data, bins)\n     86 assert len(bins) > 2, \"the number of buckets must be higher than 2.\"\n---> 87 output_series = HistogramPlotBase.compute_hist(psdf, bins)\n     88 prev = float(\"%.9f\" % bins[0])  # to make it prettier, truncate.\n     89 text_bins = []\n\nFile ~/Dev/spark/python/pyspark/pandas/plot/core.py:189, in HistogramPlotBase.compute_hist(psdf, bins)\n    183 for group_id, (colname, bucket_name) in enumerate(zip(colnames, bucket_names)):\n    184     # creates a Bucketizer to get corresponding bin of each value\n    185     bucketizer = Bucketizer(\n    186         splits=bins, inputCol=colname, outputCol=bucket_name, handleInvalid=\"skip\"\n    187     )\n--> 189     bucket_df = bucketizer.transform(sdf)\n    191     if output_df is None:\n    192         output_df = bucket_df.select(\n    193             F.lit(group_id).alias(\"__group_id\"), F.col(bucket_name).alias(\"__bucket\")\n    194         )\n\nFile ~/Dev/spark/python/pyspark/ml/base.py:260, in Transformer.transform(self, dataset, params)\n    258         return self.copy(params)._transform(dataset)\n    259     else:\n--> 260         return self._transform(dataset)\n    261 else:\n    262     raise TypeError(\"Params must be a param map but got %s.\" % type(params))\n\nFile ~/Dev/spark/python/pyspark/ml/wrapper.py:412, in JavaTransformer._transform(self, dataset)\n    409 assert self._java_obj is not None\n    411 self._transfer_params_to_java()\n--> 412 return DataFrame(self._java_obj.transform(dataset._jdf), dataset.sparkSession)\n\nFile ~/Dev/spark/python/pyspark/sql/connect/dataframe.py:1696, in DataFrame.__getattr__(self, name)\n   1694 def __getattr__(self, name: str) -> \"Column\":\n   1695     if name in [\"_jseq\", \"_jdf\", \"_jmap\", \"_jcols\", \"rdd\", \"toJSON\"]:\n-> 1696         raise PySparkAttributeError(\n   1697             error_class=\"JVM_ATTRIBUTE_NOT_SUPPORTED\", message_parameters={\"attr_name\": name}\n   1698         )\n   1700     if name not in self.columns:\n   1701         raise PySparkAttributeError(\n   1702             error_class=\"ATTRIBUTE_NOT_SUPPORTED\", message_parameters={\"attr_name\": name}\n   1703         )\n\nPySparkAttributeError: [JVM_ATTRIBUTE_NOT_SUPPORTED] Attribute `_jdf` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session. Visit https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession for creating regular Spark Session in detail.\n```\n\nafter this PR:\n```\nIn [3]: psdf.plot.hist()\n---------------------------------------------------------------------------\nPandasNotImplementedError                 Traceback (most recent call last)\nCell In[3], line 1\n----> 1 psdf.plot.hist()\n\nFile ~/Dev/spark/python/pyspark/pandas/plot/core.py:957, in PandasOnSparkPlotAccessor.hist(self, bins, **kwds)\n    909 \"\"\"\n    910 Draw one histogram of the DataFrame’s columns.\n    911 A `histogram`_ is a representation of the distribution of data.\n   (...)\n    954     >>> df.plot.hist(bins=12, alpha=0.5)  # doctest: +SKIP\n    955 \"\"\"\n    956 if is_remote():\n--> 957     return unsupported_function(class_name=\"pd.DataFrame\", method_name=\"hist\")()\n    959 return self(kind=\"hist\", bins=bins, **kwds)\n\nFile ~/Dev/spark/python/pyspark/pandas/missing/__init__.py:23, in unsupported_function.<locals>.unsupported_function(*args, **kwargs)\n     22 def unsupported_function(*args, **kwargs):\n---> 23     raise PandasNotImplementedError(\n     24         class_name=class_name, method_name=method_name, reason=reason\n     25     )\n\nPandasNotImplementedError: The method `pd.DataFrame.hist()` is not implemented yet.\n```\n\n### Does this PR introduce _any_ user-facing change?\nyes, error message improvement\n\n### How was this patch tested?\nCI\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46911 from zhengruifeng/ps_plotting_unsupported.\n\nAuthored-by: Ruifeng Zheng <ruifengz@apache.org>\nSigned-off-by: Ruifeng Zheng <ruifengz@apache.org>","shortMessageHtmlLink":"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-48561\">SPARK-48561</a>][PS][CONNECT] Throw <code>PandasNotImplementedError</code> for unsu…"}},{"before":"94912920b0e92c9470bdb27a409799d0fe48ff69","after":"b7d9c317aa2e4de8024e44db895fa8b0cbbb36db","ref":"refs/heads/master","pushedAt":"2024-06-07T08:32:02.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"yaooqinn","name":"Kent Yao","path":"/yaooqinn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8326978?s=80&v=4"},"commit":{"message":"Revert \"[SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable\"\n\nThis reverts commit 82b4ad2af64845503604da70ff02748c3969c991.","shortMessageHtmlLink":"Revert \"[<a class=\"issue-link js-issue-link notranslate\" rel=\"noopener noreferrer nofollow\" href=\"https://issues.apache.org/jira/browse/SPARK-46393\">SPARK-46393</a>][SQL][FOLLOWUP] Classify exceptions in JDBCTable…"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEY0eouAA","startCursor":null,"endCursor":null}},"title":"Activity · apache/spark"}