
Potential fix for the negation causing any file that's not the negated file to match #10

Merged


@kbendick kbendick commented Nov 4, 2020

I found this solution via this discussion: actions/labeler#101

It seems that when `any` is used, any file that matches ALL of the given glob patterns will be labeled. So the workaround is to put the negated patterns into their own `any` list.

To determine which files might match the negated patterns, I used `find` and checked its output against the existing label globs, like so:

```
$ find . -path '*UI.scala'
./resource-managers/mesos/src/main/scala/org/apache/spark/deploy/mesos/ui/MesosClusterUI.scala
./core/src/main/scala/org/apache/spark/ui/WebUI.scala
./core/src/main/scala/org/apache/spark/ui/SparkUI.scala
./core/src/main/scala/org/apache/spark/deploy/master/ui/MasterWebUI.scala
./core/src/main/scala/org/apache/spark/deploy/worker/ui/WorkerWebUI.scala
./core/src/main/scala/org/apache/spark/internal/config/UI.scala
```

This tells me that, when using the `!**/*UI.scala` glob, we only need to be concerned (at the moment) with matches that would also match the other existing globs for the label, in this case CORE. Here, that means only `/core/**/*`, as it is the only glob in the CORE label that currently contains any `*UI.scala` files.

Admittedly, having to write the config this way is a bit cumbersome. We can either be thorough and use a separate `any` block (a list for each glob plus all of the negations), which future-proofs us a bit more, or we can be conservative and keep the config simpler. The risk of the simpler approach is introducing invalid matches later: if a `*UI.scala` file gets placed under some other directory matched by the label that currently has no `*UI.scala` files, it would match even though that is exactly what we are trying to avoid.

I would vote to keep the config more legible and then open an issue with the maintainers of this GitHub action to see if something can be done.
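For illustration, a hypothetical labeler config following this workaround might look like the sketch below. The label and path names are examples only, not the exact config in this PR:

```yaml
# .github/labeler.yml (sketch)
CORE:
  # Putting the negation in the same `any` list as the positive glob means
  # the label applies only when a changed file matches the positive glob
  # AND is not excluded by the negation.
  - any: ['core/**/*', '!**/*UI.scala']
  # Other globs for the label that contain no *UI.scala files can stay
  # in their own plain entries.
  - 'common/**/*'
```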

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

@kbendick kbendick merged commit 9348623 into master Nov 4, 2020
kbendick pushed a commit that referenced this pull request May 11, 2021
…mand

### What changes were proposed in this pull request?

This PR proposes to sort table properties in DESCRIBE TABLE command. This is consistent with DSv2 command as well:
https://github.com/apache/spark/blob/e3058ba17cb4512537953eb4ded884e24ee93ba2/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeTableExec.scala#L63

This PR fixes the test case in Scala 2.13 build as well where the table properties have different order in the map.
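The idea can be sketched as follows — a minimal, hypothetical rendering helper (not Spark's actual code) that sorts the properties map by key before formatting, so the output no longer depends on the map's iteration order:

```scala
// Hypothetical sketch: render table properties deterministically by
// sorting entries on their key, mirroring what DSv2's DescribeTableExec does.
def renderProperties(props: Map[String, String]): String =
  props.toSeq
    .sortBy { case (key, _) => key }          // deterministic order by key
    .map { case (key, value) => s"$key=$value" }
    .mkString("[", ", ", "]")
```

With this, two maps holding the same entries always render identically, regardless of which Scala version (and thus which `Map` implementation order) built them.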

### Why are the changes needed?

To keep the output deterministic and pretty, and to fix the tests in the Scala 2.13 build.
See https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-3.2-scala-2.13/49/testReport/junit/org.apache.spark.sql/SQLQueryTestSuite/describe_sql/

```
describe.sql
Expected "...spark_catalog, view.[query.out.col.2=c, view.referredTempFunctionsNames=[], view.catalogAndNamespace.part.1=default]]", but got "...spark_catalog, view.[catalogAndNamespace.part.1=default, view.query.out.col.2=c, view.referredTempFunctionsNames=[]]]" Result did not match for query apache#29
DESC FORMATTED v
```

### Does this PR introduce _any_ user-facing change?

Yes, it will change the text output of `DESCRIBE [EXTENDED|FORMATTED] table_name`.
The table properties are now sorted by their keys.

### How was this patch tested?

Related unit tests were fixed accordingly.

Closes apache#30799 from HyukjinKwon/SPARK-33803.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
kbendick pushed a commit that referenced this pull request May 29, 2022
…aceable

### What changes were proposed in this pull request?

This PR uses a manual recursion to replace `RuntimeReplaceable` expressions instead of `transformAllExpressionsWithPruning`. The problem with `transformAllExpressionsWithPruning` is that it automatically makes the replacement expression inherit the function alias name from the parent node, which is quite misleading. For example, for `select date_part('month', c) from t`, the optimized plan in EXPLAIN before this PR is
```
Project [date_part(cast(c#18 as date)) AS date_part(month, c)#19]
+- Relation default.t[c#18] parquet
```
Now it's
```
Project [month(cast(c#9 as date)) AS date_part(month, c)#10]
+- Relation default.t[c#9] parquet
```

### Why are the changes needed?

Fix the misleading EXPLAIN result.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

A new test was added.

Closes apache#35821 from cloud-fan/follow2.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>