Skip to content

Improve candidate extraction when candidates contain . characters #17113

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Mar 11, 2025

Conversation

RobinMalfait
Copy link
Member

@RobinMalfait RobinMalfait commented Mar 10, 2025

This PR fixes an issue where some classes weren't properly extracted due to some incorrect assumptions in the pre processors.

Templating languages such as Haml, Slim and Pug allow you to write classes in a shorter way that are not properly contained inside of strings. E.g.:

p.flex.px-2

These candidates are not properly extracted because there are no bounding characters like quotes. To solve this, we pre-process it and replace . with characters. This results in something like:

p flex px-2

However, this has some challenges on its own. Candidates like px-2.5 cannot be written in this shorthand form, instead they need to be in strings. Now we cannot replace the . because otherwise we would change px-2.5 to px-2 5 which is wrong.

The next problem is that we need to know when they are in a "string". This has another set of problems because these templating languages allow you to write normal text that will eventually be the contents of the HTML tags.

.text-red-500.text-3xl
  | This text can't should be red
                 ^ Wait, is this the start of a string now???

In this example, if we consider the ' the start of a string, when it's clearly not, how would we know it's for sure not a string?

This ended up as a bit of a rabbit hole, but we came up with another approach entirely if we think about the original problem we want to solve which is when do we change . to characters.

One of the rules in our current extractor is that a . has to be between 2 numbers. Which works great in a scenario like: px-2.5. However, if you look at Haml or Slim syntax, this is also allowed:

p.bg-red-500.2xl:flex
           ^^^ Uh oh...

In this scenario, a . is surrounded by numbers so we shouldn't replace it with a space. But as you can see, we clearly do... so we need another heuristic in this case.

Luckily, one of the rules in Tailwind CSS is that a utility cannot start with a number, but a variant can. This means that if we see a scenario like <digit>.<digit> then we can just check if the value after the . is a valid variant or not.

In this case it is a valid variant so we do want to replace the . with a even though we do have the <digit>.<digit> format.

🥴

Test plan

  1. Added additional tests.
  2. Existing tests still pass

@RobinMalfait RobinMalfait requested a review from a team as a code owner March 10, 2025 19:08
thecrypticace
thecrypticace previously approved these changes Mar 10, 2025
@RobinMalfait RobinMalfait force-pushed the fix/handle-strings-better-in-preprocessor branch from 0f377c1 to 1d7d188 Compare March 10, 2025 22:03
@RobinMalfait RobinMalfait marked this pull request as draft March 10, 2025 23:45
Top-level "strings" don't have quotes, so if your paragraph contains a
quote then it should just be a quote and not be considered a string.

This also means that we have to make sure that `<` and `>` are
considered brackets for the HTML case.
In Slim you don't need any type of brackets, but we still want
`"px-2.5"` to be treated as a string.
@RobinMalfait RobinMalfait force-pushed the fix/handle-strings-better-in-preprocessor branch from 1d7d188 to 8a52320 Compare March 11, 2025 12:17
@RobinMalfait RobinMalfait force-pushed the fix/handle-strings-better-in-preprocessor branch from 8a52320 to 462aea7 Compare March 11, 2025 12:26
@RobinMalfait RobinMalfait marked this pull request as ready for review March 11, 2025 12:33
Co-authored-by: Philipp Spiess <hello@philippspiess.com>
@philipp-spiess
Copy link
Member

Screenshot 2025-03-11 at 15 36 41

Unfortunately this still breaks if a double-quote is used in the text. :/

@RobinMalfait have been chatting about this and we'll going to revert the string-based handling of dots inside utilities in pug and instead make it replace dots to space unless:

This way we can avoid thinking about strings in this languages

Let's not care about the concept of a string, instead let's only replace
`.` when it's not inside of a bracket stack _or_ when it's not
surrounded by numbers.

That means that `.flex.items-center` will become ` flex items-center`
but `px-2.5` remains as `px-2.5`.

It _does_ mean that some (top-level) strings will be mutated now, but
that shouldn't be an issue for our extractor to run.
@RobinMalfait RobinMalfait force-pushed the fix/handle-strings-better-in-preprocessor branch from a8b7a03 to ff53406 Compare March 11, 2025 16:18
daisy_button "😉 Get Started", css: "btn-primary text-xl",
right_icon: "arrow-right", target: "_blank",
class: "px-2.5"
href: "https://github com/profoundry-us/loco_motion locomotion-components"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit silly, but the . and # characters are replaced because:

  1. The . is not surrounded by numbers
  2. The # is replaced because it's not inside of a bracket stack

While this does break the string, it doesn't matter from Tailwind's perspective and since we're not updating the real code, it's not an issue.

@RobinMalfait RobinMalfait changed the title Improve string handling in pre processors Improve candidate extraction when candidates contain . characters Mar 11, 2025
@RobinMalfait RobinMalfait dismissed thecrypticace’s stale review March 11, 2025 16:43

This was from before the new implementation. Removing so we can enable auto-merge.

@RobinMalfait RobinMalfait enabled auto-merge (squash) March 11, 2025 16:43
Copy link
Member

@philipp-spiess philipp-spiess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach a lot more. Having to worry about what these template languages consider strings and not felt like a rabbit-hole where the solution is to have a full parser for those languages. This fix is more surgical for sure 👍

@RobinMalfait RobinMalfait merged commit 3706292 into main Mar 11, 2025
5 checks passed
@RobinMalfait RobinMalfait deleted the fix/handle-strings-better-in-preprocessor branch March 11, 2025 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants