Skip to content

SpinningVinyl/SimpleRegexMatcher

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Simple Regex Matcher

What’s this?

This is exactly what it says on the tin — a Java implementation of a simple matcher for regular expressions, based on my NFA Runner.

It supports the following standard metacharacters:

  • The period ., which matches any character.
  • Kleene star *, which means “zero or more of the preceding character/group”.
  • The + character, which means “one or more of the preceding character/group”.
  • The ? character, which means “zero or one of the preceding character/group”.
  • The union operator |, which means “either the preceding or the following character/group”.
  • Left and right brackets [ and ], which are used for specifying character groups and ranges (e.g. [aA] would match either a or A, and [a-z] would match any character between a and z).
  • Left and right parentheses ( and ), which are used for character grouping (exactly the same way as in mathematical notation).

To use a metacharacter as a standard character, escape it with a \ (backslash). For example, \? matches a literal question mark, and \. matches a literal full stop. Naturally, to match a backslash, use two backslashes \\ in your pattern.

This matcher is written as a proof of concept (or, more precisely, as a proof that I can write something like this from scratch), so it does not support more advanced features like character classes and backreferences.

How does it work?

It works by converting a regular expression into a non-deterministic finite automaton (or NFA).

To achieve this, the regex is first converted into a stream of unambigous tokens. During this process, the tokenizer also inserts explicit concatenation tokens between character literals and groups of characters.

The token stream generated by the tokenizer is then converted into postfix notation using Dijkstra’s shunting yard algorithm. Finally, the postfix token stream is converted into an NFA by the NFABuilder class. The NFA simulator itself is a modified version of my NFA Runner.

Can I see it in action?

If you insist.

./regex_matcher.png

Releases

No releases published

Packages

No packages published

Languages