Skip to content

converting regex (regular expression) to DFA directly by creating syntax tree in java

License

Notifications You must be signed in to change notification settings

alirezakay/RegexToDFA

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

62 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Regex To DFA In Java

This is how to convert regex (REGular EXpression) to DFA by creating and using syntax tree in Java language.

This project is a part of a bigger project we'd done for Compiler Course to create a simple compiler.


Getting Started

🔹 Watch this video to comprehend the concepts: https://www.youtube.com/watch?v=uPnpkWwO9hE

Pay attention to some Rules:

rule3 rule2

NetBeans is the IDE I coded in. you can clone this project and import it in NetBeans easily.

The classes used, are as follows:

  • RegexToDfa
  • SyntaxTree
  • BinaryTree
  • Node
  • LeafNode
  • DfaTraversal
  • State

here is an initialize method which is called in the main function:

public static void initialize() {
    DStates = new HashSet<>();
    input = new HashSet<String>();

    String regex = getRegex();
    getSymbols(regex);

    SyntaxTree st = new SyntaxTree(regex);
    root = st.getRoot();
    followPos = st.getFollowPos();

    State q0 = createDFA();
    DfaTraversal dfat = new DfaTraversal(q0, input);    
}
  • DStates is a Set of States which is used for creating the final dfa.

  • input is also a Set which holding the characters of the input regular expression taken from user.

    pay attention to this issue that, some characters like '*' can be used as an operator (closure, union, concatination, ...)

    so if you want to enter these characters just as a normal character, you could bring a backslash '\' following up the intended character

    for example "\*" meaning a normal '*' character. and "*" meaning star opeartor (closure)
    this is why we use a set of String for declaring input variable.

  • String regex = getRegex(); is for getting the intended regular expression from stdin.

  • getSymbols(regex); this line of code sets the input Set.

  • SyntaxTree st = new SyntaxTree(regex); and this line creates the corresponding syntax tree of the inputted regex.

  • root = st.getRoot(); gets the root of the syntax tree.

  • followPos = st.getFollowPos(); make a new refference to the Set of followpos variable.

  • State q0 = createDFA(); creates the DFA through using the syntax tree and assigns q0 the start state of resulted DFA.

  • DfaTraversal dfat = new DfaTraversal(q0, input); makes a new DFA Traversal object for traversing the resulted DFA and recognizing whether the DFA can accept a particular string or not.


Proof Of Concepts

/*
    ***
        (a|b)*a => creating binary syntax tree:
                            .
                           / \
                          *   a
                         /
                        |
                       / \
                      a   b
    ***
*/

if you look at the SyntaxTree class, you will understand that a BinaryTree object is created.

so a syntax tree is a binary tree with some new special attributes (firstpos, lastpos, nullable, followpos).

in the BinaryTree class, the inputted regex is going to be converted to a tree which contains some nodes.

the last most bottom nodes are called, the leaf nodes.

So, now you know what the Node and LeafNode are used for.


Running the tests

click here to see the acceped result.

click here to see the rejected result.

Authors

License

This project is licensed under the MIT License - see the LICENSE file for details

Acknowledgments

  • Fragmenting section of the regex, using the stack, and also doOperation is adopted by a fine code of @felipemoura in github.