Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect removal of epsilon transitions #2

Closed
interrogator opened this issue Mar 29, 2022 · 21 comments
Closed

Incorrect removal of epsilon transitions #2

interrogator opened this issue Mar 29, 2022 · 21 comments
Assignees
Labels
bug Something isn't working

Comments

@interrogator
Copy link

Hi,

Thanks for the library!

I generate an NFA with the following data and attempt to remove epsilon transitions.

The result is incorrect: in the produced DFA, it becomes possible to jump from A to D without passing through B and C.

image

Another issue I've found is that if there are two mirroring epsilon transitions, i.e. from 1 to 2 and from 2 to 1, attempting to remove epsilon transitions breaks with a max recursion error.

Any idea what's happening? If need be I could potentially help with a fix!

Thanks

@rohaquinlop rohaquinlop pinned this issue Mar 29, 2022
@rohaquinlop
Copy link
Owner

Hi,

Foremost, I want to thank you for using automathon, I was really excited to know that you were using it. You are absolutely right, I am going to look into the errors you mention and also if you want to help with your solution, I am completely open to receive your help.

Thank you very much, greetings from Colombia

@rohaquinlop rohaquinlop added the bug Something isn't working label Mar 29, 2022
@rohaquinlop rohaquinlop self-assigned this Mar 29, 2022
@interrogator
Copy link
Author

Hey,

Thanks for the fast reply!

I've found a working solution at: https://github.com/lewiuberg/automata/blob/lambda/automata/fa/nfa.py#L154

But this means I currently have to use both libs (that one for the epsilon removal, yours for the visualisation), which isn't much good. And this fix is in a branch of a fork, which is not ideal.

I'm not much of a graph/automaton theorist, so I'm not sure I'll be able to code a solution. But if there is something I can do I'd be happy to try if I've got time.

Thanks again for your help. We're using this code to facilitate a kind of advanced way of searching natural language text, just in case you're curious!

@rohaquinlop
Copy link
Owner

Thank you for the solution!

I'll be working on it tomorrow and hope I can fix it as soon as possible.

I'm very happy that the library is helping you on the research!

@rohaquinlop
Copy link
Owner

Have a good day Daniel, hope you are going well.

I think I just fixed the errors, you can use the new version 0.0.8!

Thank you for report me the error, if there is anything that I can do for you just tell me or email me, all my contact information is on my profile.

@interrogator
Copy link
Author

Hey, wow, that fix is great, thanks so much for helping so quickly.

There is one other thing I'm trying to accomplish, wondering if you can help.

For example, I generate this NFA with epsilon:

image

After running the removeEpsilonTransitions() method, I get:

image

What I would like to generate instead is the equivalent:

image

Would you consider coding an option for this, or providing me with some tips so that I can adjust your code and do it myself?

Thanks so much!

@rohaquinlop
Copy link
Owner

It is with pleasure for me!

What you propose is very interesting, I will work on what you say, what I can see is that I must make a minimization of the automaton.

@interrogator
Copy link
Author

Yes, minimization, that is probably the right term for it. I think you understand what I mean! If you can find a solution, that would be fantastic. I suppose it could be accessed by automaton.removeEpsilonTransitions(minimal=bool) or even there could be a more general method, automaton.minimize() which removes duplicate routes where possible. Anything you like would be fine with us! And let me know if there's something I can do to help.

@rohaquinlop
Copy link
Owner

Yes! I was thinking on one general method. I'm looking how to make it

@rohaquinlop
Copy link
Owner

Hey @interrogator I think I found a way to minimize a NFA, only created this method for the NFA class because the theory says that is different when you want to reduce the number of states on DFA, for the moment the minimize function is only available for NFA. Here is a preview

Before

minimize - NFA without EpsilonTransitions gv

After
minimize - result of minimization gv

Hope this is what you're looking, if you find any error or anything that I can make for you just let me know.

Have a good day!

@interrogator
Copy link
Author

Looks great! I will try it out shortly!

@interrogator
Copy link
Author

OK, I gave it a try and the new algorithm seems to work well!

I think I have just one more issue: When I look inside the .delta attribute, I see that the node names are stringified lists:

{"['0']": {'AA': ["['1']"]}, "['1']": {'BB': ["['10', '13', '2']"]}

Would it be possible for this data structure to be cleaned up, so that the keys are just 0 and the value's values are either a list or set of ints? Otherwise I have to eval the data, which is never very nice.

Thanks again for your help, it's really working great!

@rohaquinlop
Copy link
Owner

rohaquinlop commented Apr 1, 2022

Great!!

Let me see if I understand what you say.

From this:
{"['0']": {'AA': ["['1']"]}, "['1']": {'BB': ["['10', '13', '2']"]}

to one of these options:

  • {'0': {'AA': [['1']]}, '1': {'BB': [['10', '13', '2']]}
  • {'0': {'AA': ['1']}, '1': {'BB': ['2']}
  • {0: {'AA': [1]}, 1: {'BB': [2]}

Please tell me which one of the options.

It's a pleasure to help you.

@interrogator
Copy link
Author

I think using sets might be best where possible:

{'0': {'AA': {'1'}}, '1': {'BB': {'10', '13', '2'}}

Of course I would be fine if integers could be used here, as long as nothing ever gets turned into a string, then I would pass in strings to begin with and end up with:

{0: {'AA': {1}}, 1: {'BB': {10, 13, 2}}

Make sense? Thanks again!

@interrogator
Copy link
Author

By the way, something else I found useful in a different lib, is the ability to renumber an automaton starting from 0 after minimization. For example, with the pythomata.impl.simple.SimpleNFA class, you can do nfa.renumbering() and your nodes start from 0 again. So this would be really nice to have as well.

@rohaquinlop
Copy link
Owner

Hi @interrogator

Didn't reply because I was on an ICPC competition, so excuse me for my absence. I think that for the moment I can change on the algorithm to return the result on this way: {'0': {'AA': ['1']}, '1': {'BB': ['2']} while I rewrite the code to use sets instead of lists, and I was already thinking on that function, to "fix" the array results on the states, so I think that we are on the same page. Let me work on the renumbering function and then publish the update, and while you work with this update I'll be rewriting the code to use the sets instead of lists.

Hope this is what you're looking, if you find any error or anything that I can make for you, just let me know.

@rohaquinlop
Copy link
Owner

Hey @interrogator,

The renumber method is already implemented, this method doesn't return a NFA, simply changes the labels on the automata when you call the function: automata.renumber()

@interrogator
Copy link
Author

Hey,

I just got an unexpected error on what should be a very simple graph. Here I catch the error and print out the automaton's __dict__ at error time for you

{'Q': {'3', '2', '1', '0'}, 'sigma': {"token.upos = 'A'", "token.upos = 'C'", "token.upos = 'B'"}, 'delta': defaultdict(<class 'dict'>, {'0': {"token.upos = 'A'": {'1'}}, '1': {"token.upos = 'B'": {'2'}}, '2': {"token.upos = 'C'": {'3'}}}), 'initialState': '0', 'F': {'3'}}
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_24489/1628032756.py in <module>
      2 
      3 query = '[upos="A"] [upos="B"] [upos="C"]'
----> 4 data = cquery(query=query, retrieve=True, epsilon=False, minimise=True)
      5 data

~/work/cquery/cquery/cquery.py in cquery(db, corpus, id_field, query, prefilter, graph, automaton, epsilon, script, as_dict, retrieve, minimise, verbose)
    581     if minimise:
    582         try:
--> 583             auto = auto.minimize()
    584         except:
    585             print(auto.__dict__)

~/venv/py3.9/lib/python3.9/site-packages/automathon/finiteAutomata/nfa.py in minimize(self)
    289   def minimize(self):
    290     """Minimize the automata and return the NFA result of the minimization"""
--> 291     localDFA = self.getDFA()
    292     localNFA = localDFA.getNFA()
    293     localNFA.renumber()

~/venv/py3.9/lib/python3.9/site-packages/automathon/finiteAutomata/nfa.py in getDFA(self)
    260 
    261       for t in T:
--> 262         T[t].sort()
    263         tmp = T[t].copy()
    264         if tmp not in visited:

AttributeError: 'set' object has no attribute 'sort'

@rohaquinlop
Copy link
Owner

This error is because you're using sets instead of lists on the delta variable, you have defined delta like this {'0': {"token.upos = 'A'": {'1'}}, '1': {"token.upos = 'B'": {'2'}}, '2': {"token.upos = 'C'": {'3'}}} instead of this {'0': {"token.upos = 'A'": ['1']}, '1': {"token.upos = 'B'": ['2']}, '2': {"token.upos = 'C'": ['3']}}. Actually, I'm rewriting the code to use sets instead of lists, but for the moment the update that I made yesterday was to add the renumber function.

@interrogator
Copy link
Author

Ah ok, thanks for that!

@rohaquinlop
Copy link
Owner

Hi @interrogator,

No problem, I have changed the use of lists on the delta variable on the NFA's, so you can use it right now like you were trying {'0': {"token.upos = 'A'": {'1'}}, '1': {"token.upos = 'B'": {'2'}}, '2': {"token.upos = 'C'": {'3'}}}.

Have a good day!

@interrogator
Copy link
Author

I guess I can close this issue, since all the mentioned problems are solved! Are there any other updates planned?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants