The objective of this programming assignment is to be able to implement and analyze efficient algorithms for fragments of XPath queries.
-
Preliminaries
This assignment will use the streaming format for the XML. An XML document will be given as input to the implemented algorithms a whitespace delimited file having the following format:
0|1 <whitespace> element1 0|1 <whitespace> element2
where 0|1 is a bit indicating a startElement or endElement event respectively, and element is the name of the element.
-
Assignments
-
Assignment 1 - SimpleQuery
Implement a streaming algorithm for XPath queries of the form
//e1/e2/.../en
where ei are element names. -
Assignment 2 - LazyDFA
Implement a streaming algorithm using the lazy DFA method explained in for queries of the form:
//p1//p2//...//pn
where each pi is an path of the form:ei1/ei2/.../eim
and eij are element names. -
-
Execution
-
Compile
ant compile
-
Create file jar
ant jar
-
Execute file jar
ant run -Darg0=<path of xml file> -Darg1=<xpath query>
eg:
ant run -Darg0=test/input.txt -Darg1=//a
-
Delete files
ant clean
-
-
Reference:
[1] Todd J Green, Ashish Gupta, Gerome Miklau, Makoto Onizuka, and Dan Suciu. Processing XML streams with deterministic automata and stream indexes. ACM TODS, 29(4):752–788, 2004.
[2] Maynooth University - Constructing a DFA from an NFA Link