[groovy-dev] parsing dynamic data in an unorthodox format into a map

Discussion:

Protocol-X

2015-02-15 00:36:36 UTC

I am looking for help. I have been racking my brain for days on how to parse
data in formats similar to the one below into a map. I have yet to come up
with a solution that is general enough. The + symbols indicate concatenation
of field strings, and the brackets indicate what can be mapped together. Any
thoughts or suggestions would be greatly appreciated.
Example entry:

[MAINK:[
XO=xo:[XTYPE, XCHOSEN, XFAULT]],
XOCD=xocd:[]
];

/ \
|XO=xo... |
| |
|XO=xo...,XTYPE |
| / \ |
| |,XCHOSEN| |
|XO=xo + + |
| |,XFAULY | |
| \ / |
MAINK:+ // \+;
| || / \|||
| || |ziditemnm||||
| ||,ZIDITEMNM=+ +|||
| || |TOP ||||
|XOCD=xocd|+ \ /+||
| || |||
| ||,THESH |||
| || |||
| || |||
\ \\ ///

<http://groovy.329449.n5.nabble.com/file/n5722615/Screenshot_2015-02-14-18-14-27-1.png>

--
View this message in context: http://groovy.329449.n5.nabble.com/parsing-dynamic-data-in-an-unorthodox-format-into-a-map-tp5722615.html
Sent from the groovy - dev mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

Jim White

2015-02-15 01:53:47 UTC

This looks like a job for a recursive descent parser, which could be written
by hand pretty easily, although almost any parser generator (ANTLR, JParsec,
etc.) would be happy to do the job as well.

Which example format are you trying to parse? The one with the square
brackets or the diagram? If the former then you might look at JsonSlurper
to see how such parsing is done. The diagram is a little trickier since
you need to use some information about the surrounding context (which may
simply be the nesting depth) in order to know how many pipe characters to
expect but other than that the logic is about the same.

Jim

Protocol-X

2015-02-15 01:57:55 UTC

The square brackets are the example output of what I am trying to obtain. The
diagram is how the data actually looks. It has multiple variants: some are
longer, some are wider, with other options.

Jim White

2015-02-15 02:37:22 UTC

Do you have access to the specification for that input format and/or code
that is generating it? If so then begin there.

In any case you should then write down input/output pairs, starting with
the simplest cases (as in empty or null values) and then extending into
more complex cases one feature at a time. Code those cases into test code
(as with Spock) then write code that passes each of those cases one at a
time starting with the simplest.

For example, a possible interpretation for the example you have there would
be to produce this Groovy:

[_type:'MAINK', XO:[_type:'xo', XTYPE:null, XCHOSEN:null, XFAULT:null] ,
XOCD:[_type:'xocd']]

You then would have a test case like:

assert parse("""[MAINK:[
XO=xo:[XTYPE, XCHOSEN, XFAULT],
XOCD=xocd:[]]
];""") == [_type:'MAINK', XO:[_type:'xo', XTYPE:null, XCHOSEN:null,
XFAULT:null] , XOCD:[_type:'xocd']]

Note that I moved the closing bracket from the XO line to the XOCD line
because it looked unbalanced to me. I may have totally misunderstood the
precise structure you're after, but the process I described above can be
used to solve almost any algorithmic programming problem such as this one.
There are a ton of resources about test-driven development (TDD) around that
will help you if you need more information about this.
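
As an illustration of how the test case above could be made to pass, here is a
minimal hand-written recursive descent sketch. It only handles the bracket
notation, not the diagram, and it assumes the interpretation used in the
test case (a NAME=type:[...] item becomes a nested map with a _type key, a
bare NAME maps to null). The names BracketParser and parse are hypothetical,
and the grammar in the comments is inferred from the one example, not taken
from any specification.

```groovy
// Hypothetical sketch: recursive descent parser for the bracket notation only.
class BracketParser {
    private final String text
    private int pos = 0

    BracketParser(String text) { this.text = text }

    // entry := '[' NAME ':' '[' items ']' ']' ';'?
    Map parseEntry() {
        expect('[')
        Map result = [_type: name()]
        expect(':')
        expect('[')
        result.putAll(items())
        expect(']')
        expect(']')
        skipSpace()
        if (peek() == ';') pos++
        return result
    }

    // items := (item (',' item)*)?
    // item  := NAME ('=' NAME ':' '[' items ']')?
    private Map items() {
        Map result = [:]
        skipSpace()
        if (peek() == ']') return result          // empty list: []
        while (true) {
            String key = name()
            skipSpace()
            if (peek() == '=') {
                pos++
                Map nested = [_type: name()]
                expect(':')
                expect('[')
                nested.putAll(items())            // the recursive step
                expect(']')
                result[key] = nested
            } else {
                result[key] = null                // bare field name
            }
            skipSpace()
            if (peek() != ',') break
            pos++
        }
        return result
    }

    // NAME := one or more letters or digits
    private String name() {
        skipSpace()
        int start = pos
        while (pos < text.length() && Character.isLetterOrDigit(text.charAt(pos))) pos++
        if (pos == start) throw new IllegalArgumentException("name expected at offset $pos")
        return text.substring(start, pos)
    }

    private void expect(String ch) {
        skipSpace()
        if (peek() != ch) throw new IllegalArgumentException("'$ch' expected at offset $pos")
        pos++
    }

    // Next character as a one-character String, or null at end of input.
    private String peek() { pos < text.length() ? text[pos] : null }

    private void skipSpace() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++
    }
}

Map parse(String text) { new BracketParser(text).parseEntry() }
```

Each grammar rule gets one method, and nesting is handled by items() calling
itself, which is what makes the approach extend to deeper structures. Parsing
the diagram would need a different front end, but the same idea of starting
from the simplest input/output pair applies.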

If all of the above is totally obvious to you and your question is about
some specific aspect of the format and its conversion then you'll need to
spell out what you have and the part you're stuck on. Also please note that
questions like this one belong on groovy-user not groovy-dev.

Jim

Protocol-X

2015-02-15 02:47:59 UTC

Thanks Jim,
Sorry I was not sure which one to post in as they both had general
questions.

Protocol-X

2015-02-15 21:35:49 UTC

Jim,

Here is what I did. As you can see, it is not very dynamic. If it is not too
much to ask, could you look over this and suggest a better approach?
Sorry, I do not see any code tag options.

data = """\
/ \\
|MO=mo... |
| |
|MO=mo...,MAINT |
| / \\ |
| |,CLUSTER| |
|MO=mo + + |
| |,SUBORD | |
| \\ / |
RXMOP:+ // \\+;
| || / \\|||
| || |clusterid||||
| ||,CLUSTERID=+ +|||
| || |ALL ||||
|MOTY=moty|+ \\ /+||
| || |||
| ||,COUNT |||
| || |||
| || |||
\\ \\\\ ///
""";

rowIndex = 0;
mapIndex = 0;
mmap = [:];
mmap[0] = [];
rowMap = [:];

def cv = "";

data.toCharArray().each {
cav ->

if(cav =~ /[\|\/+]/) {

if(mmap[mapIndex] == null && mapIndex >= 0) {
mmap[mapIndex] = [];
}

if(cv.size() > 0) {
mmap[mapIndex] << cv;
rowMap[cv] = rowIndex;
cv = "";
}

mapIndex ++;

}else if(cav =~ /[\\]/) {

if(cv.size() > 0) {
mmap[mapIndex] << cv;
rowMap[cv] = rowIndex;
cv = "";
}

mapIndex --;

}else if(!(cav =~ /[ +\n.:;]/)) {
cv += cav;
}else if(cav =~ /[\n]/) {

if(cv.size() > 0) {
mmap[mapIndex] << cv;
rowMap[cv] = rowIndex;
}

cv = "";
mapIndex = 0;
rowIndex ++;
}

if(mmap[mapIndex] == null && mapIndex >= 0) {
mmap[mapIndex] = [];
}
}

newMap = [:];

def rIndex = rowMap[mmap.entrySet().value[0][0]];

ktr = [];

mmap.reverseEach {
k, v ->

if(v.size() > 0) {

mmap[k-1].each {
vs ->

if(vs.getClass() =~ /String/ && vs =~ /\w+=/) {

def iv = [(vs):[]];

mmap[k-1][mmap[k-1].indexOf(vs)] = iv;

v.each {
vi ->

if((rowMap[vs] > rIndex && rowMap[vi] >
rIndex) || (rowMap[vs] < rIndex && rowMap[vi] < rIndex)) {
mmap[k-1][mmap[k-1].indexOf(iv)][vs] <<
vi;
}

}

mmap.remove(k);

}else if(vs.getClass() =~ /LinkedHashMap/){

v.each {
vi ->

def ro = vi;

if(vi.getClass() =~ /LinkedHashMap/) {
ro =vi.entrySet().key[0];
}

if((rowMap[vs.entrySet().key[0]] > rIndex &&
rowMap[ro] > rIndex) || (rowMap[vs.entrySet().key[0]] < rIndex && rowMap[ro]
< rIndex)) {

mmap[k-1][mmap[k-1].indexOf(vs)][vs.entrySet().key[0]] << vi
}
}

mmap.remove(k);
}
}

}else {
mmap.remove(k);
}
}

mmap.each {

def ccv = it.key;

if(ccv > 1 && mmap[ccv - 1] == null) {

while(mmap[ccv - 1] == null) {
ccv = ccv -1;
println ccv
}

mmap[ccv] = mmap[it.key];
mmap.remove(it.key);
}
}

mmap.reverseEach {
k, v ->

if(v.size() > 0) {

mmap[k-1].each {
vs ->

if(vs.getClass() =~ /String/ && vs =~ /\w+=/) {

def iv = [(vs):[]];

mmap[k-1][mmap[k-1].indexOf(vs)] = iv;

v.each {
vi ->

if((rowMap[vs] > rIndex && rowMap[vi] >
rIndex) || (rowMap[vs] < rIndex && rowMap[vi] < rIndex)) {
mmap[k-1][mmap[k-1].indexOf(iv)][vs] <<
vi;
}
}

mmap.remove(k);

}else if(vs.getClass() =~ /LinkedHashMap/){

v.each {
vi ->

def ro = vi;

if(vi.getClass() =~ /LinkedHashMap/) {
ro =vi.entrySet().key[0];
}

if((rowMap[vs.entrySet().key[0]] > rIndex &&
rowMap[ro] > rIndex) || (rowMap[vs.entrySet().key[0]] < rIndex && rowMap[ro]
< rIndex)) {

mmap[k-1][mmap[k-1].indexOf(vs)][vs.entrySet().key[0]] << vi
}
}

mmap.remove(k);
}
}

}else {
mmap.remove(k);
}
}

println data;
println mmap;

Jim White

2015-02-15 22:12:44 UTC

There is a lot of code there and so I can't say much about what it is
doing. I don't see the test cases which would help a lot in understanding
and are essential for determining whether the code works as intended.

I can make some style remarks though. It is clear you've found Groovy's
regex operator, but you're not using it very efficiently, and you use it in
some places where it doesn't really make sense.

Using a StringBuilder is more efficient for building up a string. Instead of

def cv = "";

use

def cv = new StringBuilder()

To append characters you then do cv.append(foo) and get the result with
cv.toString().
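
A small sketch of that pattern, using a made-up input string rather than the
real diagram data:

```groovy
// Collect the non-blank characters of a string into one token.
def cv = new StringBuilder()
'XO = xo'.each { ch ->
    if (ch != ' ') {
        cv.append(ch)
    }
}
def token = cv.toString()
assert token == 'XO=xo'

// Reset the same builder for the next token instead of creating a new one.
cv.setLength(0)
assert cv.length() == 0
```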

Instead of

data.toCharArray().each {
cav ->

use

data.withReader { cav ->

These tests against a single character can be done with comparisons and/or
a switch statement. Instead of

if(cav =~ /[\|\/+]/) {

use

switch(cav) {
case '|' :
case '/' :
case '+' :
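
Filled out, the switch could look roughly like the sketch below. The
classify method and its category names are invented for this illustration;
the character groups are the ones the posted regexes test for.

```groovy
// Hypothetical helper: classify one character of the diagram.
String classify(String ch) {
    switch (ch) {
        case '|':
        case '/':
        case '+':
            return 'open'      // the characters matched by /[\|\/+]/
        case '\\':
            return 'close'     // the backslash matched by /[\\]/
        case '\n':
            return 'newline'
        case ' ':
        case '.':
        case ':':
        case ';':
            return 'skip'
        default:
            return 'text'      // anything else is part of a field name
    }
}

assert classify('|') == 'open'
assert classify('\\') == 'close'
assert classify('M') == 'text'
```

Adjacent case labels fall through to the same branch, so each group of
characters is listed once and no regex has to be compiled per character.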

To test the class of an object, use the instanceof operator. Instead of

if(vs.getClass() =~ /String/ && vs =~ /\w+=/) {

use

if (vs instanceof String && ...

Later you do the same thing for a LinkedHashMap. You should test for the
least specific class, which would be Map (so 'if (vi instanceof Map) ...').
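
A short sketch of both checks together. The labelOf helper is hypothetical;
it just mirrors the two cases in the posted code, where an entry is either a
string like 'MO=mo' or a single-entry map keyed by such a string.

```groovy
// Hypothetical helper: return the KEY=value label of an entry, or null.
String labelOf(Object entry) {
    if (entry instanceof Map) {
        // True for LinkedHashMap, HashMap, TreeMap and any other Map.
        return entry.keySet().iterator().next()
    }
    if (entry instanceof String && (entry =~ /\w+=/)) {
        return entry
    }
    return null
}

assert labelOf(['MO=mo': []]) == 'MO=mo'
assert labelOf(new TreeMap(['MOTY=moty': []])) == 'MOTY=moty'
assert labelOf('MO=mo') == 'MO=mo'
assert labelOf('COUNT') == null
```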

Here are some resources that can help with your Groovy:

http://mrhaki.blogspot.com/
http://pleac.sourceforge.net/pleac_groovy/
http://www.manning.com/koenig2/

This is a fun site that will help build coding skills in Groovy as well as
many other languages:

http://www.codingame.com/games

TopCoder is also a good site but I don't think they support Groovy
(although they have Java, C++, etc).

I hope that is helpful to you. I apologize in advance for not having time
for further one-on-one assistance.

Jim

Jim White

2015-02-15 22:23:26 UTC

Post by Protocol-X
data.toCharArray().each {

cav ->
data.withReader { cav ->

Whoops. That doesn't include the loop. I usually would use a reader, but
the code is a bit more verbose. Since you are working with a string, you can
call .each on it directly:

data.each { cav ->
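
For what it's worth, a quick check of what the closure receives, using a
made-up three-character string: .each on a String passes one-character
Strings, whereas the elements of toCharArray() are chars.

```groovy
def data = 'a|b'
def seen = []
data.each { cav -> seen << cav }

assert seen == ['a', '|', 'b']
assert seen.every { it instanceof String }          // one-character Strings
assert data.toCharArray()[1] instanceof Character   // chars, boxed on access
```

That difference matters for comparisons such as cav == '|', which compare
String with String when .each is used on the string itself.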

Jim