20 January 2012

Java Construct 1.0.0 Release


I'm pleased to announce release 1.0.0 of Java Construct, a faithful
port to Java of Python Construct.

About Python Construct: http://construct.wikispaces.com/
"Construct is a python library for parsing and building of data
structures (binary or textual). It is based on the concept of defining
data structures in a declarative manner, rather than procedural code:
more complex constructs are composed of a hierarchy of simpler ones.
It's the first library that makes parsing fun, instead of the usual
headache it is today."

About Java Construct: https://github.com/ZiglioNZ/construct/
This Java version employs some syntactic sugar (i.e. static methods)
to make the syntax as close as possible to the original Construct
library in Python.

Example of a Construct:

   import static construct.Core.*;
   import static construct.Macros.*;
   import static construct.Adapters.*;
   import static construct.lib.Containers.*;

   Construct struct = BitStruct(
     "foo",
     BitField("a", 3),
     Flag("b"),
     Padding(3),
     Nibble("c"),
     Struct("bar",
       Nibble("d"),
       Bit("e")
     )
   );

A Java Construct can parse byte arrays and produce Objects such as Containers. Vice versa, it can take Objects and produce byte arrays:

   public Object parse(byte[] data);
   public byte[] build(Object obj);

Parsing example:

   Container c1 = Container(
     "a", 7,
     "b", false,
     "bar",
     Container(
       "d", 15,
       "e", 1
     ),
     "c", 8
   );
   Container c2 = struct.parse( ByteArray( 0xe1, 0x1f ));
   assertEquals( c1, c2 );
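
To see why the bytes 0xe1 0x1f yield those values, here is a minimal hand-written sketch in plain Java that walks the same 16 bits field by field. It does not use Java Construct at all: the class `FooBitsSketch` and its methods `readBits` and `parseFoo` are hypothetical names made up for this illustration, and it assumes bits are consumed most significant bit first, which is consistent with the expected values in the parsing example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hand-written decoder for the "foo" layout above; NOT part of Java Construct. */
final class FooBitsSketch {
    private final byte[] data;
    private int bitPos = 0; // index of the next unread bit, counted from the MSB of data[0]

    FooBitsSketch(byte[] data) {
        this.data = data;
    }

    /** Reads the next n bits (n <= 31), most significant bit first, as an unsigned int. */
    int readBits(int n) {
        int value = 0;
        for (int i = 0; i < n; i++) {
            int currentByte = data[bitPos / 8] & 0xFF;
            int bit = (currentByte >> (7 - bitPos % 8)) & 1;
            value = (value << 1) | bit;
            bitPos++;
        }
        return value;
    }

    /** Mirrors the field order of the BitStruct: a(3), b(1), padding(3), c(4), bar{d(4), e(1)}. */
    static Map<String, Object> parseFoo(byte[] data) {
        FooBitsSketch in = new FooBitsSketch(data);
        Map<String, Object> foo = new LinkedHashMap<>();
        foo.put("a", in.readBits(3));      // BitField("a", 3)
        foo.put("b", in.readBits(1) == 1); // Flag("b")
        in.readBits(3);                    // Padding(3): read and discarded
        foo.put("c", in.readBits(4));      // Nibble("c")
        Map<String, Object> bar = new LinkedHashMap<>();
        bar.put("d", in.readBits(4));      // Nibble("d")
        bar.put("e", in.readBits(1));      // Bit("e")
        foo.put("bar", bar);
        return foo;
    }

    public static void main(String[] args) {
        // 0xe1 0x1f = 1110 0001 0001 1111
        System.out.println(parseFoo(new byte[] {(byte) 0xe1, (byte) 0x1f}));
    }
}
```

Splitting 1110 0001 0001 1111 in that order gives 111 (a = 7), 0 (b = false), 000 (padding), 1000 (c = 8), 1111 (d = 15) and 1 (e = 1). The declarative BitStruct expresses the same layout without any of this bookkeeping.
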

Currently Java Construct supports enough Macros, Adapters and Repeaters to parse and build these protocols:
Full ipstack example: https://github.com/ZiglioNZ/construct/blob/master/src/main/construct/protocols/ipstack.java

Notes:
1. I haven't tested for thread safety, but it's a priority.
2. Streams are not supported, so a message has to be held entirely in memory. If it arrives in segments, they have to be reassembled prior to parsing.
3. Text protocols like HTTP are not supported; it's questionable whether Construct would be the right tool for text parsing.
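
Regarding the note about segments: a minimal sketch of reassembling a segmented message into one in-memory byte array before parsing could look like the following. It uses only the standard library; the class `SegmentReassemblySketch` and the method `reassemble` are hypothetical names, not part of Java Construct, and the sketch assumes the segments are already in the right order.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

/** Illustration only; NOT part of Java Construct. */
final class SegmentReassemblySketch {

    /** Concatenates the segments, in the given order, into one contiguous message. */
    static byte[] reassemble(List<byte[]> segments) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] segment : segments) {
            out.write(segment, 0, segment.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Two one-byte segments that together form the message from the parsing example.
        byte[] message = reassemble(Arrays.asList(
            new byte[] {(byte) 0xe1},
            new byte[] {(byte) 0x1f}));
        // The complete message could now be handed to a Construct's parse(byte[]).
        System.out.println(message.length + " bytes reassembled");
    }
}
```
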

7 comments:

  1. Any advice on whether your code can be easily ported to C++?
    Any insight into why the Python version runs even slower under PyPy?

  2. Hi Anon,

    1. Interesting question. I haven't done enough C++ to answer it, but I've done enough C to at least think about it. Ironically, it would close the circle, since the inspiration for Python Construct was C structs. It would be quite a bit of work, but I'm sure people port stuff back and forth between C++ and Java. Not knowing enough of C++'s standard library, it's hard for me to estimate how long it would take. Are you familiar with Python Construct, and have you experienced performance issues?
    2. You should ask that question to Tomer: http://tomerfiliba.com/blog/Construct-Plans/

    Replies
    1. As always, if the batch job is large enough, execution time differences from static languages become apparent. Was this the motivation for the Java port?
      I was guessing that PyPy's static RPython work might be why my CPython run was faster, so I thought I would ask someone with real practical experience with the dynamic (eval, etc.) and static versions. Hopefully any JVM dynamic optimizations can beat unoptimized C++ implementations.

    2. The motivation for the Java port is that I program in Java (on the server) and there was nothing like Construct. I was sick and tired of designing parsers and encoders for custom binary protocols (I have done that a few times over the years).
      I think the point of Construct is providing an easy way to do just that.
      If your problem is generic serialization of objects, in Java there's no shortage, with the best being Kryo. See this comparison: https://github.com/eishay/jvm-serializers/wiki/Staging-Results

      Some libraries, such as Protocol Buffers, support C++ as well as Java and Python.

    3. Finally got around to testing your Java Construct on a simple ethernet sniffer file... I don't know what I am doing wrong with the Python Construct definitions, but the "equivalent" definitions in Java parse about 50 times faster: about 800 ms as opposed to 40 s. Would you expect such a performance improvement?

    4. Wow, no. I've never done that profiling myself, but it sounds like a lot.
      Were you running CPython? You could try PyPy; it should go faster.
      But 40 seconds sounds like a lot. Or maybe 800 ms is too fast; are you feeding in a large number of packets?

    5. I re-read your first message where you wrote that Construct on PyPy actually runs slower than CPython. I'm really not the right person to answer Python questions. My limited experience running Python was with Pymite, an embedded VM. You may want to enter an issue here: https://github.com/construct/construct/issues and see what the experts say :-)
