Javassist now provides data-flow analysis API

Posted by    |      

Out of need, I recently contributed a data-flow analysis framework to javassist. The framework allows an application to determine, by inference, the type-state of the local variable table and stack frame at the start of every bytecode instruction. For those unfamiliar with the java bytecode format, a lot of information is lost once a java program is compiled, since it is not really needed when the program is executed, and leaving it out helps keep class files small.

To illustrate this loss, take a look at the following simple Java method:

public static class Base {}
public static class A extends Base{}
public static class B extends Base{}
public static class C extends B{}

private void foo(int x) {
   Base b;
   if (x > 4) {
       b = new A();
   } else {
       b = new C();
   }

   b.toString();
}

While it is quite clear in the Java code that b is of type Base, this information is missing from the output of a compiler:

private void foo(int);
  Code:
   Stack=2, Locals=3, Args_size=2
   0:   iload_1
   1:   iconst_4
   2:   if_icmple       16
   5:   new     #68; //class example/Example$A
   8:   dup
   9:   invokespecial   #70; //Method example/Example$A."<init>":()V
   12:  astore_2
   13:  goto    24
   16:  new     #71; //class example/Example$C
   19:  dup
   20:  invokespecial   #73; //Method example/Example$C."<init>":()V
   23:  astore_2
   24:  aload_2
   25:  invokevirtual   #74; //Method java/lang/Object.toString:()Ljava/lang/String;
   28:  pop
   29:  return

Since toString() is declared by Object, all that line 25 tells us is that the type is an Object, which is obviously not very specific. If the class was compiled with debugging, you would be able to learn that local #2 was of type Base, but even if you did have this information, you would not necessarily know that the object invoked on by invokevirtual is the value stored in local variable 2. The only way to determine that is to know the state of the stack frame immediately before the instruction executes.

The analysis framework provides this by modeling the effect of every instruction, until it can eventually infer the type information. This process does not use any debugging information, since there is no guarantee it is available. Instead, it extrapolates it by tracking all possible type states, as every branch is evaluated, until the type information is reduced to the most specific type state available.

The following code, which uses the framework, is able to tell us that the type invoked on line 25 is in fact Base:

Analyzer a = new Analyzer();
CtClass clazz = ClassPool.getDefault().get("example.Example");
Frame[] frames = a.analyze(clazz.getDeclaredMethod("foo"));
System.out.println(frames[25].peek()); // Prints "example.Example$Base"

There is also a nice little tool I added, called framedump, that dumps the entire state at every instruction in human readable format, and yes I know that's debatable :)

$ framedump example.Example
private void foo(int);
0: iload_1
     stack []
     locals [example.Example, int, empty]

... snipped for brevity ...

24: aload_2
     stack []
     locals [example.Example, int, example.Example$Base]
25: invokevirtual #85 = Method java.lang.Object.toString(()Ljava/lang/String;)
     stack [example.Example$Base]
     locals [example.Example, int, example.Example$Base]
28: pop
     stack [java.lang.String]
     locals [example.Example, int, example.Example$Base]
29: return
     stack []
     locals [example.Example, int, example.Example$Base]

Some of you are probably thinking:

That sounds nice and all, but why in the world would I ever need to use this?

It is definitely not something useful to everyone, however it is very useful for certain applications:

  • Bytecode Enhancers
  • Verifiers
  • Optimizers
  • Debugging/Profiling Tools
  • Decompilers

To expand on the enhancer example, for security reasons, the JVM actually does its own data-flow analysis to verify that a class does not violate type rules before it can be ran. This poses an interesting challenge to any application that manipulates bytecode, since any change that affects the possible type-state can lead to a verify error and the JVM throwing out the class. Frameworks such as this can be used to prevent this problem, since they reveal the same (in the javassist analyzer case, actually more detailed) information available to the JVM's verifier.

If you want to play with this new feature, download the recently released 3.8.0 here.

The javadoc is here.

Note, I should also mention that the ASM project has had a similar framework for quite some time, however, it wasn't usable in my case since I needed the ability to handle reduction of multi-interface and array types. Also, I was already using javassist and switching just wasn't possible, mainly due to other features I rely on.

Enjoy!


Back to top