Out of need, I recently contributed a data-flow analysis framework to javassist. The framework allows an application to determine, by inference, the type-state of the local variable table and stack frame at the start of every bytecode instruction. For those unfamiliar with the Java bytecode format, a lot of information is lost once a Java program is compiled, since it is not really needed when the program is executed, and leaving it out helps keep class files small.
To illustrate this loss, take a look at the following simple Java method:
public static class Base {}
public static class A extends Base {}
public static class B extends Base {}
public static class C extends B {}

private void foo(int x) {
    Base b;
    if (x > 4) {
        b = new A();
    } else {
        b = new C();
    }
    b.toString();
}
While it is quite clear in the Java code that b is of type Base, this information is missing from the output of a compiler:
private void foo(int);
  Code:
   Stack=2, Locals=3, Args_size=2
   0:   iload_1
   1:   iconst_4
   2:   if_icmple 16
   5:   new #68; //class example/Example$A
   8:   dup
   9:   invokespecial #70; //Method example/Example$A."<init>":()V
   12:  astore_2
   13:  goto 24
   16:  new #71; //class example/Example$C
   19:  dup
   20:  invokespecial #73; //Method example/Example$C."<init>":()V
   23:  astore_2
   24:  aload_2
   25:  invokevirtual #74; //Method java/lang/Object.toString:()Ljava/lang/String;
   28:  pop
   29:  return
Since toString() is declared by Object, all that line 25 (the invokevirtual at bytecode offset 25) tells us is that the receiver is an Object, which is not very specific. If the class was compiled with debugging information, you would be able to learn that local variable 2 is of type Base, but even then you would not necessarily know that the object invokevirtual is invoked on is the value stored in local variable 2. The only way to determine that is to know the state of the stack frame immediately before the instruction executes.
The analysis framework provides this by modeling the effect of every instruction until it can infer the type information. This process does not use any debugging information, since there is no guarantee it is available. Instead, it tracks all possible type states as every branch is evaluated, and reduces them to the most specific type state that holds on every path.
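To make the reduction step concrete, here is a minimal sketch of how two branch results could be merged into one type. It is not the javassist implementation: the class name TypeMergeSketch and the merge method are made up for illustration, it works on loaded classes through reflection rather than on bytecode, and it assumes plain classes only (no interfaces, arrays, or primitives, which are exactly the hard cases).

```java
public class TypeMergeSketch {
    // Same hierarchy as the example method above.
    static class Base {}
    static class A extends Base {}
    static class B extends Base {}
    static class C extends B {}

    /**
     * Hypothetical helper: walks up the superclass chain of 'a' until it
     * reaches a class that 'b' can be assigned to. Assumes both arguments
     * are ordinary classes, so the walk always ends at Object at the latest.
     */
    static Class<?> merge(Class<?> a, Class<?> b) {
        Class<?> candidate = a;
        while (!candidate.isAssignableFrom(b)) {
            candidate = candidate.getSuperclass();
        }
        return candidate;
    }

    public static void main(String[] args) {
        // One branch stores an A in the local, the other stores a C.
        System.out.println(merge(A.class, C.class).getSimpleName()); // Base
        // C is already a B, so nothing more general is needed.
        System.out.println(merge(B.class, C.class).getSimpleName()); // B
    }
}
```

In the foo example, the two branches leave an A and a C in local variable 2, and the closest class that covers both is Base, which is the type the analysis reports at the join point.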
The following code, which uses the framework, is able to tell us that the type invoked on at line 25 is in fact Base:
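import javassist.ClassPool;
import javassist.CtClass;
import javassist.bytecode.analysis.Analyzer;
import javassist.bytecode.analysis.Frame;

Analyzer a = new Analyzer();
CtClass clazz = ClassPool.getDefault().get("example.Example");
Frame[] frames = a.analyze(clazz.getDeclaredMethod("foo"));
System.out.println(frames[25].peek()); // Prints "example.Example$Base"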
There is also a nice little tool I added, called framedump, that dumps the entire state at every instruction in human-readable format (and yes, I know that's debatable :)
$ framedump example.Example
private void foo(int);
0: iload_1
     stack []
     locals [example.Example, int, empty]
... snipped for brevity ...
24: aload_2
     stack []
     locals [example.Example, int, example.Example$Base]
25: invokevirtual #85 = Method java.lang.Object.toString(()Ljava/lang/String;)
     stack [example.Example$Base]
     locals [example.Example, int, example.Example$Base]
28: pop
     stack [java.lang.String]
     locals [example.Example, int, example.Example$Base]
29: return
     stack []
     locals [example.Example, int, example.Example$Base]
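For anyone who wants to see where the stack entries in that dump come from, the last four instructions can be replayed by hand. The sketch below is only an illustration and not how javassist does it: FrameReplaySketch and replay are made-up names, the effect of each opcode is hard-coded rather than decoded from a class file, and the locals are assumed to already hold the merged types shown at offset 24.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class FrameReplaySketch {
    /**
     * Replays offsets 24 to 29 of foo and records the operand stack as it
     * looks immediately before each instruction executes.
     */
    static List<String> replay() {
        // Assumed starting state: the locals reported at offset 24.
        String[] locals = {"example.Example", "int", "example.Example$Base"};
        Deque<String> stack = new ArrayDeque<>();
        List<String> trace = new ArrayList<>();

        trace.add("24: " + stack);
        stack.push(locals[2]);          // aload_2 pushes local variable 2

        trace.add("25: " + stack);
        stack.pop();                    // invokevirtual pops the receiver...
        stack.push("java.lang.String"); // ...and pushes toString()'s result

        trace.add("28: " + stack);
        stack.pop();                    // pop discards the unused String

        trace.add("29: " + stack);      // return: nothing left on the stack
        return trace;
    }

    public static void main(String[] args) {
        for (String line : replay()) {
            System.out.println(line);
        }
    }
}
```

The entry recorded for offset 25 is the interesting one: the only value on the stack is whatever local variable 2 held, which is how the receiver of invokevirtual gets tied back to b.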
Some of you are probably thinking:
That sounds nice and all, but why in the world would I ever need to use this?
It is definitely not something useful to everyone, but it is very useful for certain applications:
- Bytecode Enhancers
- Verifiers
- Optimizers
- Debugging/Profiling Tools
- Decompilers
To expand on the enhancer example: for security reasons, the JVM actually does its own data-flow analysis to verify that a class does not violate type rules before it can be run. This poses an interesting challenge to any application that manipulates bytecode, since any change that affects the possible type-state can lead to a verify error and the JVM throwing out the class. Frameworks such as this can be used to prevent this problem, since they reveal the same (in the javassist analyzer case, actually more detailed) information available to the JVM's verifier.
If you want to play with this new feature, download the recently released 3.8.0 here.
The javadoc is here.
Note: I should also mention that the ASM project has had a similar framework for quite some time. However, it wasn't usable in my case, since I needed the ability to handle reduction of multi-interface and array types. Also, I was already using javassist, and switching just wasn't possible, mainly due to other features I rely on.
Enjoy!