Out of need, I recently contributed a data-flow analysis framework to javassist. The framework allows an application to determine, by inference, the type-state of the local variable table and stack frame at the start of every bytecode instruction. For those unfamiliar with the Java bytecode format, a lot of information is lost once a Java program is compiled, since it is not really needed when the program is executed, and leaving it out helps keep class files small.
To illustrate this loss, take a look at the following simple Java method:
public static class Base {}
public static class A extends Base {}
public static class B extends Base {}
public static class C extends B {}

private void foo(int x) {
    Base b;
    if (x > 4) {
        b = new A();
    } else {
        b = new C();
    }
    b.toString();
}
While it is quite clear in the Java code that b is of type Base, this information is missing from the output of a compiler:
private void foo(int);
  Code:
   Stack=2, Locals=3, Args_size=2
   0: iload_1
   1: iconst_4
   2: if_icmple 16
   5: new #68; //class example/Example$A
   8: dup
   9: invokespecial #70; //Method example/Example$A."<init>":()V
   12: astore_2
   13: goto 24
   16: new #71; //class example/Example$C
   19: dup
   20: invokespecial #73; //Method example/Example$C."<init>":()V
   23: astore_2
   24: aload_2
   25: invokevirtual #74; //Method java/lang/Object.toString:()Ljava/lang/String;
   28: pop
   29: return
Since toString() is declared by Object, all that line 25 tells us is that the type is an Object, which is obviously not very specific. If the class was compiled with debugging, you would be able to learn that local #2 was of type Base, but even if you did have this information, you would not necessarily know that the object invoked on by invokevirtual is the value stored in local variable 2. The only way to determine that is to know the state of the stack frame immediately before the instruction executes.
The analysis framework provides this by modeling the effect of every instruction, until it can eventually infer the type information. This process does not use any debugging information, since there is no guarantee it is available. Instead, it extrapolates it by tracking all possible type states, as every branch is evaluated, until the type information is reduced to the most specific type state available.
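To make the reduction step more concrete, here is a minimal sketch of how two type states meeting at a branch join could be reduced to one type. It is not the javassist implementation; the class name MergeSketch and the merge helper are hypothetical, and it assumes plain class types only (no interfaces, arrays, or primitives).

```java
public class MergeSketch {
    static class Base {}
    static class A extends Base {}
    static class B extends Base {}
    static class C extends B {}

    /**
     * Hypothetical helper: returns the most specific class that both
     * arguments can be assigned to. Assumes non-interface reference types.
     */
    static Class<?> merge(Class<?> left, Class<?> right) {
        Class<?> candidate = left;
        // Walk up left's superclass chain until right also fits.
        while (!candidate.isAssignableFrom(right)) {
            candidate = candidate.getSuperclass();
        }
        return candidate;
    }

    public static void main(String[] args) {
        Class<?> thenBranch = A.class; // local 2 after "b = new A()"
        Class<?> elseBranch = C.class; // local 2 after "b = new C()"
        // Both branches reach offset 24, so their states are merged there.
        System.out.println(merge(thenBranch, elseBranch).getSimpleName()); // prints "Base"
    }
}
```

For the foo example, the two branches store an A and a C into local 2, and the only class both fit into (other than Object) is Base, which is the type the analysis reports at the join.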
The following code, which uses the framework, is able to tell us that the type invoked on line 25 is in fact Base:
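import javassist.ClassPool;
import javassist.CtClass;
import javassist.bytecode.analysis.Analyzer;
import javassist.bytecode.analysis.Frame;
Analyzer a = new Analyzer();
CtClass clazz = ClassPool.getDefault().get("example.Example");
Frame[] frames = a.analyze(clazz.getDeclaredMethod("foo")); // indexed by bytecode offset
System.out.println(frames[25].peek()); // Prints "example.Example$Base"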
There is also a nice little tool I added, called framedump, that dumps the entire state at every instruction in human readable format, and yes I know that's debatable :)
$ framedump example.Example
private void foo(int);
0: iload_1
    stack []
    locals [example.Example, int, empty]
... snipped for brevity ...
24: aload_2
    stack []
    locals [example.Example, int, example.Example$Base]
25: invokevirtual #85 = Method java.lang.Object.toString(()Ljava/lang/String;)
    stack [example.Example$Base]
    locals [example.Example, int, example.Example$Base]
28: pop
    stack [java.lang.String]
    locals [example.Example, int, example.Example$Base]
29: return
    stack []
    locals [example.Example, int, example.Example$Base]
Some of you are probably thinking:
That sounds nice and all, but why in the world would I ever need to use this?
It is definitely not something useful to everyone, but it is very useful for certain applications:
- Bytecode Enhancers
- Verifiers
- Optimizers
- Debugging/Profiling Tools
- Decompilers
To expand on the enhancer example: for security reasons, the JVM actually does its own data-flow analysis to verify that a class does not violate type rules before it can be run. This poses an interesting challenge to any application that manipulates bytecode, since any change that affects the possible type-state can lead to a verify error and the JVM throwing out the class. Frameworks such as this can be used to prevent this problem, since they reveal the same (in the javassist analyzer case, actually more detailed) information available to the JVM's verifier.
If you want to play with this new feature, download the recently released 3.8.0 here.
The javadoc is here.
Note: I should also mention that the ASM project has had a similar framework for quite some time. However, it wasn't usable in my case, since I needed the ability to handle reduction of multi-interface and array types. Also, I was already using javassist, and switching just wasn't possible, mainly due to other features I rely on.
Enjoy!
Nice article. What is the memory cost of such analysis, given that you track all possible type states?
Great stuff, Jason!
ASM has had such an API, and more, for ages. http://asm.objectweb.org/
Yeah, I mentioned that, see the last paragraph.
Memory usage is directly proportional to the size of the method. The process is a variant of the one described in the VM spec. When two branches merge, the reduced/merged type set replaces the previous one. So you have fairly linear growth that eventually caps, and it decreases slightly when the type set (hopefully) becomes a single type. Also, in a number of cases instances are reused.
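A rough sketch of why this stays bounded, assuming a simplified model where a frame is just an array of classes: the merged result overwrites the stored frame in place rather than accumulating alongside it. The names FrameMergeSketch, join, and mergeInto are hypothetical and are not part of javassist; only plain class types are handled.

```java
import java.util.Arrays;

public class FrameMergeSketch {
    /** Walks up a's superclass chain until b fits (class types only). */
    static Class<?> join(Class<?> a, Class<?> b) {
        Class<?> c = a;
        while (!c.isAssignableFrom(b)) {
            c = c.getSuperclass();
        }
        return c;
    }

    /**
     * Merges incoming into stored, overwriting slots in place.
     * Returns true if any slot changed, meaning the instructions that
     * follow would have to be analyzed again with the new state.
     */
    static boolean mergeInto(Class<?>[] stored, Class<?>[] incoming) {
        boolean changed = false;
        for (int i = 0; i < stored.length; i++) {
            Class<?> merged = join(stored[i], incoming[i]);
            if (merged != stored[i]) {
                stored[i] = merged; // the reduced type replaces the previous one
                changed = true;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Class<?>[] stored = { Integer.class, String.class };
        Class<?>[] incoming = { Long.class, String.class };
        boolean first = mergeInto(stored, incoming);  // slot 0 widens to Number
        boolean second = mergeInto(stored, incoming); // nothing left to widen
        System.out.println(Arrays.toString(stored) + " " + first + " " + second);
    }
}
```

In this model there is one stored frame per instruction, so storage grows with the method size, and a merge that reports no change means that path has reached a fixed point.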
Jason:
Assuming I have:
class MyClass implements MyInterface1, MyInterface2 {}
...and...
class Client {
    Object obj = someGenericLookup("MyClass");
    MyInterface1 intf = (MyInterface1) obj;
    Class<?> clazz = intf.getClass();
}
Is it possible, either with your current implementation or with some enhancements, to know that clazz is of type MyInterface1 but not MyInterface2?
S, ALR
Of course, if this information is ditched by the compiler, any load-time additions we make would be irrelevant. :)
Yes, casts result in runtime checking via the checkcast instruction. So you can determine that clazz is at least a type that implements/extends MyInterface1.
Here is the relevant portion of the framedump output of your example:
8: checkcast #90 = Class example.Example$MyInterface1
    stack [java.lang.Object]
    locals [example.Example, java.lang.Object, empty]
11: astore_2
    stack [example.Example$MyInterface1]
    locals [example.Example, java.lang.Object, empty]
12: aload_2
    stack []
    locals [example.Example, java.lang.Object, example.Example$MyInterface1]
13: invokevirtual #92 = Method java.lang.Object.getClass(()Ljava/lang/Class;)
    stack [example.Example$MyInterface1]
    locals [example.Example, java.lang.Object, example.Example$MyInterface1]
How can I find the reference to a method and the value passed as the parameter? For example: a method setName() exists in class Person. In class Evaluate, the method is referenced twice: setName() and setName().
How do I get these values (Steve and Jan) using javassist?
The data-flow analyzer in javassist is not inter-procedural: it is only run against a single method, and it only solves for type info, not values.
Not sure what you meant by "reduction of multi-interface and array types". ASM provides several interpreters that can handle a chosen type system, and one could also implement a custom interpreter that handles a custom type system or does some other advanced stuff.
By multiple-interface types, I am referring to types that cannot be immediately resolved to a single common type when merged (not an uncommon case). A type may have one or more common interfaces in addition to a common super class. The only way to solve for a correct answer is to infer it from the information available in other instructions, and eventually arrive at the best-fitting solution.
A simple example is one branch using a Long and another using an Integer; the merged type could be either a Number or a Comparable.
It could be possible to do this with a new ASM interpreter, although other enhancements to the analyzer would have likely been needed.
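The Long/Integer case can be illustrated with plain reflection. This is only a sketch of why the merge is ambiguous, not how javassist resolves it; MultiInterfaceSketch, allInterfaces, and commonInterfaces are hypothetical names introduced for the illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class MultiInterfaceSketch {
    /** Collects every interface implemented by type or any of its ancestors. */
    static Set<Class<?>> allInterfaces(Class<?> type) {
        Set<Class<?>> result = new LinkedHashSet<>();
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            collect(c, result);
        }
        return result;
    }

    private static void collect(Class<?> c, Set<Class<?>> out) {
        for (Class<?> iface : c.getInterfaces()) {
            if (out.add(iface)) {
                collect(iface, out); // include super-interfaces too
            }
        }
    }

    /** Interfaces shared by both types: candidates the merge has to keep open. */
    static Set<Class<?>> commonInterfaces(Class<?> a, Class<?> b) {
        Set<Class<?>> common = allInterfaces(a);
        common.retainAll(allInterfaces(b));
        return common;
    }

    public static void main(String[] args) {
        // Besides the common superclass Number, Long and Integer also share
        // Comparable, so the merged value has more than one valid type.
        // The exact set printed depends on the JDK version.
        System.out.println(commonInterfaces(Long.class, Integer.class));
    }
}
```

Which of the candidates is the right one only becomes clear from how the value is used later, for example whether a Number method or compareTo is invoked on it.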
Not that I should have to justify this, but from my perspective, the cost of enhancing ASM to resolve this issue, building some kind of ASM-to-javassist bridge, and living with yet another dependency and a less than ideal code integration path was more than the cost of just adding a new framework to javassist that met my needs. Not to mention that other javassist users have asked for this in the past.
Version 3.8 has not yet appeared on http://repo1.maven.org/maven2/jboss/javassist/. You probably know that this is the Maven repository from which the Maven build system reads dependencies. I would like to use the new version, but I have to add your jar manually. The older versions are there. Could you add the new version as well, or do you know who would be the right person to ask?
The maven2 repo for javassist is at: http://repository.jboss.org/maven2/javassist/javassist/
Not sure who is maintaining the one on repo1.maven.org.