Tuesday, February 03, 2009

How to really measure the memory usage of your objects

Measuring and analyzing memory consumption in Java and other OOP languages is not an easy task, because all the objects form a big, highly connected graph. Counting only the flat, also called "shallow", size of the objects is not always helpful:


This figure shows a typical memory histogram. In general such a table does not help much to figure out which objects really cause the high memory usage. Typically String and char[] will show up among the classes with the highest total shallow size, simply because they are used so frequently.
There is probably a correlation between the high number of char[] instances and the high number of String instances (indicated by the yellow arrows), but we cannot really tell from the information in the table. We also don't know whether the Order objects might be the root cause for all those other objects.
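To make this concrete, consider a completely made-up Order class: every instance indirectly keeps several String objects and their backing char[] arrays alive, but a shallow-size histogram attributes that memory to String and char[], not to Order. The class and its fields are invented purely for illustration:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical example; the class and fields are made up.
    public class Order {
        private final String customerName;   // a String wraps a char[] internally
        private final String shippingNote;   // (true for the JVMs of that era)
        private final List<String> itemIds = new ArrayList<String>();

        public Order(String customerName, String shippingNote) {
            this.customerName = customerName;
            this.shippingNote = shippingNote;
        }

        public void addItem(String itemId) {
            itemIds.add(itemId);
        }
    }

Every Order here drags in at least two Strings, two char[] arrays, and an ArrayList, yet none of that shows up under "Order" in the histogram.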

So how can we find out who is really responsible for the memory usage?

Let's take a step back and ask ourselves how Garbage Collection works in a language like Java (and also in other "modern" languages ;)).

Garbage Collection


The general idea is that the Garbage Collector will reclaim all objects that cannot be reached from a so-called "GC root object" (a small sketch of what that means in code follows the list below).

GC root objects are
  • Local variables on the stack
  • Parameters of functions on the stack
  • JNI native references
  • All classes loaded by the bootstrap class loader
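Here is a minimal sketch of reachability from a GC root, in this case a local variable; all names are made up for illustration:

    public class GcRootDemo {
        static class Node { Node next; }

        public static void main(String[] args) {
            Node head = new Node();   // 'head' is a local variable on the stack -> GC root
            head.next = new Node();   // reachable through 'head', so it cannot be collected

            Node orphan = new Node(); // reachable for now...
            orphan = null;            // ...now unreachable from any root: eligible for GC
        }
    }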
Now let's come back to our original problem and ask the interesting question:
"How much memory would be freed if we removed all those "Order" objects and then ran a full GC?"
This gives us a measure of how much memory those objects are responsible for.
We call this measure the "retained size". The set of objects that would be freed is called the "retained set".
With the "retained set" we can figure out how many Strings would be freed if those "Order" objects were not there. This lets us find the objects that are really "responsible" for the memory usage.

Definition "Retained size/set"


Retained set of X is the set of objects which would be removed by the GC when X is garbage collected.
Retained size of X is the sum of shallow sizes of all objects in the retained set of X, i.e. memory kept alive by X.
Generally speaking, the shallow size of an object is its "flat" size in the heap, whereas the retained size of the same object is the amount of heap memory that will be freed when the object is garbage collected.
The retained set for a leading set of objects, such as all objects of a particular class, all objects of all classes loaded by a particular class loader, or simply a bunch of arbitrary objects, is the set of objects that is released if all objects of that leading set become inaccessible. The retained set includes these objects as well as all other objects accessible only through them. The retained size is the total heap size of all objects contained in the retained set.
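A tiny, made-up example may help. Suppose X references A, and A references an object that is also reachable directly from a local variable. Then the retained set of X contains only X and A; the shared object survives the removal of X:

    // Illustrative only; the object graph is invented.
    public class RetainedSetDemo {
        static class Obj { Obj ref; }

        public static void main(String[] args) {
            Obj shared = new Obj();   // also kept alive by this local variable (a GC root)

            Obj x = new Obj();
            Obj a = new Obj();
            x.ref = a;                // A is reachable only through X
            a.ref = shared;           // 'shared' is reachable through X *and* directly

            // Retained set of X = {X, A}: exactly these would be freed with X.
            // 'shared' is not in the set, because the local variable still holds it.
            // Retained size of X = shallowSize(X) + shallowSize(A).
        }
    }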

How to find the object with the biggest retained size?

The Eclipse Memory Analyzer was built from the beginning with support for computing the retained set/size of objects.
The retained size was pioneered, as far as I know, by the commercial
YourKit profiler (great product!). When we started to develop the Eclipse Memory Analyzer (called SAP Memory Analyzer at that point in time), YourKit already had this very nice feature to find the single object with the biggest retained size. We still don't know how YourKit does it, but at that point in time they surely used a different approach than we do now, because it took YourKit several minutes to get the first 10 (I think) biggest objects (in terms of retained size) in the heap. That was still much faster than the naive method, which would have to simulate a Garbage Collector run for each object in the heap. With millions of objects on the heap this would have taken years!
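To make "simulate a Garbage Collector run for each object" concrete, here is a rough sketch of that naive method. This is not MAT's or YourKit's actual code; the types are simplified placeholders, and it assumes every object in the dump is reachable from some GC root:

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    class NaiveRetainedSize {
        // Objects are plain int ids; 'refs' maps an object to the objects it references.
        static long retainedSize(int x, int[] roots, Map<Integer, int[]> refs,
                                 Map<Integer, Long> shallowSizes) {
            // Mark phase, but pretend 'x' has already been collected.
            Set<Integer> reachable = new HashSet<Integer>();
            Deque<Integer> stack = new ArrayDeque<Integer>();
            for (int root : roots) stack.push(root);
            while (!stack.isEmpty()) {
                int obj = stack.pop();
                if (obj == x || !reachable.add(obj)) continue; // skip x and marked objects
                int[] successors = refs.get(obj);
                if (successors != null) for (int s : successors) stack.push(s);
            }
            // Everything no longer reachable would be freed together with 'x'.
            long size = 0;
            for (Map.Entry<Integer, Long> e : shallowSizes.entrySet())
                if (!reachable.contains(e.getKey())) size += e.getValue();
            return size;
        }
    }

Running this once for each of the n objects makes the whole computation roughly quadratic in the heap size, which is why it is hopeless for heaps with millions of objects.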

So I went on to research what kind of graph algorithms could help us speed things up. After a lot of Google searches I finally found the dominator tree.

The dominator tree

The dominator tree is a data structure that allows you to answer the question about the biggest objects in almost no time. After further research I found that it can be built in almost linear time with a sophisticated algorithm invented by Lengauer and Tarjan. Since the project architect's plan was always to store precomputed information on disk, we could compute the dominator tree once and then store it on disk. Opening a heap dump for the first time is therefore a little bit slow (but still often faster than with any other tool), while opening it again is blazing fast (just a few seconds).
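The key property: an object X dominates an object Y if every path from the GC roots to Y runs through X, so the subtree of X in the dominator tree is exactly the retained set of X. The retained size then becomes a simple subtree sum. A sketch, with a made-up tree representation rather than MAT's internal one:

    import java.util.ArrayList;
    import java.util.List;

    // Sketch only: each node represents one heap object.
    class DominatorTreeNode {
        long shallowSize;
        List<DominatorTreeNode> dominated = new ArrayList<DominatorTreeNode>();

        // Retained size = shallow size summed over the whole dominator subtree.
        long retainedSize() {
            long total = shallowSize;
            for (DominatorTreeNode child : dominated)
                total += child.retainedSize();
            return total;
        }
    }

Finding the biggest objects is then just a matter of sorting the nodes by this subtree sum.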
Still, implementing the algorithm was a high risk, because at that point in time the relatively new paper "Finding Dominators in Practice" reported results on graphs with at most a few thousand nodes, whereas we wanted to run the algorithm on a few million nodes on a 32-bit machine.

Still, the team got it done (I was always only in a "consultant" role), and it turned out that the dominator tree not only allowed us to speed up finding the biggest objects, but also let us significantly simplify finding the root causes of high memory usage.

More on this in the next post.
Stay tuned!



16 comments:

Domingos Neto said...

Excellent! I'm linking this post if you don't mind :)

Unknown said...

Thanks!
I don't mind of course :)

I also would not mind if people voted on
dzone

;)

Domingos Neto said...

Hey Mark,

You could use the dzone voting button that I have in my own site. One thing I noticed is that people usually don't go back to dzone after reading the article, so if you add a dzone button it is more likely that people will vote.

Here is the link:
http://www.dzone.com/links/buttons.jsp

Unknown said...

Thanks for voting and thanks for the hint :)
I already thought about this, but didn't have the time to investigate how it would work with blogger.com

Ashish said...

Markus, I have been following your blog for a while and am really happy to see these posts. I believe your entries help me understand Java and the JVM better. Will be waiting for your next articles

Unknown said...

@Ashish thanks for the positive feedback!
Stay tuned!

Anonymous said...

Great article thanks Markus.
voted at dzone :)

Cemo Koc said...

Another great article...

Voted..

thx

Unknown said...

great article
thnx...

Deepak Srivastava said...

Really a great article. I have started using the Eclipse Memory Analyzer and it really looks cool and fast, and your post provides really good information and understanding about profiling and what one should consider during application profiling.

Thanks, I am about to read your other posts.

Unknown said...

Thanks for the compliments!

Haven't blogged here for a while, but I plan to continue it soon!
Markus

meitian said...

Excellent post!

btw, I have tried the memory analyzer that comes with IBM Support Assistant but still found Eclipse MAT much faster. Any reason?

Unknown said...

Thanks.
The reason MAT is as fast as it is, is that it is heavily optimized.
For example, it uses special primitive collections similar to http://pcj.sourceforge.net/.
Certain operations (the retained set, for example) are also multithreaded and take advantage of multiple cores.
Indexing lowers memory usage and speeds up certain operations by an order of magnitude.

Regards,
Markus

Patrick D said...

Hi Markus,

Thanks for the great blog posts! I'm a little bit late in discovering these.

One question: when calculating the dominator tree, are weak references treated specially? In my experience, it seems that they are treated as strong references. Is this true?

Unknown said...

Hi Patrick,
Thanks!

No, weak references are not treated specially.
One could argue whether this makes sense or not, but the decision was made when we decided that we wanted to see soft references, because they had caused us a lot of trouble in our applications.

We probably should offer an option to exclude at least weak references when parsing a dump.

The issue is that we can't easily support both "modes" without slowing down parsing significantly.

Regards,
Markus

Pranav said...

Hi Markus,
Thanks for the valuable inputs!
I want to analyze the class where the leak is happening, but my application runs on HP-UX, so while it is running I cannot use some of the Windows tools to take readings. Can you please suggest something that would help me?