Monday, February 16, 2009

Memory leaks are easy to find

Last time I talked about the dominator tree and how it can be used to find the biggest objects in your heap easily.

So what exactly is the dominator tree?

Dominators

An object A dominates an object B if all the paths from the GC roots to object B pass through object A.

Remember that the Garbage Collector removes all objects that are not referenced anymore. If A dominates B and A could be removed from memory, that means that there's no path anymore that leads to B. B is therefore unreachable and would be reclaimed by the Garbage Collector.
One could also say that A is the single object that is responsible for B still being there!
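The definition can be tested quite literally: pretend A is gone and check whether B can still be reached. Here is a minimal Java sketch of that idea. The tiny graph, the class name and the method names are all made up for illustration; this is not how the Memory Analyzer computes dominators.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DominatesSketch {

    // Hypothetical object graph: each key references the objects in its list.
    static final Map<String, List<String>> REFS = Map.of(
            "root", List.of("A"),
            "A", List.of("B", "C"),
            "B", List.of("D"),
            "C", List.of("D"),
            "D", List.of());

    /** Objects reachable from start when 'removed' is treated as deleted (null = nothing removed). */
    static Set<String> reachable(String start, String removed) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        if (!start.equals(removed)) {
            seen.add(start);
            todo.push(start);
        }
        while (!todo.isEmpty()) {
            for (String next : REFS.getOrDefault(todo.pop(), List.of())) {
                if (!next.equals(removed) && seen.add(next)) {
                    todo.push(next);
                }
            }
        }
        return seen;
    }

    /** a dominates b if b is reachable normally, but not once a is removed. */
    static boolean dominates(String a, String b) {
        return reachable("root", null).contains(b)
                && !reachable("root", a).contains(b);
    }

    public static void main(String[] args) {
        System.out.println(dominates("A", "D")); // true: every path to D goes through A
        System.out.println(dominates("B", "D")); // false: D is also reachable via C
    }
}
```

Checking every pair this way is far too slow for a real heap, but it makes the "responsible for B still being there" intuition concrete.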

The Dominator Tree

Using the "dominates" relationship we can create a dominator tree out of the graph of objects in memory. At each node of this tree we store the amount of memory that would be freed if that object were removed (= retained size).
At the top of the tree we have a "virtual" root object, which we also use as the parent of objects that don't have a "real" single dominator.
Here's an example of an object graph (on the left) and the corresponding dominator tree (on the right):




  1. Note that A, B and C are dominated by a "virtual" root object.
  2. Note that the dominator relationship is transitive; C dominates E, which dominates G, therefore C also dominates G.

Because "dominates" is transitive, the retained size of a parent object within the dominator tree is always greater than the sum of the retained sizes of its child objects.
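To make this more tangible, here is a small Java sketch that computes dominators and retained sizes for a toy graph. The picture is not reproduced here, so the graph and the shallow sizes below are invented; they merely agree with the two notes above (A, B and C sit under the virtual root, and C dominates E which dominates G). The simple fixed-point iteration is chosen for readability and is an assumption on my side, not the algorithm the Memory Analyzer uses. It also assumes every object is reachable from the root.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DominatorTreeSketch {

    // Hypothetical reference graph; "root" stands for the virtual root.
    static final Map<String, List<String>> REFS = new LinkedHashMap<>();
    // Hypothetical shallow sizes in bytes.
    static final Map<String, Long> SHALLOW = new HashMap<>();

    static void node(String name, long shallowSize, String... references) {
        REFS.put(name, List.of(references));
        SHALLOW.put(name, shallowSize);
    }

    static {
        node("root", 0, "A", "B");
        node("A", 16, "C");
        node("B", 16, "C");
        node("C", 24, "D", "E");
        node("D", 32, "F");
        node("E", 32, "G");
        node("F", 40, "H");
        node("G", 48, "H");
        node("H", 64);
    }

    /** For every object, the set of objects that dominate it (including itself). */
    static Map<String, Set<String>> dominatorSets() {
        Map<String, List<String>> preds = new HashMap<>();
        for (Map.Entry<String, List<String>> e : REFS.entrySet()) {
            for (String target : e.getValue()) {
                preds.computeIfAbsent(target, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Map<String, Set<String>> dom = new HashMap<>();
        for (String n : REFS.keySet()) {
            Set<String> start = new HashSet<>(REFS.keySet()); // start with "everything"
            if (n.equals("root")) {
                start.clear();
                start.add("root");
            }
            dom.put(n, start);
        }
        boolean changed = true;
        while (changed) { // shrink the sets until nothing changes anymore
            changed = false;
            for (String n : REFS.keySet()) {
                if (n.equals("root")) continue;
                Set<String> next = new HashSet<>(REFS.keySet());
                for (String p : preds.get(n)) next.retainAll(dom.get(p));
                next.add(n);
                if (!next.equals(dom.get(n))) {
                    dom.put(n, next);
                    changed = true;
                }
            }
        }
        return dom;
    }

    /** Parent in the dominator tree: the closest strict dominator. */
    static String immediateDominator(String n, Map<String, Set<String>> dom) {
        for (String d : dom.get(n)) {
            if (!d.equals(n) && dom.get(d).size() == dom.get(n).size() - 1) return d;
        }
        return null; // only the virtual root has no dominator
    }

    /** Retained size = shallow size of the object plus everything it dominates. */
    static Map<String, Long> retainedSizes(Map<String, Set<String>> dom) {
        Map<String, Long> retained = new HashMap<>();
        for (String n : REFS.keySet()) {
            for (String d : dom.get(n)) retained.merge(d, SHALLOW.get(n), Long::sum);
        }
        return retained;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> dom = dominatorSets();
        Map<String, Long> retained = retainedSizes(dom);
        for (String n : REFS.keySet()) {
            System.out.println(n + ": parent=" + immediateDominator(n, dom)
                    + ", retained=" + retained.get(n));
        }
    }
}
```

With these made-up sizes C retains 240 bytes while its children D, E and H retain 72, 80 and 64, so the parent is larger than the sum of its children by exactly its own shallow size (24).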

To get the biggest objects we just need to sort the second level of the dominator tree (the first level is the "virtual" root object) by retained size.
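Once the retained sizes are known, that sort is a one-liner. A tiny Java sketch, assuming we already have the retained sizes of the virtual root's direct children in a map (the names and numbers are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class BiggestObjectsSketch {

    /** Names of the virtual root's children, largest retained size first. */
    static List<String> biggestFirst(Map<String, Long> topLevelRetained) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(topLevelRetained.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Long> e : entries) {
            names.add(e.getKey());
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical retained sizes in bytes.
        Map<String, Long> topLevel = Map.of("A", 16L, "B", 20L, "C", 240L);
        System.out.println(biggestFirst(topLevel)); // [C, B, A]
    }
}
```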

Now if you are looking to find a memory leak, and you have no a priori knowledge that could help you, the typical approach is to run a test that reproduces the leak, and then try to somehow figure out what is leaking.

Do we really need object allocation tracing?
In my experience people often seem to believe that finding leaks requires recording object creations, sometimes also called "object allocation tracing", because you want to know where in the code objects are allocated but never released.
Tess Ferrandez, ASP.NET Escalation Engineer (Microsoft), has an example of how this method for finding leaks can be applied to .NET applications. Dear Microsoft, I encourage you to look at the Eclipse Memory Analyzer ;)


All you need is the dominator tree

With the dominator tree, finding this kind of leak is much easier, and you don't need the high overhead of allocation tracing, which is typically not acceptable in a production environment. Allocation tracing has its uses, but in general IMHO it is overrated.

It's almost guaranteed that a significant memory leak will show up at the very top of the dominator tree!
The Eclipse Memory Analyzer not only supports the dominator tree but can even find memory leaks automatically, based on some heuristics.

The dominator tree can also be used to find high memory usage in certain areas, which is a much harder problem. More on this in the next post.





22 comments:

William Louth said...

"In my experience people often seem to believe that finding leaks requires recording object creations, sometimes also called "object allocation tracing", because you want to know where in the code objects are allocated but never released."

In my experience people always assume the OutOfMemory errors in production are the result of a leakage whereas a significant number are due to memory capacity issues with high workload concurrency and deep/prolonged call chains. For this you need to have already obtained object allocation counts and sizes per activity. Naturally this data should not be obtained in production just like complete memory heap dumps but during testing for the purpose of capacity planning.

William

Ashish said...

Another good article! Last time MAT didn't work for me. Let me try it on one of my apps. Waiting for your next post ;-)

Pramatr said...

Interesting post, really enjoyed reading!!

kirk said...

Agreed, allocation traces are generally not that useful in the first steps of locating leaks. I have found that a much more effective technique is to use generational counts. From there, execution traces are much more useful in narrowing down the problem. This technique is so predictable that I fix-price all of my memory leak engagements.

Regards,
Kirk

Markus Kohler said...

Hi Kirk,
Yes, I agree. Generations are very helpful for finding leaks and are supported in NetBeans, for example.

Generations are also helpful with heap dumps. Unfortunately I can't tell you more details yet ...

Regards,
Markus

Nick said...

Except you can only do that analysis once you have the memory leak. The trick is finding them before they hit you or your customers in the field. That's the tricky part.

Markus Kohler said...

Hi Nick,
Yes, finding bugs that only show up under realistic load on the system is always difficult.
Memory usage/leaks often fall into this category, although IMHO much more could be done to avoid "simple" mistakes, e.g. people should be doing more (JUnit) performance testing.

Still what you can do is to have a load test, which runs for a while where you monitor whether there's a systematic increase in memory usage until the load test is finished.

If that is the case, you might want to trigger (potentially automatically) a heap dump that can be analyzed by the Eclipse Memory Analyzer, which can produce a report of potential problems.

This is the short answer and a longer answer would probably need to be another blog post :)
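The monitoring idea above can be sketched in a few lines of Java. This is only an illustration under some assumptions: it relies on the HotSpot-specific HotSpotDiagnosticMXBean, the "every sample is larger than the previous one" check is a deliberately crude heuristic I made up, and the class name, sample count, sleep interval and file name are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

class HeapGrowthWatch {

    /** Used heap in bytes after asking for a collection (the request is only a hint). */
    static long usedHeapAfterGc() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc();
        return memory.getHeapMemoryUsage().getUsed();
    }

    /** Crude heuristic: true if every sample is larger than the one before it. */
    static boolean growsSteadily(long[] samples) {
        for (int i = 1; i < samples.length; i++) {
            if (samples[i] <= samples[i - 1]) {
                return false;
            }
        }
        return samples.length > 1;
    }

    /** Writes an .hprof heap dump that the Eclipse Memory Analyzer can open (HotSpot JVMs only). */
    static void dumpHeap(String file) throws IOException {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        diagnostics.dumpHeap(file, true); // true = dump only live (reachable) objects
    }

    public static void main(String[] args) throws Exception {
        long[] samples = new long[5];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = usedHeapAfterGc();
            Thread.sleep(200); // in a real load test this would be minutes, not milliseconds
        }
        if (growsSteadily(samples)) {
            // The target file must not exist yet, hence the timestamp in the name.
            dumpHeap("load-test-" + System.currentTimeMillis() + ".hprof");
        }
    }
}
```

In a real setup you would run the sampling inside or alongside the load test and use a more robust trend check than this one.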

William Louth said...

So Kirk, what happens when it is a memory capacity problem, or have you not encountered any of these in your engagements?

Allocations tracked at higher levels of software activity (recorded prior to deployment) are extremely important for framing any investigation and resulting observations.

What is the range of allocation for particular transactions and what is the degree of concurrency of such transactions in production? You need to look down and up and to be effective at the same time.

I should state I mainly deal with production issues so my experience is probably a lot different.

William Louth said...

kirk "first steps of locating leaks".

Do you naturally assume everything is a leak? Again a common mis-diagnosis.

William Louth said...

Here is a recently reported easy to find memory capacity issue.

http://www.codeinstructions.com/2009/01/memory-problem-with-java-io.html

"The process of fixing it usually involves profiling the memory in search for memory leaks, which can be very time consuming."

Then check the comments and resolution.

William Louth said...

I should state that I was referring to the relatively cheap mechanism of simply measuring object allocation counts and sizes during the execution interval of a method, which possibly includes nested non-instrumented method calls.

I was not referring to the backtrac(k)ing of object allocation call sites as this is horrendously expensive even on a local developer workstation and just creates so much noise (strings, maps) within a profile model.

You typically only need to know the memory cost of particular entry points in an enterprise application, which can of course include a lot of transient allocation that might even be garbage collected before the execution finishes.

Guillaume said...

Markus, do you have any advice as to how to find memory leaks in Android "native heap" (the "external" part of the GC statistics)? I'm finding nothing and I have a leak in there..

Markus Kohler said...

That is pretty difficult at the moment, because you won't see some Bitmaps for example.
If your problem has to do with bitmaps, then you might try to run your application on Android 3.0 or 3.1. IIRC bitmaps have been allocated on the heap since 3.0.
Greetings Markus

Mat said...

Pretty informative. I would also suggest this article. It shows memory leak examples: http://stackoverflow.com/q/6470651/465179

Java OutOfMemoryError said...

Tomcat has a long history of memory leaks. A common scenario is a leak caused by the web-app classloader when it fails to unregister JDBC drivers, which eventually results in java.lang.OutOfMemoryError: PermGen space in Tomcat.

Pierre-Hugues Charbonneau said...

Hi Markus,

Just found your article written back in 2009, good job there.

I have been using MAT for the last few years, and I can assure you it has helped me either resolve Java heap memory leaks or better understand the memory footprint of the production environment.

In my experience tuning Java EE production environments, I have seen about a 50/50 split regarding the source of OOM errors: half are related to a true application or Java EE container/API memory leak, and the other half are capacity or tuning related. A small portion is also related to PermGen or other native memory problems.

I have written an article recently on Java Heap Dumps along with recommendations so please feel free to review and provide your comments.


Looking forward to new posts from you in 2012, including some potential new features of MAT.

Regards,
P-H
http://javaeesupportpatterns.blogspot.com

Shrikant said...



It would be better if you posted the details with an example and showed the actual memory leak (with the help of the NetBeans Profiler).

hope so in next post…..

Thanks.

Tom Watson said...

A memory leak is an object that the system unintentionally hangs on to, thereby making it impossible for the garbage collector to remove this object. The way that profilers find memory leaks is to trace references to a leaked object.