Memory Analyzer 1.1 was just released!
Check the New and noteworthy page and download it from here.
Wednesday, June 22, 2011
Memory Analyzer 1.1
Labels: memory, memory analyzer, performance
Thursday, November 11, 2010
Extending the Eclipse Memory Analyzer
I usually don't post links here, but I always wanted to describe how to extend the Eclipse Memory Analyzer. Since there's now a slide deck available at
http://www.slideshare.net/AJohnson1/extending-eclipse-memory-analyzer that briefly describes how it can be done, it's not that urgent anymore to post a description here ;-)
I believe there's huge potential in extending MAT for specific use cases, such as Android, JRuby, Scala, etc.
Labels: android, memory, performance
Wednesday, February 24, 2010
Android memory usage analysis slides available
Yesterday I gave a talk at the Mobile Monday Android developer day in Düsseldorf. The slides about analyzing the memory usage of Android applications with the Eclipse Memory Analyzer are available now. Note that it might be worth downloading the PDF, because it contains some notes.
The slides should be interesting for Java developers in general, not only for Android developers.
Also, I want to thank everyone who showed up. It was great talking to all of you!
Labels: android, memory, performance
Thursday, July 09, 2009
Eclipse Memory Analyzer, 10 useful tips/articles
The Eclipse Memory Analyzer has been shipped with Eclipse 3.5 Galileo, and I planned to start a new series here about memory usage antipatterns and how to analyze them with MAT.
Unfortunately I'm pretty busy these days and will need more time for those posts.
But due to popular demand, I decided it would make sense to post some good links (OK, I admit some of them are my own) about how to use the Eclipse Memory Analyzer.
Please forgive me if I missed some important ones, but I'm really hungry now and therefore just stopped after 10 tips ;)
Unfortunately I'm pretty busy these days and I will need more time for those posts.
But, due to popular demand I decided it would make sense to post some good links (ok, I admit some of them are from myself) about how to use the Eclipse Memory Analyzer.
Please forgive me if I missed some important ones, but I'm really hungry now and therefore just stopped after 10 tips ;)
- How to really measure memory usage: The absolute fundamentals
- Check the online help. It has a pretty good introduction chapter.
- Memory leaks are easy to find: Learn how to analyze memory leaks
- Check also one click leak analysis if this is your problem :]
- Memory usage analysis: Learn the fundamental approach for finding the owner of a set of objects
- A typical issue in Google's Android UI framework: Learn how to analyze memory usage on Google's Android platform. Shows a typical issue that can be solved by using lazy initialization.
- Never forget to take a look at your Strings: Learn why String.intern() can be useful
- A tip for analyzing Equinox-based applications: Learn about a special feature for Eclipse-based applications.
- Analyzing perm size/classloader problems: Perm size problems can be nasty. Check why a JRuby core developer likes MAT ;)
- If you like webinars, here's an introduction presented by the developers.
Just read my memory-related posts here. OK, that was number 11, but who cares? ;)
Labels: memory
Monday, April 27, 2009
Analyzing the memory usage of your Android application
The new Android 1.5 Early Look SDK has been out for a few weeks now. The "Android 1.5 highlights" page does not mention one highlight which IMHO will become very important for all developers on the Android platform, because it allows you to find memory leaks easily and analyze the memory usage of your Android applications.
I'm talking about the hprof-conv tool, which converts an Android/Dalvik heap dump into a standard .hprof heap dump. The tool does not preserve all the information, because Dalvik has some unique features, such as cross-process data sharing, that are not available in a "standard" JVM. Still, the .hprof file is a good starting point, and it can be read with the Eclipse Memory Analyzer, which will interpret it like a normal dump from a Sun 32-bit JVM. In the future it should not be too difficult to read the Dalvik heap dump format directly and provide more information for memory analysis, such as which objects are shared, and maybe also how much memory is used by native objects (I haven't checked that).
How to get a heap dump
First we must of course get a heap dump. I do not yet own an Android phone, although I plan to get one this year. If you want to donate one, please let me know. Donating lots of money would also help ;)
I therefore created the heap dump using the emulator.
I created a new AVD for the 1.5 preview:
android create avd -n test -t 2
emulator -avd test
Then you can log on to the emulated phone and make sure that the /data/misc directory is writable:
adb shell
chmod 777 /data/misc
exit
Heap dumps are actually created by sending the process a SIGUSR1 (-10) signal.
adb shell ps
and look for the PID of your application. Then send the signal (replace <pid> with the actual process ID):
adb shell kill -10 <pid>
You can check with
adb shell logcat
whether it works. You should see something like:
I/dalvikvm(
Now you can copy the heap dump from the (emulated) phone to the desktop machine:
adb pull /data/misc/heap-dump-tm
The file you get does not conform to the "standard" Sun .hprof format but is written in Dalvik's own format, so you need to convert it:
hprof-conv heap-dump-tm
Now you can use the Eclipse Memory Analyzer to analyze the memory usage of your application.
Typical optimization opportunities
I have already described some typical issues here in this blog, for example a problem with Finalizers in NetBeans,
or finding "String duplicates" in Eclipse (my favourite memory analysis trick).
You might also take the time to check all my memory-related posts.
Now I would like to present a new typical memory usage issue that I just found while playing around with the new Android Cupcake 1.5 prerelease.
YouShouldHaveDoneLazyInitialization
In the histogram view I filtered the android classes:
The high number of Rect (rectangle) objects looked suspicious to me. Although they did not retain that much memory, I thought it would be interesting to find out why such a high number of Rect instances was alive.
When checking the Rect objects I immediately saw that most of them seemed to be empty e.g. bottom=0 and left=0 and right=0 and top=0.
It seemed to me that this was the time to again use the most advanced feature in MAT: the OQL view, which allows you to execute SQL-like queries against the objects in your heap dump.
I have very rarely used this feature in the past, because the standard queries are good enough almost all of the time, and because writing your own command in Java is not that difficult.
Still, here it would help me to find out how many of the Rect instances were empty.
I entered
select * from android.graphics.Rect where bottom=0 and left=0 and right=0 and top=0
This showed me that out of 1320 Rect instances, 941 were "empty", which means that over 70% of the instances were "useless".
I used "immediate dominators" on the first 20 of those 941 instances, which showed me that those empty Rect instances were retained by several UI related
classes which are subclasses of android.graphics.drawable.Drawable.
I checked the source and quickly found that Drawable always instantiates a Rect using the default constructor:
public abstract class Drawable {
    private int[] mStateSet = StateSet.WILD_CARD;
    private int mLevel = 0;
    private int mChangingConfigurations = 0;
    private Rect mBounds = new Rect();
Although this issue might not increase the memory usage of Gmail that much, it's still a fundamental/systematic problem, which IMHO should be fixed.
Even if we tolerated the memory overhead, we would still be left with the problem that the number of objects the garbage collector has to trace during a full GC is higher than needed, which at least potentially can lead to a sluggish UI.
It's also a fundamental problem in the sense that all subclasses of Drawable inherit it.
So instead of always calling the default constructor, it would most likely be better to use lazy initialization and only initialize the field when it's first needed.
A copy-on-write strategy could also be used: share a single "empty" Rect instance in the beginning and replace it with a new one as soon as the field is written (I haven't checked the code to see whether this can still be done).
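To illustrate the idea, here is a minimal sketch of the lazy-initialization variant (simplified by me; not the actual Android source, the names merely mirror the snippet above):

public abstract class Drawable {
    private Rect mBounds; // no longer allocated eagerly in the constructor

    public Rect getBounds() {
        if (mBounds == null) {
            mBounds = new Rect(); // allocate only when someone actually needs the bounds
        }
        return mBounds;
    }
}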
Disclaimer:
This is not a sign that Android is a poor platform. This is the kind of problem that I've seen more than once in the past, and I'm sure this antipattern can be found in a lot of software out there.
UPDATE:
The issue has been fixed a few hours (or minutes?) after my post!
Check https://review.source.android.com/Gerrit#patch,sidebyside,9684,2,graphics/java/android/graphics/drawable/Drawable.java to see how simple the change was!
Just take the Eclipse Memory Analyzer and have a look at your application(s).
May the force be with you! ;)
Update 2:
In Android 2.3 (Gingerbread) a new command is available to trigger the heap dump.
The kill command is not supported anymore.
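If I remember correctly, the replacement is the am dumpheap command; a sketch, with <pid> and the output path as placeholders:
adb shell am dumpheap <pid> /data/local/tmp/dump.hprof
adb pull /data/local/tmp/dump.hprof
The pulled file may still need the hprof-conv step described above, depending on the platform version.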
Monday, March 23, 2009
Leaks are easy to find, but memory usage analysis is a bit more difficult
Leaks again
Last time I talked about how easy it is to find memory leaks in Java using the dominator tree feature of the Eclipse Memory Analyzer.
If you haven't read that post, I recommend you do so, because this post assumes that you know the meaning of the terms "retained size" and "dominator tree".
Why does this work so well? The reason is that memory leaks in Java are not really "classical" leaks in the strict sense. Let's check what Wikipedia says about memory leaks:
"many people refer to any unwanted increase in memory usage as a memory leak, even if this is not strictly accurate"The later cannot happen in languages such as Java that have built-in automatic garbage collection, also Ruby does not seem to be bug free in this area.
and
"Typically, a memory leak occurs because dynamically allocated memory has become unreachable."
So because those leaks in Java are "only" an unwanted (unbounded) increase in memory usage, the typical reason for them is that people forget to remove an object from some collection/array or from a recursive data structure, such as a tree. This might sound stupid, and you (and I) would of course never make such a simple mistake ;)
But look at the following example:
try
{
    doSomething(thing); // does IO
    collection.remove(thing);
}
catch (IOException e)
{
    // should not happen
}
"thing" will not be removed if "doSomething" throws an IOException (or any other exception). OMG Joel Spolsky was right when he said:
"I consider exceptions to be no better than "goto's""
The correct way would be:
try
{
    doSomething(thing); // does IO
}
catch (IOException e)
{
    // should not happen
}
finally
{
    collection.remove(thing);
}
So, I have talked enough about leaks. I promise you: if you regularly analyze heap dumps taken at the end of a performance test run, then after a while of fixing you will not see a lot of leaks anymore. If you still think that you need to know more about leaks, I recommend you check this excellent tutorial.
High memory usage
You might still see high memory usage, and your users might hate that as much as leaks, because the performance degradation can be similar.
OK, so why is high memory usage more difficult to analyze?
You might use the dominator tree to find some big objects and you might also use it to figure out some cause of high memory usage. Because it's a tree it's easy to see where the memory is used:
You just have to look at the paths down the tree to find out where the most memory is used/retained by single objects.
But in general the dominator tree view alone (without using some advanced functions that I will skip for now) will not help you find the reason why, for example, all those Strings are there:
Fortunately, there is the "immediate dominators" query in the Eclipse Memory Analyzer, which is based on the dominator tree and can help here; it is also used internally by most of the advanced queries. The "immediate dominators" query is one of the key innovations in the Eclipse Memory Analyzer. Even the commercial YourKit profiler does not seem to have it yet, although it now also offers dominator tree functionality.
Immediate Dominators
So what is an "immediate dominator"? Let's have a look at a simple example where the "business object" of class "Order" references a LinkedList of Strings:
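In code, the example object graph corresponds roughly to this sketch (Order is just an illustrative class of mine, not from a real library):

import java.util.LinkedList;
import java.util.List;

// The "business object" holding a LinkedList of Strings.
class Order {
    List<String> items = new LinkedList<String>();
}

public class OrderExample {
    public static void main(String[] args) {
        Order order = new Order();
        order.items.add("String 0"); // each element is wrapped in an internal LinkedList$Entry node
        order.items.add("String 1");
        order.items.add("String 2");
    }
}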

If we look at String 2 first, we find that LinkedList$Entry 2 is the "closest" object that dominates it. If we could remove LinkedList$Entry 2, the object String 2 would also be reclaimed by the garbage collector. We say that "LinkedList$Entry 2" is the immediate dominator of "String 2". Note that there's always exactly one immediate dominator for each object (apart from the root), which is why the dominators form a tree.
Let's have a look at the immediate dominators up to the Order object:

Note that LinkedList$Entry 1 is not an immediate dominator of LinkedList$Entry 2, because after removing it there would still be a link from LinkedList$Entry 0 to LinkedList$Entry 2. We can do the same for String 0 and String 1, and we get the dominator tree:

Now, if we ask ourselves the question "why are all those Strings still there?", we see that it's easy once we filter all JDK classes out of the dominator tree:

The immediate dominators query in MAT basically lets you walk up the dominator tree and shows you the dominators aggregated by class:
This is really a screenshot from an existing heap dump that I took from Eclipse some time ago. You can see, for example, the famous dictionary of the spell checker plugin retaining 74393 Strings.
So now how can I find out where memory usage could be reduced?
With Strings it's pretty easy: you use the group_by_value query in MAT. For the example above, I applied it to the Strings dominated by ResolvedBinaryField in the first line:
Yes, there really are 6969 duplicates of "Ljava.lang.String;" retained only by instances of this class! Disclaimer: and no, dear NetBeans "fanboys", Eclipse is not really worse than your beloved IDE in this area ;)
Strings are immutable, and I wonder what would happen if people really used more immutable data structures.
But Strings are not the only interesting thing when you look at minimizing memory usage; they are just convenient because they are (usually) human readable. You can still often use Strings to find objects which are equal but not identical, because when equal-but-not-identical objects are created, those objects usually also reference Strings that are equal but not identical.
The main question that you always have to ask yourself when trying to minimize memory usage is:
Do I need these equal but not identical objects?
In a single-threaded environment the answer is usually that you don't need those copies of objects.
In a highly concurrent environment, reducing the copies might introduce contention, because you have to share objects and you will need to check whether you already have this object. Strings again are relatively safe to optimize in this regard, because they are immutable, so no synchronization is needed to access them.
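As a sketch of how such copies can be eliminated in a single-threaded setting (Canonicalizer is a hypothetical helper of mine, not a MAT or JDK class):

import java.util.HashMap;
import java.util.Map;

// Returns one canonical instance per distinct value, so callers can
// drop equal-but-not-identical copies. Not thread safe.
class Canonicalizer<T> {
    private final Map<T, T> pool = new HashMap<T, T>();

    T canonical(T value) {
        T existing = pool.get(value);
        if (existing != null) {
            return existing; // reuse the instance we already have
        }
        pool.put(value, value);
        return value;
    }
}

For Strings, String.intern() provides a similar effect without a hand-written pool.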
Having a query in MAT for the "algorithm" I described here for finding duplicated Strings would be very helpful (there is a similar but simpler "duplicated Strings" query already built in).
I have done exactly that quite some time ago, but the query was not yet "production ready". There's some hope that it will appear in the standard MAT soon, stay tuned!
Labels: java, memory, memory analyzer, performance
Monday, March 09, 2009
JavaFX's memory overhead, some (high) numbers
Lately I took a look at the TweetBox Twitter client, which is built with JavaFX (© 2008-2009 Sun Microsystems). I thought it would be fun to check whether a Java-based Twitter client would have fewer memory usage problems than the popular Adobe AIR (© Adobe) based clients, such as TweetDeck. If you search for memory leaks in Twitter clients, you will find a lot of references to TweetDeck, which might just be caused by its popularity. Because JavaFX is JVM-based, it actually has an advantage over Adobe AIR in the area of memory usage analysis.
TweetBox currently does not perform very well, but the author is working on improving it and I also tried to help him out a little bit.
When looking at the memory usage with my favorite tool, I was quite surprised to see the high overhead of variables in JavaFX.
Let's have a look at some numbers for the shallow size of variables (32 bit CPU):
Shallow Size
Boolean: 24 bytes (Yes, this is for one bit of information)
Integer: 24 bytes
Objects: 24 bytes
Floats: 24 bytes
Sequence: 32 bytes (never smaller than 48 bytes)
JavaFX objects seem to have a relatively large number of fields, so this overhead might really become significant, although, as always, it depends on your application. For comparison, a boolean in Java takes 1 byte, and objects have a general overhead of 8 bytes (Sun 32-bit JVM). [updated] Note that even a Boolean object in Java would in practice only require 4 bytes for the reference, because only 2 instances (for true and false) are needed.
An idea could be to have a native BitSet in JavaFX, which would only need one bit for each boolean value (plus a fixed overhead).
For details about the memory usage of Java objects check my post here.
But when looking at the retained size, it gets even worse. Due to JavaFX's fancy binding mechanism, those objects often reference other objects, which causes the retained size to "explode".
Average "Retained size"
Boolean: 111 bytes
Integer: 53 bytes
Objects: 76 bytes
Floats: 53 bytes
Sequence: 84 bytes
111 bytes on average (!) for a Boolean is absurdly high, and I'm not sure yet whether that's because TweetBox uses it in the wrong way or because of a bug. It is caused by long chains of com.sun.javafx.runtime.location.DynamicDependentLocation.
I have no clue what that is and even Google doesn't seem to know.
I hope JavaFX's memory usage will improve over time. Right now you have to be very careful not to run into memory usage problems.
Note that other dynamic languages on the Java stack also currently have a higher memory overhead than plain Java, because most of them can't map directly to the Java object model.
I assume that the footprint of JavaFX Mobile is similar.
IMHO Google's decision to go with plain Java on its Android platform was a smart one.
[update]: Michael Heinrichs just blogged about the difference between "def" and "var" in JavaFX Mobile. It seems to me that "def" could help to optimize memory usage.
Monday, February 16, 2009
Memory leaks are easy to find
Last time I talked about the dominator tree and how it can be used to find the biggest objects in your heap easily.
So what exactly is the dominator tree?
Dominators
An object A dominates an object B if all paths to object B pass through object A.
Remember that the garbage collector removes all objects that are not referenced anymore. If A dominates B, then removing A from memory means there is no path left that leads to B. B is therefore unreachable and would be reclaimed by the garbage collector.
One could also say that A is the single object that is responsible for B still being there!
The Dominator Tree
Using the "dominates" relationship, we can create a dominator tree out of the graph of objects in memory. At each node of this tree we store the amount of memory that would be freed (= retained size).
At the top of the tree we have a "virtual" root object, which we also use to represent objects that don't have a "real" single dominator.
Here's an example of an object tree (on the left) and the corresponding dominator tree (on the right):
- Note that A, B and C are dominated by a "virtual" root object.
- Note that the dominator relationship is transitive; C dominates E, which dominates G, therefore C also dominates G.
Because of this transitivity, the retained size of a parent object within the dominator tree is always greater than the sum of its child objects' retained sizes.
To get the biggest objects we just need to sort the second level of the dominator tree (the first level is the "virtual" root object) by retained size.
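As a sketch of that relationship (DominatorNode is a hypothetical type of mine, not MAT's actual implementation):

import java.util.ArrayList;
import java.util.List;

class DominatorNode {
    long shallowSize;
    List<DominatorNode> children = new ArrayList<DominatorNode>();

    // Retained size = own shallow size plus the retained sizes of all
    // dominated objects, so a parent's retained size is always greater
    // than the sum of its children's retained sizes.
    long retainedSize() {
        long total = shallowSize;
        for (DominatorNode child : children) {
            total += child.retainedSize();
        }
        return total;
    }
}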
Now, if you are looking for a memory leak and you have no a priori knowledge that could help you, the typical approach is to run a test that reproduces the leak and then try to somehow figure out what is leaking.
Do we really need object allocation tracing?
In my experience, people often seem to believe that finding leaks requires recording object creations, sometimes also called "object allocation tracing", because you want to know where in the code objects are allocated but never released.
Tess Ferrandez, ASP.NET escalation engineer at Microsoft, has an example of how this method for finding leaks can be applied to .NET applications. Dear Microsoft, I encourage you to look at the Eclipse Memory Analyzer ;)
All you need is the dominator tree
With the dominator tree, finding these kinds of leaks is much easier, and you don't need the high overhead of allocation tracing, which is typically not acceptable in a production environment. Allocation tracing has its uses, but in general IMHO it is overrated.
It's almost guaranteed that a significant memory leak will show up at the very top of the dominator tree!
The Eclipse Memory Analyzer not only supports the dominator tree but can even find memory leaks automatically, based on some heuristics.
The dominator tree can also be used to find high memory usage in certain areas, which is a much harder problem. More on this in the next post.
Labels: eclipse, java, memory, performance
Tuesday, February 03, 2009
How to really measure the memory usage of your objects
Measuring and analyzing memory consumption in Java and other OOP languages is not an easy task, because all the objects form a big, highly connected graph. Only counting the flat, also called "shallow", size of the objects is not always helpful:
This figure shows a typical memory histogram. In general this table does not help very much to figure out which objects really cause the high memory usage. Typically String and char[] will show up in the list of the classes with the highest shallow size, just because they are frequently used.
There is probably a correlation between the high number of char[] instances and the high number of String instances (indicated by the yellow arrows), but we cannot really know from the information in this table. We also don't know whether the Order object could be the root cause for all those objects.
So how can we find out who is really responsible for the memory usage?
Let's take a step back and ask ourselves how Garbage Collection works in a language like Java (and also in other "modern" languages ;)).
Garbage Collection
The general idea is that the garbage collector will reclaim all objects that cannot be reached from a so-called "GC root object".
GC root objects are:
- Local variables on the stack
- Parameters of functions on the stack
- JNI native references
- All classes loaded by the bootstrap Loader
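As a minimal sketch of what reachability from such roots means (my own example):

public class RootsExample {
    static int[] cached = new int[1000]; // reachable through a loaded class -> not reclaimed

    public static void main(String[] args) {
        int[] local = new int[1000]; // reachable through a local variable on the stack
        int[] temp = new int[1000];
        temp = null; // the array temp pointed to is now unreachable and can be reclaimed
        System.out.println(local.length + cached.length);
    }
}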
"How much memory would be freed, if we would remove all those "Order" objects and afterwards a full GC would run?"
This would give us a measure for how much memory would be freed if those objects would not be there.
We call this measure the "retained size". The set of objects that would be freed is called the "retained set".
With the "retained set" we can figure out how many Strings would be freed if those "Order" objects would not be there. We can therefore solve the problem to find the objects "responsible" for the memory usage.
Definition "Retained size/set"
Retained set of X is the set of objects which would be collected by the GC when X is garbage collected.
Retained size of X is the sum of shallow sizes of all objects in the retained set of X, i.e. memory kept alive by X.
Generally speaking, shallow size of an object is its "flat" size in the heap whereas the retained size of the same object is the amount of heap memory that will be freed when the object is garbage collected.
The retained set for a leading set of objects, such as all objects of a particular class or all objects of all classes loaded by a particular class loader or simply a bunch of arbitrary objects, is the set of objects that is released if all objects of that leading set become inaccessible. The retained set includes these objects as well as all other objects only accessible through these objects. The retained size is the total heap size of all objects contained in the retained set.
How to find the object with the biggest retained size?
The Eclipse Memory Analyzer was built from the beginning with support for computing the retained set/size of objects.
The retained size was pioneered, as far as I know, by the commercial YourKit profiler (great product!). When we started to develop the Eclipse Memory Analyzer (SAP Memory Analyzer at that point in time), YourKit already had this very nice feature to find the single object with the biggest retained size. We still don't know how YourKit does it, but at that point in time they certainly used a different approach than we do now. The reason is that it took YourKit several minutes to get the first 10 (I think) biggest objects (in terms of retained size) in the heap. We knew that this was still much faster than the naive method, which would have to simulate a garbage collector run for each object in the heap. With millions of objects on the heap this would have taken years!
So I went on to research what kind of graph algorithms could help us speed things up. After a lot of Google searches I finally found the dominator tree.
The dominator tree
The dominator tree is a data structure that allows you to answer the question about the biggest objects in almost no time. After further research I found that it could be built in (almost) O(n) time, by a complicated algorithm invented by Lengauer and Tarjan. Since the plan by the architect of the project was always to store precomputed information on disk, we could compute the dominator tree once and then store it on disk. Therefore opening a heap dump for the first time is a little bit slow (but still often faster than with any other tool), whereas opening it again is blazing fast (just a few seconds).
Still, implementing the algorithm was a high risk, because at that point in time the relatively new paper "Finding Dominators in Practice" talked about graphs with at most a few thousand nodes, whereas we wanted to run the algorithm on a few million nodes on a 32-bit machine.
Still, the team got it done (I was always only in a "consultant" role), and it turned out that the dominator tree would not only speed up finding the biggest objects, but would also significantly simplify finding the root causes of high memory usage.
More on this in the next post.
Stay tuned!
Labels: memory, memory analyzer
Monday, January 26, 2009
First Android/Dalvik heap dump loaded into the Memory Analyzer

Here you go.
I thought that Linux would be a better choice than Windows :)
Ok, to be honest the converter tool from the Dalvik format to hprof did not work on Windoof.
This is showing an old version of the SAP Memory Analyzer, because I'm lazy and no newer version was available on the Linux machine.
Thanks to Romain Guy for his assistance.
More on this tomorrow.
[update:] Could not resist and looked for duplicated Strings:

There's quite some potential for optimizations ...
Monday, December 15, 2008
How much memory is used by my Java object?
There's currently some buzz around the size of certain Java objects (update: more on this below). Here is a very good description of what you have to take into account to compute the (shallow) size of a Java object.
I repost here the rules from my old blog at SDN, because I think they are a more compact description of the memory usage of Java objects.
The general rules for computing the size of an object on the Sun/SAP VM are:
32 bit
Arrays of boolean, byte, char, short, int: 2 * 4 (Object header) + 4 (length-field) + sizeof(primitiveType) * length -> align result up to a multiple of 8
Arrays of objects: 2 * 4 (Object header) + 4 (length-field) + 4 * length -> align result up to a multiple of 8
Arrays of longs and doubles: 2 * 4 (Object header) + 4 (length-field) + 4 (dead space due to alignment restrictions) + 8 * length
java.lang.Object: 2 * 4 (Object header)
other objects: sizeofSuperClass + 8 * nrOfLongAndDoubleFields + 4 * nrOfIntFloatAndObjectFields + 2 * nrOfShortAndCharFields + 1 * nrOfByteAndBooleanFields -> align result up to a multiple of 8
64 bit
Arrays of boolean, byte, char, short, int: 2 * 8 (Object header) + 4 (length-field) + sizeof(primitiveType) * length -> align result up to a multiple of 8
Arrays of objects: 2 * 8 (Object header) + 4 (length-field) + 4 (dead space due to alignment restrictions) + 8 * length
Arrays of longs and doubles: 2 * 8 (Object header) + 4 (length-field) + 4 (dead space due to alignment restrictions) + 8 * length
java.lang.Object: 2 * 8 (Object header)
other objects: sizeofSuperClass + 8 * nrOfLongDoubleAndObjectFields + 4 * nrOfIntAndFloatFields + 2 * nrOfShortAndCharFields + 1 * nrOfByteAndBooleanFields -> align result up to a multiple of 8
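As a quick worked example of the 32-bit rules above (my own arithmetic, so treat it as a sketch):

class Point {                 // 8 (object header) + 4 (int x) + 4 (int y) = 16 -> already a multiple of 8
    int x;
    int y;
}

class Point3D extends Point { // 16 (superclass) + 4 (int z) = 20 -> aligned up to 24
    int z;
}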
Note that an object might have unused space due to alignment at every inheritance level (e.g. imagine a class A with just a byte field, and a class B that has A as its superclass and declares a byte field itself -> 14 bytes wasted on a 64-bit system).
In practice 64 bit needs 30 to 50% more memory due to references being twice as large.
[UPDATE]
How much memory does a boolean consume?
Coming back to the question of how much a boolean consumes: yes, it does consume at least one byte, but due to alignment rules it may consume much more. IMHO it is more interesting to know that a boolean[] will consume one byte per entry, not one bit, plus some overhead due to alignment and for the size field of the array. There are graph algorithms where large fields of bits are useful, and you need to be aware that if you use a boolean[], you need almost exactly 8 times more memory than really needed (1 byte versus 1 bit).
Alternatives to boolean[]
To store large sets of booleans more efficiently, a standard BitSet can be used. It packs the bits into a compact representation that only uses one bit per entry (plus some fixed overhead).
For the Eclipse Memory Analyzer we needed large fields of booleans (millions of entries), and we know in advance how many entries we need.
We therefore implemented the class BitField, which is faster than BitSet.
Still, we only use this class on single-core machines, because its disadvantage is that it is not thread safe. On multicore machines we use boolean[]; I will explain in a later post how that works, because it's pretty tricky.
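A quick sketch of the boolean[] versus BitSet trade-off (the sizes in the comments are rough estimates following the rules above):

import java.util.BitSet;

public class BitsExample {
    public static void main(String[] args) {
        boolean[] flags = new boolean[1000000]; // roughly 1,000,000 bytes plus the array header
        BitSet bits = new BitSet(1000000);      // roughly 125,000 bytes, packed into a long[]
        flags[42] = true;
        bits.set(42);
        System.out.println(flags[42] + " " + bits.get(42));
    }
}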
Note that these rules only tell you the flat (= shallow) size of an object, which in practice is pretty useless on its own.
I will explain in my next post, how you can better measure how much memory your Java objects consume.
Labels: memory
Thursday, November 06, 2008
Is Eclipse bloated? Some numbers and a first analysis
On the e4 (Eclipse 4) mailing list there were lately some discussions about whether Eclipse is bloated or not, and what could be done to minimize bloat for e4. Some summary information can be found here.
I promised to get some numbers about how much memory is used by the JIT for the classes.
I finally managed it :)
So I took Eclipse 3.5M2 (without the web tools), configured it to use the SAP JVM (which is based on Sun's HotSpot, so the numbers should be comparable), and used the Eclipse Memory Analyzer (plus the non-open-source, but free, SAP extensions) to generate a CSV file that I then imported into IBM's manyeyes (great stuff!) to get the data visualized as a treemap.
The numbers do not include the overhead of the compiled code cache which was around 10 Mbyte.
Here comes the memory usage (in bytes) per bundle for Eclipse 3.5M2, after clicking around to activate as many plugins as possible (a more well-defined test would be a good idea):
A detailed analysis would be very time consuming, because you would need to check for a lot of classes whether there's anything that could be done to make them smaller. So for now, here are just some interesting observations that I made when quickly looking over the results:
- 12649 classes were loaded and consumed around 64 Mbyte in Perm space
- during the test I ran out of perm space, which was configured by default to 64 Mbyte, and the UI froze :(
- there does not seem to be any large amount of generated code, and therefore optimizing classes might be difficult
- the help system uses JSPs (generated classes) which are relatively big, although only a few of them were in memory
- 247 relatively big classes in org.eclipse.jdt.internal.compiler were loaded twice, once by the Jasper plugin and once by the JDT core plugin
So is Eclipse really bloated?
To me it seems it's overall not very bloated. Remember that a developer PC these days probably has 2 Gbyte of RAM, so 64 Mbyte hardly matters.
The help system could probably be optimized, because it uses a complete servlet implementation (Jetty), including JSPs, as well as a complete text search engine (Lucene).
Heap consumption could be a bigger issue. I've seen things that you should almost always avoid, like holding parsed DOM trees in memory. I may explain these issues in a later post, and I will probably show some examples in my next talk.
Labels: eclipse, memory, performance
Tuesday, November 04, 2008
Eclipse Memory Analyzer Talk
On 25 November I will give a talk (in German) about the Eclipse Memory Analyzer at the Mannheim Java User Group.
See here for the details.
I will explain how MAT works, and I will also show some real-world examples.
Monday, October 27, 2008
The knowledge about how much memory things need in Java is surprisingly low
I just came across the question "Size of a byte in memory - Java" at stackoverflow.
OK, stackoverflow might not be the place for high quality answers, but that almost nobody got close to a meaningful answer for this day-to-day question is a little bit shocking.
I documented the rules for Java memory usage of the Sun JVM on my old blog two years ago.
I don't expect people to know the rules exactly (I don't either), but some feeling for where a JVM aligns objects and what kind of overhead an object has is essential Java programming knowledge.
It's not really necessary that you know all the rules because the Eclipse Memory Analyzer knows about them.
Still, I think I understand why people might not know the details: they are platform-dependent (32 bit versus 64 bit) as well as JVM-implementation-dependent, and they were not documented for a long time.
Not everybody has access to JVM hackers ;)
Let's come back to the question of how much a byte costs. A byte really costs 8 bits = 1 byte (that is defined by the Java spec), but the Sun JVM (other JVMs do that as well) aligns objects to 8 bytes.
So it all depends on what other fields you have defined in your object. If you have 8-byte fields and nothing else, everything is properly aligned and you do not waste memory.
On the other hand, you could add a byte field to an object and not consume more memory. For example, you could have one field that only consumes 4 bytes; due to alignment you already waste 4 bytes, but adding the byte field costs nothing because it fills in the padded bytes.
The same is true for byte arrays: they are aligned as well, and there's additional space needed for the length field of the array. But for large byte arrays the average consumption is very close to one byte per element.
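To make the padding point concrete, here is a small sketch using the 32-bit rules from that old post (my own arithmetic):

class OneInt {       // 8 (header) + 4 (int) = 12 -> aligned up to 16, so 4 bytes are padding
    int value;
}

class IntPlusByte {  // 8 + 4 + 1 = 13 -> still aligned up to 16; the byte fits into the former padding
    int value;
    byte flag;
}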
Tuesday, October 21, 2008
Support for IBM JVMs now available for the Eclipse Memory Analyzer
Just a short note.
It was just brought to my attention that the Eclipse Memory Analyzer now also supports system dumps from IBM virtual machines for Java version 6, version 5.0 and version 1.4.2.
The IBM DTFJ adapter is available as an Eclipse plugin and can be downloaded from developerWorks.
This is great news, because a lot of people had asked for IBM support, and because it means that MAT is now modular enough to support other heap dump formats as well.
Maybe Google would like to support Dalvik (Android)? ;)
Tuesday, June 24, 2008
Eclipse Memory Analyzer considered to be a must-have tool
Philip Jacob thinks that the Eclipse Memory Analyzer is a must-have tool:
"I also had a little incident with a 1.5Gb heap dump yesterday. I wanted to analyze it after one of our app servers coughed it up (right before it crashed hard) to find out what the problem was. I tried jhat, which seemed to require more memory than could possibly fit into my laptop (with 4Gb). I tried Yourkit, which also stalled trying to read this large dump file (actually, Yourkit’s profiler looked pretty cool, so I shall probably revisit that). I even tried firing up jhat on an EC2 box with 15Gb of memory… but that also didn’t work. Finally, I ran across the Eclipse Memory Analyzer. Based on my previous two experiences, I didn’t expect this one to work… but, holy cow, it did. Within just a few minutes, I had my culprit nailed (big memory leak in XStream 1.2.2) and I was much further along than I was previously."
Thanks, Philip, for the positive feedback!
I didn't know that EC2 supports big multi-core boxes. That is very interesting, because the Eclipse Memory Analyzer does take advantage of multiple cores and of the available memory on 64-bit operating systems. It will "fly" on one of these boxes.
Tuesday, June 03, 2008
A classical Finalizer problem in Netbeans 6.1
Recently I tried the Netbeans UML module to sketch some simple use case diagrams.
It worked pretty well, but it didn't feel very responsive all the time. I quickly checked the memory consumption and found that it was much higher than during my last test.
I therefore took another heap dump. Here comes the overview.
Finalizers?
So this time Netbeans needed 74.2 Mbyte, much more than last time.
Surprisingly, 15.5 Mbyte alone are consumed by instances of the class java.lang.ref.Finalizer.
Such a high memory usage caused by Finalizer instances is not normal.
Usually you would see Finalizer instances using only a few hundred Kbyte.
Next I simply checked the retained set (the objects that would be reclaimed if I could remove the Finalizer instances from memory) of these Finalizer instances:
So int[] arrays are consuming most of the memory. I again used the "immediate dominator" query on those int[] arrays to see who is keeping them in memory:
So let's take a look at those sun.awt.image.IntegerInterleavedRaster instances and see who is referencing them:
Can we blame Tom Sawyer?
We see again that java.awt.image.BufferedImage is involved, as well as Java2D.
The sun.java2d.SunGraphics2D holding the surfaceData is referenced by com.tomsawyer.editor.graphics.TSEDefaultGraphics (what a nice package name).
Let's look at the code of sun.java2d.SunGraphics2D:
public void dispose()
{
surfaceData = NullSurfaceData.theInstance;
invalidatePipe();
}
public void finalize()
{
}
"dispose" should clean surfaceData, but at least to to me it seems that nobody has called it.
So I decompiled TSEDefaultGraphics and found dispose to be empty:
public void dispose()
{
}
So my guess is (without digging deeply into the code) that TSEDefaultGraphics needs to be fixed to call dispose() on the SunGraphics2D it references.
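Just to illustrate what such a fix could look like (a sketch only; I don't know the real field names in TSEDefaultGraphics, so "graphics" is my assumption):
public void dispose()
{
    if (graphics != null)
    {
        graphics.dispose(); // lets SunGraphics2D release its surfaceData
        graphics = null;
    }
}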
At the End
What this shows is that you not only need to be very careful when implementing finalize(), but you also need to check whether you use objects that implement finalize().
Objects that really need to implement finalize() should be small and should not reference large objects.
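As an illustration of that rule, here is a minimal sketch of the usual pattern (my own example, not code from Netbeans or the JDK): move finalize() into a small helper object so that the finalizer queue never keeps the large payload alive.
class ImageResource
{
    private final int[] pixels = new int[1024 * 1024]; // large payload, no finalize()
    private final Cleaner cleaner = new Cleaner();     // tiny finalizable helper
    // The static nested class holds no reference to ImageResource, so once an
    // ImageResource becomes unreachable the pixels can be reclaimed immediately;
    // only the small Cleaner waits in the finalizer queue.
    private static final class Cleaner
    {
        protected void finalize() throws Throwable
        {
            try
            {
                // release the native or graphics resource here
            }
            finally
            {
                super.finalize();
            }
        }
    }
}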
Wednesday, May 28, 2008
Memory consumption of Netbeans versus Eclipse: an analysis
I recently analyzed the memory consumption of Eclipse and found that it should be easy to optimize it.
This time I will take a look at Netbeans (6.1, the Java SE pack) using the exact same setup.
First, the overall memory consumption of Netbeans is only a little bit higher: 24 Mbyte versus 22.7 Mbyte for Eclipse.
Keep in mind that the Eclipse memory usage includes the spell checker, which needs 5.6 Mbyte and can easily be turned off. Without the spell checker, Eclipse would need only 17.1 Mbyte.
The overview of the Memory Analyzer shows that the biggest memory consumer, with around 5.4 Mbyte, is sun.awt.image.BufImgSurfaceData.
Swing overhead(?)
This seems to be caused by the fact that Swing uses Java2D, which does its own image buffering independent of the OS. I could easily figure this out by using the Memory Analyzer's "path to GC roots" query.
So maybe we pay here for the platform independence of Swing?
I quickly checked using Google whether there are ways around this image buffering, but I couldn't find any clear guidance. If there are Swing experts reading this, please let me know your advice.
Duplicated Strings again
Again I checked for duplicated Strings using the "group_by_value" feature of the Memory Analyzer.
Again, some Strings exist many times.
This time I selected all Strings which exist more than once, then called the "immediate dominators" query, and afterwards used the "group by package" feature in the resulting view.
This view shows the sum of the number of duplicates in each package and therefore gives you a good overview of which packages waste the most memory because of objects keeping duplicates of Strings alive.
You can see, for example, that org.netbeans.modules.java.source.parsing.FileObjects$CachedZipFileObject keeps alive a lot of duplicated Strings.
Looking at some of these objects, you can see that one problem is the instance variable ext, which very often contains duplicates of the String "class".
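A fix for something like this is often a one-liner. A sketch (illustrative only; the real Netbeans code certainly looks different, and "setExtension" is an invented name):
private String ext;
void setExtension(String extension)
{
    // every duplicate of "class" collapses into the single interned instance
    this.ext = extension == null ? null : extension.intern();
}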
Summary
So in the end we found, for this admittedly simplistic scenario, that both Netbeans and Eclipse don't consume that much memory. 24 Mbyte is really not that much these days.
Eclipse seems to have a certain advantage, because turning off the spell checker is easy, and then it needs almost 30% less memory than Netbeans.
To be clear, it is still much too early to declare a real winner here. My scenario is just too simple.
The only conclusion that we can draw for sure for now is that this kind of analysis is pretty easy with the Eclipse Memory Analyzer :)
Labels:
eclipse,
java,
memory,
netbeans,
performance
Posted by
Unknown
Thursday, May 22, 2008
Eclipse memory leaks?
"ecamacho" picked up my post about the memory consumption of Eclipse on the spanish website http://javahispano.org/
Google translation works pretty well for the post and the comments are quite interesting.
Leaks in Eclipse?
One question was whether I was talking about leaks in Eclipse.
Actually I myself did not, but my colleague and project lead of the Eclipse Memory Analyzer project, Andreas Buchen, blogged about analyzing an actual leak in Eclipse. I can highly recommend the article, because it also shows how powerful the Memory Analyzer is.
Does turning off the spell checker help?
From what I have seen, yes, it should help. Check http://bugs.eclipse.org/bugs/show_bug.cgi?id=233156 for the progress on the spell checker issue.
Should we wait until the end to analyze the memory consumption of our applications?
IMHO we should not, because as we all know, fixes at the end of a software project are much more expensive than at the beginning.
I often hear:
"premature optimization is the root of all evil."
This has been misquoted just too often.
Hoare said (according to Wikipedia):
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."
Note the "small efficiencies" part.
That quote clearly doesn't say that you shouldn't investigate how many resources your application consumes in the early phases of your project.
Actually we have some experience with the automatic evaluation of heap dumps and we found that it pays off very well. This could be a topic for another blog post.
Labels:
java,
memory,
performance
Posted by
Unknown
Monday, May 19, 2008
Analyzing the Memory Consumption of Eclipse
During my talk on May 7 at the Java User Group Karlsruhe about the Eclipse Memory Analyzer, I used the latest build of Eclipse 3.4 to show live that there's room for improvement regarding the memory consumption of Eclipse.
Today I will show you how easy this kind of analysis is with the Eclipse Memory Analyzer.
I first started Eclipse 3.4 M7 (running on JDK 1.6.0_10) with one project, "winstone", which includes the source of the winstone project (version 0.9.10).
Then I did a heap dump using the JDK 1.6 jmap command:
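For reference, a typical invocation looks like this (the file name is arbitrary and <pid> is the process id of the Eclipse VM; the -dump flag exists since JDK 6):
jmap -dump:format=b,file=eclipse.hprof <pid>
The resulting HPROF binary file is what the Memory Analyzer opens.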
Since this was a relatively small dump (around 45 Mbyte), the Memory Analyzer parsed and loaded it in a couple of seconds.
In the "Overview" page I already found one suspect. The spellchecker (marked in red on the screen shot) takes 5.6Mbyte (24,6%) out of 22,7 Mbyte overall memory consumption!
That's certainly too much for a "non core" feature.
[update:] In the mean time submitted a bug (https://bugs.eclipse.org/bugs/show_bug.cgi?id=233156)
Looking at the spell checker in the Dominator tree reveals that the implementation of its dictionary is rather simplistic:
no trie, no Bloom filter, just a simple HashMap mapping from a String to a List of spell-checked Strings.
There's certainly room for improvement here by using one of the advanced data structures mentioned above.
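For illustration, here is a bare-bones trie (my sketch, ignoring the per-word data the spell checker actually stores): shared prefixes are stored only once instead of once per String. Whether it really saves memory depends on the fan-out of the dictionary, which is exactly the kind of thing you would verify with the Memory Analyzer afterwards.
class TrieNode
{
    private final TrieNode[] children = new TrieNode[26]; // 'a'..'z' only, for brevity
    private boolean isWord;
    void insert(String word)
    {
        TrieNode node = this;
        for (int i = 0; i < word.length(); i++)
        {
            int c = word.charAt(i) - 'a';
            if (node.children[c] == null)
                node.children[c] = new TrieNode();
            node = node.children[c];
        }
        node.isWord = true;
    }
    boolean contains(String word)
    {
        TrieNode node = this;
        for (int i = 0; i < word.length(); i++)
        {
            int c = word.charAt(i) - 'a';
            if (node.children[c] == null)
                return false;
            node = node.children[c];
        }
        return node.isWord;
    }
}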
My favorite memory consumption analysis trick
Now comes my favorite trick, which almost always works to find some memory to optimize in a complex Java application.
I went to the histogram and checked how many String instances are retained:
12 Mbyte (out of 22.7), quite a lot! Note that 4 Mbyte are from the spell checker above (how I computed that is not shown here), but that still leaves 8 Mbyte for Strings.
The next step was to call the "magic" "group by value" query on all those Strings.
This showed us how many duplicates of those Strings there are.
Duplicates of Strings everywhere
What does this table tell us? It tells us, for example, that there are 1988 duplicates of the same String "id", or 504 duplicates of the String "true". Yes, I'm serious. Before you laugh and comment how silly this is, I recommend you take a look at your own Java application :] In my experience (over the past few years) this is one of the most common memory consumption problems in complex Java applications.
"id" or "name", for example, are clearly constant unique identifiers (UIDs). There's simply no reason why you would want that many duplicates of UIDs. I don't even have to check the source code to claim that.
Let's check which instances of which class are responsible for these Strings.
I called the immediate dominator function on the top 30 dominated Strings.
org.eclipse.core.internal.registry.ConfigurationElement seems to cause most of the duplicates: 13,242!
If you look at the instances of ConfigurationElement, it's pretty clear that there's a systematic problem in this class. So this should be easy to fix, for example by using String.intern() or a map to avoid the duplicates.
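For completeness, here is what the map variant could look like (a sketch; intern() can be problematic on older Sun JVMs because interned Strings live in the permanent generation, so a plain map is sometimes preferred; all names here are mine, not from the Eclipse sources):
import java.util.HashMap;
import java.util.Map;
class StringPool
{
    private final Map<String, String> pool = new HashMap<String, String>();
    // Returns one canonical instance per distinct String value.
    synchronized String canonicalize(String s)
    {
        String existing = pool.get(s);
        if (existing == null)
        {
            pool.put(s, s);
            existing = s;
        }
        return existing;
    }
}
ConfigurationElement would then presumably run attribute names like "id" through such a pool when it is created.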
Bashing Eclipse?
Now you may think that this guy is bashing Eclipse, but that's really not the case.
If you beg enough, I might also take a closer look at Netbeans :]