Domingos Neto just posted "Busting java.lang.String.intern() Myths".
In general I like the post, because this is an important topic: in my experience, Strings typically consume about 20% to 50% of the memory in Java applications. It's therefore important to avoid useless copies of the same String in order to reduce memory usage.
But first, some comments on the post above:
Myth 1: Comparing strings with == is much faster than with equals()
Busted! Yes, == is faster than String.equals(), but in general it isn't nearly the performance improvement it is cracked up to be.
I agree that it doesn't make sense to intern Strings just to be able to use == instead of equals(). But the real reason is that String.equals() already checks == first. If your Strings are identical, you automatically get the speed advantage, because equals() will usually be inlined!
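To illustrate why comparing identical references with equals() is already cheap, here is a simplified sketch of the shape of String.equals(). It is not the actual JDK source (which differs between versions), only the idea: the identity check comes first, so comparing a String with itself never touches its characters. The class and method names are made up for illustration.

```java
class EqualsSketch {
    // Simplified model of String.equals(): identity first, then length, then characters.
    static boolean equalsSketch(String self, Object other) {
        if (self == other) {
            return true; // same object: no character comparison needed
        }
        if (!(other instanceof String)) {
            return false;
        }
        String that = (String) other;
        if (self.length() != that.length()) {
            return false;
        }
        for (int i = 0; i < self.length(); i++) {
            if (self.charAt(i) != that.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String literal = "Walldorf";
        String copy = new String("Walldorf".toCharArray()); // distinct object, same content

        System.out.println(literal == copy);                // false: different objects
        System.out.println(literal.equals(copy));           // true: compared character by character
        System.out.println(literal == copy.intern());       // true: string literals are already interned
        System.out.println(equalsSketch(literal, literal)); // true via the identity shortcut
    }
}
```

So interning everything just to use == buys little: when the references are identical, equals() returns after a single comparison anyway.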
Myth 2: String.intern() saves a lot of memory
Here I disagree. String.intern() can help you save a lot of memory, because it lets you avoid holding duplicate copies of Strings in memory.
Imagine you read a lot of Strings from a file, and some (or many) of these Strings are actually identifiers, such as the name of a city or of a type (class). If you don't use String.intern() (or a similar mechanism using a Set), you will hold copies of those Strings in memory. The number of these unnecessary copies often grows with the number of Strings you read, so deduplicating them can save a significant amount of memory.
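The effect can be sketched in a few lines. The example below simulates reading the same few city names many times (the names and the loop are made up for illustration) and counts how many distinct String objects are retained with no deduplication, with String.intern(), and with a simple map-based pool. The pool uses a HashMap rather than a Set, because a Map can hand back the canonical instance it already holds; it is a hypothetical helper, not a specific library, and it is not thread-safe.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class StringDedup {
    // Hypothetical canonicalizing pool: the first copy of each value is kept
    // and handed out for every later equal String.
    private final Map<String, String> pool = new HashMap<>();

    String canonicalize(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    // Counts distinct String objects by identity, not by equals().
    static int distinctInstances(List<String> strings) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        seen.addAll(strings);
        return seen.size();
    }

    public static void main(String[] args) {
        String[] cities = {"Walldorf", "Heidelberg", "Mannheim"};
        List<String> raw = new ArrayList<>();
        List<String> interned = new ArrayList<>();
        List<String> pooled = new ArrayList<>();
        StringDedup dedup = new StringDedup();

        for (int i = 0; i < 30_000; i++) {
            // A fresh copy with its own char array, like a String parsed from a file.
            String s = new String(cities[i % cities.length].toCharArray());
            raw.add(s);
            interned.add(s.intern());
            pooled.add(dedup.canonicalize(s));
        }

        System.out.println(distinctInstances(raw));      // 30000
        System.out.println(distinctInstances(interned)); // 3
        System.out.println(distinctInstances(pooled));   // 3
    }
}
```

With only three distinct values, both the interned and the pooled lists reference just three String objects, while the raw list keeps one copy per String read; the temporary copies become garbage as soon as nothing else references them.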
In my experience duplicated Strings are one of the most common memory usage problems in Java applications.
See, for example, my blog post about the memory usage of NetBeans versus Eclipse.
IMHO, the fact that interned Strings end up in perm space is not a big issue. These days you need to set perm space to fairly high values anyway; see, for example, my blog post about the perm space requirements of Eclipse.
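If you want to see how much of that space an application actually uses before choosing a value (on HotSpot JVMs before Java 8 the limit is set with -XX:MaxPermSize), the standard java.lang.management API can report it. This is a sketch that assumes HotSpot pool names ("... Perm Gen" on older versions, "Metaspace" on Java 8 and later); other JVMs may name their pools differently or have no perm space at all, in which case the report is simply empty.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class PermSpaceReport {
    // Lists usage of the memory pools that hold class metadata. The name
    // filter is an assumption based on HotSpot's pool naming.
    static List<String> report() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (!name.contains("Perm") && !name.contains("Metaspace")) {
                continue;
            }
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage == null) {
                continue;
            }
            // getMax() returns -1 if no maximum is defined
            lines.add(name + ": used=" + usage.getUsed() + " bytes, max=" + usage.getMax() + " bytes");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : report()) {
            System.out.println(line);
        }
    }
}
```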
Still, in the SAP JVM we introduced a new option to store interned Strings in old space.
Maybe someone wants to implement this option for the OpenJDK as well ;)
Issues with String.intern()
Now you might think that String.intern() is not problematic at all, but unfortunately there are a few issues.
- Not all JVMs have fast implementations of String.intern(). For example, HP's JVM had problems until recently.
- It introduces additional contention, and you have no control over it, because String.intern() is native.
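One way to regain some control is to deduplicate in application code instead of calling String.intern(). The sketch below is a hypothetical interner built on ConcurrentHashMap: it lives in the regular heap, the caller chooses its initial capacity, and the whole pool can be dropped once it is no longer needed. Whether it is faster than the native String.intern() depends on the JVM and the workload, so measure before switching.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ConcurrentInterner {
    // Hypothetical application-level interner; its size and lifetime are
    // controlled by the application rather than by the JVM.
    private final ConcurrentMap<String, String> pool;

    ConcurrentInterner(int expectedSize) {
        this.pool = new ConcurrentHashMap<>(expectedSize);
    }

    // Returns the canonical instance; putIfAbsent is atomic, so concurrent
    // callers with equal Strings all get the same object back.
    String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int size() {
        return pool.size();
    }

    // Drops the whole pool, e.g. after a file has been processed.
    void clear() {
        pool.clear();
    }

    public static void main(String[] args) throws InterruptedException {
        ConcurrentInterner interner = new ConcurrentInterner(1024);
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                interner.intern("id-" + (i % 100)); // each concatenation creates a new String
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(interner.size()); // 100
    }
}
```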

4 comments:
Excellent post! (I posted a tweet for it.) As you mentioned, performance can be very different amongst the various JVMs. To expand on that just a bit, not all JVMs even support the concept of permanent space. In other words, perm. gen. is a JVM-specific implementation detail. IIRC, older versions of JRockit did not even have perm. gen. (I think it put class metadata in the "regular" heap.)
Things get more complicated when you think about the JVM's garbage collector, which of course is also very implementation-dependent. As the original post pointed out, Sun's JVM does garbage collect perm. gen., but keep in mind that the algorithm used to determine when to collect it, how much to collect, etc. is *not* the same as the algorithm for the "regular" heap.
Hi Greg(?),
Thanks!
You are right, there are completely different JVM implementations, and I always tend to forget that :]
It's also true that storing the interned Strings in old space does not help much performance-wise.
It "only" makes it less likely that you will run into OOM errors, because old space is usually larger than perm space. OK, maybe you would also have fewer full GCs if old space is larger than perm space.
Yes, in most JVMs today you also have a new space, where short-lived objects are reclaimed more efficiently. But IMHO this is not an advantage here, because I would only intern Strings that are long-lived.
I guess you are javaperformance on Twitter?
Regards,
Markus
So I guess intern() would be a no-no in a J2ME environment...
Or, if it is set to null, does it get removed from perm space?
If you set it to null it will get removed after a full GC (typically, depends on the VM).