Tuesday, December 15, 2009

Gory details of java.lang.String interning

While digging through the JDK source code today, I got a better understanding of how the garbage collector (GC) treats interned strings.

In the first versions of Java, interned strings were not collected at all. They accumulated in PermGen, so abusing the intern() call could very quickly end in an OutOfMemoryError (OOME). The current version of the JVM maintains the string cache in a smarter way.

Contrary to what some people say, interned strings are not kept as weak references; the actual approach is different. During the first part of the mark-and-sweep phase, the GC delegates to the static string table (a specialization of Hashtable) to get rid of all entries that are no longer alive. These entries are not deleted. Instead, they are relinked from their hashtable bucket (the linked list they reside in) to the linked list of free entries. For details, see the call chain:
   BasicHashtable::free_entry(BasicHashtableEntry* entry) ←
   void Hashtable::unlink(BoolObjectClosure* is_alive) ←
   StringTable::unlink(BoolObjectClosure* cl)
Here each function is called by the one listed below it.
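
To make the relinking idea more concrete, here is a minimal sketch in Java. It is a toy model, not HotSpot's actual code: the class ToyStringTable, its Entry type and all method names are hypothetical, and the assumption that new entries are taken from the free list first is mine. The only point is to show dead entries moving from a bucket chain onto a free list instead of being released.

```java
import java.util.function.Predicate;

public class ToyStringTable {
    // One node in a bucket chain; a hypothetical stand-in for a hashtable entry.
    static final class Entry {
        String literal;
        Entry next;
    }

    private final Entry[] buckets;
    private Entry freeList;   // recycled entries; never handed back to the allocator
    private int allocated;    // number of Entry objects ever created

    public ToyStringTable(int bucketCount) {
        buckets = new Entry[bucketCount];
    }

    private int indexFor(String s) {
        return Math.floorMod(s.hashCode(), buckets.length);
    }

    // Returns the canonical instance for s, adding it to the table if absent.
    public String intern(String s) {
        int i = indexFor(s);
        for (Entry e = buckets[i]; e != null; e = e.next) {
            if (e.literal.equals(s)) {
                return e.literal;
            }
        }
        Entry e = newEntry();
        e.literal = s;
        e.next = buckets[i];
        buckets[i] = e;
        return s;
    }

    // Assumption of this sketch: reuse a free entry before allocating a new one.
    private Entry newEntry() {
        if (freeList != null) {
            Entry e = freeList;
            freeList = e.next;
            e.next = null;
            return e;
        }
        allocated++;
        return new Entry();
    }

    // Moves every entry rejected by isAlive from its bucket to the free list.
    public void unlink(Predicate<String> isAlive) {
        for (int i = 0; i < buckets.length; i++) {
            Entry prev = null;
            Entry e = buckets[i];
            while (e != null) {
                Entry next = e.next;
                if (isAlive.test(e.literal)) {
                    prev = e;
                } else {
                    if (prev == null) {
                        buckets[i] = next;
                    } else {
                        prev.next = next;
                    }
                    e.literal = null;    // drop the string, keep the entry
                    e.next = freeList;
                    freeList = e;
                }
                e = next;
            }
        }
    }

    public int allocatedEntries() {
        return allocated;
    }

    public int freeEntries() {
        int n = 0;
        for (Entry e = freeList; e != null; e = e.next) {
            n++;
        }
        return n;
    }

    public int liveEntries() {
        int n = 0;
        for (Entry bucket : buckets) {
            for (Entry e = bucket; e != null; e = e.next) {
                n++;
            }
        }
        return n;
    }
}
```

In this model the number of allocated entries only ever grows: unlink() shrinks the bucket chains, but the entries themselves stay around on the free list until a later intern() call reuses them.
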
One important observation here is that the memory taken by a freed entry is not deallocated. That means the more distinct strings an application interns, the more PermGen memory is consumed. Consequently, if the JVM string table is used too intensively, for example by attempting to cache too many distinct strings, it is easy to cause an OutOfMemoryError. On the other hand, when the cached strings are known to contain a large percentage of duplicates, interning combined with fine-tuning of the PermGen size may significantly reduce overall memory consumption.
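
The duplicate-heavy case can be illustrated with a small sketch. The class InternDemo, its methods and the "city-" values are made up for the example; it only counts distinct String instances and does not measure real memory use, which depends on the JVM and its settings.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class InternDemo {
    // Builds `count` strings at run time, drawn from `distinct` different values.
    static List<String> build(int count, int distinct, boolean intern) {
        List<String> out = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            // StringBuilder.toString() creates a fresh String object every time.
            String s = new StringBuilder("city-").append(i % distinct).toString();
            out.add(intern ? s.intern() : s);
        }
        return out;
    }

    // Counts distinct objects by identity (==), not by equals().
    static int distinctInstances(List<String> strings) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String s : strings) {
            seen.put(s, Boolean.TRUE);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(build(10000, 10, false))); // 10000
        System.out.println(distinctInstances(build(10000, 10, true)));  // 10
    }
}
```

With only 10 distinct values among 10000 strings, interning leaves 10 canonical instances reachable from the list, and the other copies become garbage. If every value were different, interning would save nothing and would only grow the string table.
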
