Learnitweb

Compact Strings in Java 9

1. Introduction

JEP 254, delivered in Java 9, introduced a more space-efficient internal representation of strings.

Before Java 9, a String was internally represented as a char[] array. Since Java uses UTF-16, each element of the char[] takes up 2 bytes (sixteen bits). Strings are used heavily in most applications and account for a large share of the heap. However, most String objects contain only Latin-1 characters, and a Latin-1 character needs only one byte of storage. In such cases, half of the space in the char array is unused. For example, a String that contains only English characters wastes one byte per character.
To address this issue, compact strings were introduced in Java 9.
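To make the arithmetic concrete, the following minimal sketch compares the number of bytes needed to hold the same text as UTF-16 and as Latin-1. The class and method names (StorageSizeDemo, utf16Bytes, latin1Bytes) are made up for this illustration, and the numbers cover only the character data, not the object and array headers that the JVM adds.

```java
import java.nio.charset.StandardCharsets;

class StorageSizeDemo {

    // Bytes needed to hold the text as UTF-16 code units (2 bytes per char, no BOM).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    // Bytes needed to hold the text as Latin-1 (1 byte per character).
    // Only meaningful when every character is in the range U+0000..U+00FF.
    static int latin1Bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        String text = "Hello, World"; // 12 characters, all Latin-1
        System.out.println("UTF-16 bytes:  " + utf16Bytes(text));  // prints 24
        System.out.println("Latin-1 bytes: " + latin1Bytes(text)); // prints 12
    }
}
```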

2. Compact String

Internally, a compact string in Java 9 is represented by a byte array plus an encoding-flag field. Based on the characters of the string, the byte array stores them either as ISO-8859-1/Latin-1 (one byte per character) or as UTF-16 (two bytes per character). The encoding flag indicates which encoding is used.
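The rule for picking an encoding can be sketched as a simple check: if every character fits in one byte (code unit 0xFF or lower), Latin-1 is enough; otherwise UTF-16 is needed. The class CoderChoiceDemo and the method chooseCoder below are hypothetical names used only for this illustration, and the two constants are local to the sketch. The JDK's real implementation is more optimized than this.

```java
class CoderChoiceDemo {

    // Illustrative flag values for this sketch only.
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // Returns LATIN1 when every char fits in one byte (<= 0xFF), otherwise UTF16.
    static byte chooseCoder(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return UTF16;
            }
        }
        return LATIN1;
    }

    public static void main(String[] args) {
        System.out.println(chooseCoder("cafe"));      // prints 0
        System.out.println(chooseCoder("caf\u00e9")); // prints 0 (U+00E9 is within Latin-1)
        System.out.println(chooseCoder("\u20ac5"));   // prints 1 (U+20AC is outside Latin-1)
    }
}
```

Note that a single character outside Latin-1 is enough to make the whole string use two bytes per character.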

If you look at the String class, you’ll find the following two fields:

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence,
               Constable, ConstantDesc {

    /**
     * The value is used for character storage.
     *
     * @implNote This field is trusted by the VM, and is a subject to
     * constant folding if String instance is constant. Overwriting this
     * field after construction will cause problems.
     *
     * Additionally, it is marked with {@link Stable} to trust the contents
     * of the array. No other facility in JDK provides this functionality (yet).
     * {@link Stable} is safe here, because value is never null.
     */
    @Stable
    private final byte[] value;

    /**
     * The identifier of the encoding used to encode the bytes in
     * {@code value}. The supported values in this implementation are
     *
     * LATIN1
     * UTF16
     *
     * @implNote This field is trusted by the VM, and is a subject to
     * constant folding if String instance is constant. Overwriting this
     * field after construction will cause problems.
     */
    private final byte coder;   

As you can see, the coder field records which encoding is used for the bytes in value.

In the String class, the two possible values of coder are defined as constants:

    @Native static final byte LATIN1 = 0;
    @Native static final byte UTF16  = 1;

Most of the String operations now check for the encoding, for example:

    boolean isLatin1() {
        return COMPACT_STRINGS && coder == LATIN1;
    }
	
    public int indexOf(int ch, int fromIndex) {
        return isLatin1() ? StringLatin1.indexOf(value, ch, fromIndex)
                          : StringUTF16.indexOf(value, ch, fromIndex);
    }
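The dispatch on coder can be illustrated with a small self-contained model. MiniString below is a hypothetical class written for this article, not JDK code: it stores its characters in a byte array, remembers the encoding in a coder field, and branches on that field in length() and charAt(). As a simplifying assumption, the UTF-16 branch writes the high byte first, whereas the JDK's own layout may differ.

```java
final class MiniString {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    private final byte[] value; // character storage
    private final byte coder;   // LATIN1 or UTF16

    MiniString(String s) {
        // Latin-1 is possible only if every char fits in one byte.
        boolean latin1 = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                latin1 = false;
                break;
            }
        }
        byte[] bytes;
        if (latin1) {
            bytes = new byte[s.length()];
            for (int i = 0; i < s.length(); i++) {
                bytes[i] = (byte) s.charAt(i);
            }
        } else {
            bytes = new byte[2 * s.length()];
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                bytes[2 * i] = (byte) (c >> 8); // high byte first (assumption of this sketch)
                bytes[2 * i + 1] = (byte) c;    // low byte
            }
        }
        this.value = bytes;
        this.coder = latin1 ? LATIN1 : UTF16;
    }

    boolean isLatin1() {
        return coder == LATIN1;
    }

    // Number of chars, not bytes: UTF-16 storage uses two bytes per char.
    int length() {
        return isLatin1() ? value.length : value.length / 2;
    }

    char charAt(int index) {
        if (isLatin1()) {
            return (char) (value[index] & 0xFF);
        }
        int hi = value[2 * index] & 0xFF;
        int lo = value[2 * index + 1] & 0xFF;
        return (char) ((hi << 8) | lo);
    }

    int storedBytes() {
        return value.length;
    }

    public static void main(String[] args) {
        MiniString a = new MiniString("hello");
        MiniString b = new MiniString("h\u20acllo");
        System.out.println(a.length() + " chars in " + a.storedBytes() + " bytes"); // prints 5 chars in 5 bytes
        System.out.println(b.length() + " chars in " + b.storedBytes() + " bytes"); // prints 5 chars in 10 bytes
    }
}
```

Callers of length() and charAt() see the same results either way; only the storage size differs.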

It’s important to note that the choice of encoding is made at runtime, for each string individually, and is transparent to the programmer. There is nothing to enable or disable in code: when a string is created, its content is checked, and it is stored as Latin-1 if every character fits in one byte and as UTF-16 otherwise.

3. CompactStrings VM option

The CompactStrings VM option is enabled by default. To disable it, pass the following option to the java command:

-XX:-CompactStrings
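To check which setting a running JVM actually uses, one option is to read the flag through the HotSpot diagnostic MXBean. This is a minimal sketch that assumes a HotSpot-based JDK 9 or later with the jdk.management module available; the class name CompactStringsFlag and the method compactStringsSetting are made up for this example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class CompactStringsFlag {

    // Returns the current value of the CompactStrings VM option as text ("true" or "false").
    // Assumes a HotSpot JVM; other JVMs may not expose this bean or this option.
    static String compactStringsSetting() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption("CompactStrings").getValue();
    }

    public static void main(String[] args) {
        System.out.println("CompactStrings = " + compactStringsSetting());
    }
}
```

Running the class with the default settings should report true, and running it with -XX:-CompactStrings on the java command line should report false.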

Note that with compact strings disabled, every string is stored with two bytes per character again. This can increase the memory footprint of your application and, as a consequence, affect its performance. It’s therefore recommended to evaluate the requirements and performance characteristics of your application carefully before deciding to disable compact strings.

Compact strings in Java 9 reduce the memory used by strings that contain only Latin-1 characters (which include all ASCII characters), so applications that deal with large numbers of such strings can save memory. The actual savings depend on the use case and on the content of the strings used in the application.

4. When to disable CompactStrings

Here are a few scenarios in which you may want to disable compact strings:

  • Memory usage is not a concern: If your application has ample memory available and memory usage is not critical, you may disable CompactStrings to give priority to performance.
  • String processing performance is a priority: Checking the encoding on each operation, and converting Latin-1 data to UTF-16 when it is combined with non-Latin-1 text, adds some overhead. For applications that perform a large number of string operations, mostly on non-Latin-1 text, disabling compact strings may improve performance. Measure before and after to confirm.
  • Consistency with older Java versions: Compact strings do not change the behavior of the String API, but they do change memory and performance characteristics. If you need these characteristics to match those of Java versions that do not support compact strings (prior to Java 9), for example when comparing measurements, you may choose to disable the feature.
  • Debugging or profiling: With compact strings disabled, every string has the same two-bytes-per-character layout, which can make memory usage and performance characteristics easier to reason about when debugging or profiling Java applications.

5. Conclusion

In this article, we discussed JEP 254: Compact Strings. We also discussed how to disable the feature and the scenarios in which we may want to do so. This is an important enhancement for applications where memory usage is critical.

Happy Learning!