Summary

No longer require super() and this() to appear first in a constructor.

Goals

Change the Java Language Specification and make corresponding changes to the Java compiler so that:

  • super() and this() no longer must appear as the first statement in a constructor
  • The language preserves existing safety and initialization guarantees afforded to constructors
  • Existing programs continue to compile and function as they did before

Non-Goals

Modifications to the JVM. These changes may prompt reconsideration of the JVM’s current restrictions on constructors. However, to avoid unnecessary linkage between JLS and JVM changes, any such modifications should be proposed in a follow-on JEP. This JEP assumes no change to the current JVM behavior.

Changes to current behavior. There is no intention to change the behavior of any program that adheres to the current JLS.

Addressing larger language concerns. Thinking about the interplay between superclass constructors and subclass initialization has evolved since the Java language was first designed. This work should be considered a pragmatic tweak rather than a statement on language design.

Motivation

Currently, the Java language requires that invocations of this() or super() appear as the first statement in a constructor.

However, the Java Virtual Machine actually allows more flexibility:

  • Multiple invocations of this() and/or super() may appear in a constructor, as long as on any code path there is exactly one invocation
  • Arbitrary code may appear before this()/super(), as long as that code doesn’t reference the instance under construction, with an exception carved out for field assignments
  • However, invocations of this()/super() may not appear within a try { } block (i.e., within a bytecode exception range)

Note that these more permissive rules do not cause any reduction in existing safety guarantees regarding proper initialization: (a) the uninitialized instance is still “off limits”, except for field assignments (which do not affect outcomes), until superclass initialization is performed, and (b) superclass initialization always happens exactly once, either directly via super() or indirectly via this().

So a basic motivation is simply that the JLS is being needlessly restrictive. In fact, this inconsistency is a historical artifact: the original JVM specification was also more restrictive, but this led to issues with initialization of synthetic fields generated by the compiler to support new language features such as inner classes and captured free variables. As a result, the JVM specification was relaxed to accommodate the compiler, but this new flexibility never made its way back up to the language level.

There is also a practical motivation, which is that it’s often convenient to be able to do “housekeeping” before invoking super() or this().

Here’s a somewhat contrived example:

import java.math.*;

public class BigPositiveValue extends BigInteger {

    /**
     * Constructor taking a {@code long} value.
     *
     * @param value value, must be one or greater
     */
    public BigPositiveValue(long value) {
        if (value < 1)
            throw new IllegalArgumentException("non-positive value");
        super(String.valueOf(value));
    }

    /**
     * Constructor taking a base and exponent. Negative exponents are clipped to zero.
     *
     * @param base base
     * @param power exponent
     */
    public BigPositiveValue(int base, float power) {
        if (base < 2)
            throw new IllegalArgumentException("invalid base");
        if (!Float.isFinite(power))
            throw new IllegalArgumentException("invalid power");
        if (power <= 0)      // clip negative exponents to zero
            super("1");
        else
            this(Math.round(Math.pow(base, power)));
    }
}
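
For comparison, under the current rules the validation in the first constructor has to be folded into the evaluation of the super() argument, typically through a static helper. The following is a minimal sketch of that workaround; the class name CheckedPositiveValue and the helper checkPositive are hypothetical and are not part of this proposal:

```java
import java.math.BigInteger;

// Hypothetical name; shows how the check is written under today's rules.
class CheckedPositiveValue extends BigInteger {

    CheckedPositiveValue(long value) {
        // super() must come first, so the validation hides inside its argument
        super(checkPositive(value));
    }

    // Must be static: instance methods cannot be invoked before super() returns
    private static String checkPositive(long value) {
        if (value < 1)
            throw new IllegalArgumentException("non-positive value");
        return String.valueOf(value);
    }
}
```

The check still runs before superclass initialization, but it is separated from the constructor body where a reader would expect to find it.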

Another reason is to provide a way to avoid bugs caused by a 'this' escape in a superclass constructor. A 'this' escape is when a superclass constructor does something that could cause a subclass method to be invoked before the superclass constructor returns; in such cases the subclass method would operate on an incompletely initialized instance.

For example, consider this class:

import java.util.*;
import java.util.function.*;

/**
 * A {@link Set} that rejects elements not accepted by the configured {@link Predicate}.
 */
public class FilteredSet<E> extends HashSet<E> {

    private final Predicate<? super E> filter;

    public FilteredSet(Predicate<? super E> filter, Collection<? extends E> elems) {
        super(elems);
        this.filter = filter;
    }

    @Override
    public boolean add(E elem) {
        if (!this.filter.test(elem))
            throw new IllegalArgumentException("disallowed element");
        return super.add(elem);
    }

    public static void main(String[] args) {
        new FilteredSet<>(s -> true, Arrays.asList("abc", "def"));   // NullPointerException
    }
}

It appears bug-free, but actually it throws a NullPointerException. The reason is not apparent until you realize that the HashSet(Collection) constructor invokes AbstractCollection.addAll(), which invokes add(), which as overridden in FilteredSet dereferences this.filter before that field is initialized. In other words, the bug results from the trap laid by the 'this' escape in the HashSet(Collection) constructor.
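
The same mechanism can be reproduced without HashSet. The sketch below uses hypothetical classes (EscapeDemo, Base, Sub) in which the superclass constructor calls an overridable method, and the override observes a final field before the subclass constructor has assigned it:

```java
// Hypothetical classes reproducing the 'this' escape in isolation.
class EscapeDemo {

    static class Base {
        Base() {
            hook();   // 'this' escape: an overridable method runs during construction
        }

        void hook() { }
    }

    static class Sub extends Base {
        private final String name;
        boolean sawNull;   // deliberately no initializer, so the value set in hook() survives

        Sub(String name) {
            super();           // Base() calls hook() before the next line runs
            this.name = name;
        }

        @Override
        void hook() {
            sawNull = (name == null);   // reads the final field before it is assigned
        }
    }

    public static void main(String[] args) {
        System.out.println(new Sub("abc").sawNull);   // prints "true"
    }
}
```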

Moreover, there's no simple way for the FilteredSet constructor to work around that trap. But the problem could be easily avoided if the constructor could simply do this:

public FilteredSet(Predicate<? super E> filter, Collection<? extends E> elems) {
    this.filter = filter;
    super(elems);
}

Even if there is no 'this' escape in a superclass, that fact is not going to be obvious to a developer, because establishing it requires recursive inspection of each superclass constructor's code. Moreover, 'this' escape behavior in constructors is rarely part of their documented behavior (either way), and so is subject to change; it's unwise to rely on some other class's unspecified implementation details for correct code. By initializing fields prior to superclass initialization, developers can confidently dismiss any concerns about superclass 'this' escapes.

Description

Language Changes

The JLS will be modified as follows:

  • Remove the requirement that super() or this() appear as the first statement in a constructor
  • Add the requirement that, in any constructor with explicit super() and/or this() invocations, either super() or this() must be invoked exactly once (assuming the constructor returns normally). This may be specified economically by stating that the compiler treats superclass initialization like a non-static blank final field.
  • Add the requirement that no access to the new instance in a constructor, other than assignments to fields, may occur prior to an invocation of super() or this()
  • Add the requirement that super() and this() may not appear within any try { } block
  • Specify that non-static field initializers and initialization blocks are executed immediately after super() invocation, wherever it occurs

Note: there is no change to the implicit addition of super() at the beginning of any constructor having no explicit super() or this() invocation.
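
The "exactly once" requirement is modeled on a rule that already exists for blank final fields. As an analogy only, the following sketch (which compiles under the current language rules; the class name BlankFinalAnalogy is hypothetical) shows the existing rule that superclass initialization would mirror:

```java
class BlankFinalAnalogy {

    private final String label;   // blank final: no initializer

    BlankFinalAnalogy(int n) {
        if (n < 0)
            label = "negative";       // assigned once on this path
        else
            label = "non-negative";   // assigned once on the other path
        // Assigning label again here, or skipping one branch, is a compile-time error.
    }

    String label() {
        return label;
    }
}
```

Under the proposed rules, an invocation of super() or this() would be checked in the same way: once on every path, never twice, and never omitted.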

try { } Blocks

The restriction that super() and this() may not appear inside a try { } block comes from the JVM itself, and is due to how StackMaps are represented. The logic is that when a superclass constructor throws an exception, the new instance on the stack is neither fully uninitialized nor fully initialized, so it should be considered unusable, and therefore such a constructor must never return. However, the JVM doesn't allow the bytecode to discard the unusable instance and throw another exception; instead, it doesn't allow it to exist on the stack at all. The net effect is that constructors can't catch exceptions thrown by superclass initialization, even if rethrown.
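
Because the catch cannot live inside the constructor, code that needs to react to a failing superclass constructor has to do so from outside it. One possible arrangement, sketched here with hypothetical classes (TryWorkaround, Parent, Child) and a hypothetical factory method create, is to wrap the instance creation expression instead:

```java
class TryWorkaround {

    static class Parent {
        Parent(String name) {
            if (name.isEmpty())
                throw new IllegalArgumentException("empty name");
        }
    }

    static class Child extends Parent {

        private Child(String name) {
            super(name);   // may not be wrapped in try { } inside the constructor
        }

        // Hypothetical factory: the exception is caught outside the constructor
        static Child create(String name) {
            try {
                return new Child(name);
            } catch (IllegalArgumentException e) {
                throw new IllegalStateException("cannot create Child", e);
            }
        }
    }
}
```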

Initialization Order

The JLS specifies that field initializers and initialization blocks execute after superclass initialization via super(). So this class:

class Test1 {
    final int x;
    {
        x = 123;
    }
    public Test1() {
        super();
        this.x = 456;
    }
}

generates this error:

Test1.java:8: error: variable x might already have been assigned
        this.x = 456;
            ^

However, now that super() can appear anywhere in a constructor, an assignment in an initializer block can now happen after an earlier assignment in a constructor. So this class:

class Test1 {
    final int x;
    {
        x = 123;
    }
    public Test1() {
        this.x = 456;
        super();
    }
}

will now generate this error:

Test1.java:4: error: variable x might already have been assigned
        x = 123;
        ^

As before, initializers and initialization blocks happen immediately after superclass initialization, which happens when super() is invoked. But now this can be anywhere in the constructor.
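
The existing ordering can be observed directly. This sketch, which uses hypothetical names and compiles under the current rules, records the order in which the superclass constructor, the initialization block, and the rest of the constructor body run:

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderDemo {

    static final List<String> trace = new ArrayList<>();

    static class Base {
        Base() {
            trace.add("Base constructor");
        }
    }

    static class Sub extends Base {
        {
            trace.add("Sub initialization block");
        }

        Sub() {
            super();
            trace.add("Sub constructor body");
        }
    }

    // Creates one Sub and returns a copy of the recorded order
    static List<String> run() {
        trace.clear();
        new Sub();
        return new ArrayList<>(trace);
    }
}
```

The initialization block runs between the superclass constructor and the remainder of the constructor body; that relative order is what is preserved when super() moves later in the constructor.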

One might ask why not move initializers and initialization blocks to the start of every constructor, but that doesn't work. First, they could have early references (e.g., by invoking an instance method), and second, the constructor might invoke this() and super() on different code branches, so you'd be executing the initialization twice in the this() case.

Records

Record constructors are subject to more restrictions than normal constructors. In particular:

  • Canonical record constructors may not contain any explicit super() or this() invocation
  • Non-canonical record constructors may invoke this(), but not super()

These restrictions remain in place, but otherwise record constructors benefit from these changes. The net change is that non-canonical record constructors can now invoke this() multiple times, as long as it is invoked exactly once along any code path.
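
As a reminder of the existing record rules, here is a small sketch that compiles with current compilers that support records. The record Range and its components are hypothetical:

```java
record Range(int lo, int hi) {

    // Compact canonical constructor: no explicit this() or super() is allowed
    Range {
        if (lo > hi)
            throw new IllegalArgumentException("lo > hi");
    }

    // Non-canonical constructor: must delegate via this(), never super()
    Range(int single) {
        this(single, single);
    }
}
```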

Compiler Changes

All constructors except java.lang.Object() must initialize their superclass. Currently, there are three options for superclass initialization:

  1. Invoke super() as the first statement
  2. Invoke this() as the first statement
  3. Do not invoke super() or this() → the compiler adds a super() for you

In the compiler, constructors are currently divided into two categories:

  1. Initial constructors invoke super() (either explicitly or implicitly)
  2. Non-initial constructors invoke this()

In the current code, non-initial constructors are treated almost the same as normal methods, because once this() is invoked at the start of the constructor, the object is fully initialized. Initial constructors, however, must be more closely watched to ensure final fields are initialized correctly. Initial constructors also must be modified during compilation to execute any non-static field initializers and initialization blocks. All constructors are modified to handle non-static nested class references to outer instances, and free variable proxies.

Overall, the following "syntactic sugar" adjustments are applied to constructors during compilation:

  1. If the constructor doesn't invoke this() or super(), an initial super() invocation is inserted
  2. If the class has non-static field initializers or initialization blocks:
    1. Code is added after super() invocations to initialize fields and run initialization blocks
  3. If the class has an outer instance:
    1. A synthetic this$0 field is added to the class
    2. Constructors have an extra parameter prepended to carry it
    3. Code is added prior to super() invocations to initialize this$0 from the new parameter
  4. If the class has proxies for free variables:
    1. Synthetic val$x fields are added to the class
    2. Constructors have extra parameters appended
    3. Code is added prior to super() invocations to initialize each val$x from its new parameter
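
The extra constructor parameters described above are visible through reflection. The following sketch uses hypothetical names (DesugarDemo, Inner, Local, ctorParams) and inspects only the parameter types, since whether the synthetic fields themselves are retained may depend on the compiler version:

```java
class DesugarDemo {

    class Inner {            // non-static member class: needs an outer instance
        Inner() { }
    }

    static Object capture(String x) {
        class Local {        // local class capturing the free variable x
            @Override
            public String toString() {
                return x;
            }
        }
        return new Local();
    }

    // Parameter types of the single declared constructor, as seen by the JVM
    static Class<?>[] ctorParams(Class<?> c) {
        return c.getDeclaredConstructors()[0].getParameterTypes();
    }
}
```

Although Inner() is declared with no parameters, its compiled constructor takes a DesugarDemo; the compiled constructor of Local, which is declared in a static method and so has no outer instance, takes the captured String.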

By initializing this$0 and val$x fields before invoking super(), the compiler is already taking advantage of the looser JVM requirements for its own purposes. A side effect is that this alternate version of FilteredSet works fine:

import java.util.*;
import java.util.function.*;

public class FilteredSet {

    public static <E> Set<E> create(Predicate<? super E> filter, Collection<? extends E> elems) {
        return new HashSet<E>(elems) {
            @Override
            public boolean add(E elem) {
                if (!filter.test(elem))
                    throw new IllegalArgumentException("disallowed element");
                return super.add(elem);
            }
        };
    }

    public static void main(String[] args) {
        FilteredSet.create(s -> true, Arrays.asList("abc", "def"));   // works!
    }
}

Compiler Change Overview

This change impacts a few different areas of the compiler. In all cases, existing classes should compile the same way as they did before; we are strictly expanding the set of accepted source inputs.

At a high level, here's what changes in the compiler:

  1. Relax checks so that this()/super() may appear anywhere in constructors except for try { } blocks
  2. Add DA/DU analysis for superclass initialization
  3. Add checks to disallow early this references, except for field assignments
  4. Refactor/replace any code that currently assumes this()/super() is always first in constructors

Changes to Specific Files

Below are per-file descriptions of the changes being made.

comp/Attr.java

The check that super()/this() is the first statement of a constructor is relaxed to just check that super()/this() occurs within a constructor.

Non-canonical record constructors may now invoke this() more than once on different code branches, but (as before) they must invoke this() exactly once and they must not ever invoke super().

comp/Check.java

The check for recursive constructor invocation is adjusted to handle the fact that a constructor may invoke more than one other constructor, i.e., the invocation call graph is now one-to-many instead of one-to-one.
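
To illustrate what a one-to-many check involves, here is a minimal sketch of cycle detection over a general call graph using depth-first search. It is not the code in Check.java; the class CtorCallGraph and the representation of constructors as strings are assumptions made for the example:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CtorCallGraph {

    // calls: constructor signature -> constructors it may invoke via this()
    static boolean hasCycle(Map<String, List<String>> calls) {
        Set<String> done = new HashSet<>();     // fully explored, known cycle-free
        Set<String> onPath = new HashSet<>();   // on the current DFS path
        for (String start : calls.keySet())
            if (visit(start, calls, done, onPath))
                return true;
        return false;
    }

    private static boolean visit(String node, Map<String, List<String>> calls,
                                 Set<String> done, Set<String> onPath) {
        if (onPath.contains(node))
            return true;                        // reached a node already on this path
        if (done.contains(node))
            return false;
        onPath.add(node);
        for (String callee : calls.getOrDefault(node, Collections.emptyList()))
            if (visit(callee, calls, done, onPath))
                return true;
        onPath.remove(node);
        done.add(node);
        return false;
    }
}
```

With a one-to-one graph it is enough to follow a single chain of invocations; once a constructor can invoke different constructors on different branches, every outgoing edge has to be explored.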

comp/Flow.java

Flow.FlowAnalyzer checks for uncaught checked exceptions. For initializer blocks, this was previously done by requiring that any checked exceptions thrown be declared as thrown by all initial constructors. This list of checked exceptions is pre-calculated before recursing into the initial constructors. This works because initializer blocks are executed at the beginning of each initial constructor right after super() is called.

In the new version of FlowAnalyzer, initializer blocks are traversed in the flow analysis after each super() invocation, reflecting what actually will happen at runtime (see below), and the pre-calculation is removed. The effect is the same as before, namely, any checked exceptions thrown by initializer blocks must be declared as thrown by all constructors that invoke super().
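
The rule being preserved can be seen in a small class that compiles under the current rules. In this sketch the class InitBlockDemo, the static switch failInit, and the helper tryCreate are hypothetical and exist only to drive the example:

```java
import java.io.IOException;

class InitBlockDemo {

    static boolean failInit;   // hypothetical switch, only here to drive the demo

    {
        // Initialization block that may throw a checked exception
        if (failInit)
            throw new IOException("init failed");
    }

    // Removing this throws clause makes the initialization block a compile-time error
    InitBlockDemo() throws IOException {
    }

    static String tryCreate(boolean fail) {
        failInit = fail;
        try {
            new InitBlockDemo();
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }
}
```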

Flow.AssignAnalyzer is responsible for DA/DU analysis for fields and variables. We piggy-back on the existing machinery for tracking assignments to final instance fields to track superclass initialization, which acts like an assignment to a blank final field, in that it must happen exactly once in each constructor no matter what code branch is taken. To do this we allocate an additional bit in the existing DA/DU bitmaps, and for the most part the existing machinery takes care of the rest.

Previously, the code worked as follows:

  1. For initial constructors:
    1. Assume final fields with initializers or assigned within initialization blocks start out DA.
      1. Note: This is an optimization based on the assumption that super() is always first and then followed by initializers
    2. Assume all blank final fields start out DU.
    3. Upon seeing an assignment to a blank final field:
      1. Before, the blank final field must be DU
      2. After, the blank final field is DA
    4. Require all final fields to be DA on any return.
  2. For non-initial constructors, don't do DA/DU analysis for fields (i.e., treat non-initial constructors like a normal method)
    1. Note: This is another optimization, based on the assumption that this() is always first

Now that super() and this() can appear anywhere in constructors, there is no longer such a thing as an "initial" constructor. The new code works as follows:

  1. For all constructors:
    1. Assume all final fields start out DU.
    2. Upon seeing an assignment to a blank final field:
      1. Before, the blank final field must be DU
      2. After, the blank final field is DA
    3. Upon seeing super():
      1. Superclass initialization must be DU
      2. Mark superclass initialization as DA
      3. Recurse on initializers and initialization blocks normally to process field assignments therein
    4. Upon seeing this():
      1. Superclass initialization must be DU
      2. Mark superclass initialization as DA
      3. "Infer" assignments to all blank final fields, i.e.:
        1. All blank final fields must be DU
        2. Mark all blank final fields as DA
    5. Require all final fields to be DA on any return.
    6. Require superclass initialization to be DA on any return.

The result is that on every path through every constructor, each blank final field must be assigned exactly once, and superclass initialization must also happen exactly once.
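
The bookkeeping above can be mimicked with a toy model. The sketch below is not AssignAnalyzer: it handles a single straight-line path only, ignores branching and initializers, and every name in it (ToyCtorFlow, SUPER_INIT, check) is hypothetical. It treats superclass initialization as one more tracked variable alongside the blank final fields:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only; not javac's AssignAnalyzer.
class ToyCtorFlow {

    static final String SUPER_INIT = "<superclass init>";   // the extra tracked "variable"

    private final Set<String> du = new HashSet<>();   // definitely unassigned
    private final Set<String> da = new HashSet<>();   // definitely assigned
    private final List<String> errors = new ArrayList<>();

    private void assign(String v, String error) {
        if (!du.contains(v))
            errors.add(error);       // must be DU before the assignment
        du.remove(v);
        da.add(v);                   // and is DA afterwards
    }

    // steps: "super()", "this()", or the name of a blank final field being assigned
    static List<String> check(List<String> blankFinals, String... steps) {
        ToyCtorFlow f = new ToyCtorFlow();
        f.du.addAll(blankFinals);
        f.du.add(SUPER_INIT);
        String twice = "superclass constructor might already have been invoked";
        for (String step : steps) {
            if (step.equals("super()")) {
                f.assign(SUPER_INIT, twice);
            } else if (step.equals("this()")) {
                f.assign(SUPER_INIT, twice);
                for (String field : blankFinals)   // "infer" assignments to all blank finals
                    f.assign(field, "variable " + field + " might already have been assigned");
            } else {
                f.assign(step, "variable " + step + " might already have been assigned");
            }
        }
        // Checks applied on return
        for (String field : blankFinals)
            if (!f.da.contains(field))
                f.errors.add("variable " + field + " might not have been initialized");
        if (!f.da.contains(SUPER_INIT))
            f.errors.add("superclass constructor might not have been invoked");
        return f.errors;
    }
}
```

For example, `check(List.of("x"), "x", "super()")` reports no errors, while `check(List.of("x"), "x", "this()")` reports that x might already have been assigned, because the constructor reached through this() is assumed to assign it.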

AssignAnalyzer is also augmented to enforce these new restrictions:

  1. Disallow any reference to the current instance prior to super() or this(), except for assignments to fields.
  2. Disallow invocations of this() or super() within try { } blocks.

comp/Lower.java

This is where the adjustments are made for initializing outer instances and free variable proxies. This now must be done at every super() invocation instead of just at the presumed first and only one, so the new code goes and finds all super() invocations. Otherwise the adjustments made are the same.

jvm/Code.java

This class requires a change because of the following problem: while the class Code.State is used to model the JVM state on each code branch, the "uninitialized" status of each local variable is not part of Code.State but rather stored in the LocalVar fields themselves (which are not cloned per code branch). Previously this was not a problem because the initial this() or super() invocation was always on the (only) initial branch of the code. Now that different branches of code may or may not initialize the superclass, we have to keep track of the "uninitialized" status of each LocalVar separately in each Code.State instance.

This is done by adding a bitmap indicating which local variables are initialized. As a result, to get the current type of a LocalVar, now you access the State instead of accessing the LocalVar directly.
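
The idea of carrying the "initialized" bits in the per-branch state can be sketched as follows. This is an illustration only; BranchState and its methods are hypothetical stand-ins and do not reflect the actual members of Code.State:

```java
import java.util.BitSet;

// Hypothetical stand-in for per-branch state; not the actual Code.State class.
class BranchState {

    private final BitSet initialized;   // bit i set => local variable slot i is initialized

    BranchState() {
        this(new BitSet());
    }

    private BranchState(BitSet bits) {
        this.initialized = bits;
    }

    // Each code branch gets its own copy of the bitmap
    BranchState fork() {
        return new BranchState((BitSet) initialized.clone());
    }

    void markInitialized(int slot) {
        initialized.set(slot);
    }

    boolean isInitialized(int slot) {
        return initialized.get(slot);
    }
}
```

Because the bitmap is copied when a branch is forked, marking slot 0 (the receiver) as initialized on the branch that invokes super() leaves it uninitialized on a sibling branch that has not yet done so.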

jvm/Gen.java

Previously, the method Gen.normalizeMethod() added initialization code to initial constructors after the initial super() invocation. This is now done at every super() invocation instead of just after the presumed first and only one.

tree/TreeInfo.java

Removed these utility methods:

  1. public static Name getConstructorInvocationName(List trees, Names names)
  2. public static boolean isInitialConstructor(JCTree tree)

Added these utility methods:

  1. public static boolean hasConstructorCalls(JCTree tree, Name target)
  2. public static boolean hasAnyConstructorCalls(JCTree tree)
  3. public static List findConstructorCalls(JCTree tree, Name target)
  4. public static List findAllConstructorCalls(JCTree tree)
  5. public static void mapSuperCalls(JCBlock block, Function mapper)

resources/compiler.properties

There are some changes to error messages:

Removed these errors:

  1. call to {0} must be first statement in constructor

Added these errors:

  1. calls to {0}() may only appear within constructors
  2. calls to {0}() may not appear within try statements
  3. superclass constructor might not have been invoked
  4. superclass constructor might already have been invoked

Changed these errors:

Old: canonical constructor must not contain explicit constructor invocation

New: canonical constructor must not contain explicit constructor invocations

Old: constructor is not canonical, so its first statement must invoke another constructor of class {0}

New: constructor is not canonical, so it must invoke other constructors of class {0}

Testing

Testing of compiler changes will be done using the existing unit tests, which are unchanged except for those tests that verify changed compiler behavior, plus new positive and negative test cases related to this new feature.

All existing JDK classes will be compiled using the previous and new versions of the compiler, and the bytecode compared, to verify there is no change to existing bytecode.

No platform-specific testing should be required.

Risks and Assumptions

An explicit goal of this work is to not change the behavior of existing programs. Therefore, other than any newly created bugs, the risk to existing software should be low.

From a technical point of view, the most complicated aspect of this change is proper DA/DU analysis of superclass initialization. It is believed that risk here is reduced by relying on the existing, well-tested code for blank final field DA/DU analysis.

It's possible that compiling and/or executing newly valid code could trigger bugs in existing code that were not previously accessible.

Dependencies

Java compiler changes - JDK-8194743
