Optimistic Stack Allocation and Dynamic Heapification for Managed Runtimes
The runtimes of managed object-oriented languages such as Java allocate objects on the heap, and rely on automatic garbage collection (GC) techniques for freeing up unused objects. Most such runtimes also consist of just-in-time (JIT) compilers that optimize memory access and GC times by employing escape analysis: an object that does not escape (outlive) its allocating method can be allocated on (and freed up with) the stack frame of the corresponding method. However, in order to minimize the time spent in JIT compilation, the scope of such useful analyses is quite limited, thereby restricting their precision significantly. On the contrary, even though it is feasible to perform precise program analyses statically, it is not possible to use their results in a managed runtime without a closed-world assumption. In this paper, we propose a static+dynamic scheme that allows one to harness the results of a precise static escape analysis for allocating objects on stack, while taking care of both soundness and efficiency concerns in the runtime.
Our scheme comprises of three key ideas. First, using the results of a statically performed escape analysis, it performs optimistic stack allocation during JIT compilation. Second, it handles the challenges associated with features that may invalidate the optimism, using a novel idea of dynamic heapification. Third, it uses another novel notion of stack ordering, again supported by a static analysis, to reduce the overheads associated with the checks that determine the need for heapification. The static and the runtime components of our approach are implemented in the Soot optimization framework and in the tiered infrastructure of the Eclipse OpenJ9 VM, respectively. To evaluate the benefits, we compare our scheme with the existing escape analysis and find that it succeeds in allocating a much larger number of objects on the stack. Furthermore, the enhanced stack allocation leads to a significant reduction in the number of GC cycles and brings decent performance improvements, especially suited for constrained-memory environments.