SAP Logo
Rechenmaschiene von Philipp Matthäus Hahn (1739-1790) Quelle: Württembergisches Landesmuseum Stuttgart

HotSpot Internals:
Explore and Debug the VM at the OS Level

Götz Lindenmaier & Volker Simonis, SAP

Java - Terms and Definitions

A Java Runtime Environment (JRE) consists of:

A Java Development Kit (JDK) consists of:

As of today, there exist numerous Java VM implementations:

This talk is about the internals of the OpenJDK Java VM (aka HotSpot)

Building the HotSpot VM

The HotSpot VM is the VM in Oracle's commercial Java SE product and in the OpenJDK.

Get the party started

public class HelloWorld {
  public static void main(String args[]) {
    System.out.println("Hello World");
  }
}



> javac HelloWorld.java

> javap -c HelloWorld
public HelloWorld();
  Code:
   0:   aload_0
   1:   invokespecial #1; //Method java/lang/Object."<init>":()V
   4:   return
public static void main(java.lang.String[]);
  Code:
   0:   getstatic     #2; //Field java/lang/System.out:Ljava/io/PrintStream;
   3:   ldc           #3; //String Hello World
   5:   invokevirtual #4; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   8:   return

Let's count the bytecodes the VM executes:

> java -XX:+CountBytecodes HelloWorld
Hello World
2095542 bytecodes executed in 0.6s (3.246MHz)
[BytecodeCounter::counter_value = 2095542]
> java -XX:+CountBytecodes HelloWorld
Hello World
2095542 bytecodes executed in 0.6s (3.246MHz)
[BytecodeCounter::counter_value = 2095542]

Let's see which bytecodes the VM executes:

> java -XX:+PrintBytecodeHistogram HelloWorld
Hello World
Histogram of 2092212 executed bytecodes:

  absolute  relative  code    name
------------------------------------------
    158325     7.57%    df    fast_iload
    113765     5.44%    84    iinc
    105903     5.06%    2d    aload_3
     98983     4.73%    a7    goto
...
> java -XX:+PrintBytecodeHistogram HelloWorld
Hello World
Histogram of 2092212 executed bytecodes:

  absolute  relative  code    name
------------------------------------------
    158325     7.57%    df    fast_iload
    113765     5.44%    84    iinc
    105903     5.06%    2d    aload_3
     98983     4.73%    a7    goto
...
> java -XX:+TraceBytecodes HelloWorld

[11931] static void java.lang.Object.<clinit>()
[11931]        1     0  invokestatic 72 <java/lang/Object.registerNatives()V> 
[11931]        2     3  return

[11931] static void java.lang.String.<clinit>()
[11931]        3     0  iconst_0
[11931]        4     1  anewarray java/io/ObjectStreamField 
[11931]        5     4  putstatic 425 <java/lang/String.serialPersistentFields/[Ljava/io/ObjectStreamField;> 
...
> java -XX:+TraceBytecodes HelloWorld

[11931] static void java.lang.Object.<clinit>()
[11931]        1     0  invokestatic 72 <java/lang/Object.registerNatives()V> 
[11931]        2     3  return

[11931] static void java.lang.String.<clinit>()
[11931]        3     0  iconst_0
[11931]        4     1  anewarray java/io/ObjectStreamField 
[11931]        5     4  putstatic 425 <java/lang/String.serialPersistentFields/[Ljava/io/ObjectStreamField;> 
...

Bytecode to Assembler - TemplateInterpreter

The HotSpot comes with two interpreters - the Template and the CPP Interpreter

 
 
 
 
 
 
 
 
 
 

TemplateInterpreter - Demo

Grab the address of the iadd codelet:

gdb java
(gdb) run -XX:+PrintInterpreter HelloWorld | grep iadd
Starting program: /share/OpenJDK/jdk1.7.0_hsx/bin/java -XX:+PrintInterpreter HelloWorld | grep iadd
iadd  96 iadd  [0x00007fffeedeb400, 0x00007fffeedeb440]  64 bytes
During startup program exited normally.

Now stop after the Template Interpreter was generated and set a breakpoint in the iadd codelet

(gdb) b init_globals()
(gdb) run HelloWorld
...
Breakpoint 1, init_globals () at /share/OpenJDK/hsx/hotspot/src/share/vm/runtime/init.cpp:92
(gdb) fin
Run till exit from #0  init_globals () at /share/OpenJDK/hsx/hotspot/src/share/vm/runtime/init.cpp:92
...
(gdb) b *0x00007fffeedeb400
(gdb) c

Verify that this is the same assembler code like in the -XX:+PrintInterpreter example

Breakpoint 2, 0x00007fffeedeb400 in ?? ()
(gdb) x /12i $pc
=> 0x7fffeedeb400:	mov    (%rsp),%eax
   0x7fffeedeb403:	add    $0x8,%rsp
   0x7fffeedeb407:	mov    (%rsp),%edx
   0x7fffeedeb40a:	add    $0x8,%rsp
   0x7fffeedeb40e:	add    %edx,%eax
   0x7fffeedeb410:	movzbl 0x1(%r13),%ebx
   0x7fffeedeb415:	inc    %r13
   0x7fffeedeb418:	movabs $0x7ffff73fb8e0,%r10
   0x7fffeedeb422:	jmpq   *(%r10,%rbx,8)
   0x7fffeedeb426:	nop
   0x7fffeedeb427:	nop
   0x7fffeedeb428:	int3   

But notice that gdb has no clue of where we are

(gdb) where
#0  0x00007fffeedeb400 in ?? ()
#1  0x0000000000000000 in ?? ()

So lets compile a new VM with a new, small helper function in src/share/vm/utilities/debug.cpp ..

extern "C" void mixed_ps(intptr_t *sp, intptr_t *fp, unsigned char *pc) {
  // don't allocate large buffer on stack
  static char buf[O_BUFLEN];

  frame fr = (pc == NULL) ? os::current_frame() : frame(sp, fp, pc);

  // see if it's a valid frame
  if (fr.pc()) {
    tty->print_cr("Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)");

    int count = 0;

    while (count++ < StackPrintLimit) {
      fr.print_on_error(tty, buf, sizeof (buf));
      tty->cr();
      if (os::is_first_C_frame(&fr)) break;
      fr = os::get_sender_for_C_frame(&fr);
    }

    if (count > StackPrintLimit) {
      tty->print_cr("......");
    }

    tty->cr();
  }
}

..which can help us a little..

> patch -p1 -d .. < ~/public_html/hotspot/Jax2012/examples/mixed_ps.patch
patching file src/share/vm/utilities/debug.cpp
> ALT_BOOTDIR=/share/software/Java/jdk1.7.0/ ALT_OUTPUTDIR=/share/OpenJDK/output_amd64_jvmg ARCH_DATA_MODEL=64 make jvmg
...
make[2]: Leaving directory `/share/OpenJDK/output_amd64_jvmg'
make[1]: Leaving directory `/share/OpenJDK/hsx/hotspot/make'
> cp /share/OpenJDK/output_amd64_jvmg/linux_amd64_compiler2/jvmg/libjvm.so /share/OpenJDK/jdk1.7.0_hsx/jre/lib/amd64/server/libjvm.so
> cp /share/OpenJDK/output_amd64_jvmg/linux_amd64_compiler2/jvmg/libjvm.debuginfo /share/OpenJDK/jdk1.7.0_hsx/jre/lib/amd64/server/libjvm.debuginfo
> /share/OpenJDK/jdk1.7.0_hsx/bin/java -version
java version "1.8.0-ea"
Java(TM) SE Runtime Environment (build 1.8.0-ea-b106)
OpenJDK 64-Bit Server VM (build 25.0-b51-internal-debug, mixed mode)

and try again

(gdb) disable 2
(gdb) run HelloWorld
...
Breakpoint 1, init_globals () at /share/OpenJDK/hsx/hotspot/src/share/vm/runtime/init.cpp:92
(gdb) fin
Run till exit from #0  init_globals () at /share/OpenJDK/hsx/hotspot/src/share/vm/runtime/init.cpp:92
...
(gdb) enable 2
(gdb) continue
...
Breakpoint 2, 0x00007fffeedeb400 in ?? ()
(gdb) where
#0  0x00007fffeedeb400 in ?? ()
#1  0x0000000000000000 in ?? ()
(gdb) call mixed_ps($sp, $rbp, $pc)
"Executing mixed_ps"
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
j  java.lang.String.charAt(I)C+27
j  java.security.BasicPermission.init(Ljava/lang/String;)V+37
j  java.security.BasicPermission.(Ljava/lang/String;)V+7
j  java.lang.RuntimePermission.(Ljava/lang/String;)V+2
j  java.lang.Thread.()V+16
v  ~StubRoutines::call_stub
V  [libjvm.so+0x7305d8]  JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x51a
V  [libjvm.so+0x9749ee]  os::os_exception_wrapper(void (*)(JavaValue*, methodHandle*, JavaCallArguments*, Thread*), JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x3a
V  [libjvm.so+0x7300b7]  JavaCalls::call(JavaValue*, methodHandle, JavaCallArguments*, Thread*)+0x7d
V  [libjvm.so+0x6cae31]  instanceKlass::call_class_initializer_impl(instanceKlassHandle, Thread*)+0x199
V  [libjvm.so+0x6cac29]  instanceKlass::call_class_initializer(Thread*)+0x45
V  [libjvm.so+0x6c9ab9]  instanceKlass::initialize_impl(instanceKlassHandle, Thread*)+0x57d
V  [libjvm.so+0x6c8920]  instanceKlass::initialize(Thread*)+0x74
V  [libjvm.so+0xac5adf]  initialize_class(Symbol*, Thread*)+0x5b
V  [libjvm.so+0xacd2ba]  Threads::create_vm(JavaVMInitArgs*, bool*)+0xa08
V  [libjvm.so+0x76fe83]  JNI_CreateJavaVM+0xb0
C  [libjli.so+0x31b7]  JavaMain+0x97

You can now verify that the stack is the same like for the first occurence of iadd we see with -XX:+TraceBytecodes (actually it isn't, it's the stack at the second occurence - see below why)

The Template Interpreter maintains a Top Of Stack Cache (TOSCA) in a register. Some bytecodes leave their result in this register to avoid writing it to the stack and other bytecodes can potentially consume the top of stack value directly from that register without reading it from the stack. Therefore there's actually not just one active active template dispatch table, but a an array of dispatch tables, one for each type of TOSCA output state.

The template codelets have different entry points, based on the TOSCA state of the previous bytecode. It is therefore not enough to put a breakpoint only on the first instruction of a bytecode. If we want to stop at every occurence of a bytecode, we have to put breakpoints at every entry point.

(gdb) p /d 'Bytecodes::_iadd'
$23 = 96
((gdb) p 'TemplateTable::_template_table'['Bytecodes::_iadd']
$24 = {_flags = 0, _tos_in = itos, _tos_out = itos, _gen = 0x7ffff6b90f7c , _arg = 0}
(gdb) p 'TemplateTable::_template_table'['Bytecodes::_getfield']
$25 = {_flags = 5, _tos_in = vtos, _tos_out = vtos, _gen = 0x7ffff6b98404 , _arg = 1}
(gdb) p 'TemplateTable::_template_table'['Bytecodes::_iconst_1']
$26 = {_flags = 0, _tos_in = vtos, _tos_out = itos, _gen = 0x7ffff6b8b8f8 , _arg = 1}

With our iadd example, it happend that the very first occurence of an iadd bytecode happened after a iconst_1 bytecode which has a TOSCA out state of itos (i.e. it places its result right into the TOSCA register). So after the iconst_1 codelet, the interpreter dispatches right to the second entry point of the iadd codelet bacause it only has to load the second paramter from the stack (and we didn't stop at our breakpoint!).

Let's see what GDB thinks about the address we want to jump at the end of the codelet:

(gdb) x /12i $pc
=> 0x7fffeedeb400:	mov    (%rsp),%eax
   0x7fffeedeb403:	add    $0x8,%rsp
   0x7fffeedeb407:	mov    (%rsp),%edx
   0x7fffeedeb40a:	add    $0x8,%rsp
   0x7fffeedeb40e:	add    %edx,%eax
   0x7fffeedeb410:	movzbl 0x1(%r13),%ebx
   0x7fffeedeb415:	inc    %r13
   0x7fffeedeb418:	movabs $0x7ffff73fb8e0,%r10
   0x7fffeedeb422:	jmpq   *(%r10,%rbx,8)
   0x7fffeedeb426:	nop
   0x7fffeedeb427:	nop
   0x7fffeedeb428:	int3   
(gdb) info symbol(0x7ffff73fb8e0)
TemplateInterpreter::_active_table + 6144 in section .bss of /share/OpenJDK/jdk1.7.0_hsx/jre/lib/amd64/server/libjvm.so

It's actually computed from the active Bytecode Table!

And how is it computed (i.e. what is $r13)?

(gdb) call find($r13)
"Executing find"
0x00000000bce065fe is an oop
{constMethod} 
 - klass: {other class}
 - method:       0x00000000bce06608 {method} 'charAt' '(I)C' in 'java/lang/String'
 - exceptions:   0x00000000bce01c10
bci_from(0xbce065fe) = 30; print_codes():
0 iload_1
1 iflt 12
4 iload_1
5 aload_0
6 fast_igetfield 395  
9 if_icmplt 21
12 new 234 
15 dup
16 iload_1
17 invokespecial 476 (I)V> 
20 athrow
21 aload_0
22 fast_agetfield 398  
25 iload_1
26 aload_0
27 fast_igetfield 397  
30 iadd
31 caload
32 ireturn

(gdb) x /2b $r13
0xbce065fe:	0x60	0x34

So $r13 is actually the Bytecode Pointer (BCP). It points to 0x60 which is the bytecode for iadd. The next byte is 0x34 which is the bytecode for caload.

We can now easily compute the address of the next codelet manually and verify that's the right one (the one for caload)

(gdb) p ((void**)0x7ffff73fb8e0)[0x34]
$6 = (void *) 0x7fffeede9cc7
(gdb) call find(0x7fffeede9cc7)
"Executing find"
0x00007fffeede9cc7 is an Interpreter codelet
caload  52 caload  [0x00007fffeede9cc0, 0x00007fffeede9d00]  64 bytes

There are other/better ways to stop at a specific bytecode: -XX:StopInterpreterAt=174

(gdb) b breakpoint
(gdb) run -XX:StopInterpreterAt=174 HelloWorld
(gdb) where
#0  breakpoint () at /share/OpenJDK/hsx/hotspot/src/os/linux/vm/os_linux.cpp:518
#1  0x00007ffff6a4a147 in os::breakpoint () at /share/OpenJDK/hsx/hotspot/src/os/linux/vm/os_linux.cpp:513
#2  0x00007fffeedebf02 in ?? ()
#3  0x000000000000001f in ?? ()
#4  0x00000000eb564790 in ?? ()
#5  0x00007ffff7fe33c8 in ?? ()
...
(gdb) call mixed_ps($sp, $rbp, $pc)
"Executing mixed_ps"
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x96c14d]  breakpoint+0x4
V  [libjvm.so+0x96c147]  os::breakpoint()+0x9
j  java.lang.String.charAt(I)C+27
j  java.security.BasicPermission.init(Ljava/lang/String;)V+37
j  java.security.BasicPermission.(Ljava/lang/String;)V+7
j  java.lang.RuntimePermission.(Ljava/lang/String;)V+2
j  java.lang.Thread.()V+16
v  ~StubRoutines::call_stub
V  [libjvm.so+0x7305d8]  JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x51a
V  [libjvm.so+0x9749ee]  os::os_exception_wrapper(void (*)(JavaValue*, methodHandle*, JavaCallArguments*, Thread*), JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x3a
V  [libjvm.so+0x7300b7]  JavaCalls::call(JavaValue*, methodHandle, JavaCallArguments*, Thread*)+0x7d
V  [libjvm.so+0x6cae31]  instanceKlass::call_class_initializer_impl(instanceKlassHandle, Thread*)+0x199
V  [libjvm.so+0x6cac29]  instanceKlass::call_class_initializer(Thread*)+0x45
V  [libjvm.so+0x6c9ab9]  instanceKlass::initialize_impl(instanceKlassHandle, Thread*)+0x57d
V  [libjvm.so+0x6c8920]  instanceKlass::initialize(Thread*)+0x74
V  [libjvm.so+0xac5adf]  initialize_class(Symbol*, Thread*)+0x5b
V  [libjvm.so+0xacd2ba]  Threads::create_vm(JavaVMInitArgs*, bool*)+0xa08
V  [libjvm.so+0x76fe83]  JNI_CreateJavaVM+0xb0
C  [libjli.so+0x31b7]  JavaMain+0x97

(gdb) up
(gdb) up
(gdb) call find($pc)
"Executing find"
0x00007fffeedebf02 is an Interpreter codelet
iadd  96 iadd  [0x00007fffeedebee0, 0x00007fffeedebf40]  96 bytes

(gdb) x /16i 0x00007fffeedebee0
   0x7fffeedebee0:	mov    (%rsp),%eax
   0x7fffeedebee3:	add    $0x8,%rsp
   0x7fffeedebee7:	incl   0x85a9713(%rip)        # 0x7ffff7395600 <_ZN15BytecodeCounter14_counter_valueE>
   0x7fffeedebeed:	cmpl   $0xae,0x85a9709(%rip)        # 0x7ffff7395600 <_ZN15BytecodeCounter14_counter_valueE>
   0x7fffeedebef7:	jne    0x7fffeedebf02
   0x7fffeedebefd:	callq  0x7ffff6a4a13e <_ZN2os10breakpointEv>
=> 0x7fffeedebf02:	mov    (%rsp),%edx
   0x7fffeedebf05:	add    $0x8,%rsp
   0x7fffeedebf09:	add    %edx,%eax
   0x7fffeedebf0b:	movzbl 0x1(%r13),%ebx
   0x7fffeedebf10:	inc    %r13
   0x7fffeedebf13:	movabs $0x7ffff73fb8e0,%r10
   0x7fffeedebf1d:	jmpq   *(%r10,%rbx,8)
   0x7fffeedebf21:	nop
   0x7fffeedebf22:	nop
   0x7fffeedebf23:	nop
(gdb) info symbol 0x7ffff7395600
BytecodeCounter::_counter_value in section .bss of /share/OpenJDK/jdk1.7.0_hsx/jre/lib/amd64/server/libjvm.so

But as you can see, the iadd interpreter codelet now looks different now, because the interpreter had to generate extra code for the -XX:StopInterpreterAt feature.

Bytecode to Assembler - JIT Compiler

The HotSpot comes with two interpreters - the C1 client and the C2 server compiler


The C2 server compiler is a high-end fully optimising compiler:


Notice that you can usually not debug/profile fully optimized C2 code with Java debuggers/profilers

JIT Compiler Demo - Part 1

public class IntLoop {

  static void loop(int count) {
    for (int i = 0; i < count; i++)
      for (int j = 0; j < 1000000; j++);
  }

  public static void main(String[] args) {

    for (long i = 0; i < 100; i++)
      loop(2);
    System.out.println("Warmup done");
    
    long start  = System.currentTimeMillis();
    loop(Integer.parseInt(args[0]));
    long end = System.currentTimeMillis();

    System.out.println(end - start + "ms");
  }
}
$ java -Xint IntLoop 10
Warmup done
112ms
$ java -Xint IntLoop 100
Warmup done
1111ms
$ java -Xint IntLoop 1000
Warmup done
11202ms

For the interprter we get nice linear scaling - no surprise!

Now lets try with JIT:

$ export MY_OPTS="-XX:-TieredCompilation -XX:-UseOnStackReplacement -XX:CICompilerCount=1 -XX:LoopUnrollLimit=0"
$ java -XX:+PrintCompilation $MY_OPTS IntLoop 10
    583    1             IntLoop::loop (28 bytes)
Warmup done
0ms
$ java -XX:+PrintCompilation $MY_OPTS IntLoop 100000000
    598    1             IntLoop::loop (28 bytes)
Warmup done
0ms

That's really impressing - couldn't have been better!

Now lets change the example slightly..

public class LongLoop {

  static void loop(int count) {
    for (long i = 0; i < count; i++)
      for (long j = 0; j < 1000000; j++);
  }

  public static void main(String[] args) {

    for (long i = 0; i < 100; i++)
      loop(2);
    System.out.println("Warmup done");
    
    long start  = System.currentTimeMillis();
    loop(Integer.parseInt(args[0]));
    long end = System.currentTimeMillis();

    System.out.println(end - start + "ms");
  }
}
$ java -Xint LongLoop 10
Warmup done
155ms
$ java -Xint LongLoop 100
Warmup done
1535ms
$ java -Xint LongLoop 1000
Warmup done
15345ms

Again no surprise for the interpreted version. It needs a little longer than the int version, but it's still linear. So let's try the JITed version:

$ java -XX:+PrintCompilation $MY_OPTS LongLoop 10
    583    1             LongLoop::loop (34 bytes)
Warmup done
3ms
$ java -XX:+PrintCompilation $MY_OPTS LongLoop 100
    635    1             LongLoop::loop (34 bytes)
Warmup done
35ms
$ java -XX:+PrintCompilation $MY_OPTS LongLoop 1000
    620    1             LongLoop::loop (34 bytes)
Warmup done
330ms
$ java -XX:+PrintCompilation $MY_OPTS LongLoop 10000
    590    1             LongLoop::loop (34 bytes)
Warmup done
3235ms

That's a little suprising. The JITed version is considerably faster than the interpreted one, but it bocomes linear with regard to the input parmater (whereas the int version has been constant!)

Let's have a look at the generated code:

$ java -XX:+PrintAssembly -XX:+PrintCompilation $MY_OPTS IntLoop 10
    622    1             IntLoop::loop (28 bytes)
Loaded disassembler from /share/OpenJDK/jdk1.7.0_hsx/jre/lib/amd64/hsdis-amd64.so
Decoding compiled method 0x00007f8c47b4af90:
Code:
[Disassembling for mach='i386:x86-64']
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} 'loop' '(I)V' in 'IntLoop'
  # parm0:    rsi       = int
  #           [sp+0x20]  (sp of caller)
  ;; N1: #	B1 <- B1  Freq: 1

  ;; B1: #	N1 <- BLOCK HEAD IS JUNK   Freq: 1

  0x00007f8c47b4b0c0: sub    $0x18,%rsp
  0x00007f8c47b4b0c7: mov    %rbp,0x10(%rsp)    ;*synchronization entry
                                                ; - IntLoop::loop@-1 (line 4)
  0x00007f8c47b4b0cc: add    $0x10,%rsp
  0x00007f8c47b4b0d0: pop    %rbp
  0x00007f8c47b4b0d1: test   %eax,0x6140f29(%rip)        # 0x00007f8c4dc8c000
                                                ;   {poll_return}
  0x00007f8c47b4b0d7: retq   
  ...

So that's a lot mor code for only the fact that we've changed an integer into a long!

$ java -XX:+PrintAssembly -XX:+PrintCompilation $MY_OPTS LongLoop 10
    638    1             LongLoop::loop (34 bytes)
Loaded disassembler from /share/OpenJDK/jdk1.7.0_hsx/jre/lib/amd64/hsdis-amd64.so
Decoding compiled method 0x00007f9c80264e50:
Code:
[Disassembling for mach='i386:x86-64']
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} 'loop' '(I)V' in 'LongLoop'
  # parm0:    rsi       = int
  #           [sp+0x20]  (sp of caller)
  ;; N1: #	B1 <- B3  Freq: 1

  ;; B1: #	B3 B2 <- BLOCK HEAD IS JUNK   Freq: 1

  0x00007f9c80264f80: sub    $0x18,%rsp
  0x00007f9c80264f87: mov    %rbp,0x10(%rsp)    ;*synchronization entry
                                                ; - LongLoop::loop@-1 (line 4)
  0x00007f9c80264f8c: movslq %esi,%r10          ;*i2l  ; - LongLoop::loop@4 (line 4)
  0x00007f9c80264f8f: test   %r10,%r10
  0x00007f9c80264f92: jle    0x00007f9c80264f99  ;*ifge
                                                ; - LongLoop::loop@6 (line 4)
  ;; B2: #	B7 <- B1  Freq: 0.5

  0x00007f9c80264f94: xor    %r11d,%r11d
  0x00007f9c80264f97: jmp    0x00007f9c80264fd2  ;*return
                                                ; - LongLoop::loop@33 (line 6)
  ;; B3: #	N1 <- B6 B1  Freq: 1

  0x00007f9c80264f99: add    $0x10,%rsp
  0x00007f9c80264f9d: pop    %rbp
  0x00007f9c80264f9e: test   %eax,0x614105c(%rip)        # 0x00007f9c863a6000
                                                ;   {poll_return}
  0x00007f9c80264fa4: retq   
  ...
  0x00007f9c80264faf: nop
  ;; B4: #	B5 <- B5  top-of-loop Freq: 4.93447e+06

  0x00007f9c80264fb0: add    $0x1,%r8           ; OopMap{off=52}
                                                ;*goto
                                                ; - LongLoop::loop@23 (line 5)
  ;; B5: #	B4 B6 <- B7 B4 	Loop: B5-B4 inner  Freq: 4.93447e+06

  0x00007f9c80264fb4: test   %eax,0x6141046(%rip)        # 0x00007f9c863a6000
                                                ;*goto
                                                ; - LongLoop::loop@23 (line 5)
                                                ;   {poll}
  0x00007f9c80264fba: cmp    $0xf4240,%r8
  0x00007f9c80264fc1: jl     0x00007f9c80264fb0  ;*ifge
                                                ; - LongLoop::loop@16 (line 5)
  ;; B6: #	B3 B7 <- B5  Freq: 5

  0x00007f9c80264fc3: add    $0x1,%r11          ; OopMap{off=71}
                                                ;*goto
                                                ; - LongLoop::loop@30 (line 4)
  0x00007f9c80264fc7: test   %eax,0x6141033(%rip)        # 0x00007f9c863a6000
                                                ;*goto
                                                ; - LongLoop::loop@30 (line 4)
                                                ;   {poll}
  0x00007f9c80264fcd: cmp    %r10,%r11
  0x00007f9c80264fd0: jge    0x00007f9c80264f99  ;*lconst_0
                                                ; - LongLoop::loop@9 (line 5)
  ;; B7: #	B5 <- B2 B6 	Loop: B7-B6  Freq: 5

  0x00007f9c80264fd2: mov    $0x1,%r8d
  0x00007f9c80264fd8: jmp    0x00007f9c80264fb4
  0x00007f9c80264fda: hlt    
  ...

JIT Compiler Demo - Part 2

In the second part, we start a concurrent thread with triggers GC every second while we are looping in out main thread. Let's start this time with the long version:

class LongLoopWithGC {

  static long tmp;

  static void loop(int count) {
    for (long i = 1; i < count; i++)
      for (long j = 1; j < 1000000; j++)
        tmp++;
  }

  public static void main(String[] args) {

    for (long i = 0; i < 100; i++)
      loop(2);
    System.out.println("Warmup done");
    
    new Thread() {
      public void run() {
        while(true) {
          try { Thread.sleep(1000); } catch (InterruptedException e) {}
          System.gc();
        }
      }
    }.start();

    long start  = System.currentTimeMillis();
    loop(Integer.parseInt(args[0]));
    long end = System.currentTimeMillis();

    System.out.println(end - start + "ms");
    System.exit(0);
  }
}
$ java -XX:+PrintCompilation -verbose:gc $MY_OPTS LongLoopWithGC 100
    567    1             LongLoopWithGC::loop (42 bytes)
Warmup done
68ms
$ java -XX:+PrintCompilation -verbose:gc $MY_OPTS LongLoopWithGC 1000
    568    1             LongLoopWithGC::loop (42 bytes)
Warmup done
646ms
$ java -XX:+PrintCompilation -verbose:gc $MY_OPTS LongLoopWithGC 10000
    560    1             LongLoopWithGC::loop (42 bytes)
Warmup done
[GC 317K->304K(60800K), 0.0112460 secs]
[Full GC 304K->217K(60800K), 0.0799480 secs]
[GC 852K->281K(60800K), 0.0007890 secs]
[Full GC 281K->217K(60800K), 0.0562010 secs]
[GC 217K->217K(60800K), 0.0008090 secs]
[Full GC 217K->217K(60800K), 0.0570710 secs]
[GC 217K->217K(60800K), 0.0007410 secs]
[Full GC 217K->217K(60800K), 0.0541550 secs]
[GC 217K->217K(60800K), 0.0006150 secs]
[Full GC 217K->217K(60800K), 0.0482390 secs]
[GC 217K->217K(60800K), 0.0006570 secs]
[Full GC 217K->217K(60800K), 0.0527470 secs]
7014ms

No surprise until now - everything looks as expected! Now lets change the example in the same way as before (i.e. replace the long iterators by integer ones). Should we see any differences?

...
  static void loop(int count) {
    for (int i = 1; i < count; i++)
      for (int j = 1; j < 1000000; j++)
        tmp++;
  }
  ...
$ java -XX:+PrintCompilation -verbose:gc $MY_OPTS IntLoopWithGC 1000
    543    1             IntLoopWithGC::loop (36 bytes)
Warmup done
341ms
$ java -XX:+PrintCompilation -verbose:gc $MY_OPTS IntLoopWithGC 10000
    537    1             IntLoopWithGC::loop (36 bytes)
Warmup done
[GC 317K->272K(60800K), 0.0109640 secs]
[Full GC 272K->217K(60800K), 0.0659020 secs]
3809ms
$ java -XX:+PrintCompilation -verbose:gc $MY_OPTS IntLoopWithGC 100000
    560    1             IntLoopWithGC::loop (36 bytes)
Warmup done
[GC 317K->320K(60800K), 0.0112450 secs]
[Full GC 320K->217K(60800K), 0.0708950 secs]
37818ms

Very strange! No difference how long our main thread is running, we always only see a single GC!

But why? Let's try to attach with a Java debugger to see what happens..

$ java -XX:+PrintCompilation -verbose:gc -agentlib:jdwp=transport=dt_socket,address=8000,server=y,suspend=n $MY_OPTS IntLoopWithGC 100000 &
$ jdb -attach 8000
java.io.IOException
	at com.sun.tools.jdi.VirtualMachineManagerImpl.createVirtualMachine(VirtualMachineManagerImpl.java:234)
	at com.sun.tools.jdi.VirtualMachineManagerImpl.createVirtualMachine(VirtualMachineManagerImpl.java:241)
	at com.sun.tools.jdi.GenericAttachingConnector.attach(GenericAttachingConnector.java:117)
	at com.sun.tools.jdi.SocketAttachingConnector.attach(SocketAttachingConnector.java:90)
	at com.sun.tools.example.debug.tty.VMConnection.attachTarget(VMConnection.java:347)
	at com.sun.tools.example.debug.tty.VMConnection.open(VMConnection.java:156)
	at com.sun.tools.example.debug.tty.Env.init(Env.java:54)
	at com.sun.tools.example.debug.tty.TTY.main(TTY.java:1057)

Fatal error:
Unable to attach to target VM.

It doesn't work! We can not attach to 'IntLoopWithGC'. But it works with 'LongLoopWithGC':

$ java -XX:+PrintCompilation -verbose:gc -agentlib:jdwp=transport=dt_socket,address=8000,server=y,suspend=n $MY_OPTS LongLoopWithGC 100000 &
$ jdb -attach 8000
Set uncaught java.lang.Throwable
Set deferred uncaught java.lang.Throwable
Initializing jdb ...
> threads
Group system:
  (java.lang.ref.Reference$ReferenceHandler)0x16e Reference Handler cond. waiting
  (java.lang.ref.Finalizer$FinalizerThread)0x16f  Finalizer         cond. waiting
  (java.lang.Thread)0x170                         Signal Dispatcher running
Group main:
  (java.lang.Thread)0x1                           main              running
  (LongLoopWithGC$1)0x172                         Thread-0          sleeping
> suspend 1
> where 1
  [1] LongLoopWithGC.loop (LongLoopWithGC.java:7)
  [2] LongLoopWithGC.main (LongLoopWithGC.java:27)

Ok, let's see what we can find out with GDB:

$ gdb java
(gdb) run -verbose:gc -XX:+PrintAssembly -XX:+PrintCompilation $MY_OPTS IntLoopWithGC 100000
Disassembling for mach='i386:x86-64']
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} 'loop' '(I)V' in 'IntLoopWithGC'
  # parm0:    rsi       = int
  #           [sp+0x20]  (sp of caller)
  ;; N1: #	B1 <- B6  Freq: 1

  ;; B1: #	B6 B2 <- BLOCK HEAD IS JUNK   Freq: 1

  0x00007ffff1eb6fc0: sub    $0x18,%rsp
  0x00007ffff1eb6fc7: mov    %rbp,0x10(%rsp)    ;*synchronization entry
                                                ; - IntLoopWithGC::loop@-1 (line 6)
  0x00007ffff1eb6fcc: cmp    $0x1,%esi
  0x00007ffff1eb6fcf: jle    0x00007ffff1eb700c  ;*if_icmpge
                                                ; - IntLoopWithGC::loop@4 (line 6)
  ;; B2: #	B3 <- B1  Freq: 0.5

  0x00007ffff1eb6fd1: movabs $0xeb5aa680,%r10   ;   {oop(a 'java/lang/Class' = 'IntLoopWithGC')}
  0x00007ffff1eb6fdb: mov    0x70(%r10),%r11
  0x00007ffff1eb6fdf: mov    $0x1,%r9d
  ;; B3: #	B4 <- B2 B5 	Loop: B3-B5  Freq: 5

  0x00007ffff1eb6fe5: mov    $0x1,%r8d
  ...
  0x00007ffff1eb6fef: nop                       ;*getstatic tmp
                                                ; - IntLoopWithGC::loop@15 (line 8)
  ;; B4: #	B4 B5 <- B3 B4 	Loop: B4-B4 inner  Freq: 4.93447e+06

  0x00007ffff1eb6ff0: inc    %r8d               ;*iinc
                                                ; - IntLoopWithGC::loop@23 (line 7)
  0x00007ffff1eb6ff3: add    $0x1,%r11          ;*ladd
                                                ; - IntLoopWithGC::loop@19 (line 8)
  0x00007ffff1eb6ff7: mov    %r11,0x70(%r10)    ;*putstatic tmp
                                                ; - IntLoopWithGC::loop@20 (line 8)
  0x00007ffff1eb6ffb: cmp    $0xf4240,%r8d
  0x00007ffff1eb7002: jl     0x00007ffff1eb6ff0  ;*if_icmpge
                                                ; - IntLoopWithGC::loop@12 (line 7)
  ;; B5: #	B3 B6 <- B4  Freq: 5

  0x00007ffff1eb7004: inc    %r9d               ;*iinc
                                                ; - IntLoopWithGC::loop@29 (line 6)
  0x00007ffff1eb7007: cmp    %esi,%r9d
  0x00007ffff1eb700a: jl     0x00007ffff1eb6fe5  ;*return
                                                ; - IntLoopWithGC::loop@35 (line 9)
  ;; B6: #	N1 <- B5 B1  Freq: 1

  0x00007ffff1eb700c: add    $0x10,%rsp
  0x00007ffff1eb7010: pop    %rbp
  0x00007ffff1eb7011: test   %eax,0x613ffe9(%rip)        # 0x00007ffff7ff7000
                                                ;   {poll_return}
  0x00007ffff1eb7017: retq                      ;*getstatic tmp
                                                ; - IntLoopWithGC::loop@15 (line 8)
  0x00007ffff1eb7018: hlt    
  ...
^C
Program received signal SIGINT, Interrupt.
0x00007ffff7bc803d in pthread_join () from /lib/libpthread.so.0
(gdb) info threads
  12 Thread 0x7fffecfb0710 (LWP 7255)  0x00007ffff7bcb85c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
  11 Thread 0x7fffed44a710 (LWP 7254)  0x00007ffff7bcbbc9 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
  10 Thread 0x7fffed54b710 (LWP 7253)  0x00007ffff7bcb85c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
  9 Thread 0x7fffed64c710 (LWP 7252)  0x00007ffff7bcb85c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
  8 Thread 0x7fffed74d710 (LWP 7251)  0x00007ffff7bcdb50 in sem_wait () from /lib/libpthread.so.0
  7 Thread 0x7fffed9ab710 (LWP 7250)  0x00007ffff7bcb85c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
  6 Thread 0x7fffedaac710 (LWP 7249)  0x00007ffff7bcb85c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
  5 Thread 0x7fffedbad710 (LWP 7248)  0x00007ffff74d9437 in sched_yield () from /lib/libc.so.6
  4 Thread 0x7ffff1932710 (LWP 7247)  0x00007ffff7bcb85c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
  3 Thread 0x7ffff1a33710 (LWP 7246)  0x00007ffff7bcb85c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
  2 Thread 0x7ffff7fe4710 (LWP 7245)  0x00007ffff1eb6ffb in ?? ()
* 1 Thread 0x7ffff7fe6700 (LWP 7244)  0x00007ffff7bc803d in pthread_join () from /lib/libpthread.so.0
(gdb) thread 2
[Switching to thread 2 (Thread 0x7ffff7fe4710 (LWP 7245))]#0  0x00007ffff1eb6ffb in ?? ()
(gdb) where
#0  0x00007ffff1eb6ffb in ?? ()
#1  0x0000000000000000 in ?? ()
(gdb) call mixed_ps($sp, $rbp, $pc)
"Executing mixed_ps"
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J  IntLoopWithGC.loop(I)V
v  ~StubRoutines::call_stub
V  [libjvm.so+0x7305d8]  JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x51a
V  [libjvm.so+0x9749ee]  os::os_exception_wrapper(void (*)(JavaValue*, methodHandle*, JavaCallArguments*, Thread*), JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x3a
V  [libjvm.so+0x7300b7]  JavaCalls::call(JavaValue*, methodHandle, JavaCallArguments*, Thread*)+0x7d
V  [libjvm.so+0x7431dd]  jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)+0x186
V  [libjvm.so+0x758ea1]  jni_CallStaticVoidMethod+0x37b
C  [libjli.so+0x39c3]  JavaMain+0x8a3
(gdb)  continue
Continuing.

Everything looks normal, but if we continue, we get a segmentation fault. Should we be scared about that?

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff1eb7011 in ?? ()
(gdb)  call mixed_ps($sp, $rbp, $pc)

"Executing mixed_ps"
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J  IntLoopWithGC.loop(I)V
v  ~StubRoutines::call_stub
V  [libjvm.so+0x7305d8]  JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x51a
V  [libjvm.so+0x9749ee]  os::os_exception_wrapper(void (*)(JavaValue*, methodHandle*, JavaCallArguments*, Thread*), JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x3a
V  [libjvm.so+0x7300b7]  JavaCalls::call(JavaValue*, methodHandle, JavaCallArguments*, Thread*)+0x7d
V  [libjvm.so+0x7431dd]  jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)+0x186
V  [libjvm.so+0x758ea1]  jni_CallStaticVoidMethod+0x37b
C  [libjli.so+0x39c3]  JavaMain+0x8a3

(gdb)  x /1i $pc
=> 0x7ffff1eb7011:	test   %eax,0x613ffe9(%rip)        # 0x7ffff7ff7000
(gdb)  continue
Continuing.
[GC 317K->320K(60800K), 0.0438280 secs]
[Full GC 320K->217K(60800K), 0.0690220 secs]
8060ms

Program exited normally.

It doesn't seems to be critical! If we continue, the program terminates normally.

If we try the same with LongLoopWithGC, we will see, that we get even more segmentation faults (actually every time before a GC happens).

(gdb) run -verbose:gc -XX:+PrintOptoAssembly -XX:+PrintCompilation $MY_OPTS LongLoopWithGC 10000
...
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7fe4710 (LWP 7302)]
0x00007ffff1eb6f4c in ?? ()
(gdb) continue
Continuing.
[GC 317K->288K(60800K), 0.0248930 secs]
[Full GC 288K->217K(60800K), 0.0708540 secs]

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff1eb6f4c in ?? ()
(gdb) continue
Continuing.
[GC 852K->281K(60800K), 0.0027870 secs]
[Full GC 281K->217K(60800K), 0.0575250 secs]
...

Notice that this time we used -XX:+PrintOptoAssembly instead of -XX:+PrintAssembly. It provides some interesting information:

044   B5: #	B4 B6 <- B7 B4 	Loop: B5-B4 inner  Freq: 4.93447e+06
044   	addq    R8, #1	# long
048   	movq    [R11 + #112 (8-bit)], R8	# long ! Field LongLoopWithGC.tmp
04c   	testl  rax, [rip + #offset_to_poll_page]	# Safepoint: poll for GC        # LongLoopWithGC::loop @ bci:31  L[0]=RSI L[1]=R9 L[2]=_ L[3]=RCX L[4]=_
        # OopMap{r11=Oop off=76}
052   	cmpq    RCX, #1000000
059   	jl,s   B4  P=0.999999 C=1610592.000000
059
05b   B6: #	B3 B7 <- B5  Freq: 5
05b   	addq    R9, #1	# long
05f   	testl  rax, [rip + #offset_to_poll_page]	# Safepoint: poll for GC        # LongLoopWithGC::loop @ bci:38  L[0]=RSI L[1]=R9 L[2]=_ L[3]=_ L[4]=_
        # OopMap{r11=Oop off=95}
065   	cmpq    R9, R10
068   	jge,s   B3  P=0.100000 C=-1.000000

So the segmentation faults are in fact Safepoints!

(gdb) p 'os::_polling_page'
$1 = (address) 0x7ffff7ff8000 ""

Questions?

Rechenmaschiene von Philipp Matthäus Hahn (1739-1790) Quelle: Württembergisches Landesmuseum Stuttgart

Acknowledgment