Testing in Nodus - IsolatedVM

At a very high level, Nodus is a JVMTI-based mod loader for Minecraft. It is written in Rust and loads mods written in Java, so we interact with JVMTI and JNI a lot. To make things more complicated, much of that code does weird things like defining classes at runtime or modifying existing classes, which is a nightmare for test isolation.

The first attempt

When we started writing JNI tests, the setup was pretty simple. Java is almost always launched from the command line, but you can also embed a JVM in an existing process. So we create a JVM in the test process (or reuse the one we created earlier) and attach the test thread to it to get a JNI environment.

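// Assumes the jni crate with its `invocation` feature enabled.
use std::sync::OnceLock;

use jni::{AttachGuard, InitArgsBuilder, JNIVersion, JavaVM};

static VM: OnceLock<JavaVM> = OnceLock::new();

// Creates the JVM on first use, then attaches the calling thread to it.
pub fn env() -> AttachGuard<'static> {
    let vm = VM.get_or_init(|| {
        let args = InitArgsBuilder::new()
            .version(JNIVersion::V8)
            .build()
            .expect("invalid JVM init args");
        JavaVM::new(args).expect("failed to create JVM")
    });
    vm.attach_current_thread()
        .expect("failed to attach test thread")
}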

AttachGuard is a wrapper around a JNI environment.

For the simpler cases this worked quite well. Where we needed to define classes, we got some level of test isolation by randomising class names. However, once we got into non-trivial classes and JVMTI, this became very impractical.
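As a rough illustration of the class-name trick, here is a minimal sketch. It swaps the random suffix for a process-wide counter, which guarantees uniqueness within a single JVM without pulling in a random number crate. The function name unique_class_name and the nodus/test package are hypothetical, not taken from Nodus.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Process-wide counter; every call hands out a fresh suffix.
static NEXT_ID: AtomicU64 = AtomicU64::new(0);

/// Returns a class name that no earlier call in this process has returned,
/// by appending a unique numeric suffix to `base`.
/// (Hypothetical helper: a counter stands in for a random suffix.)
pub fn unique_class_name(base: &str) -> String {
    let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
    format!("{base}_{id}")
}
```

Each test would call something like unique_class_name("nodus/test/Foo") before defining its class, so two tests sharing one JVM never try to define the same name twice.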

The second attempt

The most obvious pain point with our previous setup was libraries. Technically JVMTI gives us a way to load new JAR files, but if you want test isolation, putting things on the system class loader is not a good idea because you can never take them off again. A less obvious pain point was that JNI is a relatively low-level API with a lot of footguns, which didn't make for tests that were easy to understand. The next iteration addressed both.

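use std::sync::OnceLock;

use jni::objects::{GlobalRef, JObject, JValue};
use jni::{AttachGuard, JavaVM};

static VM: OnceLock<JavaVM> = OnceLock::new();

pub struct IsolatedVM {
    pub vm: AttachGuard<'static>,
    pub class_loader: GlobalRef,
}

impl IsolatedVM {
    pub fn new() -> Self {
        // env() is the helper from the first attempt.
        let mut vm = env();
        // URLClassLoader has no no-arg constructor, so pass an empty URL[].
        let urls = vm
            .new_object_array(0, "java/net/URL", JObject::null())
            .expect("failed to create URL[]");
        let class_loader = vm
            .new_object("java/net/URLClassLoader", "([Ljava/net/URL;)V", &[JValue::from(&urls)])
            .expect("failed to create URLClassLoader");
        let class_loader = vm.new_global_ref(class_loader).expect("failed to create global ref");

        Self { vm, class_loader }
    }
}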

The most notable change here is the introduction of a URLClassLoader. This solves our problems around defining classes, as each test gets its own ClassLoader, and it makes loading libraries very easy. Additionally, the wrapper struct around AttachGuard lets us introduce new abstractions. A good example is defining classes. We use a custom Rust library for reading, modifying and writing classes, so previously the process for defining a class was:

  • Create the Class struct
  • Call write() to get a Vec<u8>
  • Pass this to DefineClass
  • Check for exceptions
  • Return the class pointer

Now that we have a more abstract way of interacting with the JVM, all of this can be hidden in one function call. One downside of this approach is that the thread's context class loader is not the class loader we just created, meaning reflection and JNI won't see the classes we define on it. This could be fixed by calling Thread.setContextClassLoader, but we opted not to, because we don't have that option when injecting into Minecraft, so most of our JNI code is already written under the assumption that it is not running on the context class loader. Instead, we provide abstractions for calling methods like loadClass directly on the ClassLoader object, rather than the (almost) equivalent JNI function FindClass.
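One reason FindClass and loadClass are only "almost" equivalent is the name format: FindClass takes slash-separated internal names (java/lang/String), while ClassLoader.loadClass expects dot-separated binary names (java.lang.String). A wrapper that routes lookups through the ClassLoader object needs to convert between the two. The helper below is a minimal sketch of that conversion; to_binary_name is a hypothetical name, not part of Nodus or any JNI binding.

```rust
/// Converts a JNI internal class name such as `java/lang/String` into the
/// binary name form (`java.lang.String`) that `ClassLoader.loadClass` expects.
/// Nested-class separators (`$`) are the same in both forms and are left alone.
pub fn to_binary_name(internal: &str) -> String {
    internal.replace('/', ".")
}
```

The converted name would then be passed as a java.lang.String argument when invoking loadClass on the class loader reference.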

JVMTI permissions

When testing a bytecode modification library, we wrote a simple JVMTI agent that runs inside the tests to simulate some of the production workflows, such as transforming an already loaded class. Because all our tests that call JVM APIs from Rust are "integration" tests, we just shoved a JVMTI environment into a global and hid it behind some testing abstractions. It wasn't pretty but it did the job, and it actually serves as quite a nice reference implementation of an agent using this library.

Then we started getting intermittent failures in CI. Our CI environment is configured to rerun tests every few minutes to track down this sort of issue, and on average we were getting 1 failure a day out of ~150 runs. I'll skip the details of tracking this down, but it turned out to be a faulty assumption on my part. In JUnit, tests within the same class run sequentially by default, and I'd assumed Rust ran tests the same way, but the Rust test harness actually runs them in parallel on multiple threads. Because we shared one JVMTI environment across all tests, one thread would add a capability, then another thread would remove that capability before the first thread could use it.

There are two solutions here. One is to give each test its own JVMTI environment. This avoids direct conflicts, but it's not perfect: you'll still observe events from other tests. The other is to set RUST_TEST_THREADS=1. We already get test parallelism from Bazel, so Rust's parallelism wasn't doing much for us, and turning it off helps avoid further edge cases. We ended up doing both, as it means tests don't interfere with each other and can't leave the environment in a bad state for the next one.
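The failure mode, and why a per-test environment avoids it, can be modelled without a JVM. The sketch below is a toy: FakeJvmtiEnv is a hypothetical stand-in rather than a real JVMTI binding, and can_retransform_classes was picked only as an example capability, since it isn't stated here which one was involved. It replays the bad ordering deterministically instead of using real threads.

```rust
use std::collections::HashSet;

/// Toy stand-in for a JVMTI environment: just a set of capability names.
#[derive(Default)]
pub struct FakeJvmtiEnv {
    capabilities: HashSet<&'static str>,
}

impl FakeJvmtiEnv {
    pub fn add_capability(&mut self, name: &'static str) {
        self.capabilities.insert(name);
    }

    pub fn relinquish_capability(&mut self, name: &'static str) {
        self.capabilities.remove(name);
    }

    /// Succeeds only while the (example) capability is held.
    pub fn retransform(&self) -> Result<(), &'static str> {
        if self.capabilities.contains("can_retransform_classes") {
            Ok(())
        } else {
            Err("must possess capability")
        }
    }
}

/// Replays the bad interleaving of two tests on one shared environment.
pub fn shared_env_interleaving() -> Result<(), &'static str> {
    let mut shared = FakeJvmtiEnv::default();
    shared.add_capability("can_retransform_classes"); // test A sets up
    shared.relinquish_capability("can_retransform_classes"); // test B tears down
    shared.retransform() // test A finally uses it, too late
}

/// Same ordering of calls, but each test owns its own environment.
pub fn per_test_env_interleaving() -> Result<(), &'static str> {
    let mut env_a = FakeJvmtiEnv::default();
    let mut env_b = FakeJvmtiEnv::default();
    env_a.add_capability("can_retransform_classes"); // test A sets up
    env_b.add_capability("can_retransform_classes"); // test B sets up
    env_b.relinquish_capability("can_retransform_classes"); // test B tears down
    env_a.retransform() // test A is unaffected
}
```

In this model the shared variant fails and the per-test variant succeeds. Per-test environments do nothing about events leaking between tests, which is the part that running the harness on a single thread takes care of.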

JNI Errors

If you've used JNI for more than a simple application, you've probably seen a few VM segfaults. The JVM has a very nice segfault handler, but you still have to debug your code to find the bug, and the handler doesn't seem to work from within a Rust test (at least when running from IntelliJ). One of the debugging options the JVM has is -Xcheck:jni, which adds extra checks to JNI calls. You get slightly more information about fatal errors, and it also catches errors that might not show up immediately at runtime, such as deleting a local reference twice:

FATAL ERROR in native method: Bad global or local ref passed to JNI
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x9d1800]  ReportJNIFatalError+0x30
V  [libjvm.so+0x9db702]
V  [libjvm.so+0x9de3a6]  checked_jni_DeleteLocalRef+0x96

This is especially helpful for us because the option isn't viable in some versions of Minecraft (LWJGL2 prints a frankly impressive number of errors), so these tests are the only opportunity we have to get this level of insight into whether we're using JNI correctly.
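Since the flag only makes sense in the test JVM, one way to keep it out of production is to assemble the JVM options in a small test-only helper. The following is a sketch under that assumption; test_jvm_options is a hypothetical name, and how the strings reach the JVM depends on the binding in use (with the jni crate, each one could be passed to InitArgsBuilder::option).

```rust
/// Builds the option strings for the JVM used by tests.
/// (Hypothetical helper: only `-Xcheck:jni` itself is a real JVM flag.)
pub fn test_jvm_options(check_jni: bool) -> Vec<String> {
    let mut options = Vec::new();
    if check_jni {
        // Enables the JVM's extra validation of JNI calls.
        options.push("-Xcheck:jni".to_string());
    }
    options
}
```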

Final Thoughts

This way of testing JNI and JVMTI code has helped us a lot. Minecraft mods, and even the toolchains supporting them, aren't typically the most well tested (from what I can tell, Mixin doesn't even have a test folder), but these kinds of interactions with the JVM are very testable and worth testing. If something is broken in the code these tests cover, it will either segfault the JVM or cause a verification error, and neither of those is something I want to spend hours debugging, so this test suite has been invaluable in preventing those kinds of errors. It didn't stop me from trying to use a Java object after freeing it via JNI, but that's another story.