Wasm Runtime GC Crash: Investigating A SEGV Error

Alex Johnson

It looks like you've encountered a Segmentation Fault (SEGV) error within the WebAssembly Micro Runtime (WAMR), specifically during a garbage collection operation. This is a critical error that halts program execution, and it's often related to memory access violations. Let's dive into what's happening and how we can approach this.

Understanding the SEGV Error in WAMR

A SEGV error, or Segmentation Fault, typically means that your program tried to access a memory location that it wasn't allowed to. Think of it like trying to read a page from a book that doesn't exist or is ripped out – the system stops you to prevent further damage. In the context of WAMR and its garbage collector (GC), this could stem from a few different places. The error message points directly to /work/harnesses/sources/wasm-micro-runtime/core/iwasm/common/gc/gc_object.c:560 within a function called wasm_obj_is_i31_externref_or_anyref_obj. This function's name suggests it's involved in checking the type or identity of objects, particularly those related to i31, externref, or anyref types, which are key components of Wasm's reference types and garbage collection features.

When the WAMR's interpreter (specifically the fast interpreter in this case, as indicated by the build flags) is executing WebAssembly bytecode, it relies on internal data structures to manage objects, memory, and types. The garbage collector is responsible for keeping track of which objects are still in use and cleaning up those that are not. If there's a bug in how these objects are tracked, how their types are checked, or how memory is managed, it can lead to the GC trying to access invalid memory. The backtrace provided shows a chain of function calls leading up to the error, starting from main and moving through various WAMR interpreter and runtime functions, eventually landing in wasm_obj_is_i31_externref_or_anyref_obj. This indicates that the issue occurred during the execution of a WebAssembly function that likely involved object manipulation or type checking, possibly within a call to another function or a host environment function.

Key Areas of Investigation:

  • Object Representation: How are Wasm objects, especially those with reference types, represented in memory? Are there potential issues with dangling pointers, uninitialized memory, or incorrect memory layouts?
  • Type Checking Logic: The function wasm_obj_is_i31_externref_or_anyref_obj is directly involved. Errors in this logic could lead to misinterpreting an object's type, causing subsequent operations to fail.
  • Garbage Collector Implementation: The GC itself needs to correctly identify live objects. If the GC's marking or sweeping phases encounter corrupted object metadata or invalid pointers, it can trigger a SEGV.
  • Interaction with Reference Types: The mention of i31, externref, and anyref suggests that the bug might be specific to how WAMR handles these newer WebAssembly features, which are tightly integrated with the GC.
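To make the first two points concrete, here is a minimal sketch of how a runtime can pack a small integer into the reference itself while heap objects carry a header word. This is a simplified model for illustration, not WAMR's actual definitions: every `Demo*` and `demo_*` name, the flag values, and the header layout are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits kept in a heap object's header word. */
#define DEMO_EXTERNREF_FLAG ((uintptr_t)1 << 0)
#define DEMO_ANYREF_FLAG    ((uintptr_t)1 << 1)

/* Hypothetical heap object: every heap-allocated reference starts
 * with a header word that describes what kind of object it is. */
typedef struct DemoObject {
    uintptr_t header;
} DemoObject;

typedef DemoObject *DemoObjectRef;

/* An i31-style value lives in the reference bits themselves: the
 * 31-bit payload is shifted left and the low bit is set as a tag. */
static DemoObjectRef demo_i31_new(uint32_t value)
{
    return (DemoObjectRef)((((uintptr_t)value & 0x7FFFFFFFu) << 1) | 1u);
}

/* Real heap objects are word-aligned, so their low bit is always 0. */
static bool demo_is_i31(DemoObjectRef ref)
{
    return ((uintptr_t)ref & 1u) != 0;
}

/* The tag must be tested BEFORE the header is read: a tagged value
 * is not an address, and dereferencing it is a memory error. */
static bool demo_is_i31_externref_or_anyref(DemoObjectRef ref)
{
    assert(ref != NULL);
    if (demo_is_i31(ref))
        return true;
    return (ref->header & (DEMO_EXTERNREF_FLAG | DEMO_ANYREF_FLAG)) != 0;
}
```

In a scheme like this, the header read is safe only if the reference really points at a live object. Any value that passes the tag test but is not a valid object address will fault at exactly that read.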

You're using the Debug build type with AddressSanitizer enabled (-fsanitize=address). ASan instruments memory accesses and also installs a signal handler, so it reports the SEGV together with a stack trace instead of letting the process die silently. A SEGV on an "unknown address" means the faulting address lies outside any memory ASan tracks (heap allocations, stack frames, or globals), which is typical of a wild or corrupted pointer rather than a small overflow of a valid allocation. The provided GDB backtrace offers more detailed information about the state of the program at the moment of the crash, including the values of registers, which can be invaluable for debugging.

Analyzing the Crash: Trace and Context

The provided GDB backtrace is crucial for pinpointing the exact location and context of the crash. It tells us that the segmentation fault occurred within the wasm_obj_is_i31_externref_or_anyref_obj function, at line 560 of gc_object.c, matching the location in the error message. The arguments passed to this function, particularly the obj parameter, are likely invalid or point to memory that cannot be read. Let's break down what the backtrace reveals:

  • The Culprit Function: wasm_obj_is_i31_externref_or_anyref_obj(obj=...) is where the program crashed. This function is designed to check if a given WebAssembly object (obj) conforms to specific reference types (i31, externref, or anyref).
  • Call Stack: The sequence of calls leading to the crash is quite telling. It starts from main, then wasm_application_execute_func, wasm_interp_call_func_bytecode, and finally reaches the problematic function. This indicates that the crash happened during the execution of a WebAssembly function that was called through the WAMR API.
  • Register Values: The register dump at the time of the crash shows the state of the CPU. For instance, rip (instruction pointer) points to the exact instruction that caused the fault. rsp (stack pointer) and rbp (base pointer) indicate the current position on the stack. The remaining registers (rax, rbx, rcx, rdx, rsi, and rdi) hold arguments and intermediate values. Under the x86-64 System V calling convention used on Linux, rdi carries the first integer or pointer argument on function entry. In this case, rdi holds 0x710deac0. If obj is still in rdi at the faulting instruction, the function was attempting to dereference this address. In an unoptimized Debug build the argument is usually copied to the stack first, so confirm the value with print obj in GDB.

The core of the problem likely lies in the obj parameter passed to wasm_obj_is_i31_externref_or_anyref_obj. It appears that this pointer is either:

  1. NULL or Invalid: It points to memory that the program is not supposed to access.
  2. Corrupted: The pointer itself has been overwritten or corrupted earlier in the execution flow.
  3. Pointing to Uninitialized Memory: The memory it points to has not been properly set up.
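A cheap way to tell some of these cases apart while debugging is to validate a pointer before it is dereferenced. The sketch below is a hypothetical helper, not part of WAMR: `check_heap_ptr`, `PtrStatus`, and the idea that you know the heap's base address and size at the call site are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    PTR_OK,
    PTR_NULL,
    PTR_MISALIGNED,
    PTR_OUT_OF_HEAP
} PtrStatus;

/* Classify `ptr` against a heap region [heap_base, heap_base + heap_size).
 * `obj_size` is the number of bytes the caller intends to read. */
static PtrStatus check_heap_ptr(const void *ptr, const void *heap_base,
                                size_t heap_size, size_t obj_size)
{
    uintptr_t p = (uintptr_t)ptr;
    uintptr_t base = (uintptr_t)heap_base;

    assert(heap_base != NULL);

    if (ptr == NULL)
        return PTR_NULL;
    /* Word-sized headers must sit on word-aligned addresses. */
    if (p % sizeof(uintptr_t) != 0)
        return PTR_MISALIGNED;
    /* Written to avoid overflow: the whole object must fit in the heap. */
    if (p < base || obj_size > heap_size || p - base > heap_size - obj_size)
        return PTR_OUT_OF_HEAP;
    return PTR_OK;
}
```

A check like this catches NULL, misaligned, and out-of-range values, which covers many corrupted pointers. It cannot detect a pointer into uninitialized or already-freed memory that still lies inside the heap; for those you need ASan or a trace of where the pointer was produced. Note also that in a runtime that tags small integers in the low bit, a tagged value would show up here as PTR_MISALIGNED.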

Given that the crash occurs within the GC-related code and involves reference types, we can hypothesize a few scenarios. Perhaps the WebAssembly module being executed creates or manipulates objects in a way that confuses the GC. This could involve:

  • Incorrect Object Allocation/Deallocation: If objects are allocated or freed improperly, it might leave behind invalid pointers or corrupt the GC's internal structures.
  • Type Confusion: An object might be treated as one type when it's actually another, leading to incorrect access patterns. The wasm_obj_is_instance_of function, which calls the crashing function, is directly involved in type checking, making this a strong possibility.
  • Race Conditions (Less Likely in this Context): If threading were heavily involved and not handled carefully, it could lead to memory corruption. However, the provided output doesn't explicitly suggest a multithreading issue here.

Your build command specifies a configuration that enables many features, including WAMR_BUILD_GC=1 and WAMR_BUILD_REF_TYPES=1. This is good for testing the full capabilities of WAMR, but it also means more complex interactions are at play. The Debug build type and -fno-omit-frame-pointer are excellent choices for debugging as they provide more detailed stack traces and prevent optimizations that can obscure the source of errors.

AddressSanitizer's "can not provide additional info" note is consistent with this picture. ASan can describe an address only when it falls in or near memory it tracks, such as a heap allocation, a stack frame, or a global. A wild pointer that lands in unmapped memory gives it nothing to report beyond the faulting instruction and the stack trace, so the original cause has to be found by tracing where the pointer came from.

Reproducing and Debugging the Issue

To effectively debug this SEGV error, the first step is to reliably reproduce it. You've already provided the command: iwasm --interp -f main test.wasm -76. This is excellent! It means we have a consistent way to trigger the crash.

Here’s a systematic approach to debugging this problem:

  1. Examine test.wasm: If possible, try to understand what test.wasm is doing. What kind of WebAssembly functions does it export? What arguments does it take? Does it interact with host functions? Understanding the Wasm module's logic might reveal why it's triggering this specific code path. If test.wasm is a test case you developed or found, review its creation process and its intended behavior.

  2. Simplify the Wasm Module: If test.wasm is complex, try to create a minimal Wasm module that still triggers the SEGV. This stripped-down version will make it much easier to isolate the problematic operation.

  3. Add Logging: You can instrument the WAMR source code with printf or fprintf(stderr, ...) statements to trace the execution flow and the values of key variables. Pay close attention to:

    • The obj parameter in wasm_obj_is_i31_externref_or_anyref_obj and wasm_obj_is_instance_of.
    • The object's type information before and during the call.
    • The state of the GC heap.

    For example, inside wasm_obj_is_i31_externref_or_anyref_obj, you could add:

    fprintf(stderr, "DEBUG: Checking object at %p\n", (void *)obj);
    // ... rest of the function
    

    This might help you see if obj is NULL or an unexpected value.

  4. Use GDB Interactively: You've already obtained a backtrace. Now, use GDB to step through the code leading up to the crash:

    • Start GDB: gdb --args iwasm --interp -f main test.wasm -76
    • Run the program: run
    • When the program crashes, you'll be at the wasm_obj_is_i31_externref_or_anyref_obj function. Now you can use commands like bt (backtrace), info frame, info registers, print obj (or relevant variables), and step or next to move through the code line by line and examine variable values. Pay special attention to the obj variable and any related type descriptors.
    • You can set a breakpoint before the crash: break /work/harnesses/sources/wasm-micro-runtime/core/iwasm/common/gc/gc_object.c:560 and then run. When the breakpoint is hit, you can inspect the state.

  5. Focus on the obj Pointer: The most direct lead is the obj pointer. In GDB, when you hit the breakpoint just before the crash, examine the memory it points to, for example with x/xg obj (or x/xg $rdi if rdi still holds obj). Is it readable? Does it look like a valid object structure? If GDB reports that it cannot access the memory, the pointer itself is bad, and you'll need to trace back up the call stack to see where obj was last set or modified.

  6. Review Recent Changes: If this issue started appearing recently, check the commit history of the WAMR repository for changes related to garbage collection, reference types, or the interpreter around the time the problem began. Your commit hash 4b42cfdbce1b724137eea3f76868f42b36b4d51c is helpful here.

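  7. Consider the -76 argument: The command ends with -76. Because it appears after the Wasm file, iwasm should treat it as an argument to the invoked function (main, selected with -f) rather than as a runtime option. It's worth checking how the module uses this value. Does a different argument avoid the crash? Does this one steer execution into the code path that creates or casts the problematic reference?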
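The logging idea from step 3 can be taken a little further with a small tracing helper that prints not just the address but also a quick classification of it. This is a sketch under assumptions: `classify_ref` and `trace_ref` are hypothetical names, and the rule that a set low bit marks a tagged value is an illustrative convention rather than something taken from the WAMR source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Describe what a raw reference value looks like, without touching
 * the memory it may point to. */
static const char *classify_ref(const void *ref)
{
    uintptr_t bits = (uintptr_t)ref;

    if (bits == 0)
        return "null";
    if (bits & 1u)
        return "tagged (low bit set)";
    if (bits % sizeof(uintptr_t) != 0)
        return "misaligned";
    return "aligned pointer";
}

/* Print one trace line and flush immediately, so the line is not
 * lost if the process crashes right after the call. */
static void trace_ref(FILE *out, const char *where, const void *ref)
{
    assert(out != NULL && where != NULL);
    fprintf(out, "DEBUG[%s]: ref=%p (%s)\n", where, (void *)ref,
            classify_ref(ref));
    fflush(out);
}
```

Calling `trace_ref(stderr, "is_instance_of", obj)` at the top of each function in the call chain shows where the value first looks wrong. Keep in mind what the classification cannot tell you: the value 0x710deac0 from the register dump is word-aligned and would be reported as an "aligned pointer", yet it may still point at unmapped memory.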

By systematically applying these debugging techniques, you should be able to narrow down the root cause of the SEGV error and contribute to fixing this issue in WAMR.

For further insights into WebAssembly and its memory management, you can refer to the official WebAssembly documentation.
