Wasm Runtime GC Crash: Investigating A SEGV Error
It looks like you've encountered a Segmentation Fault (SEGV) error within the WebAssembly Micro Runtime (WAMR), specifically during a garbage collection operation. This is a critical error that halts program execution, and it's often related to memory access violations. Let's dive into what's happening and how we can approach this.
Understanding the SEGV Error in WAMR
A SEGV error, or Segmentation Fault, typically means that your program tried to access a memory location that it wasn't allowed to. Think of it like trying to read a page from a book that doesn't exist or is ripped out – the system stops you to prevent further damage. In the context of WAMR and its garbage collector (GC), this could stem from a few different places. The error message points directly to /work/harnesses/sources/wasm-micro-runtime/core/iwasm/common/gc/gc_object.c:560 within a function called wasm_obj_is_i31_externref_or_anyref_obj. This function's name suggests it's involved in checking the type or identity of objects, particularly those related to i31, externref, or anyref types, which are key components of Wasm's reference types and garbage collection features.
When WAMR's interpreter (specifically the fast interpreter in this case, as indicated by the build flags) executes WebAssembly bytecode, it relies on internal data structures to manage objects, memory, and types. The garbage collector is responsible for keeping track of which objects are still in use and cleaning up those that are not. If there's a bug in how these objects are tracked, how their types are checked, or how memory is managed, the runtime can end up accessing invalid memory. The backtrace provided shows a chain of function calls leading up to the error, starting from main and moving through various WAMR interpreter and runtime functions, eventually landing in wasm_obj_is_i31_externref_or_anyref_obj. This indicates that the issue occurred while executing a WebAssembly function that involved object manipulation or type checking, possibly within a call to another function or a host environment function.
Key Areas of Investigation:
- Object Representation: How are Wasm objects, especially those with reference types, represented in memory? Are there potential issues with dangling pointers, uninitialized memory, or incorrect memory layouts?
- Type Checking Logic: The function wasm_obj_is_i31_externref_or_anyref_obj is directly involved. Errors in this logic could lead to misinterpreting an object's type, causing subsequent operations to fail.
- Garbage Collector Implementation: The GC itself needs to correctly identify live objects. If the GC's marking or sweeping phases encounter corrupted object metadata or invalid pointers, it can trigger a SEGV.
- Interaction with Reference Types: The mention of i31, externref, and anyref suggests that the bug might be specific to how WAMR handles these newer WebAssembly features, which are tightly integrated with the GC.
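To make the type-checking concern concrete, here is a minimal sketch of how a runtime might tell an unboxed i31 value apart from a heap object before reading an object header. Every name in it (toy_object_t, toy_ref_t, TOY_FLAG_EXTERNREF, TOY_FLAG_ANYREF, and the toy_ functions) is hypothetical, and the layout is an assumption made for illustration, not WAMR's actual definition. What it shows is that the header test has to dereference the reference, so a garbage value that is not recognized as a tagged integer faults inside exactly this kind of check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header flags; these are not WAMR's real constants. */
#define TOY_FLAG_EXTERNREF ((uintptr_t)1 << 0)
#define TOY_FLAG_ANYREF    ((uintptr_t)1 << 1)

/* Hypothetical heap object: a header word that a real runtime would
   follow with the payload. */
typedef struct toy_object {
    uintptr_t header;
} toy_object_t;

typedef toy_object_t *toy_ref_t;

/* Assumption: an i31 is stored unboxed in the reference itself, with
   the low bit set so it can never look like an aligned pointer. */
static toy_ref_t
toy_i31_new(uint32_t value)
{
    return (toy_ref_t)((((uintptr_t)value & 0x7fffffffu) << 1) | 1u);
}

static bool
toy_ref_is_i31(toy_ref_t ref)
{
    return ((uintptr_t)ref & 1u) != 0;
}

static bool
toy_ref_is_i31_externref_or_anyref(toy_ref_t ref)
{
    assert(ref != NULL);
    if (toy_ref_is_i31(ref))
        return true; /* decided from the bits alone, no memory access */
    /* This read is where a wild or stale pointer would fault. */
    return (ref->header & (TOY_FLAG_EXTERNREF | TOY_FLAG_ANYREF)) != 0;
}
```

Under this assumed scheme, a value such as 0x710deac0 has its low bit clear, so it would be treated as a heap pointer and its header would be read.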
Because you're using the Debug build type with AddressSanitizer enabled (-fsanitize=address), the crash is intercepted and reported by AddressSanitizer, which instruments your code to detect memory errors at runtime. A SEGV on an "unknown address" typically means the faulting read or write targeted memory that is unmapped or that does not belong to any allocation AddressSanitizer tracks, as opposed to a small overflow past a known buffer. The provided GDB backtrace offers more detailed information about the state of the program at the moment of the crash, including the values of registers, which can be invaluable for debugging.
Analyzing the Crash: Trace and Context
The provided GDB backtrace is crucial for pinpointing the exact location and context of the crash. It tells us that the segmentation fault occurred within the wasm_obj_is_i31_externref_or_anyref_obj function, at line 563 of gc_object.c. The arguments passed to this function, particularly the obj parameter, likely hold an invalid value or point to invalid memory. Let's break down what the backtrace reveals:
- The Culprit Function: wasm_obj_is_i31_externref_or_anyref_obj(obj=...) is where the program crashed. This function is designed to check if a given WebAssembly object (obj) conforms to specific reference types (i31, externref, or anyref).
- Call Stack: The sequence of calls leading to the crash is quite telling. It starts from main, then wasm_application_execute_func, wasm_interp_call_func_bytecode, and finally reaches the problematic function. This indicates that the crash happened during the execution of a WebAssembly function that was called through the WAMR API.
- Register Values: The register dump at the time of the crash shows the state of the CPU. For instance, rip (instruction pointer) points to the exact instruction that caused the fault. rsp (stack pointer) and rbp (base pointer) indicate the current position on the stack. The values of rax, rbx, rcx, rdx, rsi, and rdi show the general-purpose and argument registers in use. Under the x86-64 System V calling convention, rdi carries the first integer or pointer argument to a function. In this case, rdi holds 0x710deac0. If obj was still in rdi at the point of the fault, then the function was attempting to dereference this address.
The core of the problem likely lies in the obj parameter passed to wasm_obj_is_i31_externref_or_anyref_obj. It appears that this pointer is either:
- NULL or Invalid: It points to memory that the program is not supposed to access.
- Corrupted: The pointer itself has been overwritten or corrupted earlier in the execution flow.
- Pointing to Uninitialized Memory: The memory it points to has not been properly set up.
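One way to tell these cases apart while debugging is to check whether the suspicious pointer falls inside the runtime's GC heap at all. The sketch below assumes you can obtain the heap's base address and size from the runtime. The toy_heap_range_t type and toy_heap_contains function are hypothetical helpers written for this illustration, not part of WAMR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a contiguous GC heap region. */
typedef struct toy_heap_range {
    const uint8_t *base; /* first byte of the heap */
    size_t size;         /* heap length in bytes */
} toy_heap_range_t;

/* Returns true only if [ptr, ptr + obj_size) lies entirely inside the
   heap. Only addresses are compared; nothing is dereferenced, so the
   check is safe to run on a wild pointer. */
static bool
toy_heap_contains(const toy_heap_range_t *heap, const void *ptr,
                  size_t obj_size)
{
    assert(heap != NULL && heap->base != NULL);

    uintptr_t start = (uintptr_t)heap->base;
    uintptr_t p = (uintptr_t)ptr;

    if (p < start)
        return false; /* also rejects NULL */

    uintptr_t offset = p - start;
    /* Compare against the remaining space so the sum cannot overflow. */
    return offset <= heap->size && obj_size <= heap->size - offset;
}
```

A pointer that fails this check was never a valid object in that heap, which points toward corruption or type confusion upstream. A pointer that passes but still holds garbage points more toward uninitialized or already-freed memory.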
Given that the crash occurs within the GC-related code and involves reference types, we can hypothesize a few scenarios. Perhaps the WebAssembly module being executed creates or manipulates objects in a way that confuses the GC. This could involve:
- Incorrect Object Allocation/Deallocation: If objects are allocated or freed improperly, it might leave behind invalid pointers or corrupt the GC's internal structures.
- Type Confusion: An object might be treated as one type when it's actually another, leading to incorrect access patterns. The wasm_obj_is_instance_of function, which calls the crashing function, is directly involved in type checking, making this a strong possibility.
- Race Conditions (Less Likely in this Context): If threading were heavily involved and not handled carefully, it could lead to memory corruption. However, the provided output doesn't explicitly suggest a multithreading issue here.
Your build command specifies a configuration that enables many features, including WAMR_BUILD_GC=1 and WAMR_BUILD_REF_TYPES=1. This is good for testing the full capabilities of WAMR, but it also means more complex interactions are at play. The Debug build type and -fno-omit-frame-pointer are excellent choices for debugging as they provide more detailed stack traces and prevent optimizations that can obscure the source of errors.
AddressSanitizer's "can not provide additional info" message suggests that the faulting address does not correspond to any heap, stack, or global region it tracks, so it cannot report where the memory was allocated or freed. That is more consistent with a wild or corrupted pointer than with a simple overflow or use-after-free of a tracked allocation, although severe earlier corruption could also mask the original cause.
Reproducing and Debugging the Issue
To effectively debug this SEGV error, the first step is to reliably reproduce it. You've already provided the command: iwasm --interp -f main test.wasm -76. This is excellent! It means we have a consistent way to trigger the crash.
Here’s a systematic approach to debugging this problem:
- Examine test.wasm: If possible, try to understand what test.wasm is doing. What kind of WebAssembly functions does it export? What arguments does it take? Does it interact with host functions? Understanding the Wasm module's logic might reveal why it's triggering this specific code path. If test.wasm is a test case you developed or found, review its creation process and its intended behavior.
- Simplify the Wasm Module: If test.wasm is complex, try to create a minimal Wasm module that still triggers the SEGV. This stripped-down version will make it much easier to isolate the problematic operation.
- Add Logging: You can instrument the WAMR source code with printf or fprintf(stderr, ...) statements to trace the execution flow and the values of key variables. Pay close attention to:
  - The obj parameter in wasm_obj_is_i31_externref_or_anyref_obj and wasm_obj_is_instance_of.
  - The object's type information before and during the call.
  - The state of the GC heap.

  For example, inside wasm_obj_is_i31_externref_or_anyref_obj, you could add:

      fprintf(stderr, "DEBUG: Checking object at %p\n", (void *)obj);
      // ... rest of the function

  This might help you see if obj is NULL or an unexpected value.
- Use GDB Interactively: You've already obtained a backtrace. Now, use GDB to step through the code leading up to the crash:
  - Start GDB: gdb --args iwasm --interp -f main test.wasm -76
  - Run the program: run
  - When the program crashes, GDB stops in the wasm_obj_is_i31_externref_or_anyref_obj function. Now you can use commands like bt (backtrace), info frame, info registers, print obj (or relevant variables), and step or next to move through the code line by line and examine variable values. Pay special attention to the obj variable and any related type descriptors.
  - You can set a breakpoint before the crash: break /work/harnesses/sources/wasm-micro-runtime/core/iwasm/common/gc/gc_object.c:560 and then run. When the breakpoint is hit, you can inspect the state.
- Focus on the obj Pointer: The most direct lead is the obj pointer. In GDB, when you hit the breakpoint just before the crash, examine the memory address pointed to by obj (e.g., using x/xg $rdi if rdi holds obj). Is it readable? Does it look like a valid object structure? If it's corrupted, you'll need to trace back further up the call stack to see where this obj pointer was last set or modified.
- Review Recent Changes: If this issue started appearing recently, check the commit history of the WAMR repository for changes related to garbage collection, reference types, or the interpreter around the time the problem began. Your commit hash 4b42cfdbce1b724137eea3f76868f42b36b4d51c is helpful here.
- Consider the -76 argument: The command includes -76. Because it appears after test.wasm, it is most likely passed to the invoked function (main, selected with -f) as an argument rather than interpreted as an iwasm option. It's worth checking how the function uses this value and whether a different argument still triggers the crash.
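The logging step above can be taken a little further with a helper that describes a reference using only its bit pattern, so the log line itself can never fault. This is a sketch under stated assumptions: the dbg_ names are hypothetical and not part of WAMR, the low bit is assumed to mark an unboxed value, and heap objects are assumed to be pointer-aligned. Adjust both assumptions to match the real object layout before relying on the output.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical classification of a raw reference value. */
typedef enum dbg_ref_kind {
    DBG_REF_NULL,
    DBG_REF_TAGGED,     /* low bit set: assumed unboxed value */
    DBG_REF_MISALIGNED, /* not pointer-aligned: likely corrupted */
    DBG_REF_ALIGNED     /* plausible pointer, validity still unknown */
} dbg_ref_kind_t;

/* Classify a reference from its bits only, without dereferencing it. */
static dbg_ref_kind_t
dbg_classify_ref(const void *obj)
{
    uintptr_t bits = (uintptr_t)obj;

    if (bits == 0)
        return DBG_REF_NULL;
    if (bits & 1u)
        return DBG_REF_TAGGED;
    if (bits % sizeof(void *) != 0)
        return DBG_REF_MISALIGNED;
    return DBG_REF_ALIGNED;
}

static const char *
dbg_ref_kind_name(dbg_ref_kind_t kind)
{
    switch (kind) {
        case DBG_REF_NULL:       return "null";
        case DBG_REF_TAGGED:     return "tagged";
        case DBG_REF_MISALIGNED: return "misaligned";
        case DBG_REF_ALIGNED:    return "aligned";
    }
    return "unknown";
}

/* Print one line per call site; flush so the line survives a crash. */
static void
dbg_log_ref(const char *where, const void *obj)
{
    assert(where != NULL);
    fprintf(stderr, "DEBUG: %s: obj=%p (%s)\n", where, obj,
            dbg_ref_kind_name(dbg_classify_ref(obj)));
    fflush(stderr);
}
```

Calling dbg_log_ref("is_instance_of", obj) at the top of each function on the crashing path shows where the value first turns suspicious. An "aligned" result only means the bits look like a pointer; it does not prove the memory behind it is valid.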
By systematically applying these debugging techniques, you should be able to narrow down the root cause of the SEGV error and contribute to fixing this issue in WAMR.
For further insights into WebAssembly and its memory management, you can refer to the official WebAssembly documentation:
- WebAssembly Specification: WebAssembly.org
- Garbage Collection in WebAssembly: WebAssembly GC Proposal