Fuzzing Crash: File I/O Length Mismatch With Projection & Filter

Alex Johnson

Fuzzing uncovered a critical bug in the vortex-data system: a length mismatch during file I/O. The crash occurs when projection and filter operations are applied as part of a write-then-read round trip. Understanding the root cause and how to reproduce it is crucial for maintaining data integrity and system stability.

Fuzzing Crash Report: A Deep Dive

This report details a crash identified by the fuzzer: a length mismatch during file I/O operations that involve both projection and filtering. The length of the filtered data is not preserved across a round trip of writing the array and reading it back.

Analysis of the Crash

  • Crash Location: fuzz/fuzz_targets/file_io.rs:85 - This indicates the exact line of code where the assertion failed, making it easier to debug.
  • Error Message: assertion 'left == right' failed: Length was not preserved. left: 18 right: 22. The two sides of the assertion disagree: the expected length is 18, but the length after the file I/O operation is 22.
  • Stack Trace: The stack trace gives the call sequence leading to the crash. It shows a panic raised through Rust's core panicking machinery, triggered by the failed length assertion in the fuzz target.

Root Cause: Unraveling the Mystery

The root cause of the crash lies in a length mismatch during a file I/O round-trip test that involves projection and filter operations. Here's a breakdown:

  1. A ChunkedArray is created, which contains a struct field holding nullable Utf8 data. This array's structure is key to understanding the bug.
  2. A filter expression is applied to this array, resulting in an expected length of 18. The filtering process is intended to reduce the size of the array.
  3. The array is then written to a buffer, essentially serializing the data for storage or transmission.
  4. Upon reading the array back from the buffer, the same projection and filter are applied. This step aims to reconstruct the original filtered array.
  5. The final step compares the lengths of the filtered array before and after the I/O operation. The assertion fails because the length after the round trip is 22 instead of the expected 18.
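
The round-trip check described in the steps above can be sketched in plain Rust. This is a minimal illustration, not the actual fuzz target: the Row alias, the byte layout, and the functions write_rows, read_rows, and round_trip_preserves_len are all hypothetical and do not come from the Vortex codebase.

```rust
// Hypothetical stand-in for the fuzz target's round trip; none of these
// names come from the Vortex codebase.
type Row = Option<String>; // one nullable Utf8 value

/// Serialize rows as one tag byte (0 = null, 1 = valid); valid rows are
/// followed by a little-endian u32 length and the UTF-8 bytes.
fn write_rows(rows: &[Row]) -> Vec<u8> {
    let mut buf = Vec::new();
    for row in rows {
        match row {
            None => buf.push(0),
            Some(s) => {
                buf.push(1);
                buf.extend_from_slice(&(s.len() as u32).to_le_bytes());
                buf.extend_from_slice(s.as_bytes());
            }
        }
    }
    buf
}

/// Read rows back, applying `filter` while scanning the buffer.
fn read_rows(buf: &[u8], filter: impl Fn(&Row) -> bool) -> Vec<Row> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < buf.len() {
        let tag = buf[pos];
        pos += 1;
        let row = if tag == 0 {
            None
        } else {
            let mut len_bytes = [0u8; 4];
            len_bytes.copy_from_slice(&buf[pos..pos + 4]);
            let len = u32::from_le_bytes(len_bytes) as usize;
            pos += 4;
            let s = String::from_utf8(buf[pos..pos + len].to_vec()).unwrap();
            pos += len;
            Some(s)
        };
        if filter(&row) {
            out.push(row);
        }
    }
    out
}

/// The invariant under test: filtering in memory and filtering on read
/// must keep the same number of rows.
fn round_trip_preserves_len(rows: &[Row], filter: impl Fn(&Row) -> bool) -> bool {
    let expected = rows.iter().filter(|r| filter(*r)).count();
    let actual = read_rows(&write_rows(rows), filter).len();
    expected == actual
}
```

In this toy version the invariant holds by construction. The crash reported here corresponds to the case where the read path returns more rows than the in-memory filter does.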

The input array is a ChunkedArray with a length of 22, composed of 3 chunks of lengths 6, 10, and 6. The struct's single field has an unusual name that contains control characters and the substring vortex.fsst, which may hint at FSST encoding. The array contains a mix of empty strings, valid strings, and null values, adding complexity to the data structure.

This crash suggests a bug in how the file I/O system handles projection and filtering, especially when the following conditions are met:

  • The array is chunked, meaning it's divided into smaller, more manageable parts.
  • Struct fields are involved, adding a layer of nesting and complexity.
  • FSST encoding might be in use, potentially affecting how strings are stored and retrieved.
  • Complex filter expressions are applied, increasing the chance of logical errors.
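
One property implied by the conditions above is that chunking should not change the result of a filter. The sketch below illustrates that property with plain Rust slices. It assumes a simple predicate where null values never match, and the function names are hypothetical and unrelated to the Vortex API.

```rust
/// Filter one chunk of nullable strings, keeping rows where `pred` is true.
/// Null values never satisfy the predicate (an assumption of this sketch).
fn filter_chunk<'a>(
    chunk: &[Option<&'a str>],
    pred: impl Fn(&str) -> bool,
) -> Vec<Option<&'a str>> {
    chunk
        .iter()
        .copied()
        .filter(|v| v.map_or(false, |s| pred(s)))
        .collect()
}

/// Filter chunk by chunk and return the total number of surviving rows.
fn filtered_len_chunked(
    chunks: &[Vec<Option<&str>>],
    pred: impl Fn(&str) -> bool + Copy,
) -> usize {
    chunks.iter().map(|c| filter_chunk(c, pred).len()).sum()
}

/// Filter the concatenation of all chunks and return the surviving row count.
fn filtered_len_flat(chunks: &[Vec<Option<&str>>], pred: impl Fn(&str) -> bool) -> usize {
    let flat: Vec<Option<&str>> = chunks.iter().flatten().copied().collect();
    filter_chunk(&flat, pred).len()
}
```

If the two counts differ for the same input and predicate, the chunked code path is the likely place to look.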

The discrepancy between the expected length (18) and the actual length (22) indicates that either:

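  1. The filter is not being applied correctly during the read-back process. This could be due to an issue in how the filter expression is interpreted or executed after deserialization. Note that the round-trip length of 22 equals the length of the unfiltered input array, which is consistent with the filter being skipped entirely on read.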
  2. The combination of projection and filtering produces different results when applied to serialized data versus in-memory data. This suggests a potential inconsistency between how the operations are handled in different states.
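
To tell the two hypotheses apart, it can help to compare the round-trip length against the unfiltered length as well as the expected one. The following is a rough heuristic sketch; the Diagnosis enum and the diagnose function are hypothetical names introduced only for illustration.

```rust
#[derive(Debug, PartialEq)]
enum Diagnosis {
    /// The round-trip length matches the in-memory filtered length.
    Consistent,
    /// The round-trip length equals the unfiltered length, so the filter
    /// looks like it was skipped (or matched every row).
    FilterLikelySkipped,
    /// Some rows were dropped, but not the same number as in memory.
    FilterAppliedDifferently,
}

/// Classify a length mismatch from the three lengths involved.
fn diagnose(unfiltered_len: usize, expected_len: usize, actual_len: usize) -> Diagnosis {
    if actual_len == expected_len {
        Diagnosis::Consistent
    } else if actual_len == unfiltered_len {
        Diagnosis::FilterLikelySkipped
    } else {
        Diagnosis::FilterAppliedDifferently
    }
}
```

With the numbers from this crash (unfiltered 22, expected 18, actual 22), the heuristic points at the first hypothesis. It cannot rule out a filter that ran but wrongly matched every row.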

Debug Output

FuzzFileAction {
    array: ChunkedArray {
        dtype: Struct(
            StructFields {
                names: FieldNames(
                    [
                        FieldName(
                            "&\u{1}`\rvortex.fsst",
                        ),
                    ],
                ),
                dtypes: [
                    FieldDType {
                        inner: Owned(
                            Utf8(
                                Nullable,
                            ),
                        ),
                    },
                ],
            },
            NonNullable,
        ),
        len: 22,
        chunk_offsets: Buffer<u64> {
            length: 4,
            alignment: Alignment(
                8,
            ),
            as_slice: [0, 6, 16, 22],
        },
        chunks: [
            StructArray {
                len: 6,
                dtype: Struct(...),
                fields: [
                    ChunkedArray {
                        dtype: Utf8(Nullable),
                        len: 6,
                        chunks: [
                            VarBinViewArray { len: 1, validity: AllValid, ... },
                            VarBinViewArray { len: 5, validity: Array(BoolArray), ... }
                        ],
                    },
                ],
            },
            StructArray {
                len: 10,
                fields: [
                    VarBinViewArray { len: 10, validity: AllValid, ... }
                ],
            },
            StructArray {
                len: 6,
                fields: [
                    ChunkedArray {
                        dtype: Utf8(Nullable),
                        len: 6,
                        chunks: [
                            VarBinViewArray { len: 1, validity: AllValid, ... },
                            VarBinViewArray { len: 5, validity: Array(BoolArray), ... }
                        ],
                    },
                ],
            },
        ],
    },
    projection_expr: Some(
        Expression {
            vtable: vortex.pack,
            children: [
                Expression { vtable: vortex.get_item, ... },
                Expression { vtable: vortex.get_item, ... },
                Expression { vtable: vortex.get_item, ... },
                Expression { vtable: vortex.get_item, ... },
                Expression { vtable: vortex.get_item, ... },
                Expression { vtable: vortex.get_item, ... },
            ],
        },
    ),
    filter_expr: Some(
        Expression {
            vtable: vortex.binary,
            children: [
                Expression { vtable: vortex.binary, ... },
                Expression { vtable: vortex.binary, ... },
            ],
        },
    ),
    compressor_strategy: Default,
}
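
The chunk_offsets buffer in the debug output encodes the chunk boundaries as cumulative row offsets. As a small worked example, the helper below (a hypothetical function, not part of Vortex) recovers the per-chunk lengths by differencing adjacent offsets.

```rust
/// Derive per-chunk lengths from cumulative offsets such as [0, 6, 16, 22].
/// Each length is the difference between two adjacent offsets.
fn chunk_lengths(offsets: &[u64]) -> Vec<u64> {
    offsets.windows(2).map(|w| w[1] - w[0]).collect()
}
```

For the offsets [0, 6, 16, 22] this yields lengths 6, 10, and 6, which matches the three StructArray chunks and sums to the total length of 22.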

Summary of the Crash

  • Target: file_io
  • Crash File: crash-745d353afc79d11b489b2440bc4fdec37471003b
  • Branch: develop
  • Commit: ed51819
  • Crash Artifact: Direct download not available

Reproduction Steps: Getting to the Bottom of It

To reproduce this crash locally and investigate further, follow these steps:

  1. Download the Crash Artifact:

    • Locate the artifact within the workflow run. This will typically be a zip file containing the crash-inducing input.
    • Extract the zip file to access the crash file.
  2. Reproduce Locally:

    # The artifact contains file_io/crash-745d353afc79d11b489b2440bc4fdec37471003b
    cargo +nightly fuzz run --sanitizer=none file_io file_io/crash-745d353afc79d11b489b2440bc4fdec37471003b
    

    This command uses the cargo fuzz tool to run the file_io fuzz target with the provided crash file. The --sanitizer=none option disables sanitizers, which can sometimes interfere with crash reproduction.

  3. Obtain a Full Backtrace:

    RUST_BACKTRACE=full cargo +nightly fuzz run --sanitizer=none file_io file_io/crash-745d353afc79d11b489b2440bc4fdec37471003b
    

    Setting the RUST_BACKTRACE=full environment variable provides a detailed stack trace, helping to pinpoint the exact location of the crash and the sequence of function calls that led to it.

With these reproduction steps, developers can investigate the root cause of the length mismatch and verify a fix that keeps filtered lengths consistent across file I/O. Fuzzing proved valuable here in surfacing a subtle bug that might otherwise have gone unnoticed. Resolving it will likely require understanding how chunked arrays, struct fields, FSST encoding, and complex filter expressions interact.


Auto-created by fuzzing workflow with Claude analysis

For further reading on Rust fuzzing and related topics, visit the Rust Fuzz Book
