False Positive: Array Bound Error In Clang Analyzer

Alex Johnson
-
False Positive: Array Bound Error In Clang Analyzer

This article discusses a false positive reported by the Clang Static Analyzer related to array bound checking. Specifically, it involves a scenario where the analyzer incorrectly flags a potential out-of-bounds access, even when the code contains assertions that guarantee the access is safe. This issue arises in the security.ArrayBound check within the analyzer.

Code Snippet

The following C code snippet demonstrates the problem:

#include <stdlib.h>
#include <assert.h>
#include <unistd.h>
#include <stdint.h>
#include <sys/stat.h>

void foo(size_t max_path_len)
{
 assert(max_path_len < SIZE_MAX);
 char *buf = malloc(max_path_len + 1);
 assert(buf);

 int path_len_signed = readlink("hello", buf, max_path_len);
 assert(path_len_signed >= 0);
 size_t path_len = (size_t)path_len_signed;
 assert(path_len <= max_path_len);

 buf[path_len] = 0;
}

In this code, the foo function reads a symbolic link using readlink into a buffer allocated with malloc. Several assertions are used to ensure that the size of the buffer and the length of the path read from the symbolic link are within expected bounds. Specifically, the code ensures:

  1. max_path_len is less than SIZE_MAX.
  2. The malloc call succeeds and buf is not NULL.
  3. readlink returns a non-negative value.
  4. path_len (the length returned by readlink) is less than or equal to max_path_len.

Despite these checks, the Clang Static Analyzer flags line 19 (buf[path_len] = 0;) as a potential out-of-bounds access.

The Problem

The core issue lies in the analyzer's inability to fully reason about the relationships between the variables. The analyzer seems to lose track of the constraints imposed by the assertions, particularly the relationship between max_path_len, the size of the allocated buffer, and path_len, the actual length of the path read by readlink. The analyzer should recognize that because path_len is guaranteed to be less than or equal to max_path_len, and the buffer buf is allocated with max_path_len + 1 bytes, accessing buf[path_len] is always safe. The allocation malloc(max_path_len + 1) guarantees space for max_path_len characters plus a null terminator. The analyzer incorrectly flags the access to buf[path_len] as a potential out-of-bounds write, even though path_len is asserted to be less than or equal to max_path_len.

Analyzer Output

The output from clang-tidy shows the analyzer's reasoning:

$ clang-tidy repro.c
Error while trying to load a compilation database:
Could not auto-detect compilation database for file "repro.c"
No compilation database found in /tmp or any parent directory
fixed-compilation-database: Error while opening fixed database: No such file or directory
json-compilation-database: Error while opening JSON database: No such file or directory
Running without flags.
3 warnings generated.
/nix/store/cz0l7aa2dzk2sgz9b55i1wzn0v80ygwq-glibc-2.40-66-dev/include/stdint.h:95:11: warning: '__INT64_C' macro redefined [clang-diagnostic-macro-redefined]
 95 | # define __INT64_C(c) c ## L
 | ^
note: previous definition is here
/nix/store/cz0l7aa2dzk2sgz9b55i1wzn0v80ygwq-glibc-2.40-66-dev/include/stdint.h:96:11: warning: '__UINT64_C' macro redefined [clang-diagnostic-macro-redefined]
 96 | # define __UINT64_C(c) c ## UL
 | ^
note: previous definition is here
/tmp/repro.c:19:2: warning: Potential out of bound access to the heap area with tainted index [clang-analyzer-security.ArrayBound]
 19 | buf[path_len] = 0;
 | ^~~~~~~~~~~~~
/tmp/repro.c:10:9: note: Assuming 'max_path_len' is < -1
 10 | assert(max_path_len < SIZE_MAX);
 | ^
/nix/store/cz0l7aa2dzk2sgz9b55i1wzn0v80ygwq-glibc-2.40-66-dev/include/assert.h:117:11: note: expanded from macro 'assert'
 117 | if (expr) \
 | ^~~~
/tmp/repro.c:10:2: note: Taking true branch
 10 | assert(max_path_len < SIZE_MAX);
 | ^
/nix/store/cz0l7aa2dzk2sgz9b55i1wzn0v80ygwq-glibc-2.40-66-dev/include/assert.h:117:7: note: expanded from macro 'assert'
 117 | if (expr) \
 | ^
/tmp/repro.c:12:9: note: Assuming 'buf' is non-null
 12 | assert(buf);
 | ^
/tmp/repro.c:12:2: note: Taking true branch
 12 | assert(buf);
 | ^
/nix/store/cz0l7aa2dzk2sgz9b55i1wzn0v80ygwq-glibc-2.40-66-dev/include/assert.h:117:7: note: expanded from macro 'assert'
 117 | if (expr) \
 | ^
/tmp/repro.c:14:24: note: Taint originated here
 14 | int path_len_signed = readlink("hello", buf, max_path_len);
 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/tmp/repro.c:14:24: note: Assuming that the 3rd argument to 'readlink' is <= 18446744073709551615
 14 | int path_len_signed = readlink("hello", buf, max_path_len);
 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/tmp/repro.c:14:24: note: Taint propagated to the return value
 14 | int path_len_signed = readlink("hello", buf, max_path_len);
 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/tmp/repro.c:14:24: note: Assuming that 'readlink' is successful
 14 | int path_len_signed = readlink("hello", buf, max_path_len);
 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/tmp/repro.c:15:9: note: 'path_len_signed' is >= 0
 15 | assert(path_len_signed >= 0);
 | ^
/tmp/repro.c:15:2: note: Taking true branch
 15 | assert(path_len_signed >= 0);
 | ^
/nix/store/cz0l7aa2dzk2sgz9b55i1wzn0v80ygwq-glibc-2.40-66-dev/include/assert.h:117:7: note: expanded from macro 'assert'
 117 | if (expr) \
 | ^
/tmp/repro.c:17:9: note: 'path_len' is <= 'max_path_len'
 17 | assert(path_len <= max_path_len);
 | ^
/tmp/repro.c:17:2: note: Taking true branch
 17 | assert(path_len <= max_path_len);
 | ^
/nix/store/cz0l7aa2dzk2sgz9b55i1wzn0v80ygwq-glibc-2.40-66-dev/include/assert.h:117:7: note: expanded from macro 'assert'
 117 | if (expr) \
 | ^
/tmp/repro.c:19:2: note: Access of the heap area with a tainted index that may be too large
 19 | buf[path_len] = 0;
 | ^~~~~~~~~~~~~

The analyzer's trace indicates that it assumes max_path_len is less than -1. This incorrect assumption likely stems from how the analyzer handles SIZE_MAX in the assertion assert(max_path_len < SIZE_MAX). It appears the analyzer isn't able to concretely evaluate this condition, leading to a pessimistic assumption about the possible values of max_path_len.

Impact

False positives from static analysis tools can be problematic. While static analyzers are designed to catch potential bugs, a high rate of false positives can lead developers to distrust the tool and ignore its warnings. It's crucial for static analyzers to be accurate and precise in their analysis to maintain developer confidence.

Possible Solutions and Workarounds

  1. Simplify Assertions: Try simplifying the assertions or using more explicit range checks to help the analyzer understand the constraints. For example, instead of relying on SIZE_MAX, use a smaller, concrete value for testing.
  2. Code Restructuring: Sometimes, restructuring the code can help the analyzer follow the logic more easily. This might involve breaking down complex expressions into smaller, more manageable steps.
  3. Analyzer Configuration: Explore the analyzer's configuration options to see if there are any settings that can be adjusted to improve its accuracy.
  4. Suppress Warning: As a last resort, you can suppress the warning using analyzer-specific directives if you are confident that the code is correct. However, use this option with caution and make sure to document why the warning is suppressed.

Conclusion

This scenario highlights a limitation in the Clang Static Analyzer's ability to reason about array bounds in the presence of assertions and SIZE_MAX. While the analyzer is a valuable tool for detecting potential bugs, it's important to be aware of its limitations and to carefully review its warnings to avoid false positives. In this particular case, the access to buf[path_len] is safe due to the preceding assertions, and the analyzer's warning is a false positive.

For more information on Clang Static Analyzer and its capabilities, visit the official Clang Static Analyzer Documentation.

You may also like