Aug 23, 2020

What Does Memory Safety Really Mean in D?

Recently, in the D community, there's been a good deal of discussion about what exactly is meant by "memory safety," and under what circumstances (if any) it's appropriate to consider an extern(C) function to be "memory safe." This post is my attempt to clarify my own thoughts on the subject by putting them in writing.

None of these ideas are original to me, and many are not even original to D. Any mistakes you might notice, however, are almost certainly my own. :)

What is Memory Safety?

The D language's specification of memory safety defines "memory safety" as follows:

Memory Safety for a program is defined as it being impossible for the program to corrupt memory.

This definition is frustratingly vague. What does it mean, exactly, for a program to "corrupt memory"? Does an out-of-bounds array access "corrupt memory"? Does it matter if the access is a read or a write? And what about "impossible"—does it count if a D program causes memory corruption when running on faulty hardware? If not, where do we draw the line? The D language specification gives no answers to these questions.

A concept closely related to "memory safety" is "undefined behavior." The C99 standard defines "undefined behavior" as

behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

Many systems languages, including D, adopt this definition.

It's clear that whatever it may mean to "corrupt memory," a program whose behavior is undefined is free to do so; after all, it's free to do anything. So, at minimum, a program that is memory safe must have no undefined behavior.

Whether there is any defined behavior that is also considered "memory corruption" is a question I'll return to later.

Is @safe Code Memory-Safe?

In D, functions marked with @safe are checked automatically for memory safety. Assuming these checks are implemented correctly, a program consisting only of @safe D code is guaranteed to be memory-safe.

Unfortunately, it's impossible to write a program that does anything useful using only @safe code, because @safe code can't do I/O or make system calls. In order to bridge this gap between the world of @safe and the world outside, D has @trusted functions, which are allowed to perform operations that violate @safe's checks (including calling unchecked @system functions), but can still be called from @safe functions.
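
For example, here's a minimal sketch of how the three attributes interact (the function names are hypothetical):

@system void dangerous() {}  // unchecked; may do anything
@trusted void bridge() {     // manually verified by its author
    dangerous();
}
@safe void checked() {
    // dangerous();  // error: @safe function cannot call @system function
    bridge();        // ok: @trusted functions are callable from @safe
}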

When checking @safe code, the D compiler assumes that all @trusted functions are memory-safe. Therefore, in any useful program, your @safe code is truly memory-safe only if your @trusted code is also memory-safe.

Establishing that your @trusted code is memory-safe is left as an exercise for the programmer.

If @safe doesn't actually guarantee memory safety, why is it useful at all? Because it reduces the incredibly difficult problem of proving that your entire program is memory-safe to the relatively tractable one of proving that your @trusted code is memory-safe.

Can External Functions Be @trusted?

What about external library functions, whose source code may not be available? Is there any way to prove that such a function is memory-safe?

The strict, hard-line answer is...no. It's impossible. When you write an extern function declaration, the only guarantee you have is that the function you call matches the specified signature. The actual implementation of that function isn't known until your program is compiled and linked, or, in the case of dynamic linking, until it's executed!
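
For instance, all the compiler ever sees of an external function is a declaration like the following (a sketch; the real declaration in core.stdc.stdio carries a few additional attributes):

extern(C) int puts(const char* s);  // implementation supplied at link time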

Does that mean @trusted is useless? Not quite. To see why, let's look at an example.

Suppose we're trying to write a memory-safe @trusted wrapper around the C library function puts. From reading the C99 standard, we know that puts has defined behavior when called with a pointer to a null-terminated string, and undefined behavior otherwise, so we write our function as follows:

import core.stdc.stdio : puts;

@trusted void safePuts(const(char)[] s) {
    if (s.length == 0)
        throw new Exception("s must not be empty");
    if (s[$ - 1] != '\0')
        throw new Exception("s must be null-terminated");
    puts(s.ptr);  // ok: s.ptr points to a null-terminated string
}

Now we ask: what would have to go wrong for this function to corrupt memory?

The array access, s[$ - 1], can't corrupt memory, because array accesses in D are bounds-checked. Similarly, we know that checking the length of an array, comparing two characters, and throwing an exception are all safe operations, because their behavior as defined by the language spec does not include any possibility of memory corruption. The only remaining thing that could possibly go wrong is the call to puts.

Let's assume, for now, that the only way a call to puts can corrupt memory is by causing undefined behavior. There are two possible ways this could happen:

  1. We call puts with something other than a pointer to a null-terminated string.
  2. puts has a bug in its implementation and can cause undefined behavior even when it's called with a null-terminated string.

We can rule out #1 because our function checks for null-termination before calling puts, so the only possibility left is #2. Since the C99 standard defines the behavior of puts when it's called with a null-terminated string, we can conclude that our code is memory-safe as long as puts conforms to the C99 standard.

This is not a 100% ironclad guarantee, but it's a pretty good one. As long as our C library implementation is well-tested, our @trusted code should be fine.

We can apply the same approach to other external functions: assume that they behave as described in the relevant standard, specification, or documentation, and then prove that our @trusted code is memory-safe as long as that assumption holds.
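
For example, here's a sketch of the same pattern applied to the C library function strlen, whose behavior C99 likewise defines only for pointers to null-terminated strings:

import core.stdc.string : strlen;

@trusted size_t safeStrlen(const(char)[] s) {
    if (s.length == 0 || s[$ - 1] != '\0')
        throw new Exception("s must be null-terminated");
    // Defined behavior: strlen stops at the first '\0', which is
    // guaranteed to occur within the bounds of s.
    return strlen(s.ptr);
}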

Is All Well-Defined Behavior Memory-Safe?

The D spec doesn't answer this question directly, but the section on function safety gives us some clues.

Recall that a D program consisting of only @safe code is guaranteed to be memory-safe, according to D's definition of memory safety. It follows from this that any operation a purely-@safe program is allowed to perform must be a memory-safe operation—again, according to the definition of "memory-safe" used in the D spec.

Here's the twist: @safe D code is allowed to dereference a pointer.

At first glance, this seems impossible. In most systems languages, including D, dereferencing a pointer when you're not supposed to is considered undefined behavior. How does D get away with allowing @safe code to do this?

If we scroll down a bit to the section on safe values, we find our answer. The D spec divides pointer values into two categories: "valid" and "invalid". A valid pointer is one that can be dereferenced without causing undefined behavior.

In purely-@safe D code, every operation that could possibly create an invalid pointer is a compile-time error. Pointer arithmetic, casting, and type punning are banned. Escape analysis ensures stack pointers are always valid; GC ensures heap pointers are always valid.
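
A sketch of what this looks like in practice (the commented-out lines are rejected by the compiler):

@safe void pointerExample() {
    int* p = new int;  // GC-allocated, so p is guaranteed to be valid
    *p = 42;           // dereferencing is allowed in @safe code
    // p = p + 1;                   // error: pointer arithmetic
    // auto q = cast(int*) 0x1234;  // error: integer-to-pointer cast
}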

Interestingly, @safe D code is allowed to create null pointers, which implies that in D, the behavior of dereferencing null is actually well-defined.
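
For example, the following compiles as @safe; on typical platforms, running it halts the program with a segmentation fault rather than corrupting memory:

@safe void nullExample() {
    int* p = null;  // creating a null pointer is allowed in @safe code
    int x = *p;     // crashes at run time; fatal, but well-defined
}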

So dereferencing a pointer in @safe code is considered memory-safe because there's no way to access an invalid pointer from @safe code. It follows that any code which allows an invalid pointer to be accessed from @safe code violates memory safety, even if the offending code itself contains no undefined behavior.

The easiest way for this to happen is for a @trusted function to either (a) create an invalid pointer and return it, or (b) invalidate an existing pointer that's accessible from @safe code.

Currently, the only perfectly reliable way to work with invalid pointers while preventing @safe code from accessing them is to store them in local variables of @trusted or @system functions (or in closures that capture those variables). For global variables and aggregate members, the only access-protection mechanism is private, which still allows access from @safe code in the same module.
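
For example, this @trusted function works with an invalid pointer, but never lets it escape:

@trusted void localOnly() {
    import core.stdc.stdlib : free, malloc;
    int* p = cast(int*) malloc(int.sizeof);
    if (p is null) return;
    *p = 42;  // fine: p is valid here
    free(p);  // p is now invalid...
    // ...but it never escapes this function, so @safe code never sees it
}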

What About Arrays?

Array access is considered memory-safe (and therefore @safe) because it's bounds-checked at runtime. But this bounds-checking relies on the fact that the .length property of an array matches the actual length of the array in memory. How is that guaranteed?

Just like with pointers, D forbids, in @safe code, any operation that could possibly create an invalid array. In particular, any change to the array's length that would create an invalid array triggers a reallocation, using the GC.
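
A sketch of the resulting guarantees:

@safe void arrayExample() {
    int[] a = [1, 2, 3];
    a.length = 5;        // may reallocate via the GC; a remains valid
    auto b = a[0 .. 2];  // slicing is bounds-checked
    // a[10] = 0;        // compiles, but throws a RangeError at run time
}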

Because arrays are built into the language, their internals do not suffer from the same access-protection hole that private variables have with respect to @safe code in the same module. In this way, built-in arrays are privileged over user-defined types, which have no equivalent access-protection mechanism.

Invariants, Generally

Generalizing from the above two examples, we can derive a general principle: any data that must uphold some run-time invariant in order to be "valid" (i.e., memory-safe to use) must not be directly accessible from @safe code, but must instead be encapsulated behind a @safe/@trusted interface that maintains that invariant.

Some examples of such potentially-invalid data include:

  • The tag and union of a tagged union (see the sketch after this list).
  • The reference count and data pointer of a reference-counted pointer.
  • The allocator instance of an allocator-aware container.
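
As an example of the first item, here's a minimal sketch of a tagged union (all names hypothetical). It is memory-safe only as long as the tag matches the active member, an invariant that its methods are careful to maintain:

struct IntOrPointer {
    private bool isPointer;  // the tag
    private union {
        int asInt;
        int* asPointer;
    }

    // @trusted: accessing union members that overlap a pointer is not
    // allowed in @safe code. These are memory-safe only because they
    // keep the tag in sync with the active member.
    @trusted this(int value) { asInt = value; isPointer = false; }
    @trusted this(int* value) { asPointer = value; isPointer = true; }

    @trusted int* pointer() {
        assert(isPointer, "tag does not match");
        return asPointer;
    }
}

Note, however, that private still leaves isPointer and the union exposed to @safe code in the same module.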

To protect potentially-invalid data from being accessed by @safe code, there are several possible solutions:

  1. Hide the data inside closures (a sketch follows this list). While this solution requires no language changes, it is unsatisfactory because it requires the use of the GC and substantially complicates the implementation of aggregates that wish to use it.

  2. Forbid @safe code from accessing private data, even in the same module. While this would work, it is a breaking change, and would cause many "false positives," since many private member variables are not subject to memory-safety-related invariants.

  3. Allow the data to be explicitly marked as inaccessible from @safe code. This is the solution proposed by DIP 1035.

  4. Just rely on private. This is not a total loss, since private does provide some protection, but it means that all @safe functions in a module that contains potentially-invalid variables or aggregate members must be considered effectively @trusted, and subject to manual rather than automatic verification of memory safety.
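
For illustration, here's a minimal sketch of solution 1 (names hypothetical). The pointer and length that must stay in sync live only as locals captured by the returned delegate, so no @safe code can modify them independently:

@trusted auto makeReader(int[] data) {
    int* ptr = data.ptr;       // invariant: ptr and len describe
    size_t len = data.length;  // the same block of memory
    int get(size_t i) @trusted {
        assert(i < len, "index out of bounds");
        return ptr[i];
    }
    return &get;  // escaping &get forces the frame onto the GC heap
}

@safe code can call the returned delegate, but it has no way to reach ptr or len.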

Summary and Conclusions

Memory safety in D means both the absence of undefined behavior, and the absence of certain kinds of data (which I've termed "invalid") that could lead to undefined behavior. To corrupt memory means to either (a) invoke undefined behavior, or (b) expose invalid data to @safe code (which could then use it to invoke undefined behavior).

To prove that a @safe program is memory safe, all of the following steps must be completed:

  1. Prove that the assumptions made by @trusted code about external functions hold.
  2. Prove that, if those assumptions hold, the @trusted code is memory safe.
  3. Prove that, if the @trusted code is memory safe, the @safe code is memory safe.

Step 1 is performed by the party compiling and linking the program. Step 2 is performed by the author(s) of the @trusted code. And step 3 is performed by the compiler. If any step is skipped, or performed incorrectly, the proof does not hold.

When performing step 3, the D compiler relies on the assumption that invalid data cannot be accessed from @safe code. It upholds this assumption automatically in purely-@safe code, and relies on the programmer to uphold it in @trusted code.

Unfortunately, in the current D language, it is impossible for the programmer to uphold this assumption in @trusted code in all cases without making unacceptable feature, performance, and implementation-quality tradeoffs. DIP 1035 would solve this problem. The current approach, of relying on private, requires manual verification of @safe code in order for the compiler's proof of memory safety to remain valid.