What Does Memory Safety Really Mean in D?
Recently, in the D community, there's been a good deal of discussion about what exactly is meant by "memory safety," and under what circumstances (if any) it's appropriate to consider an `extern(C)` function to be "memory safe." This post is my attempt to clarify my own thoughts on the subject by putting them in writing.
None of these ideas are original to me, and many are not even original to D. Any mistakes you might notice, however, are almost certainly my own. :)
What is Memory Safety?
The D language's specification of memory safety defines "memory safety" as follows:
Memory Safety for a program is defined as it being impossible for the program to corrupt memory.
This definition is frustratingly vague. What does it mean, exactly, for a program to "corrupt memory"? Does an out-of-bounds array access "corrupt memory"? Does it matter if the access is a read or a write? And what about "impossible"—does it count if a D program causes memory corruption when running on faulty hardware? If not, where do we draw the line? The D language specification gives no answers to these questions.
A concept closely related to "memory safety" is "undefined behavior." The C99 standard defines "undefined behavior" as
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
Many systems languages, including D, adopt this definition.
It's clear that whatever it may mean to "corrupt memory," a program whose behavior is undefined is free to do so; after all, it's free to do anything. So, at minimum, a program that is memory safe must have no undefined behavior.
Whether there is any defined behavior that is also considered "memory corruption" is a question I'll return to later.
Is @safe Code Memory-Safe?
In D, functions marked with `@safe` are checked automatically for memory safety. Assuming these checks are implemented correctly, a program consisting only of `@safe` D code is guaranteed to be memory-safe.
Unfortunately, it's impossible to write a program that does anything useful using only `@safe` code, because `@safe` code can't do I/O or make system calls. In order to bridge this gap between the world of `@safe` and the world outside, D has `@trusted` functions, which are allowed to perform operations that violate `@safe`'s checks (including calling unchecked `@system` functions), but can still be called from `@safe` functions.
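To make this division concrete, here is a minimal sketch of the three safety levels and who may call whom. The function names are my own invention, and the sketch assumes the C library's `printf` behaves as C99 specifies:

```d
import core.stdc.stdio : printf;

// @system: unchecked; @safe code may not call it directly.
@system void rawPrint(const(char)* p) { printf("%s", p); }

// @trusted: manually verified to be memory-safe; callable from @safe code.
@trusted void trustedPrint(string s) { rawPrint((s ~ "\0").ptr); }

// @safe: checked automatically by the compiler; may call @trusted, not @system.
@safe void greet() { trustedPrint("hello\n"); }
```

Removing `@trusted` from `trustedPrint` would make `greet` fail to compile, since `@safe` code cannot call `@system` functions like `rawPrint`.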
When checking `@safe` code, the D compiler assumes that all `@trusted` functions are memory-safe. Therefore, in any useful program, your `@safe` code is truly memory-safe only if your `@trusted` code is also memory-safe. Establishing that your `@trusted` code is memory-safe is left as an exercise for the programmer.
If `@safe` doesn't actually guarantee memory safety, why is it useful at all? Because it reduces the incredibly difficult problem of proving that your entire program is memory-safe to the relatively tractable one of proving that your `@trusted` code is memory-safe.
Can External Functions be @trusted?
What about external library functions, whose source code may not be available? Is there any way to prove that such a function is memory-safe?
The strict, hard-line answer is...no. It's impossible. When you write an `extern` function declaration, the only guarantee you have is that the function you call matches the specified signature. The actual implementation of that function isn't known until your program is compiled and linked, or, in the case of dynamic linking, until it's executed!
Does that mean `@trusted` is useless? Not quite. To see why, let's look at an example.
Suppose we're trying to write a memory-safe `@trusted` wrapper around the C library function `puts`. From reading the C99 standard, we know that `puts` has defined behavior when called with a pointer to a null-terminated string, and undefined behavior otherwise, so we write our function as follows:
```d
import core.stdc.stdio : puts;

@trusted void safePuts(const(char)[] s) {
    if (s.length == 0)
        throw new Exception("s must not be empty");
    if (s[$ - 1] != '\0')
        throw new Exception("s must be null-terminated");
    puts(s.ptr);
}
```
Now we ask: what would have to go wrong for this function to corrupt memory? The array access, `s[$ - 1]`, can't corrupt memory, because array accesses in D are bounds-checked. Similarly, we know that checking the length of an array, comparing two characters, and throwing an exception are all safe operations, because their behavior as defined by the language spec does not include any possibility of memory corruption. The only remaining thing that could possibly go wrong is the call to `puts`.
Let's assume, for now, that the only way a call to `puts` can corrupt memory is by causing undefined behavior. There are two possible ways this could happen:
1. We call `puts` with something other than a pointer to a null-terminated string.
2. `puts` has a bug in its implementation and can cause undefined behavior even when it's called with a null-terminated string.
We can rule out #1 because our function checks for null-termination before calling `puts`, so the only possibility left is #2. Since the C99 standard defines the behavior of `puts` when it's called with a null-terminated string, we can conclude that our code is memory-safe as long as `puts` conforms to the C99 standard.
This is not a 100% ironclad guarantee, but it's a pretty good one. As long as our C library implementation is well-tested, our `@trusted` code should be fine.
We can apply the same approach to other external functions: assume that they behave as described in the relevant standard, specification, or documentation, and then prove that our `@trusted` code is memory-safe as long as that assumption holds.
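For instance, the same pattern can be applied to C's `strlen`. The wrapper name `safeStrlen` is my own; the code assumes only what C99 guarantees, namely that `strlen` has defined behavior when given a pointer to a null-terminated string:

```d
import core.stdc.string : strlen;

// Sketch: a memory-safe wrapper for C strlen, assuming a C99-conforming libc.
@trusted size_t safeStrlen(const(char)[] s) {
    // Rule out the one input C99 does not define behavior for.
    if (s.length == 0 || s[$ - 1] != '\0')
        throw new Exception("s must be null-terminated");
    return strlen(s.ptr); // defined behavior per C99, given the check above
}
```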
Is All Well-Defined Behavior Memory-Safe?
The D spec doesn't answer this question directly, but the section on function safety gives us some clues.
Recall that a D program consisting of only `@safe` code is guaranteed to be memory-safe, according to D's definition of memory safety. It follows from this that any operation a purely-`@safe` program is allowed to perform must be a memory-safe operation—again, according to the definition of "memory-safe" used in the D spec.
Here's the twist: `@safe` D code is allowed to dereference a pointer.
At first glance, this seems impossible. In most systems languages, including D, dereferencing a pointer when you're not supposed to is considered undefined behavior. How does D get away with allowing `@safe` code to do this?
If we scroll down a bit to the section on safe values, we find our answer. The D spec divides pointer values into two categories: "valid" and "invalid". A valid pointer is one that can be dereferenced without causing undefined behavior.
In purely-`@safe` D code, every operation that could possibly create an invalid pointer is a compile-time error. Pointer arithmetic, casting, and type punning are banned. Escape analysis ensures stack pointers are always valid; the GC ensures heap pointers are always valid.
Interestingly, `@safe` D code is allowed to create null pointers, which implies that in D, the behavior of dereferencing `null` is actually well-defined.
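A small sketch of what this permits (the function name `deref` is my own). The compiler accepts the dereference in `@safe` code; if the pointer happens to be null, the program crashes deterministically rather than corrupting memory:

```d
@safe int deref(int* p) {
    if (p is null)
        return 0; // dereferencing null would crash, not corrupt memory
    return *p;    // pointer dereference is allowed in @safe code
}

@safe unittest {
    assert(deref(null) == 0);
    auto q = new int; // GC allocation: q is guaranteed valid
    *q = 7;
    assert(deref(q) == 7);
}
```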
So dereferencing a pointer in `@safe` code is considered memory-safe because there's no way to access an invalid pointer from `@safe` code. It follows that any code which allows an invalid pointer to be accessed from `@safe` code is guilty of violating memory safety, even if the offending code contains no undefined behavior.
The easiest way for this to happen is for a `@trusted` function to either (a) create an invalid pointer and return it, or (b) invalidate an existing pointer that's accessible from `@safe` code.
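Failure mode (a) can be sketched as follows. This example is deliberately broken: the `@trusted` function executes no undefined behavior itself, yet it violates memory safety by handing `@safe` code an invalid pointer (the function names are hypothetical):

```d
// BROKEN on purpose: leaks a dangling pointer to @safe code.
@trusted int* dangling() {
    int local = 42;
    int* p = &local;
    return p; // local dies when the function returns; p is now invalid
}

@safe void oops() {
    int* p = dangling(); // @safe code can now reach an invalid pointer
    // Any dereference of p here is undefined behavior, even though
    // this function passes all of @safe's automatic checks.
}
```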
Currently, the only perfectly-reliable way to work with invalid pointers while preventing `@safe` code from accessing them is to store them in local variables of `@trusted` or `@system` functions (or in closures that capture those variables). For global variables and aggregate members, the only access-protection mechanism is `private`, which still allows access from `@safe` code in the same module.
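The same-module hole can be sketched with a hypothetical reference-counted type. Nothing here requires `@trusted`, yet the invariant is silently broken:

```d
// Sketch: private does not protect invariants from same-module @safe code.
struct RefCounted {
    private size_t count; // invariant: equals the number of live references
    private int*   data;  // invariant: valid while count > 0
}

@safe void sameModuleMischief(ref RefCounted rc) {
    // Compiles without complaint: @safe code in the same module can
    // reach private members and break the invariant.
    rc.count = 0;
}
```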
What About Arrays?
Array access is considered memory-safe (and therefore `@safe`) because it's bounds-checked at runtime. But this bounds-checking relies on the fact that the `.length` property of an array matches the actual length of the array in memory. How is that guaranteed?
Just like with pointers, D forbids, in `@safe` code, any operation that could possibly create an invalid array. In particular, any change to the array's length that would create an invalid array triggers a reallocation, using the GC.
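The reallocation rule can be observed directly. In this sketch, growing a slice that shares storage with another array cannot be done in place without clobbering memory the other array still owns, so the runtime reallocates instead:

```d
@safe unittest {
    int[] a = [1, 2, 3];
    int[] b = a[0 .. 2]; // b shares storage with a
    b.length = 3;        // growing b in place would clobber a[2], so the
                         // runtime reallocates b's storage instead
    b[2] = 99;
    assert(a == [1, 2, 3]); // a's memory was never touched
    assert(b == [1, 2, 99]);
}
```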
Because arrays are built into the language, their internals do not suffer from the same access-protection hole as `private` variables with respect to `@safe` code in the same module. In this way, built-in arrays are privileged over user-defined types, which have no equivalent access-protection mechanism.
Invariants, Generally
Generalizing from the above two examples, we can derive the general principle that any data which must uphold some run-time invariant in order to be "valid" (i.e., memory-safe to use) must not be directly accessible from `@safe` code, but instead encapsulated behind a `@safe`/`@trusted` interface that maintains that invariant.
Some examples of such potentially-invalid data include:
- The tag and union of a tagged union.
- The reference count and data pointer of a reference-counted pointer.
- The allocator instance of an allocator-aware container.
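The first example on that list can be sketched as follows. The type `IntOrPtr` is hypothetical; its invariant is that the tag always identifies the live union member, so that `@safe` code can never misread an `int` as a pointer:

```d
// Sketch of a tagged union whose tag/data invariant needs protection.
struct IntOrPtr {
    private bool isPtr;                // invariant: selects the live member
    private union { int n; int* p; }   // overlapping a pointer is @system to touch

    // @trusted: each method is manually verified to keep the tag in sync.
    @trusted void set(int v)  { n = v; isPtr = false; }
    @trusted void set(int* v) { p = v; isPtr = true; }

    // Checks the tag first, so n is never reinterpreted as a pointer.
    @trusted int* pointer() { return isPtr ? p : null; }
}
```

If `@safe` code in the same module assigned `isPtr` directly, the `pointer` method's verification would no longer hold, which is exactly the encapsulation problem described above.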
To protect potentially-invalid data from being accessed by `@safe` code, there are several possible solutions:
- Hide the data inside closures. While this solution requires no language changes, it is unsatisfactory because it requires the use of the GC and substantially complicates the implementation of aggregates that wish to use it.
- Forbid `@safe` code from accessing `private` data, even in the same module. While this would work, it is a breaking change, and would cause many "false positives," since many `private` member variables are not subject to memory-safety related invariants.
- Allow the data to be explicitly marked as inaccessible from `@safe` code. This is the solution proposed by DIP 1035.
- Just rely on `private`. This is not a total loss, since `private` does provide some protection, but it means that all `@safe` functions in a module that contains potentially-invalid variables or aggregate members must be considered effectively `@trusted`, and subject to manual rather than automatic verification of memory safety.
Summary and Conclusions
Memory safety in D means both the absence of undefined behavior, and the absence of certain kinds of data (which I've termed "invalid") that could lead to undefined behavior. To corrupt memory means to either (a) invoke undefined behavior, or (b) expose invalid data to `@safe` code (which could then use it to invoke undefined behavior).
To prove that a `@safe` program is memory safe, all of the following steps must be completed:
1. Prove that the assumptions made by `@trusted` code about external functions hold.
2. Prove that, if those assumptions hold, the `@trusted` code is memory safe.
3. Prove that, if the `@trusted` code is memory safe, the `@safe` code is memory safe.
Step 1 is performed by the party compiling and linking the program. Step 2 is performed by the author(s) of the `@trusted` code. And step 3 is performed by the compiler. If any step is skipped, or performed incorrectly, the proof does not hold.
When performing step 3, the D compiler relies on the assumption that invalid data cannot be accessed from `@safe` code. It upholds this assumption automatically in purely-`@safe` code, and relies on the programmer to uphold it in `@trusted` code.
Unfortunately, in the current D language, it is impossible for the programmer to uphold this assumption in `@trusted` code in all cases without making unacceptable feature, performance, and implementation-quality tradeoffs. DIP 1035 would solve this problem. The current approach, of relying on `private`, requires manual verification of `@safe` code in order for the compiler's proof of memory safety to remain valid.