Cryptographic Software – writing the grotty bits

It occurs to me that the craft of designing and writing cryptographic software so that it doesn’t leak sensitive information is, at best, a black art, and most of the necessary techniques aren’t widely known, documented, or shared.

I bet there are a thousand tips and tricks that most of us have never written down because they are grotty details that aren’t very interesting mathematically. I would love to hear the coding techniques that others have developed to control side channels and write software that is both reliable and doesn’t leak information.

Here are a few of mine.

First, I write unit tests for everything. Often even before writing the things. It’s astonishing how often crypto code can look like it’s doing the right thing while it’s actually doing something very subtly different that you’d probably never notice was different. Like having the wrong tap intervals in your implementation of some LFSR-based thing like the Mersenne Twister – get it wrong by one bit, and the output still looks good, but it’ll be insecure and have a period way shorter than you expected, giving your software an exploitable bug. Or checking for certificate revocation, then accepting a certificate that’s been revoked anyway.

Or getting encrypt-decrypt that reproduces the plaintext, but doesn’t produce the same output on test vectors as the standard implementation you’re trying to be compatible with (and if it’s not producing the same output, very likely it’s the standard implementation that’s the more secure of the two…). Or finding the key server unavailable and then falling back to accepting a certificate without displaying the error it was supposed to display about not knowing whether the certificate is good. Anyway, write lots of unit tests. Not only will they make sure you know what your routines are doing, they’ll also tell you when you break something.

The debug build calls the unit tests, and they write a file of test results. The test results file, rather than the executable, is the usual makefile target, and the makefile recipe for building the test results ends with ‘cat results.txt’, which appends the test output (i.e., a blank line followed by listings of any unit-test errors) directly to the compile output, just like any other errors that need fixing.
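
In case it helps to see the shape of the thing, here’s a minimal sketch of such a harness (the file name and the stand-in test are illustrative, not lifted from any real project):

#include <stdio.h>

static FILE *results;
static int   failures;

/* stand-in for a real known-answer test against published vectors */
static int known_answer_test(void) { return 1; }

static void check(int ok, const char *name)
{
    if (!ok) {
        fprintf(results, "FAIL: %s\n", name);
        failures++;
    }
}

int main(void)
{
    results = fopen("results.txt", "w");
    if (results == NULL)
        return 1;
    fprintf(results, "\n");   /* blank line, then any unit-test errors */

    check(known_answer_test(), "encrypt output matches published test vector");

    fclose(results);
    return failures != 0;
}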

If I get any choice about it, I try to do all the crypto in a single-threaded, dedicated executable that does very little else. It’s much easier to analyze and debug a single-threaded application. Know your compiler options. Use the diagnostic and standard-enforcing ones heavily. Use the ones that enable security extensions peculiar to that particular compiler if they help (such as stack canaries, or an option to zero all memory allocated to your program on exit), but do your best to write code whose security does not rely solely on those extensions, because sooner or later someone will need to compile it with a different compiler. And eschew any standard-breaking extensions that won’t work (or, worse, will cause errors) in other environments.

I write in C using the standard libraries and the GMP bignum library. GMP has some not-very-widely-known primitives that are specifically for cryptographic use. They run in constant time and leave nothing on the stack or in buffers, eliminating common side channels. They’re a little slower than the “normal” set of calls, but that’s okay. The C standard libraries can mostly be trusted to do exactly what they say and nothing more. Being stuck with C is kind of a shame, because other languages have nicer facilities, less “undefined” behavior, and considerably more protection from programmer mistakes, but there’s no way around it: AFAIK no other language gives me the same absolute control over when and whether copies are made, when and whether writes to variables actually happen, and so on. Which isn’t high praise for C, because it’s still hard and grotty and error-prone. But at least it’s possible.
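
For example, mpz_powm_sec (in GMP since version 5) is the side-channel-hardened form of modular exponentiation: it requires a positive exponent and an odd modulus, and it’s designed to take the same time and touch memory in the same pattern for any same-size operands. A tiny sketch of calling it:

#include <gmp.h>

int main(void)
{
    mpz_t base, exponent, modulus, result;
    mpz_inits(base, exponent, modulus, result, NULL);

    mpz_set_ui(base, 4);
    mpz_set_ui(exponent, 13);
    mpz_set_ui(modulus, 497);               /* odd modulus, as mpz_powm_sec requires */

    mpz_powm_sec(result, base, exponent, modulus);  /* hardened against timing/cache leaks */
    gmp_printf("%Zd\n", result);                    /* prints 445 */

    mpz_clears(base, exponent, modulus, result, NULL);
    return 0;
}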

The runtimes, templates, and libraries that come with most languages (and here I include C++) can be trusted to do what they say, but without going over a million lines of difficult, heavily #ifdef’d template code with a fine-toothed comb I can’t trust that they do nothing more. Therefore I don’t know how to write secure software in those languages and be certain that it won’t leak information. They accomplish their semantic goals well, but do so while leaving copies, and fragments of copies, of everything they touch in their objects, in unused areas of their allocated buffers until those buffers are actually used for something else, and on the stack.

A lot of crypto software makes extensive use of global variables for sensitive values. They are fast, never get deallocated or (accidentally) copied during runtime, new values always overwrite previous values instead of landing at new places in memory possibly leaving copies of the old values somewhere, and they avoid indirections that might leave pointers to them lying around. The only thing even a little bit subtle is making sure that they get erased as soon as the program doesn’t need them anymore, and again before the program exits, and that’s not hard. A pattern emerges with a dedicated ‘eraser’ routine that sets them all to bytes read from /dev/random before program exit. The read from /dev/random can’t be elided by the compiler because it is a system-level side effect. But if you read from /dev/random into a buffer and then copy from the buffer to the sensitive variables, the compiler can elide that because those are writes to dead variables. What’s necessary is to read bytes from /dev/random *directly* into the global variables, one at a time if necessary. It helps if you define a singleton record type that holds them all; that way you can just overwrite the record instead of doing one at a time.
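
Here’s roughly what that pattern looks like in miniature (the record’s fields are just placeholders):

#include <stdio.h>
#include <stdlib.h>

/* One record holds all the sensitive globals, so the eraser can overwrite
   them in a single read from /dev/random -- a system-level side effect
   the compiler can't elide. */
struct sensitive {
    unsigned char key[32];
    unsigned char nonce[16];
} secrets;

void erase_secrets(void)
{
    FILE *rng = fopen("/dev/random", "rb");
    if (rng == NULL || fread(&secrets, sizeof secrets, 1, rng) != 1)
        abort();                  /* refuse to exit with the secrets intact */
    fclose(rng);
}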

That pattern is fine for globals that you clear once or twice per program run and again on program exit, but I/O, even from /dev/random, is far too slow for using on return from every subroutine that handles sensitive variables. I kind of don’t like using global variables. I don’t like the idea that every part of the program has access to them. But I have certainly used them.
C also gives you variables with ‘file’ scope, which is another way to have something that you can’t deallocate or lose track of and allows you to limit access to the sensitive variables to JUST the routines defined in one file.  That’s probably a better pattern than globals.
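
A sketch of the file-scope version (the names are hypothetical): the record is static at file scope, so nothing outside this one file can even name it, and everything else has to go through a narrow interface.

/* secrets.c -- the only file that can touch the sensitive record */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static struct {
    unsigned char key[32];
} vault;                          /* file scope: invisible outside this file */

void vault_set_key(const unsigned char *key, size_t len)
{
    if (len != sizeof vault.key)
        abort();
    memcpy(vault.key, key, len);
}

void vault_erase(void)            /* call once the key is no longer needed */
{
    FILE *rng = fopen("/dev/random", "rb");
    if (rng == NULL || fread(&vault, sizeof vault, 1, rng) != 1)
        abort();
    fclose(rng);
}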

I use the ‘volatile’ keyword a lot, to designate local variables so that writes to them are never skipped, even if writing to them is the last thing that happens before the procedure they’re allocated in returns. I know of no other language besides C that allows that. ‘volatile’ lets me use local variables freely without leaving anything sensitive on the stack, but don’t use it for anything that’s not sensitive. For example, if you use a volatile variable to index a loop, the loop will run slowly, because that variable has to be written back to memory on every iteration instead of just living in a register. It’s better to use a regular auto variable for that.
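
A small sketch of what I mean (the routine itself is hypothetical): the sensitive temporary is volatile, so the clearing store at the end can’t be skipped, while the loop index stays an ordinary auto variable and can live in a register.

#include <stddef.h>
#include <stdint.h>

void whiten(uint8_t *out, const uint8_t *in, const uint8_t *pad, size_t len)
{
    volatile uint8_t t = 0;       /* sensitive temporary: writes are never skipped */
    size_t i;                     /* not sensitive: ordinary auto, free to stay in a register */

    for (i = 0; i < len; i++) {
        t = (uint8_t)(in[i] ^ pad[i]);
        out[i] = t;
    }
    t = 0;                        /* this final store happens even though t is about to die */
}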

A very good solution to the problem is to allocate a large ‘security’ buffer as a local variable in main(), and have the program manage its own stack for sensitive variables. The way that works is that when main() calls anything, it gives it a pointer into the security buffer, and the size of the buffer. A called routine checks that the buffer is large enough, then uses as much of it as it needs for its own sensitive locals by casting the pointer into the buffer to a pointer to a record type containing those locals. If it calls anything else that has sensitive local variables, it does so with a pointer just past the end of its own record, and with the size it got for the security buffer minus the size of its own sensitive-locals record. As with globals, before program exit you call an ‘eraser’ that overwrites the entire buffer with bytes from /dev/random. If this sounds familiar, it’s because when you were studying programming languages in college they made you implement your own call stack at least once, for a compiler or interpreter you were writing. You’re managing your own stack memory because you know exactly how you do it, and the system your program runs on might otherwise do it in some way that creates a security flaw.
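
In outline, the pattern looks something like this (the sizes and the record are just placeholders):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct derive_locals {            /* the sensitive locals of derive_key() */
    unsigned char scratch[64];
    unsigned char subkey[32];
};

static void derive_key(unsigned char *sec, size_t seclen)
{
    struct derive_locals *loc;

    if (seclen < sizeof *loc)     /* is the buffer big enough for us? */
        abort();
    loc = (struct derive_locals *)sec;

    /* ... work in loc->scratch, leave the result in loc->subkey ... */

    /* anything we call gets the region just past our own record:
       callee(sec + sizeof *loc, seclen - sizeof *loc);          */

    memset(loc->scratch, 0, sizeof loc->scratch);  /* clear what we don't pass back */
}

static void erase_buffer(void *buf, size_t len)
{
    FILE *rng = fopen("/dev/random", "rb");
    if (rng == NULL || fread(buf, len, 1, rng) != 1)
        abort();
    fclose(rng);
}

int main(void)
{
    unsigned char security[4096]; /* the one home for sensitive locals */

    derive_key(security, sizeof security);
    erase_buffer(security, sizeof security);
    return 0;
}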

The benefit of this is that main() retains control of the buffer. It doesn’t make the system slower the way ‘volatile’ does. And if multiple routines both read and write in the buffer, the compiler can never elide writes into it – so the routines can easily and reliably clear any sensitive vars that they *aren’t* passing back to their caller before they return. It’s probably safer than using ‘volatile’ local variables, because it’s possible to forget to clear a sensitive ‘volatile’ before returning but it is NEVER possible to forget to clear the security buffer before exiting – that would definitely break your unit tests. The downside of this is that handling the buffer tends to be a pain in the @$$, which is why I tend to use ‘volatile’ instead. I do use a designated buffer for VERY sensitive variables, such as passwords. Absolutely no routine, anywhere, gets to make its own copy of a password that lives outside the buffer, and main() writes over that as soon as the key is set up.

I’ve seen crypto software where sensitive locals were allocated using the ‘static’ keyword, to ensure that local variables with sensitive information are not deallocated when the function returns. This prevents leaks during runtime, and the compiler USUALLY can’t elide writes to static variables, so the called subroutine can usually clear the values of its sensitive locals before returning. But it’s not a technique I trust, because compilers are ALLOWED to elide final writes to static variables if they can prove that the values left in them can never matter to the routine whose scope they’re in, and static locals are strictly worse than globals for leak prevention because main() has no way to overwrite all of them before it exits. Every one of them gets released when the program exits, and you just don’t know what’s going to be the next program to allocate the block of memory where they were contained. Granted, on most systems it doesn’t matter much, because the kernel zeros all memory before the next process gets it. But that’s an option that can be turned off in a kernel that’s compiled for speed.
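
For what it’s worth, here’s the pattern I’m describing (and not recommending), sketched out:

#include <stddef.h>
#include <string.h>

void mix_in(unsigned char *out, const unsigned char *secret, size_t len)
{
    static unsigned char work[64];   /* survives the call: never deallocated */

    memcpy(work, secret, len < sizeof work ? len : sizeof work);
    /* ... use work to produce out ... */
    out[0] = work[0];

    /* A compiler that can prove these bytes are never read again is
       allowed to drop this final store under the as-if rule.        */
    memset(work, 0, sizeof work);
}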

I always use unsigned integers. In fact this is something I learned rather recently, due to an issue raised on a cryptography list. The basic arithmetic operators (addition, subtraction, multiplication) can overflow, and overflow on signed integers is undefined behavior, even though on most machines it would ‘just’ wrap around and, say, give you a negative result from adding two positive numbers. If you must use signed integers and check afterward for overflow/wraparound, you have to do the addition as though the operands were unsigned integers, like this:

z = (int)((unsigned int)x + (unsigned int)y);

because overflow on unsigned integers *is* defined behavior.

You can’t even check to see if undefined behavior has happened, because the compiler will go, “oh, that couldn’t happen except for undefined behavior, and I can do whatever I want with undefined behavior. I want to ignore it.”

It will then cut the checks and everything that depends on them out of your program as ‘dead code’. So you can have an integer that is in fact negative because of a wraparound that occurred while adding two positive numbers, and the program will jump straight past a check for a negative value without triggering it.
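
Here’s a small sketch of doing the check the safe way (the function is illustrative): the addition itself happens on unsigned values, so it’s defined to wrap, and the compiler has no license to throw the test away.

#include <limits.h>

/* Returns 1 and stores x + y in *z if the sum fits in an int;
   returns 0 (leaving *z alone) if it would overflow.           */
int add_in_range(int x, int y, int *z)
{
    unsigned int uz = (unsigned int)x + (unsigned int)y;  /* defined wraparound */

    if (x >= 0 && y >= 0 && uz > (unsigned int)INT_MAX)
        return 0;                 /* positive overflow */
    if (x < 0 && y < 0 && uz <= (unsigned int)INT_MAX)
        return 0;                 /* negative overflow */

    *z = (int)uz;  /* converting an out-of-range value back to int is
                      implementation-defined, but two's-complement in practice;
                      the point is that the addition itself was never UB        */
    return 1;
}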

Crypto coding has taught me to use a few other unusual bits of coding style. In crypto, we tend to allocate buffers, permutations, etc. that are ‘round’ sizes like 0x100 bytes or 0x10000 16-bit values. Because we’re using unsigned integers anyway, indexing into these buffers with uint8_t or uint16_t variables gives us an automatic range-safety check. But a ‘for’ loop over the whole buffer can never terminate, because the index wraps back to zero before it can ever reach a value past the end, so instead of ‘for’ loops I tend to use do/while loops. If I want to do something like initializing a permutation with every value of a uint16_t, for example, my usual idiom is to write something like

uint16_t count = 0;
do {
    permutation[count] = count;
} while (++count != 0);   /* wraps back to 0 after the last index, ending the loop */

This is also an example of a habit that suits me when I’m in full-on paranoia mode: never leave any local variable (such as count) holding a nonzero value on exit if I can help it, whether *I* think that variable is sensitive or not. If I do have to leave something nonzero, I try to leave the same constant value in it on every exit.

So that’s a few bits about writing non-leaking crypto code. I’ve been more concerned with preventing memory-based data leaks, obviously, than with controlling other side channels like timing or power use. Would anybody else here like to share some of the techniques they use?
