Monday, 30 November 2020

Bang and Blame

The Joy of C++

The joy of C++: a catalog of some of the ways C++ programs can and do fail in practice.

Error Handling via Exceptions

C++ supports an exception mechanism - errors can be reported by throwing an exception, and handled by code catching the exception higher up the call stack. Given that C++ is, well, C++, you can throw values that are not exceptions but of any desired type.

Consequently, it is hard to know what exceptions should be handled - there is no mandatory common base type. You can try "catch (...)", but then it's an interesting challenge to portably determine what the unknown exception is.
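A minimal sketch of the usual workaround, assuming you can list the types you expect: inside "catch (...)", rethrow the active exception and let typed handlers have a go at it. The helper names "describe_current_exception" and "run_and_describe" are made up for this illustration. Anything not on the list still ends up as "unknown type".

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical helper: describe the exception currently being handled.
// Only call this from inside a catch block; a bare `throw;` with no
// active exception calls std::terminate.
std::string describe_current_exception() {
    try {
        throw;  // rethrow the active exception so typed handlers can inspect it
    } catch (const std::exception& e) {
        return std::string("std::exception: ") + e.what();
    } catch (int code) {
        return "int: " + std::to_string(code);
    } catch (const char* text) {
        return std::string("C string: ") + text;
    } catch (...) {
        return "unknown type";  // nothing portable left to try
    }
}

// Hypothetical helper: run a callable and report what, if anything, it threw.
template <typename F>
std::string run_and_describe(F f) {
    try {
        f();
        return "no exception";
    } catch (...) {
        return describe_current_exception();
    }
}
```

Throwing a double, say, falls through to the last handler: the list of typed handlers is only ever as complete as your knowledge of the code you call.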

The cost of failing to catch an exception is program termination. So, it would be nice if the compiler could assist in ensuring that exceptions that can be thrown are in fact caught.

The compiler will not assist. Functions can have exception specifications, stating either that "nothing is thrown" ("noexcept") or, in older code, that "A or B is thrown" ("throw(A, B)", deprecated in C++11 and removed in C++17). These are not statically enforced, but dynamically checked: the action on failing the check is, essentially, to terminate the program.
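To illustrate the lack of static enforcement: in current C++ the "nothing is thrown" specification is spelled "noexcept", and the sketch below compiles even though the promise is plainly false. The function names are invented for the example. The "noexcept" operator reports only what the declaration claims, not what the body does.

```cpp
#include <stdexcept>

void thrower() { throw std::runtime_error("thrown from thrower"); }

// Compiles: nothing checks that the body honours the promise.
// Actually calling this function would end in std::terminate.
void promises_not_to_throw() noexcept { thrower(); }

// No promise made, so as far as the type system knows it may throw.
void honest() {}

// The noexcept operator looks at declarations only, never at behaviour.
static_assert(noexcept(promises_not_to_throw()), "declared noexcept");
static_assert(!noexcept(honest()), "not declared noexcept");
static_assert(!noexcept(thrower()), "not declared noexcept");
```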

Such checks may interfere with the ability of compilers to optimise, and certainly complicate the process.
 

Not all exceptions are Exceptions

The exception handling mechanism of C++ is inadequate for another reason - not all errors generate exceptions. There is no Java-style "null pointer" dereference exception in C++. In broad strokes, your program will crash on dereferencing a null pointer. Such an error is a "system thing", as is integer division by zero. The behaviour is "undefined" - which means in practice you cannot reasonably handle it, portably.
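Since the language will not raise an exception for you, the only portable option is to check before the operation. A minimal sketch for integer division, with invented names "checked_divide" and "divide_or"; it also covers INT_MIN / -1, the other way int division is undefined.

```cpp
#include <climits>
#include <exception>
#include <stdexcept>

// Hypothetical wrapper: turn the undefined cases into ordinary exceptions
// by testing for them before dividing.
int checked_divide(int numerator, int denominator) {
    if (denominator == 0) {
        throw std::domain_error("division by zero");
    }
    if (numerator == INT_MIN && denominator == -1) {
        throw std::overflow_error("quotient does not fit in int");
    }
    return numerator / denominator;
}

// Usage sketch: a caller higher up decides what a failure means.
int divide_or(int numerator, int denominator, int fallback) {
    try {
        return checked_divide(numerator, denominator);
    } catch (const std::exception&) {
        return fallback;
    }
}
```

The same idea applies to pointers: test for null before the dereference, because afterwards is too late.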
 

Disabling Error Handling via Exceptions

The C++ language is often subset - in the name of runtime efficiency - by disabling exception handling (typically together with run time type information, RTTI). The consequence is that code must then either return errors via return values, or locally decide that an error is fatal and invoke "abort()" or "exit()".

Error codes may all too easily be ignored. The other case, where code may locally call "abort()" on error, requires that all callers be aware of how all the code they call, directly or indirectly, will handle errors.
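One partial mitigation, assuming a C++17 compiler, is the "[[nodiscard]]" attribute: marking the error type asks the compiler to warn when a returned value is dropped. It is a warning, not an error, and a cast to void silences it. "Status", "parse_port" and "port_or_default" are names invented for this sketch.

```cpp
#include <string>

// Marking the type means every function returning it by value is covered.
enum class [[nodiscard]] Status { Ok, Invalid };

// Hypothetical parser: accepts 1 to 5 decimal digits with value <= 65535.
Status parse_port(const std::string& text, int& port_out) {
    if (text.empty() || text.size() > 5) return Status::Invalid;
    int value = 0;
    for (char c : text) {
        if (c < '0' || c > '9') return Status::Invalid;
        value = value * 10 + (c - '0');
    }
    if (value > 65535) return Status::Invalid;
    port_out = value;
    return Status::Ok;
}

// Usage sketch: the status is inspected, so no warning is issued.
// Writing just `parse_port(text, port);` should draw a diagnostic.
int port_or_default(const std::string& text, int fallback) {
    int port = 0;
    if (parse_port(text, port) != Status::Ok) return fallback;
    return port;
}
```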

It is not best practice for one dynamically linked library in a large application to terminate the whole, with no chance to intervene in the error handling. Yet, without careful API design in all the intervening layers, it may not be possible to report the error all the way up to the top layer.

Stack Overflow

The call stack can overflow in any language with recursion where stack allocation is used for activation records and tail call optimisation is not mandatory. So, this is not a C++ specific problem. Yet, C++ does not provide any in-language mechanism to know if you *will* overflow the stack, or to handle it if you *do* overflow the stack.
 
Two approaches seem to be used, based on how the program is linked. You can place the stack at the top of memory, and grow the stack towards the top: exceeding the available space will then trigger a page fault (or memory protection error) and terminate the program. Or you can have a guard page at the top of the stack area, either to trigger a memory protection fault, or to be read back later to detect overwrites after the fact.
 
The stack may grow towards the heap area - leading to potential memory corruption on overflow. This may lead to hard to diagnose bugs that do not occur in "normal" usage.
 
The desire for efficiency in C++ prevents a runtime system being widely used that would bound the call stack depth - as interpreters for e.g. Python do. Similarly, a calling convention with call stack checking is also rejected. In such a scheme, calls to functions would check the remaining size of the stack area, and either pass the size remaining, or compute it on the fly.
 
C++ is "portable". That does not mean that your program will work the same when ported - even on systems that are not resource constrained, the stack size is variable, as is the allocation size of each stack frame. Your program may fail with stack overflow on some, but not all, platforms, and at some, but not all, optimisation levels.
 
If the stack is large enough, stack overflow may be a seldom seen failure only occurring from unbounded recursion, typically from a logic error. It can also be provoked, if naive recursive code must process user supplied inputs for e.g. parsing languages with nesting constructs. 
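For the user-supplied nesting case, the do-it-yourself defence is to carry an explicit depth counter through the recursion and refuse input that nests too deeply. A minimal sketch for balanced parentheses; the limit "kMaxDepth" is an arbitrary assumption, since the language gives no way to ask how much stack is left.

```cpp
#include <cstddef>
#include <string>

// Hypothetical limit: a real value would have to be tuned per platform
// and per build, as frame sizes vary.
constexpr int kMaxDepth = 256;

// Parses a sequence of balanced "(...)" groups starting at pos.
// Returns false on malformed input or when nesting exceeds kMaxDepth.
bool parse_groups(const std::string& s, std::size_t& pos, int depth) {
    if (depth > kMaxDepth) return false;  // refuse rather than overflow the stack
    while (pos < s.size() && s[pos] == '(') {
        ++pos;  // consume '('
        if (!parse_groups(s, pos, depth + 1)) return false;
        if (pos >= s.size() || s[pos] != ')') return false;
        ++pos;  // consume ')'
    }
    return true;
}

// True if the whole string consists of properly nested parentheses
// no deeper than kMaxDepth.
bool is_well_nested(const std::string& s) {
    std::size_t pos = 0;
    return parse_groups(s, pos, 0) && pos == s.size();
}
```

An input of a hundred thousand opening parentheses is then rejected after 257 calls instead of taking the process down.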

Memory Allocation Failure

C++ heap allocation functions can raise exceptions. Not a problem specific to C++, especially, but hard to recover from. What to do when you have no memory left? What to free? What code is safe to run, and will not attempt itself to allocate more? In general, when the failure to allocate occurs, it is easiest to terminate the program.

There is not a single approach to heap allocation. You may find allocations in C++ style, C style, and in the normal heap, or in arenas of buffers with placement new, or in system heaps. The diversity makes uniform treatment of exhaustion difficult.

The wrinkle with C++ is that allocations are often manual, with sizes held in integer types: types subject to, for example, overflow, or where a negative signed number is reinterpreted as a very large unsigned number. Some memory allocation failures are therefore attempts to allocate an extremely large amount of memory, where the incorrect size results from confusion of signed and unsigned types.
 
If taking sizes from external inputs or data, it may be worth checking they are plausible before allocating.
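A minimal sketch of such a plausibility check, assuming the count arrives as a signed integer parsed from input. The limit "kMaxAllocationBytes" and the function name are invented for the example; a sensible limit depends entirely on the application.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical policy limit: 64 MiB per request.
constexpr std::size_t kMaxAllocationBytes = 64u * 1024u * 1024u;

// The conversion that causes the trouble: -1 becomes the largest size_t.
static_assert(static_cast<std::size_t>(-1) ==
                  std::numeric_limits<std::size_t>::max(),
              "negative values wrap to huge unsigned sizes");

// Returns true if count elements of element_size bytes is a request
// worth passing on to the allocator.
bool is_plausible_allocation(long long count, std::size_t element_size) {
    if (count < 0) return false;          // would wrap to a huge unsigned value
    if (element_size == 0) return false;  // nothing sensible to allocate
    const unsigned long long n = static_cast<unsigned long long>(count);
    // Dividing the limit, rather than multiplying the inputs, means the
    // check itself cannot overflow.
    return n <= kMaxAllocationBytes / element_size;
}
```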

Memory Corruptions

C++ is not memory safe. You can write past the end of an array, or the start; that can corrupt the stack or heap, leading to unpredictable behaviour depending on what data or code got corrupted.

The fun aspect is the distance between corruption and detection, and the potential for silently corrupting data.

In Silence

Assertions are good! They should be used. Just not with the anti-pattern of asserting that something must hold and then, in a release build, merely proceeding as if all was well down a code path that the assertion failure would have aborted in a debug build.
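The anti-pattern and one possible alternative, side by side. This is a sketch with invented function names: the second version keeps its check in every build and reports the failure, here by throwing (for a vector, "at()" would do much the same).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Anti-pattern: when NDEBUG is defined the assert vanishes, and the
// out-of-range read on the next line goes ahead anyway.
int element_unchecked_in_release(const std::vector<int>& v, std::size_t i) {
    assert(i < v.size());
    return v[i];
}

// Alternative sketch: the check survives in every build.
int element_checked(const std::vector<int>& v, std::size_t i) {
    if (i >= v.size()) throw std::out_of_range("index out of range");
    return v[i];
}

// Usage sketch: the caller decides what a failed check means.
int element_or(const std::vector<int>& v, std::size_t i, int fallback) {
    try {
        return element_checked(v, i);
    } catch (const std::out_of_range&) {
        return fallback;
    }
}
```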

ABI (Application Binary Interface) mismatch

C++ is compiled by multiple compilers. To interoperate, the compilers must agree on, amongst other issues, the size of data types, the layout of data values in memory, and, when passing arguments to and returning values from functions, the placement of those in registers or stack locations.

When a mismatch is not detected at e.g. link time, it may lead to hard to diagnose runtime errors. The outcome of ABI mismatch may be memory corruption, or the corruption of the arguments passed to a function.
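A partial defence for data layout, assuming both sides compile the same header: pin down sizes and offsets with "static_assert", so that a disagreement becomes a compile error rather than a runtime mystery. The struct "RecordHeader" is invented for the example. This does nothing for calling conventions or standard library types.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical struct shared across a binary boundary. Fixed-width
// members, ordered so that no padding is needed.
struct RecordHeader {
    std::uint32_t magic;
    std::uint16_t version;
    std::uint16_t flags;
    std::uint64_t payload_bytes;
};

// Each compiler that includes this header must agree, or fail to build.
static_assert(std::is_standard_layout<RecordHeader>::value,
              "RecordHeader must be standard layout");
static_assert(sizeof(RecordHeader) == 16, "unexpected size or padding");
static_assert(offsetof(RecordHeader, version) == 4, "unexpected offset");
static_assert(offsetof(RecordHeader, payload_bytes) == 8, "unexpected offset");
```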

Thread Lifespan vs Scoped Lifespans

Managing lifespans of scoped data in C++ is eased by RAII - destructors being run when execution leaves a scope. The issues with threads are not so simple: what to do with a thread that has lived "too long"? You cannot portably, simply, or safely kill it. You cannot let it run forever. You can ask it nicely to exit, if there is a mechanism to do so - but then - how long shall you wait?

If you destroy an instance of the C++ thread class that is still joinable, that is, without having called "join()" (or "detach()"), "std::terminate" is invoked - your program abruptly exits.
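A minimal sketch of the usual RAII workaround: a wrapper, here given the invented name "JoiningThread", whose destructor joins. C++20 provides "std::jthread" with similar behaviour. Note that this only trades one problem for another: if the thread never finishes, the destructor waits forever.

```cpp
#include <thread>
#include <utility>

// Hypothetical wrapper: joins in the destructor, so leaving the scope
// (normally or via an exception) never destroys a joinable std::thread.
class JoiningThread {
public:
    template <typename F>
    explicit JoiningThread(F f) : thread_(std::move(f)) {}

    ~JoiningThread() {
        if (thread_.joinable()) thread_.join();  // blocks until the thread exits
    }

    JoiningThread(const JoiningThread&) = delete;
    JoiningThread& operator=(const JoiningThread&) = delete;

private:
    std::thread thread_;
};

// Usage sketch: compute a sum on a worker thread.
int sum_on_worker(int a, int b) {
    int result = 0;
    {
        JoiningThread worker([&result, a, b] { result = a + b; });
    }  // destructor joins here, so result is complete and terminate is avoided
    return result;
}
```

On some toolchains, code using "std::thread" needs a threading flag such as "-pthread" at build time.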
 
Various thread usage patterns do this: the problem being that threading and concurrency are hard to test, and timing can affect the order of operations, so the problem may occur only under load.

Live Lock, Dead Lock, all the usual suspects

The standard mutex type in C++ is not a recursive mutex: a thread can deadlock itself by taking the same mutex twice. As usual, the advice is to not make that kind of mistake. 
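When one locked member function really must call another, the standard library does offer "std::recursive_mutex", which the owning thread may lock again. A minimal sketch with an invented "Counter" class; with plain "std::mutex" the nested lock below would be undefined behaviour, in practice usually a deadlock.

```cpp
#include <mutex>

// Hypothetical class whose locked member functions call each other.
class Counter {
public:
    void increment() {
        std::lock_guard<std::recursive_mutex> lock(mutex_);
        ++value_;
    }

    // Holds the lock and calls increment(), which locks again on the same
    // thread. That is allowed for recursive_mutex, not for std::mutex.
    int increment_twice() {
        std::lock_guard<std::recursive_mutex> lock(mutex_);
        increment();
        increment();
        return value_;
    }

private:
    std::recursive_mutex mutex_;
    int value_ = 0;
};
```

Whether a recursive mutex is a fix or merely hides a muddled locking design is a separate argument.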

DLLs, Libraries, Plugins, and Lifespans

C++ semantics for construction and destruction of globals interact in complex ways with dynamic loading and unloading of shared libraries, and with reloading. It is unsafe, in the extreme, to pass references and pointers to data in one DLL to another, without care for the lifespans of the DLLs relative to the data.

And in all the normal ways programs can fail

C++ programs can fail for reasons not specific to C++ - logic errors, typically. There are already enough ways to fail that we might wish C++ added fewer.