Monday 30 November 2020

Bang and Blame

The Joy of C++

The joy of C++: a catalog of some of the ways C++ programs can and do fail in practice.

Error Handling via Exceptions

C++ supports an exception mechanism - errors can be reported by throwing an exception, and handled by code catching the exception higher up the call stack. Given that C++ is, well, C++, you can throw values that are not exceptions but of any desired type.

Consequently, it is hard to know what exceptions should be handled - there is no mandatory common base type. You can try "catch (...)" but then it's an interesting challenge to portably determine what the unknown exception is.
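One common way to narrow down an unknown exception is to rethrow it inside a nested try block and match it against the types you can think of. The sketch below illustrates that idea; the helper names "describe_current_exception" and "run_and_describe" are made up for this example, and the list of types is necessarily incomplete.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical helper: describes the exception currently being handled.
// It must only be called from inside a catch block.
std::string describe_current_exception() {
    try {
        throw;  // rethrow the in-flight exception so it can be matched by type
    } catch (const std::exception& e) {
        return std::string("std::exception: ") + e.what();
    } catch (int code) {
        return "int: " + std::to_string(code);
    } catch (const char* msg) {
        return std::string("C string: ") + msg;
    } catch (...) {
        return "unknown type";  // anything not listed above stays unknown
    }
}

// Hypothetical wrapper: runs a callable and reports what, if anything, it threw.
template <typename F>
std::string run_and_describe(F&& f) {
    try {
        f();
        return "no exception";
    } catch (...) {
        return describe_current_exception();
    }
}
```

Throwing a double through "run_and_describe" lands in the last handler: the type remains unknown unless someone remembered to list it.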

The cost of failing to catch an exception is program termination. So, it would be nice if the compiler could assist in ensuring that exceptions that can be thrown are in fact caught.

The compiler will not assist. Functions can have exception specifications, stating either that "nothing is thrown" or "A or B is thrown". (The "A or B" form was deprecated in C++11 and removed in C++17; "noexcept" remains.) These are not statically enforced, but dynamically checked: the action on failing the check is, essentially, to terminate the program.
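A small sketch of what "not statically enforced" means for "noexcept". The function names are invented for illustration. The second function promises not to throw, yet calls one that can, and this compiles; if the exception escaped at run time, std::terminate would be called.

```cpp
#include <stdexcept>

// No exception specification: this function may throw.
int parse_digit(char c) {
    if (c < '0' || c > '9') throw std::invalid_argument("not a digit");
    return c - '0';
}

// Declared noexcept, yet it calls a function that can throw.
// The compiler accepts this; the promise is only checked at run time,
// and breaking it ends the program via std::terminate.
int parse_digit_unchecked(char c) noexcept {
    return parse_digit(c);
}

// The noexcept operator reports what the declarations promise,
// not what the function bodies actually do.
static_assert(!noexcept(parse_digit('7')), "no promise made");
static_assert(noexcept(parse_digit_unchecked('7')), "promise made, not verified");
```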

Such checks may interfere with the ability of compilers to optimise, and certainly complicate the process.
 

Not all exceptions are Exceptions

The exception handling mechanism of C++ is inadequate for another reason - not all errors generate exceptions. There is no Java-style null pointer dereference exception in C++. In broad strokes, your program will crash on dereferencing a null pointer. Such an error is a "system thing", as is integer division by zero. The behaviour is "undefined" - which means in practice you cannot reasonably handle it, portably.
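Since no try/catch will intercept these, the portable option is to check before the operation. A minimal sketch, with hypothetical helper names "checked_div" and "checked_read":

```cpp
#include <limits>

// Hypothetical helper: integer division that reports failure through its
// return value instead of running into undefined behaviour.
bool checked_div(int numerator, int denominator, int& out) {
    if (denominator == 0) return false;  // division by zero
    if (numerator == std::numeric_limits<int>::min() && denominator == -1)
        return false;                    // result would not fit in an int
    out = numerator / denominator;
    return true;
}

// Hypothetical helper: reads through a pointer only after checking it.
bool checked_read(const int* p, int& out) {
    if (p == nullptr) return false;
    out = *p;
    return true;
}
```

This only covers the null pointer; a dangling or otherwise invalid pointer cannot be detected this way.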
 

Disabling Error Handling via Exceptions

The C++ language is often subset - in the name of runtime efficiency - by disabling exception handling (typically together with run time type information (RTTI)). The consequence is that code must then either return errors via return values, or locally decide that an error is fatal and invoke "abort()" or "exit()".

Error codes may all too easily be ignored. The other case, where code may locally call "abort()" on error, requires that all callers be aware of how all the code they call, directly or indirectly, will handle errors.
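One partial mitigation for ignored error codes is the C++17 "[[nodiscard]]" attribute, which encourages the compiler to warn when a return value is dropped. A sketch, with a made-up "Status" enum and "parse_port" function:

```cpp
#include <string>

// Hypothetical error codes for illustration.
enum class Status { Ok, NotFound, Invalid };

// Hypothetical function that reports failure through its return value.
// With [[nodiscard]], a call that ignores the result typically draws a
// compiler warning (a warning, not an error).
[[nodiscard]] Status parse_port(const std::string& text, int& port_out) {
    if (text.empty()) return Status::NotFound;
    int value = 0;
    for (char c : text) {
        if (c < '0' || c > '9') return Status::Invalid;
        value = value * 10 + (c - '0');
        if (value > 65535) return Status::Invalid;  // also keeps value from overflowing
    }
    port_out = value;
    return Status::Ok;
}
```

A caller can still silence the warning with a cast to void, so this helps with forgetfulness rather than determination.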

It is not best practice for one dynamically linked library in a large application to terminate the whole, with no chance to intervene in the error handling. Yet, without careful API design in all the intervening layers, it may not be possible to report the error all the way up to the top layer.

Stack Overflow

The call stack can overflow in any language with recursion where stack allocation is used for activation records and tail call optimisation is not mandatory. So, this is not a C++ specific problem. Yet, C++ does not provide any in-language mechanism to know if you *will* overflow the stack, or to handle it if you *do* overflow the stack.
 
Two approaches seem to be used, depending on how the program is linked. You can place the stack at the top of memory, and grow the stack towards the top: exceeding the available space will then trigger a page fault (or memory protection error) and terminate the program. Or you can have a guard page at the top of the stack area, either to trigger a memory protection fault or to be read back later to detect overwrites after the fact.
 
The stack may grow towards the heap area - leading to potential memory corruption on overflow. This may lead to hard to diagnose bugs that do not occur in "normal" usage.
 
The desire for efficiency in C++ prevents a runtime system being widely used that would bound the call stack depth - as interpreters for e.g. Python do. Similarly, a calling convention with call stack checking is also rejected. In such a scheme, calls to functions would check the remaining size of the stack area, and either pass the size remaining, or compute it on the fly.
 
C++ is "portable". It does not mean that your program will work the same when ported - even on non resource constrained systems, the stack size is variable, as is the allocation size of each stack frame. Your program may fail with stack overflow on some, but not all, platforms: and at some, but not all, optimisation levels.  
 
If the stack is large enough, stack overflow may be a seldom seen failure only occurring from unbounded recursion, typically from a logic error. It can also be provoked, if naive recursive code must process user supplied inputs for e.g. parsing languages with nesting constructs. 
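For recursive code that processes user supplied input, one defensive option is to carry an explicit depth counter and refuse input that nests too deeply. The sketch below checks bracket nesting; the limit "kMaxDepth" is an arbitrary assumption, since a safe value depends on frame size and the platform's stack size.

```cpp
#include <cstddef>
#include <string>

// Hypothetical limit, chosen arbitrarily for this sketch.
constexpr int kMaxDepth = 256;

// Recursively consumes one nesting level of `s`, starting at `pos`.
// Returns false on malformed input or when nesting exceeds kMaxDepth.
bool parse_group(const std::string& s, std::size_t& pos, int depth) {
    if (depth > kMaxDepth) return false;  // refuse rather than risk the stack
    while (pos < s.size()) {
        if (s[pos] == '(') {
            ++pos;
            if (!parse_group(s, pos, depth + 1)) return false;
            if (pos >= s.size() || s[pos] != ')') return false;
            ++pos;  // consume the matching closing bracket
        } else if (s[pos] == ')') {
            return true;  // the caller consumes the closing bracket
        } else {
            ++pos;  // ordinary character
        }
    }
    return true;
}

// True when every bracket is matched and nesting stays within the limit.
bool is_balanced(const std::string& s) {
    std::size_t pos = 0;
    return parse_group(s, pos, 0) && pos == s.size();
}
```

An input of 100000 opening brackets is rejected after 257 calls instead of recursing until the stack runs out.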

Memory Allocation Failure

C++ heap allocation functions can raise exceptions. Not a problem specific to C++, especially, but hard to recover from. What to do when you have no memory left? What to free? What code is safe to run, and will not attempt itself to allocate more? In general, when the failure to allocate occurs, it is easiest to terminate the program.

There is not a single approach to heap allocation. You may find allocations in C++ style, C style, and in the normal heap, or in arenas of buffers with placement new, or in system heaps. The diversity makes uniform treatment of exhaustion difficult.

The wrinkle with C++ is that allocations are often manual, with sizes in integer types: integer types subject to, for example, overflow, or where a negative signed number is interpreted as a very large unsigned number. Thus, some memory allocation failures are really attempts to allocate extremely large amounts of memory, because of an incorrect size resulting from confusion of signed and unsigned types.
 
If taking sizes from external inputs or data, it may be worth checking they are plausible before allocating.
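A sketch of such a plausibility check, under the assumption that the count arrives as a signed 64-bit value. The name "plausible_count" and the bound "kMaxElements" are invented for this example; a real bound depends on the application.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical upper bound on what this application considers plausible.
constexpr std::size_t kMaxElements = 1u << 20;

// Validates an element count read from untrusted input and, if it is
// plausible, computes the number of bytes to allocate.
bool plausible_count(std::int64_t count, std::size_t element_size,
                     std::size_t& bytes_out) {
    if (count < 0) return false;  // would wrap to a huge unsigned value
    const std::uint64_t n = static_cast<std::uint64_t>(count);
    if (n > kMaxElements) return false;  // implausibly large
    if (element_size != 0 &&
        n > std::numeric_limits<std::size_t>::max() / element_size)
        return false;  // n * element_size would overflow
    bytes_out = static_cast<std::size_t>(n) * element_size;
    return true;
}
```

Without the first check, a count of -1 converted to an unsigned size becomes the largest representable value, which is exactly the kind of request that fails to allocate.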

Memory Corruptions

C++ is not memory safe. You can write past the end of an array, or the start; that can corrupt the stack or heap, leading to unpredictable behaviour depending on what data or code got corrupted.

The fun aspect is the distance between corruption and detection, and the potential for silently corrupting data.

In Silence

Assertions are good! They should be used. Just, not with the anti-pattern of asserting that something must hold, and then, in a release build, merely proceeding as if all was well down a code path that would have been aborted due to the assertion failure in a debug build.
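To make the anti-pattern concrete, here is a sketch with two hypothetical accessors. The first relies on the assertion alone; the second keeps the assertion for debug builds but also has a defined path when assertions are compiled out with NDEBUG.

```cpp
#include <cassert>
#include <cstddef>

// Anti-pattern: with NDEBUG defined the assert vanishes, and an
// out-of-range index goes on to read outside the array.
int get_unchecked(const int* data, std::size_t size, std::size_t index) {
    assert(index < size);
    return data[index];
}

// Alternative: still loud in debug builds, but the release build
// reports the problem to the caller instead of proceeding silently.
bool get_checked(const int* data, std::size_t size, std::size_t index,
                 int& out) {
    assert(index < size);             // aborts here in a debug build
    if (index >= size) return false;  // still handled when asserts are off
    out = data[index];
    return true;
}
```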

ABI (Application Binary Interface) mismatch

C++ is compiled by multiple compilers. To interoperate, the compilers must agree on, amongst other issues, the size of data types, the layout of data values in memory, and, when passing arguments to and returning values from functions, the placement of those in registers or stack locations.

When a mismatch is not detected at e.g. link time, it may lead to hard to diagnose runtime errors. The outcome of ABI mismatch may be memory corruption, or the corruption of the arguments passed to a function.

Thread Lifespan vs Scoped Lifespans

Managing lifespans of scoped data in C++ is eased by RAII - destructors being run when execution leaves a scope. The issues with threads are not so simple: what to do with a thread that has lived "too long"? You cannot portably, simply, or safely kill it. You cannot let it run forever. You can ask it nicely to exit, if there is a mechanism to do so - but then - how long shall you wait?

If you destroy an instance of the C++ thread class without having called "join()" or "detach()", "std::terminate" is invoked - your program abruptly exits.
 
Various thread usage patterns do this. The problem is that threading and concurrency are hard to test: timing can affect the order of operations, and therefore some problems occur only under load.
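One way to avoid the terminate-on-destruction trap is a small RAII wrapper that joins in its destructor. The class "JoiningThread" below is a hypothetical sketch of that idea; C++20's std::jthread offers similar behaviour in the standard library.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Hypothetical RAII wrapper: joins the thread on scope exit, so the
// std::thread destructor never sees a joinable thread.
class JoiningThread {
public:
    template <typename F>
    explicit JoiningThread(F&& f) : thread_(std::forward<F>(f)) {}
    ~JoiningThread() {
        if (thread_.joinable()) thread_.join();  // blocks until the thread ends
    }
    JoiningThread(const JoiningThread&) = delete;
    JoiningThread& operator=(const JoiningThread&) = delete;

private:
    std::thread thread_;
};

// Usage sketch: the worker is asked nicely to stop, then joined by the guard.
int run_worker_briefly() {
    std::atomic<bool> stop{false};
    std::atomic<int> ticks{0};
    {
        JoiningThread worker([&] {
            do {
                ++ticks;
                std::this_thread::yield();
            } while (!stop.load());
        });
        stop.store(true);  // request exit; the worker polls this flag
    }                      // ~JoiningThread joins here
    return ticks.load();
}
```

This trades abrupt termination for a potentially unbounded wait: if the worker never checks the flag, the destructor blocks forever, so the "how long shall you wait?" question remains open.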

Live Lock, Dead Lock, all the usual suspects

The standard mutex type in C++ is not a recursive mutex: a thread can deadlock itself by taking the same mutex twice. As usual, the advice is to not make that kind of mistake. 

DLLs, Libraries, Plugins, and Lifespans

C++ semantics for construction and destruction of globals interact in complex ways with dynamic loading and unloading of shared libraries, and with reloading. It is unsafe, in the extreme, to pass references and pointers to data in one DLL to another, without care for the lifespans of the DLLs versus those of the data.

And in all the normal ways programs can fail

C++ programs can fail for reasons not specific to C++ - logic errors, typically. There are enough ways to fail already that we may wish C++ added fewer.
 
 

Monday 28 September 2020

Last Man Standing

Thanks to OpenXCOM (https://openxcom.org/) I have been replaying the XCOM 1 (UFO - Enemy Unknown) and XCOM 2 (Terror from the Deep) games from 1994 and 1995. Thus, with the aid of 25 years' further gameplay experience, an improved game engine, bug fixes, mods and only modest amounts of save scumming I have completed XCOM 1.

It is done on Windows - the first time in 20+ years the home PC has not been a Mac or a PC running Linux. The cost effectiveness of buying a Mac is simply no longer there, and who wants to administer a Linux system any more or try to diagnose why power management is not all it could be? So far, Windows has not been majorly annoying. It is certainly convenient that a Steam install of the XCOM games is easily found by OpenXCOM.

The reviews of TFTD have rarely been positive in comparison to its predecessor. It has been suggested that the preference for TFTD depends on playing it first, before XCOM. I may agree; I played TFTD before I ever got a copy of XCOM, and found XCOM somewhat lacking in comparison to TFTD for various reasons.

Firstly, TFTD is a reskin of XCOM except with a decent budget for art - both graphics and sound. It has a larger range of animations, more varied level terrain, richer artwork in research reports, and crucially, an oppressive, brooding soundtrack that provides eerie punctuation to the murky depths of underwater combat - combining cautious exploration and reckless attack. The graphics are basic by today's standards, but not spartan or cartoony, as could be said for some of the XCOM artwork.

Both games had bugs as shipped, which are now forgotten and fixed in the remade game engines (and patch releases made in the 90s, too). Yet, the interface of TFTD added one important aspect: opening a door without immediately entering the room or area behind it. This is quite important when the next room can be empty or full of heavily armed aliens...

The difficulty of TFTD is a response to the infamous bug of XCOM where all the save games reverted to "easy"; thus, the difficulty levels actually work, and are harder, too. The increased difficulty of TFTD leads to consequences, and tension, in the game that are not so evident in XCOM. There is not an agreed on set of tactics applicable in nearly all scenarios for TFTD, as there is for XCOM: and those tactics that do exist are not as certain as for XCOM. It is not unlikely to suffer heavy casualties on even small missions. The game does not become a walk in the park in the same way that XCOM does.

TFTD seems to be able to break the routine of XCOM missions, which could be described as first disembarking, then scouting and locating the UFO, before finally storming the UFO in an assault to end the mission. This seems to be done by various means. One is by having the landing site close to the downed alien craft, leading to a choice: to start an assault, and risk being flanked, or to attempt to contain the aliens within the sub while clearing the area? 

Clearing the area is made considerably more difficult by more complex and 3D terrain - it's no longer an option to have every soldier in the squad, even without line of sight, shoot at each spotted alien in a massive firestorm to mow them down one by one. Instead, line of fire is often blocked, and the encounters are often down to one or two soldiers versus one or two aliens.

Larger levels lead to an increased risk of being outflanked: tougher aliens also lead to more re-awakenings of stunned aliens that were not killed outright. 

An emphasis on hand to hand combat requires the fragile soldiers to get close to the monstrous creatures to battle them; no more merely hanging back and relying on volleys of automatic plasma fire to prevent proximity. The scariest alien of XCOM (the Chryssalid) is back, except this time it flies (swims!) and no flying suit will save you from it now. The most powerful weapons are not available for all missions (above vs below water) so cannot be leant upon as a crutch.

In any case, the games are both now entirely moddable, so you can turn off aspects of the game that irritate (PSI! money worries!) and add fun new behaviours (spherical explosions, better base layouts) and enjoy the bits of the game you want.

Now, I have to salvage the situation of 70% casualties on the first encounter with Lobster Men! Back to saving the world.