Thursday 2 February 2012

The Game Has Changed

LLVM is described as "a collection of modular and reusable compiler and toolchain technologies". While true, this description is insufficient:


It is modular, insofar as one can pick and choose from the collection of technologies present within LLVM at a given moment in time. Either from a stable release, or a snapshot of the SVN trunk. Neither approach is satisfactory.


The LLVM project encompasses a set of libraries to manipulate program representations (e.g. LLVM IR), tools to operate on programs in IR form, compiler backends to translate IR to target specific machine code, and with the Clang subproject, a compiler front end to translate programs written in C, C++, Objective-C, and OpenCL into IR. Within this system, some new behaviours may be added through addition of submodules to add, for example, a new pass over IR.


The pace of change in the LLVM SVN trunk is rapid. This includes changes in aspects from the naming conventions for APIs to fundamental alterations in the IR type system shared throughout LLVM, to deprecation of entire compiler back-ends, APIs, and the replacement of those. The laudable goal is to rapidly advance the quality of LLVM, and not to be hampered by approaches found insufficient in some aspect, nor by code lacking an active maintainer. Like any active development project, regressions will and do occur, and the trunk stability fluctuates in any metric from correctness, performance, quality of generated code, or buildability: regressions are intended to be found by testing on build bots, and minimised by a review process on commits either before or after commit.


Periodically, at about 6 month intervals, the code is stabilised and a new "stable" release is made after a period of testing and regression fixes. This code, once tagged as "RELEASE N", is then not updated. Not for trivial bug fixes, not for serious bug fixes, nor the addition of entirely new functionality.


Thus, the choice for a user of LLVM (a user being either a developer using the LLVM compiler suite simply as a compiler, or a developer building upon LLVM to create compilers or other language tools) is to live with a stable release, that rapidly falls behind the current version, or a current version that constantly changes and makes no quality guarantees.


Finding a bug in a stable LLVM release is hopefully rare. It is not impossible, however. The problem is then, when a bug is found, a fix for it is not applied to the stable release. Instead, it is applied to trunk (if present there as well) and becomes available in the next release at some point in the future. If you want that fix on the current (or a previous stable release) you need to cherry pick the fix, and backport it. The difficulty is probably proportional both to the size of the fix, and the duration of the interval between the release date of the release(s) affected and the current date.


So, if a bug is found, a fix will not become available in a simple update to a stable release. There are no patches. There is no LLVM 3.0.1 to follow 3.0. Nor will fixes be backported to previous or older releases, which themselves may not be very old e.g. in a codebase less than 1 year diverged from the tip of the SVN trunk.


Further, while LLVM and related projects are modular, and build on one another, they are tightly coupled. The releases of Clang are tied to the equivalent releases of LLVM. (The trunk build of Clang and LLVM will typically fail to compile if permitted to get of sync). This will assuredly also apply to other LLVM subprojects such as LLDB. You cannot mix and match between releases. You cannot have a new Clang release, with new language features e.g. C++11, and an older LLVM backend (e.g. Alpha just got removed, the C backend is increasingly unmaintained). This would require at least a commitment to a stable API - which LLVM will not provide. (A stable ABI between C++ libraries is an inherent challenge in any case).


To add a new module e.g. a backend, you choose a version of LLVM at a moment in time and implement code generation in the context of that LLVM. If you choose trunk, you will have a constant stream of mostly minor changes to deal with, with the exceptions of subsystem APIs being iteratively improved or changed, minor build system issues such as makefiles being broken, fixed, and changed, and entire modules being removed. If you choose a stable release, you forgo ongoing bug fixes and improvement: and face a massively disruptive change in porting to the next stable release, unless you stick with your old version. The LLVM release notes indicate both major changes and removed features, and the internal API changes: even incomplete listings of these are lengthy, and adaptation of existing code to a new release can be anywhere from trivial to exasperating.


The LLVM infrastructure permits many ways of adding new behaviour to LLVM - plugging in a new pass, adding a new backend: but it is impractical to add all desirable changes in a clean manner by plugging in a new module, as the number of hooks via which change can effected in this manner is limited. At times, changes need to be made within LLVM components (either as dirty hacks, or as the sole means to achieve some legit technical goal). Preserving these changed components is hard: the code in which changes are made will evolve over successive releases, and the modified component cannot be dropped in to a different LLVM release. 


To sum up: it would be desirable if LLVM provided a) backports of bugfixes to stable branches, instead of periodic release snapshots of code and b) stable APIs between components, to less tightly couple the components of LLVM to one specific LLVM release.


The modularity of LLVM + related projects is impressive when compared to the monolith of GCC: LLVM posesses a decent architecture, implementation language (tasteful C++ subset), and a t least one portable and relatively comprehensible build system (via CMake). It would be nice if point releases were made to patch known bugs, and API design was undertaken to loosen inter module dependencies and ease ports of subsystems between LLVM release versions. 


None of the above are insurmountable obstacles to the use of LLVM - but I blench at the prospect of porting changes, features, and fixes forward repeatedly, again - or of living with constant API and design churn.