Daily Archives: 13 November, 2014

Building an Algorithmic Code

I have a little Lisp implementation. I’ve written about it here before, if only tangentially. It’s got a history over twenty years long now in various forms, but it’s not something I’ve ever polished up into a releasable state yet. Mostly it’s been a learning tool for me personally, something to experiment with when I have a mad-science inspiration for programming languages so I can see the consequences of my bizarre ideas. Some of these inspirations have been truly mad, such as using fexprs and developing some strategies for using them without invoking semantic chaos.

I guess I should start at the beginning and explain where all this came from.

When I was learning my first Lispy language (which happened to be Scheme) many years ago, there were a few things that I considered, in a vague student way that didn’t really know enough yet about how to express the problem, to be … inconsistent. And when the course was over, and I had a little bit of time to do independent experimenting instead of just course work, I implemented my own Lispy language to see what I could do about the inconsistencies. And that’s where this whole thing started.

It turned out that the inconsistencies were there for reasons, and some of them were good reasons. It took years and a lot of learning (and a fair amount of experimenting) to fully understand the design space and the tradeoffs.

Inconsistency was probably a strange thing for someone new to Lisps to notice. The languages I had already learned were far more inconsistent than Scheme. In most Imperative languages and Object Oriented languages, literally every linguistic concept has its own set of rules. Every kind of statement has a particular set of contexts in which it may appear and within each of which it may mean something slightly or totally different. But Scheme was a functional language, and compared to everything I knew at the time the rules and syntax were incredibly simple and consistent. Me noticing, and being bothered by, these little inconsistencies would be kind of like a mountaineer rappelling down off a rocky cliff, looking out over a parking lot, and being a little bit uneasy about the fact that the parking lot is not quite precisely smooth and level.

I probably wouldn’t have noticed it at all except that it was closer to being consistent than anything I’d ever seen. I felt that there was this strong aesthetic being expressed and a very powerful design principle engaged, and that it could be expressed more fully or engaged more powerfully, if just a few things were this way instead of that way. It was as though the guy climbing down off a cliff had looked out at the parking lot and said, well, obviously they were trying to make it smooth and level — so why didn’t they finish the job?

Above all, learning Scheme was what made me first truly understand something which, to this day, I have some trouble expressing in a way that would mean anything to someone who doesn’t already understand it. Computer programs are an executable representation of algorithms, and algorithms are methods for solving equations. And equations are clear and eternal logic that stands independent of every machine, every representation, every human perception, and our whole universe including time itself. Whether an algorithm correctly solves an equation or not is a capital-T Truth so absolute that it doesn’t even need anyone to know that it does, doesn’t even need our universe, and damn sure doesn’t need any particular machine to prove it. Programming, in short, is Math. And even though math is hard, that means that the job of programmers can be much much easier than it normally is. It also caused me to notice that the tools available to people doing (other kinds of) math were a hell of a lot more general than the tools available to people doing programming.

Scheme’s design, more than the design of any programming language I’d ever seen, was informed by the relationship of programming to math rather than the relationship of programs to the hardware they’re running on. Most languages are built up by taking hardware primitives and putting them together in ways that seem like they might be useful; Scheme was built down by taking math primitives and trying to find ways to represent them using computer hardware and software. That is why it had so much more consistency than anything I’d ever seen.

But, that’s also why the remaining failures of consistency kept bothering me. In the corners, where you don’t directly see them most of the time but will run into them eventually, there were exceptional cases where the attempt fell short of the ideal and you just had to memorize and work around things that weren’t any part of the clear bright eternal truths of mathematics, or make do in spite of things from that clear bright eternal truth that were conspicuously missing.

Some inconsistencies which bothered me at the time were purely syntactic and inconsequential; a language would be easier to learn without them but they were not really related to failures of the underlying mathematical model. Still, speaking as someone who must actually work to learn programming languages, they’re significant. For example there is no way to distinguish in Lispy source code when a set of parentheses represents a function call and when it represents something else.

Some inconsistencies, like numeric limits, truly are unavoidable in computation. Math the way Programmers do it with computer numbers is drastically more complex (because of representation and precision issues) than math the way Mathematicians do it with "real" numbers. Nobody makes any computer with truly infinite memory, so you can’t have infinite precision. There are some things you just can’t do and you have to make a decision about whether you want failure as a numerically inexact answer that’s known to be wrong but "reasonably close", and devote your effort to minimizing the error, or as an out-of-memory crash.

Other inconsistencies are not truly unavoidable, but most of the tractable strategies we knew for avoiding them had been leading to deeper and more difficult problems. These included phased-deployment woes where different parts of your universe are visible and different operations are happening at readtime, loadtime, macroexpansion time, and runtime, or where your algorithm’s correctness depends on the order in which modules are loaded. Nothing to do with time ought to matter if you’re doing math that is, after all, eternal.

Some inconsistencies were related to whether a given form is a special form, macro, or an ordinary function. These include whether or not there is a prescribed order for argument evaluation, rules for evaluating (or not evaluating) subexpressions, whether all arguments will be evaluated or whether that depends on the values of other arguments, what environments should be used to do the argument evaluations, and whether the form represents something like a function which exists at runtime and can be stored in variables, returned from functions, applied, etc.

Then there were arbitrary mathematical choices that the languages didn’t provide, such as the distinction between lexical and dynamic scope. Scheme uses lexical scope for almost everything because it’s easier for programmers to keep track of, but has a number of predefined variables such as current-input-port and current-output-port which are dynamically scoped because passing them implicitly from the calling environment is more succinct and usually what’s desired. Other lisps, like elisp, or at that time Common Lisp, used dynamic scope. Common Lisp now uses static scope mostly but has a bunch more special variables that are dynamically scoped.

If you’re doing pure math, the issue of lexical versus dynamic scope usually doesn’t come up because you’re usually not dealing with mutable state. In math, IOW, you’re usually referring to values rather than variables, and values don’t vary at all depending on scope. This is also true when you’re doing pure functional programming, and is one of the biggest reasons why functional programming avoids a lot of headaches. But when in mathematics you are dealing with state, you just pick a scope rule depending on your needs and treat it consistently. If a programming language doesn’t give you the same freedom, then your math isn’t expressible in that programming language until you do some transformations and translations to restructure it for the programming language.

Once I got that far, I wound up looking at yet other things that people who are doing math like mathematicians have the option to choose at need but where people who are doing math using programming languages usually have to take the single choice that the environment provides and restructure the math as necessary to express it in terms of that choice. These include distinctions between call-by-name and call-by-value semantics, distinctions between eager and lazy evaluation of function arguments, and several other things.

So anyway, I spent a couple decades off and on – more off than on of course, because sometimes it would be literally years between the times when worthwhile ideas hit and sometimes I’d spend a year or more not thinking about it at all – wrestling with these nagging consistency issues, and over time my focus shifted from just trying to achieve consistency to something a bit harder to explain.

I was trying to develop a notation that could directly, consistently convey any kind of algorithmic semantics. The idea is that whatever kind of math you are talking about – and here understanding that "math" includes all of programming – you should be able to directly express it without the need to transform its structure to be acceptable to the programming language. And that’s my hobby lisp.

Meanwhile, programming language research has gone off in a completely different direction. While I’ve been focused on trying to find a universal language for algorithms allowing any kind of semantics to be expressed – a sort of universal translation target that can express semantically highly diverse programs without lowering their level of abstraction – others have been focused on finding acceptable minimal sets of restrictions on semantics which provide the strongest possible sets of guarantees of "correctness" for the set of all programs that can be written abiding by that set of restrictions. In terms of mathematics, this is mostly means finding sets of rules not too restrictive of your programs that, if followed, provably permit avoiding some set of range errors. Range errors, where you call a function with one or more arguments over which that function is not defined, are an indication that the programmer has made a mistake.

This is good and worthwhile work because, well, range errors are genuinely errors and we need all the help avoiding errors that we can get. But I’m still trying to extend what semantics can be expressed and the mainstream PL work these days is still trying to constrain the semantics that can be expressed, so I’m definitely haring off in a direction that isn’t where the focus is.

Type systems are also used to provide and express program semantics, not just to restrict them. Therefore a universal translation target must provide ways to represent and express type systems in order to do its job. Fortunately, that is relatively easy; the positive semantics of types in static type systems are representable simply as closures and objects in functional systems. But that strategy won’t perform the proofs before runtime which the static type systems permit, and in principle can only prove that a single particular run of a program didn’t violate the type system, rather than proving that no run of that program can violate the type system.

Some of the value of programs expressed under a type system comes from the proof that it abides by the type system’s restrictions and is therefore invulnerable to that type system’s class of excluded range errors, so if a universal translation target is to provide translations having the same value as the original programs, it has to be able to statically perform the proofs on individual programs that the respective type systems permit on all programs constructed under those type systems. I’ve written about that before, but I don’t have a good way to do it. It’s hard. No, I misspelled that. It’s Hard. With a capital ‘H’.

Anyway, leaving that aside for the moment (maybe that’s an idea that will hit a couple years from now, and maybe not), right now I’m adding new instructions to the bytecode that is run by the bytecode interpreter at the middle of the implementation.

It isn’t still a literal "bytecode" as such; there are already well over 256 different opcodes, and there’ll be many more before I’m done.

Why so many? Well, it’s because at this point I’m trying to “generalize” the bytecode in the same way that I elevated the abstraction level of the lisp that compiles to it. I’m trying to provide operations sufficient to express the semantics on language-level and information-level abstractions rather than on machine-level representations, so that I can roundtrip back and forth between the bytecode and the lisp dialect without loss of meaning.

The problem I’m trying to solve is that when a compiler lowers the level of abstraction of a language in order to express it in terms of machine-level representation, it necessarily makes arbitrary choices which hide the abstractions it’s translating. Its output then depends on arbitrary choices and you can’t meaningfully compare the semantics of code that’s expressed in the bytecode unless the exact same set of arbitrary choices were made in mapping the abstractions to the implementation. I want to be able to compare code, to automatically determine if possible when it has the same (or close) semantics, so I’m providing a bytecode which has built-in abstractions that map in a clean and obvious way to the abstractions of my ‘universal translation target’ Lisp dialect.

As I careen toward a universal translation-target, I’m looking for a substrate that preserves information about higher-level semantics so that constructions from different languages or from libraries with similar, parallel, or overlapping semantics can be expressed in terms of a few compatible bytecode-level abstractions whose correspondence to those constructions is general and obvious and which can be roundtripped back and forth to the translation-target lisp dialect. This, in turn, is in the hopes that expressing the various language-level constructions in terms of those abstractions will promote sufficient uniformity that algorithms which use them can link, can meaningfully be compared, and in as many cases as possible, shown to be equal or unequal, regardless of what language they were originally written in.

I don’t know if this is actually possible; but then, twenty-five years ago I didn’t know whether achieving mathematics-like consistency in a Lisp was actually possible either. And not knowing represented crucial progress, because when I started, I had assumed it would be easy.

So, that’s another day in the life of a mad scientist headed off in a direction completely the opposite of where the mainstream in his field is going. Just thought I’d share.