I have a little Lisp implementation. I’ve written about it here before, if only tangentially. It’s got a history over twenty years long now in various forms, but it’s not something I’ve ever polished up into a releasable state yet. Mostly it’s been a learning tool for me personally, something to experiment with when I have a mad-science inspiration for programming languages so I can see the consequences of my bizarre ideas. Some of these inspirations have been truly mad, such as using fexprs and developing some strategies for using them without invoking semantic chaos.
I guess I should start at the beginning and explain where all this came from.
When I was learning my first Lispy language (which happened to be Scheme) many years ago, there were a few things that I considered, in a vague student way, not yet knowing enough to express the problem clearly, to be … inconsistent. And when the course was over, and I had a little bit of time to do independent experimenting instead of just course work, I implemented my own Lispy language to see what I could do about the inconsistencies. And that’s where this whole thing started.
It turned out that the inconsistencies were there for reasons, and some of them were good reasons. It took years and a lot of learning (and a fair amount of experimenting) to fully understand the design space and the tradeoffs.
Inconsistency was probably a strange thing for someone new to Lisps to notice. The languages I had already learned were far more inconsistent than Scheme. In most Imperative languages and Object Oriented languages, literally every linguistic concept has its own set of rules. Every kind of statement has a particular set of contexts in which it may appear and within each of which it may mean something slightly or totally different. But Scheme was a functional language, and compared to everything I knew at the time the rules and syntax were incredibly simple and consistent. Me noticing, and being bothered by, these little inconsistencies would be kind of like a mountaineer rappelling down off a rocky cliff, looking out over a parking lot, and being a little bit uneasy about the fact that the parking lot is not quite precisely smooth and level.
I probably wouldn’t have noticed it at all except that it was closer to being consistent than anything I’d ever seen. I felt that there was this strong aesthetic being expressed and a very powerful design principle engaged, and that it could be expressed more fully or engaged more powerfully, if just a few things were this way instead of that way. It was as though the guy climbing down off a cliff had looked out at the parking lot and said, well, obviously they were trying to make it smooth and level — so why didn’t they finish the job?
Above all, learning Scheme was what made me first truly understand something which, to this day, I have some trouble expressing in a way that would mean anything to someone who doesn’t already understand it. Computer programs are an executable representation of algorithms, and algorithms are methods for solving equations. And equations are clear and eternal logic that stands independent of every machine, every representation, every human perception, and our whole universe including time itself. Whether an algorithm correctly solves an equation or not is a capital-T Truth so absolute that it doesn’t even need anyone to know that it does, doesn’t even need our universe, and damn sure doesn’t need any particular machine to prove it. Programming, in short, is Math. And even though math is hard, that means that the job of programmers can be much much easier than it normally is. It also caused me to notice that the tools available to people doing (other kinds of) math were a hell of a lot more general than the tools available to people doing programming.
Scheme’s design, more than the design of any programming language I’d ever seen, was informed by the relationship of programming to math rather than the relationship of programs to the hardware they’re running on. Most languages are built up by taking hardware primitives and putting them together in ways that seem like they might be useful; Scheme was built down by taking math primitives and trying to find ways to represent them using computer hardware and software. That is why it had so much more consistency than anything I’d ever seen.
But, that’s also why the remaining failures of consistency kept bothering me. In the corners, where you don’t directly see them most of the time but will run into them eventually, there were exceptional cases where the attempt fell short of the ideal and you just had to memorize and work around things that weren’t any part of the clear bright eternal truths of mathematics, or make do in spite of things from that clear bright eternal truth that were conspicuously missing.
Some inconsistencies which bothered me at the time were purely syntactic and inconsequential; a language would be easier to learn without them, but they were not really related to failures of the underlying mathematical model. Still, speaking as someone who must actually work to learn programming languages, they’re significant. For example, there is no way to tell from Lispy source code whether a given set of parentheses represents a function call or something else.
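To make that concrete, here’s a small sketch in ordinary standard Scheme (not my dialect) where three visually identical parenthesized forms mean three different things:

```scheme
(define x 3)

(display (+ 1 2))        ; both sets of parentheses are procedure calls
(if (> x 0) 'pos 'neg)   ; `if` reads exactly like a call but is a special form
(let ((x 1)) (* x x))    ; `((x 1))` and `(x 1)` are binding syntax, not calls,
                         ; yet nothing in the notation says so
```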
Some inconsistencies, like numeric limits, truly are unavoidable in computation. Math the way Programmers do it with computer numbers is drastically more complex (because of representation and precision issues) than math the way Mathematicians do it with "real" numbers. Nobody makes a computer with truly infinite memory, so you can’t have infinite precision. There are some things you just can’t do, and you have to decide which kind of failure you want: a numerically inexact answer that’s known to be wrong but "reasonably close", with your effort devoted to minimizing the error, or an out-of-memory crash.
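A minimal illustration, assuming a Scheme with the full numeric tower: exact arithmetic keeps the whole answer but the representation can grow without bound, while inexact arithmetic stays small and is only "reasonably close":

```scheme
(expt 10/3 100)          ; an exact ratio of two enormous integers, held in full
(= (+ 0.1 0.2) 0.3)      ; => #f -- known to be wrong, but "reasonably close"
(expt (/ 10.0 3.0) 100)  ; small representation, but only approximately (10/3)^100
```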
Other inconsistencies are not truly unavoidable, but most of the tractable strategies we knew for avoiding them had been leading to deeper and more difficult problems. These included phased-deployment woes where different parts of your universe are visible and different operations are happening at readtime, loadtime, macroexpansion time, and runtime, or where your algorithm’s correctness depends on the order in which modules are loaded. Nothing to do with time ought to matter if you’re doing math that is, after all, eternal.
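Here’s a sketch of the kind of phase trouble I mean, assuming a defmacro-style (procedural, non-hygienic) macro facility such as Common Lisp’s or the define-macro found in several Scheme implementations; none of this is code from my implementation:

```scheme
;; In file util.scm:
(define (stamp form) (list 'stamped form))

;; In file main.scm -- a procedural macro whose body runs at
;; macroexpansion time, not at runtime:
(define-macro (annotate form)
  (stamp form))            ; `stamp` must already exist when this expands

(annotate (+ 1 2))         ; works only if util.scm was loaded before this
                           ; form was expanded -- correctness now depends on
                           ; load order and phase, not on the math
```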
Some inconsistencies were related to whether a given form is a special form, macro, or an ordinary function. These include whether or not there is a prescribed order for argument evaluation, rules for evaluating (or not evaluating) subexpressions, whether all arguments will be evaluated or whether that depends on the values of other arguments, what environments should be used to do the argument evaluations, and whether the form represents something like a function which exists at runtime and can be stored in variables, returned from functions, applied, etc.
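For instance, in standard Scheme (a sketch, not my dialect):

```scheme
;; `list` is an ordinary procedure: every argument is evaluated, in some
;; prescribed order, and `list` itself is a runtime value.
(apply list '(1 2 3))             ; => (1 2 3)
(map list '(1 2) '(3 4))          ; => ((1 3) (2 4))

;; `and` is a special form: later subexpressions may never be evaluated,
;; and `and` is not a value that can be stored or passed anywhere.
(and #f (error "never reached"))  ; => #f, the error is never signalled
;; (apply and '(#t #t))           ; not legal -- `and` is syntax, not a procedure
```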
Then there were arbitrary mathematical choices that the languages didn’t provide, such as the distinction between lexical and dynamic scope. Scheme uses lexical scope for almost everything because it’s easier for programmers to keep track of, but has a number of predefined variables such as current-input-port and current-output-port which are dynamically scoped, because passing them implicitly from the calling environment is more succinct and usually what’s desired. Other lisps, like elisp (and the older dialects that preceded Common Lisp), used dynamic scope by default. Common Lisp uses lexical scope for most things but has a good many special variables that are dynamically scoped.
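A small R7RS Scheme sketch of the two scope rules side by side (the procedure names are invented for illustration):

```scheme
;; `out` is lexically scoped: only code written inside the `let` can see it.
;; `current-output-port` is dynamically scoped: rebinding it affects every
;; call made during the extent of the `parameterize`, including `greet`,
;; which was defined with no knowledge of the new binding.
(define (greet) (display "hello"))

(define captured
  (let ((out (open-output-string)))
    (parameterize ((current-output-port out))
      (greet))
    (get-output-string out)))

captured    ; => "hello"
```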
If you’re doing pure math, the issue of lexical versus dynamic scope usually doesn’t come up because you’re usually not dealing with mutable state. In math, IOW, you’re usually referring to values rather than variables, and values don’t vary at all depending on scope. This is also true when you’re doing pure functional programming, and is one of the biggest reasons why functional programming avoids a lot of headaches. But when in mathematics you are dealing with state, you just pick a scope rule depending on your needs and treat it consistently. If a programming language doesn’t give you the same freedom, then your math isn’t expressible in that programming language until you do some transformations and translations to restructure it for the programming language.
Once I got that far, I wound up looking at yet other things that people doing math as mathematicians can choose at need, but where people doing math with programming languages usually have to take the single choice that the environment provides and restructure the math as necessary to express it in terms of that choice. These include distinctions between call-by-name and call-by-value semantics, distinctions between eager and lazy evaluation of function arguments, and several other things.
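In Scheme, for example, the eager choice is built in and the lazy one has to be recovered by restructuring with delay and force; a minimal sketch (my-if-eager and my-if-lazy are invented names):

```scheme
;; Scheme evaluates arguments eagerly, so a user-written `if` can't work:
(define (my-if-eager c t e) (if c t e))
;; (my-if-eager #t 1 (error "boom"))        ; the error fires before the call

;; Laziness has to be recovered by restructuring: wrap the branches in
;; promises and force only the one that's needed.
(define (my-if-lazy c t e) (if c (force t) (force e)))
(my-if-lazy #t (delay 1) (delay (error "boom")))   ; => 1, error never forced
```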
So anyway, I spent a couple decades off and on – more off than on of course, because sometimes it would be literally years between the times when worthwhile ideas hit and sometimes I’d spend a year or more not thinking about it at all – wrestling with these nagging consistency issues, and over time my focus shifted from just trying to achieve consistency to something a bit harder to explain.
I was trying to develop a notation that could directly, consistently convey any kind of algorithmic semantics. The idea is that whatever kind of math you are talking about – and here understanding that "math" includes all of programming – you should be able to directly express it without the need to transform its structure to be acceptable to the programming language. And that’s my hobby lisp.
Meanwhile, programming language research has gone off in a completely different direction. While I’ve been focused on trying to find a universal language for algorithms allowing any kind of semantics to be expressed – a sort of universal translation target that can express semantically highly diverse programs without lowering their level of abstraction – others have been focused on finding acceptable minimal sets of restrictions on semantics which provide the strongest possible sets of guarantees of "correctness" for the set of all programs that can be written abiding by that set of restrictions. In terms of mathematics, this mostly means finding sets of rules, not too restrictive of your programs, which, if followed, provably avoid some set of range errors. Range errors, where you call a function with one or more arguments over which that function is not defined, are an indication that the programmer has made a mistake.
This is good and worthwhile work because, well, range errors are genuinely errors and we need all the help avoiding errors that we can get. But I’m still trying to extend what semantics can be expressed and the mainstream PL work these days is still trying to constrain the semantics that can be expressed, so I’m definitely haring off in a direction that isn’t where the focus is.
Type systems are also used to provide and express program semantics, not just to restrict them. Therefore a universal translation target must provide ways to represent and express type systems in order to do its job. Fortunately, that is relatively easy; the positive semantics of types in static type systems are representable simply as closures and objects in functional systems. But that strategy won’t perform the proofs before runtime which the static type systems permit, and in principle can only prove that a single particular run of a program didn’t violate the type system, rather than proving that no run of that program can violate the type system.
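Roughly, the kind of dynamic encoding I mean looks like this (a sketch in plain Scheme with invented names, not my actual representation); it can only show that this particular run respected the contract:

```scheme
;; A checker is a closure over a predicate: the "positive" content of a
;; static type, carried as a runtime value.
(define (checker pred what)
  (lambda (x)
    (if (pred x) x (error "type violation: expected" what x))))

(define check-integer (checker integer? 'integer))

(define (typed-add a b)        ; roughly: add : Integer -> Integer -> Integer
  (check-integer (+ (check-integer a) (check-integer b))))

(typed-add 2 3)                ; => 5
;; (typed-add 2 "3")           ; caught, but only when this call actually happens
```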
Some of the value of programs expressed under a type system comes from the proof that it abides by the type system’s restrictions and is therefore invulnerable to that type system’s class of excluded range errors, so if a universal translation target is to provide translations having the same value as the original programs, it has to be able to statically perform the proofs on individual programs that the respective type systems permit on all programs constructed under those type systems. I’ve written about that before, but I don’t have a good way to do it. It’s hard. No, I misspelled that. It’s Hard. With a capital ‘H’.
Anyway, leaving that aside for the moment (maybe that’s an idea that will hit a couple years from now, and maybe not), right now I’m adding new instructions to the bytecode that is run by the bytecode interpreter at the middle of the implementation.
It isn’t really a literal "bytecode" anymore, as such; there are already well over 256 distinct opcodes, and there’ll be many more before I’m done.
Why so many? Well, it’s because at this point I’m trying to “generalize” the bytecode in the same way that I elevated the abstraction level of the lisp that compiles to it. I’m trying to provide operations sufficient to express semantics in terms of language-level and information-level abstractions rather than machine-level representations, so that I can roundtrip back and forth between the bytecode and the lisp dialect without loss of meaning.
The problem I’m trying to solve is that when a compiler lowers the level of abstraction of a language in order to express it in terms of machine-level representation, it necessarily makes arbitrary choices which hide the abstractions it’s translating. Its output then depends on those arbitrary choices, and you can’t meaningfully compare the semantics of code expressed in the bytecode unless the exact same set of arbitrary choices was made in mapping the abstractions to the implementation. I want to be able to compare code, to determine automatically, where possible, when it has the same (or close) semantics, so I’m providing a bytecode which has built-in abstractions that map in a clean and obvious way to the abstractions of my ‘universal translation target’ Lisp dialect.
As I careen toward a universal translation-target, I’m looking for a substrate that preserves information about higher-level semantics so that constructions from different languages or from libraries with similar, parallel, or overlapping semantics can be expressed in terms of a few compatible bytecode-level abstractions whose correspondence to those constructions is general and obvious and which can be roundtripped back and forth to the translation-target lisp dialect. This, in turn, is in the hopes that expressing the various language-level constructions in terms of those abstractions will promote sufficient uniformity that algorithms which use them can link, can meaningfully be compared, and in as many cases as possible, shown to be equal or unequal, regardless of what language they were originally written in.
I don’t know if this is actually possible; but then, twenty-five years ago I didn’t know whether achieving mathematics-like consistency in a Lisp was actually possible either. And not knowing represented crucial progress, because when I started, I had assumed it would be easy.
So, that’s another day in the life of a mad scientist headed off in a direction completely the opposite of where the mainstream in his field is going. Just thought I’d share.
I don’t think you’re going opposite to the mainstream. It seems like to be able to embed any type system *idiomatically*, you need to have a type system that allows every kind of restriction the others do. This is a motivation behind the pursuit of logical frameworks with dependent types and all that. Still, Gödel’s incompleteness theorems make it unlikely there’ll be a single foundation everyone wants to use.
I think there’s more leeway if the embeddings don’t have to be “idiomatic” by present-day, mainstream cultural standards. For instance, any Turing-complete language can be embedded in any other by writing an interpreter, and just embedding the program as a string literal (or equivalent). This is a local transformation! If this kind of program is idiomatic for a given family of languages, then the “universal translation target” problem should be especially easy for that family in particular.
The relationship of an interpreted language with its host is a lot like a coroutine. Come to think of it, I was about to remove a halfhearted coroutine system from a language I’ve been working on, but I think it finally has a purpose!
The same language is also going to have rich process control mechanisms where a host and guest process can communicate without it counting as real I/O. (Realistically, the host can make arbitrary changes to the guest’s semantics anyway, so this just makes the relationship more explicit.) I’m interested in how you’re approaching semantic quirks like numeric precision differences between languages. This seems like a place where many programs are naturally saying, “Host, my programmer didn’t exactly specify for this arithmetic to be imprecise in the style of IEEE 754, but… you know programmers, right…?”
I’m pretty stuck on the idea of “idiomatic” — embedding program text as a string literal inside an interpreter doesn’t express it in a way that makes it subject to the same kind of analysis as, or interoperable with, other bits of program text from different languages also embedded as string literals inside interpreters. Providing libraries under which a translation-target program whose abstract syntax tree is isomorphic to the source’s has the same semantics as that source kills both those birds: it gets interoperability AND makes the semantic analysis required the same analysis we’ll be doing on all of the translation-target code, rather than something that has to be written separately for the semantics of each different translation source.
As to the semantic quirks of various math representations, I’m using a “standard” numeric format that is usually overkill in terms of precision compared with what any non-bignum system provides, and is capable of exactly representing all the numbers that normally appear in source code or which any “usual” IEEE format can exactly represent. The idea being that unless something unexpected happens, the built-in translation-target mathematics will be exact in any instance where the built-in translation-source mathematics are exact, and at least as high-precision as the original where not exact. Each “number” contains a numerator, denominator, binary exponent, and decimal exponent, as well as an exactness bit showing whether known error has affected the computation that produced it, and a couple of other moieties. This makes math slow, but this project isn’t about blazing-fast math.
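Something like this, as an R7RS-style sketch; the field and constructor names here are guesses for illustration, not my actual definitions:

```scheme
;; The value denoted is (numerator / denominator) * 2^bin-exp * 10^dec-exp,
;; with a flag recording whether known error has crept into the computation
;; that produced it.
(define-record-type <wide-number>
  (make-wide-number numerator denominator bin-exp dec-exp exact?)
  wide-number?
  (numerator   wide-numerator)
  (denominator wide-denominator)
  (bin-exp     wide-bin-exp)
  (dec-exp     wide-dec-exp)
  (exact?      wide-exact?))

;; The literal 0.1 as it appears in source text: exactly 1/10, no error yet.
(define one-tenth (make-wide-number 1 10 0 0 #t))
```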
I basically break numeric algorithms into three classes. In the first class are algorithms that require exact math and which must be done exactly, even if that requires bignums, in order to be correct. ECC and RSA encryption fall into this category. For these algorithms I define bignums, and it’s usually obvious when they’re needed because (except in a few languages where bignums are lexically indistinguishable from standard numbers) they’ll be replacing calls to GMP or some other bignum library.
In the second class are algorithms that can tolerate some inexactness but which do not depend on particular kinds of inexactness – if the answer is “close” to the numerically correct answer they work fine. Most numerical algorithms fall in this category, and this is where I have aimed my “standard” numerics.
In the third class are algorithms whose statement is incorrect if one assumes either exact or near-exact numerics; they rely on particular roundoff behaviors, error conditions, or modular operations at particular sizes. Stating them in terms of the theoretically “standard” numeric operations would be incorrect. I want to avoid requiring these much more semantically complex hardware-operations unless absolutely necessary to the correctness of the algorithm, so I keep bindings to those definitions in their own library.
I lack the subtlety of art to distinguish between the second and third class of algorithm by means of static analysis, but so far I get good results most of the time by assuming second-class (inexact but non-peculiar) numerics to start with, then dynamically switching to third-class (“hardware-oriented”) ops if the program signals and halts because the result of an operation exceeds the ranges indicated by the translation source’s type declarations.
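In outline, the switching looks something like this (a very rough Scheme sketch with invented names; the real mechanism is more involved):

```scheme
;; Run the translated algorithm under "standard" (class two) numerics; if it
;; signals a range violation against the source's type declarations, escape
;; and re-run it under the hardware-oriented (class three) operations.
(define (run-with-fallback standard-thunk hardware-thunk)
  (call-with-current-continuation
    (lambda (escape)
      (with-exception-handler
        (lambda (condition) (escape (hardware-thunk)))
        standard-thunk))))
```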
I always wind up imagining, when forced to link that numeric-format cruft, that two centuries hence when the “standard” computer architecture is defined on 81-trit words of ternary or some equally-improbable-to-us thing, somebody looking up from a code listing will be saying, “why the hell is this routine defined using modular addition base…. uh, 65536?! What the hell was that, the number of angels those old barbarians thought could dance on the head of a pin or something?”