Hints on Programming Language Design

Overview

This page contains notes on the paper "Hints on Programming Language Design" by C.A.R. Hoare. Here is a summary of the major points that jumped out at me.

Hoare is for:

Languages that aid the programmer in design, documentation, and debugging
Small, simple designs
Static analysis and typing
Safety (bounds checking, etc.)
Fast compilation and efficient code
Readability
Dumpable images
A superb comment convention!
Mutable state (for efficiency)
Type extensions + operator overloading
Consolidating best of existing features when designing a new language

Hoare is against:

Separate debug & production builds
Implicit type conversions/coercion
Reference types and pointers
"Independent" compilation
Syntactic extensions of any kind
Including untried/unproven ideas in a new language

The following sections contain notes and quotes from the corresponding sections of the paper.

§ 2 Principles

Design, Documentation, and Debugging

If a programming language is regarded as a tool to aid the programmer, it should give him the greatest assistance in the most difficult aspects of his art, namely program design, documentation, and debugging.

Program Debugging

Certain programming errors cannot always be detected in this way¹, and must be cheaply detectable at run time; in no case can they be allowed to give rise to machine or implementation dependent effects, which are inexplicable in terms of the language itself. This is a criterion to which I give the name "security".

KISS

A necessary condition for the achievement of any of these objectives is the utmost simplicity in the design of the language. Without simplicity, even the language designer himself cannot evaluate the consequences of his design decisions. Without simplicity, the compiler writer cannot achieve even reliability, and certainly cannot construct compact, fast and efficient compilers. But the main beneficiary of simplicity is the user of the language.

I agree with the principle here. But do my language preferences actually back this up? I haven't used scheme much, but I naturally gravitated towards common lisp, warts and all, because it provides lots of pragmatic niceties² that no doubt drastically increase the burden on the implementer, but provide useful features for the user. Likewise, the complexity of C++ is staggering to the point that I'm disgusted, but I'm on the fence as to whether I'd prefer to trade the usefulness of the STL and other C++ features for the simplicity of ANSI C. There's no doubt I prefer the idea of C, but would I actually choose it for a new project over C++ (assuming I was limited to those two options)… Two years ago I would have definitely chosen C. Having now used C++ (just enough to be dangerous), I'm not sure which I'd choose.

§ 3 Discussion

Hoare's five "catch-phrases" that summarize good language design:

Simplicity
Security
Fast Translation
Efficient object code
Readability

§ 3.1 Simplicity

Simplicity != Modularity != Orthogonality

Some language designers have replaced the objective of simplicity by that of modularity, by which they mean that a programmer who cannot understand the whole of his language can get by with a limited understanding of only part of it…

Another replacement of simplicity as an objective has been orthogonality of design… In the early days of hardware design, some very ingenious but arbitrary features turned up in order codes as a result of orthogonal combinations of the function bits of an instruction, on the grounds that some clever programmer would find a use for them — and some clever programmer always did. Hardware designers have now learned more sense; but language designers are clever programmers and have not.

§ 3.1 Security

Hoare argues in favor of static checks and of having no distinction between debug and production builds.

What would we think of a sailing enthusiast who wears his lifejacket when training on dry land, but takes it off as soon as he goes to sea?

§ 3.3 Fast Translation

Argues in favor of fast compilers and against independent compilation.

instead of constructing a fast translator, language designers turned to independent compilation, which permits a programmer to avoid recompiling parts of his program which he has not changed since the last time. But this is a poor substitute for fast compilation, and has many practical disadvantages…

Suggests the following strategies for fast compilers.

Prescan

Pre scan and lex the input and store it in a compact format.

Precompile

This is a directive which can be given to the compiler after submitting any initial segment of a large program. It causes the compiler to make a complete dump of its workspace including dictionary and object code, in a specified user file.

It's unclear how the above is substantially different/better from independent compilation.

Dump

an instruction which can be called by the user program during execution, and causes a complete binary dump of its code and workspace into a named user file.

Similar to SBCL's save-lisp-and-die or to SmallTalk images.

§ 3.4 Efficient Object Code

There is another argument which is all too prevalent among enthusiastic language designers, that efficiency of object code is no longer important; that the speed and-capacity of computers is increasing and their price is coming down, and the programming language designer might as well take advantage of this. This is an argument that would be quite acceptable if used to justify an efficiency loss of ten or twenty percent, or even thirty and forty percent. But all too frequently it is used to justify an efficiency loss of a factor of two, or ten, or even more; and worse, the overhead is not only in time taken but in space occupied by the running program. In no other engineering discipline would such avoidable overhead be tolerated, and it should not be in programming language design…

The above was written in 1978. It's tempting to dismiss Hoare's arguments in light of the last 30 years of hardware advances, but I believe his central point is still valid. Many people complain about the slowness of interpreted languages like Python and Ruby, and look for faster, yet equally expressive alternatives (Pyston, PyPy, Julia, etc.). Even in domains where speed is not important in fact, it's always important in principle. Given two equally expressive alternatives, people will prefer the faster implementation.

Hoare goes on to make some arguments against optimizing compilers. Having never written a compiler, I'm not really qualified to say whether his fears are justified, but it does seem like no one in the last 30 years took too much notice of his objections. I'm struggling to think of any "production quality" compiler toolchain that doesn't do serious optimization.

§ 4 Comment Conventions

If the purpose of a programming language is to assist in the documentation of programs, the design of a superb comment convention is obviously our most important concern.³

Wow! Bold statement! He presents some good arguments in favor of newline-terminated comments (as opposed to open/close pairs like in ANSI C).

§ 6 Arithmetic Expressions

What's so great about arithmetic notation like E + F? Well, a lot! Hoare lists six "fundamental principles of structuring":

Transparency of meaning and purpose
Independence of parts
Recursive application
Narrow interfaces
Manifestness of structure

Hoare likes operators that can accept and return composite data structures.

Often the programmer wishes to deal with much larger data structures, for example, vectors or matrices or lists; and languages such as APL and LISP have permitted the use of expressions with these structures as operands and results. This seems to be an excellent direction of advance in programming language design, particularly for special purpose languages.

Indeed! Goes on to suggest that a compromise for efficiency is to provide operators that specifically do in-place updates. I.e. A.+B instead of A := A + B for matrices, or L1.append(L2) for lists. Hmmm, did Guido read this paper?

Hoare also likes extensible languages, but not syntactic extensions. Sorry, macros! He only wants the ability to define new types, including low-level details like their layout in memory, and then the ability to define overloaded operators on those types.

Also, Hoare is not amused by implicit type conversions.

A solution to this problem is to design a general purpose language which provides the programmer with the tools to design and implement his own representation for data and code the operations upon it. This is the main justification for the design of "extensible" languages, which so many designers have aimed at, with rather great lack of success. In order to succeed, it will be necessary to recognize the following:

The need for an exceptionally efficient base language in order to define the extensions.

The avoidance of any form of syntactic extension to the language. All that is needed is to extend the meaning of the existing operators of the language, an idea which was called "overloading" by McCarthy.

The complete avoidance of any form of automatic type transfer, coercion, or default convention, other than those implemented as an extension by the programmer himself.

§ 7 Program Structure

Hoare gives an example of ugly syntax, ALGOL 60's switch statement:

switch SS = L1, L2, L3;
...
go to SS[i];

L1: Q1; go to L;
L2: Q2; go to L;
L3: Q3;
 L:

which is clearly terrible. I enjoy finding relics like this in old papers. It gives you an appreciation for how much work has gone into improving languages over the years, even little details like decent syntax for a switch statement, which we now take for granted. Speaking of a newfound appreciation for language features which we now take for granted, earlier in the section Hoare praises ALGOL 60 for innovations such as "its compound, conditional, for, and procedure statements." Useful, those! He continues:

The advantages of the use of these program structures is becoming apparent even to programmers using languages which do not provide the notations to express them.

Poor bastards.

Another interesting historical tid-bit: Hoare apparently was the inventor of the case statement, which he conceived as an improvement over ALGOL 60's switch.

case i of
  {Q1,
   Q2,
   Q3};

§ 8 Variables

Hoare is not a fan of reference types or pointers:

References are like jumps, leading wildly from one part of a data structure to another. Their introduction into high level languages has been a step backward from which we may never recover.

§ 10 Procedures & Parameters

Procedure calls should:

produce compact code (no long prelude/postlude).
indicate which arguments, if any, are modified.

§ 11 Types

Argues in favor of static typing and against implicit type coercion.

§ 12 Language Feature Design

Language design != feature design.

When designing a feature:

focus on one feature at a time
ensure the new feature is an improvement and doesn't break anything
know how to implement the feature efficiently
write docs explaining and motivating the feature, with examples
write example programs using the feature

When designing a language:

should be familiar with many alternate features and have excellent judgment to choose the best among competing alternatives
must be able to reconcile minor inconsistencies among features to present a coherent whole.
must have a clear idea of the scope & purpose of the new language
should have resources to write the implementation and good docs
should have political will & resources to sell the language
should not include untried ideas of his own! "His task is consolidation, not innovation."

§ 13 Conclusion

A final hint: listen carefully to what language users say they want, until you have an understanding of what they really want. Then find some way of achieving the latter at a small fraction of the cost of the former. This is the test of success in language design, and of progress in programming methodology.

Footnotes:

i.e., at compile time

loop, complex lambda lists with keyword and optional args, etc.

Emphasis mine.