The Cramps’ Poison Ivy in Tokyo, 1991
NO JOKE.
I have been bestowed with one of the most insane honors of my life by The City of Austin…
February 9, 2012 will be proclaimed “Shakey Graves Day” in Austin, Texas by the Austin Mayor.
I am scheduled to perform a song in the council chambers at City Hall at approximately 5:30…
No shit. Major congrats.
When I saw my friend over the holidays, I couldn’t believe my eyes. He had lost so much weight and looked so different. He was always a heavy guy. But somehow he had managed to shed the pounds.
When I asked him how he did it, he told me it was because of his consistency. Consistently working out…
The Beatles - Oh! Darling
So my brother was playing this at my grandma’s house today on the old piano, and he was singing, and it was awesome. Two of my cousins (freshman in college and 11th grade) that were there however, clearly were not enjoying it, and definitely did not recognize the song. I slightly feel responsible for not raising them better than that.
Today’s quote defines the most basic elements of a programming language. Very important!
A powerful programming language is more than just a means for instructing a computer to perform tasks. The language also serves as a framework within which we organize our ideas about processes. Thus, when we describe a language, we should pay particular attention to the means that the language provides for combining simple ideas to form more complex ideas. Every powerful language has three mechanisms for accomplishing this:
- primitive expressions, which represent the simplest entities the language is concerned with,
- means of combination, by which compound elements are built from simpler ones, and
- means of abstraction, by which compound elements can be named and manipulated as units.
In programming, we deal with two kinds of elements: procedures and data. (Later we will discover that they are really not so distinct.) Informally, data is “stuff” that we want to manipulate, and procedures are descriptions of the rules for manipulating the data. Thus, any powerful programming language should be able to describe primitive data and primitive procedures and should have methods for combining and abstracting procedures and data.
Today’s post comes after the heading “What Is Meant By Data?” The context is a rational number arithmetic package introduced at the beginning of the chapter.
We began the rational-number implementation in section 2.1.1 by implementing the rational-number operations
add-rat,sub-rat, and so on in terms of three unspecified procedures:make-rat,numer, anddenom. At that point, we could think of the operations as being defined in terms of data objects — numerators, denominators, and rational numbers — whose behavior was specified by the latter three procedures.But exactly what is meant by data? It is not enough to say “whatever is implemented by the given selectors and constructors.” Clearly, not every arbitrary set of three procedures can serve as an appropriate basis for the rational-number implementation. We need to guarantee that, if we construct a rational number
xfrom a pair of integersnandd, then extracting thenumerand thedenomof x and dividing them should yield the same result as dividingnbyd. In other words,make-rat,numer, anddenommust satisfy the condition that, for any integernand any non-zero integerd, ifxis(make-rat n d), then(numer x) n --------- = - (denom x) dIn fact, this is the only condition
make-rat,numer, anddenommust fulfill in order to form a suitable basis for a rational-number representation. In general, we can think of data as defined by some collection of selectors and constructors, together with specified conditions that these procedures must fulfill in order to be a valid representation.
I like to read a bit of Structure and Interpretation of Computer Programs every day. So, I’m going to start posting a quote from the book as often as possible. Of course, for the first one, I have to go with the opening analogy from Chapter 1.
We are about to study the idea of a computational process. Computational processes are abstract beings that inhabit computers. As they evolve, processes manipulate other abstract things called data. The evolution of a process is directed by a pattern of rules called a program. People create programs to direct processes. In effect, we conjure the spirits of the computer with our spells.
A computational process is indeed much like a sorcerer’s idea of a spirit. It cannot be seen or touched. It is not composed of matter at all. However, it is very real. It can perform intellectual work. It can answer questions. It can affect the world by disbursing money at a bank or by controlling a robot arm in a factory. The programs we use to conjure processes are like a sorcerer’s spells. They are carefully composed from symbolic expressions in arcane and esoteric programming languages that prescribe the tasks we want our processes to perform.
This is the first in (hopefully) many more articles detailing my path in designing and implementing a new language. The focus of this entry will be to focus on what features I like from various languages, and how I might be able to combine them to create a language I enjoy using. But first, I should answer the question:
My number one reason for designing and implementing a language is because I think it will be fun and educational. If I get something that I or other people enjoy using, that would be a nice bonus. I also have never done this before, so I expect to make a lot of mistakes along the way. Perhaps I can use those to guide the design of my next language.
With that question answered, I’ll try to cover the highlights: things I would like to see in a new language.
I’m a big proponent of functional languages. My favorite thing about Haskell is that everything is a function, because that means I can understand any Haskell code I read in terms of functions. I don’t think I want to allow other kinds of syntax.
Haskell’s record syntax is a known wart in the language. Algebraic datatypes are nice, but often you want a larger container to hold data. In my opinion, objects are not the answer: instead, create data structures and functions to operate on them (it achieves the same purpose).
I don’t want to elaborate on this much, but I really don’t want to allow any automatic type coercion. It only causes problems.
There are plenty of dynamic languages to choose from. While they are fun, I don’t think they leverage the power of computers enough to last far into the future. I want the compiler to do as much work as possible up front; therefore, strong static typing is a must.
This would be borrowed almost entirely from Haskell. It would make dealing with structs / functions instead of objects bearable.
While lazy evaluation allows you to do extremely cool and flashy things, I don’t think it buys you much in terms of developer efficiency and readability. It’s just as easy to wrap something in a lambda or a future to emulate laziness when necessary, so I won’t go to the trouble of implementing it directly in the language. This also means that you’ll be able to do IO in any context; I might consider adding some sort of “pure” keyword to distinguish functions which don’t perform any IO.
I’m a fan of Lisp and Haskell. Both languages have a relatively small core, and the rest of the languages is built-up from those small building blocks. Besides making it easier for me to implement, this also means developers need to remember less when they learn the language.
This seems like the best way to go. I like Lisp, but I don’t think we need another dialect of it (Clojure and Racket should be more than enough for anyone). One Haskell feature I can’t live without anymore is auto-currying, so that’ll definitely be included.
I view some features as absolutely essential for me to use any language. Here are some of them.
There’s no reason not to have a rich REPL along with compilation for speed these days. I would hate to use a language without them.
Some languages, like Scala and Clojure, specifically tie themselves to the JVM instead of any other platform. This has its advantages (speed and ubiquity being the most important), but since I’m not really looking to bust into the mainstream with this I’d like to implement it in something I enjoy (not Java). LLVM could be a prime target for optimization. Despite my dislike of the JVM, I’d like to keep it open as an option, so I want to try to keep that part of the code modular.
I’ve heard great things about the ML family of languages module systems. A good module system keeps you from going crazy, and allows you to create modular code for yourself (this is even more imporant in libraries).
I think these features cause trouble in other languages, so I won’t allow them.
I’m not a fan of DSLs that don’t look like the language they’re written in. I understand their value, but I just hate looking at a new bit of code and having to re-learn the semantics of a language which isn’t the one I’m writing in. It takes too much time and is usually unnecessary.
Haskell and Scala both allow you to use any symbol as an operator. This has its advantages. It also leads to horrible operator soup, which tends to have the same effect as these non-language DSLs. This should be discouraged, but I’m not entirely sure I want to rule it out completely just yet - maybe I can find a middle ground.
My main design goal is to end up with readable programs. Developer productivity is a close second, but I never want to sacrifice readability by creating shortcuts. This is just my first day thinking about this, so I may change my mind soon (or once I start implementing it). I’ll continue to write as the ideas develop.
Edit: I might just be describing OCaml. But, whatever.
I was reading Matt Might’s What every computer scientist should know when I saw this fun-sounding Unix exercise:
Report duplicate MP3s (by file contents, not file name) on a computer.
I’m not too experienced with shell programming, but I thought it might be fun to see if I could come up with a one-liner to identify all dupes and print them out for me. Not super exciting for shell pros, but I enjoyed writing it. My strategy was to compute the SHA-1 of every file and compare them. I ended up with this slightly obfuscated line:
find . -type f -exec openssl dgst -sha1 {} \; |
awk '{ print $NF, $0 }' | sort -k 1 | uniq -D -w 40
Breaking it down a bit:
find . -type f -exec openssl dgst -sha1 {} \;
runs ‘openssl dgst -sha1’ on every regular file in the current directory.
awk '{ print $NF, $0 }'
This is a bootleg bit of code so that I could have the SHA-1 as the first field, for passing to sort. And finally,
uniq -D -w 40
identifies lines which are identical for the first 40 chars.
Took about 40 minutes to run on my >100GB library and correctly reported the duplicates. Not bad.
Disclaimer: I’m an amateur mathematician. I wrote this mainly as a way of clarifying my own understanding. I encourage you to research this on your own if you find it interesting!
There are these things called groups in the branch of math called group theory (which consists of studying symmetries of objects). Groups are a sort of structure. A good example of a group is the group that describes the symmetries of a square: it tells you how you can move the square around and end up with the same looking shape (you “preserve its structure”, if you want to be a bit more technical). The interesting thing is that we formalize the movements (rotations and reflections) so that we can play with them the same way we do other algebra: as an example, you might use “r*s” to describe the action of rotating the square by 90 degrees, then flipping it.
I was reading about this subset of groups called finite simple groups, which are like prime numbers for other finite groups. You can’t break them down any further than they already are. For example, if you want to talk about the symmetry of a 15-sided shape, you can instead describe all of its movements by combining different movements of triangles and pentagons, just like how 15 is the product of primes 3 and 5. Any finite group (a group that doesn’t have infinitely many elements) can be constructed using combinations of finite simple groups. Don’t worry about how, but just know that we are interested in these because if we can identify all of them, we can figure how to build any other finite group by combining them somehow. Or something.
But anyway. For a long time people were trying to classify all the finite simple groups, and they succeeded in 2004. The proof isn’t something localized in one place; rather, it was the combined efforts of many mathematicians over many years. One of the finite simple groups is called the Monster group, and the fact that it exists really hurts my brain.
All of the other groups are pretty crazy as it is. Normal shapes can be displayed in 2 or 3 dimensions. Most of these groups describe abstract objects that can only “exist” in much higher dimensions (I put quotes around “exist” because the existence is something we can only ever think about, not create physically). But the Monster, this group takes it to a new level. This one has such a ridiculous number of elements that I can’t even come close to comprehending anything about it.
It has 808017424794512875886459904961710757005754368000000000 elements. This means you can move this weird, abstract object in that many ways and still preserve its structure. To put this in perspective, the square symmetry group I was talking about earlier has 8 elements. And the number of elements in the Monster is an exact number. And it’s unique, and a prime building block of other groups, despite its ridiculous size. You can’t break it down any further. It must exist; it’s a fact of nature. We made it up, but it follows all the rules and therefore, it exists.
The part that really blows my mind is that the smallest dimension you can construct the Monster in is dimension 196883. What the fuck.
(Additional note if you want more mind blowing: the dimensions that you can construct the Monster group in just happen to be values that are extremely close (close enough for it not to be a coincidence) to the first terms of some totally unrelated functions in a different branch of math. It’s just patterns and they’re all interrelated in absurd ways.)
Most of this information comes from the book Symmetry: A Journey into the Patterns of Nature by Marcus du Sautoy, and various bits of poking around Wikipedia. Thanks to Leah for suggestions on ordering and examples.