Programming Language Universal Grammar

I wanted to expound more on points made in the previous post.

Representing source code as an object structure is manna from heaven. Thinking of source code as text leads to all manner of weird and unnecessary consequences.

But beyond just being “object structures”, an important aspect of what I’m seeing in the new wave of software engineering tools is that the objects (“expressions”) are defined as derivations of real Language objects, which define not only the syntax but all the details of the operational semantics as well.

Intentional has this idea, but it’s a lot more apparent with JetBrains’ Language API or with the IBM SAFARI project’s language descriptors (though I get the impression that IBM’s language description system is evolving rapidly).

I should perhaps qualify my interest in “Domain Workbenches” and say that I’m interested in what it would take to layer a domain-specific language on top of an existing, complex framework. Like J2EE + Oracle, for instance. Where possible, I’d like to see all the maps between languages be bi-directional. In that case, we need full object models of all the general purpose languages involved.

The thought of working with an object called “java” that had all the semantics of that language embedded inside it is pretty powerful. How would/should that look?

I’m just beginning my exploration into this topic, but I go into it with some bias. The attempt to make first-class objects out of these languages sounds very much like the study of natural language. Linguists have cataloged the idiosyncrasies of natural languages for years. Those languages have been classified and grouped, just like any other object of scientific inquiry.

There is a fairly small set of programming languages in common use. I’ll bet that there can’t be much more than 100 languages. (Compare that to human languages.) How would we classify or group them?

Linguists have identified a variety of parameters that can be used to classify languages. Things like “subject verb object ordering” or “head directionality” can be considered parameters of natural languages.

We have our own set of parameters. Things like static vs. dynamic, the particular choice of type system, method dispatch behavior, or exception handling semantics. How many parameters would it take to classify the 100 most popular programming languages?
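To make the idea of parameters a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption of mine for illustration: the three parameters, the names `LanguageProfile`, `group_by_parameters`, and `distance`, and the deliberately coarse values assigned to each language. It is not how JetBrains, IBM, or Intentional model languages.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Typing(Enum):
    STATIC = "static"
    DYNAMIC = "dynamic"


class Dispatch(Enum):
    SINGLE = "single"
    MULTIPLE = "multiple"


class Exceptions(Enum):
    CHECKED = "checked"
    UNCHECKED = "unchecked"


@dataclass(frozen=True)
class LanguageProfile:
    """A hypothetical, very coarse parameter vector for one language."""
    name: str
    typing: Typing
    dispatch: Dispatch
    exceptions: Exceptions

    def parameters(self):
        # The tuple of parameter values, without the name.
        return (self.typing, self.dispatch, self.exceptions)


def group_by_parameters(profiles):
    """Bucket language names by identical parameter settings."""
    groups = defaultdict(list)
    for profile in profiles:
        groups[profile.parameters()].append(profile.name)
    return dict(groups)


def distance(a, b):
    """Count the parameters on which two languages differ."""
    return sum(x is not y for x, y in zip(a.parameters(), b.parameters()))


# Simplified classifications, chosen only to illustrate the idea.
JAVA = LanguageProfile("Java", Typing.STATIC, Dispatch.SINGLE, Exceptions.CHECKED)
CSHARP = LanguageProfile("C#", Typing.STATIC, Dispatch.SINGLE, Exceptions.UNCHECKED)
PYTHON = LanguageProfile("Python", Typing.DYNAMIC, Dispatch.SINGLE, Exceptions.UNCHECKED)
RUBY = LanguageProfile("Ruby", Typing.DYNAMIC, Dispatch.SINGLE, Exceptions.UNCHECKED)

PROFILES = [JAVA, CSHARP, PYTHON, RUBY]
GROUPS = group_by_parameters(PROFILES)
```

With only these three parameters, Python and Ruby land in the same bucket, which is a sign that more parameters are needed to tell them apart. As a rough lower bound, separating 100 languages with yes/no parameters alone takes at least seven of them, since six binary parameters give only 64 combinations.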

Once we have that list, we have our own “Programming Language Universal Grammar” (PLUG?).

I think a better understanding of the PLUG would significantly help programming language designers. Especially if it leads to a set of “off the shelf” components that can be used to throw together an entire development environment. Maybe with a little window-dressing and syntactic sugar after the core semantics are defined.

This is part of the appeal of what I’ve seen IBM call “language inheritance”. Underneath that concept is the assumption that it is possible to usefully classify and compare different programming languages.
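One way to picture that assumption is a sketch in Python where a language descriptor inherits parameter settings from a parent and overrides only what differs. This is my own guess at the general shape, not IBM’s design: `LanguageDescriptor`, `lookup`, and the descriptor names `oo_base`, `java_like`, and `billing_dsl` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LanguageDescriptor:
    """A hypothetical descriptor that inherits settings from a parent."""
    name: str
    parent: Optional["LanguageDescriptor"] = None
    overrides: dict = field(default_factory=dict)

    def lookup(self, parameter):
        # Walk up the parent chain until some descriptor defines the parameter.
        node = self
        while node is not None:
            if parameter in node.overrides:
                return node.overrides[parameter]
            node = node.parent
        raise KeyError(parameter)


# A generic base, a Java-like language derived from it, and a
# domain-specific language layered on top (all names are made up).
oo_base = LanguageDescriptor(
    "oo_base",
    overrides={"typing": "static", "dispatch": "single", "exceptions": "unchecked"},
)
java_like = LanguageDescriptor(
    "java_like", parent=oo_base, overrides={"exceptions": "checked"}
)
billing_dsl = LanguageDescriptor(
    "billing_dsl", parent=java_like, overrides={"syntax": "tabular"}
)
```

Here `billing_dsl` states only its own syntax and picks up checked exceptions from `java_like` and static typing from `oo_base`. That only works if the parameters mean the same thing at every level, which is exactly the classification problem described above.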

As with natural linguistics, understanding the universal grammar may also help create a better understanding of the actual cognitive facilities that interact with expressions in these languages. And isn’t that what all this “bad smell” and “readability” stuff is really about?
