Paul Hammant's Blog: Categorizing Languages
So I’ve asked a bunch of colleagues, and there’s no definitive Comp-Sci categorization for what I’m about to discuss. I wish there were but there is not, so I’m going to provisionally title them, and will correct it when a Donald Knuth type intellect weighs in later: “Human-centric source formats” and “Machine-centric source formats”.
Human-centric source formats
Examples are C, Java, Ruby, Python, JSON, YAML, S-Expressions.
Of the seven I mentioned, the first four are Turing-complete procedural programming languages, and the last three are ‘good’ formats for declarative data payloads. Functional languages are also in this category, though a larger brain is often required.
Being source formats, they suit version control. They are also eminently diffable, and consequently merge well (though I think the bar could be raised with diff semantics that are specialized to each).
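To make the "diffable" point concrete, here is a minimal sketch in Python using the standard library's difflib and json modules. The helper name diff_json and the sample data are made up for illustration. It assumes the payload is serialized one key per line with sorted keys, which is what keeps the diff small.

```python
import difflib
import json


def diff_json(old: dict, new: dict) -> list[str]:
    """Return unified-diff lines between two JSON-serializable dicts."""
    # One key per line, in sorted order, so a one-value change is a one-line diff.
    old_lines = json.dumps(old, indent=2, sort_keys=True).splitlines()
    new_lines = json.dumps(new, indent=2, sort_keys=True).splitlines()
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile="old", tofile="new", lineterm=""
        )
    )


# Hypothetical sample payloads: only the port differs.
before = {"name": "demo", "port": 8080}
after = {"name": "demo", "port": 9090}
changes = diff_json(before, after)

if __name__ == "__main__":
    print("\n".join(changes))
```

The same text-based diff would work on YAML or source code. A format-aware diff (the "specialized semantics" I mention) would compare the parsed structures instead of lines.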
There is also a new alternative to YAML and JSON called “Tuple Markup”. See the project page. It is too early to say whether this technology has staying power or not.
Also note that Python and YAML are stricter about indentation, and have white-space rules governing the nesting/indenting of scope. I’ve always thought that this gives them an advantage in parsing speed.
Machine-centric source formats
Examples are the SGML derivatives XML and HTML.
They are typically NOT parsed by YACC, Bison/flex, Antlr (or similar). Instead ‘SAX’ and ‘DOM’ are often referred to for programmatically processing them. Actually that’s more XML, as HTML has some historical processing strategies that allow for less regularity and even incompleteness. In terms of parsing again, there is a chance that they are generally faster to parse than the human-centric ones above.
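For anyone who has not met the two styles, here is a small sketch in Python using the standard library's xml.sax and xml.dom.minidom. The document string and the ItemIdCollector class are invented for illustration. SAX pushes events at a handler as it reads, while DOM builds the whole tree first and lets you query it afterwards.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, made up for this sketch.
DOC = "<menu><item id='1'>Open</item><item id='2'>Save</item></menu>"


class ItemIdCollector(xml.sax.ContentHandler):
    """SAX style: react to each start tag as the parser streams past it."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def startElement(self, name, attrs):
        if name == "item":
            self.ids.append(attrs.getValue("id"))


# SAX: event-driven, nothing is kept in memory beyond what the handler stores.
handler = ItemIdCollector()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_ids = handler.ids

# DOM: parse everything into a tree, then query the tree.
dom = minidom.parseString(DOC)
dom_ids = [node.getAttribute("id") for node in dom.getElementsByTagName("item")]

if __name__ == "__main__":
    print(sax_ids, dom_ids)
```

Both routes arrive at the same ids. The difference is whether you hold the whole document in memory (DOM) or handle it as a stream (SAX).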
Google’s Protobufs is an example of something that is not ‘source’ in its ultimate encoding. See their encoding page.
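To show why that encoding is not something you would read or diff by eye, here is a rough Python sketch of the base-128 "varint" scheme that the Protobuf wire format uses for integers. This is my own illustration and not Google's library code. The function names are invented, and it only covers non-negative integers in a varint-typed field.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint, low 7 bits first."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            # More bytes follow: set the continuation bit.
            out.append(low_bits | 0x80)
        else:
            out.append(low_bits)
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Encode one field as key + value, where wire type 0 means varint."""
    key = (field_number << 3) | 0
    return encode_varint(key) + encode_varint(value)


# Field number 1 carrying the integer 150.
payload = encode_varint_field(1, 150)

if __name__ == "__main__":
    print(payload.hex())
```

The three resulting bytes carry no field names and no delimiters, so you need the schema to make sense of them. That is the opposite of a source format.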
I’m going to follow this entry up with something that talks more about suitability in UI markup.