Paul Hammant's Blog: The Importance of the DOM
Choosing the DOM
This article looks a little deeper into the importance of the DOM as an intermediary, and points out some of its strengths and weaknesses. The DOM, of course, is the Document Object Model within a browser, which models the content and layout of a page.
UIs via rendering/layout engines and the DOM
Rendering/layout engines (Wikipedia links: Gecko for Firefox, WebKit for Safari and Chrome, Trident for IE) traverse the DOM and process the elements they understand. The aim is to position, represent and render elements according to what the engine knows inline, or from related CSS. I will conveniently ignore HTML5 Canvas for now. There is a handy comparison on Wikipedia of the capabilities of the layout engines.
The rendering/layout engine can only process what it knows. I previously talked of glass-ceilings and used the lack of an ‘image drop-down’ as an example of that. In that case “image drop-down” may be easy for humans to understand, but the rendering engine needs it represented in the DOM as a series of regular <div> <ul> <li> <span> items, which are unlike the <select> that is canonical for ‘drop-down’. It is like that for every advanced effect, I suggest. Thus to push ahead in terms of capability, the rendering/layout engine needs to be enhanced. There are four main browsers now, meaning that a committee needs to decide what new affordances and capabilities the DOM can encode, which then need to be rendered and interacted with.
By design, the DOM and browsers pertain to presentation and interaction only. There is a server part of the application that is most likely handling business logic and persistence. The server and browser speak HTTP to each other of course, and therefore the overloaded term thin client applies to this class of application. Another “thin” aspect of this type of application is the lazy way that additional “pages” of the application are retrieved at the time of first use. The whole application is delivered incrementally to the browser while in use, leading to a hugely important faster start.
The DOM is an intermediary between code and the renderer
The DOM, as an intermediate layer, is two-way. You can program it at any stage, including sending it more HTML. It can also be queried to retrieve HTML or specific attributes. There are a lot of pros to this, and some cons (as I’ll outline later).
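To make the two-way nature concrete, here is a minimal sketch. It uses Python's standard-library xml.dom.minidom as a stand-in for a browser DOM (an assumption for illustration; a real browser DOM is far richer, and the markup and names below are made up). Code first writes to the tree, then reads markup and attributes back out of the same tree.

```python
from xml.dom.minidom import parseString

# A tiny stand-in for a page. The markup here is hypothetical.
doc = parseString('<div id="menu"><ul><li class="item">Tea</li></ul></div>')

# Write direction: code mutates the tree at any stage.
ul = doc.getElementsByTagName("ul")[0]
li = doc.createElement("li")
li.setAttribute("class", "item")
li.appendChild(doc.createTextNode("Coffee"))
ul.appendChild(li)

# Read direction: the same tree can be queried for markup or attributes.
markup = doc.documentElement.toxml()
classes = [node.getAttribute("class") for node in doc.getElementsByTagName("li")]

print(markup)
print(classes)
```

A renderer sitting on the other side of that tree never needs to know which code made the change; it only sees the updated nodes.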
All HTML-centric web technologies mutate the DOM
Cappuccino is about as advanced as you can get for user-experience that works without plugins. Here’s their floor-plan example showing the DOM in the style of an HTML app:
Note that it also uses Canvas tactically. The bed in the circle in that screenshot is via Canvas.
Scalable Vector Graphics (SVG)
SVG, in Firebug, does represent itself as a set of elements that are expandable. As such, it uses a scene-graph rather than a bitmap/buffer to represent its details.
UIs rendered by visiting code (and recursing into increasingly small rectangles)
Generally speaking most non-DOM graphics APIs deal with the leasing of rectangles of pixels from a parent container (browser). You can see the relevant node in the DOM, but can’t see specifics. This class of UI technology is certainly thicker than the thin-client class above.
These technologies typically have a ‘paint’ or ‘drawRect’ (or similar) method/function for a rectangle that, when invoked, repaints that rectangle to a graphics plane. Often there are threading restrictions for repainting, or for changing variable state that would cause a repaint. A repaint of everything would be a call into the outer-most instances first (that paint method), and each of those could call into many other paint() methods that are part of the composite makeup of that rectangle. For example, a textfield could have a border and the area you type into. This is decomposed into at least two sub-components, meaning two further delegations to ‘paint’. Object-oriented inheritance can, sadly, play a part too. The paint() method/function could be overridden, and the override may, deliberately or not, skip the parent’s paint() as part of its invocation. It’s easy to make a performance mess this way. Do any I/O during one of those paint() invocations (or just code badly) and your finished product is going to suck. This can also mean there are deep invocation stacks during rendering, and implicitly a lot of code execution. Since computers became hand-held, the trade-off between economical graphics systems and the fastest CPUs has been constantly revisited. This is true for browser-plugin based versions of the technology where applicable, and for the desktop incarnations. The only difference is where the rectangle is leased from (browser or OS), and what constraints are placed upon the app (security managers, sandboxes, reduced APIs).
There is a key element of this category of renderer/layout engine: there is no intermediary between code and the renderer; they are intermingled.
Flash leases a rectangle in a viewport of the browser via an <object> element, and renders its own stuff in there. It is mostly known for its in-browser capability, but can also run standalone on a desktop. Within the leased rectangle, smaller and smaller rectangles are leased by code that pertains to increasingly specialized components. Flex/ActionScript: this has a paint() style function at the base of graphics widgets. There is some XML to make the experience more designer friendly too. Apple led an informal campaign to sunset Flash usage when the iPhone was launched without the ability to run its apps.
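The recursive delegation described above can be sketched in a few lines. This is not any real toolkit's API: the Component class, the list standing in for a graphics plane, and the component names are all hypothetical, chosen only to show the outer-most paint() calling into smaller and smaller rectangles, and what happens when an override skips the parent's version.

```python
class Component:
    """A leased rectangle that paints itself, then delegates to its children."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def paint(self, surface, depth=0):
        # 'surface' stands in for a graphics plane: here, a list of draw calls.
        surface.append((depth, self.name))
        for child in self.children:
            child.paint(surface, depth + 1)


class CustomTextField(Component):
    def paint(self, surface, depth=0):
        # Overrides paint() without calling the parent's version,
        # so the children are never asked to paint.
        surface.append((depth, self.name + " (custom)"))


# A textfield decomposed into a border and the area you type into.
text_field = Component("textfield", [Component("border"), Component("edit-area")])
window = Component("window", [text_field])

surface = []
window.paint(surface)

# The same hierarchy, but with an override that skips the parent's paint().
custom_window = Component("window", [CustomTextField("textfield", [Component("border")])])
custom_surface = []
custom_window.paint(custom_surface)

print(surface)
print(custom_surface)
```

Note that the only record of what was drawn is whatever the code chose to do inside paint(); there is no tree of rendered nodes left behind to query afterwards.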
Sun/Oracle’s Swing (1997 and onwards) is another fatter UI technology. The bad: unnecessarily deep stacks. The good: advanced layers, huge control, customizability and alpha-channel stuff. When it was first released performance was a real problem because of the slower CPUs we had then. Certainly no phone-level chip could run it until 2001. Later, Sun tried to push ahead with JavaFX but to my mind only confused the message. Since we’re presently hyping the importance of the DOM, and of debugging tools like Firebug, there is an excellent tool called SwingExplorer that acts like Firebug for a running Swing app. That allows navigation of the object model for the Swing containers/components, which approximates to the DOM, but without the ‘document’ aspect as it’s a live object model (directed graph). The downside is that it has to be programmatically inserted, because the Java Applet plugin for browsers doesn’t have a further plugin for the likes of Firebug (but easily could).
Oracle’s Swing also works as a desktop UI technology. Take a look at IntelliJ IDEA to see how sophisticated Swing can get on the desktop.
Google’s Android, though Java, doesn’t use Swing. Licensing more than performance must have been the reason in 2007. There is a way to embed WebKit in these Java applications (and gain the DOM’s benefits), and it’s increasingly the case that developers do that. There is no Android browser-plugin to compete with the Sun/Oracle Java plugin for regular browsers though. That seems like a missed opportunity. Kevin Hickey (a colleague) goes into a broad elaboration of that Android environment.
Microsoft’s .Net bits and pieces
Silverlight was the cross-platform rectangle-leasing technology that Microsoft touted as a Flash and Swing-applet replacement for RIA rectangles in web pages. It is not any more, as it appears that Microsoft are shifting towards more DOM-like designs going forward. The Windows 8 Metro interface is a clue to that. XAML is the technology foundation, and XAML Browser Applications (XBAP) is allegedly the way to go for cross-browser rich internet applications using this technology. There are not a lot of recent examples though. Colleague David Nelson has promised a follow-up on that.
- no DOM intermediate
- yes, uses leased rectangles
- actual code participates in paint/repaint
There was a technology called Display PostScript that flourished in the NeXT days. The technology could have had a DOM, in the style of the HTML/browser one, but did not. One of two ways to program the view was via snippets of PostScript, but it was a one-way trip. You could feed the graphics subsystem snippets of PostScript, but you could not navigate a DOM to retrieve textual PostScript nodes/properties for it. PostScript was fiendishly hard to program anyway.
In your attempts to make your RIA, it’s not hard to encounter a feature/design that exceeds what HTML and the DOM are ordinarily capable of. I’ve written about that glass ceiling before. I did again a few days ago, to show the DOM tricks necessary to break through it. That you avoided 20 days of work for what would be a 1 day problem (were the technology more capable) is a relief to all concerned. The risk remains, though, that during the build-out of a sophisticated RIA using the DOM you’ll encounter something really hard that should have been simple.
This was the “con” I was alluding to earlier.
Cut & paste
HTML is very cut and paste-able. If I take the super image drop-down thing I’ve used in that “glass ceiling revisited” article, and select a fair bit more than that in the browser, I can paste that into a number of places that recognize HTML in the clipboard. Gmail’s new-email page is one such place. Here it is after I have pasted that into Chrome:
It looks the same for Gmail in a Safari browser. When I hit send, and check how that appears when it arrives somewhere, it’s messed up a little (and shows the behind the scenes trick that Marghoob Suleman’s jQuery extension did), but it is still HTML in the email. That could easily improve as browsers improve.
Microsoft Word tries to accept the HTML, but does not do too well. Worse again are OpenOffice and Apple Pages.
The point is that I can only do this when there is something like the DOM involved. All the widgets are together in one document. In this case all the text was placed on the page using the same canonical text rendering aspect of WebKit. When I placed the caret with the mouse and slid to perform selection, the clipboard was filled with something meaningful. For the other type of rendering strategy (visiting ‘paint’ methods recursively), the best you can hope for is a bitmap to represent what was copied. Today, I don’t know of a fat-client UI technology that allows you to select a field label like “Created from HTML SELECT Element” AND the contents of the input-field in one go.
You can print pages, and there is a fair chance that they have been coded to have a layout that’s pertinent to printing as well as display. The same layout engine kicks in, but makes different choices as to what to represent, and how to represent it. That’s much easier than it is for classic installable applications.
With the DOM, assistive technologies have a much better chance of representing an interface to a sight-impaired user. That said, I’m guessing the more apparently spurious <div> elements there are, the more confusing that gets. Representation could be via an audio facility, or via Refreshable Braille Displays. Assistive technologies are much harder for non-DOM technologies.
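As a rough illustration of why one document makes this possible, here is a sketch that walks a small tree in document order and gathers both a label's text and an input's value. It again uses Python's xml.dom.minidom as a stand-in for a browser DOM. The selected_text helper, the markup and the "Calendar" value are hypothetical, and real browsers decide for themselves what a selection puts on the clipboard.

```python
from xml.dom.minidom import parseString


def selected_text(node):
    """Collect text in document order, including input values (a simplification)."""
    parts = []
    if node.nodeType == node.TEXT_NODE:
        if node.data.strip():
            parts.append(node.data.strip())
    elif node.nodeType == node.ELEMENT_NODE:
        if node.tagName == "input":
            parts.append(node.getAttribute("value"))
        for child in node.childNodes:
            parts.extend(selected_text(child))
    return parts


# Hypothetical markup: a field label followed by its input field.
form = parseString(
    '<form><label>Created from HTML SELECT Element</label>'
    '<input type="text" value="Calendar"/></form>'
)

clipboard = " ".join(selected_text(form.documentElement))
print(clipboard)
```

With recursive paint() methods there is no equivalent tree to walk, which is why a bitmap is the best such a technology can offer the clipboard.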
The DOM, a DOM ?
The DOM, as it has evolved over the years, is a very powerful technology, but it doesn’t remove the need for more native graphics programming APIs. Especially while there are glass-ceilings, and no easy ability to extend it. More native graphics technologies could adopt some of the same “Document Object Model” designs (Display PostScript was close 15 years ago). WebOS was close a few years ago, but only insofar as it expanded the DOM we know into a whole platform API.
Thanks to Jeff Wishnie, Kevin Hickey and David Nelson for contributions :)
Feb 6, 2013: This article was syndicated by DZone