Email applications should allow HTML emails with sandboxed JavaScript in them. It’s 2014 for Pete’s sake!

While there is a long list of historical reasons why browsers have a locked-down DOM, and why the same origin policy constrains interaction to/from back-end systems for portions of a page, why in 2014 do email applications totally forbid JavaScript? I’m going to explore that question here, and outline what could happen if email application makers made that leap.

Relevant history.

Email is a much older Internet service than the World Wide Web. It was in existence for decades before Tim Berners-Lee defined HTTP and HTML in the early 1990s. Plain-text (essentially characters with carriage returns limiting line length) was the only way for the decades up to about 1995. When HTML could have taken over, plain-text still prevailed as a ‘lowest common denominator’ (LCD). Why? Many people liked their mail-reading app, and didn’t want to change to one that could potentially read richer content just because a sender wanted to step beyond plain-text. You basically had two choices for richer email content: HTML-email, and RTF. Different vendors used each. Those wanting LCD still forced email to be sent in both rich and plain modes. It meant that you could still use your preferred email reading package, and continue to follow conversations, but it forced the sending app to do a transform from rich to plain, and scotch-tape that into the outgoing email.

While the duality of plain-text and HTML-email is still largely the case today, and most definitely is for mail-lists that stretch outside an organizational monoculture, you can actually be more choosy now (as an individual). For example, if you make a flight booking online, you can tell the airline how to correspond with you: plain-text or HTML-email. The airline hopefully remembers that choice, and does not keep asking you as you make flight bookings on the web.

The HTML-email payload isn’t the same as it is for real web-apps though. It’s highly restrained. If you’re hand-crafting a very pretty HTML-email for some audience, you are most likely going to inline a fair bit. Inline a small subset of CSS, I mean, and only because CSS in HTML-email is a moving target today. Click through to the PDF on http://www.campaignmonitor.com/css/ - here for posterity is the one at time of publication: Sept 2013 V2. What is and is not possible in each client is clearly detailed. Note too that Gmail looks to be surprisingly weak at CSS support (I’ll come back to that).

Back to “highly restrained”. You also can’t use JavaScript. I mean you can specify it in the body of the message as you would for a real web app, but the mail-app is going to neutralize it at the moment it renders the mail. It’s not so hard to prove that.

Note: HTML-email will win outright over RTF in the years to come because that whole HTML/CSS/JavaScript potential is so big, as is investment in rendering engines for it.

How interactive is HTML-Email today?

Strictly speaking, you can have button effects in HTML-email, but they are forms or links only. If you click them, they are going to cause the visiting of an associated URL in a new browser window (or separate browser tab for webmail apps) that is outside of the control of the email application. That action may be relatively atomic, but it is a handoff. Either you were logged in to the site in question already (in your browser), or a temporary authentication token allowed you to complete an action without being fully logged in. Sometimes you’re left with regret though, as the site asks you to authenticate and you can’t remember your id/password. On a mobile email application, like that for the iPhone, you’re no longer in the mail-app, you’re in Safari staring at a login page you’re not going to work through. Either way, nothing is happening as a partnership between an email-application and a web browser.

Why would we want JavaScript turned on?

Emails could be more interactive. At least much more interactive than the launch-a-browser-at-this-URL action that’s possible now in HTML-email as described above.

Any clues that it is needed?

Handling of calendar invitations (meetings and wot not)

Gmail’s web-mail application intercepts meeting invitations and replaces their display with a ‘gadget’ that they wrote (and trust). It allows simple interactive actions for the calendar from within the mail view. You can flip back and forth from “yes” to “no” (or “maybe”) in respect of “attending?” to your heart’s content. Gmail has an integrated calendar, and it interoperates with the original sender re your attendance choice by the best possible means.

Here is my buddy Simon Stewart (ex-ThoughtWorks, ex-Google, now Facebook) inviting me (a Gmail user) to a meeting from Facebook’s own email system (which is NOT Gmail of course):

Here is what the same invitation looks like in Yahoo Mail:

The webmail applications that Google and Yahoo are running have interceptors for the meeting invitation, and because of that run their own JavaScript unfettered in the browser’s otherwise “display email” pane. To work well, it requires all mail-app makers to similarly intercept. Here’s what that looks like on an app that doesn’t intercept (same email), Apple’s iPhone email application:

Ugly huh? In this case, even the iCal invitation spy didn’t cause a new ‘Invitation’ event on my iPhone.

Gmail, in addition to the invitation gadget, has ones for UPS/Fedex tracking emails, Flights, Hotel bookings. Here’s one for flights:

Back to the proposal

Invitation handling shouldn’t change from the interception design that Gmail and Yahoo do, especially as the sender can’t always know your choice of calendar handling. Thus I’m not going to mention that again.

Instead let’s invent an abstract app that could use sender-specified JavaScript: personal timesheet filing (from within the inbox). At ThoughtWorks we have a salesforce.com based timesheet system, and all our staff have to use it. When on a client project, I often file the same hours for the current week as I did for the previous week. Here is a mockup of a HTML-email that has JavaScript driven click interactions with a SalesForce backend:

If I click the first button, it perhaps interacts with a RESTful service on salesforce.com and doesn’t open a separate browser, as the workflow is fairly atomic. The second and third buttons would of course, or maybe they open a closable dialog layered over the mail-app view.
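Here is a minimal sketch of what the first button’s click handler might prepare. The endpoint path, the entry fields (date, project, hours) and both function names are made up for illustration; nothing here reflects a real Salesforce API.

```javascript
// Hypothetical endpoint under the timesheet URL; not a real Salesforce API.
const TIMESHEET_ENDPOINT =
  'https://salesforce.com/a/thoughtworks.com/timesheets/entries';

// Copy last week's entries forward by seven days (dates are YYYY-MM-DD).
function sameAsLastWeek(lastWeekEntries) {
  return lastWeekEntries.map((entry) => {
    const date = new Date(entry.date + 'T00:00:00Z');
    date.setUTCDate(date.getUTCDate() + 7);
    return { ...entry, date: date.toISOString().slice(0, 10) };
  });
}

// Describe the request the click handler could hand to fetch() or XHR.
function buildTimesheetRequest(lastWeekEntries) {
  return {
    url: TIMESHEET_ENDPOINT,
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ entries: sameAsLastWeek(lastWeekEntries) }),
  };
}
```

The request is built as plain data rather than sent straight away, so the mail-app could inspect it (and refuse it) before anything leaves the sandbox.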

Technically, there’s no reason why JavaScript libraries could not be external resources, but some pieces are likely to be inlined, like they are for the compromise world of CSS in HTML-emails now.

New services.

Maybe for JavaScript in an email-app there are new built-in services to interact with. “Delete this email” would be a logical function for a “same as last week” button press (after a delay, or “thanks” overlay of some sort). Other people might like the same email to hang around by remembering its ‘filed’ state. Luckily IMAP allows rewriting of content, and while you wouldn’t want the email to rewrite its own HTML in a save action, you might be happy to store another data section with the email that’s analogous to cookies or an HTML5 data-store. That way you might see the following if you returned to the email. Here’s the same timesheet email revisited, after being processed (rather than ignored) previously.
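A sketch of that cookie-like data section might look like this. EmailStateStore and timesheetView are hypothetical names, and the store here is only in memory; a real mail-app would have to persist it alongside the message somehow.

```javascript
// Hypothetical per-email key/value store, analogous to cookies scoped to
// a single message. Values are stored as strings, as cookies are.
class EmailStateStore {
  constructor() {
    this.byMessage = new Map();
  }

  set(messageId, key, value) {
    if (!this.byMessage.has(messageId)) {
      this.byMessage.set(messageId, new Map());
    }
    this.byMessage.get(messageId).set(key, String(value));
  }

  get(messageId, key) {
    const state = this.byMessage.get(messageId);
    return state && state.has(key) ? state.get(key) : null;
  }
}

// What the email's own script might decide to render on a later visit.
function timesheetView(store, messageId) {
  return store.get(messageId, 'filed') === 'true'
    ? 'already-filed'
    : 'needs-filing';
}
```

Keying everything by message id means one email’s script can never read the state saved by another.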

Enabling JavaScript in email packages.

Clearly to be compatible, email applications are going to have to embed more than an HTML-renderer. Luckily this is a snap in the modern age with the likes of WebKit - 95% of the work has already been done. The complexity is with the sandboxing. The rendered email must be sandboxed of course. That’s probably easier for fat-client applications embedding WebKit for display of an HTML-email than it is for the online mail readers that would want to have an iframe for the payload email.

If you check the PDF that the campaignmonitor.com people maintain, you’ll see that the Gmail web-app is amongst the weakest for CSS support, and Apple Mac and iOS email applications are amongst the strongest. I was initially very surprised by that. The Gmail folks are clearly effectively sanitizing a lot of the non-content aspects of a HTML-email, to make it safely fit into the rectangle of the web-mail interfaces. Yet their Android app is as compliant as the Apple products. It MUST all be about safe sandboxing of content in an embedded WebKit, versus the unsafe injection of rectangles into a larger DOM.

Google’s web-mail app in Chrome, for an HTML element encompassing an email, has only parent elements of ‘div’, ‘table’, ‘tr’ and ‘td’ … right the way back to ‘body’. There’s no containment, iframe-style or otherwise. They simply have to sanitize HTML-email content.

We would really need to make the email’s part of the DOM totally separate from the part belonging to the chrome of the email-app. It is that same origin policy, updated some. HTML5 has a concept of sandboxing iframes, and there is a related standard titled “content security policy”. Read about sandboxing iframes and the content security policy on the html5rocks.com site, and in an article by Boy Baukema on ibuildings.com. The HTML5 feature is very powerful, but should be extended to divs too (if not arbitrary elements). Of course that could take years. It’ll still take a year for all browsers to support it for iframe only, and that might be enough. Web-mail might be crippled for some time to come then. Not so fat applications that are embedding WebKit (or Trident, or Blink).
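For the web-mail case, here is a rough sketch of how a page might wrap an email in a sandboxed iframe using the standard sandbox and srcdoc attributes. The function names are mine, and this only builds the markup; it says nothing about how Gmail or anyone else would actually do it.

```javascript
// Escape a string for use inside a double-quoted HTML attribute value.
// Ampersands must be replaced first so the quote entities are not re-escaped.
function escapeAttribute(value) {
  return value.replace(/&/g, '&amp;').replace(/"/g, '&quot;');
}

// Wrap the email's HTML in an iframe whose scripts may run, but which is
// treated as a unique origin because allow-same-origin is NOT granted.
function sandboxedEmailFrame(emailHtml) {
  return (
    '<iframe sandbox="allow-scripts" srcdoc="' +
    escapeAttribute(emailHtml) +
    '"></iframe>'
  );
}
```

With only allow-scripts granted, the email’s JavaScript can run but cannot reach into the surrounding mail-app’s DOM or cookies, which is the separation argued for above.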

Tokens

Emails coming in via SMTP are marginally authenticated against the alleged sender. SMTP (as Simon suggested) is a message bus for the masses of course. Something purporting to come from salesforce.com may not have done so, as all email users have experienced in different ways in the past. Should content and JavaScript be allowed to interact with alleged same-origin endpoints without further authentication? No, obviously. A few years ago, I proposed a change to fix SMTP, but it didn’t grab mindshare, so what we have now is going to stay, including the potential for inauthentic emails. Instead, for the enhanced email future I’m outlining, we’re going to need a token that comes with the email that authenticates the recipient to some meaningful limit. The token is made by the backend application as it writes an email to send, and not by humans. In the case of the time-sheet app, the authentication could be single use, or time limited, even if the action were to open a browser to a pertinent URL. If the recipient were, in their default browser, already logged in to the same app, the redirect action (as it does today) would offer an unrestrained experience for the site. But for the token’s use against a set of end-points for AJAXy operation in an embedded WebKit, the server should only trust people/token combinations it knows about, and for a short period of time, which includes “use once” in some configurations.
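One way the backend could mint and check such a token is sketched below, using Node’s built-in crypto module. The token layout (recipient, expiry, nonce, HMAC joined with “|”), the secret and the function names are all assumptions of mine, and the used-nonce set is in memory only; it also assumes the recipient address contains no “|”.

```javascript
const crypto = require('crypto');

// Hypothetical server-side secret; in practice it would come from config.
const SECRET = 'replace-with-a-real-secret';
// Nonces already redeemed, to enforce "use once" (in memory for the sketch).
const usedNonces = new Set();

function hmacOf(payload) {
  return crypto.createHmac('sha256', SECRET).update(payload).digest('hex');
}

// Made by the backend as it writes the email: binds recipient, expiry, nonce.
function issueToken(recipient, ttlMs, now = Date.now()) {
  const nonce = crypto.randomBytes(16).toString('hex');
  const payload = [recipient, now + ttlMs, nonce].join('|');
  return payload + '|' + hmacOf(payload);
}

// Checked by the backend when the email's JavaScript calls an end-point.
function redeemToken(token, recipient, now = Date.now()) {
  const parts = token.split('|');
  if (parts.length !== 4) return false;
  const [who, expires, nonce, mac] = parts;
  const expected = Buffer.from(hmacOf([who, expires, nonce].join('|')));
  const given = Buffer.from(mac);
  if (given.length !== expected.length) return false;
  if (!crypto.timingSafeEqual(given, expected)) return false; // forged
  if (who !== recipient) return false;       // wrong person
  if (now > Number(expires)) return false;   // time limit passed
  if (usedNonces.has(nonce)) return false;   // already used once
  usedNonces.add(nonce);
  return true;
}
```

A configuration that allows several uses within the time limit would simply skip the nonce check.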

It would be best if the token were sent up as an HTTP header, but there is no reason why the JavaScript app should not have access to it itself.

Same Origin (extended)

Whereas same origin today for browser applications is simply a domain to some level of granularity, there’s perhaps a need for a more fine-grained model. How about a time-sheet email that arrives with HTML+JavaScript as the payload, and https://salesforce.com/a/thoughtworks.com/timesheets/ as the origin? End points that are consequently hit will be exclusively against that URL prefix. This could be one of the meta-tags for the HTML in the payload that is the content of the email.
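The check a mail-app might apply before letting a request out could be as small as this sketch. It assumes the declared prefix has already been read from the (hypothetical) meta-tag, and the function name is mine; it relies on the standard URL class to normalise things like “..” segments.

```javascript
// Returns true only if requestUrl falls under the declared URL prefix.
// Relative URLs are resolved against the prefix first.
function isWithinDeclaredOrigin(declaredPrefix, requestUrl) {
  const prefix = new URL(declaredPrefix);
  const target = new URL(requestUrl, declaredPrefix);
  // Force a trailing slash so "/timesheets" cannot match "/timesheets-evil".
  const prefixPath = prefix.pathname.endsWith('/')
    ? prefix.pathname
    : prefix.pathname + '/';
  return (
    target.origin === prefix.origin &&
    target.pathname.startsWith(prefixPath)
  );
}
```

So a call to entries/42 relative to the timesheet prefix would pass, while another customer’s path on the same salesforce.com domain would not.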

General App authentication

Maybe there’s another meta-tag that allows the mail-app to seek pre-approval of the app via a URL for some signing chain. The mail-app could ask a nominated server about whether an app was approved or not, and recurse back to some root level signing for the site as a whole. Public keys could be retrieved and trusted, such that a digest of the content of an email could be verified as authentic on receipt. Maybe the end-user, just once, goes into an admin console for the mail-app and actions an “I trust this certificate”. Maybe the end-user’s employer suggests that this is the thing to do during onboarding or a new-hire introduction. For non-corporate situations, maybe banks and the like also inform end users about certificates to trust. It could be fully automatic of course - I’m not a PKI/security expert. At least not for new uses. Hopefully there’s a place for free self-signed certificates, given the functionality I’m suggesting is woven into the URL design of ordinary web server applications.
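The verify-on-receipt step could look something like the sketch below, again with Node’s built-in crypto module. It uses an Ed25519 key pair generated on the spot purely for illustration, skips the whole question of certificate chains, and the function names are mine. Ed25519 signs the content directly rather than a separately computed digest.

```javascript
const crypto = require('crypto');

// Sender side: sign the email's HTML+JavaScript payload with a private key.
function signPayload(payload, privateKey) {
  return crypto
    .sign(null, Buffer.from(payload, 'utf8'), privateKey)
    .toString('base64');
}

// Mail-app side: verify against a public key the user has chosen to trust.
function verifyPayload(payload, signatureBase64, trustedPublicKey) {
  return crypto.verify(
    null,
    Buffer.from(payload, 'utf8'),
    trustedPublicKey,
    Buffer.from(signatureBase64, 'base64')
  );
}

// Throwaway key pair standing in for the sender's real signing key.
const { publicKey, privateKey } = crypto.generateKeyPairSync('ed25519');
const emailPayload = '<html><body>Timesheet for this week</body></html>';
const signature = signPayload(emailPayload, privateKey);
```

The mail-app would only enable JavaScript for an email when verifyPayload returns true for a key the user (or their employer) has already trusted.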



Published

January 30th, 2014