Paul Hammant's Blog: Google App Engine for Java with Rich Ruby clients
I've been testing Google App Engine (GAE) for the last few months. Not the tried-and-tested Python version, but the new Java version. The short story is that you upload your standard-ish WAR-file web app to the Google AppEngine stack, and it just runs. Well, it's not quite as simple as that; there are some things to remember.
As an added bonus, this blog article is also about Rich Internet Applications served from Google App Engine, specifically ones written in Ruby Domain Specific Languages.
The app I am using to test GAE.
It is a small 'email reading' application, copied from an IBM DeveloperWorks article and morphed a little to use more modern frameworks and libraries.
The idea is that you log in (not using Google Accounts in this case), view your inbox or sent messages in list form, and read individual messages or compose new ones. All the emails are contrived, and you can't really send them anywhere in the demo app. The rough design is pretty simple.
The web app (as copied from the DeveloperWorks article) receives JSON in response to GET/POST requests to the server. I am using the fantastic XStream library to produce that JSON, revisiting some code I donated to it a few years ago that turns object trees into JSON instead of XML. Here are screenshots mapping to the mockups above.
Ruby Client using Swiby
Flushed with the success of using XStream for JSON serialization, and working with Jean Lazarou (who leads the Swiby project from Belgium), we (well, he) made a version of the applicable 'driver' in XStream that writes similar trees in Ruby syntax. Is that called RSON? With that we could take Swiby (Swing + Ruby + DSL) and make a Rich Internet App version of the same application. We also generate the Ruby class definitions to reduce code further, which means that Ruby's require functionality is overridden to GET classes over HTTP. Well, actually, the hard work was Jean's again, with me being customer and QA. The app works well. Watch the video of it in use, or read Jean's blog entry on the same app. This stuff is fantastic, and I really hope it is the wave of the future. If people can look past the distraction that is JavaFX, they will realize that JRuby is the right language to interpret an elegant DSL over Swing.
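XStream's actual JSON and Ruby drivers are not reproduced here. As a rough, hypothetical illustration of the idea of one object tree written out in two syntaxes, the miniature writer below (invented names, not XStream's API) renders nested maps, lists, strings and numbers either as JSON or as a Ruby hash literal.

```java
import java.util.List;
import java.util.Map;

// Hypothetical mini "tree writer": NOT XStream's API, just the idea of one
// object tree rendered in two syntaxes (JSON, or a Ruby hash literal).
class TreeWriter {
    private final boolean ruby; // true = Ruby literal syntax, false = JSON

    TreeWriter(boolean ruby) { this.ruby = ruby; }

    String write(Object node) {
        StringBuilder sb = new StringBuilder();
        append(sb, node);
        return sb.toString();
    }

    private void append(StringBuilder sb, Object node) {
        if (node == null) {
            sb.append(ruby ? "nil" : "null");
        } else if (node instanceof Map) {
            sb.append('{');
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                if (!first) sb.append(", ");
                first = false;
                // JSON keys are quoted strings; in Ruby they become symbols
                // (assumes keys are valid Ruby identifiers)
                sb.append(ruby ? ":" + e.getKey() + " => " : quote(e.getKey().toString()) + ": ");
                append(sb, e.getValue());
            }
            sb.append('}');
        } else if (node instanceof List) {
            sb.append('[');
            boolean first = true;
            for (Object item : (List<?>) node) {
                if (!first) sb.append(", ");
                first = false;
                append(sb, item);
            }
            sb.append(']');
        } else if (node instanceof Number || node instanceof Boolean) {
            sb.append(node); // same literal form in JSON and Ruby
        } else {
            sb.append(quote(node.toString()));
        }
    }

    // Minimal escaping only: backslashes and double quotes
    private static String quote(String s) {
        return '"' + s.replace("\\", "\\\\").replace("\"", "\\\"") + '"';
    }
}
```

With a `LinkedHashMap` holding `subject` -> "Hello", `to` -> ["bob"] and `id` -> 7, the JSON form is `{"subject": "Hello", "to": ["bob"], "id": 7}` and the Ruby form is `{:subject => "Hello", :to => ["bob"], :id => 7}`. Escaping here is deliberately minimal; a real driver would also have to handle Ruby's `#{}` interpolation inside double-quoted strings.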
Ruby Client using Shoes
The skinny for Java web apps
It's mostly just a classic Java WAR-file app, and your familiar web frameworks will work. ThoughtWorks colleagues who also tested this have tried out JRuby on Rails (Ola Bini), Clojure/Compojure (John Hume) and GWT (Sriram Narayan). Persistence is another thing altogether: forget Hibernate (for now, as I expect Gavin King to do something). Deep down, Google BigTable is used, and you get a choice of layers on top of it that make it easier for Java developers. The easiest of all is JDO (or JPA), and an open-source tool called DataNucleus is going to need to be in the WAR to make it happen.
You are going to use your normal development tools (IDEA, Eclipse, Maven builds), though some things need to be backfilled: two crucial Google jars are not in the Maven repositories. For caching, Google have taken the JCache API and backfilled it with industrial-strength caching that's no doubt leveraging their secret stuff. The jar for that API in the Maven repositories, though, is marked as "1.0-dev-2". Someone needs to release an official, final jar to the Maven repositories.
A quick diagram of my app and GAE is below. Of the many heresies in my app, one is that the BigTable-stored object transparently becomes a Java object (via JDO) and then transparently becomes JSON or RSON (via XStream). No facades, no reduction in the number of fields hitting the clients, sorry! The good news is that we are just binding URLs to objects/methods and disambiguating parameters based on the methods' parameter names. PicoWebRemoting is worth a read.
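PicoWebRemoting's own API is not shown here. A hypothetical sketch of the binding idea: a path such as `json/ReloadData/doIt` names a registered object and one of its public methods, and request parameters are matched to method arguments by name. Because compiled Java classes don't reliably keep parameter names, this sketch uses an invented `@Param` annotation to declare them; `UrlDispatcher` and `Mailbox` are illustrative names, not the demo app's classes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.HashMap;
import java.util.Map;

// Hypothetical annotation naming a method parameter for URL binding
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface Param { String value(); }

// Hypothetical dispatcher: "<prefix>/<component>/<method>" -> component.method(args...)
class UrlDispatcher {
    private final Map<String, Object> components = new HashMap<>();

    void register(String name, Object component) {
        components.put(name, component);
    }

    Object dispatch(String path, Map<String, String> requestParams) throws Exception {
        String[] parts = path.split("/"); // "json/Mailbox/read" -> [json, Mailbox, read]
        if (parts.length != 3) throw new IllegalArgumentException("bad path: " + path);
        Object target = components.get(parts[1]);
        if (target == null) throw new IllegalArgumentException("no component: " + parts[1]);

        Method best = null;
        Object[] bestArgs = null;
        for (Method m : target.getClass().getMethods()) {
            if (!m.getName().equals(parts[2])) continue;
            Object[] args = bind(m, requestParams);
            // Disambiguate overloads: prefer the one that consumes the most named params
            if (args != null && (best == null || args.length > bestArgs.length)) {
                best = m;
                bestArgs = args;
            }
        }
        if (best == null) throw new NoSuchMethodException(parts[1] + "." + parts[2]);
        return best.invoke(target, bestArgs);
    }

    // String arguments in declaration order, or null if any named value is missing
    private static Object[] bind(Method m, Map<String, String> requestParams) {
        Parameter[] declared = m.getParameters();
        Object[] args = new Object[declared.length];
        for (int i = 0; i < declared.length; i++) {
            Param p = declared[i].getAnnotation(Param.class);
            String value = (p == null) ? null : requestParams.get(p.value());
            if (value == null) return null;
            args[i] = value; // this sketch only supports String parameters
        }
        return args;
    }
}

// Illustrative component with an overloaded method
class Mailbox {
    public String read(@Param("folder") String folder) {
        return "listing " + folder;
    }

    public String read(@Param("folder") String folder, @Param("id") String id) {
        return "message " + id + " in " + folder;
    }
}
```

A request for `json/Mailbox/read?folder=inbox` would reach the one-argument overload, while adding `&id=42` would select the two-argument one.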
Much like the Python AppEngine, you deploy by running a shell script. It takes your unzipped WAR file (with the extra manifest) and pushes it up to the GAE service. If it's your first deployment, or it has the same "version" string as the previously live deployment, then it goes live immediately. If it has a different version string, then you have to go to the console to change which one is mapped as the default. Given that all versions are available, you're most likely going to want to avoid numerical version strings; otherwise someone willing to lengthen a domain name may stumble upon one of the non-default ones. This is an interesting feature: you could deploy an entirely different admin app to one of those and purposely never make it the default, and it would have access to the same data. See the "18.latest" part of the URL below, and imagine the possibilities.
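A small sketch of the version-string point, assuming the `<version>.latest.<app-id>.appspot.com` host pattern implied by the "18.latest" URL, and assuming version labels may contain lowercase letters, digits and hyphens. `VersionLabels` and its methods are hypothetical helpers, not part of the SDK.

```java
import java.security.SecureRandom;

// Hypothetical helpers contrasting guessable and non-guessable version labels
class VersionLabels {
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";
    private static final SecureRandom RANDOM = new SecureRandom();

    // Assumed host pattern for a non-default version
    static String versionHost(String version, String appId) {
        return version + ".latest." + appId + ".appspot.com";
    }

    // e.g. "admin-k3f9x2q7m1": starts with a letter, hard to stumble upon
    static String randomLabel(String prefix, int length) {
        StringBuilder sb = new StringBuilder(prefix).append('-');
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }
}
```

Someone who sees version 18 can simply try 17 and 19; a ten-character random suffix gives 36^10 possibilities. That is obscurity rather than access control, so an admin app deployed this way should still check who is calling it.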
My build complexity
I'm using Maven. It's one of those things that you either love or hate. I wish the build grammar were JRuby or Groovy, but I'll accept its XML imperfections. The app we're talking about has public source and is buildable, but there are some caveats. Follow this elaborate set of steps:
- First off, do "mvn install" and watch the build go. Watch the unit tests and then the integration tests (JBehave + Selenium) run. That's standalone, completely outside of the AppEngine service and also completely outside of the AppEngine SDK. I've not used anything proprietary (like Google Accounts), so I don't need to worry there: JDO and JCache are standard jars and exist in the Maven repositories.
- Next up, edit src/main/webapp/WEB-INF/appengine-web.xml, change the application ID to one you're authorized to deploy to, and run "mvn jetty:run-war" from the command line.
- You can now navigate to http://localhost:8080/remoting-ajaxemail-webapp/ and see it working. Well, first load the data by visiting json/ReloadData/doIt (once), then go back to the root page.
- Now add the two Google jars that are not in the Maven repositories, appengine-api.jar and datanucleus-appengine-1.0.0.final.jar. They go into target/remoting-ajaxemail-webapp/WEB-INF/lib/
- Lastly, run "appcfg.sh update target/remoting-ajaxemail-webapp" to push it up to GAE (and sign into your console to see it).
This complexity will get better when Jason van Zyl and gang wrap the GAE SDK with Maven plugins. Knowing Jason, unless there is something very pressing, it'll be done in a few days. I have also had to jump through some hoops in code. The first *huge* one is that I have mocked out JDO in a minimalistic way. This allows me to build and test in a normal Maven build. It also allows me to deploy locally without Google's dev_appserver.sh script (Jason will make a plugin here). Maybe DataNucleus has a full-featured JDO implementation that is essentially in-memory. The second build issue is that the JCache dependency wants to be resolved by the AppEngine SDK. It throws a NullPointerException outside of the SDK's hosting environment, so in my setup code I catch it and use a HashMap implementation instead.
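The JCache fallback can be sketched as follows. The JCache `Cache` type GAE supports is itself a `java.util.Map`, which is what makes a `HashMap` a drop-in substitute. To keep the sketch self-contained, the real factory call is represented by a `Supplier`; in the deployed app that supplier would wrap the JCache `CacheManager` lookup. `CacheFallback` is a hypothetical name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical wrapper: try the hosted cache, fall back to an in-process map
class CacheFallback {
    static Map<Object, Object> createCache(Supplier<Map<Object, Object>> hostedCacheFactory) {
        try {
            return hostedCacheFactory.get();
        } catch (NullPointerException outsideSdk) {
            // Outside the AppEngine hosting environment the factory can throw NPE;
            // a synchronized HashMap keeps unit and integration tests running
            return Collections.synchronizedMap(new HashMap<>());
        }
    }
}
```

Catching a `NullPointerException` is blunt, but it mirrors the workaround described: in production the hosted factory succeeds, so the fallback only ever runs in local Maven builds.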
With respect to XStream, I'm running a patched version that does not fail Google's security checks. In short, XStream 1.3.1 will not work with GAE, but 1.3.2-SNAPSHOT will (at least my patched build of it). Hopefully the XStream gang can get a fix into 1.3.2 soon.
Lastly, I was hoping to release another version of PicoContainer and PicoContainer Web Remoting today, but it could well be tomorrow instead. The build will work just fine then, for those courageous enough to check out the source and build it to deploy to GAE.
Anyway, my application is up at http://ph-jdo.appspot.com; try it out.
Obviously the appengine-web.xml file is an extra file beyond Sun's specifications. But there are other incompatibilities waiting for you, not all of them against a specification as such...
Multiple concurrent requests from/for the same session could have concurrency issues.
Be aware also that multiple concurrent requests from the same client are not necessarily going to hit the same servlet container at the back end. It all apparently comes from the same domain name (there is no resource forwarding going on), but there will most likely be different servlet container instances responding to the requests. This is not going to be a problem for a correctly written stateless app, but one that leverages the session for storing attributes might experience concurrency issues from two writes to the same logical resource within that apparent map. The workaround is to ensure that your application sends such requests serially. It's important to note that most requests that come in a rush are not typically going to hit your servlets; most are for GIFs, CSS, HTML, etc.
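A deterministic simulation of that lost-update risk, assuming (for illustration only) that each container loads a copy of the session, mutates it, and writes the whole attribute map back at the end of the request. Nothing here is GAE code; `SessionStore` is a hypothetical stand-in for the replicated session storage.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for session storage shared by all containers
class SessionStore {
    private final Map<String, Map<String, String>> sessions = new HashMap<>();

    // Each container works on its own copy of the session attributes
    Map<String, String> load(String id) {
        return new HashMap<>(sessions.getOrDefault(id, new HashMap<>()));
    }

    // The whole attribute map is written back at the end of the request
    void save(String id, Map<String, String> attrs) {
        sessions.put(id, new HashMap<>(attrs));
    }

    // Two requests whose load/save steps interleave: the last save wins
    static Map<String, String> interleaved(SessionStore store) {
        Map<String, String> a = store.load("s1"); // request A, container 1
        Map<String, String> b = store.load("s1"); // request B, container 2
        a.put("draft", "hello");
        b.put("folder", "sent");
        store.save("s1", a);
        store.save("s1", b);                      // silently discards A's "draft"
        return store.load("s1");
    }

    // The same two requests sent serially: both writes survive
    static Map<String, String> serial(SessionStore store) {
        Map<String, String> a = store.load("s1");
        a.put("draft", "hello");
        store.save("s1", a);
        Map<String, String> b = store.load("s1");
        b.put("folder", "sent");
        store.save("s1", b);
        return store.load("s1");
    }
}
```

The serial version is the workaround in miniature: the client waits for one session-mutating request to finish before sending the next.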
At this stage, we'll assume that two successive requests to the same backend session are going to be fine. Google (we assume) have a perfectly distributed map that replicates state ahead of satisfying requests.
I have an app up that (I hope) demonstrates this. See http://ph-tryout.appspot.com/
I sure hope Google change this in future versions.
Session serialization clashes between versions.
If you deploy a new version of your app and go to the console to flip the default version, you may experience exceptions because of changed or missing classes in the new app. This happens outside of your control, that is, if you've marked the app as requiring sessions: before GAE hands control to your servlet/filter/listener, it will already know that a session cannot be deserialized, and will have halted. Remedies include going to the data viewer to delete sessions, smarter session storage designs that use only instances of JDK classes, pledges never to delete or change serializable fields and classes, or textual designs (like XStream, ignoring the costs of that). I hope that Google add a deployment switch in later versions called "in-flight sessions", with options like "kill", "bleed-over" and whatever else makes sense. Perhaps we could also see a migration facility: you get to supply a jar that honors an API that pre-processes unconverted sessions just prior to use. That last one would require some clever classloader tricks to allow old and new versions of the same class in the mix, or some class-name rewriting tricks.
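One of the listed options, keeping session state to JDK types in a textual form, might look like this sketch using `java.util.Properties`. Because only strings are stored, a redeployed version with changed or missing application classes can still read what an older version wrote. `TextSession` is a hypothetical helper name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper: session attributes kept as plain strings in text form
class TextSession {
    static String encode(Map<String, String> attrs) throws IOException {
        Properties props = new Properties();
        props.putAll(attrs);
        StringWriter out = new StringWriter();
        props.store(out, null); // also writes a timestamp comment line
        return out.toString();
    }

    static Map<String, String> decode(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        Map<String, String> attrs = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            attrs.put(key, props.getProperty(key));
        }
        return attrs;
    }
}
```

The encoded text could be stored as a single session attribute (a `String`, which any version can deserialize) or pushed to the datastore instead.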
Maybe it's just safer not to use the servlet session as is, and instead push everything that could otherwise be stored in the session to the datastore, while also relying on browser state. That would perhaps scale better too.
Services lock-in.
Google adds loads of services to the GAE mix. It's the same set as for the Python AppEngine, but via Java facades. These facades are looked up statically; we'll come back to that. The issue here is that these are non-standard services. Some are proprietary to Google; others fit another standard but are limited. Pure-Google ones include the facade to Google Accounts. They are so good (with so many consequential features and problems taken care of) that you'll use them without hesitation, and shave a month off your person-days estimate for the app you were making. Others, like BigTable, give pause for thought. Specifically, you have a low-level Google API into it (Ola tried this), or you can go via JDO. I tried that route and it worked for the most part. DataNucleus is the abstraction that Google have used (via their datanucleus-appengine plugin) to adapt JDO to BigTable. It works, but lots is not supported: primary-key strategies are limited, and the full RDBMS SQL gamut is not there. As a consequence, the hardest part of your GAE initiative is going to be getting JDO to work. I expect a slew of open-source layers on top of the low-level BigTable API to spring into being in the months to come. As it happens, DataNucleus and/or Google's implementation of it are the noisiest things writing to the logs, and the things that delay the initial request the most.
To summarize the services issue: it's an issue of being handcuffed to the Google stack. Fur-lined handcuffs, perhaps.
Sandboxing and jars that don't work out of the box.
Google have implemented a very thorough sandbox, no doubt to protect them from malicious code. This was likely easier to do for Java than for the Python version of GAE, as security policies come built into Java. It's a huge and underused feature of Java.
XStream is a library particularly beloved of a section of the Java community. The latest version is 1.3.1, and on initialization anywhere within the realm of GAE it throws exceptions. These are security exceptions that stop the entire app from functioning. Exceptions have a way of curtailing the fun of running code. In the case of XStream, it is possible to fix it to be compatible with GAE (and all security-constrained environments), and to do so in a way that does not change its functionality. I have two different solutions to the XStream problem. They'll be posted to the mailing list shortly, and the dev team can choose between them, or implement a third (superior) solution.
There are going to be other libraries that break: anything that tries to do advanced classloader stuff, tries to access restricted resources or properties, or tinkers with the visibility of certain classes. Other things, like extending ObjectOutputStream, are not allowed either. In the case of XStream, there were a ton of things you'll never need in a server-side servlet environment, which we can live without. Playing with XStream has caused me to lament its initialization design, which is partly static.
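A failure inside a static initializer surfaces as an `ExceptionInInitializerError` and leaves the class unusable, which is why static initialization hurts so much in a sandbox. The general shape of a more tolerant design, sketched independently of either XStream fix: optional features are registered as separate steps, and a step that trips a `SecurityException` is skipped rather than taking the whole app down. `OptionalInit` and the feature names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical initializer: each optional feature is set up independently
class OptionalInit {
    private final Map<String, Runnable> steps = new LinkedHashMap<>();
    private final List<String> enabled = new ArrayList<>();
    private final List<String> skipped = new ArrayList<>();

    void add(String feature, Runnable setup) {
        steps.put(feature, setup);
    }

    void run() {
        for (Map.Entry<String, Runnable> step : steps.entrySet()) {
            try {
                step.getValue().run();
                enabled.add(step.getKey());
            } catch (SecurityException denied) {
                // The sandbox refused something this feature needs;
                // record it and carry on instead of failing everything
                skipped.add(step.getKey());
            }
        }
    }

    List<String> enabled() { return enabled; }
    List<String> skipped() { return skipped; }
}
```

Logging the `skipped` list at startup would make it obvious which capabilities are unavailable in the hosted environment.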
Response Times, and pricing thereof
You're going to obsess over the timings of requests. The first non-static request to the app after (re)deployment is a shocker: eight to fifteen seconds is not unusual. There are timeouts, I think, but I did not reach them. Second and subsequent requests are going to be quicker for a scoped-component app. By scoped component, I mean something that is managed by a container that recognizes instances of things that are mappable to 'application', 'session' and 'request'. Request-level components (AKA actions or beans) are garbage collected at the end of every request. You would not want to create bindings to a datastore layer afresh for every request; if you did, you would see response times of seconds and scream about general performance.
Given that GAE wants to scale you out quite quickly by not directing requests for the same app to the same servlet container (regardless of session), the app is going to be initialized multiple times. Specifically, ServletContextListener.contextInitialized(..) will run more than once, as will any one-time setup you may have been doing for the session in HttpSessionListener.sessionCreated(..). If you have 15 seconds per app start, that will mount up. I hope those instances are not garbage collected too quickly. There's no sharing at the application level across containers, and thus no optimization you can make with functioning instances of things that would map to 'application scope'. It could be that you could use the cache for that, but I have not tried.
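A minimal sketch of the scoping idea, assuming a hypothetical `DatastoreBinding` whose construction is the expensive part: the application-scoped instance is built once per container instance (as in `contextInitialized(..)`), and each request-scoped action reuses it. Every new container instance still pays that cost once. All class names here are illustrative.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical expensive component, e.g. the JDO/datastore setup
class DatastoreBinding {
    static final AtomicInteger CREATED = new AtomicInteger();

    DatastoreBinding() { CREATED.incrementAndGet(); } // imagine seconds of work here

    String find(String id) { return "message " + id; }
}

// Application scope: built once per container instance, like contextInitialized(..)
class AppContext {
    private DatastoreBinding binding;

    void contextInitialized() { binding = new DatastoreBinding(); }

    DatastoreBinding binding() { return binding; }
}

// Request scope: a cheap action created per request and garbage collected afterwards
class ReadMessageAction {
    private final DatastoreBinding store;

    ReadMessageAction(DatastoreBinding store) { this.store = store; }

    String handle(String id) { return store.find(id); }
}
```

Five requests served by one container create the binding once; a second container spun up by GAE creates it again, which is where the repeated startup cost comes from.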
CPU usage down to the millisecond
This is one of the ways that Google prices this stuff. It is an idealized multiplier of the current CPU's rating against a 1.0 that would be for some 1.2 GHz Intel chip from yesteryear. It also recognizes the fact that multiple CPUs might be cooperating on a single request. This is not going to happen in the servlet container, nor in your servlets (or filters or listeners); that's all single-threaded in the traditional way. But as you drop out of Java into things like BigTable, quantum-mechanical magic is involved (or hamsters in running wheels in a parallel universe), and that has to be priced differently. It would be nice to see a future version of GAE split the milliseconds between 'your code' and 'our code'. As a rough guide, though, your CPU time will be about 2x your elapsed time, unless the resources were marked as static (in which case they are close to 0 ms CPU time).
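As a worked illustration of that normalization, assuming CPU time is scaled linearly by clock speed against the 1.2 GHz reference and summed across cooperating cores (Google's precise accounting is not described here, so treat the formula as an assumption), a hypothetical calculator:

```java
// Hypothetical normalization: CPU milliseconds scaled to a 1.2 GHz reference.
// Assumes a simple linear clock-speed ratio; the real accounting may differ.
class CpuAccounting {
    static final double REFERENCE_GHZ = 1.2;

    static double normalizedMs(double cpuMs, double clockGhz) {
        return cpuMs * (clockGhz / REFERENCE_GHZ);
    }

    // Work spread over several cores is summed, so billed CPU can exceed elapsed time
    static double billedMs(double[] perCoreMs, double clockGhz) {
        double total = 0;
        for (double ms : perCoreMs) {
            total += normalizedMs(ms, clockGhz);
        }
        return total;
    }
}
```

Under these assumptions, 100 ms on a 2.4 GHz core counts as 200 ms of reference CPU, and 60 ms plus 40 ms on two such cores also totals 200 ms.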
Caching is going to be a big way of reducing your CPU usage and shortening times. JCache is the supplied mechanism for leveraging Google's scaled-up, clustered goodness. You would make your front-controller servlet (or filter) intercept requests for resources that don't change too often and avoid an access to BigTable, which costs more in elapsed time and has a separate charge/quota just for access. How early in the request your servlet does that is key to cost. Specifically, while fully static resources are near enough 0 ms, caching in your servlet may only bring times down to 50 ms, versus 150 ms without caching.
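A sketch of that cache-first check (the cache-aside pattern), assuming a `Map`-like cache such as the JCache one and a hypothetical datastore lookup function; the counter just shows how many requests actually reached the datastore. `CachingFrontController` is an illustrative name.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical front controller that consults the cache before the datastore
class CachingFrontController {
    private final Map<String, String> cache;          // e.g. a JCache instance
    private final Function<String, String> datastore; // stand-in for a BigTable read
    int datastoreReads = 0;

    CachingFrontController(Map<String, String> cache, Function<String, String> datastore) {
        this.cache = cache;
        this.datastore = datastore;
    }

    String get(String resourcePath) {
        String body = cache.get(resourcePath); // checked first, before any datastore work
        if (body == null) {                    // miss: pay for the datastore once
            datastoreReads++;
            body = datastore.apply(resourcePath);
            cache.put(resourcePath, body);
        }
        return body;
    }
}
```

This only suits resources that rarely change; invalidating or expiring entries when the underlying data changes is left out of the sketch.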
In summary, static resources are the way to go. If you cannot do that, then caching is a must for the right GET requests.
No Dependency Injection?
Given I've spent the last ten years advocating Inversion of Control (IoC) and standing against Singletons, and slightly more recently pioneering Dependency Injection and Constructor Injection in particular, I'm not appreciating the static way that services are located. It is eleven years since Avalon pushed IoC onto the Java scene, and its now-defunct Serviceable (née Composable) interface would have been a lot better than the assortment of technologies that Sun have left as instant legacy in the J2EE space. Take, for example, how to get a reference to a JDO PersistenceManager:
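The static-lookup style in question is typically a small singleton class holding the factory, which every servlet reaches for. In this sketch `PersistenceManagerFactory` and `PersistenceManager` are hypothetical stand-ins for the `javax.jdo` types so that it compiles on its own; with real JDO the factory would come from something like `JDOHelper.getPersistenceManagerFactory("transactions-optional")`. `PMF` and `InboxServlet` are illustrative names.

```java
// Hypothetical stand-ins for the javax.jdo types, so the sketch compiles alone
interface PersistenceManager { void close(); }

interface PersistenceManagerFactory { PersistenceManager getPersistenceManager(); }

// The static-singleton pattern: one global factory, reachable from anywhere
final class PMF {
    // Dummy factory standing in for the real JDOHelper lookup
    private static final PersistenceManagerFactory INSTANCE = () -> () -> {};

    private PMF() {}

    static PersistenceManagerFactory get() { return INSTANCE; }
}

// A servlet-like class that looks its dependency up instead of being given it
class InboxServlet {
    String handle() {
        PersistenceManager pm = PMF.get().getPersistenceManager();
        try {
            return "inbox listed";
        } finally {
            pm.close();
        }
    }
}
```

Unit-testing `InboxServlet` with a fake factory means replacing that global, which is the heart of the complaint.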
I'd rather have such things injected into Servlets and/or Filters and/or Listeners:
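A constructor-injected version of such a servlet might look like the sketch below. The JDO types are again hypothetical stand-ins, and `InboxServlet` is an illustrative name. Servlet containers instantiate servlets through a no-argument constructor, so nothing in the Servlet API would actually call this one; that is exactly the gap being described.

```java
// Hypothetical stand-ins for the javax.jdo types
interface PersistenceManager { void close(); }

interface PersistenceManagerFactory { PersistenceManager getPersistenceManager(); }

// The dependency arrives through the constructor instead of a static lookup
class InboxServlet {
    private final PersistenceManagerFactory pmf;

    InboxServlet(PersistenceManagerFactory pmf) { this.pmf = pmf; }

    String handle() {
        PersistenceManager pm = pmf.getPersistenceManager();
        try {
            return "inbox listed";
        } finally {
            pm.close();
        }
    }
}
```

A unit test can hand it a fake factory directly, for example `new InboxServlet(() -> () -> {})`, with no global state to reset.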
But that is only going to get messy for an ever-growing set of Google services: they would all be constructor parameters, comma separated (in no particular order). Given this is a hand-off between a system (kernel) and a pluggable service that it hosts as a guest, it's probably better to have a very simple API for lower-level Google services:
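The kind of minimal system-to-guest API suggested here might be sketched as a single typed lookup that the container hands to each servlet, filter or listener. `GoogleServices`, `RegistryServices` and the service interfaces are hypothetical names for illustration, not anything GAE provides.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical: one handle through which the platform offers all its services
interface GoogleServices {
    <T> T get(Class<T> serviceType);
}

// Hypothetical service interfaces the platform might expose
interface MailService { String send(String to, String body); }

interface UrlFetchService { String fetch(String url); }

// Registry-backed implementation: adding a service needs no new API methods
class RegistryServices implements GoogleServices {
    private final Map<Class<?>, Object> services = new HashMap<>();

    <T> RegistryServices with(Class<T> type, T impl) {
        services.put(type, impl);
        return this;
    }

    @Override
    public <T> T get(Class<T> serviceType) {
        Object impl = services.get(serviceType);
        if (impl == null) throw new IllegalStateException("no service: " + serviceType.getName());
        return serviceType.cast(impl);
    }
}

// A guest component receives the single handle, not a growing list of constructor args
class ComposeServlet {
    private final GoogleServices services;

    ComposeServlet(GoogleServices services) { this.services = services; }

    String compose(String to, String body) {
        return services.get(MailService.class).send(to, body);
    }
}
```

In a test, `new RegistryServices().with(MailService.class, (to, body) -> "queued for " + to)` is all the wiring `ComposeServlet` needs.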
This way Google could expand the range of services quite easily (and secretly). The Dependency Injection folks and black-belt unit-testers can feel appeased. I've blogged before on the stupid lack of DI in the Servlet space.
I love the Java version of AppEngine. I see myself deploying lots of apps to it, lots for me and my clients. It's dragging Java towards the casual deployment that PHP/Perl pioneered more than fifteen years ago. It is something that Sun should have done a decade ago, too. Enterprises are going to knee-jerk deploy something to GAE/J just so that they can have the experience, then plan which of their forthcoming and current apps should go there long term. This blog is hosted on Rackspace's Slicehost service, which has given me lots more scope for deployment (and I very much love Slicehost too), and many organizations are comfy with Rackspace. You cannot help but feel that new clouds on the horizon are a good thing for competition, though.
A word from our sponsor: ThoughtWorks is ready and waiting for your call for any cloud engineering (and Agile) need!
The Visicalc of the modern age
In a way, Google have made the Visicalc of the modern age: an easy place to deploy apps to. Companies may have the full "Google Apps" package and be leveraging AppEngine in addition to Gmail, calendaring, docs, spreadsheets, voice... and whatever else is in the mix. They may well control which staff can deploy apps with a 'theircompany.com' suffix, but they are not going to stop the motivated from deploying their own apps in their own AppEngine accounts and sharing them. That's why the spreadsheet analogy is interesting. For the longest time, people have been proliferating spreadsheets. It's not that the average Excel 'expert' will be able to develop and deploy to AppEngine/AppSpot tomorrow, but we will see another land rush of software types making applications that will wizard you an AppEngine solution. Indeed, some will have desktop-bound 'admin apps' that do all the interop for deployment and management over the wire, both to the console itself and directly to built-in APIs. As well as wizard and drag-and-drop based development and deployment, we'll see for the first time a real component marketplace for Java. Again, Sun should have done this a decade ago, to mirror the one that Microsoft has seen built. Now that we're ignoring EJB totally, we will find it a lot easier to have off-the-shelf components that work with each other. To complete the Visicalc analogy, Google should be sure to adapt, improve and cheapen their offering so that a Lotus 1-2-3 or Excel is not waiting in the wings to sideline it.
But as you think some more about GWT (and it has been some months since I last used it), you remember that its testing and prototyping side has a truly thick UI too, one that is based on SWT (the UI technology that underpins Eclipse and wraps each platform's native widgets). If that mode were available for production deployment, then you would have one technology that can target thick and thin. It could be both web based and truly rich. I'm reminded of my proposal to the nascent ThoughtWorks Studios many years ago. From a field of some thirty product idea submissions, they selected "Mingle" and "Twist" as the tools to be developed. These are both tools for developers. Mine was an application for end users, born from a bitter experience that all ThoughtWorks staff had: exposure to Lotus Notes. The thinking was that Microsoft sold a ton of Exchange/Outlook licenses, with IBM a close second with their (many people hate it) Lotus Notes. ThoughtWorks could make a modern equivalent of Lotus Notes: a full-blown offline and online email and collaboration platform, with easy wizards for constructing apps to be deployed on it. You know the types of apps: timesheets, expenses, skills databases, hiring workflow, room booking, sandwich ordering, etc. So if Google made changes to GWT to make it a viable thick/desktop/offline technology, then AppEngine might shift up a gear with online/offline apps. I'd much rather program those types of apps in Swiby, though, which was part of my groupware product proposal (as was Cozmos, for those who are interested).
Shoes and Swiby, if they are to take a share of the RIA market, are going to need to have the feel of web browsers. That means URL fields, back/forward buttons, stop/refresh and bookmarks. Swiby has had that for over a year in the main version of "Sweb" (not the custom version we've crafted for interop with PWR and GAE). Shoes needs it from scratch. For example, you load the Shoes app, then use File->Open to load the initial (local) script. Shoes needs to work with remote .rb scripts, with HTTP as a file location mechanism, just like Firefox (etc.). _why is not a fellow to sit still, so wait and see.