In part 1 of this article, we learnt how to create a new Google App Engine project in Eclipse, integrate the Weld and JSF libraries, run the project locally and finally how to deploy it to the GAE production environment. This second part will look at some of the issues faced when developing a GAE application, particularly when coming from a Java EE development background.

First a disclaimer - while part 1 of this article was more a step by step guide, this part is more of a random collection of thoughts on various aspects of the GAE development process. I still consider myself quite a noob in this area, so if you think you have better information for any of the following topics please let us know in the comments area.

Let's get started! We'll begin by looking at one of the most important things, the persistence layer.

The Google App Engine datastore

If you're used to working with relational databases, then the GAE datastore might be a bit confusing at first. Although Google supports both JDO (don't ask me why) and JPA APIs for datastore access, you need to approach data access in GAE with a different mindset. The most important thing to remember is that the App Engine datastore is designed for scalability, not performance. As far as application architecture goes, my recommendation is to try and design your app to be based on simple, single-table queries filtered to return just the results you need. For data that seldom changes, I recommend that you avoid the database wherever possible and use a cached result instead (I'll cover this a bit later). To be a little bit more particular about what the datastore doesn't support, here's a brief list:

  • Owned many-to-many relationships, and unowned relationships
  • Join queries - you cannot filter by a child entity's field when querying its parent.
  • Aggregation queries such as group by, sum, having, etc

Primary Keys

There are four different options available for primary key values. The easiest way is to just use a Long:

   @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
   public Long getId() {
      return id;
   }

However, if you want to reference your entity from other entities, you should use an encoded String, which requires an additional annotation on the primary key field:

   @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
   @Extension(vendorName="datanucleus", key="gae.encoded-pk", value="true")
   public String getProjectId() {
      return projectId;
   }

By using an encoded String primary key value, you can then create references using the standard JPA annotations:

   @ManyToOne
   @JoinColumn(name = "PROJECTID")
   public Project getProject() {
      return project;
   } 

Entity Relationships

By default, relationships to other entities defined via the @OneToOne or @ManyToOne annotations are lazy-loaded. This means that when you perform a query on the parent entity, the child entity isn't loaded until you explicitly read the child property from the parent. Obviously this is going to cause an extra round trip to the database, which may have performance implications. My suggestion here may be a little counter-intuitive, but in cases like this where you want to reduce calls to the database it may pay to de-normalize your database structure if possible.

Alternatively, you may wish to keep lookup data (for example records in COUNTRY or STATE tables in the case of address entities) cached, and simply store the primary key values of the lookup records in the parent table, rather than an object reference. It means a little more work, but it is probably worth it for the performance benefits.
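
As a rough sketch of the lookup-data idea, assuming a hypothetical Address class that stores only a country ID and a hypothetical CountryLookup holder that is filled once from the datastore (neither name comes from this article, and persistence annotations are left out so the sketch compiles on its own):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory lookup table, filled once from the datastore.
final class CountryLookup {
   private final Map<Long, String> namesById;

   CountryLookup(Map<Long, String> namesById) {
      // Defensive copy so callers cannot change the cached lookup data afterwards
      this.namesById = Collections.unmodifiableMap(new HashMap<Long, String>(namesById));
   }

   // Returns the country name, or null if the ID is unknown
   String getName(Long countryId) {
      return namesById.get(countryId);
   }
}

// Hypothetical parent entity: it stores the lookup record's primary key,
// not an object reference to the lookup record.
final class Address {
   private final String street;
   private final Long countryId;

   Address(String street, Long countryId) {
      this.street = street;
      this.countryId = countryId;
   }

   String getStreet() { return street; }
   Long getCountryId() { return countryId; }

   // Resolves the name from the cached lookup instead of a second datastore read
   String getCountryName(CountryLookup lookup) {
      return lookup.getName(countryId);
   }
}
```

In a real application the lookup map could just as well live in Memcache (covered in the Caching section); the point is only that resolving the country never touches the datastore.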

Creating the EntityManagerFactory

This is a very expensive operation, and should only be done once. The recommended way is to store the EntityManagerFactory instance in a static field, like this:

public final class EMF {
   private static final EntityManagerFactory emfInstance = Persistence.createEntityManagerFactory("transactions-optional");
   
   public static EntityManagerFactory get() {
      return emfInstance;  
   }
}

To get EntityManager instances you can use a producer method. Since we don't support conversations in GAE yet (see the Unsupported features section below) the following code shows a request scoped producer:

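import javax.enterprise.context.RequestScoped;
import javax.enterprise.inject.Disposes;
import javax.enterprise.inject.Produces;
import javax.persistence.EntityManager;

public class EntityManagerProducer  {

   // One EntityManager per request, created from the shared factory
   @Produces @RequestScoped EntityManager createEntityManager() {
      return EMF.get().createEntityManager();
   }

   // Called by the container when the request scope ends
   public void close(@Disposes EntityManager entityManager) {
      entityManager.close();
   }
}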

Detaching entities

This one can be quite a gotcha. In GAE, managed entities may have hidden references back to certain database objects, such as the Query that loaded them. This causes problems with session serialization because those database objects aren't serializable. With JDO this isn't such a problem because it has a method called detachCopy() which returns a copy of the object that is detached from the persistence context. Unfortunately, the JPA spec only introduced the detach() method recently (in version 2.0), and at present it isn't available in GAE.

What this means is that if you wish to cache objects you load from the database, you need to either clone the entities in question or create DTOs that contain only the properties you wish to cache. Hopefully GAE will support JPA 2.0 in the future.
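
Here's a minimal sketch of the DTO approach. BlogEntity is a hypothetical stand-in for a managed entity and BlogSummary is a hypothetical DTO; neither comes from a real API, and the JPA annotations are omitted so that the sketch compiles on its own:

```java
import java.io.Serializable;
import java.util.Date;

// Stand-in for a managed entity; in the real app this would be the JPA-mapped class.
class BlogEntity {
   private final Long blogId;
   private final String title;
   private final Date entryDate;

   BlogEntity(Long blogId, String title, Date entryDate) {
      this.blogId = blogId;
      this.title = title;
      this.entryDate = entryDate;
   }

   Long getBlogId() { return blogId; }
   String getTitle() { return title; }
   Date getEntryDate() { return entryDate; }
}

// Plain serializable DTO holding only the properties we want to cache.
final class BlogSummary implements Serializable {
   private static final long serialVersionUID = 1L;

   private final Long blogId;
   private final String title;
   private final Date entryDate;

   private BlogSummary(Long blogId, String title, Date entryDate) {
      this.blogId = blogId;
      this.title = title;
      this.entryDate = entryDate;
   }

   // Copies values out of the entity so no reference to the managed instance is kept
   static BlogSummary from(BlogEntity entity) {
      Date date = entity.getEntryDate();
      return new BlogSummary(entity.getBlogId(), entity.getTitle(),
            date == null ? null : new Date(date.getTime()));
   }

   Long getBlogId() { return blogId; }
   String getTitle() { return title; }
   Date getEntryDate() { return entryDate; }
}
```

Because the DTO holds only a Long, a String and a copied Date, it can safely go into the session or the cache without dragging any persistence internals along with it.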

Indexes

Table indexes are configured in a file called datastore-indexes.xml in the WEB-INF directory. There is a property in this file called autoGenerate which if set to true is supposed to detect the queries that you execute when running the application locally and automatically create the necessary indexes. I found this a little flaky in practice and ended up having to configure some indexes manually, which is as simple as adding an entry to datastore-indexes.xml containing the fields of the index and their sort direction. Here's an example:

<?xml version="1.0" encoding="utf-8"?>
<datastore-indexes autoGenerate="true">
    <datastore-index kind="Comment" ancestor="false">
        <property name="blogId" direction="asc" />
        <property name="commentDate" direction="asc" />
    </datastore-index>
</datastore-indexes>

Once your application is deployed to production, you can view the status of your indexes from the GAE dashboard; simply click the 'Datastore Indexes' link in the left-hand column. Here's a heads up: index creation in GAE is SLOW! Even if your table contains no data whatsoever, it can literally take hours to create a simple index. So don't worry if it seems like Google has forgotten to create your index; just be patient.

Security

A great feature of GAE is its integration with the Google Accounts API. Why bother creating and maintaining user and role tables, user registration views, 'I forgot my password' views, CAPTCHA, an e-mail facility for registration confirmation and so on, when you can let Google do all that hard work for you? By using the Google Accounts API you allow anyone with a Google account to use your application, meaning you get all the advantages that come with being able to uniquely identify a visitor without any of the maintenance overhead.

Using the Google Accounts API to authenticate is a piece of cake. My recommendation is to create a simple Identity bean which takes care of the security-related stuff for you. Start by creating a method called getLoginUrl() which generates a URL that the user can click to authenticate:

@Named @SessionScoped
public class Identity implements Serializable {
   // Session-scoped beans must be serializable (passivation capable) in CDI
   private static final long serialVersionUID = 1L;

   public String getLoginUrl() {
      ExternalContext ctx = FacesContext.getCurrentInstance().getExternalContext();
      HttpServletRequest request = (HttpServletRequest) ctx.getRequest();
      HttpServletResponse response = (HttpServletResponse) ctx.getResponse();
      
      UserService userService = UserServiceFactory.getUserService();
      // encodeURL() replaces the deprecated encodeUrl()
      return userService.createLoginURL(response.encodeURL(request.getRequestURI()));
   }
}

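Once you've done that, you can add the following snippet to your page header (or wherever you like) to allow the user to sign in to your application. The loggedIn property it refers to isn't shown in this article; it is assumed to be a boolean getter on Identity, for example one that delegates to userService.isUserLoggedIn():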

  <ui:fragment rendered="#{not identity.loggedIn}">
    <a href="#{identity.loginUrl}">Sign In</a>
  </ui:fragment>

When the user clicks this link, they will be redirected to a Google Accounts sign in page. After entering a valid username and password, they will then be redirected back to your application as an authenticated user.

To sign out, you can add another method that generates a logout URL:

   public String getLogoutUrl() {
      UserService userService = UserServiceFactory.getUserService();
      return userService.createLogoutURL("/");
   }

For which the logout link would look like this:

  <ui:fragment rendered="#{identity.loggedIn}">
    <a href="#{identity.logoutUrl}">Sign Out</a>
  </ui:fragment>  

To get a reference to the current user, use the following method:

   public User getCurrentUser() {
      UserService userService = UserServiceFactory.getUserService();
      return userService.getCurrentUser();
   }  

This gives you a reference to a com.google.appengine.api.users.User object, which has methods such as getEmail(), getNickname(), etc. for retrieving information about the currently logged-in user. If you want to store a reference to a user in the database, use the String value returned by the getUserId() method. This method returns an identifier (a lengthy String, not a Java long) which is unique to that particular user. This is better than using their nickname or e-mail address, both of which can potentially change, whereas the user ID will never change.

Caching

GAE provides a feature called Memcache, which is a high performance distributed cache with generous daily limits. For a more detailed overview, see the Memcache section of the GAE documentation.

Memcache implements the javax.cache API, which is a good thing because it provides us with an API that you would use much the same way as you would use a Map. You place stuff into the cache using a familiar put(key, value) call, and get it out with a get(key) call. Simple huh?

The easiest way to get a reference to the cache is to just use a producer method. Here's one I prepared earlier:

import java.util.Collections;

import javax.cache.Cache;
import javax.cache.CacheException;
import javax.cache.CacheManager;
import javax.enterprise.context.ApplicationScoped;
import javax.enterprise.inject.Produces;

public class CacheProducer {
   @Produces @ApplicationScoped Cache getCache() throws CacheException {
      return CacheManager.getInstance().getCacheFactory().createCache(Collections.emptyMap());
   }
}

Once we've written this producer method we can simply inject the cache directly into a bean using @Inject Cache cache. We're going to look at caching a little more, further down in the performance section.

JSF

I don't want to get too deep into the JSF side of things because it's really outside the scope of this article. Basically JSF works as intended in GAE, with very little in the way of gotchas. Here's a couple of tips though if you're new to JSF 2.

Request parameters

Request parameters are now defined in the page itself, using the f:metadata tag. Simply use an f:viewParam to bind each request parameter to a property of your model:

  <f:metadata>
    <f:viewParam name="name" value="#{blogSearch.name}"/>
    <f:viewParam name="start" value="#{blogSearch.start}"/>
  </f:metadata>

If the parameter value isn't specified in the request then the value will be null, so make sure the property receiving the parameter value isn't a primitive (i.e. it must be nullable). Using request parameters is a great way of achieving bookmarkable URLs, and of developing a stateless application.
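
To illustrate the nullable-property point, here's a sketch of what the backing bean might look like. The BlogSearch class shown here is an assumption based on the EL expressions above (the real bean isn't shown in this article), getEffectiveStart() is a hypothetical convenience method, and the CDI annotations are omitted so that the sketch compiles on its own:

```java
// Hypothetical backing bean for the view parameters above.
class BlogSearch {
   private static final int DEFAULT_START = 0;

   private String name;    // null when the "name" parameter is absent
   private Integer start;  // wrapper type, so a missing "start" parameter can be null

   public String getName() { return name; }
   public void setName(String name) { this.name = name; }

   public Integer getStart() { return start; }
   public void setStart(Integer start) { this.start = start; }

   // Falls back to a default when the parameter was not supplied or is negative
   public int getEffectiveStart() {
      return (start == null || start < 0) ? DEFAULT_START : start;
   }
}
```

Had start been declared as a primitive int, there would be no way to represent a request that simply didn't include the parameter.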

Page actions

This one may be a little strange to you if you're used to using Seam 2's pages.xml to define page actions. Like request parameters, in JSF 2 you also define page actions in the page itself (which kind of makes sense really). Simply use the f:event tag to define a preRenderView event, and specify the method you wish to invoke as the listener:

<f:event type="preRenderView" listener="#{blogAction.setup}"/>

Logging

My recommendation for logging is to use SLF4J. It's already included as part of the weld-servlet distribution so there are no extra libraries to add, and it's a piece of cake to use:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MyClass {
  // One logger per class, shared by all instances
  private static final Logger log = LoggerFactory.getLogger(MyClass.class);
  
}

Log messages created by SLF4J will show up in the GAE dashboard. To view them, just click the 'Logs' link in the left-hand column.

Performance

Let's get the important stuff out of the way first. When it comes to performance in GAE, there's a huge elephant in the room: the JVM cold start issue that every low traffic application suffers from. To summarise the issue, if your application receives only a few requests then it will spend most of its time in 'cold storage' (that's my term, but I give you permission to use it also). Once a loading request (this is what Google calls a request that initializes the application container) comes in for your application, GAE will start up a new container for your application to serve the request. Then, after a set amount of idle time (which can be just a minute or two), your application will be put back into cold storage again.

To contrast this with the way you're probably used to developing in Tomcat/JBoss/Glassfish (or whatever), a normal container will perform startup before any requests are served, meaning that the container is already hot when the first request comes in. In GAE however the container initialization itself is done during the request.

Why is this bad? Well, the basic container startup itself takes around 10 seconds, creating an EntityManagerFactory takes around 3-4 seconds, JSF startup probably takes another few seconds and Weld itself can take 5-6 seconds (we're working on reducing this). Add that up and we're looking at a minimum of 20 seconds or so just to serve that request. Often the request can even exceed the 30 second hard limit that GAE has, which means that the user is simply given an error message. If you're trying to attract new users to your site you can most likely see the problem here: most users aren't willing to wait even 3 seconds, let alone 20+, and giving them an error message almost guarantees that they'll never visit your site again (unless it's your mother).

There is some hope though - Google are aware of this severe limitation and there is an open issue to address it:

http://code.google.com/p/googleappengine/issues/detail?id=2456

I strongly recommend you vote for this issue (by starring it) if you are thinking of using GAE for a low traffic site.

Now that I've talked about the big issue, let's look at some of the smaller stuff you can do to improve performance in your app. My number one tip is this - CACHE EVERYTHING! GAE's Memcache feature gives you a place to put your data which you can access much faster than performing a database lookup. If you've got certain queries that are executed on a regular basis then put the results in the cache and use that instead. The less database access you do in your app, the better.

Here's an example of caching query results:

private static final String CACHE_KEY = "RECENT_POSTS";

@Inject Cache cache;   

@SuppressWarnings("unchecked")
public List<RecentPost> getRecentPosts() {
  // Read the cache only once: an entry can be evicted at any time, so a
  // second get() after the put() is not guaranteed to return a value
  List<RecentPost> recentPosts = (List<RecentPost>) cache.get(CACHE_KEY);
  
  if (recentPosts == null) {
     recentPosts = new ArrayList<RecentPost>();         
     
     EntityManager em = entityManagerInstance.get();
     List<Blog> results = em.createQuery("select b from Blog b order by b.entryDate desc")
           .setMaxResults(5)
           .getResultList();
           
     // Copy what we need into plain RecentPost objects rather than caching entities
     for (Blog blog : results) {
        recentPosts.add(new RecentPost(blog.getBlogId(), 
              userCache.getNameForUserId(blog.getUserId()),
              blog.getTitle() != null && !"".equals(blog.getTitle()) ? 
                    blog.getTitle() : "Untitled Post", blog.getTag()));
     }
     
     cache.put(CACHE_KEY, recentPosts);
  }
  
  return recentPosts;
}

By avoiding the database hit here, you'll make dramatic improvements to request times which in the end is one of the most important things for your users.
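
If you find yourself repeating this check-then-load pattern, it can be factored into a small helper. The following is only a sketch: CacheHelper and CacheLoader are hypothetical names, and the cache is typed as a plain java.util.Map on the assumption that the injected Cache can be used as one (which also lets the sketch compile without the GAE libraries):

```java
import java.util.Map;

// Hypothetical callback that produces the value when it is not in the cache.
interface CacheLoader<T> {
   T load();
}

// Hypothetical helper implementing "check the cache, else load and store".
final class CacheHelper {
   private CacheHelper() {}

   @SuppressWarnings("unchecked")
   static <T> T getOrLoad(Map<Object, Object> cache, Object key, CacheLoader<T> loader) {
      // Single read, so an eviction between calls cannot hand back null
      T value = (T) cache.get(key);
      if (value == null) {
         value = loader.load();
         if (value != null) {
            cache.put(key, value);
         }
      }
      return value;
   }
}
```

The loader runs only on a cache miss, so the expensive query sits inside load() and everything else stays a one-liner at the call site.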

Another tip is to use paging whenever possible to constrain the size of your query results (when you're forced to query the database). Make use of the setMaxResults() and setFirstResult() methods provided by the Query API to limit how much data you present to the user in any single request.
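
As a small sketch of the arithmetic involved, here's a hypothetical PageRequest class (not part of JPA or GAE) that turns a 1-based page number and a page size into the two values those methods expect:

```java
// Hypothetical helper that computes the arguments for
// Query.setFirstResult() and Query.setMaxResults().
final class PageRequest {
   private final int firstResult;
   private final int maxResults;

   PageRequest(int page, int pageSize) {
      if (page < 1) {
         throw new IllegalArgumentException("page must be >= 1");
      }
      if (pageSize < 1) {
         throw new IllegalArgumentException("pageSize must be >= 1");
      }
      // Page 1 starts at row 0, page 2 at row pageSize, and so on
      this.firstResult = (page - 1) * pageSize;
      this.maxResults = pageSize;
   }

   int getFirstResult() { return firstResult; }
   int getMaxResults() { return maxResults; }
}
```

You would then pass getFirstResult() to setFirstResult() and getMaxResults() to setMaxResults() before calling getResultList(). The page number itself is a natural candidate for a request parameter (see the JSF section above).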

Unsupported features

The most glaring omission in features is Weld's support for conversations. Due to the way that conversation cleanup is implemented in Weld (Future-based, which is a no-no in GAE) you currently get an exception when attempting to begin a new conversation. We will hopefully address this issue in a future release of Weld. My recommendation for now is to make your beans @RequestScoped wherever possible and model your application to be as stateless as possible. Use request parameters wherever it makes sense (see the JSF section above) and make use of the @Model stereotype (which when placed on a bean makes it @Named and @RequestScoped).

Conclusion

While we've covered a fair bit of ground in this article, there's probably a lot of other useful tips that I've missed. It may possibly be useful to convert this article (including Part 1) into a wiki page on seamframework.org which can serve as a central reference point for developing Weld apps in GAE, which anyone could contribute to. If you think this might be helpful and would like to see it happen, or have any other ideas please let us know in the comments.

8 comments:
 
08. Mar 2010, 16:57 CET

Actually, in Weld trunk conversations are now implemented synchronously (with an option for switching to asynchronous timeouts) so there is chance that the mentioned exception doesn't occur. Note: I didn't test it, just an observation.

We are currently requesting feedback from the CanDI and OpenWebBeans guys regarding the ConversationManager API we are using. Hopefully we can submit it to the JSR-299 EG and have it included in a future revision of the spec for the benefit of portable conversations.

 
08. Mar 2010, 20:49 CET

Any common way or caveat to transaction handling?

 
09. Mar 2010, 10:37 CET

You can simplify your data access code quite a bit if you ditch GAE's JDO/JPA interface and use one of the available alternative APIs. My favorite (ok, I'm the lead developer :-) is Objectify-Appengine (http://code.google.com/p/objectify-appengine/).

Here's a bit of sample code using Objectify with Weld:

public class ObjectifyFactoryProducer  {
   @Produces @ApplicationScoped ObjectifyFactory createObjectifyFactory() {
      return new ObjectifyFactory();
   }
}

@Cached
public class Blog {
   @Id Long id;
   String content;
   Date entryDate;
}

@Inject ObjectifyFactory fact;

public List<RecentPost> getRecentPosts() {
  Objectify ofy = fact.begin();
  for (Blog blog: ofy.query(Blog.class).order("-entryDate").limit(5)) {
    ...
  }
}

Note that this will automatically put the Blog entities in the memcache (just like a 2nd level cache) and any future get()s (or batch get()s) will come out of the cache.

There are many more examples and (I like to think) some pretty decent documentation at the google code site. The examples don't use Weld, but the transition is easy - you just want to @Inject an @ApplicationScoped ObjectifyFactory in any classes that want data access.

It should be way, way, way easier than working with JDO or even JPA (which, btw, doesn't let you detach objects on GAE and will screw you when you eventually want to serialize an entity).

 
09. Mar 2010, 17:09 CET

Great tip, thanks Jeff. With GAE's limited support for JPA it's nice to see an alternative that even takes advantage of memcache out of the box. I'll have to give it a try sometime ;)

 
09. Mar 2010, 17:18 CET
Ales Justin wrote on Mar 08, 2010 14:49:
Any common way or caveat to transaction handling?

No caveats that I'm aware of, I think they just work as you'd expect. There's a page with some details in the GAE documentation here.

 
09. Mar 2010, 20:53 CET
No caveats that I'm aware of, I think they just work as you'd expect. There's a page with some details in the GAE documentation here.

As long as what you expect includes remembering that you can only operate on objects that belong to the same entity group within a transaction. :-)

Just thinking about the design on how to get past this limitation -- which is a big mind shift from the standard Hibernate/JPA we're used to.

 
15. Mar 2010, 20:20 CET
Just released this week is a new alternative datastore interface, Twig, which overcomes many of the problems you mention here:

Threading - Twig is the only interface to support parallel non-blocking queries. This can really make a HUGE difference in response time!

Twig supports owned many-to-many relationships, and unowned relationships. In fact it is the only Java interface that does.

Twig supports OR queries by merging results from multiple queries and streaming them, not keeping them in memory.

It supports embedded instances and collections of instances, which often lets you filter by a child entity's field when querying its parent.

The announcement on the app engine mailing list is here:

http://groups.google.com/group/google-appengine-java/browse_thread/thread/aafbeb679a6e6790


More details and downloads here:

http://code.google.com/p/twig-persist/

Cheers!

John
16. Apr 2010, 21:23 CET
Matija Mazi | matija.mazi(AT)gmail.com
I just tried using conversations in GAE with Weld built from trunk and it DOES work. (It doesn't work with Weld 1.0.1.)