Google App Engine Java and GWT Application Development

This book is designed to give developers the tools they need to build their own Google
App Engine (GAE) with Google Web Toolkit (GWT) applications, with a particular
focus on some of the technologies useful for building social-media-oriented applications.
The book is centered on a GAE + GWT Java application called Connectr, which is
developed throughout the chapters and demonstrates, by example, the use of the
technologies described in the book. The application includes social-media information
gathering and aggregation activities and incorporates the use of many App Engine
services and APIs, as well as GWT design patterns and widget examples.

Several stages of the Connectr application are used throughout the book as features are
added to the app. Code is included with the book for all application stages, and each
chapter indicates the stage used.

What This Book Covers

Chapter 1, Introduction, introduces the approaches and technology covered in the book,
and discusses what lies ahead.

Chapter 2, Using Eclipse and the Google Plugin, describes the basics of setting up a
project using the Eclipse IDE and Google’s GWT/GAE plugin. Topics include defining,
compiling and running an Eclipse GWT/GAE project, and using the GWT developer
browser plugin with the interactive debugger. The chapter also covers how to set up an
App Engine account and create applications, and how to deploy an app to App Engine
and access its Admin Console.

Chapter 3, Building The Connectr User Interface with GWT, focuses on GWT, and
building the first iteration of the Connectr application’s frontend. The chapter looks at
how to specify widgets, with a focus on declarative specification using GWT’s UIBinder
and using the GWT RPC API for server-side communication.

Chapter 4, Persisting Data: The App Engine Datastore, covers Datastore basics. In the
process, the first iteration of Connectr’s server-side functionality is built. The chapter
looks at how the Datastore works, and the implications of its design for your data models
and code development. It covers how to use Java Data Objects (JDO) as an interface to
the Datastore and how to persist and retrieve Datastore entities.

Chapter 5, JDO Object Relationships and Queries, builds on the topics of Chapter 4. It
describes how to build and manage JDO objects that have relationships to each other,
such as one-to-many and one-to-one parent-child relationships. It also covers how to
query the Datastore, and the important role that Datastore indexes play in this process.

Chapter 6, Implementing MVP, an Event Bus and Other GWT Patterns, builds on the
client-side code of Chapter 3, and shows how to make the frontend code modular and
extensible. It accomplishes this via use of the MVP (Model-View-Presenter) and Event
Bus design patterns, history/bookmark management, and an RPC abstraction, which
supports call retries and progress indicators.

Chapter 7, Background Processing and Feed Management, centers on defining and
running decoupled backend asynchronous tasks. In the process, the chapter introduces
several App Engine services, including URLFetch and Task Queues, shows the use of
Query Cursors to distribute Datastore-related processing across multiple Tasks, and
introduces the use of Java Servlets and the incorporation of third-party libraries in a
deployed application.

Chapter 8, Authentication using Twitter and Facebook OAuth and Google Accounts, adds
authentication, login, and account functionality to Connectr, allowing it to support
multiple users. The chapter demonstrates the use of both the Google Accounts API and
the OAuth protocol for creating user accounts.

Chapter 9, Robustness and Scalability: Transactions, Memcache, and Datastore Design,
delves into more advanced Datastore-related topics. The chapter investigates Datastore-related
means of increasing the robustness, speed, and scalability of an App Engine app,
including several ways to design data classes for scalability and to support efficient join-like
queries. The chapter also introduces App Engine transactions and Transactional
Tasks and the use of Memcache, App Engine’s volatile-memory key-value store.

Chapter 10, Pushing fresh content to clients with the Channel API, covers the
implementation of a message push system using the App Engine Channel API, used by
Connectr to keep application data streams current. The chapter describes how to open
back-end channels connected to client-side socket listeners, and presents a strategy for
preventing the server from pushing messages to unattended web clients.

Chapter 11, Managing and Backing Up Your App Engine Application, focuses on useful
App Engine deployment strategies, and admin and tuning tools. It includes ways to
quickly upload configuration files without redeploying your entire application and
describes how to do bulk uploads and downloads of application data. The chapter also
discusses tools to analyze and tune your application’s behavior, and the App Engine
billing model.

Chapter 12, Asynchronous Processing with Cron, Task Queue, and XMPP, finishes
building the server-side part of the Connectr app. The chapter introduces the use of App
Engine Cron jobs, configuration of customized Task Queues, and App Engine’s XMPP
service and API, which supports push notifications. The chapter shows the benefits of
proactive and asynchronous updating (the behind-the-scenes work that keeps Connectr’s
data stream fresh) and looks at how App Engine apps can both send and receive XMPP
messages.

Chapter 13, Conclusion, summarizes some of the approaches and technology covered in
the book, and discusses what might lie ahead.

Robustness and Scalability: Transactions, Memcache, and Datastore Design
Chapter 4 and Chapter 5 explored the basics of using the App Engine Datastore. In this
chapter, we’ll delve deeper to investigate Datastore-related ways to help increase the
robustness, speed, and scalability of an App Engine app, and apply these techniques
to our Connectr app.

First, in the Data modeling and scalability section we look at ways to structure and
access your data objects to make your application faster and more scalable.

Then, the Using transactions section describes Datastore transactions, what they
do, and when and how to use them. Finally, Using Memcache will introduce App
Engine’s Memcache service, which provides a volatile-memory key-value store, and
discuss the use of Memcache to speed up your app.

In this chapter, our examples use the full version of the
Connectr app, ConnectrFinal.

Data modeling and scalability

In deciding how to design your application’s data models, there are a number of
ways in which your approach can increase the app’s scalability and responsiveness.
In this section, we discuss several such approaches and how they are applied in
the Connectr app. In particular, we describe how the Datastore access latency can
sometimes be reduced; ways to split data models across entities to increase the
efficiency of data object access and use; and how property lists can be used to
support “join-like” behavior with Datastore entities.

Reducing latency—read consistency and Datastore access deadlines

By default, when an entity is updated in the Datastore, all subsequent reads of that
entity will see the update at the same time; this is called strong consistency. To
achieve it, each entity has a primary storage location, and with a strongly consistent
read, the read waits for a machine at that location to become available. Strong
consistency is the default in App Engine.

However, App Engine allows you to change this default and use eventual consistency
for a given Datastore read. With eventual consistency, the query
may access a copy of the data from a secondary location if the primary location is
temporarily unavailable. Changes to data will propagate to the secondary locations
fairly quickly, but it is possible that an “eventually consistent” read may access a
secondary location before the changes have been incorporated. However, eventually
consistent reads are faster on average, so they trade consistency for availability. In
many contexts, for example, with web apps such as Connectr that display “activity
stream” information, this is an acceptable tradeoff—completely up-to-date freshness
of information is not required.
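
The tradeoff can be pictured with a toy model. The following sketch is plain Java, not App Engine code: the class ReplicatedStoreSketch and all of its members are hypothetical names invented for illustration, and the real Datastore replication machinery is far more involved. It only shows why an eventually consistent read can return slightly stale data instead of waiting for the primary location.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy model only: none of these names are App Engine APIs.
class ReplicatedStoreSketch {
    enum Consistency { STRONG, EVENTUAL }

    private final Map<String, String> primary = new HashMap<>();
    private final Map<String, String> secondary = new HashMap<>();
    private final Queue<String[]> pending = new ArrayDeque<>();
    boolean primaryAvailable = true;

    // Writes land on the primary first and are queued for the secondary.
    void put(String key, String value) {
        primary.put(key, value);
        pending.add(new String[] { key, value });
    }

    // Simulates replication catching up with the primary.
    void replicate() {
        while (!pending.isEmpty()) {
            String[] kv = pending.remove();
            secondary.put(kv[0], kv[1]);
        }
    }

    // A strong read needs the primary; an eventual read may fall back
    // to the (possibly stale) secondary copy instead of waiting.
    String read(String key, Consistency consistency) {
        if (primaryAvailable) {
            return primary.get(key);
        }
        if (consistency == Consistency.EVENTUAL) {
            return secondary.get(key);
        }
        throw new IllegalStateException(
            "primary unavailable; a real strong read would wait");
    }

    public static void main(String[] args) {
        ReplicatedStoreSketch store = new ReplicatedStoreSketch();
        store.put("feed", "v1");
        store.replicate();
        store.put("feed", "v2");        // written, but not yet replicated
        store.primaryAvailable = false; // primary temporarily unavailable
        System.out.println(store.read("feed", Consistency.EVENTUAL)); // prints v1
        store.replicate();
        System.out.println(store.read("feed", Consistency.EVENTUAL)); // prints v2
    }
}
```

In this model the eventual read never blocks, at the cost of briefly returning the older value.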


 This touches on a complex and interesting field beyond the scope of
 this book. See http://googleappengine.blogspot.com/2010/03/read-consistency-deadlines-more-control.html,
 http://googleappengine.blogspot.com/2009/09/migration-to-better-datastore.html,
 and http://code.google.com/events/io/2009/sessions/TransactionsAcrossDatacenters.html
 for more background on this and related topics.
 

In Connectr, we will add the use of eventual consistency to some of our feed object
reads; specifically, those for feed content updates. We are willing to take the small
chance that a feed object is slightly out-of-date in order to have the advantage of
quicker reads on these objects.

The following code shows how to set eventual read consistency for a query, using
server.servlets.FeedUpdateFriendServlet as an example.


 Query q = pm.newQuery("select from " + FeedInfo.class.getName() +
     " where urlstring == :keys");
 // Use eventual read consistency for this query
 q.addExtension("datanucleus.appengine.datastoreReadConsistency",
     "EVENTUAL");
 

App Engine also allows you to change the default Datastore access deadline. By
default, the Datastore will retry access automatically for up to about 30 seconds.
You can set this deadline to a smaller amount of time. It can often be appropriate to
set a shorter deadline if you are concerned with response latency, and are willing to
use a cached version of the data for which you got the timeout, or are willing to do
without it.

The following code shows how to set an access timeout interval (in milliseconds) for
a given JDO query.


 Query q = pm.newQuery("…");
 // Set a Datastore access timeout
 q.setTimeoutMillis(10000);
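
The "shorter deadline, fall back to a cached copy" idea can be sketched independently of the Datastore. The following is a minimal plain-Java illustration, not App Engine code: DeadlineFallbackSketch, readWithDeadline, and the in-memory cache map are all hypothetical names, and a standard executor stands in for the slow Datastore read. In a real app the deadline would be the query timeout shown above and the cache would more likely be Memcache.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustration only: a generic "deadline with cached fallback" helper.
class DeadlineFallbackSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Daemon threads, so a forgotten shutdown() does not keep the JVM alive.
    private final ExecutorService executor = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    // Runs slowRead with a deadline. On success the result is cached;
    // on timeout the last cached value (possibly null) is returned.
    String readWithDeadline(String key, Callable<String> slowRead, long deadlineMillis) {
        Future<String> future = executor.submit(slowRead);
        try {
            String fresh = future.get(deadlineMillis, TimeUnit.MILLISECONDS);
            if (fresh != null) {
                cache.put(key, fresh);
            }
            return fresh;
        } catch (TimeoutException e) {
            future.cancel(true);   // give up on the slow read
            return cache.get(key); // fall back to the cached copy, if any
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    void shutdown() {
        executor.shutdownNow();
    }

    public static void main(String[] args) {
        DeadlineFallbackSketch reader = new DeadlineFallbackSketch();
        // Fast read: completes within the deadline and populates the cache.
        System.out.println(reader.readWithDeadline("feed", () -> "fresh", 1000));
        // Slow read: misses the 50 ms deadline, so the cached value is used.
        System.out.println(reader.readWithDeadline("feed", () -> {
            Thread.sleep(2000);
            return "late";
        }, 50));
        reader.shutdown();
    }
}
```

Whether a stale or missing value is acceptable on timeout depends on the data; for activity-stream content it usually is.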
 

Splitting big data models into multiple entities to make access more efficient

Often, the fields in a data model can be divided into two groups: main and/or
summary information that you need often/first, and details—the data that you might
not need or tend not to need immediately. If this is the case, then it can be productive
to split the data model into multiple entities and set the details entity to be a child of
the summary entity, for instance, by using JDO owned relationships. The child field
will be fetched lazily, and so the child entity won’t be pulled in from the Datastore
unless needed.

In our app, the Friend model can be viewed like this: initially, only a certain amount
of summary information about each Friend is sent over RPC to the app’s frontend
(the Friend’s name). More information is needed only if there is a request to view
or edit the details of a particular Friend.

So, we can make retrieval more efficient by defining a parent summary entity, and
a child details entity. We do this by keeping the “summary” information in Friend,
and placing “details” in a FriendDetails object, which is set as a child of Friend
via a JDO bidirectional, one-to-one owned relationship, as shown in Figure 1. We
store the Friend’s e-mail address and its list of associated URLs in FriendDetails.
We’ll keep the name information in Friend. That way, when we construct the initial
‘FriendSummaries’ list displayed on application load, and send it over RPC, we only
need to access the summary object.

A details field of Friend points to the FriendDetails child, which we create when
we create a Friend. In this way, the details will always be transparently available
when we need them, but they will be lazily fetched—the details child object won’t
be initially retrieved from the database when we query Friend, and won’t be fetched
unless we need that information.

As you may have noticed, the Friend model is already set up in this manner—this is
the rationale for that design.
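
The effect of lazy fetching can be illustrated with a plain-Java analogy. This is not how JDO implements owned relationships (JDO handles the loading transparently); the nested Friend and FriendDetails classes below are simplified stand-ins for the Connectr classes, and the loader and the loads counter are hypothetical additions used only to make the behavior visible.

```java
import java.util.function.Supplier;

// Plain-Java analogy for a lazily fetched child object.
class LazyDetailsSketch {
    static class FriendDetails {
        final String email;
        FriendDetails(String email) {
            this.email = email;
        }
    }

    static class Friend {
        final String name;                             // summary data, always present
        private final Supplier<FriendDetails> loader;  // stands in for a Datastore fetch
        private FriendDetails details;                 // loaded on first access
        int loads = 0;                                 // counts simulated fetches

        Friend(String name, Supplier<FriendDetails> loader) {
            this.name = name;
            this.loader = loader;
        }

        // The child is fetched only when first requested, then reused.
        FriendDetails getDetails() {
            if (details == null) {
                details = loader.get();
                loads++;
            }
            return details;
        }
    }

    public static void main(String[] args) {
        Friend friend = new Friend("Ada", () -> new FriendDetails("ada@example.com"));
        System.out.println(friend.name + ", fetches so far: " + friend.loads);
        System.out.println(friend.getDetails().email + ", fetches so far: " + friend.loads);
    }
}
```

Building the summary list only reads the name field, so no details fetch happens until a Friend is opened for viewing or editing.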

Discussion

When splitting a data model like this, consider the queries your app will perform
and how the design of the data objects will support those queries. For example,
if your app often needs to query for property1 == x and property2 == y, and
especially if both individual filters can produce large result sets, you are probably
better off keeping both those properties on the same entity (for example, retaining
both fields on the “main” entity, rather than moving one to a “details” entity).

For persistent classes (that is, “data classes”) that you often access and update, it is
also worth considering whether any of their fields do not require indexes. This would
be the case if you never perform a query that includes that field. Each indexed field
adds index updates to every write, so the fewer indexed fields a persistent class has,
the faster objects of that class can be written.

Splitting a model by creating an “index” and a “data” entity

You can also consider splitting a model if you identify fields that you access only
when performing queries, but don’t require once you’ve actually retrieved the object.
Often, this is the case with multi-valued properties. For example, in the Connectr app,
this is the case with the friendKeys list of the server.domain.FeedIndex class (first
encountered in Chapter 7). This multi-valued property is used to find relevant feed
objects but is not used when displaying feed content information.

With App Engine, there is no way for a query to retrieve only the fields that you
need (with the exception of keys-only queries, as introduced in Chapter 5), so the
full object must always be pulled in. If the multi-valued property lists are long,
this is inefficient.

To avoid this inefficiency, we can split up such a model into two parts, and put each
one in a different entity—an index entity and a data entity. The index entity holds
only the multi-valued properties (or other data) used only for querying, and the data entity
holds the information that we actually want to use once we’ve identified the
relevant objects. The trick to this new design is that the data entity key is defined to
be the parent of the index entity key.

More specifically, when an entity is created, its key can be defined as a “child” of
another entity’s key, which becomes its parent. The child is then in the same entity group
as the parent (we discuss entity groups further in the Using transactions
section). Because such a child key is based on the path of its parent key, it is possible
to derive the parent key given only the child key, using the getParent() method of
Key, without requiring the child to be instantiated.

So with this design, we can first do a keys-only query on the index kind (which is
faster than full object retrieval) to get a list of the keys of the relevant index entities.
With that list, even though we’ve not actually retrieved the index objects themselves,
we can derive the parent data entity keys from the index entity keys. We can then do
a batch fetch with the list of relevant parent keys to grab all the data entities at once.
This lets us retrieve the information we’re interested in, without having to retrieve
the properties that we do not need.
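
The three steps (keys-only query on the index kind, deriving parent keys, batch fetch of the data entities) can be sketched with in-memory maps. This is a simplified model, not Datastore code: SimpleKey is a hypothetical stand-in for the SDK's Key class, the two maps stand in for the FeedInfo and FeedIndex kinds, and the loop in indexKeysFor stands in for what would be an index lookup in the real Datastore.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

class IndexDataSketch {
    // Simplified key: kind + name + optional parent (not the SDK's Key).
    static final class SimpleKey {
        final String kind;
        final String name;
        final SimpleKey parent; // null for a root entity

        SimpleKey(String kind, String name, SimpleKey parent) {
            this.kind = kind;
            this.name = name;
            this.parent = parent;
        }

        // The parent is part of the key itself, so no entity load is needed.
        SimpleKey getParent() {
            return parent;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof SimpleKey)) {
                return false;
            }
            SimpleKey k = (SimpleKey) o;
            return kind.equals(k.kind) && name.equals(k.name)
                && Objects.equals(parent, k.parent);
        }

        @Override
        public int hashCode() {
            return Objects.hash(kind, name, parent);
        }
    }

    // Data entities (feed content) and index entities (friendKeys lists).
    final Map<SimpleKey, String> feedInfo = new LinkedHashMap<>();
    final Map<SimpleKey, Set<String>> feedIndex = new LinkedHashMap<>();

    void addFeed(String url, String content, Set<String> friendKeys) {
        SimpleKey dataKey = new SimpleKey("FeedInfo", url, null);
        feedInfo.put(dataKey, content);
        // The index entity's key is a child of the data entity's key.
        feedIndex.put(new SimpleKey("FeedIndex", url, dataKey), friendKeys);
    }

    // Step 1: "keys-only query" on the index kind.
    List<SimpleKey> indexKeysFor(String friendKey) {
        List<SimpleKey> keys = new ArrayList<>();
        for (Map.Entry<SimpleKey, Set<String>> e : feedIndex.entrySet()) {
            if (e.getValue().contains(friendKey)) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }

    // Steps 2 and 3: derive the parent keys, then fetch the data entities.
    List<String> feedsFor(String friendKey) {
        List<String> result = new ArrayList<>();
        for (SimpleKey indexKey : indexKeysFor(friendKey)) {
            result.add(feedInfo.get(indexKey.getParent()));
        }
        return result;
    }
}
```

Note that feedsFor never returns or copies a friendKeys list; only keys and data entities cross the boundary.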


 See Brett Slatkin's presentation, Building scalable, complex
 apps on App Engine (http://code.google.com/events/io/2009/sessions/BuildingScalableComplexApps.html)
 for more on this index/data design.
 

Our feed model (which was introduced in Chapter 7) maps well to this design—we
filter on the FeedIndex.friendKeys multi-valued property (which contains the list
of keys of Friends that point to this feed) when we query for the feeds associated
with a given Friend.

But, once we have retrieved those feeds, we don’t need the friendKeys list further.
So, we would like to avoid retrieving them along with the feed content. With our
app’s sample data, these property lists will not comprise a lot of data, but they would
be likely to do so if the app was scaled up. For example, many users might have the
same friends, or many different contacts might include the same company blog in
their associated feeds.

So, we split up the feed model into an index part and a parent data part, as shown in
Figure 2. The index class is server.domain.FeedIndex; it contains the friendKeys
list for a feed. The data part, containing the actual feed content, is
server.domain.FeedInfo. When a new FeedIndex object is created, its key will be constructed so
that its corresponding FeedInfo object’s key is its parent key. This construction must
of course take place at object creation, as Datastore entity keys cannot be changed.


 For a small-scale app, the payoff from this split model would
 perhaps not be worth it. But for the sake of example, let's
 assume that we expect our app to grow significantly.
 

The FeedInfo persistent class (the parent class) simply uses an app-assigned
String primary key, urlstring (the feed URL string). The
server.domain.FeedIndex constructor, shown in the code below, uses the key of its
FeedInfo parent (the URL string) to construct its key. (The Using transactions section
will describe this key-construction code in more detail.) This places the two
entities into the same entity group and allows the parent FeedInfo key to be derived
from the FeedIndex entity’s key.


 @PersistenceCapable(identityType = IdentityType.APPLICATION,
     detachable = "true")
 public class FeedIndex implements Serializable {

     @PrimaryKey
     @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
     private Key key;
     …

     public FeedIndex(String fkey, String url) {
         this.friendKeys = new HashSet<String>();
         this.friendKeys.add(fkey);
         // Build the key path FeedInfo(url) -> FeedIndex(url), so that the
         // FeedInfo key is the parent of this entity's key
         KeyFactory.Builder keyBuilder =
             new KeyFactory.Builder(FeedInfo.class.getSimpleName(), url);
         keyBuilder.addChild(FeedIndex.class.getSimpleName(), url);
         Key ckey = keyBuilder.getKey();
         this.key = ckey;
     }
     …
 }

The following code, from server.servlets.FeedUpdateFriendServlet, shows
how this model is used to efficiently retrieve the FeedInfo objects associated
with a given Friend. Given a Friend key, a query is performed for the keys of
the FeedIndex entities that contain this Friend key in their friendKeys list.
Because this is a keys-only query, it is much more efficient than returning the actual
objects. Then, each FeedIndex key is used to derive the parent (FeedInfo) key.
Using that list of parent keys, a batch fetch is performed to fetch the FeedInfo objects
associated with the given Friend. We did this without needing to actually fetch the
FeedIndex objects.


 … imports…
 @SuppressWarnings("serial")
 public class FeedUpdateFriendServlet extends HttpServlet {

     private static Logger logger =
         Logger.getLogger(FeedUpdateFriendServlet.class.getName());

     public void doPost(HttpServletRequest req, HttpServletResponse resp)
             throws IOException {
         PersistenceManager pm = PMF.get().getPersistenceManager();
         Query q = null;
         try {
             String fkey = req.getParameter("fkey");
             if (fkey != null) {
                 logger.info("in FeedUpdateFriendServlet, updating feeds for:"
                     + fkey);
                 // query for matching FeedIndex keys
                 q = pm.newQuery("select key from " + FeedIndex.class.getName()
                     + " where friendKeys == :id");
                 List ids = (List) q.execute(fkey);
                 if (ids.size() == 0) {
                     return;
                 }
                 // else, get the parent keys of the ids
                 Key k = null;
                 List<Key> parentlist = new ArrayList<Key>();
                 for (Object id : ids) {
                     // cast to key
                     k = (Key) id;
                     parentlist.add(k.getParent());
                 }
                 // fetch the parents using the keys
                 Query q2 = pm.newQuery("select from " + FeedInfo.class.getName()
                     + " where urlstring == :keys");
                 // allow eventual consistency on read
                 q2.addExtension(
                     "datanucleus.appengine.datastoreReadConsistency",
                     "EVENTUAL");
                 List<FeedInfo> results =
                     (List<FeedInfo>) q2.execute(parentlist);
                 if (results.iterator().hasNext()) {
                     for (FeedInfo fi : results) {
                         fi.updateRequestedFeed(pm);
                     }
                 }
             }
         }
         catch (Exception e) {
             logger.warning(e.getMessage());
         }
         finally {
             if (q != null) {
                 q.closeAll();
             }
             pm.close();
         }
     }
 }//end class

Use of property lists to support “join” behavior

Google App Engine does not support joins with the same generality as a relational
database. However, property lists along with accompanying denormalization can
often be used in GAE to support join-like functionality in a very efficient manner.


 At the time of writing, there is GAE work in progress to support simple
 joins. However, this functionality is not yet officially part of the SDK.
 

Consider the many-to-many relationship between Friend and feed information
in our application. With a relational database, we might support this relationship
by using three tables: one for Friend data, one for Feed data, and a “join table”
(sometimes called a “cross-reference table”), named, say, FeedFriend, with two
columns—one for the friend ID and one for the feed ID. The rows in the join table
would indicate which feeds were associated with which friends.

In our hypothetical relational database, a query to find the feeds associated with a
given Friend fid would look something like this:


 select feed.feedname from Feed feed, FeedFriend ff
 where ff.friendid = 'fid' and ff.feedid = feed.id
 

If we wanted to find those feeds that both Friend 1 (fid1) and Friend 2 (fid2) had
listed, the query would look something like this:


 select feed.feedname from Feed feed, FeedFriend f1, FeedFriend f2
 where f1.friendid = 'fid1' and f1.feedid = feed.id
 and f2.friendid = 'fid2' and f2.feedid = feed.id
 

With Google App Engine, to support this type of query, we can denormalize
the “join table” information and use Datastore multi-valued properties to hold
the denormalized information. (Denormalization should not be considered a
second-class citizen in GAE).

In Connectr, feed objects hold a list of the keys of the Friends that list that feed
(friendKeys), and each Friend holds a list of the feed URLs associated with it.
Chapter 7 included a figure illustrating this many-to-many relationship.

So, with the first query above, the analogous JDOQL query is:


 select from FeedIndex where friendKeys == 'fid'
 

If we want to find those feeds that are listed by both Friend 1 and Friend 2, the JDOQL
query is:


 select from FeedIndex where friendKeys == 'fid1' and
 friendKeys == 'fid2'
 

Our data model, and its use of multi-valued properties, has allowed these queries to
be very straightforward and efficient in GAE.
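
The semantics of these queries can be mimicked in plain Java. An equality filter on a multi-valued property matches an entity when any value in the list equals the filter value, so two equality filters on friendKeys match only entities whose list contains both values. The sketch below is an in-memory model under that reading; MultiValuedFilterSketch, feedsListedByAll, and the map of feed names to friendKeys are hypothetical names, not part of Connectr or the App Engine SDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory model of equality filters on a multi-valued property.
class MultiValuedFilterSketch {

    // Returns the feeds whose friendKeys list contains every given key,
    // mirroring "friendKeys == 'fid1' and friendKeys == 'fid2'".
    static List<String> feedsListedByAll(Map<String, Set<String>> friendKeysByFeed,
                                         String... friendKeys) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : friendKeysByFeed.entrySet()) {
            if (e.getValue().containsAll(Arrays.asList(friendKeys))) {
                matches.add(e.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> feeds = new LinkedHashMap<>();
        feeds.put("feedA", new HashSet<>(Arrays.asList("fid1", "fid2")));
        feeds.put("feedB", new HashSet<>(Arrays.asList("fid1")));
        System.out.println(feedsListedByAll(feeds, "fid1"));         // prints [feedA, feedB]
        System.out.println(feedsListedByAll(feeds, "fid1", "fid2")); // prints [feedA]
    }
}
```

In the Datastore the matching is done through indexes rather than by scanning every entity, which is what makes the real queries efficient.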

Supporting the semantics of more complex joins

The semantics of more complex join queries can sometimes be supported in GAE
with multiple synchronously-ordered multi-valued properties.

For example, suppose we decided to categorize the associated feeds of Friends by
whether they were “Technical”, “PR”, “Personal”, “Photography-related”, and so on
(and that we had some way of determining this categorization on a feed-by-feed
basis). Then, suppose we wanted to find all the Friends whose feeds include “PR”
feed(s), and to list those feed URLs for each Friend.

In a relational database, we might support this by adding a “Category” table to hold
category names and IDs, and adding a category ID column to the Feed table. Then,
the query might look like this:


 select f.lastName, feed.feedname from Friend f, Category c,
 Feed feed, FeedFriend ff
 where c.id = 'PR' and feed.cat_id = c.id and ff.feedid = feed.id
 and ff.friendid = f.id
 

We might attempt to support this type of query in GAE by adding a
feedCategories multi-valued property list to Friend, which contained all the
categories in which their feeds fell. Every time a feed was added to the Friend, this
list would be updated with the new category as necessary. We could then perform a
JDOQL query to find all such Friends:


 select from Friend where feedCategories == 'PR'
 

However, for each returned Friend we would then need to check each of their
feeds in turn to determine which feed(s) were the PR ones—requiring further
Datastore access.

To address this, we could build a Friend feedCategories multi-valued property list
whose ordering was synchronized with the urls list ordering, with the nth position
in the categories list indicating the category of the nth feed. For example, suppose
that url1 and url3 are of category ‘PR’, and url2 is of category ‘Technical’. The two
lists would then be ordered as follows:


 urls = [ url1, url2, url3, … ]
 feedCategories = [ PR, TECHNICAL, PR, … ]

(For efficiency, we would probably map the categories to integers). Then, for each
Friend returned from the previous query, we could determine which feed URLs
were the ‘PR’ ones by their position in the feed list, without requiring further
Datastore queries. In the previous example, it would be the URLs at positions 0
and 2: url1 and url3.

This technique requires more expense at write time, in exchange for more efficient
queries at read time. The approach is not always applicable—for example, it requires
a one-to-one mapping between the items in the synchronized property lists, but can
be very effective when it does apply.
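
The position-matching step can be sketched in a few lines of Java. This is a minimal illustration assuming the two lists have already been read from a Friend entity; SynchronizedListsSketch and urlsInCategory are hypothetical names, and categories are kept as strings here rather than mapped to integers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Picks the URLs whose position matches a given category in a
// synchronously-ordered category list.
class SynchronizedListsSketch {

    static List<String> urlsInCategory(List<String> urls,
                                       List<String> feedCategories,
                                       String category) {
        // The technique depends on a one-to-one mapping between the lists.
        if (urls.size() != feedCategories.size()) {
            throw new IllegalArgumentException("lists must have the same length");
        }
        List<String> matches = new ArrayList<>();
        for (int i = 0; i < urls.size(); i++) {
            if (category.equals(feedCategories.get(i))) {
                matches.add(urls.get(i));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("url1", "url2", "url3");
        List<String> feedCategories = Arrays.asList("PR", "TECHNICAL", "PR");
        System.out.println(urlsInCategory(urls, feedCategories, "PR")); // prints [url1, url3]
    }
}
```

The length check makes the one-to-one requirement explicit: if the lists ever drift out of step at write time, the lookup fails loudly rather than returning the wrong URLs.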


About Krishna Srinivasan

He is Founder and Chief Editor of JavaBeat. He has more than 8 years of experience developing Web applications. He writes about Spring, DOJO, JSF, Hibernate and many other emerging technologies in this blog.
