JBoss AS 5 Performance Tuning

January 29, 2011

JBoss


Tuning JBoss Cache

JBoss Cache provides the foundation for many clustered services, which need to synchronize application state information across the set of nodes.

The cache is organized as a tree, with a single root. Each node in the tree essentially contains a map, which acts as a store for key/value pairs. The only requirement placed on objects that are cached is that they implement java.io.Serializable.
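The tree-of-maps layout described above can be sketched in plain Java. This is only an illustration under stated assumptions: the class TreeCacheSketch, its slash-separated paths, and its methods are hypothetical and are not the JBoss Cache API.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a tree whose nodes each hold a key/value map.
class TreeCacheSketch {
    static final class Node {
        final Map<String, Node> children = new HashMap<>();
        final Map<String, Serializable> data = new HashMap<>();
    }

    private final Node root = new Node();   // single root of the tree

    // Walks a slash-separated path such as "/sessions/abc", optionally creating nodes.
    private Node resolve(String path, boolean create) {
        Node current = root;
        for (String part : path.split("/")) {
            if (part.isEmpty()) continue;
            Node next = current.children.get(part);
            if (next == null) {
                if (!create) return null;
                next = new Node();
                current.children.put(part, next);
            }
            current = next;
        }
        return current;
    }

    // Values must be Serializable, mirroring the requirement stated in the text.
    void put(String path, String key, Serializable value) {
        resolve(path, true).data.put(key, value);
    }

    Serializable get(String path, String key) {
        Node node = resolve(path, false);
        return node == null ? null : node.data.get(key);
    }

    public static void main(String[] args) {
        TreeCacheSketch cache = new TreeCacheSketch();
        cache.put("/sessions/abc", "user", "alice");
        System.out.println(cache.get("/sessions/abc", "user"));
        System.out.println(cache.get("/sessions/missing", "user"));
    }
}
```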

EJB 3 Stateful Session Beans, HttpSessions, and Entity/Hibernate caching all rely on JBoss Cache to replicate information across the cluster. We discussed data persistence thoroughly in Chapter 6, Tuning the Persistence Layer, so the next sections focus on SFSB and HttpSession cluster tuning.

The core configuration of JBoss Cache is contained in the JBoss Cache Service. In JBoss AS 5, the scattered cache deployments have been replaced with a new CacheManager service, deployed via the file <server>/deploy/cluster/jboss-cache-manager.sar/META-INF/jboss-cache-manager-jboss-beans.xml.

The CacheManager acts as a factory for creating caches and as a registry for JBoss Cache instances. It is configured with a set of named JBoss Cache configurations. Here’s a fragment of the standard SFSB cache configuration:

	<entry><key>sfsb-cache</key>
		<value>
			<bean name="StandardSFSBCacheConfig"
				class="org.jboss.cache.config.Configuration">
				<property
					name="clusterName">${jboss.partition.name:DefaultPartition}-SFSBCache</property>
				<property
					name="multiplexerStack">${jboss.default.jgroups.stack:udp}</property>
				<property name="fetchInMemoryState">true</property>
				<property name="nodeLockingScheme">PESSIMISTIC</property>
				<property name="isolationLevel">REPEATABLE_READ</property>
				<property name="useLockStriping">false</property>
				<property name="cacheMode">REPL_SYNC</property>
				. . . . .
			</bean>
		</value>
	</entry>

Services that need a cache ask the CacheManager for the cache by name, which is specified by the key element; the cache manager creates the cache (if not already created) and returns it.

The simplest way to reference a custom cache is by means of the org.jboss.ejb3.annotation.CacheConfig annotation. For example, to use a newly created Stateful Session Bean cache named custom_sfsb_cache:

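	import javax.ejb.Stateful;
	import org.jboss.ejb3.annotation.CacheConfig;
	import org.jboss.ejb3.annotation.Clustered;

	@Stateful
	@Clustered
	@CacheConfig(name="custom_sfsb_cache")
	public class SFSBExample {
	}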

The CacheManager keeps a reference to each cache it has created, so all services that request the same cache configuration name will share the same cache. When a service is done with the cache, it releases it to the CacheManager. The CacheManager keeps track of how many services are using each cache, and will stop and destroy the cache when all services have released it.
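The acquire/release bookkeeping described above can be sketched as a small reference-counting registry. This is a minimal illustration, not the CacheManager implementation: the class RefCountingRegistry and its method names are hypothetical, and a plain Map stands in for a cache instance.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a reference-counting registry; not the JBoss CacheManager API.
class RefCountingRegistry<C> {
    private static final class Entry<C> {
        final C cache;
        int users;
        Entry(C cache) { this.cache = cache; }
    }

    private final Map<String, Entry<C>> entries = new HashMap<>();
    private final Function<String, C> factory;

    RefCountingRegistry(Function<String, C> factory) {
        this.factory = factory;
    }

    // Creates the cache on first request; later callers share the same instance.
    synchronized C acquire(String configName) {
        Entry<C> entry = entries.computeIfAbsent(configName, n -> new Entry<>(factory.apply(n)));
        entry.users++;
        return entry.cache;
    }

    // Drops one reference; returns true when the last user released and the cache was discarded.
    synchronized boolean release(String configName) {
        Entry<C> entry = entries.get(configName);
        if (entry == null) return false;
        if (--entry.users > 0) return false;
        entries.remove(configName);
        return true;
    }

    synchronized boolean isRegistered(String configName) {
        return entries.containsKey(configName);
    }

    public static void main(String[] args) {
        RefCountingRegistry<Map<String, Object>> registry =
                new RefCountingRegistry<>(name -> new HashMap<String, Object>());
        Map<String, Object> first = registry.acquire("sfsb-cache");
        Map<String, Object> second = registry.acquire("sfsb-cache");
        System.out.println(first == second);                      // both services share one instance
        registry.release("sfsb-cache");
        System.out.println(registry.isRegistered("sfsb-cache"));  // still used by one service
        registry.release("sfsb-cache");
        System.out.println(registry.isRegistered("sfsb-cache"));  // discarded after the last release
    }
}
```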

Understanding JBoss Cache configuration

In order to tune your JBoss Cache, it’s essential to learn some key properties. In particular, we need to understand:

    • How data is transmitted between cluster members. This is controlled by the cacheMode property.
    • How the cache handles concurrent access to data between cluster nodes. This is handled by the nodeLockingScheme and isolationLevel configuration attributes.

Configuring cacheMode

The cacheMode property determines how JBoss Cache keeps data in sync across all nodes. It covers two distinct aspects: how changes are notified across the cluster, and how the other nodes apply these changes to their local data.

As far as data notification is concerned, there are the following choices:

    • Synchronous means the cache instance sends a notification message to other nodes and, before returning, waits for them to acknowledge that they have applied the same changes. Waiting for acknowledgement from all nodes adds delay. However, if a synchronous replication returns successfully, the caller knows for sure that all modifications have been applied to all cache instances.

    • Asynchronous means the cache instance sends a notification message and then immediately returns, without any acknowledgement that changes have been applied. The asynchronous mode is most useful for cases like session replication (for example, Stateful Session Beans), where the cache sending data expects to be the only one that accesses the data. Asynchronous messaging adds a small risk that a failover to another node may expose stale data; however, for many session-type applications this risk is acceptable given the major performance benefits gained.

    • Local means the cache instance doesn’t send a message at all. You should use this mode when you are running JBoss Cache as a single instance, so that it won’t attempt to replicate anything. For example, JPA/Hibernate Query Cache uses a local cache to invalidate stale query result sets from the second level cache, so that JBoss Cache doesn’t need to send messages around the cluster for a query result set cache.

As far as the second aspect is concerned (what the other caches in the cluster should do to reflect the change), you can distinguish between:

    • Replication means that the cache replicates cached data across all cluster nodes. The sending node needs to include the changed state, which increases the cost of the message. Replication is necessary if the other nodes have no other way to obtain the state.

    • Invalidation means that you do not wish to replicate cached data but simply inform other caches in the cluster that data under specific addresses is now stale and should be evicted from memory. Invalidation reduces the cost of the cluster update messages, since only the cache key of the changed state needs to be transmitted, not the state itself.

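By combining these two aspects, we obtain five valid values for the cacheMode configuration attribute:

    • LOCAL: no messages are sent to other nodes.
    • REPL_SYNC: synchronous replication.
    • REPL_ASYNC: asynchronous replication.
    • INVALIDATION_SYNC: synchronous invalidation.
    • INVALIDATION_ASYNC: asynchronous invalidation.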


	Should I use invalidation for session data?
	No, you shouldn't. Data invalidation is an excellent option for a
	clustered JPA/Hibernate Entity cache, since the cached state can
	be re-read from the database in case of failure. If you use the
	invalidation option with SFSBs or HttpSession, then you lose
	failover capabilities. If that is acceptable for your project
	requirements, you could achieve better performance by simply
	turning off the cache.

Configuring cache concurrency

JBoss Cache is a thread-safe caching API, and uses its own efficient mechanisms of controlling concurrent access. Concurrency is configured via the nodeLockingScheme and isolationLevel configuration attributes.

There are three choices for nodeLockingScheme:

    • Pessimistic locking involves threads/transactions acquiring locks on nodes before reading or writing. Which lock is acquired depends on the isolationLevel, but in most cases a non-exclusive lock is acquired for a read and an exclusive lock is acquired for a write. Pessimistic locking carries considerable overhead and allows less concurrency, since reader threads must block until a write has completed and released its exclusive lock (potentially a long time if the write is part of a transaction). The drawbacks include the potential for deadlocks, which are ultimately resolved by a TimeoutException.
    • Optimistic locking seeks to improve upon the concurrency available with Pessimistic by creating a workspace for each request/transaction that accesses the cache. All data is versioned; on completion of non-transactional requests or commits of transactions the version of data in the workspace is compared to the main cache, and an exception is raised if there are inconsistencies. This eliminates the cost of reader locks but, because of the cost associated with the parallel workspace, it carries a high memory overhead and low scalability.
    • MVCC is the new locking scheme that has been introduced in JBoss Cache 3.x (and packaged with JBoss AS 5.x). In a nutshell, MVCC replaces the slow, synchronization-heavy schemes with multi-versioned concurrency control, a locking scheme commonly used by modern database implementations to control concurrent access to shared data.

The most important features of MVCC are:

    1. Readers don’t acquire any locks.
    2. Only one additional version is maintained for shared state, for a single writer.
    3. All writes happen sequentially, to provide fail-fast semantics.

How does MVCC achieve this?

For each reader thread, MVCC’s interceptors wrap state in a lightweight container object, which is placed in the thread’s InvocationContext (or TransactionContext if running in a transaction). All subsequent operations on the state are carried out on the container object using Java references, which allows repeatable read semantics even if the actual state changes simultaneously.

Writer threads, on the other hand, need to acquire a lock before any writing can start. Currently, lock striping is used to improve the memory performance of the cache, and the size of the shared lock pool can be tuned using the concurrencyLevel attribute of the locking element.

After acquiring an exclusive lock on a cache Fully Qualified Name (Fqn), the writer thread wraps the state to be modified in a container as well, just like reader threads do, and then copies this state for writing. When copying, a reference to the original version is still maintained in the container (for rollbacks). Changes are then made to the copy, and the copy is finally written to the data structure when the write completes.
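The copy-on-write idea behind this description can be sketched in a few lines of Java. This is a deliberately simplified model and not JBoss Cache's implementation: the class MvccNodeSketch is hypothetical, a single lock replaces the striped lock pool, and contexts and transactions are left out.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;

// Simplified copy-on-write node illustrating the MVCC idea; not JBoss Cache code.
class MvccNodeSketch {
    private final AtomicReference<Map<String, String>> committed =
            new AtomicReference<>(Collections.emptyMap());
    private final ReentrantLock writeLock = new ReentrantLock();

    // Readers take no lock: they keep a reference to an immutable snapshot.
    Map<String, String> snapshot() {
        return committed.get();
    }

    // Writers are serialized by an exclusive lock, modify a private copy,
    // and publish the copy only when the write completes.
    void put(String key, String value) {
        writeLock.lock();
        try {
            Map<String, String> original = committed.get();  // left untouched; a rollback would discard the copy
            Map<String, String> copy = new HashMap<>(original);
            copy.put(key, value);
            committed.set(Collections.unmodifiableMap(copy));
        } finally {
            writeLock.unlock();
        }
    }

    public static void main(String[] args) {
        MvccNodeSketch node = new MvccNodeSketch();
        node.put("user", "alice");
        Map<String, String> readerView = node.snapshot();  // reader starts here
        node.put("user", "bob");                           // concurrent write
        System.out.println(readerView.get("user"));        // alice: the reader's view is repeatable
        System.out.println(node.snapshot().get("user"));   // bob: new readers see the committed write
    }
}
```

The reader that obtained its snapshot before the second write keeps seeing the old value, which is the repeatable-read behaviour described above, while no reader ever waits on the write lock.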


	Should I use MVCC with session data too?
	While MVCC is the default and recommended choice for JPA/Hibernate
	Entity caching, Pessimistic is still the default concurrency control for
	Session caching. Why? Concurrent threads do not normally access the
	same cached data when that data is a user's session. This is strictly
	enforced in the case of SFSBs, whose instances are not accessible
	concurrently. So don't bother trying to change this property for session data.

Configuring the isolationLevel

The isolationLevel attribute has two possible values, READ_COMMITTED and REPEATABLE_READ, which correspond in semantics to database-style isolation levels. Previous versions of JBoss Cache supported all database isolation levels; if an unsupported isolation level is configured, it is either upgraded or downgraded to the closest supported level.

REPEATABLE_READ is the default isolation level, to maintain compatibility with previous versions of JBoss Cache. READ_COMMITTED, while providing a slightly weaker isolation, has a significant performance benefit over REPEATABLE_READ.

Tuning session replication

As we have learnt, the user session needs replication in order to achieve a consistent state of your applications across the cluster. Replication can be a costly affair, especially if the amount of data held in the session is significant. There are, however, some strategies that can greatly reduce the cost of data replication and thus improve the performance of your cluster:

    • Override the isModified method. By including an isModified method in your SFSBs, you can achieve fine-grained control over data replication. Applicable to SFSBs only.
    • Use buddy replication. By using buddy replication you are not replicating the session data to all nodes but to a limited set of nodes. Applicable to both SFSBs and HttpSession.
    • Configure replication granularity and replication trigger. You can apply custom session policies to your HttpSession to define when data needs to be replicated and which elements need to be replicated. Applicable to HttpSession.

Override SFSB’s isModified method

One of the simplest ways to reduce the cost of SFSB data replication is to implement in your EJB a method with the following signature: public boolean isModified();

Before replicating your bean, the container will detect if your bean implements this method. If your bean does, the container calls the isModified method and it only replicates the bean when the method returns true. If the bean has not been modified (or not enough to require replication, depending on your own preferences), you can return false and the replication will not occur.
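As a rough sketch of this contract, the plain-Java class below reports itself as modified only once a wizard has reached its last step. The class name ReportWizardBean, the four-step length, and the completeStep method are assumptions made for illustration; in a real deployment the class would also carry the @Stateful and @Clustered annotations, which are omitted here so the example stays self-contained.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical SFSB body; @Stateful and @Clustered are omitted to keep the sketch self-contained.
class ReportWizardBean implements Serializable {
    private static final long serialVersionUID = 1L;
    private static final int LAST_STEP = 4;   // assumed wizard length

    private final List<String> selections = new ArrayList<>();
    private int currentStep;

    // Business method invoked once per wizard page.
    public void completeStep(String selection) {
        selections.add(selection);
        currentStep++;
    }

    // Checked by the container before replication; returning false skips replication.
    public boolean isModified() {
        return currentStep >= LAST_STEP;
    }

    public static void main(String[] args) {
        ReportWizardBean bean = new ReportWizardBean();
        for (int step = 1; step <= LAST_STEP; step++) {
            bean.completeStep("choice-" + step);
            System.out.println("step " + step + " replicate=" + bean.isModified());
        }
    }
}
```

With this policy, the intermediate steps are never replicated, so a node failure in the middle of the wizard forces the user to start again.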

If your session does not hold critical data (such as financial information), using the isModified method is a good option to achieve a substantial benefit in terms of performance. A good example could be a reporting application, which needs session management to generate aggregate reports through a set of wizards. Here’s a graphical view of this process:

The following benchmark is built on exactly this use case: an OLAP application that uses SFSBs to carry session data across a four-step wizard. The benchmark compares the performance of the wizard without an isModified method against a version in which isModified returns true only at the end of the wizard.

Ultimately, by using the isModified method to propagate the session data at wider intervals you can improve the performance of your application, at the acceptable risk of having to regenerate your reports in case of node failures.

Use buddy replication

By using buddy replication, sessions are replicated to a configurable number of backup servers in the cluster (also called buddies), rather than to all servers in the cluster. If a user fails over from the server that is hosting his or her session, the session data is transferred to the new server from one of the backup buddies. Buddy replication provides the following benefits:

    • Reduced memory usage
    • Reduced CPU utilization
    • Reduced network transmission

The reason behind this large set of advantages is that each server only needs to store in its memory the sessions it is hosting as well as those of the servers for which it is acting as a backup. Thus, less memory is required to store data, less CPU is spent converting replicated bytes back into Java objects, and less data needs to be transmitted.

For example, in an 8-node cluster with each server configured to have one buddy, a server only needs to store the sessions of 2 servers (its own and those of the node it backs up) instead of 8. That’s just one fourth of the memory required with total replication.
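The arithmetic behind this figure can be written down as a small helper. It is only an estimate under a stated assumption, namely that sessions are spread evenly across the nodes; the class and method names are hypothetical.

```java
// Back-of-the-envelope estimate, assuming sessions are spread evenly across nodes.
class BuddyMemoryEstimate {

    // Share of the total-replication footprint that one node holds under buddy replication:
    // its own sessions plus those of the nodes it backs up, out of all session sets.
    static double fractionOfTotalReplication(int clusterSize, int numBuddies) {
        int storedSets = Math.min(clusterSize, 1 + numBuddies);
        return (double) storedSets / clusterSize;
    }

    public static void main(String[] args) {
        // 8 nodes, 1 buddy: 2 of 8 session sets per node
        System.out.println(fractionOfTotalReplication(8, 1));   // prints 0.25
    }
}
```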

In the following picture, you can see an example of a cluster configured for buddy replication:

Here, each node contains a cache of its session data and a backup of another node. For example, node A contains its session data and a backup of node E. Its data is in turn replicated to node B and so on.

In case of failure of node A, its data moves to node B which becomes the owner of both A and B data, plus the backup of node E. Node B in turn replicates (A + B) data to node C.

In order to configure your SFSB sessions or HttpSessions to use buddy replication, you just have to set the enabled property of the BuddyReplicationConfig bean to true inside the <server>/deploy/cluster/jboss-cache-manager.sar/META-INF/jboss-cache-manager-jboss-beans.xml configuration file, as shown in the next code fragment:


	<property name="buddyReplicationConfig">
		<bean
			class="org.jboss.cache.config.BuddyReplicationConfig">
			<property name="enabled">true</property>
			. . .
		</bean>
	</property>

In the following test, we are comparing the throughput of a 5-node clustered web application which uses buddy replication against one which replicates data across all members of the cluster.

In this benchmark, switching on buddy replication improved the application throughput by about 30%. Buddy replication clearly offers a high potential for scaling, because memory/CPU/network usage per node does not increase linearly as new nodes are added.

Advanced buddy replication

With the minimal configuration we have just described, each server will look for one buddy across the network to which data is replicated. If you need to back up your session to a larger set of buddies, you can modify the numBuddies property of the BuddyReplicationConfig bean. Consider, however, that replicating the session to a large set of nodes reduces the benefits of buddy replication.

Still using the default configuration, each node will try to select its buddy on a different physical host: this helps to reduce the chances of introducing a single point of failure in your cluster. If a node cannot find a buddy on a different physical host, it disregards the ignoreColocatedBuddies property and falls back to co-located nodes.
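The selection policy just described (prefer the next members on other hosts, fall back to co-located ones) can be sketched as follows. This is an illustrative approximation, not the code of the buddy locator shipped with JBoss Cache; the Member type, the ring ordering, and the pickBuddies method are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative approximation of "next member" buddy selection; not JBoss Cache code.
class NextMemberSketch {
    static final class Member {
        final String name;
        final String host;
        Member(String name, String host) { this.name = name; this.host = host; }
    }

    // Walks the ring starting after the node itself. Co-located members are skipped first
    // when ignoreColocated is true, then accepted if too few remote members were found.
    static List<Member> pickBuddies(List<Member> ring, int selfIndex,
                                    int numBuddies, boolean ignoreColocated) {
        List<Member> buddies = new ArrayList<>();
        Member self = ring.get(selfIndex);
        for (int i = 1; i < ring.size() && buddies.size() < numBuddies; i++) {
            Member candidate = ring.get((selfIndex + i) % ring.size());
            if (ignoreColocated && candidate.host.equals(self.host)) continue;
            buddies.add(candidate);
        }
        // Fallback pass: fill the remaining slots with co-located members.
        for (int i = 1; i < ring.size() && buddies.size() < numBuddies; i++) {
            Member candidate = ring.get((selfIndex + i) % ring.size());
            if (!buddies.contains(candidate)) buddies.add(candidate);
        }
        return buddies;
    }

    public static void main(String[] args) {
        List<Member> ring = Arrays.asList(
                new Member("A", "host1"), new Member("B", "host1"), new Member("C", "host2"));
        System.out.println(pickBuddies(ring, 0, 1, true).get(0).name);   // C: B is on the same host as A
    }
}
```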

The default policy is often what you need in your applications. However, if you need fine-grained control over the composition of your buddies, you can use a feature named buddy pool. A buddy pool is an optional construct where each instance in a cluster may be configured to be part of a group, just like an “exclusive club membership”.

This allows system administrators a degree of flexibility and control over how buddies are selected. For example, you might put two instances that run on separate physical servers, possibly on two separate physical racks, in the same buddy pool. So rather than picking an instance on a different host on the same rack, the BuddyLocators would pick the instance in the same buddy pool, on a separate rack, which may add a degree of redundancy.

Here’s a complete configuration which includes buddy pools:

	<property name="buddyReplicationConfig">
		<bean class="org.jboss.cache.config.BuddyReplicationConfig">
			<property name="enabled">true</property>
			<property name="buddyPoolName">rack1</property>
			<property name="buddyCommunicationTimeout">17500</property>
			<property name="autoDataGravitation">false</property>
			<property name="dataGravitationRemoveOnFind">true</property>
			<property name="dataGravitationSearchBackupTrees">true</property>
			<property name="buddyLocatorConfig">
				<bean
					class="org.jboss.cache.buddyreplication.NextMemberBuddyLocatorConfig">
					<property name="numBuddies">1</property>
					<property name="ignoreColocatedBuddies">true</property>
				</bean>
			</property>
		</bean>
	</property>

In this configuration fragment, the buddyPoolName element, if specified, creates a logical subgroup and only picks buddies who share the same buddy pool name. If not specified, this defaults to an internal constant name, which then treats the entire cluster as a single buddy pool.

If the cache on another node needs data that it doesn’t have locally, it can ask the other nodes in the cluster to provide it; nodes that have a copy will provide it as part of a process called data gravitation. The new node will become the owner of the data, placing a backup copy of the data on its buddies.

The ability to gravitate data means there is no need for all requests for data to occur on a node that has a copy of it; that is, any node can handle a request for any data. However, data gravitation is expensive and should not be a frequent occurrence; ideally it should only occur if the node that is using some data fails or is shut down, forcing interested clients to fail over to a different node.

The following optional properties pertain to data gravitation:

    • autoDataGravitation: Whether data gravitation occurs for every cache miss. By default this is set to false to prevent unnecessary network calls.
    • dataGravitationRemoveOnFind: Forces all remote caches that own the data or hold backups for the data to remove that data, thereby making the requesting cache the new data owner. If set to false, an evict is broadcast instead of a remove, so any state persisted in cache loaders will remain. This is useful if you have a shared cache loader configured (see the next section about cache loaders). Defaults to true.
    • dataGravitationSearchBackupTrees: Asks remote instances to search through their backups as well as main data trees. Defaults to true. The resulting effect is that if this is true then backup nodes can respond to data gravitation requests in addition to data owners.

Buddy replication and session affinity

One of the pre-requisites to buddy replication working well and being a real benefit is the use of session affinity, also known as sticky sessions in HttpSession replication speak. What this means is that if certain data is frequently accessed, it is desirable that this is always accessed on one instance rather than in a “round-robin” fashion as this helps the cache cluster optimise how it chooses buddies, where it stores data, and minimises replication traffic.

If you are replicating SFSB sessions, there is no need to configure anything since SFSBs, once created, are pinned to the server that created them.

When using HttpSession, you need to make sure your software or hardware load balancer keeps the session on the same host where it was created.

With Apache’s mod_jk, you have to configure the workers file (workers.properties), specifying where the different nodes are located and how calls should be load-balanced across them. For example, on a 5-node cluster:

	worker.loadbalancer.balance_workers=node1,node2,node3,node4,node5
	worker.loadbalancer.sticky_session=1

Basically, the above snippet configures mod_jk to perform round-robin load balancing with sticky sessions (sticky_session=1) across 5 nodes of a cluster.

Configure replication granularity and replication trigger

Applications that want to store data in the HttpSession need to use the setAttribute method to store attributes and the getAttribute method to retrieve them. You can define two kinds of properties related to HttpSessions:

    • The replication-trigger configures when data needs to be replicated.
    • The replication-granularity defines which part of the session needs to be replicated.

Let’s dissect both aspects in the following sections:

How to configure the replication-trigger

The replication-trigger element determines what triggers a session replication and is configured in the jboss-web.xml file (packaged in the WEB-INF folder of your web application). Here’s an example:

	<jboss-web>
		<replication-config>
			<replication-trigger>SET</replication-trigger>
		</replication-config>
	</jboss-web>

The following is a list of possible alternative options:

    • SET_AND_GET is conservative but not optimal for performance; it will always replicate session data even if its content has not been modified but simply accessed. This setting made (a little) sense in AS 4 since using it was a way to ensure that every request triggered replication of the session’s timestamp. Setting max-unreplicated-interval to 0 accomplishes the same thing at much lower cost.
    • SET_AND_NON_PRIMITIVE_GET is conservative but will only replicate if an object of a non-primitive type has been accessed (that is, the object is not of a well-known immutable JDK type such as Integer, Long, String, and so on). This is the default value.
    • SET assumes that the developer will explicitly call setAttribute on the session if the data needs to be replicated. This setting prevents unnecessary replication and can have a major beneficial impact on performance.
	In all cases, calling setAttribute marks the session as dirty and thus
	triggers replication.

For the purpose of evaluating the available alternatives in performance terms, we have compared a benchmark of a web application using different replication-triggers:

In the first benchmark, we are using the default rule (SET_AND_NON_PRIMITIVE_GET). In the second, we have switched to the SET policy, issuing a setAttribute on 50% of the requests. In the last benchmark, we first populated the session with the required attributes and then issued only reads on the session via the getAttribute method.

As you can see the benefit of using the SET replication trigger is obvious, especially if you follow a read-mostly approach on non-primitive types. On the other hand, this requires very good coding practices to ensure setAttribute is always called whenever a mutable object stored in the session is modified.
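The coding discipline mentioned above can be illustrated with a toy model of the SET trigger. The class SetTriggerSketch below is hypothetical and is not the servlet HttpSession API: it only tracks whether setAttribute was called during a request, which is the condition for replication under SET.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the SET trigger; not the servlet HttpSession API.
class SetTriggerSketch {
    private final Map<String, Object> attributes = new HashMap<>();
    private boolean dirty;

    // Under SET, reads never mark the session as dirty.
    Object getAttribute(String name) {
        return attributes.get(name);
    }

    // Only an explicit setAttribute marks the session for replication.
    void setAttribute(String name, Object value) {
        attributes.put(name, value);
        dirty = true;
    }

    boolean needsReplication() {
        return dirty;
    }

    // Called when the request ends, after any replication has taken place.
    void requestEnded() {
        dirty = false;
    }

    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        SetTriggerSketch session = new SetTriggerSketch();
        session.setAttribute("cart", new ArrayList<String>());
        session.requestEnded();

        List<String> cart = (List<String>) session.getAttribute("cart");
        cart.add("book");                                  // mutation through a reference obtained by a read
        System.out.println(session.needsReplication());    // false: the change would not be replicated

        session.setAttribute("cart", cart);                // explicit re-set after the mutation
        System.out.println(session.needsReplication());    // true
    }
}
```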

How to configure the replication-granularity

As far as what data needs to be replicated is concerned, you can opt for the following choices:

    • SESSION indicates that the entire session attribute map should be replicated when any attribute is considered modified. Replication occurs at request end. This option replicates the most data and thus incurs the highest replication cost, but since all attribute values are always replicated together it ensures that any references between attribute values will not be broken when the session is deserialized. For this reason it is the default setting.
    • ATTRIBUTE indicates that only attributes that the session considers to be potentially modified are replicated. Replication occurs at request end. For sessions carrying large amounts of data, parts of which are infrequently updated, this option can significantly increase replication performance.
    • FIELD level replication only replicates modified data fields inside objects stored in the session. Its use could drastically reduce the data traffic between clustered nodes, and hence improve the performance of the whole cluster. To use FIELD-level replication, you first have to prepare (that is, bytecode enhance) your Java classes to allow the session cache to detect when fields in cached objects have been changed and need to be replicated.

In order to change the default replication granularity, you have to configure the desired attribute in your jboss-web.xml configuration file:

	<jboss-web>
		<replication-config>
			<replication-granularity>FIELD</replication-granularity>
			<replication-field-batch-mode>true</replication-field-batch-mode>
		</replication-config>
	</jboss-web>

In the above example, the replication-field-batch-mode element indicates whether you want all replication messages associated with a request to be batched into one message.

Additionally, if you want to use FIELD level replication you need to perform a bit of extra work. First, you need to add the @org.jboss.cache.pojo.annotation.Replicable annotation at class level:

	@Replicable
	public class Person { ... }

	If you annotate a class with @Replicable, then all of its subclasses will
	be automatically annotated as well.

Once you have annotated your classes, you will need to perform a post-compiler processing step to bytecode enhance your classes for use by your cache. Please check the JBoss AOP documentation (http://www.jboss.org/jbossaop) for the usage of the aopc post-compiler. The JBoss AOP project also provides easy-to-use Ant tasks to help integrate those steps into your application build process.

As a proof of concept, let’s build a use case to compare the performance of the ATTRIBUTE and FIELD granularity policies. Suppose you are storing in your HttpSession an object of type Person. The object contains references to Address, ContactInfo, and PersonalInfo objects. It also contains an ArrayList of WorkExperience.

	A prerequisite to this benchmark is that there are no references between
	the field values stored in the Person class (for example between the
	contactInfo and personalInfo fields), otherwise the references will
	be broken by ATTRIBUTE or FIELD policies.

By using the SESSION or ATTRIBUTE replication-granularity policy, even if just one of these fields is modified, the whole Person object needs to be retransmitted. Let’s compare the throughput of two applications using the ATTRIBUTE and FIELD replication-granularity respectively.

In this example, based on the assumption that a single field of the Person class is dirty per request, FIELD replication generates a substantial 10% gain.

Tuning cache storage

Cache loading allows JBoss Cache to store cached data in a persistent store and is used mainly for HttpSession and SFSB sessions. Hibernate and JPA, on the other hand, already have their persistent storage in the database, so it doesn’t make sense to add another store.

This data can either be an overflow, where the data in the persistent store has been evicted from memory, or a replication of what is in memory, where everything in memory is also reflected in the persistent store, along with items that have been evicted from memory.

The cache storage used for web session and EJB3 SFSB caching comes into play in two circumstances:

    • Whenever a cache element is accessed, and that element is not in the cache (for example, due to eviction or due to server restart), then the cache loader transparently loads the element into the cache if found in the backend store.
    • Whenever an element is modified, added or removed, then that modification is persisted in the backend store via the cache loader (except if the ignoreModifications property has been set to true for a specific cache loader). If transactions are used, all modifications created within a transaction are persisted as well.
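The two circumstances above amount to a read-through and write-through pattern, which can be sketched as follows. This is a simplified model rather than the JBoss Cache loader API: the class LoaderBackedCacheSketch is hypothetical, a plain Map stands in for the backend store, and transactions and passivation are not modelled.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified read-through / write-through model; a Map stands in for the backend store.
class LoaderBackedCacheSketch {
    private final Map<String, String> memory = new HashMap<>();
    private final Map<String, String> store;
    private final boolean ignoreModifications;

    LoaderBackedCacheSketch(Map<String, String> store, boolean ignoreModifications) {
        this.store = store;
        this.ignoreModifications = ignoreModifications;
    }

    // On a miss (after eviction or restart) the element is loaded from the store, if present.
    String get(String key) {
        String value = memory.get(key);
        if (value == null) {
            value = store.get(key);
            if (value != null) memory.put(key, value);
        }
        return value;
    }

    // Modifications are persisted unless ignoreModifications is set.
    void put(String key, String value) {
        memory.put(key, value);
        if (!ignoreModifications) store.put(key, value);
    }

    // Eviction only drops the in-memory copy.
    void evict(String key) {
        memory.remove(key);
    }

    boolean inMemory(String key) {
        return memory.containsKey(key);
    }

    public static void main(String[] args) {
        LoaderBackedCacheSketch cache = new LoaderBackedCacheSketch(new HashMap<>(), false);
        cache.put("session-1", "state");
        cache.evict("session-1");
        System.out.println(cache.inMemory("session-1"));   // false: evicted from memory
        System.out.println(cache.get("session-1"));        // state: reloaded from the store
    }
}
```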

Cache loaders are configured by means of the property cacheLoaderConfig of session caches. For example, in the case of SFSB cache:


	<entry><key>sfsb-cache</key>
	<value>
		<bean name="StandardSFSBCacheConfig"
			class="org.jboss.cache.config.Configuration">
			. . . . .
			<property name="cacheLoaderConfig">
				<bean class="org.jboss.cache.config.CacheLoaderConfig">
					<property name="passivation">true</property>
					<property name="shared">false</property>
					<property name="individualCacheLoaderConfigs">
					<list>
						<bean
							class="org.jboss.cache.loader.FileCacheLoaderConfig">
							<property
								name="location">${jboss.server.data.dir}${/}sfsb</property>
							<property name="async">false</property>
							<property name="fetchPersistentState">true</property>
							<property name="purgeOnStartup">true</property>
							<property name="ignoreModifications">false</property>
							<property
								name="checkCharacterPortability">false</property>
						</bean>
					</list>
					</property>
				</bean>
			</property>
			. . . . .
		</bean>
	</value>
	</entry>

The passivation property, when set to true, means the persistent store acts as an overflow area that is written to when data is evicted from the in-memory cache.

The shared attribute indicates that the cache loader is shared among different cache instances, for example where all instances in a cluster use the same JDBC settings to talk to the same remote, shared database. Setting this to true prevents repeated and unnecessary writes of the same data to the cache loader by different cache instances. The default value is false.

Where does cache data get stored?

By default, the cache loader uses a filesystem implementation configured through the class org.jboss.cache.loader.FileCacheLoaderConfig, which requires the location property to define the root directory to be used.

If the async attribute is set to true, read operations are done synchronously, while write (CRUD: Create, Remove, Update, and Delete) operations are done asynchronously. If set to false (default), both reads and writes are performed synchronously.


	Should I use an async channel for my Cache Loader?
	When using an async channel, an instance of
	org.jboss.cache.loader.AsyncCacheLoader is constructed, which acts
	as an asynchronous channel to the actual cache loader to be used. Be
	aware that, when using the AsyncCacheLoader, there is always the
	possibility of dirty reads since all writes are performed asynchronously,
	and it is thus impossible to guarantee when (and even if) a write succeeds.
	On the other hand, the AsyncCacheLoader allows massive writes to be
	written asynchronously, possibly in batches, with large performance benefits.
	Check out the JBoss Cache docs for further information:
	http://docs.jboss.org/jbosscache/3.2.1.GA/apidocs/index.html.

The fetchPersistentState property determines whether or not to fetch the persistent state of a cache when a node joins a cluster, while the purgeOnStartup property, if set to true, purges the data in the storage on startup.

Finally, checkCharacterPortability should be false for a minor performance improvement.

The FileCacheLoader is a good choice in terms of performance; however, it has some limitations that you should be aware of before rolling out your application in a production environment. In particular:

  1. Due to the way the FileCacheLoader represents a tree structure on disk (directories and files), traversal is “inefficient” for deep trees.
  2. Usage on shared filesystems such as NFS, Windows shares, and others should be avoided, as these do not implement proper file locking and can cause data corruption.
  3. Filesystems are inherently not “transactional”, so when attempting to use your cache in a transactional context, failures when writing to the file (which happens during the commit phase) cannot be recovered.

	As a rule of thumb, it is recommended that the FileCacheLoader not
	be used in a highly concurrent, transactional, or stressful environment;
	in this kind of scenario, consider using it only in the testing
	environment.
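
To see why deep trees are costly with a directory-per-node layout, consider the following sketch. It is only an illustration: the class FqnLayoutSketch, the ".fdb" directory suffix, and the "data.dat" file name are assumptions made for the example, not a statement of the exact on-disk format.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical illustration of a directory-per-node layout; all names are assumptions. */
class FqnLayoutSketch {
    static final String DIR_SUFFIX = ".fdb";    // assumed suffix for node directories
    static final String DATA_FILE = "data.dat"; // assumed file holding a node's attributes

    /** Maps an Fqn such as "/a/b/c" to the file that would hold its attribute map. */
    static Path dataFileFor(String root, String fqn) {
        Path path = Paths.get(root);
        for (String element : fqn.split("/")) {
            if (!element.isEmpty()) {
                path = path.resolve(element + DIR_SUFFIX);
            }
        }
        return path.resolve(DATA_FILE);
    }

    /** Number of directory levels to walk below the root before reaching the node's data. */
    static int depth(String fqn) {
        int levels = 0;
        for (String element : fqn.split("/")) {
            if (!element.isEmpty()) {
                levels++;
            }
        }
        return levels;
    }

    public static void main(String[] args) {
        System.out.println(dataFileFor("store", "/a/b/c"));
        System.out.println("directory levels: " + depth("/a/b/c"));
    }
}
```

Every additional Fqn element adds one more directory that must be resolved before the node's data can be read, so access cost grows with tree depth.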

As an alternative, JBoss Cache is distributed with a set of other cache loaders. For example:

  • The JDBC-based cache loader implementation, which stores/loads nodes’ state into a relational database. The implementing class is org.jboss.cache.loader.JDBCCacheLoader.
  • The BdbjeCacheLoader, which is a cache loader implementation based on Oracle/Sleepycat’s BerkeleyDB Java Edition. Note that the BerkeleyDB implementation is much more efficient than the filesystem-based implementation and provides transactional guarantees, but requires a commercial license if distributed with an application (see http://www.oracle.com/database/berkeley-db/index.html for details).
  • The JdbmCacheLoader, which is a cache loader implementation based on the JDBM engine, a fast and free alternative to BerkeleyDB.
  • Finally, the S3CacheLoader, which uses the Amazon S3 solution (Simple Storage Service, http://aws.amazon.com/) for storing cache data. Since Amazon S3 is remote network storage and has fairly high latency, it is really best for caches that store large pieces of data, such as media or files.

When it comes to measuring the performance of different Cache Loaders, here’s a benchmark executed to compare the File CacheLoader, the JDBC CacheLoader (based on Oracle Database) and Jdbm CacheLoader.

In the above benchmark we are testing cache insertions and cache gets in batches of 1000 Fqns, each one bearing 10 attributes. The File CacheLoader achieved the best overall performance, while the Jdbm CacheLoader was almost as fast for cache gets.

The JDBC CacheLoader is the most robust solution but it adds more overhead to the Cache storage of your session data.
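
The shape of the workload (batches of 1000 Fqns with 10 attributes each, inserted and then read back) can be sketched as follows. This is a hypothetical harness, not the code used for the benchmark: LoaderWorkloadSketch and the in-memory map are stand-ins, and a real run would replace the map with calls to the cache loader under test.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of the benchmark workload against a hypothetical in-memory stand-in. */
class LoaderWorkloadSketch {
    static final int NODES = 1000;    // Fqns per batch, as in the text
    static final int ATTRIBUTES = 10; // attributes per Fqn, as in the text

    // Stand-in store: Fqn -> attribute map. Replace with the loader under test.
    static final Map<String, Map<String, String>> store = new HashMap<>();

    /** Inserts one batch and returns the elapsed time in nanoseconds. */
    static long timeInserts() {
        long start = System.nanoTime();
        for (int n = 0; n < NODES; n++) {
            Map<String, String> attrs = new HashMap<>();
            for (int a = 0; a < ATTRIBUTES; a++) {
                attrs.put("attr" + a, "value" + a);
            }
            store.put("/bench/node" + n, attrs);
        }
        return System.nanoTime() - start;
    }

    /** Reads the batch back and returns the elapsed time in nanoseconds. */
    static long timeGets() {
        long start = System.nanoTime();
        for (int n = 0; n < NODES; n++) {
            Map<String, String> attrs = store.get("/bench/node" + n);
            if (attrs == null || attrs.size() != ATTRIBUTES) {
                throw new IllegalStateException("missing data for node " + n);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        System.out.println("insert ns: " + timeInserts());
        System.out.println("get ns:    " + timeGets());
    }
}
```

For meaningful numbers, each loader should be measured over several warmed-up runs rather than a single pass, since JIT compilation and disk caching distort the first iterations.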

Summary

Clustering is a key element in building scalable Enterprise applications. The infrastructure used by JBoss AS for clustered applications is based on the JGroups framework for inter-node communication and on JBoss Cache for keeping the cluster data synchronized across nodes.

  • JGroups can use both UDP and TCP as the communication protocol. Unless you have network restrictions, you should stay with the default UDP, which uses multicast to send and receive messages.
    • You can tune the transmission protocol by setting an appropriate buffer size with the properties mcast_recv_buf_size, mcast_send_buf_size, ucast_recv_buf_size, and ucast_send_buf_size. You should also increase your O/S buffer sizes, which need to be large enough to accept JGroups’ settings.
  • JBoss Cache provides the foundation for robust clustered services.
  • By configuring the cacheMode you can choose whether your cluster messages will be synchronous (that is, they will wait for message acknowledgement) or asynchronous. Unless you need to handle cache message exceptions, stay with the asynchronous pattern, which provides the best performance.
  • Cache messages can also trigger either cluster replication or cluster invalidation. Cluster replication is needed for transferring the session state across the cluster, while invalidation is the default for Entity/Hibernate, where state can be recovered from the database.
  • The cache concurrency can be configured by means of the nodeLockingScheme property. The most efficient locking scheme is MVCC, which avoids the cost of the slower, synchronization-heavy Pessimistic and Optimistic schemes.
  • Cache replication of sessions can be optimized mainly in three ways:
    • By overriding the isModified method of your SFSBs you can achieve fine-grained control over data replication. It’s an optimal quick-tuning option for OLAP applications using SFSBs.
    • Buddy replication is the most important performance addition to your session replication. It helps to increase performance by reducing memory and CPU usage as well as network traffic. Use buddy replication pools to achieve a higher level of redundancy for mission-critical applications.
    • Clustered web applications can configure replication-granularity and replication-trigger:
      • As far as the replication trigger is concerned, if you mostly read immutable data from your session, the SET value provides a substantial benefit over the default SET_AND_NON_PRIMITIVE_GET.
      • As far as replication granularity is concerned, if your sessions are generally small, you can stay with the default policy (SESSION). If your session is larger and some parts are infrequently accessed, ATTRIBUTE replication will be more effective. If your application has very big data objects in session attributes and only some fields in those objects are frequently modified, the FIELD policy would be the best.