The Spring Batch Infrastructure

This article is based on Spring Batch in Action, to be published in July 2011. It is reproduced here by permission from Manning Publications. Manning publishes MEAP (Manning Early Access Program) ebooks and pbooks. MEAPs are sold exclusively through Manning.com. All print book purchases include an ebook free of charge. When mobile formats become available, all customers will be contacted and upgraded. Visit Manning.com for more information. If you are interested in more tutorials on Spring, please read the Spring tutorials.

The Spring Batch infrastructure includes components that launch your batch jobs and store job execution metadata. As a batch application developer, you don’t have to deal directly with these components because they provide supporting roles to your applications. However, you need to configure this infrastructure at least once in your Spring Batch application.

This article gives an overview of the job launcher, job repository, and their interactions, before showing how to configure persistence of the job repository.

Launching Jobs And Storing Job Metadata

The Spring Batch infrastructure is quite complex, but you mainly need to deal with two components: the job launcher and the job repository. These concepts match two straightforward Java interfaces: JobLauncher and JobRepository. Let’s start by studying the job launcher.

Job Launcher in Spring Batch

As figure 1 shows, the job launcher is the entry point to launch Spring Batch jobs. This is where the external world meets Spring Batch.

[Figure 1: Job launcher in Spring Batch]

The JobLauncher interface is simple:


package org.springframework.batch.core.launch;

(...)

public interface JobLauncher {

    public JobExecution run(Job job, JobParameters jobParameters)
        throws JobExecutionAlreadyRunningException,
            JobRestartException, JobInstanceAlreadyCompleteException,
            JobParametersInvalidException;
}

The run method accepts two parameters: a Job, which is typically a Spring bean configured in Spring Batch XML, and a JobParameters, which is usually created on the fly by the launching mechanism.
Who calls the job launcher? Your own Java program can use the job launcher to launch a job, but so can command-line programs or schedulers (like cron or the Java-based Quartz).
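
When a command-line program or scheduler launches a job, the parameters usually arrive as plain strings. The following sketch shows one way a launching mechanism might turn key=value arguments into a parameter map before calling the job launcher. The LaunchArguments class and the sample parameter names are hypothetical and not part of Spring Batch; in a real application you would typically build a JobParameters object from such values, for example with Spring Batch's JobParametersBuilder.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Spring Batch: turns "key=value"
// command-line arguments into an ordered parameter map.
final class LaunchArguments {

    private LaunchArguments() {
    }

    static Map<String, String> parse(String... args) {
        Map<String, String> parameters = new LinkedHashMap<>();
        for (String arg : args) {
            int separator = arg.indexOf('=');
            if (separator <= 0) {
                throw new IllegalArgumentException(
                        "Expected key=value but got: " + arg);
            }
            // Split on the first '=' only, so values may contain '='.
            parameters.put(arg.substring(0, separator),
                    arg.substring(separator + 1));
        }
        return parameters;
    }

    public static void main(String[] args) {
        // Illustrative values; a scheduler would supply its own.
        System.out.println(parse("inputFile=products.zip", "date=2011-07-01"));
    }
}
```

Creating the parameters at launch time, rather than configuring them with the job, is what lets the same job definition run with different inputs.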

The job launcher encapsulates launching strategies, like executing a job synchronously or asynchronously. Spring Batch provides one implementation of the JobLauncher interface: SimpleJobLauncher. The SimpleJobLauncher class only launches a job; it doesn’t create it, because it delegates this work to the job repository.
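
To make the idea of a launching strategy concrete, here is a minimal sketch in plain Java. ToyLauncher is a hypothetical class, not a Spring Batch one: it only shows how the executor handed to a launcher can decide whether the caller waits for the job or gets control back immediately. SimpleJobLauncher follows a similar idea with a configurable task executor and runs jobs synchronously by default.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy launcher (hypothetical, not a Spring Batch class): the Executor it is
// given decides whether launch() blocks until the job ends or returns at once.
final class ToyLauncher {

    private final Executor executor;

    ToyLauncher(Executor executor) {
        this.executor = executor;
    }

    void launch(Runnable job) {
        executor.execute(job);
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable job = () -> System.out.println(
                "job running on " + Thread.currentThread().getName());

        // Synchronous strategy: run the job in the caller's thread.
        new ToyLauncher(Runnable::run).launch(job);

        // Asynchronous strategy: hand the job to a worker thread.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        new ToyLauncher(pool).launch(job);
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

A synchronous launch suits a command-line program that should exit with the job's outcome, while an asynchronous launch suits a caller, such as a web request, that must not block for the duration of the job.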

Job Repository in Spring Batch

The job repository maintains all metadata related to job executions. Here is the definition of the JobRepository interface:

package org.springframework.batch.core.repository;

(...)

public interface JobRepository {

    boolean isJobInstanceExists(String jobName, JobParameters jobParameters);

    JobExecution createJobExecution(
        String jobName, JobParameters jobParameters)
        throws JobExecutionAlreadyRunningException, JobRestartException,
            JobInstanceAlreadyCompleteException;

    void update(JobExecution jobExecution);

    void add(StepExecution stepExecution);

    void update(StepExecution stepExecution);

    void updateExecutionContext(StepExecution stepExecution);

    void updateExecutionContext(JobExecution jobExecution);

    StepExecution getLastStepExecution(JobInstance jobInstance,
        String stepName);

    int getStepExecutionCount(JobInstance jobInstance, String stepName);

    JobExecution getLastJobExecution(String jobName,
        JobParameters jobParameters);
}

The JobRepository interface provides all the services to manage the batch job lifecycle including the creation and the updates.

Figure 2 shows how a Spring Batch application interacts with the outside world.

[Figure 2: Job repository in Spring Batch]

In figure 2, the job launcher delegates job creation to the job repository, and a job calls the job repository during execution to store its current state. This is useful for monitoring how your job executions proceed and for restarting a job exactly where it failed. Note that the Spring Batch runtime handles all calls to the job repository, meaning that persistence of the job execution metadata is transparent to the application developer.

What constitutes runtime metadata? It includes the list of executed steps, how many items Spring Batch read, wrote, or skipped, the duration of each step, and so forth.
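
The link between stored metadata and restart can be sketched with a toy model. Nothing below is a Spring Batch class: ToyRestart, its status strings, and the step names are hypothetical, and a map stands in for the job repository. The sketch only works at step granularity, whereas Spring Batch can also resume inside a step; the point is that a second execution consults what the first one recorded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical, not Spring Batch classes): a map stands in for the
// job repository and remembers the last known status of each step.
final class ToyRestart {

    private final Map<String, String> stepStatus = new LinkedHashMap<>();

    // Runs the steps in order, skipping those already recorded as COMPLETED.
    // Stops at failingStep (if any) and returns the steps attempted this time.
    List<String> run(List<String> steps, String failingStep) {
        List<String> attempted = new ArrayList<>();
        for (String step : steps) {
            if ("COMPLETED".equals(stepStatus.get(step))) {
                continue; // finished in an earlier execution
            }
            attempted.add(step);
            if (step.equals(failingStep)) {
                stepStatus.put(step, "FAILED");
                break;
            }
            stepStatus.put(step, "COMPLETED");
        }
        return attempted;
    }

    Map<String, String> metadata() {
        return stepStatus;
    }

    public static void main(String[] args) {
        ToyRestart job = new ToyRestart();
        List<String> steps = Arrays.asList("decompress", "readWrite", "cleanUp");
        // First execution fails in the second step.
        System.out.println(job.run(steps, "readWrite") + " " + job.metadata());
        // The restart skips the completed step and resumes at the failed one.
        System.out.println(job.run(steps, null) + " " + job.metadata());
    }
}
```

With an in-memory repository this record disappears when the process exits, which is why restart across process runs needs the persistent implementation.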

Spring Batch provides two implementations of the JobRepository interface. One stores metadata in memory, which is useful for testing or when you don’t want monitoring or restart capabilities, and the other stores metadata in a relational database. Next, we see how to configure the Spring Batch infrastructure in a database.

Configuring the Spring Batch infrastructure in a database

Spring Batch provides a job repository implementation to store your job metadata in a database. This allows you to monitor the execution of your batch processes and their results (success or failure). Persistent metadata also makes it possible to restart a job exactly where it failed.

Spring Batch delivers the following to support persistent job repositories:

  • SQL scripts to create the necessary database tables for the most popular database engines
  • A database implementation of JobRepository (SimpleJobRepository) that executes all necessary SQL statements to insert, update, and query the job repository tables

Let’s now see how to configure the database job repository.

Creating the database tables for a job repository

The SQL scripts to create the database tables are located in the core Spring Batch JAR file, in the org.springframework.batch.core package. The SQL scripts use the following naming convention: schema-[database].sql for creating tables and schema-drop-[database].sql for dropping tables, where [database] is the name of a database engine. To initialize H2 for Spring Batch, we use the file schema-h2.sql.
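
The naming convention is mechanical enough to capture in a few lines. The helper below is a hypothetical convenience, not a Spring Batch class; it only assembles classpath resource names from the package and file patterns described above.

```java
// Hypothetical helper, not part of Spring Batch: builds the classpath
// location of the schema scripts from the documented naming convention.
final class SchemaScripts {

    // The org.springframework.batch.core package, written as a resource path.
    private static final String PACKAGE_PATH = "org/springframework/batch/core/";

    private SchemaScripts() {
    }

    static String createScript(String database) {
        return PACKAGE_PATH + "schema-" + database + ".sql";
    }

    static String dropScript(String database) {
        return PACKAGE_PATH + "schema-drop-" + database + ".sql";
    }

    public static void main(String[] args) {
        System.out.println(createScript("h2"));
        System.out.println(dropScript("h2"));
    }
}
```

With the core Spring Batch JAR on the classpath, a name built this way could be opened with ClassLoader.getResourceAsStream and its statements executed against your database.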

Spring Batch Database Support

Spring Batch supports the following database engines: DB2, Derby, H2, HSQLDB, MySQL, Oracle, PostgreSQL, SQL Server, and Sybase.

Create a database for Spring Batch and then execute the corresponding SQL script for your database engine.

Configuring the job repository with Spring

Listing 1 shows how to configure a job repository in a database:

Listing 1 Configuration of a persistent job repository

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:batch="http://www.springframework.org/schema/batch"
    xsi:schemaLocation="http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
        http://www.springframework.org/schema/batch
        http://www.springframework.org/schema/batch/spring-batch-2.1.xsd">

    <!-- #1 Declares persistent job repository -->
    <batch:job-repository id="jobRepository"
        data-source="dataSource"
        transaction-manager="transactionManager" />

    <bean id="jobLauncher"
        class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
        <property name="jobRepository" ref="jobRepository" />
    </bean>

    <!-- #2 Declares data source -->
    <bean id="dataSource"
        class="org.springframework.jdbc.datasource.SingleConnectionDataSource">
        <property name="driverClassName" value="org.h2.Driver" />
        <property name="url"
            value="jdbc:h2:mem:sbia_ch03;DB_CLOSE_DELAY=-1" />
        <property name="username" value="sa" />
        <property name="password" value="" />
        <property name="suppressClose" value="true" />
    </bean>

    <bean id="transactionManager"
        class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
        <property name="dataSource" ref="dataSource" />
    </bean>
</beans>

The job-repository XML element in the batch namespace creates a persistent job repository (#1). To work properly, the persistent job repository needs a data source and a transaction manager. Note that, at #2, we use a data source implementation that holds a single JDBC connection and reuses it for each query. We did so because it’s convenient and good enough for a single-threaded application (like a batch process). If you plan to use the data source in a concurrent application, then use a connection pool like Apache Commons DBCP or c3p0.

Now that we have the persistent job repository ready, let’s take a closer look at it.

Accessing Job Metadata

If you look at the job repository database, you see that the SQL script created nine tables. Figure 3 shows how you can use the Spring Batch Admin web application to view job executions. Spring Batch Admin accesses the job repository tables to provide this functionality.

What is Spring Batch Admin? Spring Batch Admin is an open source project from SpringSource that provides a web-based user interface for Spring Batch applications.

Summary

Using a well-defined vocabulary, you can paint a clear picture of your batch applications. We’ve seen how the Spring Batch framework models these concepts, an important requirement to understand how to implement batch solutions. We focused on the framework’s infrastructure components: the job launcher and the job repository.

About Krishna Srinivasan

He is the Founder and Chief Editor of JavaBeat. He has more than 8 years of experience developing Web applications. He writes about Spring, DOJO, JSF, Hibernate, and many other emerging technologies in this blog.
