Spring Batch Tutorial with Example Application

Introduction

In this article, we will give an overview of Spring Batch, a framework that provides batch and bulk processing capabilities. Its architecture is robust, and it supports parallel as well as scheduled batch processing. The API provides template and helper classes for repeatable and retryable operations, which are discussed in this article with suitable examples. The classes and interfaces in Spring Batch are not tied to a specific domain, so they can be integrated into an application in any business domain. This article explains the various concepts step by step and provides examples where necessary. It assumes that readers have a fair understanding of the core Spring framework.

Example Application

In this section, we will see Spring Batch in action through an example. Since the architecture of Spring Batch is quite complex, we will look into the various interfaces and classes as and when they are required. As we are concentrating on the usage of the Spring Batch API, we will keep the example simple: it creates files in bulk and writes content into them.

File Creator Task

In Spring Batch, a Tasklet represents a unit of work to be done. In our example, the unit of work is creating a file at a given path and populating it with content. Have a look at the following piece of code:
FileCreatorTasklet.java

package net.javabeat.articles.spring.batch.examples.filewriter;

import java.io.BufferedWriter;
import java.io.FileWriter;

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

public class FileCreatorTasklet implements Tasklet {

	private String filePath;
	private String content;

	public void setFilePath(String filePath) {
		this.filePath = filePath;
	}

	public void setContent(String content) {
		this.content = content;
	}

	// Writes the configured content to the configured file; called by the TaskletStep.
	@Override
	public RepeatStatus execute(StepContribution stepContribution, ChunkContext chunkContext) throws Exception {
		// try-with-resources closes the writer (and the underlying FileWriter) even on failure;
		// any IOException propagates so that the step is marked as failed.
		try (BufferedWriter bWriter = new BufferedWriter(new FileWriter(filePath))) {
			bWriter.write(content);
		}
		// FINISHED tells the step that this tasklet has no more work to do.
		return RepeatStatus.FINISHED;
	}
}

The above class implements the ‘org.springframework.batch.core.step.tasklet.Tasklet’ interface, which declares the execute() method. The parameters for this custom tasklet are the file path and the file content, represented by the properties ‘filePath’ and ‘content’. In a later section, we will see how these properties are populated. The execute() method contains the logic that writes the given content to the file, and it returns RepeatStatus.FINISHED to signal that the tasklet's work is complete.
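The return value of execute() matters. As we understand TaskletStep, it calls the tasklet again as long as execute() returns RepeatStatus.CONTINUABLE and stops once it returns RepeatStatus.FINISHED, which is why FileCreatorTasklet returns FINISHED after writing its file once. The following dependency-free sketch models only that contract; TaskletStepModel, Status, MiniTasklet and runStep() are hypothetical names, not Spring Batch types.

```java
// Hypothetical, dependency-free model of the tasklet contract (not Spring Batch API).
class TaskletStepModel {

    // Stands in for the two RepeatStatus values used in this article.
    enum Status { CONTINUABLE, FINISHED }

    // A unit of work; 'invocation' is 1 on the first call, 2 on the second, and so on.
    interface MiniTasklet {
        Status execute(int invocation);
    }

    // Calls the tasklet until it returns FINISHED and returns the number of calls made.
    static int runStep(MiniTasklet tasklet) {
        int invocations = 0;
        Status status;
        do {
            invocations++;
            status = tasklet.execute(invocations);
        } while (status == Status.CONTINUABLE);
        return invocations;
    }

    public static void main(String[] args) {
        // Like FileCreatorTasklet: does its work once and reports FINISHED.
        int once = runStep(invocation -> Status.FINISHED);
        // A tasklet that needs three passes before it is done.
        int thrice = runStep(invocation -> invocation < 3 ? Status.CONTINUABLE : Status.FINISHED);
        System.out.println(once + " " + thrice); // prints "1 3"
    }
}
```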

Application Context

In this section, we will look into how the custom tasklet that we wrote in the last section is configured to fit into the Spring Batch framework. Have a look at the following application context configuration XML file.
applicationContext.xml

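<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://www.springframework.org/schema/beans
		http://www.springframework.org/schema/beans/spring-beans-2.5.xsd">

	<!-- Launches jobs against the repository defined below -->
	<bean id="jobLauncher"
		class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
		<property name="jobRepository" ref="jobRepository"/>
	</bean>

	<!-- In-memory job repository backed by the Map-based DAOs -->
	<bean id="jobRepository"
		class="org.springframework.batch.core.repository.support.SimpleJobRepository">
		<constructor-arg>
			<bean class="org.springframework.batch.core.repository.dao.MapJobInstanceDao"/>
		</constructor-arg>
		<constructor-arg>
			<bean class="org.springframework.batch.core.repository.dao.MapJobExecutionDao"/>
		</constructor-arg>
		<constructor-arg>
			<bean class="org.springframework.batch.core.repository.dao.MapStepExecutionDao"/>
		</constructor-arg>
		<constructor-arg>
			<bean class="org.springframework.batch.core.repository.dao.MapExecutionContextDao"/>
		</constructor-arg>
	</bean>

	<!-- No real transactions are needed for this example -->
	<bean id="transactionManager"
		class="org.springframework.batch.support.transaction.ResourcelessTransactionManager"/>

</beans>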

The example that we are writing does not deal with transactional data, and we make that explicit by specifying the ResourcelessTransactionManager class, which is mainly used for testing purposes. A job repository is the persistent store for job-related entities such as job instances, job parameters and job executions. A job instance (the JobInstance class) represents a logical run of a job, identified by the job together with its job parameters. Usually it is the framework that executes a job written by the application developer, and each attempt to run a job instance is represented by the JobExecution class. In batch processing, the same job is typically run many times with varying parameters, and each execution of an individual step within a job is represented by the StepExecution class.

So while defining a job repository, the constructor arguments are the DAO objects for JobInstance, JobExecution, StepExecution and the execution context. These are mandatory, and Spring Batch comes with default in-memory implementations of these DAO classes: MapJobInstanceDao, MapJobExecutionDao, MapStepExecutionDao and MapExecutionContextDao. We have configured our job repository to use these instances.
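To make the relationship between these entities concrete, here is a plain-Java sketch of an in-memory repository: launching a job with a job name and parameters that were seen before reuses the same job instance but records a new execution, while different parameters create a new instance. RepositoryModel, launch() and the date parameter are hypothetical, and the sketch ignores Spring Batch's restart rules (for example, Spring Batch refuses to relaunch an instance whose last execution completed successfully).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of job instances and executions (not Spring Batch's Map DAOs).
class RepositoryModel {

    // Execution ids recorded per job instance; the map key identifies the instance.
    private final Map<String, List<Integer>> executionsByInstance = new HashMap<>();
    private int nextExecutionId = 1;

    // A job instance is identified by the job name plus its parameters (sorted for a stable key).
    private static String instanceKey(String jobName, Map<String, String> params) {
        return jobName + new TreeMap<>(params);
    }

    // Every launch is a new execution; the instance is reused when name and parameters match.
    int launch(String jobName, Map<String, String> params) {
        int executionId = nextExecutionId++;
        executionsByInstance
                .computeIfAbsent(instanceKey(jobName, params), key -> new ArrayList<>())
                .add(executionId);
        return executionId;
    }

    int instanceCount() {
        return executionsByInstance.size();
    }

    int executionCount(String jobName, Map<String, String> params) {
        return executionsByInstance
                .getOrDefault(instanceKey(jobName, params), new ArrayList<>())
                .size();
    }

    public static void main(String[] args) {
        RepositoryModel repo = new RepositoryModel();
        Map<String, String> firstRun = Map.of("date", "2009-01-01");
        repo.launch("fileWritingJob", firstRun);                     // new instance, first execution
        repo.launch("fileWritingJob", firstRun);                     // same instance, second execution
        repo.launch("fileWritingJob", Map.of("date", "2009-01-02")); // new instance
        // prints "2 instances, 2 executions of the first instance"
        System.out.println(repo.instanceCount() + " instances, "
                + repo.executionCount("fileWritingJob", firstRun) + " executions of the first instance");
    }
}
```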

Configuration

In this section, we will look at the configuration that is specific to our application. We have defined a custom tasklet, and here we make sure that it fits into the Spring Batch framework. Run through the following configuration:

fileWritingJob.xml

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.5.xsd">

	<import resource="applicationContext.xml"/>

	<!-- Writes the letters of the English alphabet -->
	<bean id="wordsFWTasklet"
		class="net.javabeat.articles.spring.batch.examples.filewriter.FileCreatorTasklet">
		<property name="filePath" value="C:\temp\words.txt"/>
		<property name="content" value="abcdefghijklmnopqrstuvwxyz"/>
	</bean>

	<!-- Writes the digits -->
	<bean id="numbersFWTasklet"
		class="net.javabeat.articles.spring.batch.examples.filewriter.FileCreatorTasklet">
		<property name="filePath" value="C:\temp\numbers.txt"/>
		<property name="content" value="0123456789"/>
	</bean>

	<!-- Abstract parent that supplies the shared job repository to each step -->
	<bean id="taskletStep" abstract="true"
		class="org.springframework.batch.core.step.tasklet.TaskletStep">
		<property name="jobRepository" ref="jobRepository"/>
	</bean>

	<!-- The job runs its steps in the order listed -->
	<bean id="fileWritingJob" class="org.springframework.batch.core.job.SimpleJob">
		<property name="name" value="fileWritingJob" />
		<property name="steps">
			<list>
				<bean parent="taskletStep">
					<property name="tasklet" ref="wordsFWTasklet"/>
					<property name="transactionManager" ref="transactionManager"/>
				</bean>
				<bean parent="taskletStep">
					<property name="tasklet" ref="numbersFWTasklet"/>
					<property name="transactionManager" ref="transactionManager"/>
				</bean>
			</list>
		</property>
		<property name="jobRepository" ref="jobRepository"/>
	</bean>

</beans>

We have created two instances of the FileCreatorTasklet: one creates the file ‘C:\temp\words.txt’ and writes the letters of the English alphabet into it, and the other creates ‘C:\temp\numbers.txt’ and writes the digits. Since each of these file operations is a step of the job, we have wrapped each tasklet in a TaskletStep; the abstract ‘taskletStep’ bean supplies the shared job repository to both steps. Note that a tasklet step also requires a transaction manager, which we have supplied as well. Finally, we have created the batch processing job, represented by ‘org.springframework.batch.core.job.SimpleJob’, by giving it a name and the list of steps to be executed.
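The order of the steps list matters. SimpleJob runs its steps one after another, and if a step does not complete successfully, the job stops without starting the remaining steps. The plain-Java sketch below models only that sequencing rule; SequentialJobModel and the step names are illustrative, not part of Spring Batch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical model of sequential step execution (not the SimpleJob class itself).
class SequentialJobModel {

    // Runs the steps in insertion order and returns the names of the steps that were started.
    // Each step reports true when it completed successfully.
    static List<String> run(LinkedHashMap<String, BooleanSupplier> steps) {
        List<String> started = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> step : steps.entrySet()) {
            started.add(step.getKey());
            if (!step.getValue().getAsBoolean()) {
                break; // a failed step ends the job; later steps are never started
            }
        }
        return started;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, BooleanSupplier> steps = new LinkedHashMap<>();
        steps.put("wordsStep", () -> true);
        steps.put("numbersStep", () -> true);
        System.out.println(run(steps)); // prints [wordsStep, numbersStep]

        steps.put("wordsStep", () -> false); // same key, so the step keeps its position
        System.out.println(run(steps)); // prints [wordsStep]
    }
}
```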

Main

In this section, we will see how to run the example. Spring Batch comes with a number of utility classes for running a job, and a simple one is CommandLineJobRunner. The arguments that we pass to it are the name of the configuration file and the name of the job.

Main.java

package net.javabeat.articles.spring.batch.examples.filewriter;
import org.springframework.batch.core.launch.support.CommandLineJobRunner;

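public class Main {

	public static void main(String[] args) throws Exception {
		// Argument 1: the job configuration file, loaded from the classpath.
		// Argument 2: the name of the job bean to launch.
		CommandLineJobRunner.main(new String[]{"fileWritingJob.xml", "fileWritingJob"});
	}
}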

And that’s all. The application is now ready to run. When you run the above program, the files C:\temp\words.txt and C:\temp\numbers.txt are created with the given contents. Note that FileWriter does not create missing directories, so the C:\temp directory must already exist.

Readers and Processors

Since batch applications are commonly expected to read a large set of data and process it according to business needs, Spring Batch provides a separate set of APIs for this in the form of readers and processors. Simply put, a reader is a data source that reads large sets of data from external resources, and a processor processes the data returned by the reader. Processors can also transform the data returned by the readers if necessary. In this section, we will look into the usage of readers and processors.

Item Reader

In this example, we will read a simple set of data from a list of strings and process it using Spring Batch. Data readers in Spring Batch are represented by the ItemReader interface, and a number of concrete implementations are available. A few of them are AbstractCursorItemReader, the base class for readers that read records from a database through a cursor; FlatFileItemReader, which reads records from a file; JmsItemReader, which reads messages from a JMS destination; and StaxEventItemReader, which reads XML fragments using StAX during XML processing.
StringListItemReader.java

package net.javabeat.articles.spring.batch.examples.a;

import java.util.List;

import org.springframework.batch.item.support.ListItemReader;

public class StringListItemReader extends ListItemReader<String> {

	public StringListItemReader(List<String> list) {
		super(list);
	}

	// Returns the next string from the list, or null when the list is exhausted.
	@Override
	public String read() {
		String readData = super.read();
		System.out.println("Reading data " + readData);
		return readData;
	}
}

In our example, we use a flavour of ItemReader that reads its contents from a List. The class StringListItemReader extends ListItemReader and passes the list to the superclass constructor. The read() method returns the next item; when no data is left, it returns null, which signals the end of the input.
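The end-of-data convention is what lets a caller loop over any reader without knowing how many items it holds. The sketch below is a hypothetical, dependency-free stand-in for a list-backed reader (ListReaderModel is not a Spring Batch class); it assumes, as the text describes, that items are returned in list order and that null marks the end. Copying the input so that reading does not modify the caller's list is a design choice of this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical list-backed reader that follows the ItemReader convention described above.
class ListReaderModel<T> {

    private final Deque<T> remaining;

    // Copies the items; ArrayDeque rejects null elements, which suits a reader
    // that uses null as its end-of-data signal.
    ListReaderModel(List<T> items) {
        this.remaining = new ArrayDeque<>(items);
    }

    // Returns the next item in list order, or null once every item has been read.
    T read() {
        return remaining.pollFirst();
    }

    public static void main(String[] args) {
        ListReaderModel<String> reader = new ListReaderModel<>(List.of("RED", "BLUE", "GREEN"));
        String item;
        while ((item = reader.read()) != null) {
            System.out.println("Reading data " + item);
        }
    }
}
```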

Item Processor

As the name suggests, item processors process the data returned by the item readers. All item processors implement ‘org.springframework.batch.item.ItemProcessor’, and the process() method contains the processing logic. This method can create and return a new object based on the incoming data, or it can return null. A null value indicates that the item should be filtered out and not passed on for further processing.
StringListItemProcessor.java

package net.javabeat.articles.spring.batch.examples.a;

import org.springframework.batch.item.ItemProcessor;

public class StringListItemProcessor implements ItemProcessor<String, String> {

	// Wraps the incoming string in double quotes.
	@Override
	public String process(String string) throws Exception {
		System.out.println("Processing data " + string);
		return "\"" + string + "\"";
	}

}

Have a look at the above code. We have written the class StringListItemProcessor, and its process() method simply decorates the incoming data by surrounding it with double quotes.
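To illustrate the null convention described above, the following plain-Java sketch applies the same double-quote decoration as StringListItemProcessor and drops any item for which a processor returns null. ProcessorModel and processAll() are hypothetical names; the sketch models the filtering idea, not Spring Batch's own chunk-processing code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical model of item processing with null-based filtering.
class ProcessorModel {

    // Same transformation as StringListItemProcessor: surround the item with double quotes.
    static String quote(String item) {
        return "\"" + item + "\"";
    }

    // Applies the processor to every item; a null result filters that item out.
    static List<String> processAll(List<String> items, Function<String, String> processor) {
        List<String> processed = new ArrayList<>();
        for (String item : items) {
            String result = processor.apply(item);
            if (result != null) {
                processed.add(result);
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> colors = List.of("RED", "BLUE", "GREEN");
        System.out.println(processAll(colors, ProcessorModel::quote));
        // prints ["RED", "BLUE", "GREEN"]

        // A processor that filters out BLUE by returning null for it.
        System.out.println(processAll(colors, c -> c.equals("BLUE") ? null : quote(c)));
        // prints ["RED", "GREEN"]
    }
}
```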

Callback

In this section, we will see how to link the item reader and the item processor with the help of repeat callbacks. Simply put, a repeat callback contains the logic to be executed in each iteration of a repeatable batch operation. We will look at creating and configuring repeat callback objects in more detail in the forthcoming sections.
StringRepeatCallback.java

package net.javabeat.articles.spring.batch.examples.a;

import org.springframework.batch.repeat.RepeatCallback;
import org.springframework.batch.repeat.RepeatContext;
import org.springframework.batch.repeat.RepeatStatus;

public class StringRepeatCallback implements RepeatCallback{

	private StringListItemReader reader;
	private StringListItemProcessor processor;

	public void setItemReader(StringListItemReader itemReader){
		this.reader = itemReader;
	}

	public void setItemProcessor(StringListItemProcessor processor){
		this.processor = processor;
	}

	public RepeatStatus doInIteration(RepeatContext repeatContext) throws Exception {

		String data = reader.read();
		if (data == null){
			return RepeatStatus.FINISHED;
		}

		String processedData = processor.process(data);
		System.out.println("For input data " + data + ", the processed data is " + processedData);

		return RepeatStatus.CONTINUABLE;
	}
}

We have created a callback class, StringRepeatCallback, and implemented its doInIteration() method. Here we ask the item reader for the next piece of data and, if the data is not null, ask the item processor to process it. There has to be some indication of when the iteration should stop: if the data returned by the item reader is null, there is no more data to process, so we return RepeatStatus.FINISHED, which ends the iteration. In all other cases, we return RepeatStatus.CONTINUABLE, which continues the iteration.

Main

Main.java

package net.javabeat.articles.spring.batch.examples.a;

import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.repeat.support.RepeatTemplate;

public class Main {

	public static void main(String[] args) {

		RepeatTemplate template = new RepeatTemplate();

		// The input data for the reader
		List<String> colors = new ArrayList<String>();
		colors.add("RED");
		colors.add("BLUE");
		colors.add("GREEN");

		StringRepeatCallback callback = new StringRepeatCallback();

		StringListItemReader stringReader = new StringListItemReader(colors);
		callback.setItemReader(stringReader);

		StringListItemProcessor stringProcessor = new StringListItemProcessor();
		callback.setItemProcessor(stringProcessor);

		// Calls callback.doInIteration() repeatedly until it returns RepeatStatus.FINISHED
		template.iterate(callback);
	}
}

We will see how to run the above example in this section. Like the Spring template classes JdbcTemplate and JmsTemplate, the RepeatTemplate class simplifies the job of executing repeatable operations. The callback object that we created in the last section is passed to the template's iterate() method, which internally calls the doInIteration() method until the callback returns RepeatStatus.FINISHED. Note that the item reader and the item processor are set on the callback object before it is passed to the template.
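A detail worth noticing is that the callback runs one more time than there are items: the final call reads null and returns FINISHED. The plain-Java sketch below models the repeat loop together with the reader and processor logic from this example, so the call count can be checked. RepeatLoopModel, Callback and runColors() are hypothetical names, and the real RepeatTemplate also handles completion policies, listeners and exceptions, which are left out here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of the repeat loop used in this example (not RepeatTemplate itself).
class RepeatLoopModel {

    enum Status { CONTINUABLE, FINISHED }

    interface Callback {
        Status doInIteration();
    }

    // Invokes the callback until it returns FINISHED and returns the number of invocations.
    static int iterate(Callback callback) {
        int invocations = 0;
        Status status;
        do {
            invocations++;
            status = callback.doInIteration();
        } while (status == Status.CONTINUABLE);
        return invocations;
    }

    // Reads each color, quotes it into 'processed', and stops when the input is exhausted.
    static int runColors(List<String> colors, List<String> processed) {
        Deque<String> reader = new ArrayDeque<>(colors);
        return iterate(() -> {
            String data = reader.pollFirst();
            if (data == null) {
                return Status.FINISHED; // no more data: end the iteration
            }
            processed.add("\"" + data + "\"");
            return Status.CONTINUABLE;
        });
    }

    public static void main(String[] args) {
        List<String> processed = new ArrayList<>();
        int calls = runColors(List.of("RED", "BLUE", "GREEN"), processed);
        System.out.println(calls + " calls: " + processed); // prints 4 calls: ["RED", "BLUE", "GREEN"]
    }
}
```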

Repeatable operations

In this final part of the article, we will look into the various flavours of repeatable and retryable operations. In the following example, we demonstrate more of the functionality of the repeat support. Imagine that we want to update the database repeatedly with some data until some condition is reached. We will see how to implement this repeatable operation using the RepeatTemplate class.

DbUpdater

Have a look at the following DbUpdater class, which provides the method update(); it simply prints a statement to the console. This method is called repeatedly by the callback that the framework invokes, and whether to stop can be determined by calling the isFinished() method.
DbUpdater.java

package net.javabeat.articles.spring.batch.examples.repeat;

public class DbUpdater{

	private int counter = 0;

	public void update(){
		System.out.println("Update database...");
	}

	// Returns true only on the sixth call (when counter reaches 5); every other call returns false.
	public boolean isFinished(){
		boolean finishCondition = (counter == 5);
		counter++;
		return finishCondition;
	}
}

Listener

In the class above, isFinished() returns true only on its sixth call, so a callback that stops as soon as isFinished() returns true invokes update() five times. It is also possible to attach listeners that run before and after each iteration of the repeatable operation. A listener for a repeatable operation is represented by the RepeatListener interface, and the support class RepeatListenerSupport provides default (empty) implementations of all the methods in RepeatListener.
MyRepeatListener.java

package net.javabeat.articles.spring.batch.examples.repeat;

import org.springframework.batch.repeat.RepeatContext;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.batch.repeat.listener.RepeatListenerSupport;

public class MyRepeatListener extends RepeatListenerSupport{

	public void after(RepeatContext context, RepeatStatus status) {
		System.out.println("After operation called with status " + status.toString());
	}

	public void before(RepeatContext context) {
		System.out.println("Before operation called");
	}
}

The above class simply prints messages to the console, including the status passed to after(); it is written only to illustrate how listeners are attached to a repeatable operation.

Callback

Now we will write a callback class that calls the DbUpdater class to update the database. The code is given below.
MyRepeatCallback.java

package net.javabeat.articles.spring.batch.examples.repeat;

import org.springframework.batch.repeat.RepeatCallback;
import org.springframework.batch.repeat.RepeatContext;
import org.springframework.batch.repeat.RepeatStatus;

public class MyRepeatCallback implements RepeatCallback{

	private DbUpdater updater;

	public void setDbUpdater(DbUpdater updater){
		this.updater = updater;
	}

	public RepeatStatus doInIteration(RepeatContext context) throws Exception {

		boolean finished = updater.isFinished();
		if (finished){
			return RepeatStatus.FINISHED;
		}

		updater.update();
		return RepeatStatus.CONTINUABLE;
	}
}

In the above class, we call DbUpdater.update() when isFinished() returns false, and then return RepeatStatus.CONTINUABLE, indicating that the framework can proceed with the next iteration. If isFinished() returns true, the method returns RepeatStatus.FINISHED, indicating to the framework that the batch processing operation has ended.

Main

Have a look at the Main class which creates an instance of RepeatTemplate for invoking the callback operation.
RepeatTest.java

package net.javabeat.articles.spring.batch.examples.repeat;

import org.springframework.batch.repeat.RepeatListener;
import org.springframework.batch.repeat.policy.SimpleCompletionPolicy;
import org.springframework.batch.repeat.support.RepeatTemplate;

public class RepeatTest {

	public static void main(String[] args) {

		RepeatTemplate template = new RepeatTemplate();

		SimpleCompletionPolicy policy = new SimpleCompletionPolicy();
		policy.setChunkSize(4);

		template.setCompletionPolicy(policy);

		MyRepeatCallback callback = new MyRepeatCallback();
		DbUpdater updater = new DbUpdater();

		callback.setDbUpdater(updater);

		RepeatListener[] listeners = {new MyRepeatListener()};
		template.setListeners(listeners);

		template.iterate(callback);
	}
}

The first thing to note is the completion policy that we have attached to the repeat template. A completion policy is the strategy that decides when a repeated batch operation is complete. For example, one policy may complete the operation after a timeout, while another may complete it after a fixed number of iterations. Here we have used SimpleCompletionPolicy, which completes the batch processing after the number of iterations specified through the chunk size parameter. The iteration ends as soon as either the completion policy is satisfied or the callback returns RepeatStatus.FINISHED. Since we have set the chunk size to 4, the template stops after four iterations, so update() is called four times even though DbUpdater would allow five; without an explicit completion policy, the template would rely on the value returned by RepeatCallback.doInIteration() alone.
It is also possible to attach listeners to the batch processing operation by calling the setListeners() method on the RepeatTemplate object.
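The interplay between the chunk size and the callback's return value can be checked with a small plain-Java model. It reuses DbUpdater's counter logic and stops either when the callback reports FINISHED or, when a positive chunk size is given, after that many iterations, which is how we understand SimpleCompletionPolicy combining with the callback's result. CompletionModel, Updater and run() are hypothetical names, and a chunk size of 0 stands for "no count limit" in this sketch only.

```java
// Hypothetical model of RepeatTest: DbUpdater's logic plus a count-based completion rule.
class CompletionModel {

    // Same counter logic as DbUpdater; also counts how often update() ran.
    static final class Updater {
        private int counter = 0;
        private int updates = 0;

        void update() {
            updates++;
        }

        boolean isFinished() {
            boolean finished = (counter == 5); // true only on the sixth call
            counter++;
            return finished;
        }

        int updates() {
            return updates;
        }
    }

    // Repeats the logic of MyRepeatCallback. The loop ends when the callback
    // returns FINISHED or, if chunkSize > 0, once chunkSize iterations have completed.
    static int run(Updater updater, int chunkSize) {
        int iterations = 0;
        while (true) {
            iterations++;
            if (updater.isFinished()) {
                break; // callback returned FINISHED
            }
            updater.update(); // callback returned CONTINUABLE
            if (chunkSize > 0 && iterations >= chunkSize) {
                break; // completion policy satisfied
            }
        }
        return updater.updates();
    }

    public static void main(String[] args) {
        System.out.println(run(new Updater(), 4)); // prints 4: the chunk size wins
        System.out.println(run(new Updater(), 0)); // prints 5: the callback decides
    }
}
```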

Retryable Operations

In this section, we will see the retry support provided by Spring Batch. Not every operation succeeds on the first attempt, especially remote calls, and an application often has to retry. Support for such retry operations is discussed here with an example.

Callback

We will write a callback class that represents a retryable operation for getting a connection. To make the class interesting, the caller acquires a connection on the tenth attempt. The fifth attempt throws ConnectionFailureException, the seventh attempt throws ResourceNotAvailableException, and every other attempt throws a plain Exception. As the name ConnectionFailureException suggests, this exception is thrown when the operation fails but the caller can retry. For ResourceNotAvailableException, however, the client should not retry, because we imagine that this exception is thrown when the connection resource is permanently unavailable.
ConnectionRetryCallback.java

package net.javabeat.articles.spring.batch.examples.retry;

import org.springframework.batch.retry.RetryCallback;
import org.springframework.batch.retry.RetryContext;

public class ConnectionRetryCallback implements RetryCallback{

	private int retryCount = 0;

	public MyConnection doWithRetry(RetryContext context) throws Exception {

		retryCount ++;

		System.out.println("In Retry " + retryCount);
		if (retryCount == 10){
			return new MyConnection("SUCCESS");
		}else if (retryCount == 7){
			throw new ResourceNotAvailableException();
		}else if (retryCount == 5){
			throw new ConnectionFailureException();
		}
		throw new Exception();
	}
}

class MyConnection{
	public MyConnection(String status){}
}

Connection Failure Exception

The following code defines the exception ConnectionFailureException; remember that the caller can retry when this exception is thrown from the service.
ConnectionFailureException.java

package net.javabeat.articles.spring.batch.examples.retry;

public class ConnectionFailureException extends Exception{

	/**
	* Default serial version UID.
	*/
	private static final long serialVersionUID = 1L;
}

Resource Not Available Exception

Have a look at the following class definition. When this exception is thrown from the service, the caller should not retry, as it denotes that the resource is permanently unavailable.
ResourceNotAvailableException.java

package net.javabeat.articles.spring.batch.examples.retry;
class ResourceNotAvailableException extends Exception{

	/**
	* Default serial version UID.
	*/
	private static final long serialVersionUID = 1L;
}

Main

We start the main program by creating a template for the retryable operation. A retry policy indicates how the framework should deal with the retryable operation. In our case, we have used SimpleRetryPolicy, which attempts the operation up to the maximum number of attempts specified through setMaxAttempts(). The retry policy also distinguishes between retryable exceptions and fatal exceptions. When a retryable exception occurs, the framework makes another attempt if the maximum number of attempts has not been reached, whereas for a fatal exception, the framework does not retry. In our example, we have called setRetryableExceptionClasses() with the exceptions that may be retried, namely Exception and ConnectionFailureException. Since we do not want to retry on ResourceNotAvailableException, we have called setFatalExceptionClasses() with the ResourceNotAvailableException class. With these settings, the seventh attempt throws ResourceNotAvailableException, which is classified as fatal, so the template stops retrying and rethrows it; the successful tenth attempt is never reached.
RetryTest.java

package net.javabeat.articles.spring.batch.examples.retry;

import java.util.ArrayList;
import java.util.Collection;

import org.springframework.batch.retry.policy.SimpleRetryPolicy;
import org.springframework.batch.retry.support.RetryTemplate;

public class RetryTest {

	public static void main(String[] args) throws Exception {

		RetryTemplate template = new RetryTemplate();

		SimpleRetryPolicy policy = new SimpleRetryPolicy();
		policy.setMaxAttempts(10);

		Collection<Class> retryableExceptions = new ArrayList<Class>();
		retryableExceptions.add(ConnectionFailureException.class);
		retryableExceptions.add(Exception.class);
		policy.setRetryableExceptionClasses(retryableExceptions);

		Collection<Class> fatalExceptionClasses = new ArrayList<Class>();
		fatalExceptionClasses.add(ResourceNotAvailableException.class);

		policy.setFatalExceptionClasses(fatalExceptionClasses);

		template.setRetryPolicy(policy);

		ConnectionRetryCallback callback = new ConnectionRetryCallback();
		template.execute(callback);
	}
}
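The classification rules described above can be modelled without Spring Batch. In the following sketch, a fatal exception stops the loop immediately, a retryable exception triggers another attempt until the maximum is reached, and anything else is rethrown at once. It assumes, as this article intends, that the fatal list takes precedence over the retryable list even though ResourceNotAvailableException is also a subclass of Exception. RetryModel and execute() are hypothetical names, and the real SimpleRetryPolicy API and its classification details may differ between Spring Batch versions.

```java
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical model of a retry loop with retryable and fatal exception classes.
class RetryModel {

    static class ConnectionFailureException extends Exception {}
    static class ResourceNotAvailableException extends Exception {}

    // Calls the callback until it succeeds. A fatal exception is rethrown immediately;
    // a retryable one leads to another attempt unless maxAttempts has been reached;
    // an exception in neither list is rethrown as well.
    static <T> T execute(Callable<T> callback, int maxAttempts,
                         List<Class<? extends Exception>> retryable,
                         List<Class<? extends Exception>> fatal) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return callback.call();
            } catch (Exception e) {
                boolean canRetry = !matches(e, fatal) && matches(e, retryable) && attempt < maxAttempts;
                if (!canRetry) {
                    throw e;
                }
            }
        }
    }

    private static boolean matches(Exception e, List<Class<? extends Exception>> types) {
        for (Class<? extends Exception> type : types) {
            if (type.isInstance(e)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Mirrors ConnectionRetryCallback: attempts 5 and 7 throw specific exceptions, 10 succeeds.
        Callable<String> connect = () -> {
            attempts[0]++;
            if (attempts[0] == 10) return "SUCCESS";
            if (attempts[0] == 7) throw new ResourceNotAvailableException();
            if (attempts[0] == 5) throw new ConnectionFailureException();
            throw new Exception();
        };
        try {
            execute(connect, 10,
                    List.of(ConnectionFailureException.class, Exception.class),
                    List.of(ResourceNotAvailableException.class));
        } catch (Exception e) {
            // prints "ResourceNotAvailableException after 7 attempts"
            System.out.println(e.getClass().getSimpleName() + " after " + attempts[0] + " attempts");
        }
    }
}
```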

Conclusion

This article has provided an introduction to Spring Batch. It first walked through a simple application, through which various APIs such as Tasklet, Job, JobExecution, StepExecution, RepeatStatus and JobRepository were explained. Next, it explained item readers and item processors with the help of an example. Finally, it concluded with the usage of the RepeatTemplate and RetryTemplate classes. We hope readers have gained an introductory knowledge of writing applications using Spring Batch.

About Krishna Srinivasan

He is Founder and Chief Editor of JavaBeat. He has more than 8 years of experience developing Web applications. He writes about Spring, Dojo, JSF, Hibernate and many other emerging technologies on this blog.
