A Guide Using Java / XML-DSL and Spring Boot
File operations are an essential part of enterprise integration. File transfer is in fact one of the Enterprise Integration Patterns (EIP), so it is very common and comes in many forms and use cases.
One of the more challenging use cases is processing big files, up to 10 megabytes. For one of our clients, we created a Proof of Concept (POC) that showed the most efficient way to process 10-megabyte CSV files.
We have divided the article into two parts: the first part focuses on the XML-DSL approach and the second on the Java-DSL approach. All the source code can be found in the Yenlo Bitbucket repository (https://bitbucket.org/yenlo/yenlo_camel).
Challenges
The challenges usually lie in memory usage (heap size) and processing speed. Reading the whole message into memory at once often leads to issues such as high resource usage, data corruption and slow processing.
For this POC we selected Apache Camel, an open-source integration framework from the Apache stack that supports the EIPs and is well suited to this task. Why? Camel offers several Domain Specific Languages (DSLs), such as Java, XML, Groovy and YAML. Having this choice, we can weigh the performance needs of the integration against the expertise of the team and select the DSL that fits best.
When it comes to reading files, Camel's streaming support is particularly attractive. With streaming enabled, Camel reads the data in a paginated way (chunk by chunk) and processes each chunk according to the page size we configure. Because the file is never loaded into memory all at once, this avoids the memory issues that can occur when processing large files.
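The snippet below is a plain-Java illustration of the idea behind streaming, grouped reading, not Camel's own implementation: lines are read one by one, an optional header is skipped, and groups of N lines are handed on, so only one group has to be held in memory at a time. ChunkedLineReader and readInGroups are hypothetical names chosen for this sketch, and a StringReader stands in for the file (for a real file, Files.newBufferedReader would be the usual choice).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Illustration only: streaming, grouped line reading in plain Java (not Camel's implementation). */
class ChunkedLineReader {

    /**
     * Reads lines one at a time, skips the first linesToSkip lines (e.g. a CSV header) and
     * passes groups of up to groupSize lines to onGroup. Returns the number of groups emitted.
     */
    static int readInGroups(Reader source, int linesToSkip, int groupSize, Consumer<List<String>> onGroup) {
        if (groupSize < 1) {
            throw new IllegalArgumentException("groupSize must be at least 1");
        }
        int groups = 0;
        int skipped = 0;
        List<String> group = new ArrayList<>(groupSize);
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (skipped < linesToSkip) {
                    skipped++;
                    continue;
                }
                group.add(line);
                if (group.size() == groupSize) {   // a full group: hand it on and start a new one
                    onGroup.accept(group);
                    groups++;
                    group = new ArrayList<>(groupSize);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!group.isEmpty()) {                    // the last group may hold fewer than groupSize lines
            onGroup.accept(group);
            groups++;
        }
        return groups;
    }

    public static void main(String[] args) {
        String csv = "id,firstname\n1,Ann\n2,Bob\n3,Cid\n";
        readInGroups(new StringReader(csv), 1, 2, System.out::println);
        // prints [1,Ann, 2,Bob] and then [3,Cid]
    }
}
```

The group size here plays the same role as the noOfLinesToReadAtOnce setting used later in the route template: a larger group means fewer, bigger messages, while memory use stays bounded by the group size rather than by the file size.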
In this article, we look at the two most used DSLs, Java and XML, and finally compare the performance of the XML-DSL and Java-DSL implementations. Part 1 covers the XML-DSL based integration and Part 2 covers the Java-DSL based integration.
Apache Camel Message Flow

In Apache Camel, the Camel Context is the runtime container that holds all the fundamental components. Once a route has been defined through one of the DSLs and added to the Camel Context, it becomes active and starts processing messages. In a typical scenario, the message passes through each step defined in the route, such as logging the initial message, setting properties, translating the message and applying custom business logic, before it is finally handed over to the endpoint.
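To make this flow concrete, here is a deliberately simplified plain-Java analogy of a route: an exchange-like object passes through an ordered list of steps (log, set a property, translate, deliver). SimpleExchange, SimpleRoute and SimpleRouteDemo are hypothetical classes for illustration only; Camel's real Exchange, Processor and routing engine are far richer (headers, error handling, asynchronous routing).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Stand-in for a message: a body plus named properties (not Camel's Exchange). */
class SimpleExchange {
    Object body;
    final Map<String, Object> properties = new HashMap<>();

    SimpleExchange(Object body) {
        this.body = body;
    }
}

/** A "route" is an ordered list of steps applied to the same exchange, one after another. */
class SimpleRoute {
    private final List<Consumer<SimpleExchange>> steps = new ArrayList<>();

    SimpleRoute step(Consumer<SimpleExchange> step) {
        steps.add(step);
        return this;
    }

    void process(SimpleExchange exchange) {
        for (Consumer<SimpleExchange> step : steps) {
            step.accept(exchange);
        }
    }
}

class SimpleRouteDemo {
    public static void main(String[] args) {
        List<Object> endpoint = new ArrayList<>(); // stands in for the target endpoint
        SimpleRoute route = new SimpleRoute()
                .step(ex -> System.out.println("Received: " + ex.body))                 // log the initial message
                .step(ex -> ex.properties.put("startTime", System.currentTimeMillis())) // set a property
                .step(ex -> ex.body = ex.body.toString().toUpperCase())                 // translate the message
                .step(ex -> endpoint.add(ex.body));                                     // hand over to the endpoint
        route.process(new SimpleExchange("hello"));
        System.out.println(endpoint); // prints [HELLO]
    }
}
```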
Implementation Use Case
As mentioned earlier, in this article we build a file-processing use case in both the XML-DSL and the Java-DSL, the two most widely used DSLs for Apache Camel integrations.
The diagram below illustrates the use case we are going to implement. It consists of two routes:
- File To Topic Route
- Topic To Rest API Route
XML-DSL Implementation
Before starting the implementation, it is worth designing the code as a reusable component. The diagram below depicts how we can make the XML-DSL Apache Camel implementation reusable by using the Route Templates feature of Apache Camel.
Note: unit testing is not included in this XML-DSL part of the article, but it will be covered in the Java-DSL part.
Our intention is to create a camel-file-route-templates Spring Boot project and reuse its component classes and route templates in the camel-file-integration-one project.
Explanation of the camel-file-route-templates project
- JsonAggregationStrategy.java
Our approach first extracts the information from the CSV file and then processes it to generate a JSON output containing the specified fields. In addition, we group a defined number of lines from the file together; the group size is set by the noOfLinesToReadAtOnce property in application.yml. To combine the processed records of one group into a single JSON array, we use the following aggregation strategy.
package com.camel.file.process.templates.aggregate;
import org.apache.camel.AggregationStrategy;
import org.apache.camel.Exchange;
import org.springframework.stereotype.Component;
@Component
public class JsonAggregationStrategy implements AggregationStrategy {
    // Combines the JSON objects of one group into a single JSON array: "[ {...}, {...} ]"
    @Override
    public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
        // First exchange of the group: nothing to merge yet
        if (oldExchange == null) {
            return newExchange;
        }
        String oldBody = oldExchange.getIn().getBody(String.class);
        String newBody = newExchange.getIn().getBody(String.class);
        String body;
        if (!oldBody.startsWith("[")) {
            // Second exchange: wrap both objects in a new array
            body = "[ " + oldBody + ", " + newBody + " ]";
        } else {
            // Later exchanges: remove only the closing ']' (not every ']' in the payload), then append
            body = oldBody.substring(0, oldBody.lastIndexOf(']')).trim() + ", " + newBody + " ]";
        }
        oldExchange.getIn().setBody(body);
        return oldExchange;
    }
}
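As a sketch, the string handling of JsonAggregationStrategy can be isolated in plain Java (no Camel types) and checked on its own. JsonArrayAccumulator is a hypothetical name; the sketch removes only the last ']' via lastIndexOf, because String.replace("]", "") would also strip brackets that appear inside field values.

```java
/** Illustration only: the array-building logic of the aggregation strategy, without Camel. */
class JsonArrayAccumulator {

    /** Merges the next JSON object into the running result, mirroring aggregate(oldExchange, newExchange). */
    static String aggregate(String oldBody, String newBody) {
        if (oldBody == null) {
            return newBody;                                  // first record: nothing to merge yet
        }
        if (!oldBody.startsWith("[")) {
            return "[ " + oldBody + ", " + newBody + " ]";   // second record: open the array
        }
        // later records: drop only the final ']' so brackets inside values survive
        return oldBody.substring(0, oldBody.lastIndexOf(']')).trim() + ", " + newBody + " ]";
    }

    public static void main(String[] args) {
        String result = null;
        for (String json : new String[] {"{\"id\":1}", "{\"id\":2}", "{\"tags\":\"[x]\"}"}) {
            result = aggregate(result, json);
        }
        System.out.println(result); // prints [ {"id":1}, {"id":2}, {"tags":"[x]"} ]
    }
}
```

With replace("]", "") instead of lastIndexOf, the third value would have lost its closing bracket and become "[x", which is the kind of silent data corruption that is easy to miss when only spot-checking large files.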
- TimeGap.java
TimeGap.java, in the utils package, calculates the processing time, which is useful for comparing performance.
package com.camel.file.process.templates.utils;
import java.util.Date;
import java.util.concurrent.TimeUnit;
import org.springframework.stereotype.Component;
@Component
public class TimeGap {
    // Formats the elapsed time as "H hours, M minutes, S seconds MS milliseconds".
    // Each unit holds only the remainder after the larger units, so the parts add up to the total.
    public String calculateTimeDifference(Date startTime, Date endTime) {
        long diffInMillis = endTime.getTime() - startTime.getTime();
        long hours = TimeUnit.MILLISECONDS.toHours(diffInMillis);
        long minutes = TimeUnit.MILLISECONDS.toMinutes(diffInMillis) % 60;
        long seconds = TimeUnit.MILLISECONDS.toSeconds(diffInMillis) % 60;
        long millis = diffInMillis % 1000;
        return String.format("%d hours, %d minutes, %d seconds %d milliseconds", hours, minutes, seconds, millis);
    }
}
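An alternative sketch, assuming Java 9 or later, uses java.time.Duration, whose toMinutesPart(), toSecondsPart() and toMillisPart() methods return the remainder for each unit. DurationFormat is a hypothetical class; with the Date values used in the route, the instants would come from startTime.toInstant() and endTime.toInstant().

```java
import java.time.Duration;
import java.time.Instant;

/** Illustration only: formatting an elapsed time with java.time.Duration (Java 9+). */
class DurationFormat {

    static String format(Instant start, Instant end) {
        Duration d = Duration.between(start, end);
        // toHours() is the total number of hours; the *Part() methods give the remainders
        return String.format("%d hours, %d minutes, %d seconds %d milliseconds",
                d.toHours(), d.toMinutesPart(), d.toSecondsPart(), d.toMillisPart());
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2024-01-01T10:00:00Z");
        Instant end = start.plusMillis(3_725_250); // 1 h, 2 min, 5 s and 250 ms later
        System.out.println(format(start, end));   // prints 1 hours, 2 minutes, 5 seconds 250 milliseconds
    }
}
```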
- FileRouteTemplatesApplication.java
package com.camel.file.process.templates;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
@SpringBootApplication
public class FileRouteTemplatesApplication {
    public static void main(String[] args) {
        SpringApplication.run(FileRouteTemplatesApplication.class, args);
    }
}
- resources/templates/file-to-topic.xml: this route template monitors a file location, reads the file content as a stream, splits it into groups of CSV lines, processes them, and finally publishes each group to a topic.
<?xml version="1.0" encoding="UTF-8"?>
<!-- Common template for processing CSV/text files and publishing the result to a topic -->
<routeTemplates xmlns="http://camel.apache.org/schema/spring">
    <routeTemplate id="file-to-topic">
        <templateBean name="jsonBean" type="#class:com.camel.file.process.templates.aggregate.JsonAggregationStrategy"
                      beanType="com.camel.file.process.templates.aggregate.JsonAggregationStrategy"/>
        <templateBean name="timeGapBean" type="#class:com.camel.file.process.templates.utils.TimeGap"
                      beanType="com.camel.file.process.templates.utils.TimeGap"/>
        <route id="{{file-to-topic.routeId}}">
            <from uri="{{file-to-topic.file.uri}}"/>
            <log message="Starting to process big file: ${header.CamelFileName} and ${header.CamelFileLength} Bytes"
                 loggingLevel="INFO"/>
            <setProperty name="startTime">
                <simple>${date:now}</simple>
            </setProperty>
            <!-- Stream the file and split it into groups of lines instead of loading it fully into memory -->
            <split streaming="true">
                <!-- Note: skipFirst is a boolean option; it skips the first line only (here, the CSV header) -->
                <tokenize token="{{file-to-topic.file.token}}"
                          skipFirst="{{file-to-topic.file.noOfLinesToSkip}}"
                          group="{{file-to-topic.file.noOfLinesToReadAtOnce}}"/>
                <log message="Message Before Splitting: ${body}"/>
                <!-- Map the CSV lines of the group to mapper objects -->
                <unmarshal>
                    <bindy type="Csv" classType="{{file-to-topic.mapperClass}}"/>
                </unmarshal>
                <!-- Process each record and aggregate the JSON results into one array -->
                <split aggregationStrategy="{{jsonBean}}">
                    <simple>${body}</simple>
                    <bean beanType="{{file-to-topic.processorClass}}"/>
                    <marshal>
                        <json library="Jackson"/>
                    </marshal>
                    <log message="Message Sent after Processing: ${body}"/>
                </split>
                <log message="Message Sent after Splitting: ${body}" loggingLevel="INFO"/>
                <to uri="{{file-to-topic.endpoint.uri}}"/>
            </split>
            <setProperty name="endTime">
                <simple>${date:now}</simple>
            </setProperty>
            <log message="Done processing big file: ${header.CamelFileName}" loggingLevel="INFO"/>
            <to uri="bean:{{timeGapBean}}?method=calculateTimeDifference(${exchangeProperty.startTime},${exchangeProperty.endTime})"/>
            <log message="Time difference: ${body}" loggingLevel="INFO"/>
        </route>
    </routeTemplate>
</routeTemplates>
- resources/templates/topic-to-rest.xml: this route template consumes messages from the topic and forwards them to a backend REST endpoint, redelivering a message when the backend responds with HTTP 422.
<?xml version="1.0" encoding="UTF-8"?>
<!-- Common template that listens to a topic and publishes each message to a REST endpoint -->
<routeTemplates xmlns="http://camel.apache.org/schema/spring">
    <routeTemplate id="topic-to-rest">
        <route id="{{topic-to-rest.routeId}}">
            <from uri="{{topic-to-rest.listener.uri}}"/>
            <!-- Allow at most requestCount messages per lockPeriodMilliSeconds -->
            <throttle timePeriodMillis="{{topic-to-rest.receiver.throttle.lockPeriodMilliSeconds}}">
                <constant>{{topic-to-rest.receiver.throttle.requestCount}}</constant>
            </throttle>
            <setHeader name="Content-Type">
                <constant>application/json</constant>
            </setHeader>
            <setHeader name="Authorization">
                <constant>Bearer {{topic-to-rest.receiver.token}}</constant>
            </setHeader>
            <!-- Redeliver on HTTP 422; when redeliveries are exhausted, log and send to the dead-letter queue -->
            <onException>
                <exception>org.apache.camel.http.base.HttpOperationFailedException</exception>
                <onWhen>
                    <simple>${exception.statusCode} == 422</simple>
                </onWhen>
                <redeliveryPolicy maximumRedeliveries="{{topic-to-rest.receiver.reDelivery.attempts}}"
                                  redeliveryDelay="{{topic-to-rest.receiver.reDelivery.delay}}"/>
                <handled>
                    <constant>true</constant>
                </handled>
                <log message="HTTP error occurred with status ${exception.statusCode}. Response body: ${exception.responseBody}"/>
                <to uri="{{topic-to-rest.receiver.reDelivery.deadLetterQueue}}"/>
            </onException>
            <to uri="{{topic-to-rest.receiver.uri}}"/>
            <choice>
                <when>
                    <simple>${header.CamelHttpResponseCode} == 200</simple>
                    <log message="Message Successfully sent to Rest Endpoint and Received status code: ${header.CamelHttpResponseCode}"/>
                </when>
            </choice>
        </route>
    </routeTemplate>
</routeTemplates>
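The redelivery settings can be read as the following loop. This plain-Java sketch is an illustration, not how Camel's error handler is implemented: it retries while the (simulated) backend answers 422, performs at most maxRedeliveries extra attempts with a fixed delay, and then moves the message to a dead-letter list. RedeliverySketch and the IntSupplier standing in for the HTTP call are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

/** Illustration only (not Camel code): fixed-delay redelivery on HTTP 422 with a dead-letter fallback. */
class RedeliverySketch {

    /** Returns the last status code; messages still failing with 422 are added to deadLetters. */
    static int sendWithRedelivery(String message, IntSupplier send, int maxRedeliveries,
                                  long delayMillis, List<String> deadLetters) {
        int status = send.getAsInt();                  // original attempt
        int redeliveries = 0;
        while (status == 422 && redeliveries < maxRedeliveries) {
            try {
                Thread.sleep(delayMillis);             // fixed redelivery delay
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();    // keep the interrupt flag and stop retrying
                break;
            }
            status = send.getAsInt();
            redeliveries++;
        }
        if (status == 422) {
            deadLetters.add(message);                  // redeliveries exhausted: dead-letter the message
        }
        return status;
    }

    public static void main(String[] args) {
        List<String> deadLetters = new ArrayList<>();
        int status = sendWithRedelivery("{\"updatedId\":1}", () -> 422, 3, 10, deadLetters);
        System.out.println(status + " " + deadLetters); // prints 422 [{"updatedId":1}]
    }
}
```

In this sketch's terms, the values in application.yml (attempts: 3, delay: 5000) mean that a message that keeps failing is sent four times in total, the original attempt plus three redeliveries, before it ends up on the dead-letter queue.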
- pom.xml
Refer to the sample pom.xml at https://bitbucket.org/yenlo/yenlo_camel/src/master/xml-dsl/camel-file-route-templates/pom.xml
The important part of this pom.xml is:
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-jar-plugin</artifactId>
    <version>3.1.1</version>
    <executions>
        <execution>
            <goals>
                <goal>jar</goal>
            </goals>
            <phase>package</phase>
            <configuration>
                <classifier>library</classifier>
            </configuration>
        </execution>
    </executions>
</plugin>
This execution builds an additional, plain JAR with the library classifier, so that the project can be used as a library.
Run mvn clean install: this installs the JARs into the local Maven repository so that we can reuse them in the next project.
If you extract the main JAR generated by Spring Boot, you will observe the project structure below: the application classes and resources sit under BOOT-INF/classes rather than at the root of the JAR.
If we use this JAR as a dependency, its classes cannot be resolved and we get class-not-found errors. That is why we added the maven-jar-plugin configuration, which generates an additional JAR with the -library suffix. If you extract that JAR file, the classes and the templates folder are at the root of the archive:
This layout is suitable for use as a dependency.
Note: the pom.xml also contains some additional dependencies that the route templates need at runtime. Because they are included in the common project, we do not need to add them again in the next project.
Explanation of the camel-file-integration-one project
In this project we use the common templates created in the previous project and create routes from them.
- pom.xml
Refer to the pom.xml at https://bitbucket.org/yenlo/yenlo_camel/src/master/xml-dsl/camel-file-integration-one/pom.xml
The important part to note is the dependency section, which adds the template project as a dependency:
<dependency>
    <groupId>com.camel.file.process.templates</groupId>
    <artifactId>camel-file-route-templates</artifactId>
    <version>1.0.0-SNAPSHOT</version>
    <classifier>library</classifier>
</dependency>
Next is the part that loads the templates into the Camel Context. This is one of the overheads of the XML-DSL common project approach: even though the templates folder is packaged in the dependency JAR, the templates are not loaded into the Camel Context of the camel-file-integration-one project. To overcome this, we use the maven-dependency-plugin, which unpacks the template XML files from the Spring Boot JAR (hence the BOOT-INF/classes path) into src/main/resources/common-templates during the generate-resources phase:
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-dependency-plugin</artifactId>
    <version>3.2.0</version>
    <executions>
        <execution>
            <id>unpack-dependency</id>
            <phase>generate-resources</phase>
            <goals>
                <goal>unpack</goal>
            </goals>
            <configuration>
                <artifactItems>
                    <artifactItem>
                        <groupId>com.camel.file.process.templates</groupId>
                        <artifactId>camel-file-route-templates</artifactId>
                        <version>1.0.0-SNAPSHOT</version>
                        <type>jar</type>
                        <overWrite>true</overWrite>
                        <outputDirectory>${basedir}/src/main/resources/common-templates</outputDirectory>
                        <includes>BOOT-INF/classes/templates/*.xml</includes>
                    </artifactItem>
                </artifactItems>
            </configuration>
        </execution>
    </executions>
</plugin>
- pojo/InputCsvMapper.java: maps the CSV columns to Java fields so that specific fields can be transformed.
package com.camel.file.process.camelintegrationone.pojo;
import lombok.Data;
import org.apache.camel.dataformat.bindy.annotation.CsvRecord;
import org.apache.camel.dataformat.bindy.annotation.DataField;
@Data
@CsvRecord(separator = ",")
public class InputCsvMapper {
    @DataField(pos = 1, columnName = "id")
    private int id;
    @DataField(pos = 2, columnName = "firstname")
    private String firstName;
    @DataField(pos = 3, columnName = "lastname")
    private String lastName;
    @DataField(pos = 4, columnName = "email")
    private String email;
    @DataField(pos = 5, columnName = "email2")
    private String email2;
    @DataField(pos = 6, columnName = "profession")
    private String profession;
}
- process/InputCsvProcessor.java: transforms each record into the target JSON structure.
package com.camel.file.process.camelintegrationone.process;
import com.camel.file.process.camelintegrationone.pojo.InputCsvMapper;
import lombok.extern.slf4j.Slf4j;
import org.apache.camel.Exchange;
import org.apache.camel.Processor;
import org.apache.camel.util.json.JsonObject;
@Slf4j
public class InputCsvProcessor implements Processor {
    @Override
    public void process(Exchange exchange) throws Exception {
        // Each exchange carries one CSV record mapped by Bindy
        InputCsvMapper csvRecord = exchange.getIn().getBody(InputCsvMapper.class);
        // Keep only the fields required by the target format
        JsonObject jsonObject = new JsonObject();
        jsonObject.put("updatedId", csvRecord.getId());
        jsonObject.put("updateName", csvRecord.getFirstName());
        exchange.getIn().setBody(jsonObject);
    }
}
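Conceptually, Bindy and InputCsvProcessor together map each CSV line by column position and then keep only the fields the target format needs. The plain-Java sketch below mimics that with hypothetical names (CsvToJsonSketch, CsvRow); it uses a naive split that does not handle quoted commas, and it returns a Map rather than JSON text, since in the route Jackson does the serialization.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration only: position-based CSV mapping plus field selection, without Bindy or Jackson. */
class CsvToJsonSketch {

    /** Hypothetical holder mirroring the six columns of InputCsvMapper. */
    static final class CsvRow {
        final int id;
        final String firstName, lastName, email, email2, profession;

        CsvRow(String[] f) {
            id = Integer.parseInt(f[0].trim());
            firstName = f[1];
            lastName = f[2];
            email = f[3];
            email2 = f[4];
            profession = f[5];
        }
    }

    /** Maps a line by column position (1 = id, 2 = firstname, ...); quoted commas are not handled. */
    static CsvRow parse(String line) {
        String[] fields = line.split(",", -1);         // -1 keeps trailing empty columns
        if (fields.length != 6) {
            throw new IllegalArgumentException("Expected 6 columns but got " + fields.length);
        }
        return new CsvRow(fields);
    }

    /** Keeps only the two fields that InputCsvProcessor puts into its JsonObject. */
    static Map<String, Object> transform(CsvRow row) {
        Map<String, Object> json = new LinkedHashMap<>();
        json.put("updatedId", row.id);
        json.put("updateName", row.firstName);
        return json;
    }

    public static void main(String[] args) {
        CsvRow row = parse("1,Ann,Lee,ann@example.com,,developer");
        System.out.println(transform(row)); // prints {updatedId=1, updateName=Ann}
    }
}
```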
- route-builder.xml: creates the concrete routes from the two route templates.
<?xml version="1.0" encoding="UTF-8"?>
<templatedRoutes id="camel" xmlns="http://camel.apache.org/schema/spring">
    <templatedRoute routeTemplateRef="file-to-topic"/>
    <templatedRoute routeTemplateRef="topic-to-rest"/>
</templatedRoutes>
- application.yml: the values referenced by the four elements above, and by the template placeholders, need to be set here.
camel:
  springboot:
    name: camel-file-integration-one
    routes-include-pattern: classpath:common-templates/**/templates/*.xml,classpath:builders/*.xml,classpath:templates/*.xml
logging:
  level:
    org:
      apache:
        camel: DEBUG
spring:
  activemq:
    broker-url: "tcp://XXXXXXXXX:61616"
    user: XXXXXX
    password: XXXXXX
file-to-topic:
  routeId: "file-to-topic-route"
  file:
    uri: "file:src/main/resources?noop=true&delay=20000&antInclude=file_*.csv"
    token: "\n"
    noOfLinesToSkip: 1
    noOfLinesToReadAtOnce: 2
  mapperClass: "com.camel.file.process.camelintegrationone.pojo.InputCsvMapper"
  processorClass: "com.camel.file.process.camelintegrationone.process.InputCsvProcessor"
  endpoint:
    uri: "activemq:topic:camel.testtopic"
topic-to-rest:
  routeId: "topic-to-rest-route"
  listener:
    uri: "activemq:topic:camel.testtopic"
  receiver:
    uri: "https://run.mocky.io/v3/c18b3268-7472-4061-8132-1ba9dc15c3dd"
    #uri: "https://mock.codes/422"
    token: "12323444552211"
    reDelivery:
      attempts: 3
      delay: 5000
      deadLetterQueue: "activemq:queue:dead-letter"
    throttle:
      lockPeriodMilliSeconds: 10000
      requestCount: 1
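The templates read these values through {{...}} placeholders such as {{file-to-topic.file.uri}}, which correspond to the flattened keys Spring Boot builds from the YAML nesting. The sketch below is a simplified, hypothetical PlaceholderSketch that shows only the basic substitution step; Camel's properties component additionally supports default values, nested placeholders and functions.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustration only (not Camel's properties component): replace {{key}} with flattened property values. */
class PlaceholderSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{([^}]+)\\}\\}");

    static String resolve(String text, Map<String, String> properties) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String key = m.group(1).trim();
            String value = properties.get(key);
            if (value == null) {
                throw new IllegalArgumentException("Property not found: " + key); // fail fast on typos
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value)); // quote '$' and '\' in values
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = Map.of(
                "file-to-topic.routeId", "file-to-topic-route",
                "file-to-topic.endpoint.uri", "activemq:topic:camel.testtopic");
        System.out.println(resolve("<route id=\"{{file-to-topic.routeId}}\"/>", props));
        // prints <route id="file-to-topic-route"/>
        System.out.println(resolve("<to uri=\"{{file-to-topic.endpoint.uri}}\"/>", props));
        // prints <to uri="activemq:topic:camel.testtopic"/>
    }
}
```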
After running mvn clean package, the templates from the dependency are unpacked into the common-templates folder, and the build output in target/classes contains them as well.
Because application.yml configures routes-include-pattern as follows:
camel:
  springboot:
    name: camel-file-integration-one
    routes-include-pattern: classpath:common-templates/**/templates/*.xml,classpath:builders/*.xml,classpath:templates/*.xml
Camel loads these templates into the Camel Context when the Spring Boot application starts.
Start the application with the usual Spring Boot command: java -jar target/camel-file-integration-one-1.0.0-SNAPSHOT.jar
That concludes the XML-DSL implementation, in which we addressed the challenges of processing large files and showed how to tackle them with Apache Camel's XML-DSL. In the upcoming blog post, we will implement the same use case with the Java-DSL and compare the performance of the two approaches, XML-DSL and Java-DSL. Stay tuned!