One of the lesser-documented transports in the WSO2 ESB is the VFS (Virtual File System) transport. This Axis2 transport allows you to move files from one location to another. But how do we use it? In this WSO2 tutorial we will look into the WSO2 ESB VFS transport and move 10,000 files from location A to location B.
Tweedledum, Tweedledee and Axis2
The fictional characters Tweedledum and Tweedledee, shown here in a picture from Through the Looking-Glass, are strangely relevant for the VFS transport. Axis2 transports come in pairs: a sender and a receiver. So if you want to use both, you need to uncomment the two transports in axis2.xml.
The file can be found at [ESB-HOME]/repository/conf/axis2/axis2.xml
The [ESB-HOME] refers to the fully qualified path to the installed version of the ESB.
Remove the <!-- and --> to enable the receiver and the sender. Contrary to other transports, there are no other parameters to be defined. The receiver looks like this:
And the sender like this.
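In a stock axis2.xml, the two uncommented entries typically look like the sketch below. The class names are those of the Synapse VFS transport as I know them; treat them as an assumption and check them against your own file rather than copying them blindly.

```xml
<!-- In axis2.xml, next to the other transport definitions -->

<!-- Receiver (listener): polls a location and picks up files -->
<transportReceiver name="vfs" class="org.apache.synapse.transport.vfs.VFSTransportListener"/>

<!-- Sender: writes files to a location -->
<transportSender name="vfs" class="org.apache.synapse.transport.vfs.VFSTransportSender"/>
```

The receiver is what lets a proxy poll a directory; the sender is what lets an endpoint with a vfs: address write files.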
Restart the ESB, since transports are loaded only at startup; changes to axis2.xml are not picked up on the fly.
Defining the VFS
The parameters for VFS are not defined in the axis2.xml file but where you use them, for instance in a proxy. The table below is taken from the documentation; we edited some of the descriptions to make them more readable. Please see the documentation for the complete table. The parameters used in this lab are FileURI, ContentType, FileNamePattern, PollInterval, ActionAfterProcess, ActionAfterFailure, MoveAfterProcess, MoveAfterFailure and ReplyFileName.
Parameter Name | Description | Required | Possible Values (default, where given, in parentheses) |
transport.vfs.FileURI | The URI where the files you want to process are located. | Yes | A valid file URI in the following form: file://<path> (other prefixes such as SFTP and SMB are also possible) |
transport.vfs.ContentType | Content type of the files processed by the transport. To specify the encoding, follow the content type with a semicolon and the character set. | Yes | A valid content type for the files (e.g., text/xml). You can specify the encoding after the content type, such as text/plain;charset=UTF-32 |
transport.vfs.FileNamePattern | If the VFS listener should process only a subset of the files available at the specified file URI location, use this parameter to select those files by name using a regular expression. | No | A regular expression to select files by name (e.g., .*\.xml) |
transport.PollInterval | The polling interval for the transport receiver to poll the file URI location. The value is expressed in seconds unless you add “ms” for milliseconds, e.g., “2” or “2000ms” to specify 2 seconds. | No | A positive integer |
transport.vfs.ActionAfterProcess | Whether to move, delete or take no action on the files after the transport has processed them. | No | MOVE, DELETE or NONE |
transport.vfs.ActionAfterFailure | Whether to move, delete or take no action on the files if a failure occurs. | No | MOVE, DELETE or NONE |
transport.vfs.MoveAfterProcess | Where to move the files after processing if ActionAfterProcess is MOVE. | Yes, if ActionAfterProcess is MOVE | A valid file URI |
transport.vfs.MoveAfterFailure | Where to move the files after a failure if ActionAfterFailure is MOVE. | Yes, if ActionAfterFailure is MOVE | A valid file URI |
transport.vfs.ReplyFileURI | The location where reply files should be written by the transport. | No | A valid file URI |
transport.vfs.ReplyFileName | The name for reply files written by the transport. | No | A valid file name (response.xml) |
transport.vfs.MoveTimestampFormat | The pattern/format of the timestamps added to file names as prefixes when moving files. | No | A valid timestamp pattern (e.g., yyyy-MM-dd'T'HH:mm:ss.SSSZ) |
transport.vfs.Streaming | Whether files should be transferred in streaming mode, which is useful when transferring large files. | No | true or false |
transport.vfs.ReconnectTimeout | Reconnect timeout value in seconds to be used in case of an error when transferring files. | No | A positive integer (30 seconds) |
transport.vfs.MaxRetryCount | Maximum number of retry attempts to carry out in case of errors. | No | A positive integer (3) |
transport.vfs.Append | When writing the response to a file, whether the response should be appended to the response file instead of overwriting the file. | No | true or false (with false, the response file will be completely overwritten) |
transport.vfs.MoveAfterFailedMove | Where to move the failed file. | No | A valid file URI |
transport.vfs.FailedRecordsFileName | The name of the file that maintains the list of failed files. | No | A valid file name (vfs-move-failed-records.properties) |
transport.vfs.FailedRecordsFileDestination | Where to store the failed records file. | No | A folder URI (repository/conf/) |
transport.vfs.MoveFailedRecordTimestampFormat | Entries in the failed records file include the name of the file that failed and the timestamp of its failure. This property configures the timestamp format. | No | A valid timestamp pattern (dd-MM-yyyy HH:mm:ss) |
transport.vfs.FailedRecordNextRetryDuration | The time in milliseconds to wait before retrying the move task. | No | A positive integer (3000 milliseconds) |
transport.vfs.Locking | By default, file locking is enabled in the VFS transport. This parameter lets you configure the locking behavior on a per-service basis. You can also disable locking globally by specifying the parameter at the receiver level and selectively enable locking only for a set of services. | No | enable or disable |
transport.vfs.FileProcessCount | This setting allows you to throttle the VFS listener by processing files in batches. Specify the number of files you want to process in each batch. | No | A positive integer, such as 10 |
transport.vfs.FileProcessInterval | The interval in milliseconds between two file processes. | No | A positive integer, such as 1000 |
transport.vfs.ClusterAware | Whether VFS coordination support is enabled in a clustered deployment or not. | No | true or false |
transport.vfs.FileSizeLimit | Only files smaller than the defined limit will be processed. | No | File size in bytes (-1: unlimited file size) |
transport.vfs.AutoLockReleaseInterval | The timeout value for stale locks; the VFS transport ignores those file locks once the defined time period is reached. | No | Time in milliseconds (20000) |
transport.vfs.SFTPIdentities | Location of the private key. | No | A valid file path |
transport.vfs.SFTPIdentityPassPhrase | Passphrase of the private key. | No | A valid passphrase |
transport.vfs.SFTPUserDirIsRoot | Whether the SFTP user directory should be treated as root. | No | true or false |
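To make the table a little more concrete, here is a minimal sketch of a proxy that combines a few of the optional parameters to restrict and throttle what the listener picks up. The proxy name ThrottledFileProxy, the size limit, the batch size and the intervals are illustrative assumptions, not values used in this lab; the parameter names come from the table, and the paths are the Windows directories this article works with.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical proxy: the name and all values are illustrative only -->
<proxy name="ThrottledFileProxy" startOnLoad="true" transports="vfs" xmlns="http://ws.apache.org/ns/synapse">
    <!-- Required: where to look and how to interpret the files -->
    <parameter name="transport.vfs.FileURI">file:///C:/WSO2/ESB/VFS/INPUT/</parameter>
    <parameter name="transport.vfs.ContentType">text/plain</parameter>
    <!-- Poll every 2 seconds, written in milliseconds -->
    <parameter name="transport.PollInterval">2000ms</parameter>
    <!-- Only pick up .dat files smaller than 64 KiB (limit is in bytes) -->
    <parameter name="transport.vfs.FileNamePattern">.*\.dat</parameter>
    <parameter name="transport.vfs.FileSizeLimit">65536</parameter>
    <!-- Process at most 100 files per batch, 1000 ms between files -->
    <parameter name="transport.vfs.FileProcessCount">100</parameter>
    <parameter name="transport.vfs.FileProcessInterval">1000</parameter>
    <!-- Move processed files out of the way so they are not picked up again -->
    <parameter name="transport.vfs.ActionAfterProcess">MOVE</parameter>
    <parameter name="transport.vfs.MoveAfterProcess">file:///C:/WSO2/ESB/VFS/ORIGINAL/</parameter>
    <target>
        <inSequence>
            <log level="custom">
                <property name="sequence" value="ThrottledFileProxy"/>
            </log>
            <drop/>
        </inSequence>
    </target>
</proxy>
```

All parameters sit directly under the proxy element, outside the target, because they configure the transport rather than the mediation flow.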
Need files?
With so many parameters, where does one start?
Well, let’s establish what we want to do. We would like to move 10,000 files from location A to B. Why 10,000? Simply because we can, and we would like to show how much time it will take. Step by step we will add additional restrictions on the files, e.g. filename and file size.
But where do we find 10,000 files? The Windows fsutil utility will create them for you. This simple loop creates 10,000 files of 32,483 bytes each, named VFS_TEST_[number].dat:
For /L %i in (1,1,10000) do fsutil file createnew VFS_TEST_%i.dat 32483
It takes approximately two minutes to do so.
For the purpose of this blog we will create the files in the directory C:\WSO2\ESB\VFS\INPUT.
The commands for Windows are:
cd \
md WSO2\ESB\VFS\INPUT
cd WSO2\ESB\VFS\INPUT
For /L %i in (1,1,10000) do fsutil file createnew VFS_TEST_%i.dat 32483
For Linux you can use this script (with thanks to my colleague Rob Brouwers):
#!/bin/bash
# Create the target directory, then 10,000 zero-filled files of 30 KiB each
mkdir -p input
for n in {1..10000}; do
  dd if=/dev/zero of=input/VFS_TEST_$(printf %03d "$n").dat bs=1024 count=30
done
Now you have the files in the right location. These files contain nothing but zero bytes, of course, but for the purpose of this blog they are quite suitable!
Creating a proxy
So how do we instruct the ESB to move the files?
We create a simple proxy with an in, out and fault sequence.
The in sequence contains a log mediator and a clone mediator; the clone mediator passes the message to a sequence that writes a file indicating that the input file has been processed. Please observe that the parameters are outside the <target/> tags and are not visible in the proxy. That is because parameters are not mediators.
To a large extent these parameters are self-explanatory, given the name of the parameter and the associated value.
In short: we will move files that have a .dat extension from C:/WSO2/ESB/VFS/INPUT to C:/WSO2/ESB/VFS/ORIGINAL. Files that fail are moved to C:/WSO2/ESB/VFS/FAILURE. I did not mention ContentType and PollInterval above, but they are important too: they set the content type of the files and the polling interval (here 15 seconds).
<?xml version="1.0" encoding="UTF-8"?>
<proxy name="FileProxy" startOnLoad="true" transports="vfs" xmlns="http://ws.apache.org/ns/synapse">
    <parameter name="transport.PollInterval">15</parameter>
    <parameter name="transport.vfs.FileURI">file:///C:/WSO2/ESB/VFS/INPUT/</parameter>
    <parameter name="transport.vfs.ContentType">text/plain</parameter>
    <parameter name="transport.vfs.ActionAfterProcess">MOVE</parameter>
    <parameter name="transport.vfs.MoveAfterFailure">file:///C:/WSO2/ESB/VFS/FAILURE/</parameter>
    <parameter name="transport.vfs.ActionAfterFailure">MOVE</parameter>
    <!-- Regular expression: any file name ending in .dat (the dot is escaped) -->
    <parameter name="transport.vfs.FileNamePattern">.*\.dat</parameter>
    <parameter name="transport.vfs.MoveAfterProcess">file:///C:/WSO2/ESB/VFS/ORIGINAL/</parameter>
    <target>
        <inSequence>
            <log level="custom">
                <property name="sequence" value="Proxy"/>
            </log>
            <clone>
                <target sequence="fileWriteSequence"/>
            </clone>
        </inSequence>
        <outSequence/>
        <faultSequence/>
    </target>
</proxy>
Sequence
For each file we pick up, we write a file to the C:/WSO2/ESB/VFS/OUTPUT directory using a separate sequence.
We log to the console that we hit this sequence, and for each message we set a unique ReplyFileName: the MessageID with its urn:uuid: prefix stripped, followed by a .txt suffix. The message is then sent to the FileEpr endpoint. We set the OUT_ONLY property to indicate that we do not expect a message back. The sequence looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<sequence name="fileWriteSequence" trace="disable" xmlns="http://ws.apache.org/ns/synapse">
    <log level="custom">
        <property name="sequence" value="fileWriteSequence"/>
    </log>
    <!-- File name: the MessageID without its urn:uuid: prefix, plus .txt -->
    <property expression="fn:concat(fn:substring-after(get-property('MessageID'), 'urn:uuid:'), '.txt')" name="transport.vfs.ReplyFileName" scope="transport" type="STRING" xmlns:ns2="http://org.apache.synapse/xsd"/>
    <!-- One-way: no response is expected from the endpoint -->
    <property name="OUT_ONLY" scope="default" type="STRING" value="true"/>
    <send>
        <endpoint name="FileEpr">
            <address uri="vfs:file:///c:/WSO2/ESB/VFS/OUTPUT/"/>
        </endpoint>
    </send>
</sequence>
We need to create three more directories in order to make it work. These are the Windows command-line commands:
cd \
md WSO2\ESB\VFS\OUTPUT
md WSO2\ESB\VFS\FAILURE
md WSO2\ESB\VFS\ORIGINAL
Testing the setup
Let’s give it a spin. We create a C-App and CAR file from the two artifacts we just created and deploy them to the server. We will not show this process here, to keep the article short.
However, we do want to check that the CAR is deployed and that the proxy and the sequence are available on the ESB:
And the sequence:
To get a good overview, I am going to stop the ESB first, because the proxy starts processing files as soon as it is activated. I then create the 10,000 files as described earlier.
I now start the ESB, and it immediately starts picking up files.
After processing, we find all of our files in ORIGINAL.
And in OUTPUT we find the files we wrote there.
On the console we find the output of the log mediators: 10,000 x fileWriteSequence and 10,000 x Proxy.
This is of course a very simple setup of the VFS transport that just moves files. In the next blog we will extend the functionality and see how we can do something with the content of such a file.