Hello everyone, I hope you are finding the tutorials useful. In the previous post, we performed the Sqoop Export operation. In this tutorial, we are going to start a Flume agent.
Flume is one of the projects in the Apache ecosystem. Apache Flume is a distributed, reliable service for collecting and moving large amounts of log data. I used Apache Flume in one of the projects I worked on, where it imported data in real time from a weblog server. It is very easy to get it up and running in the Hortonworks Sandbox, because it comes preinstalled and nothing else needs to be set up before you start.
So, let us get started.
In this tutorial, our aim is to get a Flume agent up and running. We are going to send some messages/data in the next tutorial.
To start the Flume agent, all we need is a configuration file and the flume-ng command.
For your reference, I have created and uploaded the configuration file, and it looks as follows.
    # example.conf: A single-node Flume configuration
    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    # Describe/configure the source
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444
    # Describe the sink
    a1.sinks.k1.type = logger
    # Use a channel which buffers events in memory
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
Once you have this configuration file, all you need to do is start the Flume agent with the flume-ng command, as already mentioned.
But before actually running the flume-ng command, let us go through what each entry in the above configuration file means.
- a1: a1 is the flume agent name that we are going to use for this tutorial. It can be any name, but should not contain spaces in it.
- r1: r1 is the source name, which agent a1 is going to use to get the data from.
- k1: k1 is the sink or destination name, which agent a1 is going to drop the data to.
- c1: c1 is the channel used by agent a1 to transfer data from source r1 to sink k1.
- a1.sources: this property lists the sources that belong to the Flume agent. In this case, agent a1 has a single source, r1.
- a1.sinks: this property lists the sinks of the Flume agent, which is k1 in this case.
- a1.channels: this property lists the channels the Flume agent uses to connect sources and sinks. c1 is the channel in this case.
- a1.sources.r1.type: this property defines the source type. In this case, it is netcat.
- a1.sources.r1.bind: this property defines the hostname or IP address on which the source listens. In our case, it is the same system, so the value is localhost.
- a1.sources.r1.port: this is the port number on which source r1 of agent a1 listens for incoming data. We have used 44444 as the port number. The netcat source has no default port, so this property has to be set explicitly.
- a1.sinks.k1.type: this property defines the type of the sink, which is logger in this case.
- a1.channels.c1.type: this property defines the type of the channel, which is memory in this case.
- a1.channels.c1.capacity: this is the maximum number of events the channel can hold. It is a count of events, not bytes. I have set it to 1000 events for this example.
- a1.channels.c1.transactionCapacity: this is the maximum number of events the channel accepts from a source or hands to a sink in a single transaction. It is also a count of events, set to 100 here.
- a1.sources.r1.channels: this property defines which channel(s) the respective source writes to. This example binds source r1 to channel c1.
- a1.sinks.k1.channel: this property defines the channel that the corresponding sink reads from. This example binds sink k1 to channel c1.
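To make the relationship between these properties concrete, here is a small Python sketch that reads the same configuration text and checks that every source and sink points at a channel declared under a1.channels. This is not part of Flume. The helper names parse_flume_properties and check_wiring are made up for this illustration, and the parser only handles simple key = value lines and # comments.

```python
# Hypothetical helpers for illustration only; Flume does its own validation.

EXAMPLE_CONF = """
# example.conf: A single-node Flume configuration
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
"""


def parse_flume_properties(text):
    """Turn 'key = value' lines into a dict, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent):
    """Return a list of wiring problems for the given agent (empty if none)."""
    problems = []
    declared = set(props.get(f"{agent}.channels", "").split())

    # A source may write to several channels (space-separated list).
    for source in props.get(f"{agent}.sources", "").split():
        used = props.get(f"{agent}.sources.{source}.channels", "").split()
        if not used:
            problems.append(f"source {source} has no channel")
        for channel in used:
            if channel not in declared:
                problems.append(f"source {source} uses undeclared channel {channel}")

    # A sink reads from exactly one channel.
    for sink in props.get(f"{agent}.sinks", "").split():
        channel = props.get(f"{agent}.sinks.{sink}.channel")
        if channel is None:
            problems.append(f"sink {sink} has no channel")
        elif channel not in declared:
            problems.append(f"sink {sink} uses undeclared channel {channel}")
    return problems


if __name__ == "__main__":
    props = parse_flume_properties(EXAMPLE_CONF)
    print(check_wiring(props, "a1") or "wiring looks consistent")
```

Note how the source property is plural (channels) while the sink property is singular (channel), which is the detail that is easiest to get wrong when editing the file by hand.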
Now that we know the meaning of each line in the configuration file, let us start the Flume agent and see how it looks once it is up and running.
The syntax to start the flume agent looks something like this.
flume-ng agent --name [AGENT_NAME] --conf-file [CONFIGURATION_FILE_PATH]
With the help of configuration file example.conf and the syntax shown above, I came up with the following command which will start the flume agent for the mentioned source, sink, and channel.
flume-ng agent --name a1 --conf-file example.conf
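If you would rather assemble this command from a script, the following Python sketch builds the same argument list. The function name build_flume_command is made up for this illustration. The optional extras it can add, --conf for a configuration directory and -Dflume.root.logger=INFO,console to send log output to the console, are standard flume-ng options, but the directory you pass for --conf depends on your installation, so treat that value as an assumption. The sketch only prints the command and does not launch anything.

```python
import shlex


def build_flume_command(agent_name, conf_file, conf_dir=None, console_logging=False):
    """Build the flume-ng argument list for starting one agent."""
    cmd = ["flume-ng", "agent", "--name", agent_name, "--conf-file", conf_file]
    if conf_dir is not None:
        # Directory holding flume-env.sh and the logging config (site-specific).
        cmd += ["--conf", conf_dir]
    if console_logging:
        # Ask Flume's logger to write INFO messages to the terminal.
        cmd.append("-Dflume.root.logger=INFO,console")
    return cmd


if __name__ == "__main__":
    cmd = build_flume_command("a1", "example.conf", console_logging=True)
    # Print a copy-pasteable shell command instead of running it.
    print(shlex.join(cmd))
```

To actually start the agent from Python, the returned list can be handed to subprocess.run(cmd), which will block for as long as the agent keeps running.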
Once you run the above command, the agent prints its startup log in the terminal window.
Log messages such as Starting Sink k1 and Starting Source r1 show that the Flume agent started successfully.
Once you see this output, you can test whether the Flume agent is actually listening. This can be done with the telnet command.
The syntax of this command is as follows.
telnet [HOSTNAME] [PORT_NUMBER]
As discussed earlier, HOSTNAME is localhost and the netcat source of our Flume agent is listening on port 44444. Therefore, the command becomes the following.
telnet localhost 44444
To execute this, open another terminal window and run the command shown above, as shown in the screenshot below.
This confirms that the flume agent is up and running, which completes our objective for this tutorial.
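If telnet is not installed on your machine, the same check can be scripted. The Python sketch below simply tries to open a TCP connection to the host and port from the configuration file and reports whether it succeeded. The function name is_port_open is made up for this illustration, and a successful connection only shows that something is listening on that port, not that it is Flume.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection tries every address the hostname resolves to.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Host and port taken from a1.sources.r1.bind and a1.sources.r1.port.
    if is_port_open("localhost", 44444):
        print("something is listening on localhost:44444")
    else:
        print("nothing is listening on localhost:44444")
```

The connection is closed right away without sending anything, so no event is pushed into the channel by this check.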
I hope the commands, screenshots, and explanations are enough to understand the process of starting the Flume agent.
In the next tutorial, we are going to configure a Flume memory channel to send data from a source to a destination via a channel.
Hope this helps.