Consuming from nsq to Hadoop HDFS with Apache Flume
I was working on a prototype that consumes messages from nsq and writes them to the Hadoop HDFS file system using Apache Flume. I couldn’t find much information on this setup, so I put together a small project.
Below are a few extra notes about nsq and the project.
docker-flume/Dockerfile builds a container that includes Flume, nsq, and the parts of Hadoop core needed to persist files to HDFS.
- To make it quick to modify some files in the built Docker container, some extra paths are mapped in the
- Adding the start-flume.sh script path allows the classpath to be updated, so new libraries can be added quickly by editing the script and restarting the container.
- Adding the flume/flume.conf path means the Flume config can be adjusted and the container restarted without a rebuild.
- In flume.conf, messages are consumed from nsq with nsq_tail. This is a quick way to get started, but it may not be reliable enough for a production environment.
- nsq doesn’t appear to have a mechanism for resuming from an offset the way Kafka does. This means there could be gaps in the data if a consumer goes down and comes back up. In this respect, nsq is similar to high-volume messaging systems such as RabbitMQ or ZeroMQ, which provide weaker message durability guarantees than Kafka.
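The flume.conf note above says nsq is consumed with nsq_tail, a command-line tool that prints messages to stdout. The config itself isn't shown here, so this is an assumption: such a setup is presumably driven by a Flume exec source, which runs a command and turns each line it prints into an event. The sketch below is a minimal Python model of that line-to-event batching, assuming one message per line. It is not Flume's implementation, and `read_batches` and `batch_size` are hypothetical names.

```python
import subprocess
import sys
from typing import Iterator, List, Sequence


def read_batches(cmd: Sequence[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Run `cmd` and yield its stdout lines in batches of up to `batch_size`.

    Hypothetical helper (not Flume code): each line becomes one "event",
    with its trailing newline stripped, roughly how an exec-style source
    treats a tail-like command such as nsq_tail.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        assert proc.stdout is not None
        batch: List[str] = []
        for line in proc.stdout:
            batch.append(line.rstrip("\n"))
            if len(batch) >= batch_size:
                yield batch  # in Flume, this would be a put into the channel
                batch = []
        if batch:
            yield batch  # flush the final partial batch when the command exits
    # Nothing is acknowledged back to the producer, so a crash at this level
    # loses whatever was read but not yet handed on.


if __name__ == "__main__":
    # Stand-in for nsq_tail: a tiny Python program that prints five messages.
    demo = [sys.executable, "-c", "for i in range(5): print(f'msg-{i}')"]
    for b in read_batches(demo, batch_size=2):
        print(b)
```

To point this at nsq itself, `cmd` might be something like `["nsq_tail", "--topic=events", "--nsqd-tcp-address=127.0.0.1:4150"]`. The topic name and address are placeholders, and the flags should be checked against `nsq_tail --help` for the installed version. The sketch also shows where the reliability concern comes from: the command has no way of knowing whether a printed line was stored. The Flume User Guide's exec source section carries a similar warning about data loss.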
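The point about offsets can be made concrete with a toy Python model. This is not nsq's or Kafka's actual code. It contrasts a retained log, which a consumer can re-read from its last saved offset, with a subscription that only delivers to consumers that are currently connected. The second case roughly matches nsq_tail's default of subscribing on an ephemeral channel (a channel name ending in `#ephemeral`), which nsqd discards once the last client disconnects. A regular, non-ephemeral nsq channel does buffer messages while its consumers are away, but it still cannot replay messages that were already consumed. All class and function names here are hypothetical.

```python
from typing import Dict, List, Set


class RetainedLog:
    """Kafka-like toy: every message is kept; each consumer tracks its own offset."""

    def __init__(self) -> None:
        self.messages: List[str] = []

    def publish(self, msg: str) -> None:
        self.messages.append(msg)

    def read_from(self, offset: int) -> List[str]:
        # A consumer that remembers its offset can always catch up.
        return self.messages[offset:]


class LiveOnlyTopic:
    """Toy for a subscription with no retention (e.g. an ephemeral channel)."""

    def __init__(self) -> None:
        self.inboxes: Dict[str, List[str]] = {}
        self.connected: Set[str] = set()

    def connect(self, name: str) -> None:
        self.inboxes.setdefault(name, [])
        self.connected.add(name)

    def disconnect(self, name: str) -> None:
        self.connected.discard(name)

    def publish(self, msg: str) -> None:
        # Only currently connected consumers see the message; it is not kept.
        for name in self.connected:
            self.inboxes[name].append(msg)


def simulate() -> Dict[str, List[str]]:
    log, live = RetainedLog(), LiveOnlyTopic()

    def publish(msg: str) -> None:
        log.publish(msg)
        live.publish(msg)

    from_log: List[str] = []
    live.connect("c1")
    for msg in ("m1", "m2", "m3"):
        publish(msg)
    from_log += log.read_from(0)
    offset = len(from_log)  # position the log consumer would commit

    live.disconnect("c1")  # consumer goes down ...
    for msg in ("m4", "m5"):
        publish(msg)  # ... so the live-only subscription drops these
    live.connect("c1")  # ... and comes back up

    publish("m6")
    from_log += log.read_from(offset)  # resume from the saved offset
    return {"retained_log": from_log, "live_only": live.inboxes["c1"]}


if __name__ == "__main__":
    print(simulate())
    # {'retained_log': ['m1', 'm2', 'm3', 'm4', 'm5', 'm6'], 'live_only': ['m1', 'm2', 'm3', 'm6']}
```

The gap shows up as the missing `m4` and `m5` in the live-only result. This is the kind of hole the nsq note describes when a consumer restarts.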