Background Big data concepts evolved to solve a specific problem: processing data that is diverse in nature, high in volume, and often streaming. Hadoop offered the first architectural solution for processing this kind of data on commodity hardware, as an alternative to high-cost HPC and appliance-based systems. Over the past years, it solved the then challenges… Read More Cloudera vs AWS vs AZURE vs Google Cloud: How to decide on the right big data platform?
In this article, I am going to explain how to use Hadoop streaming with Perl scripts. First, let’s understand some theory behind Hadoop streaming. Hadoop is written in Java, so the native language for writing MapReduce programs is Java. However, Hadoop also provides an API to MapReduce that allows you to write your map… Read More Hadoop Streaming with Perl Script
When we talk about security in Hadoop, we need to explore all aspects of cluster networking and understand how the nodes and clients communicate with each other. Let’s list the possible communication paths in a simple cluster. Master – Slave communication => NameNode – DataNode / JobTracker – TaskTracker communication Slave to slave communication =>… Read More Implementing Security in Hadoop Cluster
Writing the mapper and reducer definitions is easy: extend org.apache.hadoop.mapreduce.Mapper and org.apache.hadoop.mapreduce.Reducer respectively, and override the map and reduce methods to implement your logic. But when it comes to writing the driver program (which contains the program's main method) for the MapReduce job, it’s always preferable to use the ToolRunner class & Tool… Read More Tool & ToolRunner – Simplifying the concept
Introduction In this post, I explain how to develop Hadoop jobs in Java and export a JAR to run on Hadoop clusters. Most articles on the internet talk about installing the Eclipse plugin and using Maven or Ant to build the JAR. To install the Eclipse plugin for Hadoop, one needs to install Eclipse on the same Linux machine… Read More Developing Java Map-Reduce on local machine to run on Hadoop Cluster