Tuesday, 6 September 2016

Hadoop 1.x vs Hadoop 2.x



Hadoop 1

Hadoop 1.x supports only the MapReduce (MR) processing model. It does not support non-MR tools.
MR does both processing and cluster resource management.
1.x has limited scalability: up to about 4000 nodes per cluster.
Works on the concept of slots: each slot can run only a Map task or only a Reduce task.
A single Namenode to manage the entire namespace.
1.x has a Single Point of Failure (SPOF) because there is only one Namenode. If the Namenode fails, manual intervention is needed to recover.
The MR API is native to Hadoop 1.x. A program written for Hadoop 1.x runs on Hadoop 1.x without any additional files.
1.x is limited as a platform for event processing, streaming and real-time operations.
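
The slot model described above can be illustrated with a small sketch. This is a toy simulation in Python, not Hadoop code: the class name SlotTaskTracker and the slot counts are made up for illustration. It shows how a fixed split between Map slots and Reduce slots can leave capacity idle when the pending work is all of one type.

```python
from dataclasses import dataclass


@dataclass
class SlotTaskTracker:
    """Toy model (hypothetical) of a node with fixed Map and Reduce slots."""
    map_slots: int
    reduce_slots: int
    running_maps: int = 0
    running_reduces: int = 0

    def try_start(self, task_type: str) -> bool:
        """Start a task only if a free slot of the matching type exists."""
        if task_type == "map" and self.running_maps < self.map_slots:
            self.running_maps += 1
            return True
        if task_type == "reduce" and self.running_reduces < self.reduce_slots:
            self.running_reduces += 1
            return True
        return False

    def idle_slots(self) -> int:
        """Count slots of either type that are currently unused."""
        return (self.map_slots - self.running_maps) + (
            self.reduce_slots - self.running_reduces
        )


def schedule(tracker: SlotTaskTracker, tasks: list) -> int:
    """Try to start every task; return how many found a slot."""
    return sum(1 for task_type in tasks if tracker.try_start(task_type))


# Assumed numbers: 4 Map slots, 2 Reduce slots, 6 pending Map tasks.
tracker = SlotTaskTracker(map_slots=4, reduce_slots=2)
started = schedule(tracker, ["map"] * 6)
print(started, tracker.idle_slots())
```

In this sketch only four of the six Map tasks start, while the two Reduce slots stay idle because a Map task cannot use them.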
Hadoop 2

Hadoop 2.x supports MR as well as other distributed computing models such as Spark, Hama, Giraph, Message Passing Interface (MPI) and HBase coprocessors.
YARN (Yet Another Resource Negotiator) handles cluster resource management, while processing is done by the different processing models that run on top of it.
2.x has better scalability: up to about 10000 nodes per cluster.
Works on the concept of containers. A container can run any generic task, not just a Map or a Reduce task.
Multiple Namenode servers manage multiple namespaces (HDFS Federation).
2.x overcomes the SPOF with a standby Namenode. If the active Namenode fails, it can be configured to recover automatically.
The MR API requires additional files for a program written for Hadoop 1.x to run on Hadoop 2.x.
Can serve as a platform for a wide variety of data analytics: it is possible to run event processing, streaming and real-time operations.
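
The container model can be sketched in the same spirit. This is again a toy Python simulation, not YARN code: the class name ToyNodeManager, the task names and the memory figures are all assumptions chosen for illustration. The point is that a node's capacity is treated as one pool of resources, and any kind of task can ask for a share of it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNodeManager:
    """Toy model (hypothetical) of a node whose capacity is a memory pool."""
    total_mb: int
    allocated: list = field(default_factory=list)

    def free_mb(self) -> int:
        """Memory not yet handed out to any container."""
        return self.total_mb - sum(mb for _, mb in self.allocated)

    def try_allocate(self, task_name: str, mb: int) -> bool:
        """Grant a container of the requested size if enough memory is free."""
        if mb <= self.free_mb():
            self.allocated.append((task_name, mb))
            return True
        return False


# Assumed numbers: one node with 8192 MB and a mix of MR and non-MR requests.
node = ToyNodeManager(total_mb=8192)
requests = [
    ("map", 1024),
    ("map", 1024),
    ("spark-executor", 4096),
    ("reduce", 2048),
    ("giraph-worker", 1024),
]
granted = [name for name, mb in requests if node.try_allocate(name, mb)]
print(granted, node.free_mb())
```

Here the first four requests fill the node regardless of task type, and the last one is refused only because no memory is left, not because of what kind of task it is.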
