YARN/Hadoop 2.x has a completely different architecture compared to Hadoop 1.x.
In Hadoop 1.x the JobTracker serves two major functions:
- Resource management
- Job scheduling/Job monitoring
YARN separates these two functionalities into two daemons:
- Global Resource Manager
- Per-application Application Master
YARN Architecture Overview
YARN Components
As mentioned earlier, along with the architectural overhaul, YARN introduces a whole new set of terminology to become familiar with. YARN uses the term 'application' instead of the term 'job' used in Hadoop 1.x. In YARN an application can be a single job or a Directed Acyclic Graph (DAG) of jobs, and an application does not necessarily have to be of the MapReduce type. An application is an instance of a certain application type, which is associated with an ApplicationMaster. For each application an ApplicationMaster instance is initiated.
YARN's components include:
- Container
- Global Resource Manager
- Per-node Node Manager
- Per-application Application Master
Container
A container is the place where a unit of work occurs. For instance, each MapReduce task (not the entire job) runs in one container. An application/job will run on one or more containers. A set of physical resources is allocated to each container; currently CPU cores and RAM are supported. Each node in a Hadoop cluster can run several containers.
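To make the container's resource model concrete, here is a minimal sketch in Java using the YARN records API (assuming hadoop-yarn-api is on the classpath). The 1024 MB / 1 vcore capability and the `sleep 60` command are arbitrary illustrative values, not something prescribed by YARN.

```java
import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;

public class ContainerSpecSketch {
    public static void main(String[] args) {
        // Physical resources allocated to one container: RAM in MB and CPU vcores (demo values).
        Resource capability = Resource.newInstance(1024, 1);

        // The unit of work that will run inside the container, expressed as a shell command.
        ContainerLaunchContext launchContext = ContainerLaunchContext.newInstance(
                null,                                   // local resources (jars, files) - omitted here
                null,                                   // environment variables - omitted here
                Collections.singletonList("sleep 60"),  // command(s) executed inside the container
                null, null, null);                      // service data, tokens, ACLs - omitted here

        System.out.println("Requested capability: " + capability);
    }
}
```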
Global Resource Manager
The ResourceManager consists of two main elements:
- Scheduler
- Application Manager
The Application Manager does several jobs:
- Accepts job submissions from client programs.
- Negotiates the first container for executing the per-application ApplicationMaster.
- Provides the service to restart the ApplicationMaster container on failure.
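To illustrate the first of these duties, here is a rough sketch of how a client program could submit an application to the ResourceManager through the YarnClient API. The application name, the ApplicationMaster command (`my.example.AppMaster`) and the resource figures are hypothetical placeholders, not values from the text.

```java
import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitApplicationSketch {
    public static void main(String[] args) throws Exception {
        // Client-side handle to the ResourceManager.
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // Ask the ResourceManager (Application Manager component) for a new application id.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("demo-app");

        // Describe how to launch the per-application ApplicationMaster in its first container.
        ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
                null, null,
                Collections.singletonList("java -Xmx256m my.example.AppMaster"), // hypothetical AM command
                null, null, null);
        appContext.setAMContainerSpec(amContainer);
        appContext.setResource(Resource.newInstance(512, 1)); // resources for the AM container (demo values)

        // Submit; the ResourceManager negotiates the first container and starts the ApplicationMaster.
        ApplicationId appId = yarnClient.submitApplication(appContext);
        System.out.println("Submitted application " + appId);

        yarnClient.stop();
    }
}
```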
You may have noticed that the global ResourceManager could be a single point of failure. From its 2.4 release onwards, Hadoop introduced the high-availability ResourceManager concept, with an Active/Standby ResourceManager pair to remove this single point of failure. You can read more about it from this link.
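As a minimal sketch, the Active/Standby pair is normally enabled through yarn-site.xml properties; the snippet below sets the same properties programmatically on a YarnConfiguration object just for illustration. The host names, cluster id and ZooKeeper address are hypothetical, and the exact property set can vary between Hadoop 2.x versions.

```java
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RmHaConfigSketch {
    public static void main(String[] args) {
        YarnConfiguration conf = new YarnConfiguration();

        // Enable the Active/Standby ResourceManager pair (normally configured in yarn-site.xml).
        conf.setBoolean("yarn.resourcemanager.ha.enabled", true);
        conf.set("yarn.resourcemanager.cluster-id", "demo-cluster");         // hypothetical cluster id
        conf.set("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");
        conf.set("yarn.resourcemanager.hostname.rm1", "rm1.example.com");    // hypothetical hosts
        conf.set("yarn.resourcemanager.hostname.rm2", "rm2.example.com");
        conf.set("yarn.resourcemanager.zk-address", "zk1.example.com:2181"); // ZooKeeper used for failover

        System.out.println("HA enabled: " + conf.getBoolean("yarn.resourcemanager.ha.enabled", false));
    }
}
```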
Per-node Node Manager
The NodeManager is the component which actually provisions resources to applications. The NodeManager daemon is a slave service which runs on each computation node in a Hadoop cluster.
The NodeManager takes resource requests from the ResourceManager and provisions containers to applications. It keeps track of the health of its node and reports to the ResourceManager periodically, so that the ResourceManager can keep track of global health.
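As a rough illustration of that provisioning step, the sketch below shows how an ApplicationMaster would ask the NodeManager hosting an already-allocated container to launch some work, using the NMClient API. The Container argument comes from an earlier allocation, and the echo command is an arbitrary placeholder.

```java
import java.util.Collections;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NodeManagerLaunchSketch {

    // Called with a Container that the ResourceManager has already allocated to the application.
    static void launchOnNodeManager(Container container) throws Exception {
        NMClient nmClient = NMClient.createNMClient();
        nmClient.init(new YarnConfiguration());
        nmClient.start();

        // Describe the work to run; the NodeManager on the container's node provisions and starts it.
        ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
                null, null,
                Collections.singletonList("echo hello-from-container"), // arbitrary demo command
                null, null, null);

        nmClient.startContainer(container, ctx);
    }
}
```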
During startup, each node registers with the ResourceManager and lets it know the amount of resources available. This information is updated periodically.
The NodeManager manages resources and periodically reports node status to the ResourceManager, but it does not know anything about application status. Applications are handled by a different component called the ApplicationMaster, which we are going to discuss next.
Per-application Application Master
The ApplicationMaster negotiates the resource containers required to execute the application from the ResourceManager, obtains those resources from the NodeManagers, and executes the application. For each application, which is actually an instance of a certain application type, an ApplicationMaster instance is initiated.
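Below is a minimal sketch of that negotiation loop using the AMRMClient API: register with the ResourceManager, request a container, and pick up whatever the scheduler has allocated on the next heartbeat. The resource size, priority and log messages are arbitrary demo values, and a real ApplicationMaster would keep heartbeating until all its work is done.

```java
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AppMasterSketch {
    public static void main(String[] args) throws Exception {
        // Runs inside the ApplicationMaster's own container.
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(new YarnConfiguration());
        rmClient.start();

        // Register with the ResourceManager (host, RPC port and tracking URL omitted in this sketch).
        rmClient.registerApplicationMaster("", 0, "");

        // Negotiate one worker container: 1024 MB and 1 vcore at priority 0 (demo values).
        Resource capability = Resource.newInstance(1024, 1);
        rmClient.addContainerRequest(new ContainerRequest(capability, null, null, Priority.newInstance(0)));

        // Heartbeat to the ResourceManager; allocated containers arrive in the response.
        AllocateResponse response = rmClient.allocate(0.0f);
        for (Container allocated : response.getAllocatedContainers()) {
            System.out.println("Got container " + allocated.getId() + " on " + allocated.getNodeId());
            // The ApplicationMaster would now ask that node's NodeManager to launch the work
            // (see the NMClient sketch above). If a task fails, it simply requests a new container.
        }

        rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "done", "");
    }
}
```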
In Hadoop 1.x, when a task fails the JobTracker is responsible for re-executing that task; this increases the load on the JobTracker and reduces its scalability.
In Hadoop 2.x the ResourceManager is only accountable for scheduling resources. The ApplicationMaster is responsible for negotiating resource containers from the ResourceManager; if a task fails, the ApplicationMaster negotiates new resources from the ResourceManager and tries to re-execute the failed task.
Hadoop 1.x only supports MapReduce jobs, as its design is tightly coupled to MapReduce-style computations. In contrast, Hadoop 2.x has a more pluggable architecture and supports new frameworks which use HDFS underneath. A new framework can be plugged into Hadoop by developing its own ApplicationMaster.