Q : What is a Namenode?
A : The NameNode is the master node of HDFS. It stores metadata about the data present on the DataNodes, and it maintains and manages the blocks that are stored on them. In a cluster without HDFS high availability configured, the NameNode is a single point of failure.
Q : Is Secondary Name node a hot backup for Name node ?
A : No. The Secondary NameNode is not a hot backup for the NameNode. It periodically checkpoints the NameNode's metadata (by default every hour), so its copy can help recover the filesystem state after a NameNode failure, but it cannot take over as the NameNode.
Q : What is the default replication factor of blocks in HDFS ?
A : Default replication factor is 3.
Q : What are the daemons running on a slave node in a multinode cluster ?
A : Each slave node runs a DataNode daemon for HDFS storage and a NodeManager daemon for processing.
Q : How HDFS federation help in scaling HDFS horizontally ?
A : HDFS Federation uses multiple independent NameNodes, each managing a separate portion of the filesystem namespace. Because the NameNodes do not coordinate with each other, the namespace can be scaled horizontally by adding NameNodes, while the DataNodes are shared and register with every NameNode.
Q : What is the format of input to a mapper task ?
A : Mapper accepts (key,value) as input.
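As a minimal sketch of this (key, value) contract, here is a word-count mapper written in the style of Hadoop Streaming, where the key is the byte offset of the input line and the value is the line text; the function name is illustrative, not part of any Hadoop API:

```python
# Sketch of a word-count mapper: takes a (key, value) input pair and
# emits a list of intermediate (key, value) pairs.
def map_word_count(key, value):
    """key: byte offset of the line (ignored here); value: the line text.
    Emits one (word, 1) pair per word in the line."""
    return [(word, 1) for word in value.split()]

pairs = map_word_count(0, "the quick brown fox the lazy dog the")
# pairs contains ("the", 1) three times, one pair per word
```

The framework then groups these intermediate pairs by key before they reach the reducer.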
Q : What are Combiners ?
A : Combiners work as semi-reducers. They run on the mapper output and improve performance by reducing the amount of data transferred across the network from mappers to reducers.
Q : Can reducer logic be used for Combiners ?
A : Yes, reducer logic can be reused for the combiner as long as the operation is commutative and associative.
For example, addition is commutative and associative, so the same logic can be used for both.
However, averaging is not associative: an average of partial averages is not, in general, the overall average, so the combiner logic has to differ from the reducer logic.
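The point above can be demonstrated in a few lines of plain Python (a sketch, not Hadoop code): combining partial sums reproduces the total, but averaging partial averages generally does not reproduce the overall average.

```python
# Why a summing reducer can double as a combiner, but an averaging
# reducer cannot. A combiner may be applied to arbitrary partial
# groupings of mapper output, so the operation must give the same
# answer regardless of how the values are partitioned.
values = [10, 20, 30, 40]

# Summation: combining partial sums equals summing everything at once.
total_direct = sum(values)                               # 100
total_via_partials = sum([sum(values[:2]), sum(values[2:])])  # 30 + 70 = 100

# Averaging: averaging partial averages gives a different result
# whenever the partitions have unequal sizes.
def avg(xs):
    return sum(xs) / len(xs)

avg_direct = avg(values)                                  # 25.0
avg_of_avgs = avg([avg(values[:1]), avg(values[1:])])     # (10 + 30) / 2 = 20.0
```

A correct combiner for averaging instead emits partial (sum, count) pairs and lets the reducer divide at the end.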
Q : Which design patterns handle reading/writing data outside Hadoop ?
A : The external source input pattern reads data from outside Hadoop and HDFS, such as from a SQL database or a web service. The external source output pattern writes data to a system outside Hadoop and HDFS.
Q: What is an Input Split ?
A : An Input Split is a chunk of data that is processed by a single Mapper.
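As a back-of-the-envelope sketch: with the default FileInputFormat the split size typically equals the HDFS block size, so a file yields roughly ceil(file size / split size) splits, each processed by one mapper. The helper below is illustrative, not a Hadoop API:

```python
import math

# Estimate how many input splits (and hence mapper tasks) a file
# produces, assuming split size equals the default 128 MiB block size.
def estimate_splits(file_size_bytes, split_size_bytes=128 * 1024 * 1024):
    return math.ceil(file_size_bytes / split_size_bytes)

splits = estimate_splits(1 * 1024**3)  # a 1 GiB file -> 8 splits of 128 MiB
```

Note that a split is a logical division of the input, while a block is a physical division of the stored data; the last split of a file is often smaller than the others.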
Q : What is Apache Oozie ?
A : Apache Oozie is a system for running dependent jobs. It has two main parts: a workflow engine that runs different types of Hadoop jobs (MapReduce, Pig, Hive, etc.) and a coordinator engine that runs workflow jobs based on predefined schedules.
Q : What is Sqoop ?
A : Sqoop transfers bulk data between relational databases (RDBMS) and HDFS: it can import data from an RDBMS into HDFS and export it back.
Q: What is Flume ?
A : Flume is a service for collecting, aggregating, and transferring large volumes of streaming data, such as log data, into HDFS.