Big Data Analytics with Hadoop 3

Starting HDFS

Follow these steps to start HDFS (NameNode and DataNode):

  1. Format the filesystem:
$ ./bin/hdfs namenode -format
  2. Start the NameNode daemon and the DataNode daemon:
$ ./sbin/start-dfs.sh

The Hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
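
To confirm that the daemons actually came up, you can look for their JVM processes. The following is a minimal sketch, assuming the JDK's jps tool is on your PATH; the helper name daemons_running is hypothetical and not part of Hadoop:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): succeeds only if every named
# daemon appears as a class name in jps-style output read from stdin.
daemons_running() {
  local listing daemon
  listing="$(cat)"
  for daemon in "$@"; do
    # jps prints "<pid> <MainClass>", so compare the second column exactly.
    if ! printf '%s\n' "$listing" |
        awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "not running: $daemon" >&2
      return 1
    fi
  done
}

# Only query the JVM process list when jps is actually installed.
if command -v jps >/dev/null 2>&1; then
  if jps | daemons_running NameNode DataNode; then
    echo "NameNode and DataNode are running"
  fi
fi
```

If a daemon is reported as not running, the log directory mentioned above is the first place to look for the reason.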

  3. Browse the web interface for the NameNode; by default, it is available at http://localhost:9870/.
  4. Make the HDFS directories required to execute MapReduce jobs:
$ ./bin/hdfs dfs -mkdir /user
$ ./bin/hdfs dfs -mkdir /user/<username>
  5. When you're done, stop the daemons with the following command (leave them running for now if you want to follow the remaining steps):
$ ./sbin/stop-dfs.sh
  6. With the daemons running, open http://localhost:9870/ in a browser to check your local Hadoop installation. The following is what the HDFS installation looks like:
  7. Clicking on the Datanodes tab shows the nodes, as shown in the following screenshot:

Figure: Screenshot showing the nodes in the Datanodes tab
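
The information in the Datanodes tab can also be read from the command line with hdfs dfsadmin -report. The following is a rough sketch, assuming that the daemons are running, that you are in the Hadoop installation directory, and that the report contains a line of the form "Live datanodes (N):"; the helper name live_datanode_count is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): extracts N from a line such as
# "Live datanodes (1):" in dfsadmin-report-style text read from stdin.
live_datanode_count() {
  sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*$/\1/p' | head -n 1
}

# Run the real report only when the hdfs launcher is present in ./bin.
if [ -x ./bin/hdfs ]; then
  count="$(./bin/hdfs dfsadmin -report 2>/dev/null | live_datanode_count)"
  echo "Live DataNodes: ${count:-unknown}"
fi
```

On a single-node setup like this one, you would expect the count to match the single entry in the Datanodes tab.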

  8. Clicking on the logs shows the various logs in your cluster, as shown in the following screenshot:
  9. As shown in the following screenshot, you can also look at the various JVM metrics of your cluster components:
  10. As shown in the following screenshot, you can also check the configuration. This is a good place to look at the entire configuration and all the default settings:
  11. You can also browse the filesystem of your newly installed cluster, as shown in the following screenshot:

Figure: Screenshot showing the Browse Directory page and how you can browse the filesystem in your newly installed cluster
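
The metrics, configuration, and filesystem views are also reachable over plain HTTP, which is handy when you want to script a check instead of clicking through the browser. The following is a sketch only, assuming the NameNode web interface is at its default address, that curl is installed, and that the /user directory from the earlier step exists; the helper names webhdfs_url and show_cluster_state are hypothetical:

```shell
#!/usr/bin/env bash
# Base URL of the NameNode web interface; override NAMENODE_URL if yours differs.
NAMENODE_URL="${NAMENODE_URL:-http://localhost:9870}"

# Hypothetical helper: builds a WebHDFS URL for an absolute HDFS path
# and an operation name such as LISTSTATUS.
webhdfs_url() {
  local path="$1" op="$2"
  printf '%s/webhdfs/v1%s?op=%s\n' "$NAMENODE_URL" "$path" "$op"
}

# Hypothetical helper: fetches the data behind the pages described above.
# Call it by hand once the daemons are running; nothing invokes it automatically.
show_cluster_state() {
  # JVM memory metrics of the NameNode, as JSON
  curl -s "$NAMENODE_URL/jmx?qry=java.lang:type=Memory"
  # First lines of the effective configuration, as XML
  curl -s "$NAMENODE_URL/conf" | head -n 20
  # Directory listing of /user, as JSON
  curl -s "$(webhdfs_url /user LISTSTATUS)"
}
```

Running show_cluster_state in a shell where this snippet has been sourced prints the three responses in order; an empty response usually means the daemons are not running.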

At this point, you should be able to see and use a basic HDFS cluster. However, this is just an HDFS filesystem with some directories and files. To use the cluster for computation rather than just storage, we also need a job/task scheduling service.