Monday, April 8, 2013

MiCloud Hadoop Service

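MiCloud has added a Hadoop service. Because Hadoop comes preloaded on the machine image, a few simple commands are enough to get it up and running.
MiCloud currently offers five types of machines for users with different needs: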



Provisioning a machine takes only a few minutes. Once it is up, you see the following screen:



With the Standard64 + Hadoop 1.0.4 option, Hadoop is installed under /usr/local/hadoop.



According to the official documentation, Hadoop 1.0.4 supports three modes of operation:
  • Local (Standalone) Mode
  • Pseudo-Distributed Mode
  • Fully-Distributed Mode
The first two are covered below. The third is cluster mode and will be covered in a later post.

Starting Hadoop in Standalone Mode

In standalone mode Hadoop runs on a single machine: the job is executed directly, and its result is written to the output directory.
$ mkdir input 
$ cp conf/*.xml input 
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+' 
$ cat output/*
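
To get a feel for what the grep example computes, the same counting can be sketched with ordinary shell tools. This is only an illustration: local_grep_count is a hypothetical helper that is not part of Hadoop, it assumes a grep that supports -o (GNU or BSD), and it assumes the example prints one "count, tab, match" line per distinct match with the highest count first.

```shell
# local_grep_count is a hypothetical helper, not part of Hadoop.
# Usage: local_grep_count DIR REGEX
#   e.g. local_grep_count input 'dfs[a-z.]+'
local_grep_count() {
    dir=$1
    pattern=$2
    # grep: -o prints only the matched text, -h drops file names,
    #       -E enables extended regular expressions.
    # sort | uniq -c: count identical matches.
    # sort -rn: highest count first.
    # awk: normalise each line to "count<TAB>match".
    grep -ohE -- "$pattern" "$dir"/* |
        sort |
        uniq -c |
        sort -rn |
        awk '{ print $1 "\t" $2 }'
}
```

On a small input directory the result should list the same strings as cat output/* above, although lines with equal counts may come out in a different order.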




Starting Hadoop in Pseudo-Distributed Mode

Edit ~/.bashrc and set the environment variables:
export JAVA_HOME=/opt/local/java/sun6
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=$JAVA_HOME/lib/tools.jar:.
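
This post does not show the conf/*.xml settings, and the preloaded image may already contain them. If it does not, the Hadoop 1.0.4 single-node setup guide uses the values below for pseudo-distributed operation. The following is a minimal sketch under that assumption: write_pseudo_conf is a hypothetical helper, and it overwrites the three files in the directory it is given, so back up the existing conf directory first.

```shell
# write_pseudo_conf is a hypothetical helper, not part of Hadoop.
# It OVERWRITES core-site.xml, hdfs-site.xml and mapred-site.xml in DIR.
# Usage: write_pseudo_conf conf
write_pseudo_conf() {
    dir=$1

    # HDFS entry point: a NameNode on localhost, port 9000.
    cat > "$dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

    # Only one DataNode exists, so keep a single replica of each block.
    cat > "$dir/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

    # JobTracker address used by MapReduce clients and TaskTrackers.
    cat > "$dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF
}
```

The same guide also expects ssh localhost to work without a passphrase, because the start scripts use ssh to launch the daemons.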



Format the namenode:
# bin/hadoop namenode -format



Start all services:
# bin/start-all.sh



Use ps to check that the services are running:
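
A pseudo-distributed Hadoop 1.x setup normally runs five daemons: NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker. As a rough sketch, the check can be scripted by scanning a process listing for those names. missing_daemons is a hypothetical helper. It reads the listing from standard input, so it works with any command that prints the Java main class, for example jps from the JDK (assumed here to be on the PATH). Some ps variants truncate long command lines, in which case the class name may not be visible.

```shell
# missing_daemons is a hypothetical helper, not part of Hadoop.
# It reads a process listing on stdin and prints every expected
# Hadoop 1.x daemon whose name does not appear in it.
# Usage (assumes the JDK's jps is on the PATH):
#   jps | missing_daemons
missing_daemons() {
    listing=$(cat)
    for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # Require a non-letter (or line boundary) on both sides so that
        # "SecondaryNameNode" is not mistaken for "NameNode".
        if ! printf '%s\n' "$listing" |
            grep -Eq "(^|[^A-Za-z])${d}([^A-Za-z]|\$)"; then
            echo "$d"
        fi
    done
}
```

No output means all five names were found in the listing.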






Connect to the JobTracker web UI: http://localhost:50030/




Run the built-in example:

[root@bddbe1f4 /usr/local/hadoop]# bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
13/04/08 20:47:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/04/08 20:47:04 WARN snappy.LoadSnappy: Snappy native library not loaded
13/04/08 20:47:04 INFO mapred.FileInputFormat: Total input paths to process : 16
13/04/08 20:47:04 INFO mapred.JobClient: Running job: job_201304082037_0001
13/04/08 20:47:05 INFO mapred.JobClient:  map 0% reduce 0%
13/04/08 20:47:21 INFO mapred.JobClient:  map 12% reduce 0%
13/04/08 20:47:27 INFO mapred.JobClient:  map 25% reduce 0%
13/04/08 20:47:33 INFO mapred.JobClient:  map 37% reduce 0%
13/04/08 20:47:36 INFO mapred.JobClient:  map 37% reduce 8%
13/04/08 20:47:39 INFO mapred.JobClient:  map 50% reduce 8%
13/04/08 20:47:45 INFO mapred.JobClient:  map 62% reduce 12%
13/04/08 20:47:51 INFO mapred.JobClient:  map 75% reduce 20%
13/04/08 20:47:57 INFO mapred.JobClient:  map 87% reduce 20%
13/04/08 20:48:00 INFO mapred.JobClient:  map 87% reduce 25%
13/04/08 20:48:03 INFO mapred.JobClient:  map 100% reduce 25%
13/04/08 20:48:06 INFO mapred.JobClient:  map 100% reduce 29%
13/04/08 20:48:15 INFO mapred.JobClient:  map 100% reduce 100%
13/04/08 20:48:20 INFO mapred.JobClient: Job complete: job_201304082037_0001
13/04/08 20:48:20 INFO mapred.JobClient: Counters: 27
13/04/08 20:48:20 INFO mapred.JobClient:   Job Counters
13/04/08 20:48:20 INFO mapred.JobClient:     Launched reduce tasks=1
13/04/08 20:48:20 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=78254
13/04/08 20:48:20 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/04/08 20:48:20 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/04/08 20:48:20 INFO mapred.JobClient:     Launched map tasks=16
13/04/08 20:48:20 INFO mapred.JobClient:     Data-local map tasks=16
13/04/08 20:48:20 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=52826
13/04/08 20:48:20 INFO mapred.JobClient:   File Input Format Counters
13/04/08 20:48:20 INFO mapred.JobClient:     Bytes Read=26794
13/04/08 20:48:20 INFO mapred.JobClient:   File Output Format Counters
13/04/08 20:48:20 INFO mapred.JobClient:     Bytes Written=180
13/04/08 20:48:20 INFO mapred.JobClient:   FileSystemCounters
13/04/08 20:48:20 INFO mapred.JobClient:     FILE_BYTES_READ=82
13/04/08 20:48:20 INFO mapred.JobClient:     HDFS_BYTES_READ=28516
13/04/08 20:48:20 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=368653
13/04/08 20:48:20 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=180
13/04/08 20:48:20 INFO mapred.JobClient:   Map-Reduce Framework
13/04/08 20:48:20 INFO mapred.JobClient:     Map output materialized bytes=172
13/04/08 20:48:20 INFO mapred.JobClient:     Map input records=758
13/04/08 20:48:20 INFO mapred.JobClient:     Reduce shuffle bytes=166
13/04/08 20:48:20 INFO mapred.JobClient:     Spilled Records=6
13/04/08 20:48:20 INFO mapred.JobClient:     Map output bytes=70
13/04/08 20:48:20 INFO mapred.JobClient:     Total committed heap usage (bytes)=2588475392
13/04/08 20:48:20 INFO mapred.JobClient:     Map input bytes=26794
13/04/08 20:48:20 INFO mapred.JobClient:     Combine input records=3
13/04/08 20:48:20 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1722
13/04/08 20:48:20 INFO mapred.JobClient:     Reduce input records=3
13/04/08 20:48:20 INFO mapred.JobClient:     Reduce input groups=3
13/04/08 20:48:20 INFO mapred.JobClient:     Combine output records=3
13/04/08 20:48:20 INFO mapred.JobClient:     Reduce output records=3
13/04/08 20:48:20 INFO mapred.JobClient:     Map output records=3
13/04/08 20:48:20 INFO mapred.FileInputFormat: Total input paths to process : 1
13/04/08 20:48:20 INFO mapred.JobClient: Running job: job_201304082037_0002
13/04/08 20:48:21 INFO mapred.JobClient:  map 0% reduce 0%
13/04/08 20:48:36 INFO mapred.JobClient:  map 100% reduce 0%
13/04/08 20:48:48 INFO mapred.JobClient:  map 100% reduce 100%
13/04/08 20:48:53 INFO mapred.JobClient: Job complete: job_201304082037_0002
13/04/08 20:48:53 INFO mapred.JobClient: Counters: 27
13/04/08 20:48:53 INFO mapred.JobClient:   Job Counters
13/04/08 20:48:53 INFO mapred.JobClient:     Launched reduce tasks=1
13/04/08 20:48:53 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=12631
13/04/08 20:48:53 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/04/08 20:48:53 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/04/08 20:48:53 INFO mapred.JobClient:     Launched map tasks=1
13/04/08 20:48:53 INFO mapred.JobClient:     Data-local map tasks=1
13/04/08 20:48:53 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10034
13/04/08 20:48:53 INFO mapred.JobClient:   File Input Format Counters
13/04/08 20:48:53 INFO mapred.JobClient:     Bytes Read=180
13/04/08 20:48:53 INFO mapred.JobClient:   File Output Format Counters
13/04/08 20:48:53 INFO mapred.JobClient:     Bytes Written=52
13/04/08 20:48:53 INFO mapred.JobClient:   FileSystemCounters
13/04/08 20:48:53 INFO mapred.JobClient:     FILE_BYTES_READ=82
13/04/08 20:48:53 INFO mapred.JobClient:     HDFS_BYTES_READ=295
13/04/08 20:48:53 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=42723
13/04/08 20:48:53 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=52
13/04/08 20:48:53 INFO mapred.JobClient:   Map-Reduce Framework
13/04/08 20:48:53 INFO mapred.JobClient:     Map output materialized bytes=82
13/04/08 20:48:53 INFO mapred.JobClient:     Map input records=3
13/04/08 20:48:53 INFO mapred.JobClient:     Reduce shuffle bytes=0
13/04/08 20:48:53 INFO mapred.JobClient:     Spilled Records=6
13/04/08 20:48:53 INFO mapred.JobClient:     Map output bytes=70
13/04/08 20:48:53 INFO mapred.JobClient:     Total committed heap usage (bytes)=177016832
13/04/08 20:48:53 INFO mapred.JobClient:     Map input bytes=94
13/04/08 20:48:53 INFO mapred.JobClient:     Combine input records=0
13/04/08 20:48:53 INFO mapred.JobClient:     SPLIT_RAW_BYTES=115
13/04/08 20:48:53 INFO mapred.JobClient:     Reduce input records=3
13/04/08 20:48:53 INFO mapred.JobClient:     Reduce input groups=1
13/04/08 20:48:53 INFO mapred.JobClient:     Combine output records=0
13/04/08 20:48:53 INFO mapred.JobClient:     Reduce output records=3
13/04/08 20:48:53 INFO mapred.JobClient:     Map output records=3
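
The log above reports 16 input paths and HDFS byte counters, which suggests the job read its input from HDFS rather than from the local input directory. The upload step is not shown in this post. In the Hadoop 1.0.4 single-node guide it is done with fs -put, and the result is read back with fs -cat. The sketch below strings those steps together. hdfs_grep_example is a hypothetical wrapper, and the HADOOP variable is an assumption added only so the commands can be dry-run with HADOOP=echo.

```shell
# hdfs_grep_example is a hypothetical wrapper, not part of Hadoop.
# Run it from the Hadoop install directory (/usr/local/hadoop in this post).
# Set HADOOP=echo to print the commands instead of executing them.
hdfs_grep_example() {
    hadoop_cmd=${HADOOP:-bin/hadoop}

    # Copy the local conf directory into HDFS as "input".
    "$hadoop_cmd" fs -put conf input || return 1

    # Run the same grep example as in the post; input and output are HDFS paths.
    "$hadoop_cmd" jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+' || return 1

    # Print the result files straight from HDFS. The quotes keep the local
    # shell from expanding the glob, so Hadoop expands it instead.
    "$hadoop_cmd" fs -cat 'output/*'
}
```

The log shows two job IDs because the example runs two MapReduce jobs in sequence: the first counts the matches and the second sorts them by count.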