全部文档
当前文档

暂无内容

如果没有找到您期望的内容,请尝试其他搜索词

文档中心

Hadoop Streaming 构建MR作业

最近更新时间:2024-01-16 16:40:46

1.Hadoop Streaming允许用户使用可执行的命令或者脚本作为mapper和reducer。以下用几个示例说明Hadoop Streaming如何使用。 详细可参考Hadoop官网Hadoop Streaming

2.使用python脚本作为mapper和reducer。

mapper.py

#!/usr/bin/env python
import sys
for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print "%s\t%s" % (word, 1)

reducer.py

#!/usr/bin/env python
from operator import itemgetter
import sys
 
current_word = None
current_count = 0
word = None
 
for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t', 1)
    try:
        count = int(count)
    except ValueError:
        continue
    if current_word == word:
        current_count += count
    else:
        if current_word:
            print "%s\t%s" % (current_word, current_count)
        current_count = count
        current_word = word
 
if word == current_word:
    print "%s\t%s" % (current_word, current_count)

3.命令行执行streaming作业。

chmod +x mapper.py 
chmod +x reducer.py 
hadoop jar /usr/hdp/2.6.1.0-129/hadoop-mapreduce/hadoop-streaming-2.7.3.2.6.1.0-129.jar -input /user/hadoop/examples/input/ -output /user/hadoop/examples/output2 -mapper mapper.py -reducer reducer.py -file mapper.py  -file reducer.py

文档导读
纯净模式常规模式

纯净模式

点击可全屏预览文档内容
文档反馈