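1. Hadoop Streaming lets you use any executable command or script as the mapper and the reducer. The examples below show how to use it. For details, see the Hadoop Streaming documentation on the official Hadoop website.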
2. Use Python scripts as the mapper and the reducer.
mapper.py
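#!/usr/bin/env python
# Mapper: emit "<word>\t1" for every whitespace-separated word on stdin.
import sys

for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        # A parenthesized single-argument print runs on Python 2 and 3.
        print("%s\t%s" % (word, 1))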
reducer.py
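#!/usr/bin/env python
# Reducer: sum the counts per word. Hadoop sorts the mapper output by key,
# so all lines for the same word arrive consecutively.
import sys

current_word = None
current_count = 0

for line in sys.stdin:
    line = line.strip()
    # Skip blank or malformed lines that have no tab separator.
    if '\t' not in line:
        continue
    word, count = line.split('\t', 1)
    try:
        count = int(count)
    except ValueError:
        # Skip lines whose count is not an integer.
        continue
    if current_word == word:
        current_count += count
    else:
        # The key changed: flush the total for the previous word.
        if current_word:
            print("%s\t%s" % (current_word, current_count))
        current_count = count
        current_word = word

# Flush the total for the last word, if any input was read.
if current_word:
    print("%s\t%s" % (current_word, current_count))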
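The reducer only works because Hadoop sorts the mapper output by key before passing it on. The following is a minimal sketch of that map, sort, reduce flow run in a single Python process, useful for checking the logic before submitting a job. The function names map_words, reduce_counts, and run_local are hypothetical helpers for illustration and are not part of Hadoop; sorted() stands in for Hadoop's shuffle and sort phase.

```python
from itertools import groupby


def map_words(lines):
    """Mimic mapper.py: yield a (word, 1) pair for every word."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1


def reduce_counts(sorted_pairs):
    """Mimic reducer.py: sum counts over consecutive pairs with equal keys."""
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Chain map -> sort -> reduce the way a streaming job does."""
    pairs = sorted(map_words(lines))  # stand-in for Hadoop's shuffle and sort
    return list(reduce_counts(pairs))


if __name__ == "__main__":
    sample = ["hello world", "hello hadoop"]
    for word, total in run_local(sample):
        print("%s\t%s" % (word, total))
```

If the pairs were not sorted, the same word could appear in several non-adjacent runs and would be emitted more than once with partial counts, which is why the sort step cannot be skipped when testing locally.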
3. Run the streaming job from the command line.
chmod +x mapper.py
chmod +x reducer.py
hadoop jar /usr/hdp/2.6.1.0-129/hadoop-mapreduce/hadoop-streaming-2.7.3.2.6.1.0-129.jar \
    -input /user/hadoop/examples/input/ \
    -output /user/hadoop/examples/output2 \
    -mapper mapper.py \
    -reducer reducer.py \
    -file mapper.py \
    -file reducer.py
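When the job has to be launched from a script rather than typed by hand, the same command can be assembled as an argument list. This is only a sketch: build_streaming_command and submit are hypothetical helper names, the paths are the ones used in the command above, and it assumes the hadoop client is on the PATH of the machine where submit is called.

```python
import subprocess


def build_streaming_command(jar, input_path, output_path, scripts):
    """Return the hadoop streaming command as a list of arguments.

    scripts is a (mapper, reducer) pair of local script file names.
    """
    mapper, reducer = scripts
    cmd = [
        "hadoop", "jar", jar,
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
    ]
    # Ship each script to the cluster nodes, as the -file options above do.
    for script in scripts:
        cmd += ["-file", script]
    return cmd


def submit(cmd):
    """Run the job; raises CalledProcessError if hadoop exits non-zero."""
    subprocess.check_call(cmd)


if __name__ == "__main__":
    cmd = build_streaming_command(
        "/usr/hdp/2.6.1.0-129/hadoop-mapreduce/hadoop-streaming-2.7.3.2.6.1.0-129.jar",
        "/user/hadoop/examples/input/",
        "/user/hadoop/examples/output2",
        ("mapper.py", "reducer.py"),
    )
    # Only print the command here; call submit(cmd) on a machine with hadoop.
    print(" ".join(cmd))
```

Passing the command as a list avoids shell quoting problems if a path ever contains spaces.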