What is Hadoop Streaming?

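Ans: Hadoop Streaming is a powerful utility that ships with the Hadoop distribution. The basic concept of the Hadoop framework is to split a job into pieces, process those pieces in parallel, and then join the partial outputs to get the final result. So, within this framework, there are two main components:
a) the Map application
b) the Reduce application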
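
To make the split, process in parallel, and join idea concrete, here is a minimal single-machine sketch in Python. It is only an illustration of the concept, not how Hadoop itself is implemented: the function names (`split`, `map_chunk`, `reduce_counts`, `word_count`) are hypothetical, and a thread pool stands in for the cluster nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split(records, parts):
    """Divide the input into `parts` chunks, one per worker."""
    return [records[i::parts] for i in range(parts)]


def map_chunk(chunk):
    """Map step: count the words in one chunk, independently of the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into the final result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(records, parts=2):
    """Split the records, map each chunk in parallel, then reduce."""
    chunks = split(records, parts)
    # The thread pool is an assumption standing in for cluster workers.
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)
```

Calling `word_count(["a b", "b c", "a a"])` maps the two chunks separately and merges their counts in the reduce step.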

The Hadoop Streaming utility lets you work through STDIN and STDOUT, so the Map/Reduce applications can be written in any language.
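
As a sketch of what such a STDIN/STDOUT program can look like, the Python script below implements a word-count mapper and reducer. The file name `wordcount.py`, the `map`/`reduce` command-line argument, and the function names are assumptions made for this example. It relies on the streaming convention that each output line is a key and a value separated by a tab, and that the reducer receives its input sorted by key.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum the counts for each key; assumes the input is sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def main(argv):
    """Read records from STDIN and write results to STDOUT."""
    stage = mapper if argv[1] == "map" else reducer
    for record in stage(sys.stdin):
        print(record)


# Hypothetical convention: run as `wordcount.py map` or `wordcount.py reduce`.
if __name__ == "__main__" and len(sys.argv) > 1 and sys.argv[1] in ("map", "reduce"):
    main(sys.argv)
```

Because the script only talks to STDIN and STDOUT, it can be tried locally without a cluster, for example with `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`, where `sort` plays the role of the framework's sort between the map and reduce phases.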

One thought on “What is Hadoop Streaming?”

  1. Dingcheng Li

    I read your introduction article about hadoop streaming. I found it really helpful. But I have more questions about how to use it.

    One main question I want to ask is: if my Perl script needs more than one argument, how can I pass them on the command line?

    For example, I used the following command, where I used multiple inputs to handle multiple arguments. But in fact, the data input is just the first one. All others are just some resources the perl script needs to read in to help process the first data input.

    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -D mapred.reduce.tasks=0 -D mapred.map.tasks.speculative.execution=false -D mapred.task.timeout=12000000 -input nlp_research/edt_nlp_data/3000001.txt -input shift.txt -input lists -input dict -input nlp_research/deid-1.1/deid.config -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat -output perl_output -mapper deid_mapper.pl -file deid_mapper.pl

    If you can give me some guidance, that would be great!
