What is Hadoop Streaming?

Ans: Hadoop Streaming is a utility that ships with the Hadoop distribution. The basic idea of the Hadoop framework is to split a job into pieces, process the pieces in parallel, and then combine the partial results into the final result. Two main components are involved in this framework:
a) Map application
b) Reduce application

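The Hadoop Streaming utility lets you write Map/Reduce applications in any language that can read from STDIN and write to STDOUT. Each application receives its input records as lines on STDIN and emits its output records as lines on STDOUT.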
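
To make the STDIN/STDOUT contract concrete, here is a minimal word-count sketch in Python. It is only an illustration, not part of the Hadoop distribution: the script name wordcount.py, the "map"/"reduce" command-line argument, and the function names are assumptions chosen for this example. It relies on the default Streaming convention that a record is a line whose key and value are separated by a tab, and on the framework sorting the mapper output by key before it reaches the reducer.

```python
#!/usr/bin/env python3
"""Word count for Hadoop Streaming (illustrative sketch).

Run as `wordcount.py map` for the Map step or `wordcount.py reduce`
for the Reduce step. The file name and the mode argument are assumptions
made for this example.
"""
import sys
from itertools import groupby


def map_lines(lines):
    """Yield one 'word<TAB>1' record for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(lines):
    """Sum the counts for each word.

    Assumes the input is already sorted by key, which is how the
    framework delivers mapper output to a reducer.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__" and len(sys.argv) > 1 and sys.argv[1] in ("map", "reduce"):
    # Pick the step from the first argument, read STDIN, write STDOUT.
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

The pipeline can be tried without a cluster by letting the shell's sort stand in for the framework's shuffle: cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce. On a cluster, the same script would be supplied through the -mapper and -reducer options of the streaming jar and shipped with -file.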


One thought on “What is Hadoop Streaming?”

  1. Dingcheng Li

    I read your introductory article about Hadoop Streaming and found it really helpful, but I have more questions about how to use it.

    My main question: if my Perl script needs more than one argument, how can I pass them on the command line?

    For example, I used the following command, where I used multiple -input options to handle the multiple arguments. In fact, only the first one is the data input. All the others are resources that the Perl script needs to read in to help process that first data input.

    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -D mapred.reduce.tasks=0 -D mapred.map.tasks.speculative.execution=false -D mapred.task.timeout=12000000 -input nlp_research/edt_nlp_data/3000001.txt -input shift.txt -input lists -input dict -input nlp_research/deid-1.1/deid.config -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat -output perl_output -mapper deid_mapper.pl -file deid_mapper.pl

    If you can give me some guidance, that would be great!
