Hadoop Streaming

Ans: Hadoop streaming is a powerful utility that comes with the Hadoop distribution. The basic concept of the Hadoop framework is to split a job, process the pieces in parallel, and then join them back together to get the end result. So there are two main components involved in this framework:
a) Map application
b) Reduce application

The Hadoop streaming utility allows you to write Map/Reduce applications in any language that is capable of working with STDIN and STDOUT.
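To make the STDIN/STDOUT idea concrete, here is a minimal word-count sketch in Python. It is only an illustration, not code from the Hadoop distribution: the function names, the `map`/`reduce` role argument and the tab-separated `word<TAB>count` format are assumptions chosen for this example. The reduce step assumes its input arrives sorted by key, which is what the framework's shuffle phase normally provides.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Map step: emit 'word<TAB>1' for every word on every input line."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reduce step: sum the counts per word; input must be sorted by word."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Only act as a streaming task when a role ("map" or "reduce") is given.
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

If the file is saved as `wordcount.py` (a hypothetical name), the whole flow can be tried locally without a cluster: `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`.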

Tagged on: Hadoop Streaming

One thought on “What is Hadoop Streaming?”

Dingcheng Li November 7, 2015 at 7:28 pm

I read your introduction article about hadoop streaming. I found it really helpful. But I have more questions about how to use it.

One main question I want to ask is if my perl script needs more than one argument, how can I pass them to the command line?

For example, I used the following command, where I used multiple inputs to handle multiple arguments. But in fact, only the first one is data input. All the others are resources that the Perl script needs to read in to help process the first data input.

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -D mapred.reduce.tasks=0 -D mapred.map.tasks.speculative.execution=false -D mapred.task.timeout=12000000 -input nlp_research/edt_nlp_data/3000001.txt -input shift.txt -input lists -input dict -input nlp_research/deid-1.1/deid.config -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat -output perl_output -mapper deid_mapper.pl -file deid_mapper.pl

If you can give me some guidance, that would be great!
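One possible way to approach this question, sketched in Python rather than Perl: keep only the real data on `-input`, and hand the resource files to the script as ordinary command-line arguments. This assumes the mapper is started with a quoted command line, for example `-mapper "side_mapper.py shift.txt dict"`, and that the resource files are shipped alongside the script (for example with `-file`). The script name `side_mapper.py`, the helper names and the "flag lines that contain a known term" logic are all hypothetical; they stand in for whatever the real script does with its resources.

```python
import sys


def load_resource(path):
    """Read one side file into a set of stripped, non-empty lines."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def map_lines(lines, terms):
    """Emit '1<TAB>line' if the line contains any term, else '0<TAB>line'."""
    for line in lines:
        text = line.rstrip("\n")
        hit = any(term in text for term in terms)
        yield f"{int(hit)}\t{text}"


# Resource paths come from the command line; the data itself comes from STDIN.
if __name__ == "__main__" and len(sys.argv) > 1:
    terms = set()
    for path in sys.argv[1:]:
        terms |= load_resource(path)
    for record in map_lines(sys.stdin, terms):
        print(record)
```

The point of the design is the split between the two channels: STDIN carries only the records to process, while everything the script needs to look things up arrives as arguments, so the resource files are never treated as map input.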

TechAlpine – All About Technology

www.techalpine.com
