Li cas Hadoop tau Streaming xwb?

Txheej txheem cej luam: Hadoop tau streaming yog ib qhov tseem ceeb tshaj plaws company nyob Hadoop tis. Streaming interface txog Hadoop tso cai rau koj sau daim ntawv qhia kom qhov kev pab cuam tej lus uas koj xaiv, uas tau ua hauj lwm nrog cov STDIN thiab STDOUT. Li ntawd, Streaming yuav kuj tau txhais tias yog ib tug generic Hadoop API tsocai rau kom daim ntawv qhia cov kev pab cuam yuav tsum sau tej lus zoo. Nyob rau hauv Mapper no mus kom ze tau tawm tswv yim los ntawm STDIN thiab emit tso zis rau STDOUT. Txawm hais tias Hadoop yog sau rau hauv Java thiab nws yog ib hom lus rau txoj hauj lwm uas tsis muaj daim ntawv qhia kom, Streaming API muab qhov yooj hauj lwm daim ntawv qhia kom sau tej lus.

Nyob rau cov tshooj no kuv yuav tham txog qhov Hadoop Streaming APIs.

Taw qhia: Hadoop tau Streaming yog tus API generic tsocai rau kev sau ntawv Mapper thiab Reduces tej lus. Tab sis lub tswvyim yooj yim tshua zoo li qub. Mappers thiab Reducers tau txais lawv cov tswv yim thiab tso zis rau ntawm stdin thiab stdout li (qhov tseem ceeb, tus nqi) officers. Apache Hadoop siv ntws as per UNIX txheem ntawm koj daim ntawv thov thiab Hadoop lawv. Streaming yog tus zoo haum rau phau ntawv ua. Cov ntaub ntawv no saib yog kab oriented thiab puab le yuam sij/tus khub los 'tab' cim cais. Qhov kev pab nyeem txhua kab thiab processes as per yuav tsum tau.

Ua hauj lwm nrog Hadoop ntws: Nyob hauv txhua txoj hauj lwm daim ntawv qhia kom, peb muaj input thiab tso zis xws li yuam sij/tus nqi officers. Yog tib lub tswvyim tseeb tau Streaming API. Nyob tau Streaming, cov tswv yim thiab tso zis yog ib txwm uas muaj tuaj raws li phau ntawv. Tus (qhov tseem ceeb, tus nqi) officers input muaj teev nyob rau hauv stdin lub Mapper thiab Reducer. Cov ' tab’ ximxoo siv cais qhov tseem ceeb thiab muaj nqis. Qhov kev pab cuam Streaming siv lub ' tab’ ximxoo phua ib kab mus yuam sij/tus khub. Qhov tib txoj kev ntawd yuav mus tso zis ntau lawm. Qhov kev pab cuam Streaming sau lawv cov zis rau tom qab ib yam li hais hauv qab no stdout.

key1 t value1 n

key2 t value2 n

key3 t value3 n

Nyob hauv txoj kev no txhua kab muaj tib tug yuam sij/tus khub. Li ntawd lub tswv yim rau lub reducer no sorted kom tas same daws yuav tau kawm uas mus ib leeg nyob ib sab.

Kev pab nyiaj los sis cov cuab tam yuav siv li Mapper thiab Reducer yog hais tias lawv muaj peev xwm sawv tuav tswv yim hom ntawv ntawv raws li tau piav los saum no. Lwm yam scripts zoo li perl, nab hab sej los bash kuj yuav siv rau qhov hom phiaj, muab tag nrho cov ntshav yuav tsum neeg txhais lus rau kev nkag siab cov lus.

Tiav cov kauj ruam: Hadoop tau Streaming company pub cov tsab ntawv los yog executable ua hauj lwm raws li Mapper/Reducer los lawv yeej ua hauj lwm nrog cov stdin thiab stdout. Ntawm ntu no kuv yuav piav txog cov kauj ruam yuav siv ntawm lub chaw tso dej Streaming. Kuv yuav xav tias ob tug pab coj mus ua hauj lwm raws li Mapper thiab Reducer.

  • Ua ntej, Peb cia saib qhov hais kom ua li nram no khiav tau Streaming txoj hauj lwm. Qhov hais kom ua tsis tau tej nqe lus ces nws yuav ua ntau pab xaiv li hauv qab no.
Streaming command

Streaming hais kom ua

Image1: Uas qhia tau Streaming hais kom ua thiab pab

  • Tam sim no peb xav tias streamMapProgram thiab streamReduceProgram yuav ua hauj lwm raws li Mapper thiab Reducer. Cov kev pab cuam no yuav ua tau scripts, executables los yog muaj lwm yam tivthaiv cuab kav ntawm cov kev tawm tswv yim los ntawm stdin thiab khoom tso zis thaum stdout. Qhov hais kom ua li no yuav ua li cas cov Mapper thiab Reducer sib cam yuav tau tag nrho rau qhov hais kom ua Streaming.
Input Output

Rau cov zis input

Image2: Uas qhia input thiab tso zis sib cam

Nws yog assumed tias cov Mapper thiab Reducer cov kev pab muaj nyob rau hauv tag nrho cov ntshav ua ntej yuav pib tau Streaming hauj lwm.

  • Ua ntej, ua hauj lwm Mapper converts tus tawm tswv yim rau hauv kab thiab muab nws rau lub stdin ntawm tus txheej txheem. Tom qab no cov Mapper yog siv tus txheej txheem tso zis los ntawm stdout thiab converts mus yuam sij/tus khub. Cov no yuam sij/tus nqi officers yog qhov zis sij los ntawm kev ua hauj lwm Mapper. Lub ntsiab yog tus nqi txog thawj ' tab’ cim thiab seem ntxiv ntawm cov kab yog hais raws li tus nqi. Yog tias tsis muaj ' tab’ cim ntawm tug kab tag nrho ces yog suav li suav tau qhov tseem ceeb uas tus nqi li 'thov'.
  • Ib yam txheej txheem ntawd yuav thaum ua hauj lwm reducer sau. Ua ntej nws converts tus yuam sij/tus nqi officers ua kab thiab muab tso rau hauv lub stdin ntawm tus txheej txheem. Ces tus reducer nug tus kab zis los ntawm cov stdout ntawm tus txheej txheem thiab npaj officers yuam sij/tus nqi ua zaum kawg tso zis txo tej txoj hauj lwm. Tus yuam sij/tus nqi yog cais tau ib yam nkaus tom qab thawj ' tab’ cim.

Daim duab hauv qab no qhia tus txheej txheem txaus rau tau Streaming hauj lwm:

Process flow

Txheej txheem txaus

Image3: Txheej txheem streaming txaus

Tsim thiab sib txawv Java API Streaming: Muaj dab tsi tsim sib txawv ntawm qib API MapReduce Java thiab Hadoop tau Streaming. Qhov txawv ntawm qhov kev siv ua mas yog. Rau tus txheem Java API, tus mechanism yog txheej txheem txhua cov ntaub ntawv ib qho zuj zus. Yog li cov lub moj khaum yog DVR yuav hu rau hauv daim ntawv qhia () txujci (hauv koj Mapper) rau txhua cov ntaub ntawv. Tab sis, cov tau Streaming API, daim ntawv qhia kev yuav tswj tus ua cov tswv yim. Nws yuav los nyeem thiab ntaub ntawv ntau hom kab tsis zuj zus li qhov nws yuav tswj tau qhov kev nyeem ntawv mechanism. Nyob Java tib yam yuav muab los siv tab sis nrog kev pab los ntawm ib co lwm mechanism xws li siv cov tsiaj ntawv lom mus ntau yam kab thiab ces txheej txheem nws.

Streaming Commands: Hadoop Streaming API txhawb cov streaming thiab generic hais kom ua kev. Ib co tseem ceeb streaming hais kom ua los ntawm kev xaiv yog li nram no.

#Tsis yog Parameter Yuav tsum tau/yeem Hauj lwm lawm
1 -input directoryname los yog filename Yuav tsum tau Qhov chaw nyob rau Mapper input.
2 -rau cov zis directoryname Yuav tsum tau Qhov chaw tso zis rau Reducer.
3 -mapper executable los yog JavaClassName Yuav tsum tau Mapper Executable.
4 -reducer executable los yog JavaClassName Yuav tsum tau Reducer Executable.
5 -cov ntaub ntawv filename Yeem Ntawd hais rau mapper, reducer, thiab combiner executables muaj nyob rau lub zos laij o.
6 -inputformat JavaClassName Yeem Qhov no nws yog supplied cov chav kawm ntawv uas yuav tsum tau rov qab mus yuam sij/tus nqi officers hoob kawm ntawv. TextInputFormat yog lub neej ntawd tus nqi.
7 -outputformat JavaClassName Yeem Qhov no nws yog supplied cov chav kawm ntawv uas yuav tsum tau rov qab mus yuam sij/tus nqi officers hoob kawm ntawv. TextOutputformat yog lub neej ntawd tus nqi.

Zog ntxiv configuration: Nyob tau streaming hauj lwm configuration ntxiv zog yuav tau hais rau xaiv – D (“-D <khoom>=<tus nqi>”).

  • Raws li tau hais kom ua tau muab hloov chaw temp directory

-D dfs.data.dir=/tmp

  • Raws li tau hais kom ua yuav siv tau los qhia kom meej ntxiv temp Wage zos

-D mapred.local.dir=/tmp/local/streamingjob

  • Raws li tau hais kom ua yuav siv tau los qhia txoj hauj lwm daim ntawv qhia nkaus xwb

-D mapred.reduce.tasks=0

  • Raws li tau hais kom ua yuav siv tau los qhia pes tsawg tus neeg reducers

-D mapred.reduce.tasks=4

  • Raws li tau hais kom ua yuav siv tau los qhia kom meej kab phua xaiv

-D stream.map.output.field.separator=. \

-D stream.num.map.output.key.fields=6

Xaus: Apache Hadoop moj khaum thiab MapReduce programming yog kev lag luam txheem xyuas txog kev loj ntim ntawm cov ntaub ntawv. Lub MapReduce programming moj khaum yuav muab siv ua lub txoos ua thiab siv logic. Java MapReduce API yog qhov option txheem kom sau ntawv rau qhov kev pab cuam MapReduce. Tab sis yog Hadoop Streaming API muab kev xaiv txoj hauj lwm MapReduce sau hauv lwm haiv lus. Qhov no yog ib qhov yooj zoo muaj nyob rau MapReduce programmers muaj kev kawm txawj lwm hom lus apart from Java. Executables tseem tseem yuav siv nrog tau Streaming API mus ua hauj lwm raws li ib txoj hauj lwm MapReduce. Tus mob tsuas yog tias cov kev pab cuam/executable yuav tsum tau muab input ntawm STDIN thiab ua rau cov zis thaum STDOUT.

 

============================================= ============================================== Yuav zoo TechAlpine phau ntawv rau Amazon
============================================== ---------------------------------------------------------------- electrician ct chestnutelectric
error

Txaus siab rau qhov blog? Tshaj tawm lus thov :)

Follow by Email
LinkedIn
LinkedIn
Share