Cov Hadoop MapReduce ntsiab yog dab tsi?

Dab tsi ua koj txhais los ntawm daim ntawv qhia kom programming?

MapReduce yog ib tug programming qauv rua cov zauv loj tagnrho cov ntaub ntawv rau mus tib seem los mus faib rau cov chaw ua hauj lwm rau hauv ib pawg ntawm cov neeg sab nraud paub tab.

Cov MapReduce programming qauv inspired tej lus thiab lub hom phaj ntaub ntawv-muab pawg. Cov ntaub ntawv input hom ntawv no daim ntawv qhia tseeb, thiab yog sau tseg qhia los ntawm tus neeg siv. Qhov zis yog ib txheej <qhov tseem ceeb,tus nqi> officers. Tus neeg siv expresses ib algorithm siv zog ob, Daim ntawv qhia thiab txo qhov uas. Qhov kev ua daim ntawv qhia yog ntaub ntawv los ntawm cov ntaub ntawv input thiab ua ib daim ntawv teev txog nrab <qhov tseem ceeb,tus nqi> officers. Cov txo tej kev ua yog koj yuav tag nrho intermediate officers nrog tus yuam sij tib. Nws xws li tej lub hom phiaj qee yam phiajcim kev zov me nyuam thiab ua voj los yog ntau tshaj rau cov zis officers. Thaum kawg, rau cov zis officers yog sorted los ntawm lawv cov nqi tseem ceeb. Nyob rau hauv daim ntawv ntawm cov kev pab cuam MapReduce nyuaj, tus programmer qhia xwb rau hauv daim ntawv qhia ua haujlwm. Tag nrho lwm functionality, nrog rau lub grouping ntawm cov officers intermediate uas muaj qhov tib qhov tseem ceeb thiab cov lus kawg sorting, no yog muab los ntawm lub runtime.

MapReduce qauv theem

Theem saum chav ua hauj lwm hauv MapReduce yog ib txoj hauj lwm. Ib txoj hauj lwm mas muaj ib daim ntawv qhia thiab ib tug txo tej theem, tab sis yog tus txo tej theem yuav tsum rho. Piv txwv, xav txog ib tug MapReduce txoj hauj lwm uas thiaj suav tau ceeb tsawg zaus ib lo lus siv thoob ib pawg ntawm cov ntaub ntawv. Hauv daim ntawv qhia theem thiaj suav ceeb lus hauv ib daim ntawv, ces cov txo tej theem aggregates rau ib daim ntawv ua lus Suav ib qhov sau los ua tag nrho.

Nyob hauv daim ntawv qhia theem, cov ntaub ntawv input faib ua input tawg rau kev tsom xam los ntawm daim ntawv qhia paub tab si laim rau mus tib seem nyob rau Hadoop pawg. Yog vim, lub moj khaum MapReduce tau txais ntaub ntawv input ntawm Hadoop muab theej thiab faib cov ntaub ntawv uas (HDFS).

Cov txo tej theem siv tau los ntawm daim ntawv qhia kev pab raws qib raws li cov tswv yim los mus tib seem txheej kom paub tab. Cov kev pab raws qib txo tej ntsaub cov ntaub ntawv rau hauv cov ntsiab lus kawg. Yog vim, lub moj khaum MapReduce stores ntsuam hauv HDFS.

Txawm tias qhov kev txo tej theem nyob ntawm cov zis los ntawm daim ntawv qhia theem, daim ntawv qhia thiab kom tsis txhob ua yuav tsis tshwm sim los muaj tas. Uas yog, txo kev pab raws qib yuav pib thaum ua hauj lwm daim ntawv qhia tej tub num tsoom sau. Yog tsis tau rau txhua daim ntawv qhia kev pab raws qib kom tiav ua ntej kev txo tej hauj lwm yuav pib.

MapReduce rau tus yuam sij officers koomtxoos. Conceptually, ib txoj hauj lwm MapReduce yuav siv txheej input officers tus yuam sij ua ib pawg ntawm cov zis tus yuam sij officers ua dua cov ntaub ntawv los ntawm daim ntawv qhia thiab thiaj li tso cai. Cov kev pab raws qib daim ntawv qhia ua ib txheej intermediate officers tus yuam sij ua tus tsis paub tab txo tej siv raws li cov tswv yim.

Tus tuav hauv daim ntawv qhia rau cov zis officers xav tsis tau nws. Hauv daim ntawv qhia ua thiab txo tej AUTHORIZED, ib kauj ruam shuffle sorts txhua daim ntawv qhia rau cov zis yaam tseem ceeb nuav nrog tus yuam sij tib yam rau ib tug hluas txo tej tswv yim (qhov tseem ceeb, sau tus nqi) khub, qhov twg lub ' tus nqi’ ib daim ntawv teev txhua yaam tseem ceeb nuav yog muab tus yawm sij tib. Yog li, lub tswv yim rau tus txo tej hauj lwm yeej yog txheej (qhov tseem ceeb, sau tus nqi) officers.

Tab sis yog tus yuam sij officers ib txheej yog homogeneous, cov officers tus yuam sij rau ib kauj ruam xav tsis muaj qhov tib yam. Piv txwv, cov officers tus yuam sij rau cov txheej input (KV1) yuav tau (hlua, hlua) officers, nrog rau daim ntawv qhia theem yuav ua tau (hlua, integer) officers ua intermediate ntsiab (KV2), thiab cov txo tej theem yuav ua tau (integer, hlua) officers rau kev soj ntsuam (KV3).

Tus tuav hauv daim ntawv qhia rau cov zis officers xav tsis tau nws. Hauv daim ntawv qhia ua thiab txo tej AUTHORIZED, ib kauj ruam shuffle sorts txhua daim ntawv qhia rau cov zis yaam tseem ceeb nuav nrog tus yuam sij tib yam rau ib tug hluas txo tej tswv yim (qhov tseem ceeb, sau tus nqi) khub, qhov twg lub ' tus nqi’ ib daim ntawv teev txhua yaam tseem ceeb nuav yog muab tus yawm sij tib. Yog li, lub tswv yim rau tus txo tej hauj lwm yeej yog txheej (qhov tseem ceeb, sau tus nqi) officers.

Piv txwv li ib cov ntsiab lus MapReduce

Ntu qha tau MapReduce tej tswvyim los xam xyuas cov kev tshwm sim ntawm txhua lo lus hauv ntawv nyeem ntaub ntawv txheej.

Lub MapReduce input ntawv faib ua tsob input, thiab tus tsob no ntxiv faib ua tus yuam sij input officers. Hauv qhov ua piv txwv, cov ntaub ntawv txheej input yog cov ntaub ntawv ob, document1 thiab document2. Cov InputFormat subclass divides cov ntaub ntawv teeb ua ib split ib daim ntawv, rau nraud 2 tawg:

Ceeb toom: Lub moj khaum MapReduce divides cov ntaub ntawv txheej input rau hauv hu ua tsob siv lub subclass nkag hauv cov hauj lwm configuration org.apache.hadoop.mapreduce.InputFormat chunks. Tawg yog tsim los ntawm lub chaw hauj lwm neeg thiab nrog cov lus ua muaj nyob rau cov hauj lwm Tracker hauj lwm. Lub JobTracker tau tsim ib daim ntawv qhia ua hauj lwm rau txhua split. Txhua daim ntawv qhia neeg ua hauj lwm siv ib RecordReader uas yog muab los ntawm lub InputFormat subclass txia lub split rau tus yuam sij input officers.

IB (kab nab npawb, ntawv nyeem) tus yuam sij khub generated rau ib kab rau hauv ib daim ntawv input. Qhov kev ua daim ntawv qhia discards cov kab thiab ua ib ib-kab (lo lus, suav) khub rau txhua lo lus nyob hauv cov kab input. Cov txo tej theem ua (lo lus, suav) cov sawv cev rau lo lus aggregated suav nyob tag nrho rau input cov ntawv officers. Muab cov ntaub ntawv input qhia hauv daim ntawv qhia-kom tsis txhob muaj mob rau cov hauj lwm piv txwv yog:

Qhov zis los ntawm daim ntawv qhia theem muaj ntau tus yuam sij officers nrog tus yuam sij tib: Cov ' qib’ thiab ' noj’ daws tuaj ob zaug. Nco qab tias lub moj khaum MapReduce consolidates txhua qhov tseem ceeb uas tus yuam sij tib ua ntej yuav nkag mus kawm rau txo tej theem, li ntawd lub tswv yim los txo qhov uas yuav ua tau (qhov tseem ceeb, qhov tseem ceeb) officers. Yog li no, qhov muaj mob daim ntawv qhia txog ntawm daim ntawv qhia rau cov zis, ntev kom tsis txhob, yuav tau zaum kawg no yog muaj li saum toj no.

Lub neej MapReduce txoj hauj lwm mus los

Nram no yog lub neej mus los uas raug MapReduce hauj lwm thiab qhia txog cov neeg lam. Daim ntawv qhia txog lub neej nws yog li no peb yuav mloog zoo rau lub Cheebtsam thawj txoj ntau.

Cov Hadoop configuration yuav ua yam tiam sis tus sau configuration muaj qab.

  • Ib qho ntawm npaj khiav hauj lwm Tracker
  • Ntau tus neeg muab ntshav khiav hauj lwm Tracker

Nram qab no yog lub Cheebtsam txoj sia nyob ntawm MapReduce txoj hauj lwm.

  • Tus thov kev pab hauv txoj hauj lwm: Chaw hauj lwm kev npaj cov hauj lwm rau kev cuav thiab ob txhais tes nws tawm mus rau cov hauj lwm Tracker.
  • Txoj hauj lwm Tracker: Cov hauj lwm Tracker schedules hauj lwm thiab distributes daim ntawv qhia ua hauj lwm ntawm cov neeg ua hauj lwm Trackers rau thaum uas tig mus ua.
  • Neeg ua hauj lwm Tracker: Txhua tus neeg ua hauj lwm Tracker spawns ua hauj lwm daim ntawv qhia ib. Cov hauj lwm Tracker tau txais kev kawm tau ntaub ntawv los ntawm cov neeg ua hauj lwm Trackers.

Thaum tau daim ntawv qhia no muaj nyob, cov hauj lwm Tracker distributes txo tej haujlwm ntawm cov neeg ua hauj lwm Trackers rau thaum uas tig mus ua.

Txhua tus neeg ua hauj lwm Tracker spawns tus kom tsis txhob ua hauj lwm ua lub chaw ua hauj lwm. Cov hauj lwm Tracker tau txais kev kawm tau ntaub ntawv los ntawm cov neeg ua hauj lwm Trackers.

Txhua daim ntawv qhia paub tab tsis tau kom tiav ua ntej yuav txo tau kev pab raws qib pib khiav. Txo kev pab raws qib yuav pib thaum daim ntawv qhia kev pab raws qib pib sau. Yog li, hauv daim ntawv qhia thiab txo qhov uas cov kauj ruam heev tshaj.

Functionality ntawm nyias lub Cheebtsam hauv txoj hauj lwm MapReduce

Nyob hauv kev pab nrhiav haujlwm: Txoj hauj lwm nyob hauv kev pab tej lub hom phiaj cov kev pab nram qab no raws qib

  • Validates cov hauj lwm configuration
  • Generates tus tsob input. Qhov no yog yeej yuav splitting cov input hauj lwm rau hauv chunks
  • Cov ntaub ntawv cov chaw muab kev pab nrhiav haujlwm (configuration, txoj hauj lwm THAWV ntawv, tsob input) rau ib qhov chaw sib, xws li ib tug HDFS directory, nws nyob qhov twg puas siv tau rau cov hauj lwm Tracker thiab Trackers ua hauj lwm
  • Siv rau qhov hauj lwm Tracker hauj lwm

Txoj hauj lwm Tracker: Txoj haujlwm Tracker tej lub hom phiaj cov paub tab li nram no

  • Cov tswv yim fetches splits ntawm tus sis qhov chaw uas cov hauj lwm nyob hauv kev pab muab cov lus qhia
  • Tsim ib daim ntawv qhia ua hauj lwm rau txhua split
  • Assigns ua hauj lwm daim ntawv qhia txhua rau ib tug neeg ua hauj lwm Tracker (neeg ua hauj lwm ntawm)

Tom qab ua hauj lwm daim ntawv qhia no yuav tag, Tracker txoj hauj lwm yog cov paub tab li nram no

  • Ib qho kev pab kom paub tab uas ze tshaj plaws rau enabled ntawm tus hauj lwm configuration.
  • Assigns txhua daim ntawv qhia raug muab faib rau ib tug txo tej hauj lwm.
  • Assigns twg txo tej hauj lwm rau ib tug neeg ua hauj lwm Tracker.

Neeg ua hauj lwm Tracker: Ib tug neeg ua hauj lwm Tracker tswj cov kev pab raws qib txog ntawm ib tug neeg ua hauj lwm thiab xa lawv cov hauj lwm Tracker.

Tracker ua hauj lwm yog cov paub tab li nram no thaum twg daim ntawv qhia los yog kom tsis txhob ua hauj lwm yog muab rau nws

  • Fetches hauj lwm cov chaw muab kev pab zos
  • Spawns tus me nyuam muaj JVM rau cov neeg ua hauj lwm ntawm txim tuag hauv daim ntawv qhia no los pab kom cov neeg ua hauj lwm
  • Ntaub ntawv raws li txoj cai rau cov hauj lwm Tracker

Debugging daim ntawv qhia kom tsis txhob

Hadoop yuav cav txog cov koom txoos tseem ceeb thaum tiav qhov kev pab cuam. Yog vim, Cov no yog muab cia hauv lub cav / subdirectory ntawm lub hadoop-version / directory qhov koj khiav Hadoop ntawm. Cav ntaub ntawv muaj npe hu ua hadoop-username-kev-hostname.log. Vaj no rau hauv .log; loj cav muaj cov appended rau lawv. Tus username hauv lub cav filename hais txog tus uas Hadoop yog pib username — qhov no yuav tsis tas rau tib username koj siv khiav cov kev pab cuam. Cov kev pab cuam lub npe hais txog ntawm qhov ntau Hadoop cov kev pab cuam uas yog sau tus cav; tej no yuav ua tau jobtracker, namenode, datanode, secondarynamenode, los yog tasktracker. Tag nrho cov no yog ib qho tseem ceeb rau debugging Hadoop plahaum tseem. Tab sis rau cov kev pab cuam rau ib tug neeg, lub tasktracker cav yuav cov feem ntau yam. Muaj raws li ces muab pov tseg los ntawm koj qhov kev pab cuam yuav muab kaw hauv lub tasktracker cav.

Lub cav directory kuj yuav muaj ib tug subdirectory hu ua userlogs. Ntawm no yog ib lub subdirectory rau txhua cov hauj lwm khiav. Ua hauj lwm txhua rau nws cov stdout thiab stderr rau ntaub ntawv ob tug hauv no directory. Nco ntsoov tias nyob rau ntawm tej Hadoop sawv, cov cav cov tsis centrally aggregated — koj yuav tsum mus kuaj txhua tus TaskNode cav/userlogs/phau qhia lawv cov zis.

Debugging nyob rau hauv qhov chaw ntawd distributed kheej yog qhov nyuab thiab yuav tsum tau logging rau hauv ntau cov cav tov los saib tau cov ntaub ntawv ca. Yog ua tau, cov kev pab cuam yuav tsum chav tsev kuaj cov ciav Hadoop Senior. Lub neej ntawd configuration deployed ntawm Hadoop sau “ib qho lom” hom, qhov twg lub MapReduce tag nrho kev no kuj muaj nyob hauv tus tib yam lom ntawm Java li hu ua JobClient.runJob(). Siv ib lub debugger zoo li dab noj hnub, koj ces yuav tau breakpoints nyob rau hauv daim ntawv qhia hauv() los yog txo() txoj kev uas yuav nrhiav tau koj tus kab.

Yog kom tsum hauj lwm?

Tej hauj lwm ntawv sau tag nrho cov hauj lwm nyob rau hauv daim ntawv qhia theem. CES cov hauj lwm yuav ua tau daim ntawv qhia txoj hauj lwm xwb. Nres ib txoj hauj lwm tom qab ntawm daim ntawv qhia tub num tsoom sau, koj muab cov txo tej kev pab raws qib rau xoom.

Xaus

No module piav tus MapReduce tso platform hauv plawv lub kaw lus Hadoop. Thaum uas siv cov MapReduce, lub teeb parallelism loj tau tau tiav los ntawm kev siv. Lub moj khaum MapReduce muab ib tug teeb kam rau ua txhaum rau kev siv khiav rau nws los ntawm limiting cov kev sib txuas lus uas yuav tshwm sim ntawm cov ntshav siab.

============================================= ============================================== Yuav zoo TechAlpine phau ntawv rau Amazon
============================================== ---------------------------------------------------------------- electrician ct chestnutelectric
error

Txaus siab rau qhov blog? Tshaj tawm lus thov :)

Follow by Email
LinkedIn
LinkedIn
Share