What are the advanced features of Hadoop MapReduce?

Basic MapReduce programming explains the workflow in enough detail to write a job. It does not, however, cover what actually happens inside the MapReduce programming framework. This article follows the data as it moves through the MapReduce architecture and describes the API calls that do the actual processing. We will also discuss the customization techniques and method overriding needed for application-specific requirements.

The advanced MapReduce features concern execution and lower-level details. For ordinary MapReduce programming, knowing the APIs and how to use them is enough to write applications. To understand how a job really works, and to customize it with confidence, you also need to know the inner details of MapReduce.

The following sections discuss these advanced features one by one.

Custom Types (Data): The Hadoop MapReduce framework always works with typed data in user-supplied Mappers and Reducers. The data that passes through Mappers and Reducers is held in Java objects.

  • Writable Interface: The Writable interface is one of the most important interfaces. Objects that can be marshaled to and from files and across the network implement this interface. Hadoop also uses it to transmit data in serialized form. Some of the classes that implement the Writable interface are listed below:
  1. Text (stores String data)
  2. LongWritable
  3. FloatWritable
  4. IntWritable
  5. BooleanWritable

A custom data type can also be created by implementing the Writable interface. Hadoop can transmit any custom data type that fits your requirements, as long as it implements Writable.

The Writable interface, shown below, has two methods: readFields and write. The first method (readFields) initializes the object's data from the contents of the binary stream 'in'. The second method (write) serializes the object to the binary stream 'out'. The most important contract is that fields are read from the binary stream in exactly the same order in which they were written.

Listing 1: The Writable interface

public interface Writable {
    void readFields(DataInput in) throws IOException;
    void write(DataOutput out) throws IOException;
}
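As an illustration of the read/write contract, here is a minimal sketch of a custom value type. It is not taken from Hadoop itself: the Writable interface below is a local stand-in that mirrors the shape of Listing 1 so that the example compiles with the JDK alone, and PointWritable, its toBytes/fromBytes helpers, and WritableDemo are hypothetical names. In a real job the class would implement org.apache.hadoop.io.Writable instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Demo entry point: serializes a point and reads it back.
class WritableDemo {
    public static void main(String[] args) {
        byte[] bytes = PointWritable.toBytes(new PointWritable(3, -7));
        PointWritable copy = PointWritable.fromBytes(bytes);
        System.out.println(copy.x + "," + copy.y);
    }
}

// Local stand-in (assumption) with the same two methods as Hadoop's Writable.
interface Writable {
    void readFields(DataInput in) throws IOException;
    void write(DataOutput out) throws IOException;
}

// Hypothetical custom value type holding two int fields.
class PointWritable implements Writable {
    int x;
    int y;

    PointWritable() { }

    PointWritable(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Fields are written in a fixed order: x first, then y.
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Fields must be read back in exactly the order they were written.
        x = in.readInt();
        y = in.readInt();
    }

    // Helper for the demo: serialize any Writable into a byte array.
    static byte[] toBytes(Writable w) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            w.write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper for the demo: rebuild a point from its serialized bytes.
    static PointWritable fromBytes(byte[] bytes) {
        try {
            PointWritable p = new PointWritable();
            p.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Swapping the two lines in readFields would silently exchange x and y, which is why the ordering contract matters.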

Custom Types (Key): The previous section discussed custom data types that meet application-specific data requirements, but it covered only the value part. We now turn to custom key types. In Hadoop MapReduce, the Reducer processes keys in sorted order, so a custom key type must implement the interface called WritableComparable. Key types should also implement hashCode(), because the default partitioning is based on the key's hash code.

The WritableComparable interface is shown below. It represents a Writable that is also Comparable.

Listing 2: The WritableComparable interface

public interface WritableComparable<T>
    extends Writable, Comparable<T>
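A custom key under these rules might look like the sketch below. The Writable and WritableComparable interfaces here are local stand-ins (an assumption made so the example runs without Hadoop), and YearMonthKey and KeyDemo are hypothetical names. The key serializes its fields, defines a sort order with compareTo, and keeps equals and hashCode consistent with that order.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Demo entry point: sorts a few keys the way a Reducer would see them.
class KeyDemo {
    public static void main(String[] args) {
        System.out.println(sortedSample());
    }

    static List<YearMonthKey> sortedSample() {
        List<YearMonthKey> keys = new ArrayList<>();
        keys.add(new YearMonthKey(2015, 3));
        keys.add(new YearMonthKey(2014, 11));
        keys.add(new YearMonthKey(2015, 1));
        Collections.sort(keys); // uses compareTo
        return keys;
    }
}

// Local stand-ins (assumption) mirroring the Hadoop interfaces in Listings 1 and 2.
interface Writable {
    void readFields(DataInput in) throws IOException;
    void write(DataOutput out) throws IOException;
}

interface WritableComparable<T> extends Writable, Comparable<T> { }

// Hypothetical composite key: ordered by year, then by month.
class YearMonthKey implements WritableComparable<YearMonthKey> {
    private int year;
    private int month;

    YearMonthKey() { }

    YearMonthKey(int year, int month) {
        this.year = year;
        this.month = month;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(year);
        out.writeInt(month);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        year = in.readInt();
        month = in.readInt();
    }

    @Override
    public int compareTo(YearMonthKey other) {
        int byYear = Integer.compare(year, other.year);
        return byYear != 0 ? byYear : Integer.compare(month, other.month);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof YearMonthKey)) {
            return false;
        }
        YearMonthKey other = (YearMonthKey) o;
        return year == other.year && month == other.month;
    }

    @Override
    public int hashCode() {
        // Equal keys must produce equal hash codes so they reach the same partition.
        return Objects.hash(year, month);
    }

    @Override
    public String toString() {
        return year + "-" + month;
    }
}
```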

Using Custom Types: We have discussed the custom value and key types that Hadoop can process. Now we look at the mechanism that tells Hadoop about them. The JobConf object (which defines the job and its settings) has two methods, setOutputKeyClass() and setOutputValueClass(), which set the key and value data types of the job output. If the Mapper emits types that differ from the Reducer's output types, JobConf's setMapOutputKeyClass() and setMapOutputValueClass() methods can be used to declare the Mapper's output types, which are the input types the Reducer expects.

Faster Comparison: The default sorting process is somewhat slow, because it first reads the key from a stream, parses the byte stream (using the readFields() method), and only then calls the compareTo() method of the key class. A faster approach is to decide the ordering of two keys by inspecting their byte streams directly, without deserializing the whole object. To implement this faster comparison mechanism, the WritableComparator class can be extended with a comparator specific to your data types. Its class declaration is shown below.

Listing 3: The WritableComparator class declaration

public class WritableComparator
    extends Object
    implements RawComparator
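The idea of comparing keys without deserializing them can be sketched as follows. This is not Hadoop code: IntKeyRawComparator and RawCompareDemo are hypothetical names, and the compare method only imitates the byte-level signature of Hadoop's RawComparator (two buffers, each with a start offset and a length). The sketch assumes the key is a single int written in big-endian order, as DataOutput.writeInt does.

```java
import java.nio.ByteBuffer;

// Demo entry point: compares two serialized keys without building key objects.
class RawCompareDemo {
    public static void main(String[] args) {
        byte[] a = IntKeyRawComparator.serialize(42);
        byte[] b = IntKeyRawComparator.serialize(1000);
        int result = IntKeyRawComparator.compare(a, 0, a.length, b, 0, b.length);
        System.out.println(result < 0 ? "a sorts before b" : "a does not sort before b");
    }
}

// Hypothetical comparator working directly on serialized bytes.
class IntKeyRawComparator {

    // Serializes an int in big-endian order (ByteBuffer's default byte order).
    static byte[] serialize(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Decodes a big-endian int starting at the given offset.
    static int readInt(byte[] bytes, int start) {
        return ((bytes[start] & 0xff) << 24)
             | ((bytes[start + 1] & 0xff) << 16)
             | ((bytes[start + 2] & 0xff) << 8)
             | (bytes[start + 3] & 0xff);
    }

    // Orders two serialized keys by decoding only the four bytes that matter.
    // The length arguments are unused here because the key has a fixed size.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }
}
```

For a composite key, the same approach would decode only the leading fields needed to decide the order and fall through to later fields on ties.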

Custom data and key types thus allow higher-level data structures to be used in the Hadoop framework. In practical Hadoop applications, custom data types are among the most important requirements. This feature lets applications use custom Writable types, and a byte-level comparator can improve sorting performance considerably.

Input Formats: InputFormat is one of the most important interfaces; it defines the input specification of a MapReduce job. Hadoop offers several InputFormat implementations for interpreting different kinds of input data. The most common, and the default, is TextInputFormat, which reads lines from a text file. Similarly, SequenceFileInputFormat is used to read binary file formats.

The fundamental task of an InputFormat is to read the data from the input file. A custom InputFormat can also be implemented to suit your application. With the default TextInputFormat, the key is the byte offset of the line and the value is the content of the line, terminated by the '\n' character. In a custom implementation, the separator can be any character, and the InputFormat parses the input accordingly.

The other job of an InputFormat is to split the input (the data source) into fragments that become the input to individual map tasks. These fragments, or splits, are encapsulated in instances of the InputSplit interface. The input data source can be anything, such as a database table, an XML file, or some other file, so the split is performed according to the application's requirements. The most important point is that the split operation should be fast and cheap.

After the input has been split, reading from the individual splits is the next important operation. The RecordReader is responsible for reading the data from a split. It must be robust enough to handle the fact that splits do not always end neatly at the end of a line. The RecordReader always reads through to the end of the line, even if that crosses the theoretical end of the split. This behaviour is essential to avoid losing records that straddle InputSplit boundaries.
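The boundary handling can be illustrated with a small stand-alone sketch. It is a simplification, not Hadoop's own reader: SplitLineReader and SplitReadDemo are hypothetical names, the whole input is assumed to fit in a byte array, and the rule applied is that a line belongs to the split that contains its first byte. A reader that does not start at offset 0 skips forward to the first line that begins inside its split, and every reader finishes its last line even when that line runs past the end of the split.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Demo entry point: reads the same data as two splits cut in the middle of a line.
class SplitReadDemo {
    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\ngamma\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(SplitLineReader.readSplit(data, 0, 8));
        System.out.println(SplitLineReader.readSplit(data, 8, data.length));
    }
}

// Hypothetical line reader over one split [start, end) of an in-memory input.
class SplitLineReader {

    // Returns every line whose first byte lies in [start, end).
    // Assumes 0 <= start <= end <= data.length.
    static List<String> readSplit(byte[] data, int start, int end) {
        int pos = start;
        if (start > 0) {
            // Look for the newline at or after start - 1; the line that follows it
            // is the first one that begins inside this split. Anything before it
            // belongs to the previous split's reader.
            pos = start - 1;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++;
        }

        List<String> lines = new ArrayList<>();
        while (pos < end && pos < data.length) {
            // Read to the end of the line, even if that is beyond 'end'.
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.US_ASCII));
            pos = lineEnd + 1;
        }
        return lines;
    }
}
```

With the split boundary at byte 8, which falls inside "beta", the first reader still returns the whole of "beta" and the second reader starts at "gamma", so no line is lost or read twice.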

  • Custom InputFormat: In basic applications, an existing InputFormat is used directly. For custom reading, the best approach is to subclass FileInputFormat. This abstract class provides the functionality needed to handle files according to the application's requirements. For custom parsing, the getRecordReader() method must be overridden to return an instance of RecordReader. That RecordReader is responsible for reading and parsing the input.
  • Alternate Source (Data): An InputFormat describes two things: how the data is presented to the Mapper, and where the data comes from. Most implementations are based on FileInputFormat, where the data source is the local file system or HDFS (Hadoop Distributed File System). For other kinds of data sources, a custom implementation of InputFormat is required. For example, a NoSQL database such as HBase provides TableInputFormat for reading data from database tables. The data source can therefore be anything that a custom implementation is able to handle.

Output Formats: The OutputFormat is responsible for the write operation. We have seen that the InputFormat and RecordReader interfaces are responsible for reading data into a MapReduce program. After the data has been processed, writing it to permanent storage is managed by the OutputFormat and RecordWriter interfaces. The default format is TextOutputFormat, which writes the key/value pairs as strings to the output file. Another output format is SequenceFileOutputFormat, which keeps the data in binary form. All these classes rely on the write() and readFields() methods of the Writable classes.

To write data in a custom format, the OutputFormat implementation must be customized. The FileOutputFormat abstract class should be extended to do so, and the JobConf.setOutputFormat() method must then be called to select the custom format.
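The role of a record writer can be sketched without Hadoop as follows. DelimitedRecordWriter and OutputDemo are hypothetical names, not Hadoop classes, and the writer targets an in-memory java.io.Writer rather than a file in HDFS. It writes each key/value pair as text on its own line; the tab separator used in the demo is the one TextOutputFormat uses by default, while a custom format could pass any other separator.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Demo entry point: writes two records and prints the resulting text.
class OutputDemo {
    public static void main(String[] args) {
        System.out.print(render());
    }

    static String render() {
        StringWriter sink = new StringWriter();
        DelimitedRecordWriter<String, Integer> writer = new DelimitedRecordWriter<>(sink, "\t");
        writer.write("hadoop", 3);
        writer.write("mapreduce", 5);
        return sink.toString();
    }
}

// Hypothetical writer playing the part a RecordWriter plays for an OutputFormat.
class DelimitedRecordWriter<K, V> {
    private final Writer out;
    private final String separator;

    DelimitedRecordWriter(Writer out, String separator) {
        this.out = out;
        this.separator = separator;
    }

    // Writes one record as: key, separator, value, newline.
    void write(K key, V value) {
        try {
            out.write(String.valueOf(key));
            out.write(separator);
            out.write(String.valueOf(value));
            out.write('\n');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```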

Data Partitioning: Partitioning is the process that determines which Reducer instance receives which intermediate key/value pair. Each Mapper must determine the destination Reducer for all of its output key/value pairs. The most important point is that, for any given key, the destination partition is the same regardless of which Mapper instance produced it. For performance reasons, Mappers never communicate with each other to agree on the partition of a particular key.

The Hadoop system uses the Partitioner interface to determine the destination partition for a key/value pair. The number of partitions matches the number of reduce tasks, and the MapReduce framework determines that number when a job starts.

The declaration of the Partitioner interface is shown below.

Listing 4: The Partitioner interface

public interface Partitioner<K2, V2>
    extends JobConfigurable
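A hash-based partitioner, similar in spirit to the one Hadoop applies by default, can be sketched as follows. The Partitioner interface here is a local stand-in with a single getPartition method (an assumption made so the example runs without Hadoop, and it omits the JobConfigurable part), and HashKeyPartitioner and PartitionDemo are hypothetical names. Because the partition depends only on the key's hash code and the number of partitions, every Mapper computes the same destination for the same key without any coordination.

```java
// Demo entry point: shows that equal keys always map to the same partition.
class PartitionDemo {
    public static void main(String[] args) {
        Partitioner<String, Integer> partitioner = new HashKeyPartitioner<>();
        for (String key : new String[] {"apple", "banana", "apple"}) {
            System.out.println(key + " -> partition " + partitioner.getPartition(key, 1, 4));
        }
    }
}

// Local stand-in (assumption) for the partitioning contract.
interface Partitioner<K, V> {
    int getPartition(K key, V value, int numPartitions);
}

// Hypothetical partitioner that spreads keys by hash code.
class HashKeyPartitioner<K, V> implements Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numPartitions) {
        // Clearing the sign bit keeps the result non-negative for negative hash codes.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

A custom partitioner would replace the hash with application logic, for example routing all keys of one customer or one date range to the same Reducer.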

Conclusion: This discussion has covered the most important advanced features of Hadoop MapReduce, all of which exist to support customization. In practical MapReduce applications, the default implementations of the APIs are rarely sufficient on their own. Instead, custom features built on the exposed APIs make the significant difference. All of these customizations are straightforward once the underlying concepts are clear. We hope this article helps you understand the advanced features and how to implement them.

 
