Qhov tseem ceeb ntawm Hadoop Architecture ntau lawm tav no li cas?

Txheej txheem cej luam:

Hadoop yog lub platform uas yuav luag synonymous uas muaj ntaub ntawv loj. Nws yuav yeej tau tus moj qhib qhov khaum uas ua rau lub cia thiab ua kev clustered-teeb ua khub ntawm ib teev loj. Feem ntau, lub Hadoop architecture yog paub comprise ntawm plaub loj modules, uas yog HDFS (Hadoop Distributed File System), Hadoop Common, XOV PAJ thiab MapReduce. Ib tug ntawm cov modules yog teem caij mus ua tau tej yam kev pab raws qib kev, uas tuaj ua ke ua ib pob rau raws li qhov yuav tau ua cov ntaub ntawv. Lub Hadoop architecture yog ib lub chaw tseem ceeb yuav kawm tau zoo ntau lawm. No architecture muaj ntau tseem ceeb nta uas yuav tau nws tej chaw nyob lwm coj as of tam sim no. Txawm li cas los, tseem muaj ob peb lwm yam uas yuav tau xyuas kom zoo txog qhov kev siv Hadoop. Qhov no txhais tau tias, nws cia li tsis muaj lub cia muab lawv cov ntaub ntawv los yog hauv 24×7 khiav ntawm daim ntaub ntawv, tiam sis tseem li cas nws integrates zuag qhia tag nrho architecture thiab lwm yam cuab yeej ntawm ib enterprise.

Qhov tsab xov xwm no yuav lom zem ntau tham txog Hadoop architecture txhua huvsi nrog rau tus zoo txhua module muaj. Peb yuav kuj tau tej teeb meem zoo hau ntau lawm.

Nram no yog ib tug yooj yim Hadoop architecture daim duab rau 2.0 versions

Hadoop 2.0 architecture

Hadoop 2.0 architecture

Duab 1: Hadoop 2.0 architecture

HDFS Architecture

Raws li twb hais, Hadoop HDFS muaj tseeb yog ib qhov tseem ceeb Cheebtsam ntawm lub moj khaum tag nrho. Nws yog tus module uas yog tasked uas muab ib txhim khu kev qha, ruaj khov thiab distributed cia lawv nyob rau ntau cov ntshav uas nyob rau hauv Hadoop sawv.

Tam sim no, sawv mas muaj ntau cov ntshav uas yog txuas ua ke rau daim ntaub ntawv tiav ib zog. Tag nrho cov ntaub ntawv uas yuav tsum tau muab cia yog thaum xub pib liam sim ua ob peb me chunks paub li blocks. Cov blocks ces faib thiab muab cia nyob ob peb cov ntshav ntawm cov pawg. Qhov no yog lub caij uas Hadoop tej ntaub ntawv uas yog ua lub tsev thiab yuav ua tau tej yam zoo zoo li.

Peb tau saib cov lwm yam ntxwv ntawm HDFS.

Scalable

Vim muaj cov ntaub ntawv distributed lawv architecture, Hadoop cov duab kos thiab qhov nws kom tsis txhob muaj zog ua hauj lwm xws li tshau. Qhov tso cai no yuav muab tau yooj yim sau tseg subsets me me ntawm cov ntaub ntawv yeej, yam Self scalability zoo kawg li. Qhov no tseem yog tus tso kom zoo dua rau cov lag luam, thaum lawv nyuam qhuav ntxiv tau servers linearly, Thaum twg lawv cov ntaub ntawv pom lawm hais tias loj hlob.

Saj zawg zog

Nam ntawm HDFS ib qho heev kev zoo yog nws cov xwm ntawd cia cov ntaub ntawv mas yooj ywm. Thaum qhib tau qhov twg los, Hadoop tau yooj yim tso rau hauv kho vajtse, uas tej yam nqi lawg hlob hlob. Tseem, Hadoop tej ntaub ntawv uas yuav muab txhua yam ntawm cov ntaub ntawv, seb nws yog structured, unstructured, formatted los tseem encoded.

Hadoop txawm zoo rau unstructured tej ntaub ntawv yuav nqi los yog cov koom haum hauv kev txiav txim tau ua cov txheej txheem, tej yam uas muaj suab unheard of ua ntej.

Txhim khu kev qha

Hadoop tej ntaub ntawv uas yog txhaum tiv thaiv, uas txhais tau hais tias cov ntaub ntawv nyob rau hauv HDFS no replicated kom tsawg kawg yog ob tug lwm qhov chaw. Yog li, nyob rau hauv rooj plaub yog lub cev qhuav dej ntawm lub cev los yog ob tug, yog ib lub thoj lawv yuav muaj ib daim qauv ntawm tas nrho koj cov ntaub ntawv. Lawv thiaj mam li allocate workloads mus rau qhov chaw no thiab txhua yam ua haujlwm li qub.

Ntaub ntawv I/O

Tus efficiency ntawm tej ntaub ntawv uas yog nyob li cas nws tej lub hom phiaj ntawm cov haujlwm I/O. Nyob rau hauv HDFS, cov ntaub ntawv teev los ntawm kev tsim ib tug ntawv tshiab thiab yuav sau cov ntaub ntawv muaj. Tom qab no, cov ntaub ntawv no kaw lawm xwb thiab cov ntaub ntawv sau tsis tau yuav deleted lossis hloov lawm. Tab sis, cov ntaub ntawv tshiab yuav tau appended rov qhib cov ntaub ntawv. Kom lub foundation yooj yim ntawm HDFS yog ' xwb sau thiab nyeem ntau’ qauv.

Qhov kev tso kawm thaiv

Nyob rau hauv HDFS, ib cov ntaub ntawv yog thaum lawv tseem sib txuam blocks. Kev muab ib ntu tshiab, NameNode assigns ib daim id thaiv nws thiab ntxiv rau qhov ntaub ntawv. Tom qab no tus thaiv tshiab yog tseem replicated nyob rau hauv ntau DataNodes.

HDFS thaiv qhov kev tso kawm ntawv yog configurable, li ntawd, cov neeg siv cov yuav xyaum ua tej yam txawv cawm tau optimized lub ntsiab. Yog vim, HDFS thaiv qhov kev tso kawm ntawv ncav mus muab txo nqi sau thiab maximize read ntawv, raws li daim phiaj thiab kev cia siab rau. Kom muaj kev no, Thaum twg ib ntu tshiab ntawv ntxiv rau ib tug, tus thawj replica muab tso rau lub qub ntawm cov txawj sau ntawv qhov twg tam sim no. Tom qab no, 2nd thiab koob thib 3 replica positioned rau ob hom o nyob rau hauv ib kem khib. Tam sim no kom tag tus replicas yuav tau kawm across. Tab sis, txoj kev txwv ntawd yog, ib ntawm tsis tau kom ntau lub replica thiab khib ib tsis tau ua rau ntau tshaj ob replicas.

Cov duab hauv qab no qhia tau hais tias ib rooj plaub no raug kev has replica nyob hauv ib lub chaw khib (nej nyob rau sab lus saum toj no)

replica placement

qhov kev tso kawm replica

Image2: Qhia tau hais tias qhov kev tso kawm replica nyob hauv ib lub chaw khib ob

Tub ntxhais tej/Hadoop Hadoop

Ib qho Hadoop muaj cov txheej uas hlauv taws xob los txhawb Hadoop architecture. Cov no yog cov luj yeej APIs pab lwm modules nrog sib tham. Qhov ntawd kuj yog li ib feem tseem ceeb ntawm Hadoop architecture xws li HDFS, MapReduce thiab xov paj. Kev abstraction nyob saum tus lwm hoob nta xws li cov ntaub ntawv uas muab nws, Thiab lwm yam OS.

Xov paj Infrastructure

XOV PAJ, los 'Yli IBnother Resource Negotiator', yog tus module hauv Hadoop uas yog tus tswj cov chaw muab kev pab computational. Zoj, nws allocates CPUs los nco, raws li tus neeg ua hauj lwm uas yog ntawm tes. Tam sim no, Xov paj yog neeg ua los ntawm ob qho loj yam – tus manager saib xyuas kev pab thiab cov neeg saib xyuas ntawm.

  • Kev pab tus nai

Qhov kev pab tus nai, uas yog kuj hu ua tus tswv, muaj ib zaug xwb muaj nyob rau hauv ib pawg thiab sau ob peb pab. Nws yuav khiav uas cov neeg ua hauj lwm uas muaj nyob rau thiab tseem tswj cov chaw muab kev qhia Schedule, uas assigns III..

  • Tus thawj tswj ntawm

Tus neeg saib xyuas ntawm zoo li muab tus neeg ua hauj lwm ntawm lub infrastructure thiab tej zaum mas muaj ntau leej nyob sawv Hadoop. Muaj ntau yam los ntawm pawg no ntawm tswj txhua. Nws tau txog cov kev pab yog li nyob rau hauv tsab ntawv nco thiab vcores (feem CPU cores). Tus manager saib xyuas kev pab utilizes cov chaw muab kev pab los ntawm qhov ntawm tus thawj tswj, Thaum twg yuav tsum tau los khiav hauj lwm tus.

Xov paj Hadoop muaj tej yam sib nrauj ib qho heev kev zoo ua nws ib feem tseem ceeb ntawm cov architecture. Cov no muaj lawm tau teev nyob rau hauv nthuav dav.

Tej tsev

Ib tus Hadoop xov paj tus biggest zoo yog tias nws txhawb dynamic fwm. Txawm muab cov chaw muab kev pab los ntawm cov pawg tib, Nws tseem muaj peev ntawm khiav xyaw ntau thiab workloads. Thiab, ib yam li HDFS, Xov paj los kuj yog ib scalable, uas muaj soj teem dua peevxwm, txawm tus twg workload yuav.

Robustness

Xov paj Hadoop muaj robustness, uas tso cai rau koj mus qhib tau koj cov ntaub ntawv rau ntau yam cuab yeej thiab yees uas yuav pab kom koj tau qhov zoo tshaj plaws hauv cov ntaub ntawv ua. Nws cov ecosystem yog zoo teeb kom tau raws li cov kev tu ncua ntau yam developers thiab kuj koom haum kev teev me me thiab loj.

qhov tseeb, Hadoop thaum nimno tawm nrog rau ntau cov paub txug tej yaam xws li nas muv, MapReduce, Zookeeper, HBase, HCatalog, thiab ntau heev. Tseem, raws li lub tsev khw Hadoop yuav expanding, newer cuab yeej muab rau no suav txhua hnub.

Hauv qab no yog daim duab architecture raug xov paj.

YARN Architecture diagram

Xov paj Architecture daim duab

Image3: Xov paj Architecture daim duab

Moj khaum MapReduce

MapReduce hais tias nws yog lub plawv ntawm lub kaw lus Hadoop. Nws muaj lub moj khaum programming pub rau sau rau ntawm daim ntaub ntawv rau thaum uas tig mus ua kev loj-teeb ua khub muaj nyob ntau 175,000 lossis phav phav servers rau sawv Hadoop.

Cov tswv yim yooj yim qab nws ua hauj lwm yog cov kuas thiab txo cov kev pab raws qib. Saib rau ntawm filtering thiab sorting ntawm cov ntaub ntawv ntawm kev ua daim ntawv qhia yog, Thaum cov txo tej kev ua yog ua tej yam haujlwm kev. MapReduce heev tuaj nrog nws ncaj ncees qhia tawm ntawm kev tseem ceeb uas nyug pub thiab zoo ntau lawm, uas yog

Yooj

MapReduce tau ntaub ntawv cov ntaub ntawv ntawm txhua yam, seb nws yog structured, xyaum ib los unstructured. Qhov no yog ib lub chaw tseem ceeb uas ua nws ib feem tseem ceeb ntawm cov Hadoop nrho architecture.

VR

Ntau hom lus yog txaus siab los ntawm MapReduce, developers mus ua hauj lwm zoo no uas. qhov tseeb, MapReduce muab nyiaj yug Java, Nab hab sej thiab C , thiab tseem tau has lug xws li Apache npua thiab nas muv.

Scalability

Yog ib feem ntawm qhov Hadoop architecture, MapReduce tau lawm txawm los uas nws ntais ntawv cov qib uas muab los ntawm HDFS scalability loj heev. Qhov no saib kuas unlimited cov ntaub ntawv ua, tag nrho hauv ib lub platform tag.

Li cas Hadoop Cheebtsam kuas zoo ntau lawm?

Nyob rau hauv ib cheeb tsam ntau lawm, scalability yog ib cov qauv tseem ceeb rau kev ua hauj lwm zoo. Vim hais tias, Yog hais tias cov ntawv tsis scale (uas sau rau hauv HDFS) thaum lub caij ncov, ces nws yuav tsis tau them kom tag cov neeg muas zaub. Yog li ntawd cov cov lag luam yuav poob nyiaj. Li ntawd, ntawm architectural pom nws tseem ceeb heev scalable cia thiab peevxwm ua, uas Hadoop tau muab cov uas nws cov ntaub ntawv distributed lawv (HDFS).

Cov lwm HDFS yam ntxwv zoo li yooj rau cov yuav txhawb tau txhua yam ntawm cov ntaub ntawv; kev cia siab rau (tiv thaiv txhaum) thaum lub kaw lus cev qhuav dej kuj ntxiv tus nqi rau ib cheeb tsam ntau lawm. Cov ntaub ntawv I/O thiab nris tso kawm mas tseem ceeb heev thaum nws txhawb cov ntaub ntawv los xyuas dua nraaj heev nyob rau hauv ib cheeb tsam clustered. Vim li ntawd peb yuav xaus tias ntau lawm zoo tej ntaub ntawv Hadoop majorly yog nyob li tus HDFS architecture xwb.

Nyob rau hauv ib pawg ntawm raug 4000 o, peb yeej nyob ib ncig ntawm 65 lab ntaub ntawv thiab 80 lab blocks. Muaj ib lub thaiv 3 replicas, li ntawd, txhua tus ntawm yuav tau 60,000 blocks. Qhov no yog ib rooj plaub no raug hauv Yahoo cov ntaub ntawv los xyuas dua. Ces nws muab ib lub tswv yim npav txog clustered ib puag ncig thiab cov ntaub ntawv cia.

Xov paj architecture muab tus npaum tshwj fwm uas yog hauv Hadoop introduces 2.0 architecture. Nws saib kuas zoo fwm nyob ib puas ncig ntau lawm.

Apart from lub Cheebtsam, MapReduce programming pab rau thaum uas tig mus ua cov ntaub ntawv nyob rau hauv ib cheeb tsam distributed. Ces cov tus xav ua yog tiav nyob rau hauv ntau lawm lawv yug zov ntiaj teb tiag.

Xaus

Nws tseem zoo paub tias loj cov ntaub ntawv yog muab rau tus thawj lub sij hawm yavtom ntej nyob rau hauv cov ntaub ntawv ua, thiab nrog lub Hadoop ecosystem nws yog thriving ntawm nimno, Nws kuj xav pom kom tau qhov frontrunner ntawm tus sau. Tag nrho cov ntaub ntawv raws li lwm yam cuab yeej ua lawv txoj kev nrog Hadoop, yuav kom counter ntawm txoj kev sib tw yuav muag near yav tom ntej. Hadoop architecture ua uas tswj cov loj loj tagnrho ntawm cov ntaub ntawv nyob rau hauv ib cheeb tsam distributed. Txhua tivthaiv Hadoop platform yog tsim los siv rau tej yam kev functionalities. Li ntawd, ua ib tug lej nws saib kuas ntau lawm mus tau zoo ntawm tej ntaub ntawv bigdata. Tiam sis peb kuj yuav tsum nco ntsoov hais tias cov kab bigdata yees kuj ua ib tug tseem ceeb nyob rau hauv tsab ntawv teev npe txiag thiab nws zoo nyob rau lub neej tiag scenarios.

============================================= ============================================== Yuav zoo TechAlpine phau ntawv rau Amazon
============================================== ---------------------------------------------------------------- electrician ct chestnutelectric
error

Txaus siab rau qhov blog? Tshaj tawm lus thov :)

Follow by Email
LinkedIn
LinkedIn
Share