How does the Hadoop Distributed File System (HDFS) work?

Overview: In this article I will talk about HDFS, the underlying file system of the Apache Hadoop framework. The Hadoop Distributed File System (HDFS) is a distributed storage layer that spans thousands of commodity machines. It provides fault tolerance, high throughput, streaming access to data, and reliability. It is designed to store large volumes of data, and its architecture is what makes it fast. HDFS is part of the Apache ecosystem.

Introduction:

Apache Hadoop is a software framework provided by the open source community. It helps with storing and processing large-scale data sets on clusters of commodity hardware. Hadoop is licensed under the Apache License 2.0.

The Apache Hadoop framework consists of the following modules:

  • Hadoop Common – The common module contains the libraries and utilities required by the other Hadoop modules.
  • Hadoop Distributed File System (HDFS) – The distributed file system that stores data on commodity machines. It provides very high aggregate bandwidth across the cluster.
  • Hadoop YARN – The resource-management platform responsible for managing compute resources across the cluster and using them to schedule users' applications.
  • Hadoop MapReduce – The programming model used for large-scale data processing.

All the modules in Hadoop are designed around one fundamental assumption: hardware failures (of a single machine or of an entire rack) are common, and so they should be handled automatically in software by the Hadoop framework. Apache Hadoop's MapReduce and HDFS components were originally derived from Google's MapReduce and the Google File System (GFS), respectively.

Hadoop Distributed File System (HDFS):

The Hadoop Distributed File System, or HDFS, is the primary distributed storage used by Hadoop applications. An HDFS cluster primarily consists of a NameNode and a number of DataNodes. The NameNode manages the file system metadata, and the DataNodes store the actual data.

Figure 1: HDFS Architecture

The HDFS architecture diagram shows the basic interactions among the NameNode, the DataNodes, and the clients. The client component calls the NameNode for file metadata or file modifications. The client then performs the actual file I/O directly with the DataNodes.
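The read path described above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the class names `NameNode` and `DataNode` mirror the HDFS roles, but `get_block_locations`, `read_block`, `read_file`, the block IDs, and the node names are hypothetical and are not the Hadoop client API.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Stores the actual block bytes, keyed by block ID."""
    name: str
    blocks: dict = field(default_factory=dict)

    def read_block(self, block_id):
        return self.blocks[block_id]


@dataclass
class NameNode:
    """Holds only metadata: which blocks make up a file and where they live."""
    # path -> ordered list of (block_id, [names of DataNodes holding a copy])
    block_map: dict = field(default_factory=dict)

    def get_block_locations(self, path):
        return self.block_map[path]


def read_file(path, namenode, datanodes):
    """Client side: one metadata call, then direct reads from the DataNodes."""
    data = b""
    for block_id, holders in namenode.get_block_locations(path):
        # Read from the first listed holder; file bytes never pass through the NameNode.
        data += datanodes[holders[0]].read_block(block_id)
    return data


# Hypothetical two-node cluster holding one two-block file.
dn1 = DataNode("dn1", {"blk_1": b"hello "})
dn2 = DataNode("dn2", {"blk_2": b"hdfs"})
nn = NameNode({"/demo.txt": [("blk_1", ["dn1"]), ("blk_2", ["dn2"])]})
datanodes = {"dn1": dn1, "dn2": dn2}

content = read_file("/demo.txt", nn, datanodes)
print(content)
```

The point of the split is that the NameNode answers small metadata questions while the bulk data transfer is spread over the DataNodes, so the NameNode does not become the bottleneck for file contents.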

Salient features of HDFS: The following are some of the salient features that are likely to interest many users –

  • Hadoop, including HDFS, is a good match for distributed storage and distributed processing on low-cost commodity hardware. Hadoop is scalable, fault tolerant, and very simple to expand. MapReduce is well known for its simplicity and its applicability to a large set of distributed applications.
  • HDFS is highly configurable. The default configuration is good enough for most applications. In general, the defaults need to be tuned only for very large clusters.
  • Hadoop is written in Java and is supported on almost all major platforms.
  • Hadoop supports shell and shell-like commands for communicating with HDFS directly.
  • The NameNode and the DataNodes each have a built-in web server, which makes it easy to check the current status of the cluster.
  • New features and updates are implemented in HDFS frequently. The following list is a subset of the useful features available in HDFS:
    • File permissions and authentication.
    • Rack awareness: This takes a node's physical location into account when scheduling tasks and allocating storage.
    • Safemode: This is the administrative mode, used mainly for maintenance.
    • fsck: This is a utility used to diagnose the health of the file system and to find missing files or blocks.
    • fetchdt: This is a utility used to fetch a DelegationToken and store it in a file on the local system.
    • Rebalancer: This is a tool used to balance the cluster when data is unevenly distributed among the DataNodes.
    • Upgrade and rollback: Once the software has been upgraded, it is possible to roll back to the state HDFS was in before the upgrade if an unexpected problem occurs.
    • Secondary NameNode: This node performs periodic checkpoints of the namespace and helps keep the size of the file containing the log of HDFS modifications within certain limits at the NameNode.
    • Checkpoint node: This node performs periodic checkpoints of the namespace and helps minimize the size of the log, stored at the NameNode, that contains changes to HDFS. It replaces the role previously filled by the Secondary NameNode. The NameNode allows multiple Checkpoint nodes at the same time, as long as no Backup nodes are registered with the system.
    • Backup node: This can be described as an extension of the Checkpoint node. Along with checkpointing, it also receives a stream of edits from the NameNode. It therefore maintains its own in-memory copy of the namespace, which is always in sync with the active NameNode's namespace state. Only one Backup node may be registered with the NameNode at a time.
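
Several of the features listed above are exposed through the `hdfs` command-line tool. The sketch below collects a few of those commands as argument lists in Python. The subcommands themselves (`dfs -ls`, `fsck`, `dfsadmin -safemode get`, `balancer -threshold`, `fetchdt`) are standard HDFS commands, but the paths, the threshold value, the token file name, and the `run` helper are illustrative assumptions. By default nothing is executed; the helper only returns the command line.

```python
import shutil
import subprocess

# HDFS command-line tools expressed as argument lists.
# The paths, threshold, and token file name are illustrative assumptions.
COMMANDS = {
    "list a directory": ["hdfs", "dfs", "-ls", "/"],
    "check file system health": ["hdfs", "fsck", "/", "-files", "-blocks"],
    "query safe mode": ["hdfs", "dfsadmin", "-safemode", "get"],
    "rebalance the cluster": ["hdfs", "balancer", "-threshold", "10"],
    "fetch a delegation token": ["hdfs", "fetchdt", "/tmp/my.token"],
}


def run(task, dry_run=True):
    """Return the command line for a task.

    The command is executed only when dry_run is False and `hdfs` is on PATH.
    """
    argv = COMMANDS[task]
    if not dry_run and shutil.which("hdfs"):
        subprocess.run(argv, check=True)
    return " ".join(argv)


for task in COMMANDS:
    print(f"{task}: {run(task)}")
```

On a machine with a configured Hadoop client, calling `run("query safe mode", dry_run=False)` would invoke the real tool; everywhere else the helper is just a readable reference of what to type in a shell.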

Goals of HDFS:

Hadoop aims to use commonly available servers in a very large cluster, where every server has a set of inexpensive internal disk drives. For better performance, the MapReduce API tries to assign workloads to the servers where the data to be processed is stored. This is known as data locality. Because of this, it is not recommended to use a storage area network (SAN) or network attached storage (NAS) in a Hadoop environment. For Hadoop deployments that use a SAN or NAS, the extra network communication overhead may cause performance bottlenecks, especially in larger clusters.
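
To make data locality concrete, here is a minimal greedy sketch of the idea. It is not Hadoop's actual scheduler: the `schedule` function, the block IDs, and the node names are hypothetical, and it assumes there are at least as many free nodes as blocks.

```python
def schedule(block_locations, free_nodes):
    """Assign each block's task to a free node that already stores the block.

    Falls back to any free node (a remote read) when no holder is free.
    Returns {block_id: (node, is_local)}.
    """
    free = list(free_nodes)
    plan = {}
    for block_id, holders in block_locations.items():
        local = [n for n in free if n in holders]
        node = local[0] if local else free[0]
        plan[block_id] = (node, bool(local))
        free.remove(node)
    return plan


# Hypothetical cluster state: each block is stored on three nodes.
block_locations = {
    "blk_1": ["n1", "n2", "n3"],
    "blk_2": ["n2", "n3", "n4"],
    "blk_3": ["n1", "n4", "n5"],
}
plan = schedule(block_locations, free_nodes=["n3", "n4", "n6"])
for block_id, (node, is_local) in plan.items():
    print(block_id, node, "local" if is_local else "remote")
```

In this run two tasks read their block from a local disk and only one has to pull data over the network. With a SAN or NAS every read would be a network read, which is the overhead the paragraph above warns about.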

Now consider a cluster of 1000 machines, each with three internal disk drives. Think about the failure rate of a cluster made up of 3000 inexpensive drives and 1000 inexpensive servers! One thing is close to certain: the component mean time to failure (MTTF) you will experience in a Hadoop cluster is much like the zipper on your kid's jacket. It is going to fail. The best thing about Hadoop is that the MTTF rates that come with inexpensive hardware are well understood and accepted. This is part of Hadoop's strength: it has built-in fault tolerance and fault compensation capabilities. The same goes for HDFS. Files are divided into blocks, and copies of those blocks are stored on other servers across the Hadoop cluster. Put simply, an individual file is stored as smaller blocks that are replicated across multiple servers throughout the cluster, so that the file stays available and can be read quickly.
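
A rough back-of-the-envelope calculation shows why failure has to be treated as routine at this scale. The drive and server counts come from the example above; the annual failure rates are hypothetical placeholders, not measured values, and failures are assumed to be independent and spread evenly over the year.

```python
DRIVES = 3000
SERVERS = 1000

# Hypothetical annual failure rates; real values depend on the hardware.
DRIVE_AFR = 0.04
SERVER_AFR = 0.02

# Expected number of component failures per year across the cluster.
expected_failures_per_year = DRIVES * DRIVE_AFR + SERVERS * SERVER_AFR

# Probability that at least one drive fails on a given day,
# assuming independent failures spread evenly over the year.
p_drive_day = DRIVE_AFR / 365
p_any_drive_fails_today = 1 - (1 - p_drive_day) ** DRIVES

print(f"expected failures per year: {expected_failures_per_year:.0f}")
print(f"chance of a drive failure today: {p_any_drive_fails_today:.0%}")
```

Under these assumed rates the cluster sees about 140 component failures a year, and on any given day there is a bit more than a one-in-four chance that some drive fails. The exact numbers matter less than the conclusion: something is almost always broken, so the software has to cope.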

A use case: Let us now look at a case study to understand HDFS.

Consider a file that contains the telephone numbers of everyone living in the United States of America. People whose last name starts with A could be stored on server 1, those whose last name starts with B on server 2, and so on. In a Hadoop environment, pieces of this phonebook would be distributed across the entire cluster. To reconstruct the entire phonebook, your program would need to access the blocks from every server in the cluster. To achieve higher availability, HDFS by default replicates each piece of data onto two additional servers. This is redundancy, but the argument in its favor is that it avoids failure conditions and provides fault tolerance. The redundancy can be increased or decreased on a per-file basis or for the whole environment. It offers several benefits, the most obvious being that the data is highly available. In addition, data redundancy allows the Hadoop cluster to break work up into smaller chunks and run those smaller jobs on all the servers in the cluster for better scalability. Finally, as end users we get the benefit of data locality, which is critical when working with large data sets.
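
The phonebook example can be sketched in a few lines. This is a simplified model under stated assumptions: the `place_blocks` and `reconstruct` functions, the sample records, and the server names are hypothetical, and the round-robin placement stands in for HDFS's real, rack-aware placement policy. The default of three copies matches "two additional servers" in the text; in a real cluster that factor is controlled by the `dfs.replication` property and can be changed per file with `hdfs dfs -setrep`.

```python
def place_blocks(records, block_size, servers, replication=3):
    """Split records into blocks and put each block on `replication` distinct servers.

    Round-robin placement is a simplification; it requires
    replication <= len(servers). Returns {server: {block_index: block}}.
    """
    blocks = [records[i:i + block_size] for i in range(0, len(records), block_size)]
    storage = {s: {} for s in servers}
    for idx, block in enumerate(blocks):
        for r in range(replication):
            storage[servers[(idx + r) % len(servers)]][idx] = block
    return storage


def reconstruct(storage, failed=()):
    """Rebuild the record list from any surviving copy of each block.

    Blocks with no surviving copy are simply missing from the result.
    """
    found = {}
    for server, held in storage.items():
        if server in failed:
            continue
        for idx, block in held.items():
            found.setdefault(idx, block)
    return [rec for idx in sorted(found) for rec in found[idx]]


phonebook = ["Adams 555-0101", "Baker 555-0102", "Clark 555-0103",
             "Davis 555-0104", "Evans 555-0105", "Foster 555-0106"]
servers = ["s1", "s2", "s3", "s4", "s5"]
storage = place_blocks(phonebook, block_size=2, servers=servers)

# Two servers fail; every block still has at least one surviving copy.
recovered = reconstruct(storage, failed={"s1", "s2"})
print(recovered == phonebook)
```

With three copies of each block, any two servers can fail without losing data. Losing all three servers that hold the same block would leave a gap, which is why the replication factor is a trade-off between storage cost and availability.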

Conclusion: We have seen that HDFS is one of the major components of the Apache Hadoop ecosystem. It is the underlying storage layer, and it is far more powerful than a local file system, which is why big data applications use HDFS for their data storage.
Let us conclude our discussion with the following points:

  • HDFS is the distributed file system of Apache Hadoop, provided by the open source community and used to store large data sets on clusters.
  • The Hadoop framework consists of these four modules:
    • Hadoop Common
    • Hadoop Distributed File System, or HDFS
    • Hadoop YARN
    • Hadoop MapReduce
  • An HDFS cluster consists of a NameNode and one or more DataNodes.
  • The goal of HDFS is to use commonly available, inexpensive servers in a very large cluster.

I hope you have enjoyed the article and understood the basic concepts of HDFS. Keep reading.
