Yog vim li cas Apache txim lub platform yav tom ntej rau tej ntaub ntawv loj?

Apache Spark and Big Data

Apache Spark thiab loj cov ntaub ntawv

Txheej txheem cej luam:

Raws li ntaub ntawv loj yog ib qhov tseem ceeb tshaj plaws cuab tam ib enterprise yuav muaj possesses, enterprises yog demanding ntau tawm ntawm cov ntaub ntawv. Enterprises cia siab tias cov ntaub ntawv muab complex thiab multimidensional insights nyob siab ceev. Muab tej kev pom zoo li no, tuam txhab uas muag tau tsim nyog, tshuab thiab cuabyeej. Kauj ruam ntawm cov ntaub ntawv loj yuav txhais txoj kev sib raug zoo ntawm enterprises thiab cov ntaub ntawv. Apache Spark muab cov framework los ua tej yam xws li txheej txheem, querying thiab generating analytics siab ceev thiab saib lub neej yav tom ntej, No mas, tej zaum yuav tias Apache Spark yuav muab lub platform nyiam heev rau cov ntaub ntawv loj. Ib qho tseem ceeb tshaj nyob rau hauv cov ntsiab lus teb no yog Apache spark yog ib qhov qhib framework uas ua rau nws tsis txaus siab nyob rau hauv ib qho kev tsis txwv tsis pub kim proprietary tshuab lag luam. Apache Spark pom raws li ib tug competitor los yog successor rau MapReduce. Muaj ib co kws txawj uas tseem xav txog Txim lub moj khaum ntawm nws cov theem nascent thiab nws yuav muaj cai tam sim no txhawb ob peb lub lag luam analytics.








Ntsiab lus teb rau Apache spark

Apache Spark muaj emerged ib lub sij hawm thaum tus enterprises cia siab tias cov ntaub ntawv lawv yuav tsum muaj ntau tab sis yog constrained los ntawm ntau yam. Enterprises muaj txojkev teeb meem rau ob peb fronts xws li inadequate framework thiab tshuab, siv tshuab kim thiab tsis muaj cov neeg txawj ntse. Peb xyuas tej teeb meem no me ntsis ntxiv zoo.

Inadequate framework

Lub frameworks muaj tsis muaj peev xwm txheej tau cov ntaub ntawv uas muaj kev kawm ntawm efficiency. Ceev, khaub lig-platform compatibility thiab querying yog tag nrho, kev ua kom neeg npau npaum li cas, teeb meem nrog software software frameworks. Muaj sij hawm, Cov rua ntawm cov ntaub ntawv yog qhov ntau varied, txoj yooj yim thiab multimensional. Qhov no yog tsim ib kis ntawm cov miv nyuas thiab cov capabilities

nqi software

Cov nqi ntawm proprietary software los yog framework yog txwv thiab uas yog tsim ib qho qws tsoo kom vim mid-sized me tuam txhab uas muag tsis muaj peev xwm muas thiab rov qab cov lais xees. Tsuas loj tuam txhab uas muag nrog deep pockets yuav them taus tej kev siv nyiaj uas txhais tau tias tuam txhab uas muag nyob deprived ntawm cov ntaub ntawv ntau dua processing capabilities.


Nyiaj Incompatibility

Lub frameworks muaj teeb meem rau lwm yam cuab yeej. Piv txwv, Mapreduce sau xwb Hadoop. Spark tsis muaj tej yam teeb meem compatibility. Nws yuav khiav ntawm tej chaw muab kev pab xws li XOV PAJ los sis Xovxwm.

Yam Apache Spark yog lub platform yav tom ntej rau cov ntaub ntawv loj

Thaum koj xav tau kev Apache Spark yog lub platform yav tom ntej rau ntaub ntawv loj, nws yog hom inevitable los piv nrog Hadoop hadoop. Hadoop yog tseem nyiam tshaj ntaub ntawv loj Processing framework thiab muaj zoo vim li cas Spark replaces Hadoop. Li no yog ob peb yam spark yog lub neej yav tom ntej.

Npaum kov cov algorithms

Spark yog zoo kawg thiab ntawm tuav programming qauv uas muaj cov iterations, sib tham ntawd muaj xws li streaming thiab ntau npaum li cas. Rau lwm cov tes, Mapreduce zaub ntau inefficiencies hauv tuav iterative algorithms. Uas yog ib tug loj vim li cas Apache Spark xam tau tias yog ib tug nqi hloov rau Mapreduce.

Spark muab analytics workflows

Thaum nws tawm los rau analytics platforms, Spark muab ib wealth ntawm cov chaw muab kev pab. Nws muaj nws, Piv txwv, tsev qiv ntawv rau kev kawm tshuab (MLlib), Kev Siv Programming Interfaces (APIs) rau nraaj, los yog GraphX, Kev them nyiaj yug rau SQL kuas querying, cwj mem qaib. Tag nrho cov constitute ib comprehensive analytics platform. Raws li Ian Lumb of Bright Computing, "Workflows yuav tau sau tseg rau hauv ib hom ntaub ntawv batch los yog nyob rau lub sij hawm uas siv lub built-interactive plhaub muaj nyob rau hauv Scala thiab Nab hab sej. Vim hais tias cov tsis ruaj khov pob khoom R Twb yog ib qho ntawm cov txug tej yaam num, Spark tus analytics nres yog heev heev. Spark yuav saib tau tej ntaub ntawv Hadoop Hadoop – los ntawm HDFS (thiab lwm lub nruab nrog cev) databases xws li Apache HBase thiab Apache Cassandra. Cov ntaub ntawv xee npe los ntawm Hadoop yuav tsum incorporated rau hauv spark daim ntaub ntawv thiab cov hauj lwm."








Cim xeeb cim xeeb

Nyob rau hauv ib txoj kev tshawb no benchmarking txoj kev tshawb no nyob rau hauv-nco cia zaj dabneeg binary, Nws yog sab tias Spark outperformed Hadoop los ntawm 20x factor. Qhov no yog vim spark muaj cov Resilient faib Datasets (RDDs). Alan Lumb ntawm Bright Computing ntxiv, "RDDs yog fault-tolerant, cov ntaub ntawv parallel lug suited rau-nco cluster computing. Zoo ib yam nrog cov Hadoop paradigm, RDDs yuav persist thiab tau partitioned nyob ib cov ntaub ntawv loj infrastructure ensuring ntawd cov ntaub ntawv yog optimally tso. Thiab, tau mas, RDDs yuav muab manipulated siv ib tug nplua nuj txheej tswv." Ces nrog zoo nco tau utilization, enterprises yuav nrhiav lwm tus rau pem hauv ntej kom zoo dua kev tswj thiab cov nqi nyiaj.

Qhabnias tau zoo

Nyob rau hauv ib rooj plaub best-case scenario rau Hadoop, Spark tuav Hadoop los ntawm ib qhov zoo tshaj 20x. Saib cov duab hauv qab no, Nws qhia tau hais tias Spark tuav Hadoop Hadoop txawm tias nco yog unavailable thiab nws tau siv nws cov disks.

Spark and Hadoop comparison

Txim thiab Hadoop sib piv

Raws li lub spark Apache website, Spark yuav "Khiav cov kev pab cuam mus txog 100x sai tshaj Hadoop Mapreduce nyob hauv nco, los yog 10 x sai rau disk. Spark muaj ib qho dhau daG tiav cov cav uas txhawb cov ntaub ntawv cycliclic txaus thiab in-memory computing."

kev ua kom muaj zog

Spark yuav muab tau streaming, SQL, thiab txoj ke. Nws yuav hwj huam ib pawg neeg uas muaj SQL, GraphX, MLlib rau kev kawm thiab DataFrames, thiab spark Streaming. Koj muab tau tag nrho cov tsev qiv ntawv seamlessly nyob rau hauv tib daim ntawv thov.

Spark yuav khiav zaws

Spark yuav khiav ntawm Mesos, pob zeb, Hadoop, ntxhua khaub ncaws. Nws kuj yuav mus saib tau cov ntaub ntawv los ntawm daim ntawv uas muaj Cassandra, HDFS, HBase, thiab S3.

Teeb meem loj uptake

Spark yuav raug muab cov chaw muab kev pab kom zoo dua. Kaj lug computing, uas muab software software rau deploying thiab tswj cov ntaub ntawv loj clusters thiab HPC thiab OpenStack nyob rau hauv cov ntaub ntawv chaw thiab nyob rau hauv cov huab, soj ntsuam xyuas, "txim tawv 1.2.0 Txiav txim rau hlis ntuj nqeg-hlis ntuj nqeg 2014. Dhau 1,000 xyaw tau ua los ntawm qhov 172 developers contributing no – uas yog ntau tshaj 3x pes tsawg tus developers uas contributed rau yav dhau los tso tawm, Spark 1.1.1." Tus Spark achievements pw qhov tseeb tias nws yuav cia cov zej zog tseem ntawm software developers mus rau hauv contributing.








Txoj kev

Thaum muaj ntau txoj kev vibes txog spark, Nws tseem yuav tsum tau deployed hla enterprises thiab siv tus neeg mob yuav tsum tau mus kuaj. Sib txuas lus, Cov nta thiab peevxwm yog impressive thiab nws cog kom xa ntau heev.

============================================= ============================================== Yuav zoo TechAlpine phau ntawv rau Amazon
============================================== ---------------------------------------------------------------- electrician ct chestnutelectric
error

Txaus siab rau qhov blog? Tshaj tawm lus thov :)

Follow by Email
LinkedIn
LinkedIn
Share