Apache Shark yog dab tsi?

Txheej txheem cej luam: Apache shark yog muaj lus nug distributed cav tsim los ntawm lub zej zog qhib tau qhov twg los. Qhov lus nug cav mas siv rau cov ntaub ntawv Hadoop. Nws kev kawm enhanced thiab high-end analytical tau rau cov neeg siv tshees.

Nyob rau hauv daim ntawv no, Kuv yuav tham txog Apache shark thiab nws cov nta hauv lus.

Taw qhia: Apache shark yog lub chaw warehouse raws li lawv siv nrog Apache txim. Qhov no yog tsim kom tau tshaj Apache Hive. Shark muaj peev xwm mus coj HIVE QL queries 100 lub sij hawm sai dua nas muv tsis ua rau muaj kev hloov hauv cov queries uas twb muaj lawm. Shark txhawb ntau tus nas muv nta li lus nug lus, metastore, serialization tawm tswv yim, thiab tso cai sau tseg neeg siv. Li no nws ua rau qhov kev koom ua ke ntawm ib deployments tshees yooj yim.

Tseem ceeb nta Apache Shark:

Apache shark tawm nrog tus tseem ceeb nta hauv qab no –

  • Xav tso cav – Apache Shark yog ua nyob saum Apache txim uas yog muaj ntaub ntawv rau thaum uas tig mus tso cav. Txawm tias cov ntaub ntawv no rau tus disk, vim yog cov xav tso cav, yog shark kuj ceev tshaj nws cov competitors. Shark txhob cov nyiaj siv ua haujlwm ntawm daim ntawv qhia kom Hadoop. Nrog nws sai xyaw, shark yuav teb rau txoj kev queries nyob rau hauv ncua ob latency.
  • Kem nco paub qab hau cia – Cov ntaub ntawv soj mechanism tsom rua ib pawg me ntawm cov ntaub ntawv e.g. nws yuav ua tau raws li sijhawm los yog ib tug tas raws li. Yog li peb xav kov xwb me txheej dimension ntxhuav los yog ib ib feem ntawm cov Disease fact ntxhuav. Cov queries txim tuag nkaus xwb li temporal tas. Qhov no kom haum cov teeb ua hauj lwm rau hauv pawg lub cim xeeb enables.

Raws li tus neeg siv, peb yuav siv no temporal tas los ntawm storing cov ua hauj lwm txheej cov ntaub ntawv tsis pub dhau ntawm pawg nco, los yog hauv ib database yog muaj nyob rau hauv-nco materialized views. Hom kheev siv cov ntaub ntawv kuj yuav cached rau hom columnar e.g. txheej thaum ub arrays, uas yog npaum heev rau tej ntaub ntawv storing thiab khib nyiab. Qhov no muaj kev kawm ntau, raws li cov ntaub ntawv yog fetched los ntawm cov ntxhuav thiab tsis los ntawm cov disk.

Teeb thiab coj Senior:

Pov- Ua ntej teem li shark koj lub computer ua kom koj muaj li nram no ntsia tau rau koj mob –

Lub binary tis ntawm Shark muab downloaded ntawm cov nom lub website ntawm github amplab. Lub hnab binary muaj ob folders-

  • shark-0.8.0
  • nas muv-0.9.0-shark-0.8.0-rau hauv

Peb yuav tsum teem lub tsaam thawj qhob nram qab no thiaj li ua lub teeb –

  • JAVA_HOME
  • HIVE_HOME
  • SCALA_HOME

Shark los nrog ib tug template env ntaub ntawv – shark-env.sh.template. Ua tau ib daim qauv ntawm no template ntaub ntawv rau qhov chaw nyob – shark-0.8.0/conf. Lub npe ntawm cov ntaub ntawv env yuav tsum shark-env.sh. Tom qab cov tsiaj ntawv ib puas ncig nws zoo, peb yuav tau los ua lub neej ntawd nas MUV warehouse directory. Qhov no yog qhov chaw nyob qhov twg, Nas MUV stores cov ntaub ntawv cov lus rau cov ntxhuav haiv. Thaum tsim no directory nco ntsoov tias tus tswv ntawm cov Wage yog ib yam uas ua rau shark teeb.

Tam sim no peb yuav npaj nrog peb shark teeb. Khiav rau nram qab no hais kom ua-

Qhia 1 – Pib nce Shark hais kom ua kab interface

./rau hauv/shark

Yuav kom paub tseeb tias shark yog li thiab running, peb khiav cov piv txwv hauv qab no uas tau tsim ib cov lus uas muaj cov qauv ntaub ntawv –

Qhia 2 – Qauv chaws los tsim tau ib cov lus yooj yim thiab thauj ib co ntaub ntawv

SAU COV LUS SOURCE_MAP (qhov tseem ceeb rau cov menyuam, tus nqi hlua);

LOAD COV NTAUB NTAWV HAUV ZOS INPATH ' ${env:HIVE_HOME}/examples/files/kv1.txt’ RAU COV LUS

SOURCE_MAP;

XAIV SUAV(1) NTAWM src;

SAU cov lus SOURCE_MAP_cached AS xaiv * NTAWM SRC;

XAIV SUAV(1) NTAWM SOURCE_MAP_cached;

Ntxiv rau qhov lub hais kom ua shark saum toj no, peb muaj ob peb lwm executables raws li hais hauv qab no –

  • rau hauv/shark-withdebug – qhov no sau shark hais kom ua kab interface nrog debug qib cav luam tawm lub console.
  • rau hauv/shark-withinfo – Qhov no sau shark hais kom ua kab interface nrog pab ntxiv yog theem cav luam tawm lub console.

Raws li cov kauj ruam saum toj noj mentioned peb yuav setup shark ntawm ib zaug xwb ntawm. Thiaj li khiav shark rau sawv, cia peb ua raws li cov kauj ruam.

Pov- Ua ntej teem li shark koj lub computer ua kom koj muaj li nram no ntsia tau rau koj mob –

Tsis zoo li lub versions shark thiab cov txim ua ntej lawm, qhov tseeb version tsis tau Apache Mesos lawm.

Ua ntej peb hloov tej yam hauv lub txim nyob –

  • Npaj cov qhev nkag teb chaws – Cov ntaub ntawv txim qhev – txim-0.8.0/conf/qhev yuav tsum muab hloov ntxiv tswv tsev lub npe ntawm txhua tus qhev. Nws yuav ua tau ib kab nkag teb chaws ib tug qhev.
  • Thov txim yuav env – Cov ntaub ntawv txim env – txim-0.8.0/conf/txim-env.sh xav tau tus nkag hauv qab no –
    • SCALA_HOME -raws li tau piav los saum no.
    • SPARK_WORKER_MEMORY – Qhov no yog nyiaj ntau cim xeeb twg ua txim tau siv rau txhua tus ntawm xwb. Thaum teem cov parameter no peb yuav tsum ceev faj thiab nco ntsoov cia li 1 Cim xeeb GB OS txhua yam kom zoo rau cov.

Tam sim no peb cia ua lub chaw hais txog cov kev hloov hauv hav – shark

Raws li hais saum toj no, download tau lub binary tis ntawm Shark ntawm cov nom lub website ntawm github amplab. Lub hnab binary muaj ob folders-

  • shark-0.8.0
  • nas muv-0.9.0-shark-0.8.0-rau hauv

Tam sim no qhib lub hark-env.sh tej ntaub ntawv thiab hloov cov khoom nram qab no as per peb ib puag ncig-

  • JAVA_HOME
  • HIVE_HOME
  • SCALA_HOME
  • NPAJ tej qhov tseem ceeb.

Qhov URL npaj yuav raws nraim phim nrog cov txim:// PAS lum ntawm chaw nres nkoj 8080 ntawm qhov standalone tswv. Shark-env.sh ntawv yuav tsum tau saib kom muaj-

Qhia 3 – Env shark ntaub ntawv ntawm teeb clustered – incase

HADOOP_HOME = / kev/rau/hadoop

HIVE_HOME = / kev/rau/nas muv

TSWV =<Tswv URI>

SPARK_HOME = / kev/rau/xwb

PARK_MEM = j 16

tau qhov twg los $SPARK_HOME/conf/spark-env.sh

Tus kab lub xeem ntxiv no yog kom tsis txhob duplicate nkag rau SPARK_HOME. Thaum muab cov tsis nco ntsoov export lawv siv tau hais kom ua tus txheem export ntawm unix. Nws yuav tsum tau muab sau tias tus nqi nco lum hauv parameter- SPARK_MEM yuav tsum tsis tau ntau tshaj li qhov SPARK_WORKER_MEMORY hais los saum no. Yog koj xav siv shark nrog tus uas twb muaj lawm nas MUV teeb, Nco ntsoov muab cov HIVE_CONF_DIR parameter hauv lub shark-env.sh cov ntaub ntawv.

Kauj ruam tom ntej yog luam rau txim thiab shark Wage los tus ua qhev. Ib zaug ua li cas peb yuav pib tus sawv los executing tus hais kom ua-

Qhia 4 -Tso lub txim sawv –

/rau hauv/pib-all.sh

Lus lus nug Shark:

Shark tau nws tus kheej subset txog SQL uas yog heev npaum ze ntawm qhov lus nug lus ntawv nas MUV. E.g. los ua ib cached cov lus los ntawm cov natwm ntawm ib cov lus uas twb muaj lawm, Peb xav kom koj muab cov shark.cache rooj cuab yeej raws li qhia hauv qab no –

Qhia 5 – Tus qauv shark lus nug

UA TAU IB COV LUS … TBLPROPERTIES (“shark.cache” = “tseeb”) RAWS LI KOJ XAIV …

Peb kuj yuav cuag HiveQL muaj ib shortcut kev no syntax. Cias peb tau append _cached' rau lub npe lus sau lus AS xaiv thaum, thiab lub rooj yuav tsum cached nyob hauv lub cim xeeb.

Txoj kev:

Peb cia xaus peb sib tham nyob rau hauv daim ntawv uas cia nyias qhov hauv qab no –

  • Apache shark yog ib cov lus nug distributed cav tsim los ntawm lub zej zog qhib tau qhov twg los
  • Apache shark yog ib cov ntaub ntawv warehouse raws li uas yuav muab coj los rau txim Apache.
  • Apache shark yog tshaj HIVE QL thiab yuav muab tau yooj yim kev nrog nas MUV.
  • Nws tau khiav rau ntawm hom standalone thiab hom clustered.
Tagged:
============================================= ============================================== Yuav zoo TechAlpine phau ntawv rau Amazon
============================================== ---------------------------------------------------------------- electrician ct chestnutelectric
error

Txaus siab rau qhov blog? Tshaj tawm lus thov :)

Follow by Email
LinkedIn
LinkedIn
Share