How to analyze your data using Apache Pig?

Overview:

Apache Pig is a platform and a part of the Big Data ecosystem. The platform is used to process large volumes of data in parallel. Pig works on top of Apache Hadoop and the MapReduce platform. As we know, MapReduce is the programming model used by Hadoop applications. Apache Pig provides an abstraction over the MapReduce model to make programming easier, with an SQL-like interface for developing MapReduce programs. Instead of writing MapReduce programs directly, developers write Pig scripts, and these run automatically in parallel in a distributed environment.

Introduction:

Apache Pig is a platform for analyzing large data sets. It consists of a high-level language for expressing data analysis programs, together with the infrastructure for evaluating those programs. The most important property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.

At present, Pig's infrastructure layer consists of a compiler that produces sequences of MapReduce programs, for which large-scale parallel implementations already exist in the Hadoop framework.
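To make "a sequence of MapReduce programs" concrete, here is a toy, single-process Python model. It is not Pig's compiler: run_job and word_count_plan are hypothetical helpers, and the shuffle is simulated with an in-memory dictionary. It expresses a word count as two chained jobs, the first counting words and the second sorting the counts, which is the kind of multi-pass dataflow a short Pig script hides from the developer.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Simulate one MapReduce job in memory: map, shuffle (group values by key), reduce."""
    shuffled = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            shuffled[key].append(value)
    output = []
    # A single reducer sees its keys in sorted order, as Hadoop's shuffle sorts them.
    for key in sorted(shuffled):
        output.extend(reducer(key, shuffled[key]))
    return output


def word_count_plan(lines):
    """Two chained jobs: job 1 counts words, job 2 sorts the counts in descending order."""
    # Job 1: emit (word, 1) per word; the reducer sums the ones for each word.
    counts = run_job(
        lines,
        mapper=lambda line: ((word, 1) for word in line.split()),
        reducer=lambda word, ones: [(word, sum(ones))],
    )
    # Job 2: re-key each (word, count) by -count so the shuffle's sort puts the
    # largest counts first; words sharing a count are emitted alphabetically.
    return run_job(
        counts,
        mapper=lambda pair: [(-pair[1], pair[0])],
        reducer=lambda neg_count, words: [(w, -neg_count) for w in sorted(words)],
    )


print(word_count_plan(["pig runs on hadoop", "hadoop runs mapreduce jobs"]))
# [('hadoop', 2), ('runs', 2), ('jobs', 1), ('mapreduce', 1), ('on', 1), ('pig', 1)]
```

The output of the first job becomes the input of the second, which is the essential point: the developer writes one dataflow, and it is executed as several MapReduce passes.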

Pig's language layer consists of a textual language called Pig Latin. It has the following key properties:

  • Ease of programming: It is trivial to achieve parallel execution of simple, "embarrassingly parallel" data analysis tasks. Complex tasks made up of multiple interrelated data transformations are explicitly encoded as data flow sequences, which makes them easy to write, understand, and maintain.
  • Optimization opportunities: The way tasks are encoded allows the system to optimize their execution automatically, so users can focus on semantics rather than efficiency.
  • Extensibility: Users can create their own functions to do special-purpose processing.
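As an illustration of the optimization point, the Python sketch below (a toy model, not Pig's optimizer) shows one classic rewrite a dataflow system can apply on its own: moving a filter ahead of a join. Because the predicate only reads a column that comes from one input, filtering first gives the same result while the join compares fewer pairs. The relation contents and the helper names join and filter_rel are made up for the example.

```python
def join(left, right, lkey, rkey):
    """Nested-loop inner join; returns (joined tuples, number of key comparisons)."""
    out, comparisons = [], 0
    for l in left:
        for r in right:
            comparisons += 1
            if l[lkey] == r[rkey]:
                out.append(l + r)
    return out, comparisons


def filter_rel(rel, pred):
    """Keep the tuples of a relation that satisfy pred (like Pig's FILTER ... BY)."""
    return [t for t in rel if pred(t)]


users = [(1, "alice", "US"), (2, "bob", "DE"), (3, "carol", "US")]
visits = [(1, "/home"), (2, "/docs"), (3, "/home"), (1, "/docs")]
is_us = lambda t: t[2] == "US"  # column 2 is the country in both users and joined tuples

# Plan as written: JOIN first, then FILTER.
joined, cost_written = join(users, visits, 0, 0)
as_written = filter_rel(joined, is_us)

# Rewritten plan: FILTER users first, then JOIN the smaller input.
pushed_down, cost_pushed = join(filter_rel(users, is_us), visits, 0, 0)

assert as_written == pushed_down  # same answer
print(cost_written, cost_pushed)  # 12 8
```

Because the user wrote only the dataflow and not the execution order, a system is free to pick the cheaper plan without changing the answer.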

Pig installation and execution:

Apache Pig can be downloaded from the official website, http://pig.apache.org. It usually comes as an archive file; we just need to extract the archive and set the environment variables. Pig can also be installed with the rpm package on Red Hat environments or with the deb package on Debian environments. Once the installation is complete, we start Pig in local mode with the following command:

Listing 1: Starting Pig in local mode

$ pig -x local

….

….

grunt>

Executing this gives us the Grunt shell, which lets us enter and run Pig statements interactively.

A sample Pig script that counts words is shown below:

Listing 2: Sample Pig script

input_lines = LOAD '/tmp/myLocalCopyOfMyWebsite' AS (line:chararray);

-- Extract the words from each line and put them into a Pig bag
-- datatype, then flatten the bag to get one word per row
words = FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

-- Keep only tokens made entirely of word characters (drops punctuation-only tokens)
filtered_words = FILTER words BY word MATCHES '\\w+';

-- Create a group for each word
word_groups = GROUP filtered_words BY word;

-- Count the entries in each group
word_count = FOREACH word_groups GENERATE COUNT(filtered_words) AS count, group AS word;

-- Order the records by count and store the result
ordered_word_count = ORDER word_count BY count DESC;
STORE ordered_word_count INTO '/tmp/numberOfWordsInMyWebsite';

The snippet above generates parallel executable tasks that can be distributed across the machines of a Hadoop cluster to count the words in a dataset such as "all the web pages on the internet".
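To see what each statement in Listing 2 does to the data, here is a small in-memory Python model of the same dataflow, one step per Pig relation. It only mirrors the semantics on a single machine and does not use Pig or Hadoop. TOKENIZE is approximated by splitting on whitespace (an assumption; Pig's TOKENIZE has its own delimiter set), and MATCHES is modeled as a full-string regular expression match restricted to ASCII word characters, to resemble Java's \w.

```python
import re
from collections import defaultdict


def word_count(lines):
    """In-memory model of Listing 2; returns (count, word) tuples, largest count first."""
    # words = FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
    # (TOKENIZE approximated by whitespace splitting; FLATTEN = one word per row)
    words = [word for line in lines for word in line.split()]

    # filtered_words = FILTER words BY word MATCHES '\\w+';
    # MATCHES must match the whole string, hence re.fullmatch.
    filtered_words = [w for w in words if re.fullmatch(r"\w+", w, flags=re.ASCII)]

    # word_groups = GROUP filtered_words BY word;  -> one (group, bag) per word
    word_groups = defaultdict(list)
    for w in filtered_words:
        word_groups[w].append(w)

    # word_count = FOREACH word_groups GENERATE COUNT(filtered_words) AS count, group AS word;
    word_count_rel = [(len(bag), group) for group, bag in word_groups.items()]

    # ordered_word_count = ORDER word_count BY count DESC;
    return sorted(word_count_rel, key=lambda t: t[0], reverse=True)


print(word_count(["the pig ate the data", "the end !!"]))
```

Python's sort keeps tied words in first-seen order; Pig's ORDER makes no such promise for ties, so only the count ordering should be relied on.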

Pig in MapReduce mode:

To use Pig in MapReduce mode, we should first make sure that Hadoop is up and running. This can be checked by executing the following command at the $ prompt:

Listing 3: Checking Hadoop availability

$ hadoop dfs -ls /
Found 3 items
drwxrwxrwx - Hawj txawm supergroup 0 2011-12-08 05:20 /tmp
drwxr-xr-x - Hawj txawm supergroup 0 2011-12-08 05:20 /user
drwxr-xr-x - mapred supergroup 0 2011-12-08 05:20 /var
$

This listing shows that Hadoop is up and running. Now that we have confirmed this, let us check Pig. To start, we open the Grunt shell as in Listing 1, but this time in MapReduce mode.

Listing 4: Testing Pig with Hadoop

$ pig -x mapreduce
2013-12-06 06:39:44,276 [main] INFO org.apache.pig.Main - Logging error messages to…
2013-12-06 06:39:44,601 [main] INFO org.apache.pig…. - Connecting to hadoop file system at: hdfs://0.0.0.0:8020
2013-12-06 06:39:44,988 [main] INFO org.apache.pig…. - Connecting to map-reduce job tracker at: 0.0.0.0:8021
grunt> cd hdfs:///
grunt> ls
hdfs://0.0.0.0/tmp <dir>
hdfs://0.0.0.0/user <dir>
hdfs://0.0.0.0/var <dir>
grunt>

We can now see the Hadoop file system from within Pig. Next, let us read some data into it from our local file system. To do this, we first copy the file from the local file system into HDFS using Pig.

Listing 5: Getting the test data

grunt> mkdir tomcatwebFol
grunt> cd tomcatwebFol
grunt> copyFromLocal /usr/share/apache-tomcat/webapps/MywebApp/WEB-INF/web.xml webXMLFile
grunt> ls
hdfs://0.0.0.0/tomcatwebFol/webXMLFile <r 1> 10,924

With the data now available in the Hadoop file system, let us try another script. For example, we can cat the file from within Pig to view its contents. To do this, we need to load webXMLFile from HDFS into a Pig relation.

Listing 6: Loading and parsing the file

grunt> webXMLFile = LOAD '/tomcatwebFol/webXMLFile' USING PigStorage('>') AS (context_param:chararray, param_name:chararray, param_value:chararray);
grunt> DUMP webXMLFile;
(RootDir, /usr/Oracle/AutoVueIntegrationSDK/FileSys/Repository/filesysRepository)
grunt>
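PigStorage('>') treats '>' as the field delimiter, so each line is cut at every '>' and the pieces are bound, in order, to the fields named in the AS clause. The Python sketch below mimics that splitting for illustration only; pig_storage_load is a hypothetical helper, and padding missing fields with None (standing in for Pig's null) while ignoring extra fields are assumptions of the sketch.

```python
def pig_storage_load(lines, delimiter, schema):
    """Split each line on `delimiter` and bind the pieces to the names in `schema`.

    Assumptions of this sketch: missing trailing fields become None (standing in
    for Pig's null) and any extra fields beyond the schema are ignored.
    """
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        rows.append(tuple(fields[i] if i < len(fields) else None
                          for i in range(len(schema))))
    return rows


schema = ("context_param", "param_name", "param_value")
lines = ["<context-param><param-name>RootDir</param-name>", "a>b"]
for row in pig_storage_load(lines, ">", schema):
    print(dict(zip(schema, row)))
```

Note how splitting XML on '>' leaves tag fragments such as 'RootDir</param-name' rather than clean values; a delimiter-based loader works best on flat, line-oriented data, and structured files like web.xml usually call for a real XML parser.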

Pig also provides the GROUP operator for grouping the tuples of a relation, which can likewise be run from the Grunt shell.
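GROUP produces one output tuple per distinct key: the first field, named group, holds the key, and the second is a bag containing every input tuple that had that key. The Python sketch below models that shape with lists; group_by and the sample relation are hypothetical, and the output order here simply follows first appearance, whereas Pig makes no ordering promise.

```python
def group_by(relation, key):
    """Model of Pig's GROUP ... BY: returns a list of (group, bag) tuples.

    Roughly what `grouped = GROUP params BY name;` would produce for a
    hypothetical relation `params` with a `name` field.
    """
    groups = {}
    for t in relation:
        groups.setdefault(key(t), []).append(t)  # dicts keep first-seen key order
    return [(k, bag) for k, bag in groups.items()]


params = [("RootDir", "/srv/repo"), ("LogDir", "/var/log/app"), ("RootDir", "/srv/backup")]
for group, bag in group_by(params, key=lambda t: t[0]):
    print(group, len(bag), bag)
```

Keeping the whole input tuples in the bag, rather than just the key, is what lets a following FOREACH compute aggregates such as COUNT over each group.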

Operators in Pig:

Apache Pig has a number of relational and diagnostic operators. The most important ones are listed below:

Operator    Type        Description
FILTER      Relational  Selects a set of tuples from a relation based on a condition.
FOREACH     Relational  Iterates over the tuples of a relation, generating a data transformation.
GROUP       Relational  Groups the data in one or more relations.
JOIN        Relational  Joins two or more relations (inner or outer join).
LOAD        Relational  Loads data from the file system.
ORDER       Relational  Sorts a relation based on one or more fields.
SPLIT       Relational  Partitions a relation into two or more relations.
STORE       Relational  Stores data in the file system.
DESCRIBE    Diagnostic  Returns the schema of a relation.
DUMP        Diagnostic  Dumps the contents of a relation to the screen.
EXPLAIN     Diagnostic  Displays the MapReduce execution plans.
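To make the relational operators in the table more tangible, the Python sketch below models FOREACH, FILTER, JOIN, ORDER and SPLIT as plain functions over lists of tuples, with the roughly corresponding Pig Latin statement in each comment. The relations emps and depts and all helper names are hypothetical, and the model ignores everything Pig adds on top (schemas, nulls, distribution across a cluster).

```python
def foreach(rel, fn):
    # B = FOREACH A GENERATE ...;  one output tuple per input tuple
    return [fn(t) for t in rel]


def filter_rel(rel, pred):
    # B = FILTER A BY <condition>;
    return [t for t in rel if pred(t)]


def join(left, right, lkey, rkey):
    # C = JOIN A BY x, B BY y;  (inner join: unmatched tuples are dropped)
    return [l + r for l in left for r in right if lkey(l) == rkey(r)]


def order(rel, key, desc=False):
    # B = ORDER A BY f [DESC];
    return sorted(rel, key=key, reverse=desc)


def split(rel, **conditions):
    # SPLIT A INTO X IF c1, Y IF c2;  a tuple lands in every output whose
    # condition it satisfies, and in none if it satisfies no condition.
    return {name: [t for t in rel if cond(t)] for name, cond in conditions.items()}


emps = [("ann", 10, 5000), ("raj", 20, 4000), ("li", 10, 6000)]   # (name, dept_id, salary)
depts = [(10, "eng"), (20, "ops")]                                 # (dept_id, dept_name)

joined = join(emps, depts, lkey=lambda e: e[1], rkey=lambda d: d[0])
well_paid = filter_rel(joined, lambda t: t[2] >= 5000)
names = foreach(well_paid, lambda t: (t[0], t[4]))
ranked = order(emps, key=lambda t: t[2], desc=True)
parts = split(emps, high=lambda t: t[2] >= 5000, low=lambda t: t[2] < 5000)

print(names)                                  # [('ann', 'eng'), ('li', 'eng')]
print([e[0] for e in ranked])                 # ['li', 'ann', 'raj']
print({k: len(v) for k, v in parts.items()})  # {'high': 2, 'low': 1}
```

Chaining the calls mirrors how a Pig script chains relations: each statement takes one or more relations and produces a new one.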

User-defined functions:

Although Pig is a powerful and useful scripting language with the operators described in this article, it becomes even more powerful with user-defined functions (UDFs). Pig scripts can use functions that we define for scenarios such as parsing input data, formatting output data, and even implementing operators. UDFs are most commonly written in Java, and they allow Pig to support custom processing.
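In Java, an evaluation UDF is written by extending Pig's org.apache.pig.EvalFunc class and implementing its exec(Tuple) method, which receives one input tuple and returns one value. The Python sketch below only models that contract and is not runnable inside Pig as written: ExtractDomain is a hypothetical UDF, a Python tuple stands in for Pig's Tuple, and None stands in for null. The REGISTER line and jar name in the comment are likewise hypothetical.

```python
from urllib.parse import urlparse


class ExtractDomain:
    """Hypothetical UDF modeled on the shape of Pig's EvalFunc.exec(Tuple).

    exec() receives one input tuple and returns one value; returning None
    (Pig's null) for missing input is a common UDF convention.
    """

    def exec(self, input_tuple):
        if not input_tuple or input_tuple[0] is None:
            return None
        return urlparse(input_tuple[0]).hostname


# Roughly what using it would look like from a Pig script (names are hypothetical):
#   REGISTER myudfs.jar;
#   domains = FOREACH visits GENERATE myudfs.ExtractDomain(url);
udf = ExtractDomain()
visits = [("http://pig.apache.org/docs/",), ("https://hadoop.apache.org",), (None,)]
print([udf.exec(t) for t in visits])  # ['pig.apache.org', 'hadoop.apache.org', None]
```

Because a UDF is called once per tuple inside a FOREACH, it is parallelized along with the rest of the script without any extra work from its author.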

Conclusion:

Let us conclude our discussion with the following points:

  • Apache Pig is a part of the Big Data ecosystem.
  • Apache Pig is a platform for analyzing large data sets, consisting of a high-level language for expressing data analysis programs.
  • Apache Pig can be downloaded from the official website, http://pig.apache.org, and installed easily.
  • It can be easily configured and tested on top of the Hadoop Distributed File System.

Hope you have enjoyed the article. Keep reading!

 
