What is Spring for Apache Hadoop?

Overview: Spring is one of the most widely used frameworks in enterprise application development. It has several components, such as Spring ORM and Spring JDBC, each supporting a different set of features. Spring for Apache Hadoop is the framework that supports building applications with Hadoop components such as HDFS, MapReduce and Hive. Spring provides APIs for working with all of these components, and it also supports integrating Hadoop with other Spring ecosystem projects for real-world application development. In this article we discuss how to use Spring for Apache Hadoop.

Introduction:
Apache Hadoop is an open source software framework used to store and process large data sets. Spring is also an open source framework, widely used in Java/J2EE applications. Spring's dependency injection (DI), or inversion of control (IoC), mechanism has become a popular alternative to the Enterprise JavaBeans (EJB) model. Spring is flexible enough to be plugged into almost any other development framework. Using this advantage, we can plug it into Apache Hadoop and get the benefits of both frameworks.

Getting started:
In this section we look at how to create a Hadoop MapReduce job using Spring. This involves the following steps –

  • Step 1 – Obtain the required dependencies using Maven – Maven reads its dependencies from the pom.xml file, so we add the following entries to our pom.xml. The entries are for Hadoop core and for Spring for Apache Hadoop.

Listing1: Sample configuration entries in the pom.xml file

[code]

<!-- Spring Data Apache Hadoop -->
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-hadoop</artifactId>
<version>1.0.0.RELEASE</version>
</dependency>
<!-- Apache Hadoop Core -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>1.0.3</version>
</dependency>

[/code]

  • Step 2 – Create the mapper component – A mapper component breaks the actual problem into smaller pieces, which are then easier to solve. We can write our own customized mapper by extending the Apache MapReduce Mapper class and overriding its map method. The Mapper class expects the following four type parameters –

For input: The following parameters describe the input key and value

  • KEYIN This parameter describes the type of the key provided as input to the mapper component.
  • VALUEIN This parameter describes the type of the value provided as input to the mapper component.

For output: The following parameters describe the output key and value

  • KEYOUT This parameter describes the type of the output key produced by the mapper component.
  • VALUEOUT This parameter describes the type of the output value produced by the mapper component.

Each of these parameter types must implement the Writable interface. In the example, our mapper reads the contents of a file one line at a time and prepares key-value pairs for every line. Our implementation of the map method performs the following tasks –

  • First, split each line into words.
  • Second, iterate through every word and strip out all Unicode characters that are neither letters nor numbers.
  • Third, construct a key-value pair and emit it with the write method of the Context class, matching the expected output key-value types.

Listing2: Sample customized Mapper class

[code]

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyWordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
// Reused output key, so that no new Text object is allocated per word
private Text myword = new Text();

@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
StringTokenizer lineTokenz = new StringTokenizer(line);
while (lineTokenz.hasMoreTokens()) {
String cleaned_data = removeNonLettersNonNumbers(lineTokenz.nextToken());
myword.set(cleaned_data);
// Emit the pair (word, 1)
context.write(myword, new IntWritable(1));
}
}

/**
* Replaces all Unicode characters that are neither letters nor numbers with an empty string.
* @param original the original string
* @return a string that contains only letters and numbers
*/
private String removeNonLettersNonNumbers(String original) {
return original.replaceAll("[^\\p{L}\\p{N}]", "");
}
}

[/code]
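To make the map step easier to follow without a cluster, here is a minimal plain-Java sketch of the same split-and-clean logic. The class name `MapStepSketch` and the method `mapLine` are hypothetical names introduced only for illustration; no Hadoop classes are involved, and each returned key stands for a pair (key, 1).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Plain-Java stand-in for the map step (hypothetical helper, not part of Hadoop).
class MapStepSketch {

    // Same regular expression as the mapper: drop every character that is
    // neither a Unicode letter nor a Unicode number.
    static String removeNonLettersNonNumbers(String original) {
        return original.replaceAll("[^\\p{L}\\p{N}]", "");
    }

    // Returns the keys the mapper would emit for one line of input;
    // each key is implicitly paired with the value 1.
    static List<String> mapLine(String line) {
        List<String> emittedKeys = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            emittedKeys.add(removeNonLettersNonNumbers(tokens.nextToken()));
        }
        return emittedKeys;
    }

    public static void main(String[] args) {
        for (String key : mapLine("Hadoop, Spring and Hadoop!")) {
            System.out.println(key + "\t1");
        }
    }
}
```

Note that a token consisting only of punctuation, such as "--", is cleaned to an empty string and is still emitted as a key, which is also how the mapper listing behaves.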

  • Step 3 – Create the reducer component – A reducer is a component that discards the unwanted intermediate values and forwards only the key-value pairs that are relevant. To write our customized reducer, our class must extend the Reducer class and override the reduce method. The Reducer class expects the following four type parameters.

For input: The following parameters describe the input key and value

  • KEYIN This parameter describes the type of the key provided as input to the reducer component.
  • VALUEIN This parameter describes the type of the value provided as input to the reducer component.

For output: The following parameters describe the output key and value

  • KEYOUT This parameter describes the type of the output key produced by the reducer component.
  • VALUEOUT This parameter describes the type of the output value produced by the reducer component.

In this implementation, the data types of the 'keyin' and 'keyout' parameters must be the same, and the 'valuein' and 'valueout' parameters must also be of the same type. Our implementation of the reduce method performs the following steps –

  • First, check whether the input key is the desired word.
  • Second, if it is, add up the number of occurrences of the word.
  • Third, construct a new key-value pair and emit it by calling the write method of the reducer's Context.

Listing3: Sample customized Reducer class

[code]

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyWordReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
protected static final String MY_TARGET_TEXT = "Hadoop";

@Override
protected void reduce(Text keyTxt, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
if (containsTargetWord(keyTxt)) {
int wCount = 0;
// Sum the counts emitted by the mappers for this key
for (IntWritable value : values) {
wCount += value.get();
}
context.write(keyTxt, new IntWritable(wCount));
}
}

// Exact, case-sensitive match against the target word
private boolean containsTargetWord(Text keyTxt) {
return keyTxt.toString().equals(MY_TARGET_TEXT);
}
}

[/code]
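The filter-and-sum behaviour of the reduce step can be checked in isolation with a small plain-Java sketch. The class `ReduceStepSketch` is a hypothetical name used only for illustration; it takes a key and the list of counts grouped under that key, and returns an empty `OptionalInt` where the reducer would emit nothing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

// Plain-Java stand-in for the reduce step (hypothetical helper, not part of Hadoop).
class ReduceStepSketch {
    static final String MY_TARGET_TEXT = "Hadoop";

    // Sums the counts for the target word; returns an empty OptionalInt
    // for any other key, mirroring a reducer that writes no output for it.
    static OptionalInt reduce(String key, List<Integer> values) {
        if (!key.equals(MY_TARGET_TEXT)) {
            return OptionalInt.empty();
        }
        int wCount = 0;
        for (int value : values) {
            wCount += value;
        }
        return OptionalInt.of(wCount);
    }

    public static void main(String[] args) {
        OptionalInt result = reduce("Hadoop", Arrays.asList(1, 1, 1, 1));
        if (result.isPresent()) {
            System.out.println(MY_TARGET_TEXT + " " + result.getAsInt());
        }
    }
}
```

Because the comparison uses `equals`, the match is case-sensitive: a key such as "hadoop" would be dropped.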

  • Step 4 – Create the application context – The next step is to create the application context using XML. We can configure the application context of our application with the following steps –
    • Create a properties file that contains the values of the configuration properties. A sample application properties file is shown below –

[code]
fs.default.name=hdfs://localhost:9000
mapred.job.tracker=localhost:9001
input.path=/path/to/input/file/
output.path=/path/to/output/file
[/code]

  • Configure a property placeholder, which is used to fetch the values of the configuration properties from the properties file we created. This can be done by adding the following to our application context XML file –

[code]
<context:property-placeholder location="classpath:application.properties" />

[/code]

  • Configure Apache Hadoop and its job – We can configure the default file system and the job tracker by adding the following to our application context file

[code]

<hdp:configuration>
fs.default.name=${fs.default.name}
mapred.job.tracker=${mapred.job.tracker}
</hdp:configuration>

[/code]

We should also add the following to our application context XML file to define the job –

[code]
<hdp:job id="wordCountJobId"
input-path="${input.path}"
output-path="${output.path}"
jar-by-class="net.qs.spring.data.apachehadoop.Main"
mapper="net.qs.spring.data.apachehadoop.MyWordMapper"
reducer="net.qs.spring.data.apachehadoop.MyWordReducer"/>

[/code]

  • Configure the job runner, which runs the Hadoop job we created. The job runner can be configured by adding the following to our application context XML file

[code]
<hdp:job-runner id="wordCountJobRunner" job-ref="wordCountJobId" run-at-startup="true"/>
[/code]
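The `${...}` placeholders in the configuration snippets above are filled in from the properties file. The following is a minimal sketch of that substitution idea in plain Java, assuming a simple "look up the name, paste the value" rule. It is not Spring's implementation, and the class `PlaceholderSketch` with its `load` and `resolve` methods is hypothetical, introduced only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative placeholder substitution (hypothetical helper, not Spring's code).
class PlaceholderSketch {
    // Matches ${name} and captures the name
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Parses properties-file content given as a string
    static Properties load(String content) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(content));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props;
    }

    // Replaces every ${name} in the text with the matching property value
    static String resolve(String text, Properties props) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String value = props.getProperty(name);
            if (value == null) {
                throw new IllegalArgumentException("Unknown property: " + name);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties props = load("fs.default.name=hdfs://localhost:9000\n"
                + "input.path=/path/to/input/file/\n");
        System.out.println(resolve("fs.default.name=${fs.default.name}", props));
    }
}
```

In this sketch an unknown placeholder name fails fast with an exception, so a typo in a property name shows up immediately rather than as a literal `${...}` in the configuration.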

  • Step 5 – Load the application context at startup – We can now run the Hadoop job we created by loading the application context when the application starts. We do this by creating an instance of ClassPathXmlApplicationContext, which accepts the name of our application context file as a constructor argument. This can be done as follows –

Listing4: Sample showing how the application context is loaded

[code]
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class Main {
public static void main(String[] arguments) {
// Loading the context starts the job runner, which is configured with run-at-startup="true"
ApplicationContext ctx = new ClassPathXmlApplicationContext("applicationContext.xml");
}
}

[/code]

  • Step 6 – Run the MapReduce job – We can start our MapReduce job with the following steps –
  • Upload an input file into HDFS – We can do this by executing the following command at the command prompt –

[code]

hadoop dfs -put sample.txt /input/sample.txt

[/code]

The following is the sample input file used in this example. The target keyword 'Hadoop' is highlighted in green. The word 'Hadoop' occurs 4 times in the sample.

Image1: Sample input file

  • Check that the file has been uploaded successfully by running the following command. It lists the input file.

[code]

hadoop dfs -ls /input

[/code]

  • Run the MapReduce job. This can be done by executing the main method of our Java class from the IDE. If everything works as expected, the output is as follows.

Output: Hadoop 4
Summary: Let us conclude what we have discussed so far with the following points –

  • Both Spring and Hadoop are useful frameworks from the open source community.
  • By combining them we can get the benefits of both frameworks.
  • Creating a MapReduce job using Spring is a six-step process, as explained above.