What is Spring for Apache Hadoop?

Overview: Spring is one of the most widely used frameworks in enterprise application development. Spring has different components such as Spring ORM, Spring JDBC etc. to support different features. Spring for Apache Hadoop is the framework that supports application building with Hadoop components like HDFS, MapReduce, Hive etc. Spring provides APIs to work with all these components. Spring also supports integration of Hadoop with other Spring ecosystem projects for real-life application development. In this article we will discuss the usage of Spring for Apache Hadoop frameworks.

Introduction:
Apache Hadoop is an open source software framework used to store and process large-volume data sets. Spring is also an open source framework, widely used in Java/J2EE applications. Spring’s dependency injection (DI) or inversion of control (IoC) mechanism has become a popular alternative to the Enterprise Java Beans (EJB) model. Spring has the advantage of being flexible enough to be easily plugged into any other development framework. Using this advantage of Spring, we can plug it in with Apache Hadoop to get the maximum benefit of each of these two frameworks.
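To make the DI idea concrete before moving on to Hadoop, here is a minimal, self-contained sketch of Spring wiring a bean through Java-based configuration. The names GreetingService, AppConfig and DiDemo are purely illustrative and are not part of Spring for Apache Hadoop; the sketch only assumes the Spring core container is on the classpath.

[Code]

import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Illustrative service interface; callers never instantiate the implementation themselves.
interface GreetingService {
    String greet(String name);
}

// Java-based configuration: the container decides which implementation backs GreetingService.
@Configuration
class AppConfig {
    @Bean
    public GreetingService greetingService() {
        return new GreetingService() {
            public String greet(String name) {
                return "Hello, " + name;
            }
        };
    }
}

public class DiDemo {
    public static void main(String[] args) {
        // The ApplicationContext creates and wires the beans (inversion of control).
        AnnotationConfigApplicationContext ctx =
                new AnnotationConfigApplicationContext(AppConfig.class);
        GreetingService service = ctx.getBean(GreetingService.class);
        System.out.println(service.greet("Hadoop"));
        ctx.close();
    }
}

[/Code]

The same principle applies throughout this article: the Hadoop job, its mapper and its reducer are declared as beans in the application context, and the container wires and runs them for us.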

Getting Started:
In this section we will talk about how to create a Hadoop MapReduce Job using Spring. This involves the following steps –

  • Step 1 – Obtain the required dependencies using Maven – Since Maven is driven by the pom.xml file, we make the following entries in our pom.xml file. These dependency entries are for Hadoop core and the Spring framework.

Listing1: Sample configuration entries in pom.xml file

[Code]

<!-- Spring Data Apache Hadoop -->
<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-hadoop</artifactId>
    <version>1.0.0.RELEASE</version>
</dependency>
<!-- Apache Hadoop Core -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.0.3</version>
</dependency>

[/Code]

  • Step 2 – Create the mapper component – A mapper component is used to break the actual problem into smaller parts, which then become easier to solve. We can have our own customized mapper component by extending the Apache MapReduce Mapper class and overriding its map method. The Mapper class expects the following four type parameters –

For input: The following parameters are for the input key and value

  • KEYIN This parameter describes the key type which is provided as an input to the mapper component.
  • VALUEIN This parameter describes the type of the value which is provided as an input to the mapper component.

For output: The following parameters are for the output key and value

  • KEYOUT This parameter describes the type of the output key produced by the mapper component.
  • VALUEOUT This parameter describes the type of the output value produced by the mapper component.

Each of these parameters must implement the Writable interface. In the given example, our mapper reads the file content one line at a time and prepares key-value pairs for each line. Our implementation of the map method performs the following tasks –

  • First, split each line into individual words
  • Second, iterate through every word and remove any Unicode characters which are neither letters nor numbers.
  • Third, construct a key-value pair using the write method of the Context class, matching the expected output key-value pair.

Listing2: Sample customized Mapper class

[Code]

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyWordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private Text myword = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer lineTokenz = new StringTokenizer(line);
        while (lineTokenz.hasMoreTokens()) {
            String cleaned_data = removeNonLettersNonNumbers(lineTokenz.nextToken());
            myword.set(cleaned_data);
            context.write(myword, new IntWritable(1));
        }
    }

    /**
     * Replace all Unicode characters that are neither numbers nor letters with an empty string.
     * @param original the original string
     * @return a string object which contains only letters and numbers
     */
    private String removeNonLettersNonNumbers(String original) {
        return original.replaceAll("[^\\p{L}\\p{N}]", "");
    }
}

[/Code]

  • Step 3 – Create the reducer component – A reducer is a component which deletes the unwanted intermediate values and forwards only those key-value pairs which are relevant. To have our customized reducer, our class should extend the Reducer class and override the reduce method. The Reducer class expects the following four type parameters.

For input: The following parameters are for the input key and value

  • KEYIN This parameter describes the key type which is provided as an input to the reducer component.
  • VALUEIN This parameter describes the type of the value which is provided as an input to the reducer component.

For output: The following parameters are for the output key and value

  • KEYOUT This parameter describes the type of the output key produced by the reducer component.
  • VALUEOUT This parameter describes the type of the output value produced by the reducer component.

While implementing, we must make sure that the datatypes of the ‘keyin’ and ‘keyout’ parameters are the same. Likewise, the ‘valuein’ and ‘valueout’ parameters should be of the same type. Our implementation of the reduce method performs the following steps –

  • First, check that the input key contains the desired word.
  • Second, if the above step is true, get the number of occurrences of the word.
  • Third, construct a new key-value pair by calling the write method of the context.

Listing3: Sample customized Reducer class

[Code]

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyWordReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    protected static final String MY_TARGET_TEXT = "Hadoop";

    @Override
    protected void reduce(Text keyTxt, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        if (containsTargetWord(keyTxt)) {
            int wCount = 0;
            for (IntWritable value : values) {
                wCount += value.get();
            }
            context.write(keyTxt, new IntWritable(wCount));
        }
    }

    private boolean containsTargetWord(Text keyTxt) {
        return keyTxt.toString().equals(MY_TARGET_TEXT);
    }
}

[/Code]

  • Step 4 – Create the application context – The next step is to create the application context using XML. We can configure the application context of our application using the following steps –
    • Create a properties file which contains the values of the configuration properties. A sample application properties file is shown below –

[Code]
fs.default.name = hdfs://localhost:9000
mapred.job.tracker = localhost:9001
input.path = /path/to/input/file/
output.path = /path/to/output/file
[/Code]

  • Configure a property placeholder which is used to fetch the values of the configuration properties from the created properties file. This can be done by adding the following in our application context XML file (this assumes the Spring context namespace is declared on the root element of the file) –

[Code]
<context:property-placeholder location="classpath:application.properties" />

[/Code]

  • Configure Apache Hadoop and its job – We can configure the default file system and the job tracker by adding the following in our application context file –

[Code]

<hdp:configuration>
fs.default.name = ${fs.default.name}
mapred.job.tracker = ${mapred.job.tracker}
</hdp:configuration>

[/Code]

We should add the following in our application context XML file to define the Hadoop job –

[Code]
<hdp:job id="wordCountJobId"
    input-path="${input.path}"
    output-path="${output.path}"
    jar-by-class="net.qs.spring.data.apachehadoop.Main"
    mapper="net.qs.spring.data.apachehadoop.MyWordMapper"
    reducer="net.qs.spring.data.apachehadoop.MyWordReducer"/>

[/Code]

  • Configure the job runner which runs the created Hadoop job. The job runner can be configured by adding the following in our application context XML file –

[Code]
<hdp:job-runner id="wordCountJobRunner" job-ref="wordCountJobId" run-at-startup="true"/>
[/Code]

  • Step 5 – Loading the application context at startup – We can now execute the created Hadoop job by loading the application context when the application starts up. We can do this by creating an instance of ClassPathXmlApplicationContext, which accepts the name of our application context file as an input parameter to the constructor. This can be done as shown below –

Listing4: Sample showing loading of application context

[Code]
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class Main {
    public static void main(String[] arguments) {
        ApplicationContext ctx = new ClassPathXmlApplicationContext("applicationContext.xml");
    }
}

[/Code]
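As an optional, illustrative variant of Listing 4 (not part of the original listing), the context can also be closed after start-up; because the job runner configured in Step 4 uses run-at-startup="true", the job is triggered while the context initializes.

[Code]
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class Main {
    public static void main(String[] arguments) {
        // Building the context triggers wordCountJobRunner (run-at-startup="true").
        ClassPathXmlApplicationContext ctx =
                new ClassPathXmlApplicationContext("applicationContext.xml");
        // Close the context once start-up (and hence job execution) has finished.
        ctx.close();
    }
}
[/Code]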

  • Step 6 – Run the MapReduce job – We can start our MapReduce job using the following steps –
  • Upload the input file into HDFS – We can do this by executing the following command at the command prompt –

[Code]

hadoop dfs -put sample.txt /input/sample.txt

[/Code]

Following is a sample input file which has been used in this example. The target key word ‘Hadoop’ is highlighted in GREEN. The word ‘Hadoop’ occurs 4 times in the sample.

Image1: Sample input file

  • Check whether the file has been uploaded successfully by running the following command. It will display the input file.

[Code]

hadoop dfs -ls /input

[/Code]

  • Run the MapReduce job. This can be done by executing the main method of our Java class from the IDE. If all the steps work as expected then the following will be the output.

Output: Hadoop 4
Summary: Let us conclude what we have discussed so far in the following bullets –

  • Both Spring and Hadoop are useful frameworks from the open source community.
  • By combining these we can get the benefit of both the frameworks.
  • Creating a MapReduce job using Spring is a six-step process as explained above.