Learn to Write Map-Reduce Programs Using the Hadoop Framework and Java

MapReduce

Overview: Big data is a new paradigm, and processing it is the most important area to focus on. Big data refers to the large volume of data that comes from different sources such as social media, sensor data, mobile data, internet data and much more. In this article we will concentrate on the Hadoop framework and map-reduce programming. MapReduce can be defined as a special programming framework used to process large volumes of data on a distributed cluster of commodity hardware. The article will explain the map-reduce concepts and how to implement them using Java code.

Introduction: Hadoop MapReduce can be defined as a software programming framework used to process large volumes of data (on the terabyte scale) in parallel across clustered nodes. The cluster consists of thousands of nodes of commodity hardware. The processing is distributed, reliable and fault tolerant. A typical MapReduce job is performed according to the following steps:

1) Split the input data into independent chunks of key-value pairs. This is done by the Map tasks, which run in parallel.

2) The output of the Map tasks is sorted based on the key values.

3) The sorted output is the input to the Reduce tasks, which perform the final processing and return the result to the client. A small worked example of this flow follows the note below.

Note: Commodity hardware refers to inexpensive, widely available computer systems.
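To make the three steps concrete, here is a minimal, Hadoop-free sketch of the same split/sort/reduce flow, written with plain Java collections around the classic word-count problem. The class and variable names are illustrative only and are not part of any Hadoop API.

[code]
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** A minimal, Hadoop-free illustration of the map -> sort/shuffle -> reduce flow. */
public class WordCountSketch {
    public static void main(String[] args) {
        // Step 0: the input, already split into independent chunks
        List<String> chunks = Arrays.asList("big data", "big cluster");

        // Step 1 (Map): emit a (word, 1) pair for every word in every chunk
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String chunk : chunks) {
            for (String word : chunk.split(" ")) {
                mapped.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }

        // Step 2 (Sort/shuffle): group the emitted values by key, sorted by key
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapped) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }

        // Step 3 (Reduce): aggregate each key's values into the final output
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int v : entry.getValue()) {
                sum += v;
            }
            System.out.println(entry.getKey() + "\t" + sum); // prints: big 2, cluster 1, data 1
        }
    }
}
[/code]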

MapReduce framework: The Apache Hadoop MapReduce framework is written in Java. The framework has a master-slave configuration. The master is known as the JobTracker and the slaves are known as TaskTrackers. The master schedules and monitors the tasks completed by the slaves (which are nothing but the nodes in a cluster). The actual computation is done by the slaves. So the compute and storage nodes are the same in a clustered environment. The concept is to 'move the computation to the nodes where the data is stored', and this makes the processing faster.

How MapReduce works: The MapReduce framework model is highly parallel, so the hardware cost is low compared to other models. But at the same time we should understand that the model works efficiently only in a distributed environment, as the processing is done on the nodes where the data resides. Other features like scalability, reliability and fault tolerance also depend on the distributed environment.

MapReduce implementation: Now we will discuss how to implement the MapReduce model on the Java programming platform. The following are the different components of the complete end-to-end implementation.

  • The client program, which is the driver class that initiates the process
  • The Map function, which performs the split using key-value pairs.
  • The Reduce function, which aggregates the processed data and sends the output back to the client.

Driver class: The following is a driver class that binds the Map and Reduce functions together and starts the processing. This is the client program that initiates the process.

Listing 1: The client program (driver class) component

[code]
package com.mapreduce.techalpine;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

/**
 * @author kaushik pal
 *
 * This is the main driver class that initiates the map-reduce
 * process. It sets up the background for the process and
 * then starts it.
 */
public class DevXDriver {

    public static void main(String[] args) throws Exception {

        // Initiate the configuration
        Configuration configx = new Configuration();

        // Add configuration resource files
        configx.addResource(new Path("/user/hadoop/core-site.xml"));
        configx.addResource(new Path("/user/hadoop/hdfs-site.xml"));

        // Create a MapReduce job
        Job devxmapjob = new Job(configx, "DevXDriver.class");
        devxmapjob.setJarByClass(DevXDriver.class);
        devxmapjob.setJobName("DevX MapReduce Job");

        // Set the output key and value classes
        devxmapjob.setOutputKeyClass(Text.class);
        devxmapjob.setOutputValueClass(Text.class);

        // Set the Map class
        devxmapjob.setMapperClass(DevXMap.class);

        // Set the Combiner class
        devxmapjob.setCombinerClass(DevXReducer.class);

        // Set the Reducer class
        devxmapjob.setReducerClass(DevXReducer.class);

        // Set the map output key and value classes
        devxmapjob.setMapOutputKeyClass(Text.class);
        devxmapjob.setMapOutputValueClass(Text.class);

        // Set the number of reducer tasks
        devxmapjob.setNumReduceTasks(10);

        // Set the input and output format classes
        devxmapjob.setInputFormatClass(TextInputFormat.class);
        devxmapjob.setOutputFormatClass(TextOutputFormat.class);

        // Set the input and output paths
        FileInputFormat.addInputPath(devxmapjob, new Path("/user/map_reduce/input/"));
        FileOutputFormat.setOutputPath(devxmapjob, new Path("/user/map_reduce/output"));

        // Start the MapReduce job
        devxmapjob.waitForCompletion(true);
    }
}
[/code]
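Once DevXDriver is compiled together with the DevXMap and DevXReducer classes shown in the next listings and packaged into a jar, the job is typically submitted from the command line with the standard `hadoop jar` command, for example `hadoop jar devx-mapreduce.jar com.mapreduce.techalpine.DevXDriver` (the jar name here is hypothetical). Note that the output directory passed to FileOutputFormat must not already exist, or Hadoop will fail the job.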

Map function: This function is responsible for splitting the data into key-value pairs. This step is known as mapping the data.

Listing 2: The Map function, which splits the data into chunks

[code]
package com.mapreduce.techalpine;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import java.util.StringTokenizer;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * @author kaushik pal
 *
 * This is the map process. It maps the input to key-value pairs.
 */
public class DevXMap extends Mapper<LongWritable, Text, Text, Text> {

    // Create Path, BufferedReader and Text variables
    Path file_path;
    BufferedReader buffer_reader;
    Text tweet_values = new Text();

    /**
     * @param key
     * @param value
     * @param context
     */
    public void map(LongWritable key, Text value, Context context) {
        try {
            // Create a configuration for the map task
            Configuration map_config = new Configuration();

            // Load the Hadoop configuration files
            map_config.addResource(new Path("/user/hadoop/core-site.xml"));
            map_config.addResource(new Path("/user/hadoop/hdfs-site.xml"));

            // Create a variable for the search keyword
            String searchkeyword = "";

            // Open the keyword file from the file path
            file_path = new Path("files/repository/keys.txt");
            FileSystem file_system = FileSystem.get(URI.create("files/repository/keys.txt"), new Configuration());

            // Load the buffered reader
            buffer_reader = new BufferedReader(new InputStreamReader(file_system.open(file_path)));

            while (buffer_reader.ready()) {
                searchkeyword = buffer_reader.readLine().trim();
            }

            // Get the key value
            final Text key_value = new Text(searchkeyword);

            // Check the value and take a decision
            if (value == null) {
                return;
            } else {
                StringTokenizer string_tokens = new StringTokenizer(value.toString(), ",");
                int count = 0;
                while (string_tokens.hasMoreTokens()) {
                    String token = string_tokens.nextToken();
                    count++;

                    // Skip the first (id) column
                    if (count <= 1)
                        continue;

                    String new_tweet_value = token.toLowerCase().trim().replaceAll("\\*", "");
                    if (new_tweet_value.contains(searchkeyword.toLowerCase().trim())) {
                        tweet_values.set(new_tweet_value);
                        context.write(key_value, tweet_values);
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
[/code]
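To see what this mapper emits, assume keys.txt contains the single keyword hadoop and the mapper receives one comma-separated input line (the sample line below is hypothetical). The first column is skipped, and each remaining column that contains the keyword is emitted with the keyword as the key:

[code]
keys.txt    : hadoop
Input line  : 101,Learning Hadoop is fun,Just tried *hadoop* streaming
Emitted     : (hadoop, "learning hadoop is fun")
              (hadoop, "just tried hadoop streaming")
[/code]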

Reduce function: This function is responsible for aggregating the data. The aggregation is done based on the key values. After the processing and sorting, the aggregation is completed and the output is sent back to the client program.

Listing 3: The Reduce function, which aggregates the processed data

[code]
package com.mapreduce.techalpine;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * @author kaushik pal
 *
 * This is the reducer function. It aggregates the output based on the
 * sorting of key-value pairs.
 */
public class DevXReducer extends Reducer<Text, Text, Text, Text>
{
    // Create variables for the file paths
    Path positive_file_path;
    Path negative_file_path;
    Path output_file_path;
    Path keyword_file_path;

    // Create variables for the buffered readers
    BufferedReader positive_buff_reader;
    BufferedReader negative_buff_reader;
    BufferedReader keyword_buff_reader;

    // Create variables for the counters
    static Double total_record_count = new Double("0");
    static Double count_neg = new Double("0");
    static Double count_pos = new Double("0");
    static Double count_neu = new Double("0");
    static Double percent_neg = new Double("0");
    static Double percent_pos = new Double("0");
    static Double percent_neu = new Double("0");
    Pattern pattrn_matcher;
    Matcher matcher_txt;
    static int new_row = 0;
    FSDataOutputStream out_1st, out_2nd;

    /**
     * @param key
     * @param values
     * @param context
     * @throws IOException
     * @throws InterruptedException
     */
    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException
    {
        // Create a configuration for the reducer
        Configuration reduce_config = new Configuration();

        // Load the Hadoop configuration files
        reduce_config.addResource(new Path("/user/hadoop/core-site.xml"));
        reduce_config.addResource(new Path("/user/hadoop/hdfs-site.xml"));

        // Create variables
        String key_word = "";
        String check_keyword = key_word;
        keyword_file_path = new Path("files/repository/keys.txt");
        FileSystem file_system_read = FileSystem.get(URI.create("files/repository/keys.txt"), new Configuration());
        keyword_buff_reader = new BufferedReader(new InputStreamReader(file_system_read.open(keyword_file_path)));
        FileSystem get_filesys = FileSystem.get(reduce_config);
        FileSystem get_filesys_posneg = FileSystem.get(reduce_config);
        Path path_output = new Path("/user/sentiment_output_file.txt");
        Path path_output_posneg = new Path("/user/posneg_output_file.txt");

        // Get the keyword
        while (keyword_buff_reader.ready())
        {
            key_word = keyword_buff_reader.readLine().trim();
        }

        // Check the file system
        if (!get_filesys.exists(path_output)) {
            out_1st = get_filesys.create(path_output);
            out_2nd = get_filesys_posneg.create(path_output_posneg);
        }

        // Match the keyword using the positive and negative dictionaries
        if (check_keyword.equals(key.toString().toLowerCase()))
        {
            for (Text new_tweets : values)
            {
                // Load the positive word dictionary
                positive_file_path = new Path("/user/map_reduce/pos_words.txt");
                FileSystem filesystem_one = FileSystem.get(URI.create("files/pos_words.txt"), new Configuration());
                positive_buff_reader = new BufferedReader(new InputStreamReader(filesystem_one.open(positive_file_path)));

                // Load the negative word dictionary
                negative_file_path = new Path("/user/map_reduce/neg_words.txt");
                FileSystem filesystem_two = FileSystem.get(URI.create("files/neg_words.txt"), new Configuration());
                negative_buff_reader = new BufferedReader(new InputStreamReader(filesystem_two.open(negative_file_path)));

                ++total_record_count;
                boolean first_flag = false;
                boolean second_flag = false;
                String all_tweets = new_tweets.toString();
                String first_regex = "";
                String second_regex = "";

                while (positive_buff_reader.ready())
                {
                    first_regex = positive_buff_reader.readLine().trim();
                    new_row++;
                    pattrn_matcher = Pattern.compile(first_regex, Pattern.CASE_INSENSITIVE);
                    matcher_txt = pattrn_matcher.matcher(all_tweets);
                    first_flag = matcher_txt.find();

                    if (first_flag)
                    {
                        out_2nd.writeBytes(all_tweets);
                        context.write(new Text(first_regex), new Text(all_tweets));
                        break;
                    }
                }

                while (negative_buff_reader.ready())
                {
                    new_row++;
                    second_regex = negative_buff_reader.readLine().trim();
                    pattrn_matcher = Pattern.compile(second_regex, Pattern.CASE_INSENSITIVE);
                    matcher_txt = pattrn_matcher.matcher(all_tweets);
                    second_flag = matcher_txt.find();

                    if (second_flag)
                    {
                        out_2nd.writeBytes(all_tweets);
                        context.write(new Text(second_regex), new Text(all_tweets));
                        break;
                    }
                }

                if (first_flag & second_flag)
                {
                    ++count_neu;
                }
                else
                {
                    if (first_flag)
                    {
                        ++count_pos;
                    }
                    if (second_flag)
                    {
                        ++count_neg;
                    }
                    if (first_flag == false & second_flag == false)
                    {
                        ++count_neu;
                    }
                }

                // Close the buffers
                negative_buff_reader.close();
                positive_buff_reader.close();
            }

            // Calculate the percentage values
            percent_pos = count_pos / total_record_count * 100;
            percent_neg = count_neg / total_record_count * 100;
            percent_neu = count_neu / total_record_count * 100;

            try {
                // Write the data
                out_1st.writeBytes("\n" + key_word);
                out_1st.writeBytes("," + total_record_count);
                out_1st.writeBytes("," + percent_neg);
                out_1st.writeBytes("," + percent_pos);
                out_1st.writeBytes("," + percent_neu);

                // Close the files
                out_1st.close();
                get_filesys.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
[/code]
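From the writeBytes calls at the end of the reducer, each processed keyword appends one comma-separated summary line to /user/sentiment_output_file.txt in the following shape (the numbers below are hypothetical):

[code]
<keyword>,<total record count>,<negative %>,<positive %>,<neutral %>
hadoop,25.0,16.0,60.0,24.0
[/code]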

Conclusion: In this article I have discussed the MapReduce implementation using the Java programming environment. The different components, the Map and Reduce functions, perform the main tasks and return the output to the client. The processing works efficiently in a distributed environment only. So we should set up the Apache Hadoop framework in a distributed environment to get the best results.

Hope you have enjoyed the article and will be able to apply it in your own programming. Keep reading.
