Natural Language Processing with Python

NLP and Python

Learn NLP and Python

Introduction:

Let us start this article with a very simple question: "What do you think a natural language is?" Most of you would say it is any language in which we can communicate easily and hassle-free with other people who understand that language. The natural language differs from person to person: Hindi, English, Bengali, Spanish, French and so on. Now, if we talk about interacting with machines, the question arises: can we communicate with machines in our own natural language too? This used to be possible only in sci-fi movies, but we now do it every day, and it has been made possible by the field of computer science called Natural Language Processing (NLP).

Basically, NLP is the field of computer science, artificial intelligence and computational linguistics that is concerned with human-machine interaction. It enables machines to understand human language as it is spoken. In short, NLP applications allow users to communicate with machines in their own natural language.







Let us look at some areas where we use NLP every day.

Spam Filtering: This is one of the most important applications of NLP. Everyone wants to see only relevant emails in their inbox out of the 175,000 emails that arrive every day. That is why the email provider filters each email by calculating the likelihood that it is spam, based on its content. This can be done with Naïve Bayes spam filtering.

Digital Virtual Assistants: A DVA (Digital Virtual Assistant) is a software application or platform that assists people by understanding natural language. Nowadays, with the help of virtual assistants like Google Now, Apple Siri and Microsoft's Cortana, we can do things like getting a map, finding a hotel or getting directions to a destination, just by speaking.

Skype Translator: Skype Translator uses NLP for on-the-fly translation, translating speech in real time during a call between speakers of different languages. It helps people who do not share a language communicate with each other.

Google Translate: Google Translate uses statistical machine translation to translate text while taking the meaning of the sentence into account. A learning algorithm lets it learn from, and improve, its translations whenever a user contributes a translation.

News feeds on social websites: For this we can take the example of the Facebook news feed, where you see ads that match your interests. Basically, the news feed algorithms use NLP to understand our interests and show us related ads and posts.
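To make the spam filtering item above more concrete, here is a minimal sketch of a Naïve Bayes text classifier written with the standard library only. The training messages, the function names train and spam_log_odds, and the choice of add-one smoothing are all assumptions made for illustration; a real mail filter would be trained on far more data.

```python
import math
from collections import Counter

# Tiny hand-made training set (hypothetical data, for illustration only).
TRAIN = [
    ("win cash prize now", "spam"),
    ("cheap loans win big", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]

def train(examples):
    """Count words per class and messages per class."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, doc_counts

def spam_log_odds(text, word_counts, doc_counts):
    """Return log P(spam | text) - log P(ham | text), using add-one smoothing."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        # Start from the class prior, then add one log-likelihood per word.
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return scores["spam"] - scores["ham"]

word_counts, doc_counts = train(TRAIN)
# A positive value means the message looks more like spam than ham.
print(spam_log_odds("win a cash prize", word_counts, doc_counts) > 0)
print(spam_log_odds("agenda for the team meeting", word_counts, doc_counts) > 0)
```

The "naïve" part is the assumption that words occur independently of each other given the class, which is why the per-word log probabilities can simply be added up.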







Python for NLP implementation:

There are many open-source NLP libraries, such as the Natural Language Toolkit (NLTK), Apache OpenNLP, the Stanford NLP suite and the GATE NLP library. Among them, NLTK, which is written in Python, is the most popular. In this article we will use NLTK to tackle the challenges of NLP. Before using it, we need to install it, import it and download its packages.

Installing NLTK:

NLTK can be installed with the pip command as follows:

pip install nltk

Alternatively, we can get it directly from this link: https://pypi.python.org/pypi/nltk

Importing NLTK:

Importing NLTK is a simple way to check whether it was installed properly. We can type the following command in the Python terminal to import NLTK:

>>> import nltk

If Python reports no error after running the command, the NLTK library has been installed successfully.
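Inside a script, the same check can be made without triggering an error. The following is a minimal sketch that uses only the standard library; the helper name is_installed is made up for this example and is not part of NLTK.

```python
import importlib.util

def is_installed(package_name):
    """Return True if a top-level package can be located, without importing it."""
    return importlib.util.find_spec(package_name) is not None

if is_installed("nltk"):
    print("NLTK is installed")
else:
    print("NLTK is missing; install it with: pip install nltk")
```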

Downloading NLTK packages:

After importing NLTK, we need to download the NLTK packages. We can type the following command in the Python terminal to download them:

>>> nltk.download()

After running the command, we get the NLTK downloader, as shown in the following figure:

Figure 1: NLTK Downloader

Click the download button and we are ready to start with the NLTK packages; installing the smaller packages should not take long. It is recommended to download all the packages, because that gives us all the tokenizers, chunkers, corpora and other algorithms in one go.

Major challenges in NLP & their solutions using NLTK:

Working with NLP applications is very interesting but challenging at the same time. The most basic challenge is getting well-structured data. In today's scenario, about 80% of data is in unstructured form, generated continuously by activities such as social media posts, tweets, blogs, etc. Almost everything on the web can be a source of data, and most of it is unstructured.

Tokenization: This is the process of splitting text into words and sentences. The smaller pieces are usually called tokens. There are two types of tokenizers, namely word tokenizers and sentence tokenizers. As the names suggest, a word tokenizer splits text into words and a sentence tokenizer splits text into sentences. NLTK provides two functions, word_tokenize() and sent_tokenize(), for word and sentence tokenization respectively.

Listing 1: Code for splitting text into words and sentences using word_tokenize() and sent_tokenize() respectively:

[code]
import nltk
from nltk.tokenize import word_tokenize

# Split the sample text into word and punctuation tokens.
sample_text = "My name is Ram. I am a student."
print(word_tokenize(sample_text))
[/code]

Output: ['My', 'name', 'is', 'Ram', '.', 'I', 'am', 'a', 'student', '.']

[code]
import nltk
from nltk.tokenize import sent_tokenize

# Split the sample text into sentences.
sample_text = "My name is Ram. I am a student."
print(sent_tokenize(sample_text))
[/code]

Output: ['My name is Ram.', 'I am a student.']
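To get a feel for what a tokenizer does under the hood, here is a rough sketch built on regular expressions from the standard library. It is not how NLTK implements word_tokenize() or sent_tokenize(); the function names simple_word_tokenize and simple_sent_tokenize and both patterns are assumptions chosen for this example.

```python
import re

def simple_word_tokenize(text):
    """Return runs of word characters and single punctuation marks as tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

def simple_sent_tokenize(text):
    """Split after '.', '!' or '?' whenever whitespace follows."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

sample_text = "My name is Ram. I am a student."
print(simple_word_tokenize(sample_text))
print(simple_sent_tokenize(sample_text))
```

A pattern this simple would wrongly split after abbreviations such as "Dr.", which is one reason to prefer a library tokenizer for real text.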

Stemming: Almost every language is inflected, i.e. its words carry many variations. Here, variation means that a single word can appear in several forms, for example 'fishing', 'fished' and 'fisher'. This is another challenge for NLP: making the computer understand that these words all share the same base form, 'fish'.

The above problem can be solved by stemming programs, also called stemming algorithms or stemmers. Stemming is the process of extracting the base or root form of a word. Julie Beth Lovins wrote the first stemmer in 1968, and it greatly influenced later work in this area.
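As a rough illustration of the idea, the sketch below strips a handful of common English suffixes. It is a toy, not the Lovins algorithm or any of the NLTK stemmers; the suffix list, the minimum stem length of three letters and the name toy_stem are assumptions made for this example.

```python
# Suffixes to try, in this order (an illustrative, deliberately short list).
SUFFIXES = ("ing", "ed", "er", "s")

def toy_stem(word):
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ("fishing", "fished", "fisher", "fish"):
    print(w, "->", toy_stem(w))
```

Real stemmers apply many more rules and conditions, which is why different algorithms can return different stems for the same word.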

NLTK provides three stemmers, namely PorterStemmer, LancasterStemmer and SnowballStemmer. All three use different algorithms and differ in how aggressively they strip words. Applying them to the same word, we may get the following base forms.

Listing 2: Code for extracting the base word using PorterStemmer, LancasterStemmer and SnowballStemmer respectively:

[code]
import nltk
from nltk.stem import PorterStemmer

# Stem a single word with the Porter algorithm.
stemmer = PorterStemmer()
word = "writing"
print(stemmer.stem(word))
[/code]

Output: write

[code]
import nltk
from nltk.stem import LancasterStemmer

# Stem the same word with the more aggressive Lancaster algorithm.
stemmer = LancasterStemmer()
word = "writing"
print(stemmer.stem(word))
[/code]

Output: writ

[code]
import nltk
from nltk.stem import SnowballStemmer

# SnowballStemmer needs the language name as an argument.
stemmer = SnowballStemmer("english")
word = "writing"
print(stemmer.stem(word))
[/code]

Output: write

When using SnowballStemmer, we have to pass the language as an argument. In the example above we used "english" as the language. We can also check which languages SnowballStemmer supports.

Listing 3: Code for getting the languages supported by SnowballStemmer:

[code]
import nltk
from nltk.stem import SnowballStemmer

# The supported languages are exposed as a class attribute.
print(" ".join(SnowballStemmer.languages))
[/code]

Output: danish dutch english finnish french german hungarian italian norwegian porter portuguese romanian russian spanish swedish







Lemmatization: Like stemming, lemmatization gives us the base or root form of a word. The lemmatization process uses vocabulary and morphological analysis of words and returns real words as output, in contrast to stemming, which often produces non-existent words. The form produced by lemmatization is called a lemma. NLTK provides WordNetLemmatizer for lemmatizing.

Listing 4: Code for extracting the base word using WordNetLemmatizer:

[code]
import nltk
from nltk.stem import WordNetLemmatizer

# Without a pos argument the word is treated as a noun.
lemmatizer = WordNetLemmatizer()
word = "better"
print(lemmatizer.lemmatize(word))
[/code]

Output: better

lemmatize() also accepts a part-of-speech parameter, 'pos'. If we do not supply it, the word is treated as a noun by default.

Listing 5: Code for extracting the base word using WordNetLemmatizer with the pos parameter:

[code]
import nltk
from nltk.stem import WordNetLemmatizer

# pos="a" tells the lemmatizer to treat the word as an adjective.
lemmatizer = WordNetLemmatizer()
word = "better"
print(lemmatizer.lemmatize(word, pos="a"))
[/code]

Output: good

You can see the difference in the output between using WordNetLemmatizer without the pos parameter in Listing 4 and with it in Listing 5.
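The effect of the part-of-speech argument can be mimicked with a plain dictionary lookup. The sketch below is only a toy model of the idea and does not use WordNet; the mini-lexicon LEMMAS and the function name toy_lemmatize are invented for this example.

```python
# Hypothetical mini-lexicon keyed by (word, part of speech).
LEMMAS = {
    ("better", "a"): "good",
    ("best", "a"): "good",
    ("mice", "n"): "mouse",
    ("running", "v"): "run",
}

def toy_lemmatize(word, pos="n"):
    """Look up (word, pos); return the word unchanged when there is no entry."""
    return LEMMAS.get((word.lower(), pos), word)

print(toy_lemmatize("better"))           # looked up as a noun, so no entry is found
print(toy_lemmatize("better", pos="a"))  # looked up as an adjective
```

Because the lookup key includes the part of speech, the same surface word can map to different lemmas, or to none at all, depending on how it is tagged.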

Chunking: This is the process of identifying parts of speech and short phrases, such as noun phrases, in a text. More precisely, chunking labels the tokens produced by tokenization. There are two types of chunking: chunking up, where the object moves towards being more general and the language becomes more abstract, and chunking down, where the object moves towards being more specific and the language becomes more detailed.

Listing 6: Code for implementing noun phrase chunking:

[code]
import nltk

# A sentence that has already been tokenized and tagged with parts of speech.
text = [("a", "DT"), ("beautiful", "JJ"), ("woman", "NN"), ("is", "VBP"), ("crossing", "VBP"), ("the", "DT"), ("road", "NN")]

# A noun phrase (NP) is an optional determiner, any number of adjectives, then a noun.
grammar = "NP:{<DT>?<JJ>*<NN>}"
parser_chunking = nltk.RegexpParser(grammar)
output_chunk = parser_chunking.parse(text)
print(repr(output_chunk))

# Opens a window that draws the chunk tree.
output_chunk.draw()
[/code]

Output: Tree('S', [Tree('NP', [('a', 'DT'), ('beautiful', 'JJ'), ('woman', 'NN')]), ('is', 'VBP'), ('crossing', 'VBP'), Tree('NP', [('the', 'DT'), ('road', 'NN')])])

Figure 2: Output of noun phrase chunking
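The grammar used in Listing 6 reads like a regular expression over tags, and a simplified version of that idea can be sketched with the standard library alone. This is not how NLTK's RegexpParser is implemented; the function name chunk_noun_phrases, the hard-coded NP pattern and the ('NP', [...]) output shape are assumptions made for this example.

```python
import re

def chunk_noun_phrases(tagged):
    """Group tokens whose tags match <DT>?<JJ>*<NN> into ('NP', [...]) chunks."""
    # Encode the tag sequence as one string, e.g. "<DT><JJ><NN>", so a regex can scan it.
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)

    # Map each character offset where a tag starts (or the string ends) to a token index.
    starts = {}
    offset = 0
    for index, (_, tag) in enumerate(tagged):
        starts[offset] = index
        offset += len(tag) + 2  # the tag plus its two angle brackets
    starts[offset] = len(tagged)

    result, position = [], 0
    for match in re.finditer(r"(<DT>)?(<JJ>)*<NN>", tag_string):
        begin, end = starts[match.start()], starts[match.end()]
        result.extend(tagged[position:begin])         # tokens outside any chunk
        result.append(("NP", tagged[begin:end]))      # one noun phrase chunk
        position = end
    result.extend(tagged[position:])
    return result

text = [("a", "DT"), ("beautiful", "JJ"), ("woman", "NN"), ("is", "VBP"),
        ("crossing", "VBP"), ("the", "DT"), ("road", "NN")]
for item in chunk_noun_phrases(text):
    print(item)
```

Matching on the tag string rather than on the words keeps the pattern short: the words only come back into play once the matched span has been mapped to token indices.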

Conclusion: In this article, I have given a brief introduction to Natural Language Processing (NLP) along with some of the key areas where it is used. I have also tried to explain how we can use Python's NLTK package to overcome the challenges we face while working with NLP. If you come across any problems while working on NLP with Python, or if you have any ideas or feedback, please feel free to post them in the comments below.








Author Bio: Gaurav Leekha is a freelance technical content writer with 7+ years of teaching experience. His key areas of interest are AI, machine learning, deep learning, speech processing and Python programming. He also serves as a reviewer for various national as well as international journals, including the International Journal of Speech Technology (IJST), Springer.
