Chaquopy error when importing wordnet and stopwords

Problem description

from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.corpus import wordnet as wn
#from itertools import product

#variables that will be used

database_word_synset=[]
uploaded_sentence_synset=[]
uploaded_sentence_words_tokenized=[]
filtered_uploaded_sentences = []
database_sentence_words_tokenized=[]
filtered_database_sentence=[]
database_sentence_synset=[]

word_check=[0.0]
sentence_check=[]  # similarity score for every (database, uploaded) synset pair
count_sentence=0
count_word=0
not_fond=0

#the given data

uploaded_sentence=" The issue of text semantics, such as word semantics and sentence semantics has received increasing attentions in recent years. However, rare research focuses on the document-level semantic matching due to its complexity. Long documents usually have sophisticated structure and massive information, which causes hardship to measure their semantic similarity. The semantic similarity between words, sentences, texts, and documents is widely studied in various fields, including natural language processing, document semantic comparison, artificial intelligence, semantic web, and semantic search engines. "
database_word=["car","complete",'run',"sleep"]
database_sentence="the earth is round not flat"

stopwords = set(stopwords.words('english'))  # note: rebinds the imported module name to a set of words
uploaded_sentence_words_tokenized = word_tokenize(uploaded_sentence)

#filtering the sentence and synset

for word in uploaded_sentence_words_tokenized:
    if word not in stopwords:      
        filtered_uploaded_sentences.append(word)
print (filtered_uploaded_sentences)

for sentences_are in filtered_uploaded_sentences:
    uploaded_sentence_synset.append(wn.synsets(sentences_are))
    
print(uploaded_sentence_synset)

#for finding similarity in the words

for databasewords in database_word:
    database_word_synset.append(wn.synsets(databasewords))
    
print(database_word_synset)



words_list_synset=list()
for t in database_word_synset: 
    for x in t: 
        words_list_synset.append(x)

print(words_list_synset)




#removing empty list element and making single dimension list

removing_empty_list_uploaded_sentence=list()
removing_empty_list_uploaded_sentence = [x for x in uploaded_sentence_synset if x != []]

up_list_sentence=list()
for t in removing_empty_list_uploaded_sentence: 
    for x in t: 
        up_list_sentence.append(x)

print(up_list_sentence)

#the similarity main function for words
#sims=[]
#for sense1, sense2 in product(database_word_synset, up_list_sentence):
#    d = wn.wup_similarity(sense1, sense2)
#    sims.append(d)
#print (sims)
#word_found=list()
for data in words_list_synset:
    for sen in up_list_sentence:
        # compute the Wu-Palmer score once per pair; it may be None
        similarity = wn.wup_similarity(data, sen)
        if similarity is None or similarity < 0.70:
            not_fond = not_fond + 1
        else:
            count_word = count_word + 1


print (word_check)
print("\n words that are not found :",not_fond)
print("\n words that are found :", count_word)
#for finding similarity in the sentence

database_sentence_words_tokenized=word_tokenize(database_sentence)

for word in database_sentence_words_tokenized:
    if word not in stopwords:
        filtered_database_sentence.append(word)
print(filtered_database_sentence)

for sentence_synset in filtered_database_sentence:
    database_sentence_synset.append(wn.synsets(sentence_synset))
print(database_sentence_synset)

#removing empty list element and making single dimension list

removing_empty_list_db=list()
removing_empty_list_db = [x for x in database_sentence_synset if x != []]

db_list_sentence=list()
for t in removing_empty_list_db: 
    for x in t: 
        db_list_sentence.append(x)

print(db_list_sentence)

#the similarity main function for sentence

for db_sentence in db_list_sentence:
    for upl_sentence in up_list_sentence:
        sentence_check.append(wn.wup_similarity(db_sentence, upl_sentence))

not_fond = 0  # reset: the word loop above already incremented this counter
for sentence_checks in sentence_check:
    if sentence_checks is None or sentence_checks < 0.70:
        not_fond = not_fond + 1
    else:
        count_sentence = count_sentence + 1

print (sentence_check)
print("\n words that are not found :",not_fond)
print("\n words that are found :",count_sentence)
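The two counting loops above apply the same rule: a synset pair counts as found when its Wu-Palmer score is at least 0.70, and a None score counts as not found. A minimal sketch of that rule as reusable helpers, assuming the similarity function is passed in (for example `wn.wup_similarity`); the names `flatten` and `count_matches` are hypothetical and not part of NLTK. Note that flattening already drops the empty lists `wn.synsets()` returns for unknown words, so a separate "remove empty lists" step is not needed.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def flatten(nested: Iterable[Sequence[T]]) -> List[T]:
    """Flatten a list of synset lists; empty inner lists contribute nothing."""
    return [item for inner in nested for item in inner]


def count_matches(
    left: Sequence[T],
    right: Sequence[T],
    similarity: Callable[[T, T], Optional[float]],
    threshold: float = 0.70,
) -> Tuple[int, int]:
    """Return (found, not_found) over every pair in left x right.

    A pair is found when similarity(a, b) is not None and >= threshold,
    mirroring the `is None or < 0.70` test in the question's loops.
    """
    found = not_found = 0
    for a in left:
        for b in right:
            score = similarity(a, b)  # computed once per pair
            if score is None or score < threshold:
                not_found += 1
            else:
                found += 1
    return found, not_found
```

With the question's variables this would be called as `count_matches(words_list_synset, flatten(uploaded_sentence_synset), wn.wup_similarity)`, which also gives each section its own counters instead of sharing `not_fond`.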

Installing the libraries in the Android Studio build file:

In this project we use Chaquopy to run Python in our Android app, but we have problems importing libraries. I installed NLTK, wordnet, stopwords and word tokenization separately, but I am not able to access these libraries from the Python file, and when we install the app it crashes.

if (!Python.isStarted()) {
    Python.start(new AndroidPlatform(this));
}
Python py = Python.getInstance();
// "sum" is the Python module sum.py; it must define a function main(text)
final PyObject pyobj = py.getModule("sum");

b2.setOnClickListener(new View.OnClickListener() {
    @Override
    public void onClick(View view) {
        if (path == null) {
            Toast.makeText(documentupload.this, " plz upload the doc", Toast.LENGTH_SHORT).show();
            //upload.setText(path);

            // Intent intent= new Intent(documentupload.this,result.class);
            //startActivity(intent);
        } else {
            // calls main() in sum.py with the document text and shows the result
            PyObject obj = pyobj.callAttr("main", Words.toString());
            upload.setText(obj.toString());
            Toast.makeText(documentupload.this, "uploaded" + Words, Toast.LENGTH_LONG).show();
            // Toast.makeText(documentupload.this, " plz upload the doc", Toast.LENGTH_LONG).show();
        }
    }
});

When the app crashes, it gives this error message:

Recommended answer

I assume the crash happened when calling wn.synsets? Here's the stack trace I saw:

  File "/data/user/0/com.chaquo.python.pkgtest3/files/chaquopy/AssetFinder/requirements/nltk/corpus/util.py", line 120, in __getattr__
  File "/data/user/0/com.chaquo.python.pkgtest3/files/chaquopy/AssetFinder/requirements/nltk/corpus/util.py", line 85, in __load
  File "/data/user/0/com.chaquo.python.pkgtest3/files/chaquopy/AssetFinder/requirements/nltk/corpus/util.py", line 80, in __load
  File "/data/user/0/com.chaquo.python.pkgtest3/files/chaquopy/AssetFinder/requirements/nltk/data.py", line 585, in find
LookupError: 
**********************************************************************
  Resource wordnet not found.

I don't think the "wordnet" and "corpus" pip packages have anything to do with NLTK. Instead, you should install the data with nltk.download, as the error message says.
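A minimal sketch of that approach, assuming the app can write to some directory (which directory to use on Android is an assumption here). It checks each resource with `nltk.data.find`, downloads missing ones with `nltk.download(..., download_dir=...)`, and appends the directory to `nltk.data.path` so the lazily loaded `wn` and `stopwords` objects can find the data. The resource list is also an assumption: the question's script needs wordnet, stopwords and the punkt models used by `word_tokenize` (newer NLTK releases may ask for `punkt_tab` instead). The `nltk_module` parameter is a hypothetical hook that only exists so the logic can be exercised without network access.

```python
from typing import Iterable, List, Tuple

# (download id, lookup path) pairs; assumed to cover what the script uses.
DEFAULT_RESOURCES: Tuple[Tuple[str, str], ...] = (
    ("wordnet", "corpora/wordnet"),
    ("stopwords", "corpora/stopwords"),
    ("punkt", "tokenizers/punkt"),
)


def ensure_nltk_data(download_dir: str,
                     resources: Iterable[Tuple[str, str]] = DEFAULT_RESOURCES,
                     nltk_module=None) -> List[str]:
    """Download any missing NLTK data into download_dir.

    Returns the ids that had to be downloaded; raises RuntimeError if
    nltk.download reports failure for one of them.
    """
    if nltk_module is None:
        import nltk as nltk_module  # real NLTK unless a stub is passed in

    # Make the directory searchable before looking resources up.
    if download_dir not in nltk_module.data.path:
        nltk_module.data.path.append(download_dir)

    downloaded = []
    for resource_id, lookup_path in resources:
        try:
            nltk_module.data.find(lookup_path)  # raises LookupError if missing
        except LookupError:
            ok = nltk_module.download(resource_id, download_dir=download_dir, quiet=True)
            if not ok:
                raise RuntimeError(f"nltk.download({resource_id!r}) failed")
            downloaded.append(resource_id)
    return downloaded
```

This would run once, before the first `wn.synsets` call, for example `ensure_nltk_data(os.path.join(os.environ["HOME"], "nltk_data"))`; using `HOME` assumes Chaquopy points it at the app's files directory.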

Because of an emulator bug, you may need to call nltk.download in a loop, as described in this answer.
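A bounded version of such a loop could look like the sketch below. It assumes, as the answer suggests, that a failed `nltk.download` call on the emulator can simply be retried; `max_attempts` and `delay_seconds` are arbitrary illustrative values, treating `OSError` as retryable is an assumption, and the `downloader` parameter is a hypothetical hook so the retry logic can be tested without network access.

```python
import time
from typing import Callable, Optional


def download_with_retries(resource_id: str,
                          download_dir: Optional[str] = None,
                          max_attempts: int = 5,
                          delay_seconds: float = 1.0,
                          downloader: Optional[Callable[..., bool]] = None) -> int:
    """Call nltk.download until it reports success.

    Returns the number of attempts used; raises RuntimeError after
    max_attempts failures instead of looping forever.
    """
    if downloader is None:
        import nltk
        downloader = nltk.download

    for attempt in range(1, max_attempts + 1):
        try:
            # nltk.download returns True on success, False on failure
            if downloader(resource_id, download_dir=download_dir, quiet=True):
                return attempt
        except OSError:
            pass  # assumed transient network error: retry like a False return
        if attempt < max_attempts:
            time.sleep(delay_seconds)

    raise RuntimeError(
        f"nltk.download({resource_id!r}) failed after {max_attempts} attempts")
```

Bounding the loop keeps a permanently broken network from hanging the app at startup.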
