Stanford CoreNLP Named Entity Recognizer throws "500 Server Error: Internal Server Error for url"

Problem description

I have a set of text files and am using Stanford CoreNLP's Named Entity Recognizer (NER) to extract details from the lines that mention a patient name. When I run NER on a single sentence, it prints the results correctly. When I run it over the whole set of files, it prints results but also raises the error below, and because of that I cannot write the results to a text file:

500 Server Error: Internal Server Error for url: http://localhost:9000/?properties=%7B%22outputFormat%22%3A+%22json%22%2C+%22annotators%22%3A+%22tokenize%2Cssplit%2Cner%22%2C+%22ssplit.isOneSentence%22%3A+%22true%22%7D
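For reference, the query string in that URL is the URL-encoded JSON of the request properties. A minimal sketch using only the Python standard library decodes it; the variable names here are illustrative and not part of the original code:

```python
import json
from urllib.parse import urlsplit, parse_qs

# The URL exactly as it appears in the error message.
error_url = (
    "http://localhost:9000/?properties=%7B%22outputFormat%22%3A+%22json%22%2C+"
    "%22annotators%22%3A+%22tokenize%2Cssplit%2Cner%22%2C+"
    "%22ssplit.isOneSentence%22%3A+%22true%22%7D"
)

# parse_qs decodes the percent-escapes and turns '+' back into spaces.
query = parse_qs(urlsplit(error_url).query)
properties = json.loads(query["properties"][0])
print(properties)
```

The decoded properties only select the annotators and the output format. The tokens being tagged are not part of the URL.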

Here is the code which I am using:

import os
import re

from nltk.parse import CoreNLPParser

# Requires a CoreNLP server already running on localhost:9000
tagger = CoreNLPParser(url='http://localhost:9000', tagtype='ner')

# Captures whatever follows "patient name" (and an optional colon) on a line
patient_name_check = re.compile(r".*\s+(patient name)\s*:*\s*(.*)", re.I)


def name_detail_extracter():
    data_location = r"D:\Data"  # folder containing all the data
    for root, dirs, files in os.walk(data_location):
        for filename in files:
            with open(os.path.join(root, filename), encoding="utf8", mode="r") as f:
                for line_number, line in enumerate(f, 1):
                    patient_name_matches = patient_name_check.findall(line)
                    for match in patient_name_matches:
                        name_details = match[1]
                        tokens = name_details.split()
                        result = tagger.tag(tokens)
                        for m in result:
                            print(m)


name_detail_extracter()

Recommended answer

The issue is resolved. Some matches produced an empty token list, and passing those empty lists to NER caused the error, so I now check for them before tagging:

for match in patient_name_matches:
    name_details = match[1]
    tokens = name_details.split()
    if tokens:  # the added check: skip matches with no tokens
        result = tagger.tag(tokens)
        for m in result:
            print(m)
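To see where the empty token lists come from, the sketch below runs the question's pattern on two sample lines without needing a CoreNLP server. The sample lines, the `tag_names` helper and the `fake_tag` stand-in are all hypothetical; `fake_tag` only imitates a tagger that rejects empty input and is not the real `tagger.tag`.

```python
import re

# Same pattern as in the question.
patient_name_check = re.compile(r".*\s+(patient name)\s*:*\s*(.*)", re.I)


def tag_names(lines, tag):
    """Tag the text after 'patient name' on each line, skipping empty matches.

    `tag` is any callable that takes a list of tokens, e.g. tagger.tag.
    """
    results = []
    for line in lines:
        for match in patient_name_check.findall(line):
            tokens = match[1].split()
            if tokens:  # without this guard an empty list reaches the tagger
                results.extend(tag(tokens))
    return results


def fake_tag(tokens):
    # Hypothetical stand-in for tagger.tag: fails on empty input.
    if not tokens:
        raise ValueError("empty token list")
    return [(tok, "PERSON") for tok in tokens]


# The second line has the label but no name, so the captured group is ''.
lines = [" Patient Name: John Smith\n", " Patient Name:\n"]
empty_case = patient_name_check.findall(lines[1])[0][1].split()
print(empty_case)  # []

results = tag_names(lines, fake_tag)
print(results)  # [('John', 'PERSON'), ('Smith', 'PERSON')]
```

Because the helper returns a plain list, the results can be written to a text file after the loop instead of being printed as they arrive.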
