Simple Flask app using spaCy NLP hangs intermittently

Problem description

I'm working on a simple Flask app that will eventually turn into a simple REST API for doing named entity recognition using spaCy on a given text string. I have a simple prototype as follows:

from flask import Flask, render_template, request, json
import spacy
from spacy import displacy

def to_json(doc):
    return [
        {
            'start': ent.start_char,
            'end': ent.end_char,
            'type': ent.label_,
            'text': str(ent),
        } for ent in doc.ents
    ]

nlp = spacy.load('en')

app = Flask(__name__)

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/demo', methods=['GET', 'POST'])
def demo():
    q = request.values.get('text')
    doc = nlp(q)

    if request.values.get('type') == 'html':
        return displacy.render(doc, style='ent', page=True)
    else:
        return app.response_class(
            response=json.dumps(to_json(doc), indent=4),
            status=200,
            mimetype='text/string'
        )

if __name__ == '__main__':
    app.run(host='0.0.0.0')

The Flask app is served using an Apache webserver on Ubuntu. I submit text to the app using a simple web form and it returns results as either HTML or JSON text.

The problem I am having is that the app hangs intermittently...I can't figure out a pattern of what causes it to hang. Nothing shows up in the Apache error log, and the request that hangs does not appear in the Apache access log. If I kill the server while the browser is spinning, the browser reports that the server provided an empty response. If I restart the server, the error log reports that 1 or 2 child processes don't exit after a SIGTERM, and a SIGKILL has to be sent.

One possible clue is that the error log reports the following when the server starts up:

[Wed Dec 06 20:19:33.753041 2017] [wsgi:warn] [pid 1822:tid 140029812619136] mod_wsgi: Compiled for Python/2.7.11.
[Wed Dec 06 20:19:33.753055 2017] [wsgi:warn] [pid 1822:tid 140029812619136] mod_wsgi: Runtime using Python/2.7.12.

Another possible clue is that the "index" route (/) never seems to hang. But the "/demo" route can hang for both branches of the request.values.get('type') == 'html' if statement.

I've taken Apache and mod_wsgi out of the loop and am now running the app with the standalone Flask server. The app still hangs occasionally. When it does, I can press Ctrl-C, and the resulting traceback consistently ends at the same point:

Exception happened during processing of request from ('xxx.xxx.xxx.xxx', 55608)
Traceback (most recent call last):
  File "/usr/lib/python2.7/SocketServer.py", line 290, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 318, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 331, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python2.7/SocketServer.py", line 652, in __init__
    self.handle()
  File "/usr/local/lib/python2.7/dist-packages/werkzeug/serving.py", line 232, in handle
    rv = BaseHTTPRequestHandler.handle(self)
  File "/usr/lib/python2.7/BaseHTTPServer.py", line 340, in handle
    self.handle_one_request()
  File "/usr/local/lib/python2.7/dist-packages/werkzeug/serving.py", line 263, in handle_one_request
    self.raw_requestline = self.rfile.readline()
  File "/usr/lib/python2.7/socket.py", line 451, in readline
    data = self._sock.recv(self._rbufsize)
KeyboardInterrupt
----------------------------------------

After pressing control-c, Flask gets "released" and then returns the result I expect. The server continues on as normal and will accept more requests until it hangs again. Sometimes a hung request will come back on its own if I wait long enough.

This seems more and more like it's a problem with Flask (or how I'm using it). If anyone can provide advice on how to track down the problem, I would appreciate it!
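One way to narrow this down is to log when the spaCy call starts and finishes, so a hang inside the NLP step can be told apart from a hang elsewhere in the request. The following is only a minimal sketch using the standard library: the decorator name timed and the stand-in function fake_nlp are hypothetical and not part of the original app. In the real app the decorator would wrap whatever function calls nlp(q).

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("timing")


def timed(label):
    """Return a decorator that logs the start and duration of each call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.time()
            log.info("%s: start", label)
            try:
                return fn(*args, **kwargs)
            finally:
                # Runs even if the call raises, so every "start" that
                # returns control gets a matching "finished" line.
                log.info("%s: finished after %.3f s", label, time.time() - start)
        return wrapper
    return decorator


# Hypothetical stand-in for the spaCy pipeline, so the sketch runs
# without spaCy installed.
@timed("nlp")
def fake_nlp(text):
    return text.split()


tokens = fake_nlp("Apple is looking at buying a startup")
```

If a "start" line appears in the log with no matching "finished" line while the browser is spinning, the request is stuck inside the wrapped call rather than in routing or response handling.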

Recommended answer

This appears to be a known issue in spaCy v2.0. The issue went away after I downgraded to spaCy v1.9.
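To check whether an environment is running the 2.0 series, the installed version string can be inspected. This is a small sketch, not an official spaCy utility: the helper name is_2_0_series is hypothetical, and it only looks at the major and minor version numbers. In a real environment the string would come from spacy.__version__.

```python
def is_2_0_series(version):
    """Return True if a version string such as '2.0.3' is in the 2.0 series."""
    parts = version.split(".")
    return parts[:2] == ["2", "0"]


# Assumed usage in the real app (requires spaCy to be installed):
#     import spacy
#     if is_2_0_series(spacy.__version__):
#         print("spaCy 2.0.x detected; see the linked issues")
for candidate in ("1.9.0", "2.0.3"):
    print(candidate, is_2_0_series(candidate))
```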

For more details, see:

https://github.com/explosion/spaCy/issues/1571

https://github.com/explosion/spaCy/issues/1572
