Trouble importing boilerpipe in Python
Question
I'm building an application in Python that fetches news articles from RSS feeds. As part of the project, I decided to use boilerpipe to extract just the article content from the HTML page on which each article appears.
Although boilerpipe was originally written for Java, it has also been ported to Python. Its GitHub page is here: https://github.com/misja/python-boilerpipe
The problem is that I get an exception when trying to import it using:
from boilerpipe.extract import Extractor
The error I get is:
Traceback (most recent call last):
  File "", line 1, in <module>
  File "build\bdist.win32\egg\boilerpipe\extract\__init__.py", line 12, in <module>
  File "C:\Python26\lib\site-packages\jpype\_jclass.py", line 54, in JClass
    raise _RUNTIMEEXCEPTION.PYEXC("Class %s not found" % name)
jpype._jexception.ExceptionPyRaisable: java.lang.Exception: Class de.l3s.boilerpipe.sax.HTMLHighlighter not found
What might be causing this problem and how can I fix it?
Recommended answer
This worked for me on Mac OS X 10.8.5 with Python 2.7.9:
pip install JPype1 # to install https://pypi.python.org/pypi/JPype1
pip install charade
git clone https://github.com/misja/python-boilerpipe.git
cd python-boilerpipe
sudo python setup.py install
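The "Class ... not found" message comes from the JVM side, which suggests the boilerpipe JAR files were not on the Java classpath when the wrapper started. After installing, one quick sanity check is to look for .jar files under the directory where the package landed. The sketch below uses only the standard library. The helper names `find_jars` and `build_classpath` are made up for illustration, and the assumption that the JARs sit somewhere under the installed package directory is mine, not something the wrapper's documentation is quoted as saying.

```python
import os


def find_jars(root):
    """Return a sorted list of all .jar files found under root."""
    jars = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".jar"):
                jars.append(os.path.join(dirpath, name))
    return sorted(jars)


def build_classpath(jars):
    """Join JAR paths into a classpath string (';' on Windows, ':' elsewhere)."""
    return os.pathsep.join(jars)
```

Point `find_jars` at your own site-packages directory. If the list comes back empty or contains no boilerpipe JAR, the JVM has nothing to load `de.l3s.boilerpipe.sax.HTMLHighlighter` from, which would explain the exception in the question.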
Then you should be able to run the following in the Python console:
>>> from boilerpipe.extract import Extractor
>>> extractor = Extractor(extractor='ArticleExtractor', url="http://en.wikipedia.org/wiki/Main_Page")
>>> print extractor.getText()
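Note that the failure in the question is not an ImportError: the traceback shows a JPype exception raised while the module is being imported. If the application should keep running when boilerpipe is unavailable, a guarded import along these lines may help. This is a minimal sketch. `try_import` is a hypothetical helper, not part of boilerpipe or JPype, and it catches `Exception` broadly on the assumption that any import-time failure should be reported rather than crash the program.

```python
import importlib


def try_import(module_name):
    """Try to import module_name.

    Returns (module, None) on success, or (None, message) if the import
    raised any exception, including non-ImportError ones.
    """
    try:
        return importlib.import_module(module_name), None
    except Exception as exc:
        return None, "%s: %s" % (type(exc).__name__, exc)
```

For example, `module, error = try_import("boilerpipe.extract")` leaves `module` as `None` and puts the exception text in `error` when the Java class cannot be found, so the application can log the problem and skip content extraction.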