Python: how to use Tika with an existing jar file without downloading it again
Question
I'm using Tika and noticed that the jar file is downloaded again and placed in the Temp folder every time:
Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.19/tika-server-1.19.jar to C:\Users\asus\AppData\Local\Temp\tika-server.jar.
Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.19/tika-server-1.19.jar.md5 to C:\Users\asus\AppData\Local\Temp\tika-server.jar.md5.
The problem is that the jar file is around 60 MB, so each download takes a while.
This is the code I'm using:
from tika import parser

def get_pdf_text(path):
    # Send the file to the Tika server and return the extracted text
    parsed = parser.from_file(path)
    return parsed['content']
The only workaround I found is this:
1 - Run java -jar tika-server-x.x.jar --port xxxx
2 - Set tika.TikaClientOnly = True
3 - Call parser.from_file(path, '/path/to/server')
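Steps 2 and 3 of this workaround can be sketched in Python as follows. This is a minimal sketch, assuming a Tika server is already listening at the URL passed in. The function name get_pdf_text_client_only and the server_url parameter are illustrative and not part of the original code, and http://localhost:9998 in the usage note is only an example value.

```python
def get_pdf_text_client_only(path, server_url):
    """Extract text through an already running Tika server.

    `server_url` is a hypothetical parameter name; it should hold the
    address of the server started in step 1.
    """
    # Imported here so the client-only flag is set before the parser is used.
    import tika

    # Tell tika-python to act as a client only and not start its own server.
    tika.TikaClientOnly = True

    from tika import parser

    # The second positional argument of from_file is the server endpoint.
    parsed = parser.from_file(path, server_url)
    return parsed["content"]
```

A call would look like get_pdf_text_client_only("report.pdf", "http://localhost:9998"), with the port matching whatever was passed to --port in step 1.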
But I don't want to run the jar file manually. It would be better if Python could start the jar automatically and set up Tika with it, without downloading it again.
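One possible way to start the jar from Python is to launch it with the standard library and wait until its port accepts connections. The sketch below is not a feature of the tika package. The helper names build_server_command, wait_for_port, and start_tika_server, the default port 9998, and the timeout values are all assumptions chosen for illustration, and it expects java to be available on PATH.

```python
import socket
import subprocess
import time


def build_server_command(jar_path, port):
    """Build the same command line as step 1 of the workaround."""
    return ["java", "-jar", str(jar_path), "--port", str(port)]


def wait_for_port(host, port, timeout=30.0):
    """Return True once a TCP connection to host:port succeeds, else False."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Server not ready yet; retry shortly.
            time.sleep(0.5)
    return False


def start_tika_server(jar_path, port=9998, host="localhost"):
    """Start the server jar as a child process and wait until it listens."""
    proc = subprocess.Popen(build_server_command(jar_path, port))
    if not wait_for_port(host, port):
        proc.terminate()
        raise RuntimeError("Tika server did not start listening in time")
    return proc
```

The returned process object can be stopped with proc.terminate() when parsing is finished. Combined with tika.TikaClientOnly = True, this reuses the local jar and skips the download step.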
Recommended answer
To resolve this problem, set the TIKA_SERVER_JAR environment variable so that it points to the folder that contains the Tika server jar file:
TIKA_SERVER_JAR = 'PATH_OF_FOLDER_CONTAINING_TIKA_SERVER_JAR'.
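The variable can also be set from Python itself, as long as this happens before tika is imported, because the library reads its configuration from the environment at import time. The following is a minimal sketch under two assumptions taken from the tika-python README rather than from this answer: that TIKA_SERVER_JAR accepts a file:// URL of the jar, and that TIKA_PATH names the directory where the library keeps its copy of the jar. The function name configure_local_tika and its parameters jar_file and cache_dir are hypothetical; whether a folder or a file URL is expected should be checked against the installed version.

```python
import os
from pathlib import Path


def configure_local_tika(jar_file, cache_dir):
    """Point tika-python at an existing server jar via environment variables.

    Call this before `import tika`; `jar_file` and `cache_dir` are
    illustrative names for the local jar and the folder tika may use.
    """
    jar_file = Path(jar_file).resolve()
    cache_dir = Path(cache_dir).resolve()
    # Assumption: a file:// URL of the jar is accepted here.
    os.environ["TIKA_SERVER_JAR"] = jar_file.as_uri()
    # Assumption: TIKA_PATH is where tika-python stores the jar it runs.
    os.environ["TIKA_PATH"] = str(cache_dir)
    return os.environ["TIKA_SERVER_JAR"], os.environ["TIKA_PATH"]


def get_pdf_text(path):
    # Imported lazily so the variables above are set before tika loads.
    from tika import parser

    parsed = parser.from_file(path)
    return parsed["content"]
```

In a script, configure_local_tika would be called once at startup with the location of the jar that was downloaded earlier, followed by get_pdf_text for each file.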