Python: how to use Tika with an existing jar file without downloading it again


Question

I'm using Tika, and I noticed that the server jar file is downloaded and placed in the Temp folder every time:

Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.19/tika-server-1.19.jar to C:\Users\asus\AppData\Local\Temp\tika-server.jar.
Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.19/tika-server-1.19.jar.md5 to C:\Users\asus\AppData\Local\Temp\tika-server.jar.md5.

The problem is that the jar file is around 60 MB, so it takes some time to download.

This is the code I'm using:

from tika import parser

def get_pdf_text(path):
    # from_file returns a dict; the extracted text is under the 'content' key
    parsed = parser.from_file(path)
    return parsed['content']

The only workaround I found is this:

1- Start the server with java -jar tika-server-x.x.jar --port xxxx

2- Set tika.TikaClientOnly = True

3- Call parser.from_file(path, '/path/to/server')
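
Steps 2 and 3 of this workaround might look like the following sketch. It assumes a Tika server was already started by hand as in step 1. The port 9998 and the helper names server_endpoint and get_pdf_text_client_only are hypothetical and chosen only for illustration.

```python
def server_endpoint(port, host="localhost"):
    """Build the URL of a Tika server that is already running (step 1)."""
    return f"http://{host}:{port}"


def get_pdf_text_client_only(path, port=9998):
    """Parse a file against a running server instead of launching a jar."""
    import tika  # imported lazily so server_endpoint works without tika installed

    tika.TikaClientOnly = True  # step 2: act as a client only
    from tika import parser

    # step 3: pass the server endpoint as the second argument
    parsed = parser.from_file(path, server_endpoint(port))
    return parsed["content"]
```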

But I don't want to run the jar file manually. It would be better if I could use Python to run the jar file automatically and set up Tika with it, without downloading it again.

Recommended answer

To resolve this problem, define the TIKA_SERVER_JAR environment variable and set it to the path of the folder that contains the Tika server jar file:

TIKA_SERVER_JAR = 'PATH_OF_FOLDER_CONTAINING_TIKA_SERVER_JAR'
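
One way to apply this from Python itself is to set the variable in os.environ before tika is first imported, since the library appears to read its settings from the environment at that point. The sketch below is only an illustration: configure_tika_env is a hypothetical helper, not part of tika. It also rests on an assumption taken from the tika-python README rather than from the answer above, namely that TIKA_PATH names the folder where the jar is kept and that TIKA_SERVER_JAR can be given as a file:// URL pointing at the jar itself. If the folder form shown above works for your version, prefer that.

```python
import os
from pathlib import Path


def configure_tika_env(jar_dir, jar_name="tika-server.jar"):
    """Set the Tika environment variables for a jar that already exists locally.

    Call this before the first `import tika`.
    """
    jar_dir = Path(jar_dir).resolve()
    jar_file = jar_dir / jar_name
    # Assumption: TIKA_PATH is the folder where tika-python keeps the server jar.
    os.environ["TIKA_PATH"] = str(jar_dir)
    # Assumption: TIKA_SERVER_JAR may be a file:// URL pointing at the jar.
    os.environ["TIKA_SERVER_JAR"] = jar_file.as_uri()
    return jar_file
```

Typical use would be to call configure_tika_env with the folder that holds the jar and only then run from tika import parser. The log in the question shows the library fetching a .md5 file as well, so the checksum file probably needs to sit next to the jar.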
