Python: send a file to Tika running as a service
Question
Following on from this question, I would like to send an MS Word (.doc) file to a Tika application running as a service. How can I do this?
This post describes how to run Tika in server mode: http://mimi.kaktusteam.de/blog-posts/2013/02/running-apache-tika-in-server-mode/
For the Python code that accesses it, though, I am not sure whether I should use sockets, urllib, or something else.
Recommended answer
For remote access to Tika, there are basically two methods available. One is the Tika JAXRS Server, which provides a full RESTful interface. The other is the simple Tika-App --server mode, which works only at the level of a raw network pipe.
For production use, you'll probably want the Tika JAXRS Server, as it's more fully featured. For simple testing and getting started, the Tika App in --server mode ought to be fine.
For the Tika App in --server mode, just connect to the port that Tika-App is listening on, stream your document data to it, and read the HTML back. For example, in one terminal run:
$ java -jar tika-app-1.3.jar --server --port 1234
Then, in another terminal, run:
$ nc 127.0.0.1 1234 < test.pdf
You'll then see the HTML that Tika returns for your test PDF.
From Python, you just need a simple socket call, much as netcat is doing there: send over the binary data, then read back your result. For example, try something like:
#!/usr/bin/env python3
import socket
import sys

# Where to connect
host = '127.0.0.1'
port = 1234

if len(sys.argv) < 2:
    print("Must give filename")
    sys.exit(1)
filename = sys.argv[1]
print("Sending %s to Tika on port %d" % (filename, port))

# Connect to Tika
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((host, port))

# Open the file and stream it to Tika in chunks
with open(filename, 'rb') as f:
    while True:
        chunk = f.read(65536)
        if not chunk:
            # EOF
            break
        s.sendall(chunk)

# Tell Tika we have sent everything
s.shutdown(socket.SHUT_WR)

# Get the response and write the raw bytes to stdout
while True:
    chunk = s.recv(65536)
    if not chunk:
        # EOF
        break
    sys.stdout.buffer.write(chunk)
s.close()
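If you go with the Tika JAXRS Server instead, the request is plain HTTP, so urllib is enough. The following is only a rough sketch under a few assumptions: the server is listening on its default port 9998, the /tika endpoint accepts the document via PUT, and the function name tika_extract is made up for this example. Adjust the URL to match your own setup.

```python
import urllib.request


def tika_extract(path, url="http://127.0.0.1:9998/tika"):
    """PUT the file at `path` to a Tika server and return the response body.

    The default URL is an assumption (Tika server on its default port).
    """
    with open(path, "rb") as f:
        data = f.read()

    req = urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={
            # Ask for plain text back; use "text/html" for HTML output.
            "Accept": "text/plain",
            # Override urllib's default form-encoded Content-Type so the
            # body is presented as generic binary data.
            "Content-Type": "application/octet-stream",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling tika_extract("test.doc") should then return the extracted text as a string, provided a Tika server is actually running at that address.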