Python: send a file to Tika running as a service
Question
With reference to this question, I would like to send an MS Word (.doc) file to a Tika application running as a service. How can I do this?
This link describes how to run Tika in server mode: http://mimi.kaktusteam.de/blog-posts/2013/02/running-apache-tika-in-server-mode/
But for the Python code that accesses it, I am not sure whether I should use sockets, urllib, or something else.
Recommended answer
For remote access to Tika, there are basically two methods available. One is the Tika JAXRS Server, which provides a full RESTful interface. The other is the simple Tika-App --server mode, which just works at a network pipe level.
For production use, you'll probably want to use the Tika JAXRS server, as it's more fully featured. For simple testing and getting started, the Tika App in server mode ought to be fine.
For the latter, just connect to the port that the Tika-App is listening on, stream your document data to it, and read the HTML back. For example, in one terminal run:
$ java -jar tika-app-1.3.jar --server --port 1234
Then in another terminal, run:
$ nc 127.0.0.1 1234 < test.pdf
You'll then see the HTML that Tika returns for your test PDF.
From Python, you just need a simple socket call, much as netcat is doing there: send over the binary data, then read back your result. For example, try something like:
#!/usr/bin/env python3
import socket
import sys

# Where to connect
host = '127.0.0.1'
port = 1234

if len(sys.argv) < 2:
    print("Must give filename")
    sys.exit(1)
filename = sys.argv[1]
print("Sending %s to Tika on port %d" % (filename, port))

# Connect to Tika
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((host, port))

# Open the file to send
f = open(filename, 'rb')

# Stream the file to Tika
while True:
    chunk = f.read(65536)
    if not chunk:
        # EOF
        break
    s.sendall(chunk)
f.close()

# Tell Tika we have sent everything
s.shutdown(socket.SHUT_WR)

# Get the response
while True:
    chunk = s.recv(65536)
    if not chunk:
        # EOF
        break
    # recv() returns bytes, so write them to stdout unmodified
    sys.stdout.buffer.write(chunk)
s.close()
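
If you go the Tika JAXRS server route instead, plain urllib should be enough, since the server speaks HTTP. The following is only a minimal sketch under some assumptions: that the server is listening on 127.0.0.1 at port 9998 (its usual default) and that it accepts a PUT of the raw document bytes at /tika and returns the extracted text. The helper names build_tika_request and extract_text are made up for this example and are not part of Tika.

```python
import urllib.request

# Assumption: a Tika JAXRS server on its usual default port (9998).
TIKA_URL = "http://127.0.0.1:9998/tika"


def build_tika_request(data, url=TIKA_URL, accept="text/plain"):
    """Build an HTTP PUT request carrying the raw document bytes.

    Hypothetical helper for this sketch, not part of any Tika client.
    """
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={
            # Ask for plain text back; "text/html" is assumed to work too.
            "Accept": accept,
            # Generic type, so Tika is left to detect the real format.
            "Content-Type": "application/octet-stream",
        },
    )


def extract_text(filename, url=TIKA_URL):
    """Send a file to the Tika server and return the extracted text."""
    with open(filename, "rb") as f:
        request = build_tika_request(f.read(), url)
    with urllib.request.urlopen(request) as response:
        # Assumption: the server replies in UTF-8.
        return response.read().decode("utf-8")
```

With a server running, calling extract_text("test.doc") should return the text of the Word document as a string. Reading the whole file into memory keeps the sketch short; for very large files you may prefer to stream the body.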