Python: send a file to Tika running as a service

Question

With reference to this question, I would like to send an MS Word (.doc) file to a Tika application running as a service. How can I do this?

This link explains how to run Tika in server mode: http://mimi.kaktusteam.de/blog-posts/2013/02/running-apache-tika-in-server-mode/

For the Python code that accesses it, though, I am not sure whether I should use sockets, urllib, or something else.

Recommended answer

There are basically two ways to access Tika remotely. One is the Tika JAXRS Server, which provides a full RESTful interface. The other is the simpler Tika App --server mode, which works at the level of a plain network pipe.

For production use, you'll probably want the Tika JAXRS Server, as it is more fully featured. For simple testing and getting started, the Tika App in server mode ought to be fine.
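
As a rough illustration of the JAXRS route, the sketch below sends a file with an HTTP PUT using only urllib from the standard library. It assumes a Tika JAXRS server is already running locally on port 9998 (its usual default) and exposes the `/tika` endpoint; the names `TIKA_URL` and `extract_text` are made up for this example, and the response is assumed to be UTF-8 text.

```python
import urllib.request

# Assumption: a Tika JAXRS server is listening on its usual default port.
TIKA_URL = "http://127.0.0.1:9998/tika"


def extract_text(filename, url=TIKA_URL, timeout=60):
    """PUT the raw file bytes to the server and return the reply as a string."""
    with open(filename, "rb") as f:
        data = f.read()
    request = urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={
            # Ask for plain text back.
            "Accept": "text/plain",
            # Set a generic type explicitly; otherwise urllib would label
            # the body as form-encoded data.
            "Content-Type": "application/octet-stream",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumption: the server replies in UTF-8.
        return response.read().decode("utf-8", errors="replace")
```

With such a server running, `print(extract_text("test.doc"))` would print whatever text the server extracts from the document.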

For the Tika App in server mode, just connect to the port it is listening on, stream your document data to it, and read the HTML back. For example, in one terminal run:

$ java -jar tika-app-1.3.jar --server --port 1234

Then, in another terminal, run:

$ nc 127.0.0.1 1234 < test.pdf

You'll then see the HTML that Tika returns for your test PDF.

From Python, you just need a simple socket call that does much what netcat does above: send the binary data, then read back the result. For example, try something like:

#!/usr/bin/env python3
import socket
import sys

# Where to connect
host = '127.0.0.1'
port = 1234

if len(sys.argv) < 2:
    print("Must give filename", file=sys.stderr)
    sys.exit(1)

filename = sys.argv[1]
# Status goes to stderr so that stdout carries only Tika's output
print("Sending %s to Tika on port %d" % (filename, port), file=sys.stderr)

# Connect to Tika
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((host, port))

# Open the file and stream it to Tika in chunks
with open(filename, 'rb') as f:
    while True:
        chunk = f.read(65536)
        if not chunk:
            # EOF
            break
        s.sendall(chunk)

# Tell Tika we have sent everything
s.shutdown(socket.SHUT_WR)

# Get the response
while True:
    chunk = s.recv(65536)
    if not chunk:
        # EOF
        break
    # Write the raw bytes, so chunk boundaries add no stray newlines
    sys.stdout.buffer.write(chunk)

s.close()
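
If the HTML is needed inside a larger program and not just on the terminal, the same exchange can be wrapped in a function that returns the response. This is only a sketch: `tika_app_html` is a name invented here, the timeout value is arbitrary, and decoding the reply as UTF-8 is an assumption about how the Tika App was started.

```python
import socket


def tika_app_html(filename, host="127.0.0.1", port=1234, timeout=60):
    """Send a file to Tika App in --server mode and return its reply as a string."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        # Stream the file in chunks so large documents are not loaded at once.
        with open(filename, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                s.sendall(chunk)

        # Half-close the connection: signals end of input, keeps reading open.
        s.shutdown(socket.SHUT_WR)

        # Collect the reply until the server closes its side.
        parts = []
        while True:
            chunk = s.recv(65536)
            if not chunk:
                break
            parts.append(chunk)

    # Assumption: the output is UTF-8; undecodable bytes are replaced.
    return b"".join(parts).decode("utf-8", errors="replace")
```

With the Tika App from the example above listening on port 1234, `html = tika_app_html("test.pdf")` would hold the returned HTML for further processing.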
