Use pdfplumber and Paramiko to read a PDF file from an SFTP server
Question
I have a direct connection to an SFTP server. The connection works, and I can list the files in the selected directory without any problem. The server holds several kinds of files and I have a separate function to read each kind. Below is the code for .pdf files, which I read with pdfplumber:
# SSH.connect configuration
sftp = ssh.open_sftp()
path = "/server_path/.."
for filename in sftp.listdir(path):
    fullpath = path + "/" + filename
    if filename.endswith('.pdf'):
        # fullpath - full server path with filename - like /server_path/../file.pdf
        # filename - filename without path - like file.pdf
        with sftp.open(fullpath, 'rb') as fl:
            pdf = pdfplumber.open(fl)
In this for loop I want to read all the .pdf files in the chosen directory. The same approach works for me on localhost without any problem.
I tried to solve it with sftp.open(path, 'rb') as fl:, but that does not work here and the following error appears:
Traceback (most recent call last):
    pdf = pdfplumber.open(fl)
    return cls(open(path, "rb"), **kwargs)
TypeError: expected str, bytes or os.PathLike object, not SFTPFile
pdfplumber.open takes the exact path to the file, including its name, as its argument (fullpath in this case). How can I make this work directly from the server? And how should I manage memory here, since I understand that these files are somehow pulled into memory? Please give me some hints.
Recommended answer
Paramiko SFTPClient.open returns a file-like object.
To use a file-like object with pdfplumber, it seems that you can use the load function:
pdf = pdfplumber.load(fl)
You may also want to read:
Reading file opened with Python Paramiko SFTPClient.open method is slow
Because the Paramiko file-like object seems to perform poorly when combined with the pdfplumber.load function, you can download the file to memory instead as a workaround:
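from io import BytesIO

flo = BytesIO()
sftp.getfo(fullpath, flo)  # download the remote file into the in-memory buffer
flo.seek(0)                # rewind so the parser reads from the start
pdf = pdfplumber.load(flo)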
See How to use Paramiko getfo to download file from SFTP server to memory to process it
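On the memory question, the workaround above can be applied to the whole directory loop while keeping only one file's bytes in memory at a time. The following is a minimal sketch, not a definitive implementation. The names iter_remote_pdfs and default_pdf_opener are hypothetical helpers introduced here for illustration. The sketch also assumes that pdfplumber releases which no longer provide load accept a file-like object in pdfplumber.open; check this against the version you have installed.

```python
from io import BytesIO


def default_pdf_opener(buffer):
    """Open an in-memory PDF with pdfplumber (hypothetical helper)."""
    # Imported lazily so the loop below can be tried without pdfplumber installed.
    import pdfplumber

    # Assumption: use pdfplumber.load when the installed release has it,
    # otherwise fall back to pdfplumber.open with the file-like object.
    opener = getattr(pdfplumber, "load", None) or pdfplumber.open
    return opener(buffer)


def iter_remote_pdfs(sftp, path, open_pdf=default_pdf_opener):
    """Yield (filename, pdf) for each .pdf file in a remote directory.

    sftp is expected to behave like a Paramiko SFTPClient, that is, to
    provide listdir(path) and getfo(remotepath, file_object).
    """
    for filename in sftp.listdir(path):
        # Case-insensitive match, slightly broader than the question's check.
        if not filename.lower().endswith(".pdf"):
            continue
        fullpath = path + "/" + filename
        buffer = BytesIO()
        sftp.getfo(fullpath, buffer)  # download the whole file into memory
        buffer.seek(0)                # rewind before handing it to the parser
        pdf = open_pdf(buffer)
        try:
            yield filename, pdf
        finally:
            # Release this file before the next one is downloaded.
            close = getattr(pdf, "close", None)
            if close is not None:
                close()
            buffer.close()
```

With a connected client this could be used as for name, pdf in iter_remote_pdfs(sftp, path): followed by whatever processing each PDF needs. Because the function is a generator, each buffer is closed as soon as the loop moves on to the next file, so peak memory stays close to the size of the largest single PDF rather than the whole directory.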