Redirect output of a function that converts pdf to txt files to a new folder in python


Problem description

I am using Python 3. My code uses pdfminer to convert PDF files to text. I want the resulting .txt files to be written to a new folder called "D:\extracted_text". Currently they end up in the existing folder that the PDFs are read from. How do I redirect the output to a different folder? My code so far:

from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage
from io import StringIO
import glob
import os

def convert(fname, pages=None):
   if not pages:
       pagenums = set()
   else:
       pagenums = set(pages)

   output = StringIO()
   manager = PDFResourceManager()
   converter = TextConverter(manager, output, laparams=LAParams())
   interpreter = PDFPageInterpreter(manager, converter)

   infile = open(fname, 'rb')
   for page in PDFPage.get_pages(infile, pagenums):
       interpreter.process_page(page)
   infile.close()
   converter.close()
   text = output.getvalue()   
   output.close()

   savepath = 'D:/extracted_text/'
   outfile = os.path.splitext(fname)[0] + '.txt'
   comp_name = os.path.join(savepath,outfile)
   print(outfile)
   with open(comp_name, 'w', encoding = 'utf-8') as pdf_file:
       pdf_file.write(text)

   return text    



directory = glob.glob(r'D:\files\*.pdf')  

for myfiles in directory:  
     convert(myfiles)

Recommended answer

The problem is in this line:

outfile = os.path.splitext(os.path.abspath(fname))[0] + '.txt'

If you print outfile, you'll see that it contains the full path of your file. Replace it with:

outfile = os.path.splitext(fname)[0] + '.txt'

This should solve your problem! Note that this will break if 'D:/extracted_text/' does not exist, so either create that directory manually or create it programmatically with os.makedirs.
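As a minimal sketch of creating the folder programmatically: os.makedirs with exist_ok=True creates the directory (and any missing parents) and does nothing if it is already there. The temporary base directory below is an assumption made only so the sketch runs on any machine; in the original script savepath would simply be 'D:/extracted_text/'.

```python
import os
import tempfile

# Hypothetical stand-in for 'D:/extracted_text/' so the sketch runs anywhere.
base_dir = tempfile.mkdtemp()
savepath = os.path.join(base_dir, "extracted_text")

# Create the folder and any missing parents. exist_ok=True turns the call
# into a no-op when the folder already exists, so repeating it is safe.
os.makedirs(savepath, exist_ok=True)
os.makedirs(savepath, exist_ok=True)

# Writing into the folder now succeeds.
comp_name = os.path.join(savepath, "some_file.txt")
with open(comp_name, "w", encoding="utf-8") as txt_file:
    txt_file.write("Here's the extracted text")

print(os.path.isdir(savepath), os.path.isfile(comp_name))
```

Placing the os.makedirs call once, before the loop over the PDF files, is probably enough, since every file goes to the same folder.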

To break the problem down into smaller pieces, open a new file and run this snippet, see if it does the trick, then make the changes in the original code:

import os

fname = "some_file.pdf"
text = "Here's the extracted text"
savepath = 'D:/extracted_text/'
outfile = os.path.splitext(fname)[0] + '.txt'
print(outfile)
comp_name = os.path.join(savepath,outfile)
print(comp_name)

with open(comp_name, 'w', encoding = 'utf-8') as pdf_file:
    pdf_file.write(text)
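One caveat that goes beyond the original answer: the snippet uses a bare file name, but glob.glob(r'D:\files\*.pdf') returns full paths. When the second argument to os.path.join is an absolute path, the earlier components are discarded, so the file would probably still land next to the source PDF. The sketch below shows the effect and a possible fix using os.path.basename. It calls ntpath directly (the module that os.path resolves to on Windows) so that it behaves the same on any operating system; output_path and report.pdf are hypothetical names used for illustration.

```python
import ntpath  # os.path resolves to this module on Windows


def output_path(pdf_path, savepath):
    """Build '<savepath>/<pdf name>.txt' (hypothetical helper)."""
    name = ntpath.basename(pdf_path)   # drop the source folder
    stem = ntpath.splitext(name)[0]    # drop the '.pdf' extension
    return ntpath.join(savepath, stem + ".txt")


fname = r"D:\files\report.pdf"  # shape of a path returned by glob.glob
savepath = "D:/extracted_text/"

# Without basename the second argument is absolute, so join discards savepath.
naive = ntpath.join(savepath, ntpath.splitext(fname)[0] + ".txt")
fixed = output_path(fname, savepath)

print(naive)  # D:\files\report.txt
print(fixed)  # D:/extracted_text/report.txt
```

In the original convert function, on Windows, the equivalent change would be to build outfile from os.path.basename(fname) rather than from fname itself.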
