Getting TypeError: ord() expected string of length 1, but int found error
Problem description
The code is:
from PyPDF2 import PdfFileReader
with open('HTTP_Book.pdf', 'rb') as file:
    pdf = PdfFileReader(file)
    pagedd = pdf.getPage(0)
    print(pagedd.extractText())
This code raises the error shown below:
TypeError: ord() expected string of length 1, but int found
I searched online and found the thread Troubleshooting "TypeError: ord() expected string of length 1, but int found", but it did not help much. I understand what this error means in general, but I am not sure how it relates to this code.
I tried a different PDF file and it works fine. So what is at fault: the PDF file, or is PyPDF2 unable to handle it? I know that, according to the documentation, this method is not very reliable:
This works well for some PDF files, but poorly for others, depending on the generator used
How should this be handled?
Traceback:
Traceback (most recent call last):
  File "pdf_reader.py", line 71, in <module>
    print(pagedd.extractText())
  File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\pdf.py", line 2595, in extractText
    content = ContentStream(content, self.pdf)
  File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\pdf.py", line 2673, in __init__
    stream = BytesIO(b_(stream.getData()))
  File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\generic.py", line 841, in getData
    decoded._data = filters.decodeStreamData(self)
  File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\filters.py", line 350, in decodeStreamData
    data = LZWDecode.decode(data, stream.get("/DecodeParms"))
  File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\filters.py", line 255, in decode
    return LZWDecode.decoder(data).decode()
  File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\filters.py", line 228, in decode
    cW = self.nextCode();
  File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\filters.py", line 205, in nextCode
    nextbits=ord(self.data[self.bytepos])
TypeError: ord() expected string of length 1, but int found
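For context, the last frame of the traceback calls ord() on a single indexed element of the stream data, and the install path shows Python 3.7. On Python 3, indexing a bytes object already yields an int, so ord() rejects it; that line was presumably written with Python 2 strings in mind. The snippet below is a minimal sketch of that failure mode, independent of PyPDF2. The sample bytes and the byte_at helper are made up for illustration and are not part of PyPDF2.

```python
# Minimal reproduction of the failure mode, independent of PyPDF2.
data = b"\x80\x0b\x60"  # arbitrary bytes standing in for an LZW-compressed stream

first = data[0]  # on Python 3 this is already an int (128), not a 1-character string
print(type(first).__name__, first)

try:
    ord(first)  # same call pattern as the failing line in the traceback
except TypeError as exc:
    message = str(exc)
    print(message)


def byte_at(buf, pos):
    """Hypothetical helper: return the byte value at pos for bytes or str input."""
    value = buf[pos]
    # bytes indexing gives an int on Python 3; str indexing gives a 1-char string.
    return value if isinstance(value, int) else ord(value)


print(byte_at(data, 0))
```

This would explain why only some PDFs fail: the error is reached only when a page's content stream uses the LZW filter, so files compressed another way never hit that code path.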
Recommended answer
I figured out the issue. This is just a limitation of PyPDF2. I used tika and BeautifulSoup to parse and extract the text instead, and it worked fine, although it takes a little more work.
from tika import parser
from bs4 import BeautifulSoup
raw=parser.from_file('HTTP_Book.pdf',xmlContent=True)['content']
data=BeautifulSoup(raw,'lxml')
message=data.find(class_='page') # for first page
print(message.text)
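If some PDFs work with PyPDF2 and others do not, one option is to try extractors in order and fall back when one raises. The following is only a sketch under that assumption: extract_with_fallback and the two stand-in extractors are hypothetical names, and the stand-ins simulate the PyPDF2 failure and a working alternative so the example runs without either library installed.

```python
def extract_with_fallback(path, extractors):
    """Try each (name, function) pair in order; return (name, text) from the first success.

    `extractors` is a hypothetical list of (label, function) pairs, where each
    function takes a file path and returns the extracted text.
    """
    errors = []
    for name, extract in extractors:
        try:
            text = extract(path)
        except Exception as exc:  # e.g. the TypeError raised inside the LZW decoder
            errors.append("{}: {!r}".format(name, exc))
            continue
        if text and text.strip():
            return name, text
        errors.append("{}: returned no text".format(name))
    raise RuntimeError("all extractors failed: " + "; ".join(errors))


# Stand-in extractors so the sketch runs without PyPDF2 or tika installed.
def broken_extractor(path):
    raise TypeError("ord() expected string of length 1, but int found")


def working_extractor(path):
    return "text from " + path


used, text = extract_with_fallback(
    "HTTP_Book.pdf",
    [("pypdf2", broken_extractor), ("tika", working_extractor)],
)
print(used, text)
```

In real use, the two callables would wrap the PyPDF2 snippet from the question and the tika snippet above, each taking the file path and returning a string.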