How to recognize a special EOL character when I see it, using Python?

Views: 356 Published: 2020/5/17 19:49:38 Tags: python file-io unicode newline

Problem description

I'm using Python to scrape a set of files that were originally PDFs. After converting them to text, I had a lot of trouble stripping out the line endings. I couldn't figure out what the line separator was. The trouble is, I still don't know.

It's not '\n' and, I think, not '\r\n' either. However, I've managed to isolate one of these special characters. I literally have it in memory, and by calling my_str.replace(eol, ''), I can remove all of these characters from one of my files.
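One way to narrow down the candidates, sketched here in Python 3 rather than the Python 2 the question appears to use, is to count the characters that str.splitlines() treats as line boundaries. The names LINE_BOUNDARY_CHARS and count_line_boundaries and the sample text are hypothetical, and the form feed in the sample is only a stand-in for the unknown separator.

```python
from collections import Counter

# Single characters that str.splitlines() treats as line boundaries,
# as listed in the Python documentation.
LINE_BOUNDARY_CHARS = "\n\r\x0b\x0c\x1c\x1d\x1e\x85\u2028\u2029"


def count_line_boundaries(text):
    """Count each line-boundary character that occurs in text."""
    return Counter(ch for ch in text if ch in LINE_BOUNDARY_CHARS)


# Hypothetical sample: a form feed stands in for the unknown separator.
sample = "first\x0csecond\x0cthird\n"
counts = count_line_boundaries(sample)

for ch, n in counts.most_common():
    # ascii() shows the escape sequence instead of rendering the character.
    print(ascii(ch), n)

# Once a candidate is known, the replace call from the question removes it.
cleaned = sample.replace("\x0c", "")
```

If the separator is not one of these characters, the counter simply comes back empty, which at least rules them out.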

So my question is open-ended. I'm a bit lost when it comes to Unicode and such. How can I identify this character in my files without resorting to something ridiculous, like serializing it and then reading it back in? Is there a way I can refer to it by its code, perhaps? I can't get Python to show me what it actually IS. Whether I print it or call unicode(special_eol), all I ever see is the character in its functional role as a newline.
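A minimal sketch of how the character could be inspected without printing it, assuming Python 3 (where the question's unicode() call no longer exists). The helper describe_char is a hypothetical name, and U+2028 is only a stand-in for the isolated character, whose real identity is not known from the question.

```python
import unicodedata


def describe_char(ch):
    """Return (code point, escaped form, Unicode name) for one character."""
    code_point = "U+{:04X}".format(ord(ch))   # numeric code, e.g. U+2028
    escaped = ascii(ch)                       # escape sequence, never rendered
    name = unicodedata.name(ch, "<unnamed>")  # control characters have no name
    return code_point, escaped, name


# Hypothetical stand-in for the character held in memory.
special_eol = "\u2028"
print(describe_char(special_eol))
```

The escaped form returned by ascii() can be pasted back into source code as a string literal, which gives a way to refer to the character by its code instead of keeping a copy of it around.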

Please help! Thanks, and sorry if I'm missing something obvious.
