Python opens text file with a space between every character


Question

Whenever I try to open a .csv file with the Python command fread = open('input.csv', 'r'), the file always loads with a space between every single character. I'm guessing something is wrong with the text file, because I can open other text files with the same command and they load correctly. Does anyone know why a text file would load like this in Python?

Thanks.

Update

OK, I got it working with the help of Jarret Hardie's post.

This is the code I used to convert the file to ASCII:

# Read the raw bytes, decode them as UTF-16, then drop anything that is not ASCII.
with open('input.csv', 'rb') as fread:
    mytext = fread.read().decode('utf-16')
with open('input-ascii.csv', 'wb') as fwrite:
    fwrite.write(mytext.encode('ascii', 'ignore'))


Recommended answer

The post by recursive is probably right: the contents of the file are likely encoded with a multi-byte charset. If that is in fact the case, you can likely read the file in Python itself, without first converting it outside of Python.

Try something like:

fread = open('input.csv', 'rb').read()
mytext = fread.decode('utf-16')

The 'b' flag ensures the file is read as binary data. You'll need to know (or guess) the original encoding; in this example I've used utf-16, but YMMV. This will convert the file contents to unicode. If you truly have a file with multi-byte chars, I don't recommend converting it to ASCII, as you may end up losing a lot of the characters in the process.
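As a rough illustration of the same idea in Python 3, the decoding can be handed to open() through its encoding argument instead of decoding the bytes by hand. This is only a sketch under assumptions: the helper name read_utf16_csv, the temporary sample file, and its contents are made up for the example, and utf-16 is still a guess about the real file's encoding. The last lines show the kind of loss that an ASCII conversion with 'ignore' causes.

```python
import csv
import os
import tempfile

def read_utf16_csv(path):
    """Return all rows of a UTF-16 encoded CSV file as lists of strings.

    Hypothetical helper for illustration; 'utf-16' is an assumed encoding.
    """
    # newline='' is what the csv module documentation recommends for file objects.
    with open(path, encoding='utf-16', newline='') as f:
        return list(csv.reader(f))

# Build a small UTF-16 sample file so the sketch runs on its own.
sample_path = os.path.join(tempfile.mkdtemp(), 'input.csv')
with open(sample_path, 'w', encoding='utf-16', newline='') as f:
    f.write('ID,Name\r\n1,Zoë\r\n')

rows = read_utf16_csv(sample_path)
print(rows)

# Encoding to ASCII with errors='ignore' silently drops the 'ë'.
ascii_name = rows[1][1].encode('ascii', 'ignore')
print(ascii_name)
```

If the guess about the encoding is wrong, decoding will usually either raise UnicodeDecodeError or produce obviously garbled text, which is a hint to try a different codec.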

Edit: Thanks for uploading the file. There are two bytes at the front of the file which indicate that it does, indeed, use a wide charset. If you're curious, open the file in a hex editor, as some have suggested: you'll see something in the text view like 'I.D.|.' (etc.). The dot is the extra byte for each char.
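For anyone without a hex editor at hand, the same check can be sketched in Python. The two leading bytes described above are most likely a UTF-16 byte order mark, and the standard codecs module exposes constants for both byte orders. The function name sniff_utf16_bom and the sample bytes are assumptions made for this example, not taken from the uploaded file, and the check deliberately ignores other encodings (a UTF-32 little-endian BOM, for instance, starts with the same two bytes).

```python
import codecs

def sniff_utf16_bom(raw):
    """Return a codec name if raw starts with a UTF-16 byte order mark, else None.

    Hypothetical helper; it only looks for UTF-16 and nothing else.
    """
    if raw.startswith(codecs.BOM_UTF16_LE):
        return 'utf-16-le'
    if raw.startswith(codecs.BOM_UTF16_BE):
        return 'utf-16-be'
    return None

# Made-up sample: the text 'ID|' as little-endian UTF-16 with a BOM in front.
raw = codecs.BOM_UTF16_LE + 'ID|'.encode('utf-16-le')
print(raw)  # b'\xff\xfeI\x00D\x00|\x00'

# Each ASCII character is followed by a zero byte; that zero byte is what a
# hex editor shows as a dot and what looks like a space when read as plain text.
print(sniff_utf16_bom(raw))
print(raw.decode('utf-16'))
```

The generic 'utf-16' codec consumes the byte order mark itself, which is why the decoded text contains only the three visible characters.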

The code snippet above seems to work on my machine with that file.
