Python read from file and remove non-ascii characters

Views: 1452. Published: 2016/11/19 16:43:08. Tags: python, encoding, character-encoding, utf

Problem description

I have the following program, which reads a file word by word and writes each word to another file, but without the non-ascii characters from the first file.

import unicodedata
import codecs
infile = codecs.open('d.txt','r',encoding='utf-8',errors='ignore')
outfile = codecs.open('d_parsed.txt','w',encoding='utf-8',errors='ignore')


for line in infile.readlines():
    for word in line.split():
        outfile.write(word+" ")
    outfile.write("\n")

infile.close()
outfile.close()

The only problem I am facing is that this code does not write a new line to the second file (d_parsed.txt). Any clues?

Recommended answer

codecs.open() doesn't support universal newlines; for example, it doesn't translate \r\n to \n while reading on Windows.
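To see the difference concretely, here is a minimal sketch that writes a small file with \r\n line endings and reads it back both ways. The helper name read_lines_both_ways and the sample bytes are made up for illustration; only codecs.open() and io.open() come from the answer.

```python
import codecs
import io
import os
import tempfile


def read_lines_both_ways(raw_bytes):
    """Write raw_bytes to a temporary file, then read it back line by line
    with codecs.open() and with io.open() (hypothetical helper)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw_bytes)
        # codecs.open() reads the file in binary mode underneath,
        # so line endings are passed through untouched.
        with codecs.open(path, "r", encoding="utf-8") as f:
            codecs_lines = list(f)
        # io.open() in text mode applies universal newlines by default.
        with io.open(path, "r", encoding="utf-8") as f:
            io_lines = list(f)
    finally:
        os.remove(path)
    return codecs_lines, io_lines


codecs_lines, io_lines = read_lines_both_ways(b"one two\r\nthree\r\n")
print(codecs_lines)  # line endings kept as '\r\n'
print(io_lines)      # line endings translated to '\n'
```

The same translation happens on every platform when reading with io.open(), because its default newline=None setting enables universal newlines.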

Use io.open() instead:

#!/usr/bin/env python
from __future__ import print_function
import io

with io.open('d.txt','r',encoding='utf-8',errors='ignore') as infile, \
     io.open('d_parsed.txt','w',encoding='ascii',errors='ignore') as outfile:
    for line in infile:
        print(*line.split(), file=outfile)

By the way, if you want to remove non-ascii characters, you should open the output file with the ascii encoding (together with errors='ignore') instead of utf-8, as in the code above.
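The effect of the encoding choice can be checked in memory, without any files. The sketch below is only an illustration: encode_ignoring_errors and the sample string are hypothetical names, and str.encode() with errors="ignore" stands in for what the output file does when it is opened with errors='ignore'.

```python
def encode_ignoring_errors(text, encoding):
    """Encode text, silently dropping characters the codec cannot represent
    (hypothetical helper for illustration)."""
    return text.encode(encoding, errors="ignore")


sample = u"caf\u00e9 na\u00efve \u2713 ok"

# ascii cannot represent the accented letters or the check mark,
# so errors="ignore" drops them.
ascii_bytes = encode_ignoring_errors(sample, "ascii")

# utf-8 can represent every character, so nothing is dropped.
utf8_bytes = encode_ignoring_errors(sample, "utf-8")

print(ascii_bytes)
print(utf8_bytes)
```

With utf-8 as the output encoding there is nothing for errors='ignore' to discard, which is why the question's code leaves the non-ascii characters in place.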

If the input encoding is compatible with ascii (such as utf-8), then you could open the file in binary mode and use bytes.translate() to remove non-ascii characters:

#!/usr/bin/env python
nonascii = bytearray(range(0x80, 0x100))
with open('d.txt','rb') as infile, open('d_parsed.txt','wb') as outfile:
    for line in infile: # b'\n'-separated lines (Linux, OSX, Windows)
        outfile.write(line.translate(None, nonascii))

It doesn't normalize whitespace like the first code example.
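A small in-memory sketch of that difference, assuming utf-8 input: the function name strip_non_ascii_bytes and the sample line are hypothetical, and the explicit split/join step is added here only to show what whitespace normalization would have done.

```python
# Every byte value above 0x7F; in utf-8 these occur only inside
# multi-byte (non-ascii) characters.
NONASCII = bytearray(range(0x80, 0x100))


def strip_non_ascii_bytes(data):
    """Delete all bytes >= 0x80 from a bytes object (hypothetical helper)."""
    return data.translate(None, NONASCII)


line = u"caf\u00e9   na\u00efve\tok\r\n".encode("utf-8")

# Non-ascii characters are removed, but spaces, tabs and the original
# line ending are left exactly as they were.
cleaned = strip_non_ascii_bytes(line)

# Splitting on whitespace and re-joining with single spaces mimics
# what print(*line.split()) does in the first example.
normalized = b" ".join(cleaned.split())

print(cleaned)
print(normalized)
```

If single-space-separated words are wanted from the binary approach too, a split/join step like the one above could be applied to each line before writing.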
