使用python从CSV文件中删除特殊字符 [英] Remove special characters from csv file using python

查看:1435
本文介绍了使用python从CSV文件中删除特殊字符的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这个主题上似乎已经有一些东西(

There seems to something on this topic already (How to replace all those Special Characters with white spaces in python?), but I can't figure this simple task out for the life of me.

我有一个75列,近4000行的.CSV文件.我需要将所有特殊字符"($#& * ect)替换为"_"并写入新文件.这是我到目前为止的内容:

I have a .CSV file with 75 columns and almost 4000 rows. I need to replace all the 'special characters' ($ # & * ect) with '_' and write to a new file. Here's what I have so far:

import csv

input = open('C:/Temp/Data.csv', 'rb')
lines = csv.reader(input)
output = open('C:/Temp/Data_out1.csv', 'wb')
writer = csv.writer(output)

conversion = '-"/.$'
text =  input.read()
newtext = '_'
for c in text:
    newtext += '_' if c in conversion else c
    writer.writerow(c)

input.close()
output.close()

所有这些成功完成的工作就是将所有内容作为一个单独的列写入输出文件,从而产生超过65K的行.此外,特殊字符仍然存在!

All this succeeds in doing is to write everything to the output file as a single column, producing over 65K rows. Additionally, the special characters are still present!

很抱歉,出现了多余的问题. 预先谢谢你!

Sorry for the redundant question. Thank you in advance!

推荐答案

我可能会做类似的事情

import csv

with open("special.csv", "rb") as infile, open("repaired.csv", "wb") as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    conversion = set('_"/.$')
    for row in reader:
        newrow = [''.join('_' if c in conversion else c for c in entry) for entry in row]
        writer.writerow(newrow)

变成

$ cat special.csv
th$s,2.3/,will-be
fixed.,even.though,maybe
some,"shoul""dn't",be

(请注意,我有一个带引号的值)

(note that I have a quoted value) into

$ cat repaired.csv 
th_s,2_3_,will-be
fixed_,even_though,maybe
some,shoul_dn't,be


现在,您的代码将整个文本读入一行:


Right now, your code is reading in the entire text into one big line:

text =  input.read()

_字符开始:

newtext = '_'

遍历text中的每个单个字符:

Looping over every single character in text:

for c in text:

将已更正的字符添加到newtext(非常缓慢):

Add the corrected character to newtext (very slowly):

    newtext += '_' if c in conversion else c

然后将原始字符(?)作为一栏写入新的csv:

And then write the original character (?), as a column, to a new csv:

    writer.writerow(c)

..这不太可能是您想要的. :^)

.. which is unlikely to be what you want. :^)

这篇关于使用python从CSV文件中删除特殊字符的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆