Remove letters and signs from csv file - python 3.7

Problem description

I have a CSV file with a column named activity that contains data like:

instv2-02_00001_20190517235008
instv2 (9)
Insti2(3)
Fbstt1_00001_20190517131933

I need to remove the numbers and any other signs (for example, _) from the names in the 'activity' column only, keeping just the letters. For example, instv3-02_00001_20190517235157, instv1-02_00000_20190517234840, instv1 (4), etc. all need to be renamed to instv. How can I do this in a Python script?

Recommended answer

Using pandas, load the CSV file and apply a regex replacement to the activity column values. The pattern keeps the leading run of letters and discards everything from the first non-letter character onward.

Try the following code:

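import re
import pandas as pd

# Load the CSV file into a DataFrame
df = pd.read_csv('your_file.csv')
# Keep only the leading letters of each activity value (e.g. 'instv2 (9)' -> 'instv')
df['activity'] = df['activity'].apply(lambda x: re.sub(r'^([a-zA-Z]+).*', r'\1', x))
# Write the cleaned data to a new file, without the index column
df.to_csv('output.csv', index=False)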
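
To see what the pattern does without touching a file, here is a minimal sketch that applies it to the sample values from the question using only the standard library. The helper names `leading_letters` and `letters_only` are made up for this illustration; `letters_only` is an alternative reading of "keep just the letters" and is not part of the answer above.

```python
import re

# Same pattern as in the answer: capture the leading run of ASCII letters,
# then match (and discard) the rest of the string.
LEADING = re.compile(r'^([a-zA-Z]+).*')

def leading_letters(value):
    """Hypothetical helper: keep only the letters at the start of the string."""
    return LEADING.sub(r'\1', value)

def letters_only(value):
    """Hypothetical alternative: delete every non-letter, wherever it appears."""
    return re.sub(r'[^a-zA-Z]', '', value)

samples = [
    'instv2-02_00001_20190517235008',
    'instv2 (9)',
    'Insti2(3)',
    'Fbstt1_00001_20190517131933',
]

cleaned = [leading_letters(s) for s in samples]
print(cleaned)  # ['instv', 'instv', 'Insti', 'Fbstt']

# The two helpers differ when letters follow a digit or sign:
print(leading_letters('ab1cd'))  # ab
print(letters_only('ab1cd'))     # abcd
```

For the sample data both approaches give the same result. The anchored pattern from the answer leaves a value unchanged if it does not start with a letter, so which one fits depends on what the rest of the column looks like.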

If this is related to your question here, then you only need to import re and change the last line of that solution to:

import re

# ...

all_df['activity'] = all_df['activity'].apply(lambda x: re.sub(r'^([a-zA-Z]+).*', r'\1', x))
all_df.to_csv('all_data.csv', index=False)
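
One caveat, offered as an assumption about the data rather than something stated in the question: if the activity column has empty cells, pandas reads them as NaN (a float), and passing a non-string to re.sub raises a TypeError. A minimal sketch of a guard, using a hypothetical helper named `clean_activity`:

```python
import re

def clean_activity(value):
    """Hypothetical helper: strip everything after the leading letters.

    Non-string values (such as the NaN pandas uses for empty cells)
    are returned unchanged instead of being passed to re.sub.
    """
    if not isinstance(value, str):
        return value
    return re.sub(r'^([a-zA-Z]+).*', r'\1', value)

print(clean_activity('instv1 (4)'))  # instv
print(clean_activity(None))          # None
```

It could then replace the lambda, as in all_df['activity'].apply(clean_activity).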
