Remove numbers and signs from a CSV file - Python 3.7
Question
I have a CSV file with a column named activity, which has data like:
instv2-02_00001_20190517235008
instv2 (9)
Insti2(3)
Fbstt1_00001_20190517131933
I need to remove numbers and any other signs (for example, _) from the names in the activity column only, so that just the letters remain.
For example, instv3-02_00001_20190517235157, instv1-02_00000_20190517234840, instv1 (4), etc. all need to be renamed to instv. How can I do this in a Python script?
Recommended answer
Using pandas, load the CSV file and apply a regex replacement to the activity column values. The pattern below keeps the leading run of letters in each value and drops everything after it:
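import re
import pandas as pd

# Load the CSV, reduce each activity value to its leading run of letters,
# and write the result to a new file without the index column.
df = pd.read_csv('your_file.csv')
df['activity'] = df['activity'].apply(lambda x: re.sub(r'^([a-zA-Z]+).*', r'\1', x))
df.to_csv('output.csv', index=False)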
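To see what the pattern does before running it on a real file, it can be checked against the sample values from the question using only the standard library. This is a minimal sketch: the helper names keep_leading_letters and keep_all_letters are hypothetical, and the second pattern is an alternative that is not part of the answer above.

```python
import re

# Pattern from the answer: capture the leading run of ASCII letters, drop the rest.
LEADING_LETTERS = re.compile(r'^([a-zA-Z]+).*')
# Hypothetical alternative: delete every character that is not an ASCII letter.
NON_LETTERS = re.compile(r'[^a-zA-Z]')


def keep_leading_letters(value):
    """Return only the letters at the start of value (unchanged if none)."""
    return LEADING_LETTERS.sub(r'\1', value)


def keep_all_letters(value):
    """Return value with every non-letter character removed."""
    return NON_LETTERS.sub('', value)


# Sample values taken from the question.
samples = [
    'instv2-02_00001_20190517235008',
    'instv2 (9)',
    'Insti2(3)',
    'Fbstt1_00001_20190517131933',
]

cleaned = [keep_leading_letters(s) for s in samples]
print(cleaned)  # ['instv', 'instv', 'Insti', 'Fbstt']
```

For these samples both patterns give the same result. They differ when letters appear after the first non-letter: keep_leading_letters('a1b2') returns 'a', while keep_all_letters('a1b2') returns 'ab'. A value that does not start with a letter, such as '123abc', is left unchanged by the answer's pattern because the pattern does not match.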
If this relates to your other question, you only need to import re and change the last line of that solution to:
import re
# ...
all_df['activity'] = all_df['activity'].apply(lambda x: re.sub(r'^([a-zA-Z]+).*', r'\1', x))
all_df.to_csv('all_data.csv', index=False)
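If the activity column may contain empty cells, apply with re.sub raises a TypeError, because pandas reads a missing value as a float NaN. One possible alternative is the vectorized Series.str.replace, which passes missing values through. The sketch below uses a small in-memory DataFrame as a stand-in for the CSV, and it swaps in a different pattern that removes every non-letter character rather than keeping only the leading letters.

```python
import pandas as pd

# Small in-memory stand-in for the CSV; None mimics an empty cell.
df = pd.DataFrame({
    'activity': [
        'instv2-02_00001_20190517235008',
        'instv2 (9)',
        None,
        'Fbstt1_00001_20190517131933',
    ]
})

# Vectorized replacement: drop every character that is not an ASCII letter.
# Missing values stay missing instead of raising an error.
df['activity'] = df['activity'].str.replace(r'[^a-zA-Z]+', '', regex=True)

print(df['activity'].tolist())
```

The cleaned frame can then be written out with to_csv(..., index=False) as in the answer.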