How to remove carriage return in a dataframe
Problem description
I have a dataframe that contains columns named id, country_name, location and total_deaths. While cleaning the data, I came across a value in a row that has '\r' attached. Once the cleaning is complete, I store the resulting dataframe in the destination.csv file. Because that particular row has \r attached, it always creates a new row in the file.
id 29
location Uttar Pradesh\r
country_name India
total_deaths 20
I want to remove the \r. I tried df.replace({'\r': ''}, regex=True), but it isn't working for me.
Is there any other solution? Can somebody help?
In the process above, I iterate over df to check whether \r is present and, if it is, replace it. Here row.replace() and row.str.strip() don't seem to work, or I could be doing it the wrong way.
I don't want to specify a column name or row number when using replace(), because I can't be certain that only the 'location' column will contain \r. Please find the code below.
import re

count = 0
for row_index, row in df.iterrows():
    if re.search(r"\\r", str(row)):
        print(type(row))  # Return type is pandas.Series
        row.replace({r'\\r': ''}, regex=True)
        print(row)
        count += 1
Recommended answer
Another solution is to use str.strip:
df['29'] = df['29'].str.strip(r'\\r')
print(df)
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20
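One caveat with the str.strip call above: its argument is treated as a set of characters to trim from both ends, not as a suffix, so r'\\r' trims any leading or trailing backslash or letter r. The sketch below uses a made-up Series (the value "Kashmir" is a hypothetical example, not taken from the question's data) to show the side effect, with an anchored str.replace as one possible alternative.

```python
import pandas as pd

# Hypothetical sample values; the first ends with a literal backslash + 'r'.
s = pd.Series(["Uttar Pradesh\\r", "Kashmir", "Orissa"])

# strip() removes any of the characters '\' and 'r' from both ends,
# so a value that legitimately ends in 'r' is also shortened.
stripped = s.str.strip(r'\\r')

# A regex anchored at the end of the string removes only the
# literal two-character sequence backslash + 'r'.
anchored = s.str.replace(r'\\r$', '', regex=True)

print(stripped.tolist())
print(anchored.tolist())
```

If values ending in a plain r can occur in the data, the anchored replace is probably the safer of the two.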
If you want to use replace, add the r prefix and one more \ to the pattern:
print(df.replace({r'\\r': ''}, regex=True))
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20
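Which pattern is needed depends on what the cell actually holds: the two visible characters backslash and r, or a single real carriage-return character. The following sketch uses only the standard re module and two illustrative strings (both are assumptions made for the demonstration) to show that r'\\r' matches the former and r'\r' the latter.

```python
import re

literal_text = "Uttar Pradesh\\r"  # ends with two characters: '\' and 'r'
real_cr = "Uttar Pradesh\r"        # ends with one carriage-return character

# r'\\r' is the regex "a backslash followed by r".
cleaned_literal = re.sub(r'\\r', '', literal_text)
untouched_cr = re.sub(r'\\r', '', real_cr)

# r'\r' is the regex for a real carriage return.
cleaned_cr = re.sub(r'\r', '', real_cr)
untouched_literal = re.sub(r'\r', '', literal_text)

print(repr(cleaned_literal), repr(untouched_cr))
print(repr(cleaned_cr), repr(untouched_literal))
```

Printing repr() of a suspect cell is a quick way to tell the two cases apart before choosing a pattern.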
In replace you can specify the column in which to replace, for example:
print(df)
               id               29
0        location  Uttar Pradesh\r
1    country_name            India
2  total_deaths\r               20
print(df.replace({'29': {r'\\r': ''}}, regex=True))
               id             29
0        location  Uttar Pradesh
1    country_name          India
2  total_deaths\r             20
print(df.replace({r'\\r': ''}, regex=True))
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20
EDIT based on comments:
import pandas as pd
df = pd.read_csv('data_source_test.csv')
print(df)
   id  country_name           location  total_deaths
0   1         India          New Delhi           354
1   2         India         Tamil Nadu            48
2   3         India          Karnataka             0
3   4         India      Andra Pradesh            32
4   5         India              Assam           679
5   6         India             Kerala           128
6   7         India             Punjab             0
7   8         India      Mumbai, Thane             1
8   9         India  Uttar Pradesh\r\n            20
9  10         India             Orissa            69
print(df.replace({r'\r\n': ''}, regex=True))
   id  country_name       location  total_deaths
0   1         India      New Delhi           354
1   2         India     Tamil Nadu            48
2   3         India      Karnataka             0
3   4         India  Andra Pradesh            32
4   5         India          Assam           679
5   6         India         Kerala           128
6   7         India         Punjab             0
7   8         India  Mumbai, Thane             1
8   9         India  Uttar Pradesh            20
9  10         India         Orissa            69
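Two details of the call above are easy to miss. First, replace returns a new DataFrame and leaves df unchanged, so the result has to be assigned; the same holds for row.replace inside an iterrows loop, which may be part of why the loop in the question appeared to do nothing. Second, the pattern r'\r\n' only matches the pair together. The sketch below uses a small made-up frame (the stray \r in country_name is an assumption added for illustration) and the broader pattern r'[\r\n]+' to clean every string column without naming any of them.

```python
import pandas as pd

# Hypothetical sample: line-break characters appear in two different columns.
df = pd.DataFrame({
    "id": [8, 9, 10],
    "country_name": ["India", "India\r", "India"],
    "location": ["Mumbai, Thane", "Uttar Pradesh\r\n", "Orissa"],
    "total_deaths": [1, 20, 69],
})

# [\r\n]+ matches any run of carriage returns and/or line feeds.
# replace() returns a new DataFrame, so keep the result in a variable.
cleaned = df.replace({r'[\r\n]+': ''}, regex=True)

print(cleaned)
```

Numeric columns such as total_deaths pass through untouched, since the regex replacement only applies to string values.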
If you need to replace only in the location column:
df['location'] = df.location.str.replace(r'\r\n', '', regex=True)
print(df)
   id  country_name       location  total_deaths
0   1         India      New Delhi           354
1   2         India     Tamil Nadu            48
2   3         India      Karnataka             0
3   4         India  Andra Pradesh            32
4   5         India          Assam           679
5   6         India         Kerala           128
6   7         India         Punjab             0
7   8         India  Mumbai, Thane             1
8   9         India  Uttar Pradesh            20
9  10         India         Orissa            69
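As a rough illustration of why the stray character produced an extra row in destination.csv, the sketch below treats one CSV record as plain text and counts the lines a line-based reader would see. The helper name count_physical_lines and the sample record are hypothetical, and real CSV writers may quote such fields instead, so this is only an approximation of the symptom described in the question.

```python
def count_physical_lines(record):
    """Return how many lines a line-based reader sees in one record."""
    # str.splitlines() treats '\r', '\n' and '\r\n' as line boundaries.
    return len(record.splitlines())

# Hypothetical record as it might appear unquoted in the output file.
dirty = "9,India,Uttar Pradesh\r,20"
clean = dirty.replace("\r", "")

print(count_physical_lines(dirty))
print(count_physical_lines(clean))
```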