How to remove carriage return in a dataframe

Problem description

I have a dataframe with columns named id, country_name, location and total_deaths. While cleaning the data, I came across a value in one row that has '\r' attached. Once cleaning is complete, I store the resulting dataframe in a destination.csv file. Because that row has \r attached, it always creates a new row.

id                               29
location            Uttar Pradesh\r
country_name                  India
total_deaths                     20

I want to remove the \r. I tried df.replace({'\r': ''}, regex=True), but it isn't working for me.

Is there any other solution? Can somebody help?

In the process above, I iterate over df to see whether \r is present. If it is, it needs to be replaced. Here row.replace() and row.str.strip() don't seem to work, or I could be doing it the wrong way.

I don't want to specify the column name or row number when using replace(), because I can't be certain that only the 'location' column will contain \r. Please find the code below.

import re

count = 0
for row_index, row in df.iterrows():
    if re.search(r"\\r", str(row)):
        print(type(row))               # Return type is pandas.Series
        row.replace({r'\\r': ''}, regex=True)
        print(row)
        count += 1

Recommended answer

Another solution is to use str.strip:

# str.strip treats its argument as a set of characters to remove from both ends
df['29'] = df['29'].str.strip(r'\\r')
print(df)
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20

If you want to use replace, add the r prefix and one more \:

print(df.replace({r'\\r': ''}, regex=True))
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20

In replace you can also specify the column in which to replace, for example:

print(df)
               id               29
0        location  Uttar Pradesh\r
1    country_name            India
2  total_deaths\r               20

print(df.replace({'29': {r'\\r': ''}}, regex=True))
               id             29
0        location  Uttar Pradesh
1    country_name          India
2  total_deaths\r             20

print(df.replace({r'\\r': ''}, regex=True))
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20
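
The patterns above use r'\\r', which matches a backslash followed by the letter r (two characters). A real carriage return is a single character and needs the pattern r'\r' instead. The printed frame may look the same in both cases, so it can be worth checking which one your data holds. Here is a minimal sketch of the difference; the sample frame and the variable names are made up for illustration:

```python
import pandas as pd

# Hypothetical sample: the first value ends with a real carriage return
# (one character), the second with a literal backslash followed by "r"
# (two characters).
df = pd.DataFrame({
    "location": ["Uttar Pradesh\r", "Uttar Pradesh\\r"],
    "total_deaths": [20, 20],
})

# The regex r"\r" matches only the carriage-return character.
real_cr_removed = df.replace({r"\r": ""}, regex=True)

# The regex r"\\r" matches only a backslash followed by "r".
literal_removed = df.replace({r"\\r": ""}, regex=True)

# repr() makes the remaining characters visible.
print([repr(v) for v in real_cr_removed["location"]])
print([repr(v) for v in literal_removed["location"]])
```

If neither pattern changes your data, inspecting a suspicious value with repr() should show what is really at the end of the string.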

Edit, following the comments:

import pandas as pd

df = pd.read_csv('data_source_test.csv')
print(df)
   id country_name           location  total_deaths
0   1        India          New Delhi           354
1   2        India         Tamil Nadu            48
2   3        India          Karnataka             0
3   4        India      Andra Pradesh            32
4   5        India              Assam           679
5   6        India             Kerala           128
6   7        India             Punjab             0
7   8        India      Mumbai, Thane             1
8   9        India  Uttar Pradesh\r\n            20
9  10        India             Orissa            69

print(df.replace({r'\r\n': ''}, regex=True))
   id country_name       location  total_deaths
0   1        India      New Delhi           354
1   2        India     Tamil Nadu            48
2   3        India      Karnataka             0
3   4        India  Andra Pradesh            32
4   5        India          Assam           679
5   6        India         Kerala           128
6   7        India         Punjab             0
7   8        India  Mumbai, Thane             1
8   9        India  Uttar Pradesh            20
9  10        India         Orissa            69
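
Since you can't be sure that only location is affected, you could also check which columns contain a carriage return without iterating over the rows. This is only a sketch: the small frame below is a made-up stand-in for data_source_test.csv, and has_cr is a name I chose for the result.

```python
import pandas as pd

# Hypothetical stand-in for the CSV shown above.
df = pd.DataFrame({
    "id": [8, 9, 10],
    "country_name": ["India", "India", "India"],
    "location": ["Mumbai, Thane", "Uttar Pradesh\r\n", "Orissa"],
    "total_deaths": [1, 20, 69],
})

# Convert every column to strings, then test each column for a
# carriage-return character (plain substring search, no regex).
has_cr = df.astype(str).apply(
    lambda col: col.str.contains("\r", regex=False).any()
)

# has_cr is a boolean Series indexed by column name.
print(has_cr)
```

Columns flagged True are the ones that need cleaning; the frame-wide replace above handles them all in one call anyway.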

If you need to replace only in the location column:

# regex=True is required here so that r'\r\n' is read as a pattern matching CR LF
df['location'] = df['location'].str.replace(r'\r\n', '', regex=True)
print(df)
   id country_name       location  total_deaths
0   1        India      New Delhi           354
1   2        India     Tamil Nadu            48
2   3        India      Karnataka             0
3   4        India  Andra Pradesh            32
4   5        India          Assam           679
5   6        India         Kerala           128
6   7        India         Punjab             0
7   8        India  Mumbai, Thane             1
8   9        India  Uttar Pradesh            20
9  10        India         Orissa            69
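
To confirm that the cleaned frame no longer produces an extra row when saved, you can write it out and read it back. The sketch below uses an in-memory buffer in place of destination.csv and a made-up three-row frame. Note that the pattern r'[\r\n]+' is my own variation: it removes carriage returns and line feeds anywhere in a value, not only at the end.

```python
import io

import pandas as pd

# Hypothetical stand-in for the data being cleaned.
df = pd.DataFrame({
    "id": [8, 9, 10],
    "country_name": ["India", "India", "India"],
    "location": ["Mumbai, Thane", "Uttar Pradesh\r\n", "Orissa"],
    "total_deaths": [1, 20, 69],
})

# Remove every carriage return and line feed from all string cells.
cleaned = df.replace({r"[\r\n]+": ""}, regex=True)

# Write to an in-memory buffer (stand-in for destination.csv) and read back.
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))

# The row count should be unchanged after the round trip.
print(round_trip.shape)
print(round_trip["location"].tolist())
```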
