Remove '\n' in text in pandas python


Problem description

The following is the code I currently use to remove \n in the ['text'] column:

df = pd.read_csv('file1.csv')

df['text'].replace('\s+', ' ', regex=True, inplace=True)  # remove extra whitespace
df['text'].replace('\n',' ', regex=True)  # remove \n in text

header = ["text", "word_length", "author"]

df_out = df.to_csv('sn_file1.csv', columns = header, sep=',', encoding='utf-8')

Based on suggestions, I have also tried:

df['text'].replace('\n', '')
df['text'] = df['text'].str.replace('\n', '').str.replace('\s+', ' ').str.strip()

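Output: ' What a smartass! \nLike he knows anything about real estate deals too...'

The code that removes whitespace works, but the \n is not removed. Can anyone help me with this? Thanks.

I also tried the suggestion from the linked question "removing newlines from messy strings in pandas dataframe cells?", but it still does not work.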

Solved:

df['text'].replace(r'\s+|\\n', ' ', regex=True, inplace=True)

Solution

Assuming the changes should apply to the 'text' column, select that column with

df['text']

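Then, to achieve that, one can use pandas.Series.replace (df['text'] is a Series, so this is the Series counterpart of pandas.DataFrame.replace).

Passing regex=True makes it interpret the search string as a regular expression instead of matching it literally.

Following @Wiktor Stribiżew's suggestion, the following does the job: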
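To illustrate what regex=True changes, here is a minimal sketch on a made-up Series (the sample values and variable names are assumptions, not data from the question). Without regex=True, replace only swaps cells whose entire value equals the search string; with it, every match inside each string is substituted.

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
s = pd.Series(["a b", "a  b  c", "x"])

# Default (regex=False): a cell is replaced only if its whole value equals "a b".
literal = s.replace("a b", "Z")

# regex=True: every run of whitespace inside each string is replaced.
pattern = s.replace(r"\s+", "_", regex=True)

print(literal.tolist())  # ['Z', 'a  b  c', 'x']
print(pattern.tolist())  # ['a_b', 'a_b_c', 'x']
```

This is likely why df['text'].replace('\n', '') in the question had no visible effect: no cell is exactly equal to '\n', and the result was not assigned back to the column.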

df['text'] = df['text'].replace(r'\s+|\\n', ' ', regex=True)

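Here \s+ matches any run of whitespace, including real newline characters, while \\n matches a literal backslash followed by the letter n, which is presumably what the text contains, since \s+ alone did not remove it. This regular expression syntax reference may be of help.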
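As a rough check of the pattern, the sketch below builds a small DataFrame containing both a literal backslash-n (two characters) and a real newline. The second row, the names cleaned and collapsed, and the grouped variant of the pattern are assumptions added for illustration; they are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"text": [
    # "\\n" here is a literal backslash followed by "n", as in the question's output.
    " What a smartass! \\nLike he knows anything about real estate deals too...",
    # This row contains a real newline character (hypothetical sample).
    "line one\nline two",
]})

# Pattern from the answer: each whitespace run OR each literal "\n" becomes one space.
cleaned = df["text"].replace(r"\s+|\\n", " ", regex=True).str.strip()

# Assumed variant: group the alternatives so adjacent matches collapse to one space.
collapsed = df["text"].replace(r"(?:\s|\\n)+", " ", regex=True).str.strip()

print(cleaned.tolist())
print(collapsed.tolist())
```

With the answer's pattern, a space directly followed by a literal \n yields two spaces, because each alternative is replaced separately. The grouped variant treats them as one run, which may be preferable if single spacing matters.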

