Remove URLs row by row from a large set of text in a Python pandas DataFrame


Question

I have loaded data into a pandas DataFrame. As the picture suggests, some rows contain URL links. I want to remove all the URL links and replace them with "" (nothing, just wiping them out). Row 4 has a URL, and other rows have URLs too. I want to go through all the rows in the status_message column, find any URL, and remove it. I've been looking at "How to remove any URL within a string in Python", but I'm not sure how to use it on the DataFrame. So row 4 should look like: vote for labour register now.

Recommended answer

You can use str.replace with the case=False parameter:

import pandas as pd

df = pd.DataFrame({'status_message': ['a s sd Www.labour.com',
                                      'httP://lab.net dud ff a',
                                      'a ss HTTPS://dd.com ur o']})
print(df)
             status_message
0     a s sd Www.labour.com
1   httP://lab.net dud ff a
2  a ss HTTPS://dd.com ur o

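# Raw string avoids escape warnings; "\." matches a literal dot after "www".
# regex=True is needed on pandas 2.0+, where str.replace is literal by default.
df['status_message'] = df['status_message'].str.replace(r'http\S+|www\.\S+', '', case=False, regex=True)
print(df)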
  status_message
0        a s sd 
1       dud ff a
2     a ss  ur o
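
The output above still has stray spaces where the URLs used to be (a trailing space in row 0, a leading space in row 1, a double space in row 2). A minimal sketch of one way to tidy that up, assuming you want single spaces between words and no padding at the ends. The name URL_PATTERN and the extra whitespace steps are illustrative additions, not part of the original answer:

```python
import pandas as pd

# Same idea as the answer's pattern; the name URL_PATTERN is just for illustration.
URL_PATTERN = r'http\S+|www\.\S+'

df = pd.DataFrame({'status_message': ['a s sd Www.labour.com',
                                      'httP://lab.net dud ff a',
                                      'a ss HTTPS://dd.com ur o']})

cleaned = (
    df['status_message']
    .str.replace(URL_PATTERN, '', case=False, regex=True)  # drop the URLs
    .str.replace(r'\s+', ' ', regex=True)                  # collapse runs of whitespace
    .str.strip()                                           # trim leading/trailing spaces
)
print(cleaned.tolist())
# ['a s sd', 'dud ff a', 'a ss ur o']
```

Chaining the .str calls keeps everything vectorised over the column, so there is no need to loop over the rows explicitly.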
