Remove URLs row by row from a large set of text in a Python pandas DataFrame
Problem description
I have inserted data into a pandas DataFrame. As the picture shows, some rows contain URL links. I want to remove all the URL links and replace them with " " (nothing, just wiping them out). Row 4 has a URL, and other rows have URLs too. I want to go through all the rows in the status_message column, find any URL, and remove it. I've been looking at "How to remove any URL within a string in Python", but I am not sure how to use it on the DataFrame. So row 4 should look like: vote for labour register now.
Recommended answer
You can use str.replace with the case=False parameter:
import pandas as pd

df = pd.DataFrame({'status_message': ['a s sd Www.labour.com',
                                      'httP://lab.net dud ff a',
                                      'a ss HTTPS://dd.com ur o']})
print(df)
             status_message
0     a s sd Www.labour.com
1   httP://lab.net dud ff a
2  a ss HTTPS://dd.com ur o
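# regex=True is required on pandas 2.0+, where str.replace treats the pattern as a literal string by default.
# The raw string and the escaped dot make "www." match a literal dot.
df['status_message'] = df['status_message'].str.replace(r'http\S+|www\.\S+', '', case=False, regex=True)
print(df)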
   status_message
0         a s sd
1        dud ff a
2      a ss  ur o
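
Removing a URL leaves the surrounding spaces behind, so a row can end up with a leading, trailing, or doubled space. The following is a minimal sketch of one way to tidy that up, not part of the original answer: it chains two more string operations after the URL removal. The sample rows and the url_pattern name are made up for illustration; only the status_message column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({'status_message': [
    'vote for labour http://lab.net/register register now',
    'a s sd Www.labour.com',
    'no link here',
]})

# Pattern from the answer above, with the dot after "www" escaped.
url_pattern = r'http\S+|www\.\S+'

df['status_message'] = (
    df['status_message']
    .str.replace(url_pattern, '', case=False, regex=True)  # drop the URLs
    .str.replace(r'\s+', ' ', regex=True)                   # collapse repeated whitespace
    .str.strip()                                            # trim both ends
)

print(df['status_message'].tolist())
# ['vote for labour register now', 'a s sd', 'no link here']
```

Rows without a URL pass through unchanged, apart from the whitespace normalisation.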