Remove rows from a pandas DataFrame whose sentences are longer than a certain number of words


Problem description

I want to remove rows from a pandas DataFrame when the string in a particular column is longer than a desired length, measured in words.

For example:

Input DataFrame:

X    Y
0    Hi how are you.
1    An apple
2    glass of water
3    I like to watch movie

Now, say I want to remove the rows whose string in column 'Y' contains 4 or more words.

The desired output DataFrame is:

X    Y
1    An apple
2    glass of water

The rows with values 0 and 3 in column 'X' are removed because their strings in column 'Y' contain 4 and 5 words, respectively.

Recommended answer

First split the values on whitespace with Series.str.split, count the words in each row with Series.str.len, and then invert the removal condition from >= to < with Series.lt to get the mask for boolean indexing:

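# keep only the rows whose 'Y' value has fewer than 4 whitespace-separated words
df = df[df['Y'].str.split().str.len().lt(4)]
# alternative: build the "4 or more words" mask and invert it with ~
# df = df[~df['Y'].str.split().str.len().ge(4)]
print(df)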
   X               Y
1  1        An apple
2  2  glass of water
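
For anyone who wants to run the answer end to end, here is a minimal self-contained sketch that rebuilds the sample DataFrame from the question and applies the same filter. The helper name drop_long_sentences and its parameters column and max_words are hypothetical, introduced only for illustration; they are not part of pandas or of the original answer.

```python
import pandas as pd


def drop_long_sentences(df, column, max_words):
    """Return the rows of df whose text in `column` has fewer than `max_words` words.

    Hypothetical helper for illustration; it wraps the one-liner from the answer.
    """
    # str.split() with no arguments splits on runs of whitespace,
    # and str.len() then counts the resulting tokens in each row.
    word_counts = df[column].str.split().str.len()
    # Boolean indexing keeps only the rows where the mask is True.
    return df[word_counts.lt(max_words)]


# Sample data taken from the question.
df = pd.DataFrame({
    "X": [0, 1, 2, 3],
    "Y": [
        "Hi how are you.",
        "An apple",
        "glass of water",
        "I like to watch movie",
    ],
})

filtered = drop_long_sentences(df, "Y", 4)
print(filtered)
```

Passing the threshold as an argument makes the cutoff easy to change. Note that the filtered frame keeps the original index labels (1 and 2 here); calling reset_index(drop=True) on the result would renumber them from 0 if that is preferred.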
