Remove rows from a pandas DataFrame whose sentences are longer than a certain word count
Problem description
I want to remove rows from a pandas DataFrame where the string in a particular column contains more words than a desired limit.
For example:
Input DataFrame:
X Y
0 Hi how are you.
1 An apple
2 glass of water
3 I like to watch movie
Now, say I want to remove the rows whose string in column Y contains 4 or more words.
The desired output DataFrame is:
X Y
1 An apple
2 glass of water
The rows with X values 0 and 3 are removed because their strings in column Y contain 4 and 5 words, respectively.
Recommended answer
First split the values on whitespace with Series.str.split, then count the words in each row with Series.str.len. Because the rows to remove satisfy >= 4, invert the condition to < 4 with Series.lt and use the resulting mask for boolean indexing:
df = df[df['Y'].str.split().str.len().lt(4)]
# Alternative: invert the >= mask with ~
# df = df[~df['Y'].str.split().str.len().ge(4)]
print(df)
X Y
1 1 An apple
2 2 glass of water
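The answer's snippet assumes df already exists. A minimal self-contained sketch of the same approach follows, rebuilding the question's input frame. The names max_words, word_counts, filtered and filtered_alt are illustrative and do not come from the original answer.

```python
import pandas as pd

# Sample data mirroring the input frame from the question
df = pd.DataFrame({
    "X": [0, 1, 2, 3],
    "Y": ["Hi how are you.", "An apple", "glass of water", "I like to watch movie"],
})

# Illustrative threshold name: rows with this many words or more are dropped
max_words = 4

# Split each string on whitespace, then count the resulting tokens per row
word_counts = df["Y"].str.split().str.len()

# Keep only rows with fewer than max_words words
filtered = df[word_counts.lt(max_words)]

# Same result here: negate the ">= max_words" mask
filtered_alt = df[~word_counts.ge(max_words)]

print(word_counts.tolist())
print(filtered)
```

Storing the word counts in a separate Series makes the mask easier to inspect before filtering. One caveat, assuming column Y may contain missing values: the count for such a row is NaN, so the lt mask drops the row while the negated ge mask keeps it, and the two variants only agree when Y has no missing values.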