Filtering out rows that have a string field contained in one of the rows of another column of strings
Problem description
I have two pandas DataFrames:
csv1 = pandas.read_csv('test1')
csv2 = pandas.read_csv('test2')
How can I display all rows of csv1 whose str1 field is not a substring of the str2 field in any row of csv2?
Note: I tried str.contains, but its pat parameter apparently has to be a single string, not a column of strings.
Example:
#csv1
id str1
1 abc
2 def
3 ghi
4 xyz
#csv2
data1 str2
69236 pghiww
9623 habcrv
6152 de
The output should then be:
id str1
2 def
4 xyz
Indeed, the str1 fields of rows 2 and 4 are not contained in the str2 field of any row of csv2.
Recommended answer
The challenge with this problem is not only to detect whether a match exists, but to work out what matches what, and filter accordingly. One option is to use str.contains in a list comprehension, testing each str1 value against the whole str2 column:
csv1 = csv1.iloc[[~csv2.str2.str.contains(x).any() for x in csv1.str1]]
print(csv1)
   id str1
1   2  def
3   4  xyz
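For anyone who wants to try this without the test1 and test2 files, here is a minimal self-contained sketch of the same comprehension idea. It builds the sample data from the question inline. The names keep and result are illustrative and not part of the original answer. Passing regex=False is an added assumption that str1 values should be matched as literal text rather than as regular expressions, and not is used instead of ~ so that each element of the mask is a plain Python bool.

```python
import pandas as pd

# Sample data from the question, built inline instead of read from CSV files.
csv1 = pd.DataFrame({"id": [1, 2, 3, 4],
                     "str1": ["abc", "def", "ghi", "xyz"]})
csv2 = pd.DataFrame({"data1": [69236, 9623, 6152],
                     "str2": ["pghiww", "habcrv", "de"]})

# One boolean per row of csv1: True when its str1 occurs in no str2 value.
# regex=False (an assumption here) treats each str1 as literal text.
keep = [
    not csv2["str2"].str.contains(s, regex=False).any()
    for s in csv1["str1"]
]

# Boolean mask selection keeps the original index labels of csv1.
result = csv1.loc[keep]
print(result)
```

This still performs one pass over csv2 per row of csv1, so it should be fine for small inputs like the example but may become slow when both frames are large.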