Is there a way to compare the values of a Pandas DataFrame with the values of a second DataFrame?
Question
I have 2 Pandas DataFrames with 5 columns and about 1000 rows each (working with Python 3).
I'm interested in making a comparison between the first column of df1 and the first column of df2, as follows:
DF1
[index] [col1]
1 "foobar"
2 "acksyn"
3 "foobaz"
4 "ackfin"
... ...
DF2
[index] [col1]
1 "old"
2 "fin"
3 "new"
4 "bar"
... ...
What I want to achieve is this: for each row of DF1, if DF1.col1 ends with any of the values in DF2.col1, drop the row.
In this example the resulting DF1 should be:
DF1
[index] [col1]
2 "acksyn"
3 "foobaz"
... ...
(Note that the DF2 values at indexes 2 and 4 are the final part of the DF1 values at indexes 4 and 1, respectively.)
I tried using an internally defined function like:
def check_presence(df1_col1, second_csv):
    for index, row in second_csv.iterrows():
        search_string = "(?P<first_group>^(" + some_string + "))(?P<the_rest>" + row["col1"] + "$)"
        if re.search(search_string, df1_col1):
            return True
    return False
and instructions with this format:
indexes = csv[csv.col1.str.contains(some_regex, regex= True, na=False)].index
but in both cases the Python console complains about not being able to compare non-string objects with a string.
What am I doing wrong? I can even try a solution after joining the 2 CSVs, but I think I would need to do the same thing in the end.
Thanks for your patience, I'm new to Python...
Recommended answer
Possible solution
"for each row of DF1, if DF1.col1 ends in any values of DF2.col1, drop the row."
If I understand correctly, this is a one-liner:
# Search for substring matches.
# Build an "OR" regex pattern by joining the df2 values with "|".
# Drop the rows of df1 that match.
df1[~df1.col1.str.contains('|'.join(df2.col1.values))]
This will keep only the rows of DF1 where no value of DF2.col1 is found in DF1.col1.
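Note that str.contains matches a substring anywhere in the value, while the question asks specifically about values that end with a DF2 entry. A minimal sketch of an end-anchored variant follows, assuming the sample data shown in the question; the names suffixes, pattern, mask and result are illustrative and not from the original posts. It also assumes the DF2 values should be treated as literal text (hence re.escape) and that missing or non-string values in either column should simply not count as matches, which may or may not be the cause of the error reported in the question.

```python
import re

import pandas as pd

# Sample data copied from the question (indexes 1-4).
df1 = pd.DataFrame({"col1": ["foobar", "acksyn", "foobaz", "ackfin"]}, index=[1, 2, 3, 4])
df2 = pd.DataFrame({"col1": ["old", "fin", "new", "bar"]}, index=[1, 2, 3, 4])

# Assumption: skip missing values in df2 and treat the rest as plain strings.
suffixes = df2["col1"].dropna().astype(str)

# Escape each value so regex metacharacters are matched literally,
# join them with "|" and anchor the whole alternation at the end with "$".
pattern = "(?:" + "|".join(re.escape(s) for s in suffixes) + ")$"

# na=False makes missing or non-string entries in df1 evaluate to "no match".
mask = df1["col1"].str.contains(pattern, regex=True, na=False)

# Keep only the rows that do not end with any df2 value.
result = df1[~mask]
print(result)
```

With the sample data, the rows at indexes 1 ("foobar") and 4 ("ackfin") are dropped, leaving indexes 2 and 3 as in the expected result from the question.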