Is there a way to compare the values of a Pandas DataFrame with the values of a second DataFrame?

Question

I have 2 Pandas DataFrames with 5 columns and about 1000 rows each (working with Python 3).
I'm interested in comparing the first column of df1 with the first column of df2 as follows:

DF1
[index]   [col1]
1         "foobar"
2         "acksyn"
3         "foobaz"
4         "ackfin"
...       ...

DF2
[index]   [col1]
1         "old"
2         "fin"
3         "new"
4         "bar"
...       ...

What I want to achieve is this: for each row of DF1, if DF1.col1 ends in any values of DF2.col1, drop the row.
In this example the resulting DF1 should be:

DF1
[index]   [col1]
2         "acksyn"
3         "foobaz"
...       ...

(DF2 rows 2 and 4, "fin" and "bar", are the endings of DF1 rows 4 and 1 respectively)


I tried using an internally defined function like:

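import re

def check_presence(df1_col1, second_csv):
    # Return True if df1_col1 ends with any value of second_csv.col1.
    # Note: some_string is not defined in this snippet.
    for index, row in second_csv.iterrows():
        search_string = "(?P<first_group>^(" + some_string + "))(?P<the_rest>" + row["col1"] + "$)"
        if re.search(search_string, df1_col1):
            return True
    return False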

and statements of this form:
indexes = csv[csv.col1.str.contains(some_regex, regex=True, na=False)].index
but in both cases the Python console complains that it cannot compare non-string objects with a string.


What am I doing wrong? I could even try a solution after joining the 2 CSVs, but I think I would need to do the same thing in the end.
Thanks for your patience, I'm new to Python...

Recommended answer

Possible solution

"for each row of DF1, if DF1.col1 ends in any values of DF2.col1, drop the row."

If I understand this correctly, it can be done in one line:

# Search for a substring:
# build a regex "OR" pattern by joining the DF2 values with "|",
# then drop the DF1 rows that match.
df1[~df1.col1.str.contains('|'.join(df2.col1.values))]

This keeps only the rows of DF1 whose col1 does NOT contain any value of DF2.col1. Note that str.contains matches anywhere in the string, not only at the end.

pd.Series.str.contains
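Since the question asks for values that END with an entry of DF2.col1, a variant anchored at the end of the string may be closer to what is wanted. The following is a minimal sketch, not the answer author's code: it assumes the sample data from the question, escapes each DF2 value with re.escape so that regex metacharacters are taken literally, and appends "$" to anchor the match. The names suffixes, pattern, ends_with_suffix and result are introduced here for illustration only.

```python
import re

import pandas as pd

# Sample data taken from the question.
df1 = pd.DataFrame({"col1": ["foobar", "acksyn", "foobaz", "ackfin"]}, index=[1, 2, 3, 4])
df2 = pd.DataFrame({"col1": ["old", "fin", "new", "bar"]}, index=[1, 2, 3, 4])

# Drop missing values and make sure every suffix is a string.
suffixes = df2["col1"].dropna().astype(str)

# Escape each suffix, join them into an alternation, and anchor it at
# the end of the string. The non-capturing group keeps "$" applied to
# every alternative, not just the last one.
pattern = "(?:" + "|".join(re.escape(s) for s in suffixes) + ")$"

# na=False makes missing values in df1.col1 count as "no match",
# so those rows are kept instead of producing NaN in the mask.
ends_with_suffix = df1["col1"].str.contains(pattern, regex=True, na=False)

# Keep only the rows that do not end with any of the suffixes.
result = df1[~ends_with_suffix]
print(result)
```

With the sample data, the rows at index 2 ("acksyn") and 3 ("foobaz") remain. Passing na=False may also help with the error described in the question if it is caused by missing or non-string values in the column, but that cause is a guess, since the question does not show the data that triggers it.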
