Subtracting two dates from columns in two pandas DataFrames

Question

I have the following code:

for tup in unique_tuples:
    user_review = reviews_prior_to_influence_threshold[(reviews_prior_to_influence_threshold.business_id == tup[0]) & (reviews_prior_to_influence_threshold.user_id == tup[1])]     

    for friend in tup[2]:
        friend_review = reviews_prior_to_influence_threshold[(reviews_prior_to_influence_threshold.business_id == tup[0]) & (reviews_prior_to_influence_threshold.user_id == friend)] 

        if (friend_review.date - user_review.date) <= 62:
            tup[2].remove(friend)

I'm extracting values from a list of tuples and matching them to values in a column from a dataframe, then masking the row where that value is equal to true.

user_review is a single row, representing the review that the user made on a business. friend_review is also a single row, representing the review that the user's friend made.

tup[2] is a list of friend_ids of the user_id in tup[1]. So I am looping through each friend of a user, and then match that friend_id to his review for a business.

Essentially I am looking to see if, for 2 different reviews by 2 different users, the difference between the friend_review.date and the user_review.date is <= +2 months. If the difference isn't less than 2 months, I want to remove the friend_id from the tup[2] list.

Both the dates in both dataframes/rows are of the data type datetime64[ns], and each date is formatted as such "yyyy-mm-dd", so I thought I could easily subtract them to see if there was a difference of less than 2 months between reviews.

However, I keep getting the following error:

TypeError: invalid type comparison

It also mentions that Numpy does not like comparisons vs "None", which I'm also a bit confused about since I have no null values in my column.

UPDATE (solution): I ended up appending to a new list instead of deleting from the current one, and this works.

#to append tuples
business_reviewer_and_influenced_reviewers = []

#loop through each user and create a single row df based on a match from the reviews df and our tuple values
for tup in unique_tuples:
    user_review_date = reviews_prior_to_influence_threshold.loc[(reviews_prior_to_influence_threshold.business_id == tup[0]) & 
                                                                (reviews_prior_to_influence_threshold.user_id == tup[1]), 'date']     

    user_review_date = user_review_date.values[0]

    #loop through list each friend of the reviewer that also reviewed the business in tup[2]
    for friend in tup[2]:
        friend_review_date = reviews_prior_to_influence_threshold.loc[(reviews_prior_to_influence_threshold.business_id == tup[0]) & 
                                                                      (reviews_prior_to_influence_threshold.user_id == friend), 'date']

        friend_review_date = friend_review_date.values[0]
        diff = pd.to_timedelta(friend_review_date - user_review_date).days

        #append business_id, reviewer, and influenced_reviewer as a tuple to a list
        if (diff >= 0) and (diff <= 62):
            business_reviewer_and_influenced_reviewers.append((tup[0], tup[1], friend))
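
As a side note on why building a new list tends to be safer than the original tup[2].remove(friend): removing items from a list while looping over that same list makes the loop skip elements. The sketch below uses made-up friend ids (f1 to f4 and the to_drop set are hypothetical, not from the question's data) to show the effect.

```python
# Hypothetical friend ids, for illustration only
friends = ["f1", "f2", "f3", "f4"]
to_drop = {"f1", "f2"}  # ids we intend to remove

# Removing while iterating: after "f1" is removed, the remaining items
# shift left and the loop never visits "f2".
mutated = list(friends)
for friend in mutated:
    if friend in to_drop:
        mutated.remove(friend)

# Building a new list visits every element exactly once.
kept = [friend for friend in friends if friend not in to_drop]

print(mutated)
print(kept)
```

Here mutated still contains "f2" even though it was meant to be dropped, while kept contains only the intended ids.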

Recommended answer

The dates in your dataframe are likely not datetime64 dtype instances, hence the invalid type comparison. You can check with df.dtypes. If they are not, convert them with df.date = pd.to_datetime(df.date).
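
A minimal sketch of that check and conversion, assuming a small made-up frame (the reviews data below is hypothetical) whose date column was loaded as strings:

```python
import pandas as pd

# Hypothetical sample data: dates stored as "yyyy-mm-dd" strings
reviews = pd.DataFrame({
    "business_id": ["b1", "b1"],
    "user_id": ["u1", "u2"],
    "date": ["2015-01-10", "2015-02-20"],
})

# Inspect the column dtypes; a string column is not a datetime column
print(reviews.dtypes)
was_datetime_before = pd.api.types.is_datetime64_any_dtype(reviews["date"])

# Convert in place, then check again
reviews["date"] = pd.to_datetime(reviews["date"])
is_datetime_after = pd.api.types.is_datetime64_any_dtype(reviews["date"])

print(was_datetime_before, is_datetime_after)
```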

You likely have some dates in your dataframe that are null, hence the comparisons vs. "None". Filter them out with df[pd.notnull(df.date)].
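
A short sketch of counting and filtering missing dates, again on hypothetical data. In a datetime column a missing value shows up as NaT, which isna/notna (and pd.notnull) treat as null:

```python
import pandas as pd

# Hypothetical sample data with one missing date
reviews = pd.DataFrame({
    "user_id": ["u1", "u2", "u3"],
    "date": pd.to_datetime(["2015-01-10", None, "2015-02-20"]),
})

# Count the missing dates before deciding what to do with them
n_missing = reviews["date"].isna().sum()

# Keep only the rows that have a date
clean = reviews[reviews["date"].notna()]

print(n_missing)
print(clean)
```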

BTW: Subtracting the dates gives you a timedelta, not a number, so you'll likely need to do something like (friend_review.date - user_review.date).dt.days <= 62.
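
One thing worth checking when the two dates come from different rows of the same frame: pandas aligns Series on their index labels before subtracting, so two single-row Series with different labels produce NaT rather than a difference. The sketch below, on hypothetical data, shows that effect and one way around it by pulling out scalar timestamps first (similar in spirit to the .values[0] approach in the question's update).

```python
import pandas as pd

# Hypothetical sample data: two reviews of the same business
reviews = pd.DataFrame({
    "business_id": ["b1", "b1"],
    "user_id": ["u1", "u2"],
    "date": pd.to_datetime(["2015-01-10", "2015-02-20"]),
})

user_review = reviews[reviews["user_id"] == "u1"]    # row label 0
friend_review = reviews[reviews["user_id"] == "u2"]  # row label 1

# Series subtraction aligns on index labels; labels 0 and 1 do not match,
# so every entry of the result is NaT.
aligned = friend_review["date"] - user_review["date"]

# Extract scalar timestamps, then subtract to get a single Timedelta
user_date = user_review["date"].iloc[0]
friend_date = friend_review["date"].iloc[0]
diff = friend_date - user_date
diff_days = diff.days

# Compare Timedelta with Timedelta (or compare diff_days with an int)
within_window = pd.Timedelta(0) <= diff <= pd.Timedelta(days=62)

print(aligned)
print(diff_days, within_window)
```

Comparing against pd.Timedelta(days=62) keeps both sides the same type; comparing diff_days with the plain integer 62 is equivalent for whole-day dates.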
