Python/Pandas - Replacing an element in one dataframe with a value from another dataframe
Question
I have an issue with replacing an element in one pandas DataFrame with a value from another pandas DataFrame. Apologies for the long post; I have tried to give many intermediate examples to clarify my problem. I use Python 2.7.11 (Anaconda 4.0.0, 64-bit).
Data
I have a pandas DataFrame containing many user item pairs. This DataFrame (let's call it the initial_user_item_matrix) is of the form:
   userId  itemId  interaction
1       1       1            1
2       1       2            0
3       1       3            1
4       1       4            1
5       2       9            1
6       3       3            1
7       3       5            0
Furthermore, I have a DataFrame containing only the user item pairs of user 1. I call this the cold_user_item_matrix, and it is of the form:
   userId  itemId  interaction
1       1       1            1
2       1       2            0
3       1       3            1
4       1       4            1
Next, I have a numpy ndarray of items, which I call ranked_items. It is of the form:
[9 5 3 4]
Finally, I change the interactions of user 1 in the initial_user_item_matrix to NaN, which gives the following DataFrame (call it new_user_item_matrix):
   userId  itemId  interaction
1       1       1          NaN
2       1       2          NaN
3       1       3          NaN
4       1       4          NaN
5       2       9            1
6       3       3            1
7       3       5            0
What do I want to achieve?
I want to change the interaction of the user 1 item pairs in the new_user_item_matrix (currently NaN) to the value of that particular interaction in the initial_user_item_matrix if and only if the item is contained in the ranked_items array. Afterwards, all user item pairs (rows of the DataFrame) whose interaction is still NaN should be removed (that is, the user 1 item pairs whose itemId is not in ranked_items). See below what the result set should look like.
Intermediate result:
   userId  itemId  interaction
1       1       1          NaN
2       1       2          NaN
3       1       3            1
4       1       4            1
5       2       9            1
6       3       3            1
7       3       5            0
Final result:
   userId  itemId  interaction
3       1       3            1
4       1       4            1
5       2       9            1
6       3       3            1
7       3       5            0
What have I tried?
This is my code:
for item in ranked_items:
    if new_user_item_matrix.loc[new_user_item_matrix['userId']==cold_user].loc[new_user_item_matrix['itemId']==item].empty:
        pass
    else:
        new_user_item_matrix.replace(
            to_replace=new_user_item_matrix.loc[new_user_item_matrix['userId']==1].loc[new_user_item_matrix['itemId']==item].iloc[0,2],
            value=cold_user_item_matrix.loc[cold_user_item_matrix['itemId']==item].iloc[0,2],
            inplace=True)
new_user_item_matrix.dropna(axis=0,how='any',inplace=True)
What does it do? It loops over all items in the ranked_items array. First, it checks whether user 1 has interacted with the item (the if-part of the if statement). If not, it moves on to the next item in the ranked_items array (pass). If user 1 did interact with the item (the else-part of the if statement), it replaces the interaction of user 1 with that item in the new_user_item_matrix (currently NaN) with the value of the same interaction in the cold_user_item_matrix, which is either a 1 or a 0 (I hope you are all still with me).
What is going wrong?
The if-part of the if statement does not give any problems. Things go wrong when I try to replace the value in the new_user_item_matrix (the else-part of the if statement). When replacing the particular element (the interaction), it replaces not only that element but also ALL other values that are NaN in the new_user_item_matrix. To illustrate: when the loop starts, it first visits itemIds 9 and 5, which user 1 has not interacted with (hence nothing happens). Next, it visits itemId 3, and the interaction for userId 1 and itemId 3 should change from NaN to 1. But it changes not only the interaction for userId 1 and itemId 3, but also all other interactions of user 1 that are NaN, giving the following result set:
   userId  itemId  interaction
1       1       1            1
2       1       2            1
3       1       3            1
4       1       4            1
5       2       9            1
6       3       3            1
7       3       5            0
This is obviously incorrect: itemIds 1 and 2 are not in the ranked_items array, so their true interactions should not be uncovered. Moreover, the interaction for user 1 and itemId 3 (a 1) is filled in for all of user 1's interactions, even where the true interaction is a 0 rather than a 1.
Can anybody help me out here?
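The symptom described in the question is consistent with how DataFrame.replace works: it matches cells by value across the whole frame rather than by position, so passing NaN as to_replace touches every NaN. The sketch below reproduces this on a cut-down frame; the names demo, replaced, targeted and cell, and the mask-based .loc assignment, are illustrative assumptions and not part of the original code.

```python
import numpy as np
import pandas as pd

# Cut-down stand-in for new_user_item_matrix (illustrative data only).
demo = pd.DataFrame({
    "userId": [1, 1, 1, 2],
    "itemId": [1, 2, 3, 9],
    "interaction": [np.nan, np.nan, np.nan, 1.0],
})

# replace() matches by value: every NaN in the frame is replaced,
# not just the single cell that was used to look the value up.
replaced = demo.replace(to_replace=np.nan, value=1.0)
print(replaced["interaction"].isna().sum())

# A positional alternative: address exactly one cell with a boolean mask.
targeted = demo.copy()
cell = (targeted["userId"] == 1) & (targeted["itemId"] == 3)
targeted.loc[cell, "interaction"] = 1.0
print(targeted)
```

With the mask-based assignment only the row for user 1 and item 3 changes, which is the behaviour the loop was meant to have.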
Recommended answer
Short solution
In essence, you want to discard a given user's item interactions, but only for items that are not ranked.
To make the proposed solutions more readable, assume df = initial_user_item_matrix. The column names userID and itemID follow the code further down; the question spells them userId and itemId.
Simple row selection with boolean conditions (returns a new DataFrame and leaves the original df unchanged):
filtered_df = df[(df.userID != 1) | df.itemID.isin(ranked_items)]
Similar solution that modifies the dataframe in place by dropping the "invalid" rows:
df.drop(df[(df.userID == 1) & ~df.itemID.isin(ranked_items)].index, inplace=True)
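As a self-contained check, the two one-liners can be run against the sample data from the question. This is a sketch that assumes the userID / itemID spelling used in the answer's code and rebuilds the frame inline; dropped_df and unranked are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1], [1, 2, 0], [1, 3, 1], [1, 4, 1], [2, 9, 1], [3, 3, 1], [3, 5, 0]],
    columns=["userID", "itemID", "interaction"],
)
ranked_items = np.array([9, 5, 3, 4])

# Variant 1: keep every row of the other users, plus user 1's ranked items.
filtered_df = df[(df.userID != 1) | df.itemID.isin(ranked_items)]

# Variant 2: drop user 1's unranked items in place (on a copy, so that
# df itself stays available for comparison).
dropped_df = df.copy()
unranked = dropped_df[(dropped_df.userID == 1) & ~dropped_df.itemID.isin(ranked_items)]
dropped_df.drop(unranked.index, inplace=True)

print(filtered_df)
print(filtered_df.equals(dropped_df))
```

Both variants keep the original index labels, so the surviving rows can still be traced back to the initial matrix.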
Step-by-step solution using all intermediate structures
Assuming all of the intermediate artifacts mentioned above are required, the desired result can be obtained as follows:
import numpy as np
import pandas as pd

# Sample data from the question.
initial_user_item_matrix = pd.DataFrame([[1, 1, 1],
                                         [1, 2, 0],
                                         [1, 3, 1],
                                         [1, 4, 1],
                                         [2, 9, 1],
                                         [3, 3, 1],
                                         [3, 5, 0]],
                                        columns=['userID', 'itemID', 'interaction'])
print("initial_user_item_matrix\n{}\n".format(initial_user_item_matrix))

ranked_items = np.array([9, 5, 3, 4])
cold_user = 1

# All rows that belong to the cold user.
cold_user_item_matrix = initial_user_item_matrix.loc[initial_user_item_matrix.userID == cold_user]
print("cold_user_item_matrix\n{}\n".format(cold_user_item_matrix))

# Hide the cold user's interactions. The column is cast to float first so that
# it can hold NaN; .ix was removed from pandas, so .loc is used instead.
new_user_item_matrix = initial_user_item_matrix.copy()
new_user_item_matrix['interaction'] = new_user_item_matrix['interaction'].astype(float)
is_cold = new_user_item_matrix.userID == cold_user
new_user_item_matrix.loc[is_cold, 'interaction'] = np.nan
print("new_user_item_matrix\n{}\n".format(new_user_item_matrix))

# Restore the interaction only for ranked items; the assignment aligns on the index.
new_user_item_matrix.loc[is_cold, 'interaction'] = cold_user_item_matrix.apply(
    lambda r: r.interaction if r.itemID in ranked_items else np.nan, axis=1)
print("new_user_item_matrix after replacing\n{}\n".format(new_user_item_matrix))

# Drop the rows whose interaction is still NaN.
new_user_item_matrix.dropna(inplace=True)
print("new_user_item_matrix after dropping nans\n{}\n".format(new_user_item_matrix))
which produces (the interaction column is printed as float once it contains NaN):

initial_user_item_matrix
   userID  itemID  interaction
0       1       1            1
1       1       2            0
2       1       3            1
3       1       4            1
4       2       9            1
5       3       3            1
6       3       5            0

cold_user_item_matrix
   userID  itemID  interaction
0       1       1            1
1       1       2            0
2       1       3            1
3       1       4            1

new_user_item_matrix
   userID  itemID  interaction
0       1       1          NaN
1       1       2          NaN
2       1       3          NaN
3       1       4          NaN
4       2       9          1.0
5       3       3          1.0
6       3       5          0.0

new_user_item_matrix after replacing
   userID  itemID  interaction
0       1       1          NaN
1       1       2          NaN
2       1       3          1.0
3       1       4          1.0
4       2       9          1.0
5       3       3          1.0
6       3       5          0.0

new_user_item_matrix after dropping nans
   userID  itemID  interaction
2       1       3          1.0
3       1       4          1.0
4       2       9          1.0
5       3       3          1.0
6       3       5          0.0
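The replacement step in the walkthrough relies on pandas aligning the right-hand Series with the selected rows by index label rather than by position. The sketch below isolates that alignment. It swaps the row-wise apply for a vectorised Series.where combined with isin, which is assumed to be an equivalent substitute here and is not the answer's own code; cold, target, revealed and shuffled are illustrative names.

```python
import numpy as np
import pandas as pd

cold = pd.DataFrame(
    {"userID": [1, 1, 1, 1], "itemID": [1, 2, 3, 4], "interaction": [1, 0, 1, 1]}
)
ranked_items = np.array([9, 5, 3, 4])

# Keep the interaction where the item is ranked, NaN everywhere else.
revealed = cold["interaction"].where(cold["itemID"].isin(ranked_items))

# Reverse the row order: the index labels travel with the values.
shuffled = revealed.iloc[::-1]

target = pd.DataFrame(
    {"userID": [1, 1, 1, 1, 2], "interaction": [np.nan, np.nan, np.nan, np.nan, 1.0]}
)
mask = target["userID"] == 1

# The assignment aligns on index labels, so the reversed order does not matter.
target.loc[mask, "interaction"] = shuffled
print(target)
```

Because the values are matched by label, rows 2 and 3 receive their interactions even though the source Series was reversed, and the row of user 2 is left alone.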