Adding values to pandas dataframe with function based on other column in dataframe

Problem description

This sounds similar to a lot of SO questions but I haven't actually found it; if it's here, please feel free to link and I'll delete.

I have two dataframes. The first looks like this:

owned   category                            weight  mechanics_split
28156   Environmental, Medical              2.8023  [Action Point Allowance System, Co-operative P...
9269    Card Game, Civilization, Economic   4.3073  [Action Point Allowance System, Auction/Biddin...
36707   Modern Warfare, Political, Wargame  3.5293  [Area Control / Area Influence, Campaign / Bat...

The second looks like this:

    type                            amount  owned
0   Action Point Allowance System   378     0
1   Co-operative Play               302     0
2   Hand Management                 1308    0
3   Point to Point Movement         278     0
4   Set Collection                  708     0
5   Trading                         142     0

What I'm trying to do is iterate over each word in mechanics_split so that the owned value in the first dataframe is added to the owned column in the second dataframe. For example, if Dice Rolling is in the first row of games in the mechanics_split column, the owned amount for that whole row is added to games_owned['owned'], and so on, for each value in the list in mechanics_split through the whole dataframe.

So far, I've tried:

owned_dict = {}
def total_owned(x):
    for e in x:
        if e not in owned_dict:
            owned_dict[e] = 0
        if e in owned_dict:
            owned_dict[e] += games['owned'][x]
    return owned_dict

Which returns:

KeyError: "None of [['Action Point Allowance System', 'Co-operative Play', 'Hand Management', 'Point to Point Movement', 'Set Collection', 'Trading', 'Variable Player Powers']] are in the [index]"

If I add another letter before e, I'm told there are too many values to unpack. I also tried skipping the dictionary and just using otherdf['owned'][e] += games['owned'][x] to no avail.

I may be fundamentally misunderstanding something about how indexes work in pandas and how to index a value to a row, so if I am, please let me know. Thanks very much for any help.

I've solved part of the problem by changing the index of the second dataframe to the 'types' column with `otherdf.index = otherdf.types`, but I'm still left with the problem of transferring the owned values from the first dataframe.

Recommended answer

I agree with you that using the 'type' column as a label-based index will make things easier. With that done, you can iterate over the rows of the first dataframe and add each row's owned value to the appropriate rows in the second dataframe using .loc.

# Assumes df_2 is indexed by its 'type' column, as described above
for _, row in df_1.iterrows():  # iterrows() yields (index, row) pairs
    owned_value = row['owned']
    mechanics = row['mechanics_split']
    for type_string in mechanics:
        # Add this game's owned count to the matching mechanic row
        df_2.loc[type_string, 'owned'] += owned_value
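For anyone who wants to try this end to end, here is a minimal self-contained sketch. The data is illustrative only: the mechanics lists are short made-up stand-ins (the real ones are truncated in the question), and the `in games_owned.index` guard is my own addition, on the assumption that some mechanics may not have a row in the second dataframe. The `explode`/`groupby` lines at the end are an alternative to the loop, not part of the answer above, included only as a cross-check.

```python
import pandas as pd

# Illustrative stand-in for the first dataframe (lists are made up, not the real data)
games = pd.DataFrame({
    "owned": [28156, 9269],
    "mechanics_split": [
        ["Action Point Allowance System", "Co-operative Play", "Variable Player Powers"],
        ["Action Point Allowance System", "Trading"],
    ],
})

# Second dataframe, indexed by 'type' so rows can be addressed by label
games_owned = pd.DataFrame({
    "type": ["Action Point Allowance System", "Co-operative Play",
             "Hand Management", "Trading"],
    "amount": [378, 302, 1308, 142],
    "owned": [0, 0, 0, 0],
}).set_index("type")

for _, row in games.iterrows():
    for mechanic in row["mechanics_split"]:
        # Guard (an assumption): skip mechanics that have no row in games_owned
        if mechanic in games_owned.index:
            games_owned.loc[mechanic, "owned"] += row["owned"]

# Alternative without an explicit loop: one row per (game, mechanic), then sum
totals = (
    games.explode("mechanics_split")
         .groupby("mechanics_split")["owned"]
         .sum()
)
# Align to the second dataframe's labels; mechanics with no games get 0
expected = totals.reindex(games_owned.index, fill_value=0)
```

Both approaches should give the same per-mechanic totals; the loop is closer to what the question asks for, while the `explode` version tends to scale better on large frames.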

In addition, I suggest reading up on how pandas handles indexing to help avoid any 'gotchas' as you continue to work with it.
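As one concrete example of such a gotcha, the KeyError in the question most likely comes from `games['owned'][x]`, where `x` is the whole list of mechanics: pandas treats a list of strings as a list of index labels, and none of them exist in the default integer index. The sketch below reproduces that under the assumption of a default 0, 1, ... index; the variable names are hypothetical.

```python
import pandas as pd

# A Series with the default integer index (0, 1), like games['owned']
owned = pd.Series([28156, 9269], name="owned")
mechanics = ["Action Point Allowance System", "Co-operative Play"]

try:
    # A list of strings is interpreted as index labels, not as a filter
    owned[mechanics]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Looking up by the row's own index label works as expected
first_row_owned = owned.loc[0]
```

The fix is to index by the row's label (or use the row object from `iterrows()`), and to use the mechanic names only as labels on the dataframe that is actually indexed by them.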
