Iterating across multiple columns in Pandas DF and slicing dynamically

Posted: 2020/5/4 10:12:30 | Tags: python pandas machine-learning scikit-learn grid-search

Problem description

TL;DR: How can I iterate over all combinations of values across multiple columns of a pandas DataFrame without specifying the columns or their values explicitly?

Long version: I have a pandas DataFrame that looks like the one below, except that it has many more features (drug-dose columns) than are shown here. Instead of just 3 features, it could have something like 70:

dosage_df

First Score  Last Score  A_dose  B_dose  C_dose
22           28          1       40      130
55           11          2       40      130
15           72          3       40      130
42           67          1       90      130
90           74          2       90      130
87           89          3       90      130
14           43          1       40      700
12           61          2       40      700
41           5           3       40      700

Along with my DataFrame, I also have a Python dictionary with the relevant range for each feature. The keys are the feature names, and each value is the list of doses that feature can take:

dict_of_dose_ranges = {'A_dose': [1, 2, 3], 'B_dose': [40, 90], 'C_dose': [130, 700]}
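
Because the TL;DR asks to avoid spelling out the columns or their values, one option is to derive this dictionary from the DataFrame itself. This is a minimal sketch. The sample data is copied from the table above, and it assumes (hypothetically) that every feature column name ends in `_dose`. Note that it only captures doses that actually occur in the data, so any dose missing from the data would still have to be added by hand.

```python
import pandas as pd

# Sample data copied from the table in the question.
dosage_df = pd.DataFrame({
    "First Score": [22, 55, 15, 42, 90, 87, 14, 12, 41],
    "Last Score": [28, 11, 72, 67, 74, 89, 43, 61, 5],
    "A_dose": [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "B_dose": [40, 40, 40, 90, 90, 90, 40, 40, 40],
    "C_dose": [130, 130, 130, 130, 130, 130, 700, 700, 700],
})

# Assumption: feature columns are exactly the ones whose names end in "_dose".
dose_cols = [c for c in dosage_df.columns if c.endswith("_dose")]

# Map each feature to the sorted distinct values it takes in the data
# (.tolist() converts NumPy scalars to plain Python ints).
dict_of_dose_ranges = {c: sorted(dosage_df[c].unique().tolist()) for c in dose_cols}
print(dict_of_dose_ranges)
# {'A_dose': [1, 2, 3], 'B_dose': [40, 90], 'C_dose': [130, 700]}
```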

For my purposes, I need to generate a particular combination (say A_dose = 1, B_dose = 90, and C_dose = 700), take the matching slice out of my DataFrame based on those settings, do the relevant calculations on that smaller subset, and save the results somewhere.

I need to do this for ALL possible combinations of ALL of my features (far more than the 3 shown here, and the number of features will vary in the future).

In this small case, I could easily pass the dictionary to scikit-learn's ParameterGrid to generate the options:

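# sklearn.grid_search was removed in scikit-learn 0.20; ParameterGrid now lives in sklearn.model_selection
from sklearn.model_selection import ParameterGrid
all_options = list(ParameterGrid(dict_of_dose_ranges))
all_options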

and get:

[{'A_dose': 1, 'B_dose': 40, 'C_dose': 130},
 {'A_dose': 1, 'B_dose': 40, 'C_dose': 700},
 {'A_dose': 1, 'B_dose': 90, 'C_dose': 130},
 {'A_dose': 1, 'B_dose': 90, 'C_dose': 700},
 {'A_dose': 2, 'B_dose': 40, 'C_dose': 130},
 {'A_dose': 2, 'B_dose': 40, 'C_dose': 700},
 {'A_dose': 2, 'B_dose': 90, 'C_dose': 130},
 {'A_dose': 2, 'B_dose': 90, 'C_dose': 700},
 {'A_dose': 3, 'B_dose': 40, 'C_dose': 130},
 {'A_dose': 3, 'B_dose': 40, 'C_dose': 700},
 {'A_dose': 3, 'B_dose': 90, 'C_dose': 130},
 {'A_dose': 3, 'B_dose': 90, 'C_dose': 700}]
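
If scikit-learn is only being used to build this grid, the standard library can produce the same sequence. This is a minimal sketch, and `iter_grid` is a hypothetical helper built on `itertools.product`. It sorts the keys so that its order matches the output above, with the last key varying fastest. Because it is a generator, no combination is built until it is requested.

```python
from itertools import product

dict_of_dose_ranges = {"A_dose": [1, 2, 3], "B_dose": [40, 90], "C_dose": [130, 700]}


def iter_grid(ranges):
    """Lazily yield one {feature: value} dict per combination of values.

    Keys are sorted, so the last key varies fastest; nothing is
    materialised up front.
    """
    keys = sorted(ranges)
    for values in product(*(ranges[k] for k in keys)):
        yield dict(zip(keys, values))


grid = iter_grid(dict_of_dose_ranges)
print(next(grid))  # {'A_dose': 1, 'B_dose': 40, 'C_dose': 130}
print(next(grid))  # {'A_dose': 1, 'B_dose': 40, 'C_dose': 700}
```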

Here are the problems I'm running into:

Problem #1) I can now iterate over all_options, but I'm not sure how to SELECT the matching rows of dosage_df for each dictionary option (e.g. {'A_dose': 1, 'B_dose': 40, 'C_dose': 130}) WITHOUT spelling out each column explicitly.

In the past, I could do something like:

dosage_df[(dosage_df.A_dose == 1) & (dosage_df.B_dose == 40) & (dosage_df.C_dose == 130)]

   First Score  Last Score  A_dose  B_dose  C_dose
0  22           28          1       40      130

But now I'm not sure what to put inside the brackets to slice it dynamically...

dosage_df[?????]
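
For Problem #1, one option is to build the same `&` chain shown above programmatically. The approach compares each column named in the option dict with its value, then folds the resulting boolean Series together with `functools.reduce`. This is a minimal sketch. `select_combo` is a hypothetical helper name, and it assumes each option has at least one key, because `reduce` raises TypeError on an empty sequence.

```python
import operator
from functools import reduce

import pandas as pd

# Sample data copied from the table in the question.
dosage_df = pd.DataFrame({
    "First Score": [22, 55, 15, 42, 90, 87, 14, 12, 41],
    "Last Score": [28, 11, 72, 67, 74, 89, 43, 61, 5],
    "A_dose": [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "B_dose": [40, 40, 40, 90, 90, 90, 40, 40, 40],
    "C_dose": [130, 130, 130, 130, 130, 130, 700, 700, 700],
})


def select_combo(df, combo):
    """Return the rows of df matching every {column: value} pair in combo.

    Equivalent to (df.A_dose == 1) & (df.B_dose == 40) & ..., but built
    from whatever keys the dict happens to contain.
    """
    masks = (df[col] == val for col, val in combo.items())
    return df[reduce(operator.and_, masks)]


print(select_combo(dosage_df, {"A_dose": 1, "B_dose": 40, "C_dose": 130}))
```

In a loop over the grid, this becomes `subset = select_combo(dosage_df, option)` for each `option` in `all_options`. A combination that never occurs in the data comes back as an empty DataFrame, as happens here for `{'A_dose': 1, 'B_dose': 90, 'C_dose': 700}`.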

Problem #2) When I actually pass in my full dictionary of features and their ranges, I get an error, apparently because there are too many combinations:

from sklearn.grid_search import ParameterGrid
all_options = list(ParameterGrid(dictionary_of_features_and_ranges)) 
all_options

---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-138-7b73d5e248f5> in <module>()
      1 from sklearn.grid_search import ParameterGrid
----> 2 all_options = list(ParameterGrid(dictionary_of_features_and_ranges))
      3 all_options

OverflowError: long int too large to convert to int
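
A likely explanation, offered as an assumption rather than a confirmed diagnosis: ParameterGrid defines `__len__` as the product of the lengths of the value lists, and `list()` asks for that length before building anything. Python's `len()` cannot return a value above `sys.maxsize`, so a large enough grid overflows before a single dict is created. The sketch below only counts the grid to show how quickly it grows. `grid_size` and `ranges_70` are hypothetical names, and `math.prod` needs Python 3.8+.

```python
import math
import sys


def grid_size(ranges):
    """Size of the full Cartesian product of a {feature: [values]} dict."""
    return math.prod(len(values) for values in ranges.values())


dict_of_dose_ranges = {"A_dose": [1, 2, 3], "B_dose": [40, 90], "C_dose": [130, 700]}
print(grid_size(dict_of_dose_ranges))  # 12

# Hypothetical: 70 features with just two doses each.
ranges_70 = {f"feature_{i}_dose": [0, 1] for i in range(70)}
size_70 = grid_size(ranges_70)
print(size_70 == 2**70)       # True
print(size_70 > sys.maxsize)  # True: too large for len() to return
```

Iterating over ParameterGrid directly (without `list()`) skips the length check. However, with roughly 10^21 combinations no loop would ever finish, so enumerating the full Cartesian product is not workable at this scale.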

I tried a number of alternative approaches, including double while loops, a tree/recursion method from here, and another recursion method from here, but it wasn't coming together. Any help is much appreciated.
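
Given the size problem, a different technique may fit better than enumerating the grid at all: let pandas `groupby` split the DataFrame on the dose columns. This replaces ParameterGrid rather than fixing it. It only visits combinations that occur in the data, so the number of iterations is bounded by the number of rows rather than by the product of all ranges. This is a minimal sketch that assumes the `_dose` naming convention. It uses a made-up calculation (mean change from First Score to Last Score) as a stand-in for the real one.

```python
import pandas as pd

# Sample data copied from the table in the question.
dosage_df = pd.DataFrame({
    "First Score": [22, 55, 15, 42, 90, 87, 14, 12, 41],
    "Last Score": [28, 11, 72, 67, 74, 89, 43, 61, 5],
    "A_dose": [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "B_dose": [40, 40, 40, 90, 90, 90, 40, 40, 40],
    "C_dose": [130, 130, 130, 130, 130, 130, 700, 700, 700],
})

# Assumption: feature columns are exactly the ones whose names end in "_dose".
dose_cols = [c for c in dosage_df.columns if c.endswith("_dose")]

records = []
# groupby yields only the combinations present in the data (sorted by key),
# so this loop runs at most len(dosage_df) times however many features exist.
for combo_values, subset in dosage_df.groupby(dose_cols):
    row = dict(zip(dose_cols, combo_values))  # e.g. {'A_dose': 1, 'B_dose': 40, 'C_dose': 130}
    # Placeholder calculation (hypothetical): mean change in score for this slice.
    row["mean_change"] = (subset["Last Score"] - subset["First Score"]).mean()
    row["n_rows"] = len(subset)
    records.append(row)

# One row of results per combination that actually occurs.
results_df = pd.DataFrame(records)
print(results_df)
```

Combinations absent from the data are skipped entirely. If they really need a row in the output, they would have to be added back explicitly, which reintroduces the size problem. When the per-slice calculation is a plain aggregation, `dosage_df.groupby(dose_cols).agg(...)` can replace the explicit loop.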
