Search for an optimized selection in a pandas.DataFrame

Views: 187 · Posted: 2020/5/24 4:15:37 · Tags: python, pandas

Question

What is the most efficient way to select rows from a pandas.DataFrame with N columns (strings, integers and floats) according to the following rule:

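Go through all combinations of two columns (integers).
For each distinct combination, keep only one row (with all its columns): the one with the minimum value in a third column (floats).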

For instance, for combinations of (titi, tutu), with tete as the third column:

  toto  tata  titi  tutu  tete
0    a    18   600   700   4.5
1    b    18   600   800  10.1
2    c    18   600   700  12.6
3    d     3   300   400   3.4
4    a    16   900  1000   6.0
5    a    18   600   800  10.1
6    c     3   300   400   3.0
7    a    16   900  1000   6.0

must give:

  toto  tata  titi  tutu  tete
0    a    18   600   700   4.5
1    b    18   600   800  10.1
4    a    16   900  1000   6.0
6    c     3   300   400   3.0

For the moment, I started with the following code:

import pandas
indicesToKeep = []
indicesToRemove = []
reader = pandas.read_csv('/Users/steph/work/perso/sof/test.csv')
columns = reader.columns
for i in reader['titi'].unique():
    #temp = reader[[:]].query('titi == i')#does not work !
    temp = reader.loc[(reader.titi == i),columns]
    for j in temp['tutu'].unique():
        temp2 = temp.loc[(temp.tutu == j),columns]
        minimum = min(temp2.tete)
        indicesToKeep.append(min(
                temp2[temp2.tete==minimum].index.tolist()))
################
# compute the complement of indicesToKeep
#but I don't remember the pythonic syntax
for i in range(len(reader)):
    if i not in indicesToKeep:
        indicesToRemove.append(i)
############################
reader = reader.drop(indicesToRemove)

Notes:

I'm sure this is not optimized.
I use the old "loc" method because I don't know how to use "query".
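
As a side note on the two points raised in the question's code comments, here is a minimal sketch, assuming a small hypothetical frame rather than the original test.csv. It shows that `query` can reference a local Python variable with the `@` prefix, and that the complement of a list of index labels can be obtained with `Index.difference` instead of an explicit loop. The frame contents and the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data (not the original file).
reader = pd.DataFrame({
    "titi": [600, 600, 300],
    "tutu": [700, 800, 400],
    "tete": [4.5, 10.1, 3.4],
})

i = 600

# Boolean-mask selection, as in the question.
temp_loc = reader.loc[reader.titi == i]

# Equivalent selection with query: "@i" refers to the local variable i.
temp_query = reader.query("titi == @i")

# Complement of a list of index labels without a Python loop.
indices_to_keep = [0, 2]
indices_to_remove = reader.index.difference(indices_to_keep)
kept = reader.drop(indices_to_remove)
```

Since only the kept labels are needed in the end, `reader.loc[indices_to_keep]` would give the same rows directly, without computing the complement at all.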

Recommended answer

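IIUC, sort_values + drop_duplicates does this. If you are using pandas, try to avoid for loops: most of the time they are slower than the vectorized methods. Sorting by tete puts the smallest value of each (titi, tutu) pair first, drop_duplicates then keeps only that first row per pair, and sort_index restores the original row order.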

df.sort_values('tete').drop_duplicates(['titi','tutu']).sort_index()
Out[583]: 
  toto  tata  titi  tutu  tete
0    a    18   600   700   4.5
1    b    18   600   800  10.1
4    a    16   900  1000   6.0
6    c     3   300   400   3.0
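
For readers who want to reproduce this, the following is a self-contained sketch that rebuilds the example table from the question in code (the answer's `df` is assumed to be that table). Two details are additions not present in the original answer: `kind="mergesort"` is passed so that the sort is stable and tied `tete` values keep the row with the lowest index, and a `groupby` + `idxmin` variant is shown as an alternative way to pick the same rows.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "toto": ["a", "b", "c", "d", "a", "a", "c", "a"],
    "tata": [18, 18, 18, 3, 16, 18, 3, 16],
    "titi": [600, 600, 600, 300, 900, 600, 300, 900],
    "tutu": [700, 800, 700, 400, 1000, 800, 400, 1000],
    "tete": [4.5, 10.1, 12.6, 3.4, 6.0, 10.1, 3.0, 6.0],
})

# Sort by tete (stable sort, so ties keep their original order),
# keep the first row of each (titi, tutu) pair, restore row order.
result = (
    df.sort_values("tete", kind="mergesort")
      .drop_duplicates(["titi", "tutu"])
      .sort_index()
)

# Alternative: look up the index label of the minimum tete per group.
# idxmin returns the first occurrence when several rows tie.
alt = df.loc[df.groupby(["titi", "tutu"])["tete"].idxmin()].sort_index()

print(result)
```

Both variants select rows 0, 1, 4 and 6 on this data. The stable sort only matters when several rows of a group share the same minimum, as rows 1 and 5 or rows 4 and 7 do here.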
