Pandas DataFrame: change a value based on column, index values comparison


Question

Suppose you have a pandas DataFrame that holds some kind of data in the body and has numbers as its column and index labels.

>>> import numpy as np
>>> import pandas as pd
>>> data = np.array([['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']])
>>> columns = [2, 4, 8]
>>> index = [10, 4, 2]
>>> df = pd.DataFrame(data, columns=columns, index=index)
>>> df
    2  4  8
10  a  b  c
4   d  e  f
2   g  h  i

Now suppose we want to manipulate our data frame based on a comparison between the index and column labels. Consider the following.

Where the index is greater than the column, replace the letter with 'k':

    2  4  8
10  k  k  k
4   k  e  f
2   g  h  i

Where the index is equal to the column, replace the letter with 'U':

    2  4  8
10  k  k  k
4   k  U  f
2   U  h  i

Where the column is greater than the index, replace the letter with 'Y':

    2  4  8
10  k  k  k
4   k  U  Y
2   U  Y  Y
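
To pin down the three rules before looking for a fast solution, here is a minimal cell-by-cell sketch. The helper name `replace_by_label_loop` is hypothetical and not part of the original question; the loop is meant as a readable reference, not as a fast approach.

```python
import numpy as np
import pandas as pd

data = np.array([['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']])
df = pd.DataFrame(data, columns=[2, 4, 8], index=[10, 4, 2])


def replace_by_label_loop(frame):
    """Hypothetical helper: apply the three rules one cell at a time."""
    out = frame.copy()
    for i, idx in enumerate(frame.index):
        for j, col in enumerate(frame.columns):
            if idx > col:
                out.iat[i, j] = 'k'   # index label greater than column label
            elif idx == col:
                out.iat[i, j] = 'U'   # index label equal to column label
            else:
                out.iat[i, j] = 'Y'   # column label greater than index label
    return out


result = replace_by_label_loop(df)
print(result)
```

Since every cell falls into exactly one of the three cases, the original letters never survive, which is why a vectorized answer can build the result from the labels alone.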

To make the question useful to everyone:

  • What is a fast way to do this replacement?

  • What is the simplest way to do this replacement?

Timing results from the minimal example:

  • jezrael: 556 µs ± 66.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

  • user3471881: 329 µs ± 11.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

  • 雷木: 4.65 ms ± 252 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Is this a duplicate? I searched Google for "pandas replace compare index column" and the top results are:

  • Pandas - Compare two dataframes and replace the value matching a condition

  • Python pandas: replace values based on position, not index value

  • Pandas DataFrame: replace all values in a column, based on condition

However, I don't feel that any of these touch on a) whether this is possible or b) how to compare in such a way.

Recommended answer

I think you need numpy.select with broadcasting:

m1 = df.index.values[:, None] > df.columns.values
m2 = df.index.values[:, None] == df.columns.values


df = pd.DataFrame(np.select([m1, m2], ['k','U'], 'Y'), columns=df.columns, index=df.index)
print (df)
    2  4  8
10  k  k  k
4   k  U  Y
2   U  Y  Y
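
To see why the broadcasting step works, the sketch below rebuilds the small example and prints the intermediate shapes and masks. The variable names `idx_col`, `cols`, and `labels` are introduced here for illustration only; the comparison and the `np.select` call are the same as in the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array([['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]),
    columns=[2, 4, 8],
    index=[10, 4, 2],
)

# [:, None] adds a trailing axis, turning the index labels into a column vector.
idx_col = df.index.values[:, None]   # shape (3, 1)
cols = df.columns.values             # shape (3,)

# Comparing (3, 1) with (3,) broadcasts to (3, 3):
# every index label is compared with every column label.
m1 = idx_col > cols
m2 = idx_col == cols

print(idx_col.shape, cols.shape, m1.shape)
print(m1)
print(m2)

# np.select picks 'k' where m1 holds, 'U' where m2 holds, and 'Y' elsewhere.
labels = np.select([m1, m2], ['k', 'U'], default='Y')
print(labels)
```

Only two masks are needed because the third case (column greater than index) is whatever remains, so it can be handled by the default value.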

Performance:

import numpy as np
import pandas as pd

np.random.seed(1000)

N = 1000
a = np.random.randint(100, size=N)
b = np.random.randint(100, size=N)

df = pd.DataFrame(np.random.choice(list('abcdefgh'), size=(N, N)), columns=a, index=b)
#print (df)

def us(df):
    values = np.array(np.array([df.index]).transpose() - np.array([df.columns]), dtype='object')
    greater = values > 0
    less = values < 0
    same = values == 0

    values[greater] = 'k'
    values[less] = 'Y'
    values[same] = 'U'


    return pd.DataFrame(values, columns=df.columns, index=df.index)

def jez(df):

    m1 = df.index.values[:, None] > df.columns.values
    m2 = df.index.values[:, None] == df.columns.values
    return pd.DataFrame(np.select([m1, m2], ['k','U'], 'Y'), columns=df.columns, index=df.index)


In [236]: %timeit us(df)
107 ms ± 358 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [237]: %timeit jez(df)
64 ms ± 299 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
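
A timing comparison is only meaningful if both functions return the same result. The sketch below restates `us` and `jez` from the benchmark and checks that they agree on a smaller random frame. The size `N = 50` and the names `out_us`, `out_jez`, and `same` are assumptions made for this check, not part of the original answer.

```python
import numpy as np
import pandas as pd


def us(df):
    # Subtract column labels from index labels, then overwrite by sign.
    values = np.array(np.array([df.index]).transpose() - np.array([df.columns]), dtype='object')
    greater = values > 0
    less = values < 0
    same = values == 0

    values[greater] = 'k'
    values[less] = 'Y'
    values[same] = 'U'
    return pd.DataFrame(values, columns=df.columns, index=df.index)


def jez(df):
    # Broadcast comparison of index labels against column labels.
    m1 = df.index.values[:, None] > df.columns.values
    m2 = df.index.values[:, None] == df.columns.values
    return pd.DataFrame(np.select([m1, m2], ['k', 'U'], 'Y'), columns=df.columns, index=df.index)


np.random.seed(1000)
N = 50  # assumed smaller size so the check runs quickly
a = np.random.randint(100, size=N)
b = np.random.randint(100, size=N)
df = pd.DataFrame(np.random.choice(list('abcdefgh'), size=(N, N)), columns=a, index=b)

out_us = us(df)
out_jez = jez(df)

# Compare as object arrays so a difference in string dtype does not matter.
same = np.array_equal(out_us.to_numpy(dtype=object), out_jez.to_numpy(dtype=object))
print(same)
```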
