How to get indexes of values in a Pandas DataFrame?

Problem description

I am sure there must be a very simple solution to this problem, but I am failing to find it (browsing through previously asked questions, I either didn't find the answer I wanted or didn't understand it).

I have a dataframe similar to this (just much bigger, with many more rows and columns):

      x   val1   val2   val3
0    0.0  10.0   NaN    NaN
1    0.5  10.5   NaN    NaN
2    1.0  11.0   NaN    NaN
3    1.5  11.5   NaN  11.60
4    2.0  12.0   NaN  12.08
5    2.5  12.5  12.2  12.56
6    3.0  13.0  19.8  13.04
7    3.5  13.5  13.3  13.52
8    4.0  14.0  19.8  14.00
9    4.5  14.5  14.4  14.48
10   5.0  15.0  19.8  14.96
11   5.5  15.5  15.5  15.44
12   6.0  16.0  19.8  15.92
13   6.5  16.5  16.6  16.40
14   7.0  17.0  19.8  18.00
15   7.5  17.5  17.7    NaN
16   8.0  18.0  19.8    NaN
17   8.5  18.5  18.8    NaN
18   9.0  19.0  19.8    NaN
19   9.5  19.5  19.9    NaN
20  10.0  20.0  19.8    NaN

In the next step, I need to compute the derivative dVal/dx for each of the value columns (in reality I have more than 3 columns, so I need a robust solution in a loop; I can't select the rows manually each time). But because of the NaN values in some of the columns, x and val do not have the same length. I feel the way to overcome this would be to select only those x intervals for which val is not null, but I am not able to do that. I am probably making some very stupid mistakes (I am not a programmer and I am very untalented, so please be patient with me :) ).

Here is the code so far (now that I think of it, I may have introduced some mistakes just by leaving some old pieces of code because I've been messing with it for a while, trying different things):

import pandas as pd
import numpy as np

df = pd.read_csv('H:/DocumentsRedir/pokus/dataframe.csv', delimiter=',')

vals = list(df.columns.values)[1:]

for i in vals:
    V = np.asarray(pd.notnull(df[i]))

    mask = pd.notnull(df[i])
    X = np.asarray(df.loc[mask]['x'])

    derivative=np.diff(V)/np.diff(X)

But I am getting this error:

ValueError: operands could not be broadcast together with shapes (20,) (15,) 

So, apparently, it did not select only the notnull values...

Is there an obvious mistake that I am making or a different approach that I should adopt? Thanks!

(And another, less important question: is np.diff the right function to use here, or would I be better off calculating the derivative manually with finite differences? I'm not finding the numpy documentation very helpful.)
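A note on the error itself: in the loop above, `V` is built from `pd.notnull(df[i])`, which is the boolean mask over all 21 rows rather than the column's values, so `np.diff(V)` has 20 entries. `X` is filtered by the mask, and for `val2` that leaves 16 rows and 15 differences, which matches the reported shapes `(20,)` and `(15,)`. Below is a minimal sketch of the loop with both arrays filtered by the same mask. The small frame and its column name `val` are stand-ins for the CSV, and the `derivatives` dict is introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the CSV in the question: x and val3 values from its first rows.
# The column name `val` is hypothetical, used only for this illustration.
df = pd.DataFrame({
    "x":   [0.0, 0.5, 1.0, 1.5, 2.0, 2.5],
    "val": [np.nan, np.nan, np.nan, 11.60, 12.08, 12.56],
})

derivatives = {}  # hypothetical container: column name -> array of slopes
for col in df.columns[1:]:
    mask = df[col].notna()               # True where the column has a value
    V = df.loc[mask, col].to_numpy()     # the values themselves, not the mask
    X = df.loc[mask, "x"].to_numpy()     # x restricted to the same rows
    # Differences between consecutive remaining rows, so V and X stay aligned.
    derivatives[col] = np.diff(V) / np.diff(X)
```

`np.diff` computes differences between consecutive elements, so `np.diff(V) / np.diff(X)` is already a finite-difference estimate of the slope between neighbouring samples. Note that if a column had NaN gaps in the middle, filtering would make the difference span the gap.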

Solution

To calculate dVal/dX:

>>> dVal = df.iloc[:, 1:].diff()  # `x` is in column 0.
>>> dX = df['x'].diff()
>>> dVal.apply(lambda series: series / dX)

    val1  val2  val3
0    NaN   NaN   NaN
1      1   NaN   NaN
2      1   NaN   NaN
3      1   NaN   NaN
4      1   NaN  0.96
5      1   NaN  0.96
6      1  15.2  0.96
7      1 -13.0  0.96
8      1  13.0  0.96
9      1 -10.8  0.96
10     1  10.8  0.96
11     1  -8.6  0.96
12     1   8.6  0.96
13     1  -6.4  0.96
14     1   6.4  3.20
15     1  -4.2   NaN
16     1   4.2   NaN
17     1  -2.0   NaN
18     1   2.0   NaN
19     1   0.2   NaN
20     1  -0.2   NaN

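We difference all columns (except the first one), then apply a lambda function to each column that divides it by the differences in column x. Because diff() subtracts the previous row from the current one, the result is NaN wherever the current or the previous value is NaN.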
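To try this without the CSV file, here is a minimal self-contained sketch that rebuilds the first eight rows of the sample frame by hand. It uses `DataFrame.div(dX, axis=0)` in place of the `apply` with a lambda; that is a swapped-in variant which should give the same result, not the code from the answer. The name `deriv` is introduced only for this illustration.

```python
import numpy as np
import pandas as pd

nan = np.nan
# First eight rows of the sample data from the question, typed in by hand.
df = pd.DataFrame({
    "x":    [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5],
    "val1": [10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0, 13.5],
    "val2": [nan, nan, nan, nan, nan, 12.2, 19.8, 13.3],
    "val3": [nan, nan, nan, 11.60, 12.08, 12.56, 13.04, 13.52],
})

dVal = df.drop(columns="x").diff()   # row-to-row change of every value column
dX = df["x"].diff()                  # row-to-row change of x

# Divide each column of dVal by dX, aligning on the row index (axis=0).
deriv = dVal.div(dX, axis=0)
print(deriv)
```

Selecting the value columns with `drop(columns="x")` rather than `iloc[:, 1:]` avoids depending on `x` being the first column; either works for the frame shown here.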
