Accessing arrays stored in a pandas DataFrame

Problem description

I have a pandas DataFrame in which one column contains 1-D NumPy arrays and another contains scalar data, for instance:

df =
    A   B
0   x   [0, 1, 2]
1   y   [0, 1, 2]
2   z   [0, 1, 2]

I want to get B for the row where A=='x', so I tried df[df.A == 'x'].B.values, which gives me the output:

array([array([0, 1, 2])], dtype=object)

The output has an extra array([]) around it. I get that pandas is treating it as an object rather than just data, and I have a way to access the array by using df[df.A == 'x'].B.values[0] instead. With scalar data I can just use the syntax df[df.A == 'x'].B, which is a lot cleaner than the df[df.A == 'x'].B.values[0] I have to use here.
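A minimal sketch that reproduces this setup, assuming column B holds NumPy arrays as described (the names `selected`, `outer` and `inner` are illustrative only). Boolean indexing returns a Series even when only one row matches, and `.values` turns that Series into an object-dtype ndarray with one slot per matching row; the stored array is the single element inside it:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; column B stores 1-D NumPy arrays, so its dtype is object.
df = pd.DataFrame([
    dict(A="x", B=np.array([0, 1, 2])),
    dict(A="y", B=np.array([0, 1, 2])),
    dict(A="z", B=np.array([0, 1, 2])),
])

selected = df[df.A == "x"].B   # boolean indexing returns a Series (here of length 1)
outer = selected.values        # object ndarray with one element per matching row
inner = outer[0]               # the stored 1-D array itself

print(type(selected).__name__)   # Series
print(outer.dtype, outer.shape)  # object (1,)
print(inner)                     # [0 1 2]
```

So `.values[0]` works because it indexes into that outer per-row container.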

My question is: is there a better/cleaner/shorter way to access the data in the format I put it in? Or is this just something I will have to live with?

Recommended answer

The difference isn't that the array is an object, but that the query you specify could return more than one object (hence the outer array()). If you're confident that the query will return only a single object, you can follow @Wen's solution and use .item():

In [1]: import pandas as pd

In [2]: df = pd.DataFrame([
   ...: dict(A='x', B=[0,1,2]),
   ...: dict(A='y', B=[0,1,2]),
   ...: dict(A='z', B=[0,1,2]),
   ...: ])

In [3]: df[df.A == 'x'].B.item()
Out[3]: [0, 1, 2]
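With exactly one match, `.item()` returns the same stored object as the `.values[0]` spelling from the question; the practical difference is what happens when the match count is not 1. A short sketch (the names `match`, `via_item` and `via_values` are illustrative):

```python
import pandas as pd

df = pd.DataFrame([
    dict(A="x", B=[0, 1, 2]),
    dict(A="y", B=[0, 1, 2]),
    dict(A="z", B=[0, 1, 2]),
])

match = df[df.A == "x"].B
# .item() unwraps the single element and raises ValueError if len(match) != 1.
via_item = match.item()
# .values[0] never checks the length: IndexError on zero matches,
# and silently the first row when several rows match.
via_values = match.values[0]

print(via_item == via_values)  # True
print(via_item)                # [0, 1, 2]
```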

Depending on the kind of query, though, you should at least consider checking the results to make sure only one row matches:

In [4]: df = pd.DataFrame([
   ...: dict(A='x', B=[0,1,2]),
   ...: dict(A='y', B=[0,1,2]),
   ...: dict(A='z', B=[0,1,2]),
   ...: dict(A='x', B=[3,3,3]),
   ...: ])

In [5]: df[df.A == 'x'].B.item()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-e0ad528e719e> in <module>()
----> 1 df[df.A == 'x'].B.item()

   ...

ValueError: can only convert an array of size 1 to a Python scalar

In [6]: df[df.A == 'x'].B.values
Out[6]: array([[0, 1, 2], [3, 3, 3]], dtype=object)
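One way to do that check, sketched as a small helper. `get_single` and its `LookupError` message are hypothetical, not part of pandas; the helper reports how many rows matched instead of relying on the generic `ValueError` from `.item()` or silently taking the first row:

```python
import pandas as pd


def get_single(df, mask, column):
    """Hypothetical helper: return `column` from the one row selected by `mask`."""
    matches = df.loc[mask, column]
    if len(matches) != 1:
        # Include the match count so duplicates (or no match at all) are easy to spot.
        raise LookupError(f"expected exactly 1 matching row, found {len(matches)}")
    return matches.iloc[0]


df = pd.DataFrame([
    dict(A="x", B=[0, 1, 2]),
    dict(A="y", B=[0, 1, 2]),
    dict(A="z", B=[0, 1, 2]),
    dict(A="x", B=[3, 3, 3]),
])

print(get_single(df, df.A == "y", "B"))  # [0, 1, 2]

try:
    get_single(df, df.A == "x", "B")
except LookupError as err:
    print(err)  # expected exactly 1 matching row, found 2
```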