Python: Create a structured NumPy array from two columns in a DataFrame
Question
How do you create a structured array from two columns in a DataFrame? I tried this:
import numpy as np
import pandas as pd
df = pd.DataFrame(data=[[1,2],[10,20]], columns=['a','b'])
df
a b
0 1 2
1 10 20
x = np.array([([val for val in list(df['a'])],
[val for val in list(df['b'])])])
But this gives me:
array([[[ 1, 10],
[ 2, 20]]])
But I want this:
[(1,2),(10,20)]
Thanks!
Recommended answer
There are a couple of methods. You may experience a loss in performance and functionality relative to regular NumPy arrays.
You can use pd.DataFrame.to_records with index=False. Technically, this is a record array, but for many purposes this will be sufficient.
res1 = df.to_records(index=False)
print(repr(res1))
rec.array([(1, 2), (10, 20)],
dtype=[('a', '<i8'), ('b', '<i8')])
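A minimal sketch of what index=False changes, and of how to restrict the result to two columns of a wider frame. The extra column c is a hypothetical addition for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical wider frame: only columns 'a' and 'b' should end up in the array.
df = pd.DataFrame({'a': [1, 10], 'b': [2, 20], 'c': [3.5, 4.5]})

# With the default index=True, the (unnamed) index is prepended as a field called 'index'.
with_index = df[['a', 'b']].to_records()
print(with_index.dtype.names)  # ('index', 'a', 'b')

# index=False keeps only the selected columns.
res1 = df[['a', 'b']].to_records(index=False)
print(res1.dtype.names)  # ('a', 'b')
print(res1.tolist())     # [(1, 2), (10, 20)]
```

Selecting the columns first (df[['a', 'b']]) is what limits the record array to exactly those two fields.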
structured array
Manually, you can construct a structured array by converting each row to a tuple and then passing a list of (name, dtype) tuples as the dtype parameter.
s = df.dtypes  # column name -> dtype
res2 = np.array([tuple(x) for x in df.values], dtype=list(zip(s.index, s)))
print(repr(res2))
array([(1, 2), (10, 20)],
dtype=[('a', '<i8'), ('b', '<i8')])
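The list comprehension above loops over rows in Python and goes through df.values, which first coerces all columns to a single common dtype (mixed int/float columns become float64, for example). A sketch of an alternative, assuming you want each field to keep its own column's dtype: allocate an empty structured array and fill it one field at a time. The helper name columns_to_structured is hypothetical.

```python
import numpy as np
import pandas as pd

def columns_to_structured(frame, columns):
    """Hypothetical helper: copy the given columns into a plain structured ndarray."""
    # One (name, dtype) pair per column, so each field keeps its column's dtype.
    dtype = [(name, frame[name].dtype) for name in columns]
    out = np.empty(len(frame), dtype=dtype)
    # Vectorized copy, one column at a time (no per-row Python loop).
    for name in columns:
        out[name] = frame[name].to_numpy()
    return out

df = pd.DataFrame({'a': [1, 10], 'b': [2.5, 20.5]})
res2 = columns_to_structured(df, ['a', 'b'])
print(res2.dtype)     # [('a', '<i8'), ('b', '<f8')]
print(res2.tolist())  # [(1, 2.5), (10, 20.5)]
```

Here column a stays int64 instead of being upcast to float64, which the df.values route would do for this frame.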
What's the difference?
Very little. recarray is a subclass of ndarray, the regular NumPy array type, whereas the structured array in the second example is a plain ndarray.
type(res1) # numpy.recarray
isinstance(res1, np.ndarray) # True
type(res2) # numpy.ndarray
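Since a recarray is just an ndarray subclass over the same structured data, you can switch between the two types with views rather than rebuilding the array. A minimal sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[1, 2], [10, 20]], columns=['a', 'b'])
res1 = df.to_records(index=False)  # numpy.recarray

# np.asarray drops the subclass and returns a base-class ndarray view of the same data.
plain = np.asarray(res1)
print(type(plain))  # <class 'numpy.ndarray'>

# Going the other way: view a plain structured ndarray as a recarray.
res2 = np.array([(1, 2), (10, 20)], dtype=[('a', '<i8'), ('b', '<i8')])
rec = res2.view(np.recarray)
print(type(rec))  # <class 'numpy.recarray'>
```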
The main difference is that record arrays let you access fields as attributes, while the same attribute lookup on a plain structured array raises AttributeError:
print(res1.a)
[ 1 10]
print(res2.a)
AttributeError: 'numpy.ndarray' object has no attribute 'a'
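Indexing by field name, by contrast, works on both types, and pandas accepts either one back through the DataFrame constructor. A short sketch reusing the two constructions from the answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[1, 2], [10, 20]], columns=['a', 'b'])
res1 = df.to_records(index=False)
s = df.dtypes
res2 = np.array([tuple(x) for x in df.values], dtype=list(zip(s.index, s)))

# Field access by name is supported by recarray and plain structured arrays alike.
print(res1['a'])  # [ 1 10]
print(res2['a'])  # [ 1 10]

# Either array converts back to a DataFrame whose columns are the field names.
back = pd.DataFrame(res2)
print(back.columns.tolist())  # ['a', 'b']
```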