Converting a numpy array into a dask dataframe column?


Question

I have a numpy array that I want to add as a column to an existing dask dataframe.

from sklearn.preprocessing import LabelEncoder

enc = LabelEncoder()
nparr = enc.fit_transform(X['url'])  # LabelEncoder expects a 1-D input

I have ddf, which is a dask dataframe.

ddf['nurl'] = nparr   ???

Is there an elegant way to achieve this?

The question "Python PANDAS: Converting from pandas/numpy to dask dataframe/array" does not solve my issue, because I want to add a numpy array to an existing dask dataframe.


Recommended answer

You can convert the numpy array to a dask Series object, then merge it into the dataframe. You will need to call the Series object's .to_frame() method, since dask only supports merging dataframes with other dataframes.

import dask.dataframe as dd
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': range(30), 'y': range(0,300, 10)})
arr = np.random.randint(0, 100, size=30)

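# create the dask dataframe and the dask series
ddf = dd.from_pandas(df, npartitions=5)
darr = dd.from_array(arr)
# give the series a name to use as the column header
darr.name = 'z'

# the frames share no columns, so join them on the index explicitly
ddf2 = ddf.merge(darr.to_frame(), left_index=True, right_index=True)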

ddf2
# returns:
Dask DataFrame Structure:
                   x      y      z
npartitions=5
0              int64  int64  int32
6                ...    ...    ...
...              ...    ...    ...
24               ...    ...    ...
29               ...    ...    ...
Dask Name: join-indexed, 33 tasks
