How can I select data from a dask dataframe by a list of indices?

Question

Let's say I have the following dask dataframe.

import pandas as pd
import dask.dataframe as dd

dict_ = {'A':[1,2,3,4,5,6,7], 'B':[2,3,4,5,6,7,8], 'index':['x1', 'a2', 'x3', 'c4', 'x5', 'y6', 'x7']}
pdf = pd.DataFrame(dict_)
pdf = pdf.set_index('index')
ddf = dd.from_pandas(pdf, npartitions=2)

Furthermore, I have a list of indices that I am interested in, e.g.

indices_i_want_to_select = ['x1','x3', 'y6']

How can I generate a new dask dataframe that contains only the rows specified by those indices? Is there a reason why something like ddf[ddf.A>=4] is possible, while ddf[ddf.index in indices_i_want_to_select] or ddf.loc[indices_i_want_to_select] is not?

Recommended answer

The following seems to work:

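import pandas as pd
import dask.dataframe as dd

# Generate an example dataframe. All labels are strings here, because a
# mixed str/int index cannot be sorted under Python 3.
pdf = pd.DataFrame(dict(A=[1,2,3,4,5], B=[6,7,8,9,0]), index=['i1', 'i2', 'i3', 'i4', 'i5'])
ddf = dd.from_pandas(pdf, npartitions=2)

# List of index labels to select
indices = ['i1', 'i4', 'i5']

# Filter each partition with a boolean mask. The output has the same columns
# and dtypes as the input, so the input's empty template serves as meta.
ddf_selected = ddf.map_partitions(lambda part: part[part.index.isin(indices)], meta=ddf._meta)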
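On the second part of the question, the difference can be seen in plain pandas, which is what runs inside each partition. The sketch below is pandas-only and assumes dask behaves analogously per partition. A comparison such as `pdf.A >= 4` yields one boolean per row, and so does `isin`, whereas Python's `in` operator must reduce to a single True/False and cannot produce a row mask. The variable names are illustrative.

```python
import pandas as pd

pdf = pd.DataFrame(
    {"A": [1, 2, 3, 4, 5, 6, 7]},
    index=["x1", "a2", "x3", "c4", "x5", "y6", "x7"],
)
wanted = ["x1", "x3", "y6"]

# Elementwise comparison: one boolean per row, usable as a mask
mask_cmp = pdf.A >= 4

# Elementwise membership test: also one boolean per row
mask_isin = pdf.index.isin(wanted)

# `in` asks the list whether it contains the whole Index object; the
# element comparisons return arrays, which have no single truth value
try:
    pdf.index in wanted
    in_raised = False
except ValueError:
    in_raised = True

selected = pdf[mask_isin]
print(list(selected.index))
```

This is why the answer applies `isin` to the index instead of using `in`.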

Edit: this approach is only suitable if the order of the result is not important. The selected rows come back in the dataframe's index order, not in the order of the list.
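If the list order does matter, one option is to reorder after the result has been brought into memory. The following is a pandas-only sketch of that idea. The two slices are a hypothetical stand-in for dask partitions, used only to mimic the per-partition filter followed by concatenation.

```python
import pandas as pd

pdf = pd.DataFrame(
    {"A": [1, 2, 3, 4, 5], "B": [6, 7, 8, 9, 0]},
    index=["i1", "i2", "i3", "i4", "i5"],
)
wanted = ["i5", "i1", "i4"]  # deliberately not in index order

# Hypothetical stand-in for two partitions of the dataframe
partitions = [pdf.iloc[:3], pdf.iloc[3:]]

# Same filter as in the answer, applied partition by partition
filtered = pd.concat([part[part.index.isin(wanted)] for part in partitions])
print(list(filtered.index))   # index order, not list order

# Reorder to match the list once the result is an in-memory pandas object
reordered = filtered.loc[wanted]
print(list(reordered.index))
```

With dask, the equivalent step would presumably be to call `.compute()` on the filtered dataframe and then apply `.loc[wanted]` to the resulting pandas dataframe, which assumes the selection fits in memory.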
