Slicing out a few rows from a `dask.DataFrame`

Question

Often, when working with a large dask.DataFrame, it would be useful to grab only a few rows on which to test all subsequent operations.

Currently, according to "Slicing a Dask Dataframe", this is unsupported.


  • I was hoping to use head to achieve the same thing (since that command is supported), but it returns a regular pandas DataFrame.
  • I also tried df[:1000], which executes, but produces output different from what you'd expect from pandas.

Is there any way to grab the first 1000 rows from a dask.DataFrame?

Recommended answer

If your dataframe has a sensibly partitioned index, then I recommend using .loc:

small = big.loc['2000':'2005']

If you want to maintain the same number of partitions, you might consider sample:

small = big.sample(frac=0.01)

If you just want a single partition, you might try get_partition:

small = big.get_partition(0)

You can also always use to_delayed and from_delayed to build your own custom solution: http://dask.pydata.org/en/latest/dataframe-create.html#dask-delayed

More generally, dask.dataframe doesn't keep row counts per partition, so the specific question of "give me 1000 rows" ends up being surprisingly hard to answer. It's a lot easier to answer questions like "give me all the data in January" or "give me the first partition".
