How to check if a dask dataframe is empty when it is lazily evaluated?


Problem description

I am aware of this question, but consider the code (a minimal working example) below:

import dask.dataframe as dd
import pandas as pd

# Initialise the data as a dict of lists.
data = {'Name': ['Tom', 'nick', 'krish', 'jack'], 'Age': [20, 21, 19, 18]}

# Create DataFrame
df = pd.DataFrame(data)
dask_df = dd.from_pandas(df, npartitions=1)

categoric_df = dask_df.select_dtypes(include="category")

When I try to print categoric_df, I get the following error:

ValueError: No objects to concatenate

And when I inspect categoric_df in the PyCharm debugger:

Unable to get repr for <class 'dask.dataframe.core.DataFrame'>

Given these errors, I could wrap the check in a try/except block to detect whether the dataframe is empty. I would rather not, because that approach is not guaranteed to work every time and I expect try/except to slow the code down. When I print the computed categoric_df, it looks like this:

>>> print(categoric_df.compute())
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3]

In summary: if I select dtypes that do not exist in the frame and build a dask.DataFrame from the result, I get a dask.DataFrame that, at first glance, does not look empty when I call len() on it:

>>> print(len(categoric_df))
4
>>> print(len(categoric_df.compute()))
4
>>> print(categoric_df.compute().empty)
True
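
The outputs above are consistent with how pandas treats the two axes separately: len() counts index entries, while .empty is True as soon as either axis has length zero. The following pandas-only sketch reuses the example data from this question to show the shape behind those numbers; the variable name categoric is introduced here purely for illustration.

```python
import pandas as pd

# Same example data as in the question.
df = pd.DataFrame({'Name': ['Tom', 'nick', 'krish', 'jack'],
                   'Age': [20, 21, 19, 18]})

# No column has the "category" dtype, so zero columns are selected,
# but the original index (four entries) is kept.
categoric = df.select_dtypes(include="category")

n_rows, n_cols = categoric.shape
print(n_rows, n_cols)    # rows come from the index, columns from the selection
print(len(categoric))    # len() counts index entries, not columns
print(categoric.empty)   # .empty is True when any axis has length zero
```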

Is there a way to check whether categoric_df is empty without computing it? (I want it to stay lazily evaluated.)

UPDATE: print(len(categoric_df.columns)) returns 0, which could be used to determine whether the dataframe is empty. But is this approach viable? I am not sure.

Recommended answer

It looks like you've run into a bug where a dataframe isn't printing correctly. If you feel like raising a bug report, https://github.com/dask/dask/issues/new is the right place to do it.

This shouldn't affect the check you want to do, though. Looking at .columns to see whether there are any columns seems reasonable. The fact that the dataframe still has rows just means that it still has an index.
