What is the return value of map_partitions?

Views: 123 Published: 2020/5/24 0:29:53 Tags: python pandas dask

Problem description

The dask API says that map_partitions can be used to "apply a Python function on each DataFrame partition." Given this description and the usual behaviour of "map", I would expect the return value of map_partitions to be (something like) a list whose length equals the number of partitions, with each element being the return value of one function call.

However, for the following code, I am not sure what the return value depends on:

import numpy as np
import pandas as pd
import dask.dataframe as dd

# generate an example dataframe with 100 rows, split into 3 partitions
pdf = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
ddf = dd.from_pandas(pdf, npartitions=3)

# define a helper function for map_partitions; VAL is its return value
VAL = pd.Series({'A': 1})
# VAL = pd.DataFrame({'A': [1]})  # other return values used in this example
# VAL = None
# VAL = 1
def helper(x):
    # print once per call so the number of calls can be counted
    print('function called\n')
    return VAL

# check the result
out = ddf.map_partitions(helper).compute()
print(len(out))

VAL = pd.Series({'A': 1}) causes 4 function calls (probably one to infer the dtype and 3 for the partitions) and produces an output of type pd.Series with len == 3.
VAL = pd.DataFrame({'A': [1]}) gives the same numbers, but the resulting type is pd.DataFrame.
VAL = None causes a TypeError. Why? Couldn't a possible use of map_partitions be to do something rather than to return something?
VAL = 1 results in only 2 function calls. The result of map_partitions is the integer 1.

Therefore, I want to ask some questions:

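How is the return value of map_partitions determined?
What influences the number of function calls besides the number of partitions? What criteria must a function fulfil to be called exactly once per partition?
What should the return value be for a function that only "does" something, i.e. a procedure?
How should a function that returns arbitrary objects be designed?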
