Row by row processing of a Dask DataFrame

Problem description

I need to process a large file and to change some values.

I would like to do something like:

for index, row in dataFrame.iterrows():
    foo = doSomeStuffWith(row)
    lol = doOtherStuffWith(row)

    dataFrame['colx'][index] = foo
    dataFrame['coly'][index] = lol

Bad for me, I cannot do dataFrame['colx'][index] = foo!

My number of rows is quite large and I need to process a large number of columns, so I'm afraid that Dask may read the file several times if I do one dataFrame.apply(...) per column.

Other solutions are to manually break my data into chunks and use pandas, or to just throw everything into a database. But it would be nice if I could keep using my .csv and let Dask do the chunk processing for me!

Thanks for your help.

Recommended answer

In general, iterating over a dataframe, whether Pandas or Dask, is likely to be quite slow. Additionally, Dask won't support row-wise element insertion. This kind of workload is difficult to scale.

Instead, I recommend using dd.Series.where (see this answer), or else doing your iteration in a function (after making a copy so as not to operate in place) and then using map_partitions to call that function across all of the Pandas dataframes in your Dask dataframe.
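A rough sketch of the map_partitions approach follows. The file name and the actual transformations (the * 2 and the .where threshold) are placeholders standing in for the asker's doSomeStuffWith / doOtherStuffWith; only the structure matters here:

import pandas as pd
import dask.dataframe as dd

def process_partition(df):
    # Each partition arrives as a plain pandas DataFrame.
    # Copy first so we don't modify the partition in place.
    df = df.copy()
    # Placeholder transformations; vectorised pandas operations
    # are much faster than iterrows.
    df['colx'] = df['colx'] * 2
    df['coly'] = df['coly'].where(df['coly'] > 0, 0)
    return df

ddf = dd.read_csv('data.csv')                # Dask reads the csv in chunks
ddf = ddf.map_partitions(process_partition)  # runs once per partition
ddf.to_csv('processed-*.csv', index=False)   # one output file per partition

The dd.Series.where route works directly on the Dask columns without a helper function, e.g. ddf['colx'] = ddf['colx'].where(ddf['colx'] > 0, 0). Either way, Dask keeps streaming the .csv chunk by chunk instead of loading it all at once.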
