How to write / writeStream each row of a dataframe into a different Delta table


Problem description

Each row of my dataframe contains CSV content.

I am struggling to save each row in a different, specific table.

I believe I need to use a foreach or a UDF to accomplish this, but I cannot get it to work.

All the examples I managed to find were either simple prints inside a foreach or code using .collect() (which I really don't want to use).

I also found the repartition approach, but that doesn't let me choose where each row will go.

rows = df.count()
df.repartition(rows).write.csv('save-dir')

Can you give me a simple, working example?

Recommended answer

Well, in the end, as always, it was something very simple, but I didn't see it mentioned anywhere.

Basically, when you perform a foreach, the dataframe you want to save is built inside the loop, which runs on a worker. The worker, unlike the driver, won't automatically set up the "/dbfs/" path when saving, so if you don't manually add "/dbfs/", it will save the data locally on the worker.

That is why my loops weren't working.
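As a rough illustration of the fix described above, here is a minimal sketch of a per-row function that adds the "/dbfs" prefix itself before writing. The column names table_name and csv_content, the base directory /mnt/tables, the file name data.csv, and both helper functions are assumptions made for this example; the original post does not show its schema or paths. The sketch writes plain CSV files and does not create Delta tables on its own.

```python
import os

# Hypothetical column names; the original post does not give the schema.
TABLE_COL = "table_name"
CSV_COL = "csv_content"


def to_worker_path(path, mount="/dbfs"):
    """Prefix a DBFS-style path with the local mount point, at most once."""
    if path == mount or path.startswith(mount + "/"):
        return path
    return mount + "/" + path.lstrip("/")


def save_row(row, base_dir="/mnt/tables", mount="/dbfs"):
    """Write one row's CSV text into a directory named after its target table."""
    # base_dir and the file name "data.csv" are illustrative assumptions.
    target_dir = to_worker_path(os.path.join(base_dir, row[TABLE_COL]), mount)
    os.makedirs(target_dir, exist_ok=True)
    target_file = os.path.join(target_dir, "data.csv")
    with open(target_file, "w", encoding="utf-8") as handle:
        handle.write(row[CSV_COL])
    return target_file
```

With a dataframe that has those two columns, this would presumably be called as df.foreach(save_row), so that each worker writes through the "/dbfs" mount instead of its own local disk. The return value is only there to make the function easy to check by hand; foreach ignores it.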
