How to write / writeStream each row of a dataframe into a different delta table
Question
Each row of my dataframe contains CSV content.
I am struggling to save each row to a different, specific table.
I believe I need to use a foreach or a UDF to accomplish this, but I simply cannot get it to work.
All the examples I managed to find were either simple prints inside a foreach or code that uses .collect() (which I really don't want to use).
I also found the repartition approach, but it doesn't let me choose where each row goes.
rows = df.count()
df.repartition(rows).write.csv('save-dir')
Can you give me a simple, working example?
Recommended answer
Well, in the end, as always, it turned out to be something very simple, but I didn't see it mentioned anywhere.
Basically, when you perform a foreach, the dataframe you want to save is built inside the loop, so the save runs on a worker. Unlike the driver, the worker won't automatically set up the "/dbfs/" path when saving, so if you don't manually add "/dbfs/", the data is saved locally on the worker.
That is why my loops weren't working.
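Here is a minimal sketch of that fix. It assumes a Databricks cluster where DBFS is reachable through local file APIs under "/dbfs". The helper names (dbfs_local_path, save_row), the column names table_name and csv_content, and the "/tables" base directory are all hypothetical and only serve the illustration. It writes each row's CSV text to its own file; it does not create Delta tables.

```python
import os


def dbfs_local_path(path, mount="/dbfs"):
    """Map a DBFS-style path ('dbfs:/out/a.csv' or '/out/a.csv') to the
    local path under the mount ('/dbfs/out/a.csv') that worker code can open."""
    if path.startswith("dbfs:"):
        path = path[len("dbfs:"):]
    mount = mount.rstrip("/")
    if path == mount or path.startswith(mount + "/"):
        return path  # the prefix is already there
    return mount + "/" + path.lstrip("/")


def save_row(table_name, csv_content, base_dir="/tables", mount="/dbfs"):
    """Write one row's CSV text to its own file and return the path used.

    base_dir and the '<table_name>.csv' layout are assumptions for this sketch.
    """
    target = dbfs_local_path(f"{base_dir}/{table_name}.csv", mount)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "w", newline="") as f:
        f.write(csv_content)
    return target


# On a cluster this would run once per row on the workers, for example:
# df.foreach(lambda row: save_row(row["table_name"], row["csv_content"]))
```

The mount argument exists only so the helper can be tried outside a cluster by pointing it at a temporary directory. On a worker, the default "/dbfs" is what keeps the file from landing on the worker's local disk.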