U-SQL Output in Azure Data Lake

Question

Is it possible to automatically split a table into several files based on column values if I don't know how many distinct key values the table contains? And is it possible to put the key value into the filename?

Recommended answer

This is our top ask (and it has been asked on Stack Overflow before, too :)). We are currently working on it and hope to have it available by summer.

Until then, you have to write a script generator. I tend to use U-SQL to generate the script, but you could also do it with PowerShell, T4, etc.

Here is an example:

Let's assume you want to write one file per value of the column name in the following table/rowset @x:

name | value1 | value2
-----+--------+-------
A    | 10     | 20
A    | 11     | 21
B    | 10     | 30
B    | 100    | 200

You would write a script like the following to generate the output script:

@x = SELECT * FROM (VALUES( "A", 10, 20), ("A", 11, 21), ("B", 10, 30), ("B", 100, 200)) AS T(name, value1, value2);

// Generate the script to do partitioned output based on name column:

@stmts = 
  SELECT "OUTPUT (SELECT value1, value2 FROM @x WHERE name == \""+name+"\") TO \"/output/"+name+".csv\" USING Outputters.Csv();" AS output 
  FROM (SELECT DISTINCT name FROM @x) AS x;

OUTPUT @stmts TO "/output/genscript.usql" 
USING Outputters.Text(delimiter:' ', quoting:false);

Then you take genscript.usql, prepend the calculation of @x, and submit it to get the data partitioned into two files, one per distinct name value.
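
For illustration, the combined script submitted in that last step might look roughly like the sketch below. It assumes the generator emits each statement verbatim (which is what quoting:false is meant to achieve) and that @x is recomputed with the same VALUES expression as above. The order of the generated OUTPUT lines is not guaranteed, since the generator has no ORDER BY.

```usql
// Prepended by hand: the same calculation of @x used by the generator.
@x = SELECT * FROM (VALUES( "A", 10, 20), ("A", 11, 21), ("B", 10, 30), ("B", 100, 200)) AS T(name, value1, value2);

// Expected contents of genscript.usql: one OUTPUT statement per distinct name.
OUTPUT (SELECT value1, value2 FROM @x WHERE name == "A") TO "/output/A.csv" USING Outputters.Csv();
OUTPUT (SELECT value1, value2 FROM @x WHERE name == "B") TO "/output/B.csv" USING Outputters.Csv();
```

With the sample data, /output/A.csv would receive the two rows for A and /output/B.csv the two rows for B. Note that the key value is concatenated directly into both the WHERE clause and the file path, so this sketch only works as-is when the name values contain no double quotes or characters that are invalid in a path.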
