ADLA job is not producing expected results

Problem description

I am processing data in U-SQL but not getting the expected results. Here is what I am doing:

1- Select data from ADL table partitions and assign it to @data1

2- Aggregate the data using GROUP BY and assign it to @data2

3- Truncate the partitions

4- Insert the data (produced in step 2) into the same table

5- Use @data2 and generate a unique GUID for every record using a
user-defined function, and assign the result to @data2
        // UDF code
        public static Guid GetNewGuid()
        {
            return Guid.NewGuid();
        }

6- Select a few columns from @data2 and assign them to @data3

Strangely, the GUIDs in @data2 and @data3 are totally different.

If I perform some joins with other datasets and change the schema in step 5, and then generate the unique GUIDs, I get the same GUIDs at the last step. It looks like some script optimization is happening in the backend that causes this problem.

Could you please let me know what is going wrong in the above workflow? Or, if some sort of optimization is happening in the backend, how can I learn how script optimization works?

Update: In this question, my focus is to learn why something calculated in one step is automatically changed in the next step.

Recommended answer

Answering the updated focus: why something calculated in one step is automatically changed in the next step

wBob's excerpt answers the question completely, in my opinion, but maybe a broader context will help.

Maddeningly, the answer is simple

  • Since you violated the language's usage requirement (determinism), the resulting behavior is undefined.

Undefined means anything can happen, so you cannot have expectations, of consistent values or otherwise. Any discussion of the actual behavior seen (different GUIDs) is an implementation detail.

Semantic deep dive
Steps are an illusion.

  • Statements in an imperative language like C# are a sequence of exact instructions (how).

  • Statements in a functional language like U-SQL are a list of requirements, describing outputs in terms of inputs (what). The U-SQL optimizer has complete implementation flexibility in meeting those requirements. Rowsets are logical constructs for organizing user requirements; they need not actually exist at implementation time, and the logic corresponding to a rowset may be split, merged, skipped, repeated, etc. in the implementation.

So, for example, this is a perfectly legal implementation of steps 5 and 6 under the determinism requirement:

@data3 = SELECT <FewCols>, GetNewGuid() AS NewGuid FROM @data2;
@data2 = SELECT *, GetNewGuid() AS NewGuid FROM @data2;
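
To make the effect concrete, here is a rough simulation in Python rather than U-SQL. It is only a sketch: it does not reproduce what the ADLA optimizer actually does. It assumes the rowset logic is re-run for each consumer, which is one of the implementations the answer describes as legal. The names `ROWS`, `new_guid`, `key_guid`, `data2`, `data3` and `guids_match` are hypothetical and exist only for this illustration.

```python
import uuid

# Hypothetical stand-in for the aggregated rowset produced in step 2.
ROWS = [{"key": "a", "total": 10}, {"key": "b", "total": 20}]


def new_guid(_row):
    """Non-deterministic, like the GetNewGuid UDF in the question."""
    return uuid.uuid4()


def key_guid(row):
    """Deterministic: the same row key always maps to the same GUID."""
    return uuid.uuid5(uuid.NAMESPACE_OID, row["key"])


def data2(make_guid):
    # The rowset is modelled as a recipe that is re-run on every reference,
    # not as a stored table. This is an assumption of the sketch.
    return [dict(row, guid=make_guid(row)) for row in ROWS]


def data3(make_guid):
    # "Select a few columns from @data2", with the @data2 logic inlined.
    return [{"key": r["key"], "guid": r["guid"]} for r in data2(make_guid)]


def guids_match(make_guid):
    # Compare the GUID column as seen through @data2 and through @data3.
    left = [r["guid"] for r in data2(make_guid)]
    right = [r["guid"] for r in data3(make_guid)]
    return left == right


if __name__ == "__main__":
    print("non-deterministic UDF, GUIDs match:", guids_match(new_guid))
    print("deterministic function, GUIDs match:", guids_match(key_guid))
```

With the random generator, the two rowsets disagree because each evaluation draws fresh values. With a function of the row contents, re-evaluation is harmless. Whether a key-derived identifier is acceptable depends on the data, so treat `key_guid` as an illustration of determinism and not as a recommended fix.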
