Concurrent read/write to ADLA

Problem description

Q:1 We are thinking of parallelizing reads/writes to ADLA tables and were wondering what the implications of such a design are. I think reads are fine, but what is the best practice for concurrent writes to the same ADLA table?

Q:2 Suppose we have U-SQL scripts that have multiple rowsets and multiple OUTPUT/INSERT statements into the same or different ADLA tables. What is the transaction scope story in U-SQL? If any OUTPUT/INSERT statement fails, will all previous inserts be rolled back or not? How should we handle transaction scope?

Thanks, Amit

Solution

Before I answer, let me describe what happens when you insert into a table (I assume that is what you mean by writes to a table, as opposed to truncate/insert).

Each INSERT statement creates a new extent file for the table. So when you insert new rows (the recommendation is to insert many rows at a time rather than one row at a time), a new file gets created and the metadata gets updated during the finalization phase, so that the metadata service knows the file belongs to the table.

So you should be able to run several inserts in parallel.
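To make the point about parallel inserts concrete, here is a toy Python model. It is not ADLA code: `Table`, `insert`, and the in-memory file store are hypothetical names, and the model simply assumes, as described above, that every INSERT writes its own new extent file and that the only shared step between writers is the metadata registration during finalization.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor


class Table:
    """Toy model of a table: data lives in extent files, membership in metadata.

    Hypothetical illustration only; none of these names are ADLA APIs.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self._files: dict[str, list] = {}   # file store: extent id -> rows
        self._extents: list[str] = []       # metadata: extents that belong to the table
        self._meta_lock = threading.Lock()  # serializes only the metadata update

    def insert(self, rows: list) -> str:
        """One INSERT: write a brand-new extent file, then register it."""
        extent_id = f"{self.name}/extent-{uuid.uuid4().hex}"
        # Runtime: each INSERT writes its own new file, so concurrent
        # inserts never modify the same file.
        self._files[extent_id] = list(rows)
        # Finalization: record in the metadata that the file belongs to the table.
        with self._meta_lock:
            self._extents.append(extent_id)
        return extent_id

    def extent_count(self) -> int:
        with self._meta_lock:
            return len(self._extents)

    def read_all(self) -> list:
        # Readers only see extents that have been registered in the metadata.
        with self._meta_lock:
            extents = list(self._extents)
        return [row for extent in extents for row in self._files[extent]]


if __name__ == "__main__":
    table = Table("dbo.Events")
    batches = [[(batch, i) for i in range(100)] for batch in range(8)]
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(table.insert, batches))
    print(table.extent_count(), len(table.read_all()))  # 8 800
```

Under this model, inserting one row per INSERT would produce one tiny extent file per row, which is one way to see why batching many rows per INSERT is recommended.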

The transactional scope is currently as follows (note that Azure Data Lake Analytics is a big data processing platform, not an OLTP platform, and thus does not offer a choice of different transactional guarantees):

The batch processing of U-SQL in ADLA is done in 4 phases:

  1. Preparation, which covers compilation, optimization and code generation
  2. Queuing, where a job waits for all the needed resources
  3. Runtime, the actual execution phase
  4. Finalization, where files and metadata get persisted
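The phase ordering can be sketched as a small Python state machine. `Phase` and `run_job` are hypothetical names, and the handlers stand in for work that ADLA performs internally; the sketch only captures that the phases run strictly in order and that a failure stops the job before any later phase starts.

```python
from enum import Enum
from typing import Callable, Optional


class Phase(Enum):
    PREPARATION = 1   # compilation, optimization and code generation
    QUEUING = 2       # the job waits for all the needed resources
    RUNTIME = 3       # actual execution of the vertices
    FINALIZATION = 4  # files and metadata get persisted


def run_job(handlers: dict[Phase, Callable[[], None]]) -> tuple[list[Phase], Optional[Phase]]:
    """Run the phases strictly in order and stop at the first one that raises.

    Returns (phases that completed, phase that failed or None).
    """
    completed: list[Phase] = []
    for phase in Phase:  # Enum iteration follows definition order
        try:
            handlers[phase]()
        except Exception:
            return completed, phase
        completed.append(phase)
    return completed, None
```

In this sketch, a failure in any phase before finalization means the FINALIZATION handler never runs, so nothing is persisted.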

During the runtime phase, either all vertices succeed or, if a runtime error occurs, the job fails. So it is all or nothing.
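The all-or-nothing behavior of the runtime phase can be modeled by staging vertex outputs and discarding all of them if any vertex fails. This is a hypothetical Python sketch: `run_vertices` and the vertex signature (a callable returning an output name and its rows) are illustrative assumptions, not how ADLA is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

# A vertex is modeled as a zero-argument callable returning (output_name, rows).
Vertex = Callable[[], tuple[str, list]]


def run_vertices(vertices: list[Vertex], max_workers: int = 4) -> Optional[dict[str, list]]:
    """Run all vertices in parallel and stage their outputs.

    Returns the staged outputs only if every vertex succeeded; if any vertex
    raises, everything is discarded and None is returned, so nothing is
    handed to finalization.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(vertex) for vertex in vertices]
        results = []
        for future in futures:
            try:
                results.append(future.result())
            except Exception:
                return None  # all or nothing: drop every staged output
    staged: dict[str, list] = {}
    for name, rows in results:
        staged.setdefault(name, []).extend(rows)
    return staged
```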

Once processing enters the finalization phase, atomicity is reduced to the file or table level. You may generate 3 files, but finalizing one of them may fail for some reason. The job then fails, but the 2 files that were finalized successfully will still be created.
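The three-file example can be reproduced with a small Python model of per-file finalization. `finalize` and the `persist` callback are hypothetical, and the model assumes that each file is finalized independently and that every file is attempted even after one fails.

```python
from typing import Callable


def finalize(staged: dict[str, list],
             persist: Callable[[str, list], None]) -> tuple[list[str], bool]:
    """Finalize each staged output file on its own.

    `persist` is a hypothetical callable that writes one file and may raise.
    Assumption: every file is attempted even after one fails. A failure marks
    the job as failed but does not undo files that were already persisted.
    Returns (names of persisted files, whether the job succeeded).
    """
    persisted: list[str] = []
    failed: list[str] = []
    for name, rows in staged.items():
        try:
            persist(name, rows)
        except Exception:
            failed.append(name)  # this file is lost; the others are unaffected
            continue
        persisted.append(name)
    return persisted, not failed
```

With three staged files where only the second one fails to persist, the sketch returns the other two as persisted and reports the job as failed; a retry of such a job would have to cope with those two files already existing.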
