Best method to transfer and transform a large amount of data from a SQL Server to an Azure SQL Server: Azure Data Factory, HDInsight, etc.


Problem Description

I'd like to find the best method of transferring 20 GB of SQL data from a SQL Server database installed on a customer's onsite server, Client, to our Azure SQL Server, Source, on an S4 with 200 DTUs of performance for $320 a month. During the initial setup, we set up an Azure Data Factory that copies over the 20 GB via multiple table copies, e.g., Client Table A's content to Source Table A, Client Table B's content to Source Table B, etc. Then we run many Extractor stored procedures that insert the data from the Source tables into Stage tables by joining those Source tables together, e.g., Source A joined to Source B. After that come incremental copies, but the initial setup does take forever.
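
Each of those table copies is just a plain copy activity; here is a minimal sketch of one pair, where the dataset names and the truncate-and-reload pre-copy script are illustrative placeholders rather than our actual setup:

    {
        "name": "CopyClientTableAToSourceTableA",
        "type": "Copy",
        "description": "Illustrative sketch: one copy activity per table pair",
        "inputs": [ { "referenceName": "ClientTableA", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "SourceTableA", "type": "DatasetReference" } ],
        "typeProperties": {
            "source": { "type": "SqlSource" },
            "sink": { "type": "SqlSink", "preCopyScript": "TRUNCATE TABLE dbo.TableA" }
        }
    }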

Currently the copying time on an S4 is around 12 hours, with an extracting time of 4 hours. Increasing the performance tier to an S9 of 1600 DTUs for $2,400 a month would decrease the copying time to 6 hours, with an extracting time of 2 hours, but that brings with it a higher cost.

I was wondering if there were other Azure methods. Would setting up an HDInsight cluster with Hadoop or Spark be more cost-efficient than scaling the Azure SQL DB up to an S9 or beyond? An S9 at $2,400 for a 31-day month works out to about $3.23 an hour ($2,400 / 744 hours). An Azure HDInsight cluster with Memory Optimized Nodes of the D14 v2 instance is $1.496 per hour, so it would be cheaper than an S9. However, how does it compare in terms of performance? Would the copying process be quicker, or would the extraction process be quicker?

I am not familiar with Big Data methods yet. Thank you for all the help.

Solution

Azure Data Factory Copy Activity delivers a first-class secure, reliable, and high-performance data loading solution. It enables you to copy tens of terabytes of data every day across a rich variety of cloud and on-premises data stores. Copy Activity offers a highly optimized data loading experience that is easy to configure and set up.

You can consult the performance reference table for Copy Activity: it shows the copy throughput numbers, in MBps, for given source and sink pairs in a single copy activity run, based on in-house testing.

If you want the data to be transferred more quickly by using the Azure Data Factory Copy Activity, Azure provides three ways to achieve higher throughput:

  1. Data integration units. A Data Integration Unit (DIU), formerly known as a Cloud Data Movement Unit (DMU), is a measure that represents the power (a combination of CPU, memory, and network resource allocation) of a single unit in Data Factory. You can achieve higher throughput by using more Data Integration Units. You are charged based on the total time of the copy operation; the total duration you are billed for data movement is the sum of duration across DIUs, so, for example, a copy run that takes one hour at four DIUs is billed as four DIU-hours.

  2. Parallel copy. You can use the parallelCopies property to indicate the parallelism that you want Copy Activity to use. For each Copy Activity run, Data Factory determines the number of parallel copies to use to copy data from the source data store to the destination data store.

  3. Staged copy. When you copy data from a source data store to a sink data store, you might choose to use Blob storage as an interim staging store. All three settings appear in the sketch after this list.
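
To make this concrete, here is a minimal sketch of a single copy activity that applies all three settings. The dataset, staging linked service, and container path names (OnPremSqlTable, AzureSqlTable, StagingBlobStorage, adf-staging) are placeholders you would replace, and the DIU and parallel-copy values are starting points to tune, not recommendations:

    {
        "name": "CopyClientToSource",
        "type": "Copy",
        "description": "Sketch only: placeholder names; tune dataIntegrationUnits and parallelCopies",
        "inputs": [ { "referenceName": "OnPremSqlTable", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "AzureSqlTable", "type": "DatasetReference" } ],
        "typeProperties": {
            "source": { "type": "SqlSource" },
            "sink": { "type": "SqlSink" },
            "dataIntegrationUnits": 32,
            "parallelCopies": 8,
            "enableStaging": true,
            "stagingSettings": {
                "linkedServiceName": { "referenceName": "StagingBlobStorage", "type": "LinkedServiceReference" },
                "path": "adf-staging"
            }
        }
    }

Staged copy tends to help most in exactly this on-premises-to-Azure scenario, since the copy can compress the data for the network hop into Blob storage and then load it into Azure SQL from within Azure.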

You can use these approaches, individually or together, to tune the performance of your Data Factory service with Copy Activity.

For more details about Azure Data Factory Copy Activity performance, please see: https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-performance#data-integration-units
