如何在c#中加快DataTable中50000行的验证? [英] How to speed up validation of 50000 row in DataTable in c#?

查看:57
本文介绍了如何在c#中加快DataTable中50000行的验证?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述



我必须向sql server数据库插入大量数据

我有DataTable中的数据,我需要验证大部分单元格值每行,然后保存到数据库。

完成它和UI挂起需要很长时间。甚至状态文本也没有在UI中更新。

所以我通过在不同的线程中执行也以不同的方式尝试它也结果是相同的。

最后我尝试了并行执行(通过将表的内容分成五个单独的表并且并行执行),如下所示,但没有发现任何改进。谁能建议我任何其他好的解决方案?通过使用并行或多线程或任何其他解决方案加快我对整个文档的验证过程??

  int  validatedRowCount =  0 ; 
int interval = 0 ;
interval = totalCount / 5 ;
ObservableCollection< Myobject> [] arrayObj = new ObservableCollection< Myobject> [ 5 ];
DataTable [] arrayOfDatatable = new DataTable [ 5 ];
arrayOfDatatable [ 0 ] = MyTable.AsEnumerable()。跳过( 0 )。间隔).CopyToDataTable();
arrayOfDatatable [ 1 ] = MyTable.AsEnumerable()。Skip(interval).Take(2 * interval).CopyToDataTable();
...
arrayOfDatatable [ 2 ] = MyTable.AsEnumerable()。Skip( 4 * interval).Take(totalRowCountOfMyTable).CopyToDataTable();

System.Threading.Tasks.Parallel.Invoke(

()= >
{ arrayObj [ 0 ] = blExeM.Validate(arrayOfDatatable [ 0 ],contries, ref result,selectedContries);
}
()= > {arrayObj [ 1 ] = blExeM.Validate(arrayOfDatatable [ 1 ],contries, ref result,selectedContries);
}
.........
()= > {arrayObj [ 4 ] = blExeM.Validate(arrayOfDatatable [ 4 ],contries, ref 结果,selectedContries);
}



contries:其列表< object>

selectedContries:字典

结果:字符串

验证其返回已验证的记录后一个集合..



这是一个好方法还是什么是处理这种情况的最佳解决方案?任何帮助非常感谢...如果可能请提供一段代码..

解决方案

在进入解决方案之前,Datatable或datarow不是线程安全的。如果您在使用并行任务进行验证时对数据行中的数据进行任何数据修改,则会遇到问题。



我建议使用Tasks而不是BackgroundWorker。首先让我们试着理解Parallel.Invoke是如何工作的。

来自MSDN for Parallel.Invoke:



Quote:

不保证操作执行的顺序或它们是否并行执行。无论是否由于正常终止或异常终止而发生完成,此方法都不会返回。





引用:

请注意,使用Invoke,您只需表达要同时运行的操作,并且运行时处理所有线程调度详细信息,包括自动缩放到数量主机上的内核。





因此,代码中发生的情况是,您将50000条记录的数据分成5个数据表,每条数据10000个。将有5个并行任务,每个任务有10000个记录要按顺序进行验证。这就是为什么你没有看到性能有任何改善的原因。



更好的方法是,处理等于主机内核数量的记录数。 Max可以是内核数量的2倍。



示例:如果您的机器有4个内核,您可以一次最多处理8条记录,以获得最佳性能。



下一个问题是,在调用Parallel.Invoke之前,你是否必须创建50000/8 = 6250数据表?是的,因为这必须在途中或其他方式完成而不是因为在调用Parallel.Invoke之前你不必这样做。



这里有一些示例代码展示如何做到,但不是唯一的方法。此代码未经过测试。

  long  recordsProcessed =  0 ; 
int interval = 8 ; // 假设有4个核心。您也可以使用4。
While(recordsProcessed < = totalCount)
{
arrayObj [ 0 ] = MyTable.AsEnumerable()。跳过( 0 )。取(间隔).CopyToDataTable();
recordsProcessed + = 8 ;
// 调用8个并行任务,每个任务都要处理数据流

}



执行此操作后,应更改Validate方法以使用DataRow而不是数据表。

 blExeM.Validate(dataRowToValidate,contries,ref result,selectedContries); 





您也可以使用Parallel.ForEach或PLINQ代替Parallel.Invoke。如果您想了解更多信息,请搜索这些术语。



最后,为了使UI响应,调用Validate方法的方法必须异步调用。


根据我的经验,我建议你使用BackgroundWorker做一个处理栏,它有点用户友好。

这里是一个参考。希望可以帮助解决您的问题。


Hi,
I have to insert bulk quantity of data to sql server database
I have the data in DataTable, I need to validate most of the cell values in each row, then save to database.
It takes long time to complete it and UI hangs. and even status text is not updating in UI.
So I tried it in different ways by executing in different threads also the result was same.
At last I tried parallel execution(By spliting the content of table into five seperate table and excecuted parallely) as like given below, but not found any improvement. can any one suggest me any other good solution?? to speed up my validation process of whole document by using parallel or multi thread or any other solution??

int validatedRowCount = 0;
int interval = 0;
interval = totalCount / 5;
ObservableCollection<Myobject>[] arrayObj= new ObservableCollection<Myobject>[5];
DataTable[] arrayOfDatatable = new DataTable[5];
arrayOfDatatable [0] = MyTable.AsEnumerable().Skip(0).Take(interval).CopyToDataTable();
arrayOfDatatable [1] = MyTable.AsEnumerable().Skip(interval).Take(2*interval).CopyToDataTable();
...
arrayOfDatatable [2] = MyTable.AsEnumerable().Skip(4 * interval).Take(totalRowCountOfMyTable).CopyToDataTable();

System.Threading.Tasks.Parallel.Invoke(

() =>
{ arrayObj[0] = blExeM.Validate(arrayOfDatatable[0], contries, ref result, selectedContries);
}
() =>{ arrayObj[1] = blExeM.Validate(arrayOfDatatable[1], contries, ref result, selectedContries);
}
.........
() =>{ arrayObj[4] = blExeM.Validate(arrayOfDatatable[4], contries, ref result, selectedContries);
}


contries : its list<object>
selectedContries: dictionary
result: string
after validating its returning the validated records in a collection..

Is it a good method or what is the best solution to handle this case?? Any help greatly appreciated...If possible please provide piece of code..

解决方案

Before getting into the solution, Datatable or datarow are not thread safe. If you are doing any data modification to data in a datarow in your validation using parallel tasks, you will face issues.

I would suggest to use Tasks instead of BackgroundWorker. First let us try to understand how Parallel.Invoke works.
From MSDN for Parallel.Invoke:

Quote:

No guarantees are made about the order in which the operations execute or whether they execute in parallel. This method does not return until each of the provided operations has completed, regardless of whether completion occurs due to normal or exceptional termination.



Quote:

Note that with Invoke, you simply express which actions you want to run concurrently, and the runtime handles all thread scheduling details, including scaling automatically to the number of cores on the host computer.



So, what happens in your code is, you are splitting the datatble of 50000 records into 5 datatables of 10000 each. There will be 5 parallel tasks, each having 10000 records to validate, sequentially. This is the reason why you did not see any improvement in performance.

The better way to do this will be, to process number of records equal to the number of cores in your host machine. Max can be 2 times number of cores.

Example: if your machine has 4 cores, you can process 8 records max at a time to get the best of the performance.

The next question will be, Do you have to create 50000/8 = 6250 datattables before calling Parallel.Invoke? Yes because this has to be done on way or the other and No because you do not have to do this before calling Parallel.Invoke.

Here is some sample code to show how it can be done, but not the only way. This code is not tested.

long recordsProcessed = 0;
int interval = 8; //assuming 4 cores. You can also use 4.
While(recordsProcessed <= totalCount)
{
  arrayObj[0] = MyTable.AsEnumerable().Skip(0).Take(interval).CopyToDataTable();
  recordsProcessed += 8;
  //Invoke with 8 parallel tasks each taking a datarow to process

}


Once you do this, your Validate method should be change to take DataRow instead of a datatable.

blExeM.Validate(dataRowToValidate, contries, ref result, selectedContries);



You can also use Parallel.ForEach or PLINQ instead of Parallel.Invoke. Search for these terms if you want to learn more.

Finally, to make the UI responsive, the method that calls the Validate method has to invoked asynchronously.


In my experience, I recommended you to use BackgroundWorker to do one processing bar, it's a bit user friendly.
here is one reference. Hope can help on your issue.


这篇关于如何在c#中加快DataTable中50000行的验证?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆