C# - Parallelizing While Loop with StreamReader causing High CPU

Question

SemaphoreSlim sm = new SemaphoreSlim(10);

using (FileStream fileStream = File.OpenRead("..."))
using (StreamReader streamReader = new StreamReader(fileStream, Encoding.UTF8, true, 4096))
{
    String line;
    while ((line = streamReader.ReadLine()) != null)
    {
        sm.Wait();
        new Thread(() =>
        {
            doSomething(line);
            sm.Release();
        }).Start();
    }
}
MessageBox.Show("This should only show once doSomething() has done its LAST line.");

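I have an extremely large file, and I want to run code on every single line of it.

I want to do this in parallel, but with at most 10 lines being processed at a time.

My solution was to use a SemaphoreSlim: wait before starting each thread and release when the thread finishes. (Since doSomething is synchronous, placing .Release() right after it works.)

The issue is that the code uses a LOT of CPU. Memory behaves as expected: instead of loading over 400 MB, usage just goes up and down by a few MB every few seconds.

But CPU goes crazy. Most of the time it is pinned at 100% for a good 30 seconds, then it dips slightly and goes back up.

Since I don't want to load every line into memory, and I want to run the code as the file is read, what is the best solution here?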

500 lines into a 9,700-line file.

600 lines into a 2.7-million-line file.

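EDIT

As suggested in the comments, I changed new Thread(()=>{}).Start(); to Task.Factory.StartNew(()=>{});, the idea being that creating and destroying threads was causing the performance drop. That seems to be right. After moving to Task.Factory.StartNew, the code runs at the same speed, still limited by the semaphore, and its CPU usage is exactly like that of my Parallel.ForEach version.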
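The change described in the edit might look roughly like the sketch below. It is an illustration under stated assumptions, not the asker's actual code: ThrottledLineProcessor, Run, and the process delegate (standing in for doSomething) are hypothetical names. The sketch also does two things the edit does not mention. It copies line into a per-iteration local, because a lambda captures the variable itself and line is declared outside the loop. It also drains the semaphore after the loop, so that anything following the call (such as the MessageBox) runs only once the last line has been processed.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; the names are illustrative, not from the original post.
static class ThrottledLineProcessor
{
    // Calls `process` once per line of `path`, with at most `maxParallel`
    // calls running at the same time. Returns after every line is processed.
    public static void Run(string path, Action<string> process, int maxParallel = 10)
    {
        var gate = new SemaphoreSlim(maxParallel);

        using (FileStream fileStream = File.OpenRead(path))
        using (var reader = new StreamReader(fileStream, Encoding.UTF8, true, 4096))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                gate.Wait();            // blocks while maxParallel workers are busy
                string current = line;  // per-iteration copy for the lambda to capture

                // Thread-pool task instead of a brand-new thread per line.
                Task.Factory.StartNew(() =>
                {
                    try { process(current); }
                    finally { gate.Release(); }  // release even if process throws
                });
            }
        }

        // Drain: once all permits are back in our hands, no worker is running.
        for (int i = 0; i < maxParallel; i++)
        {
            gate.Wait();
        }
    }
}
```

With this shape, a call such as ThrottledLineProcessor.Run("...", doSomething) would block until the final line is done. Exceptions thrown by process are not observed here; a real implementation would need to decide how to surface them.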

Solution

Your code creates a new thread for every line, which is inefficient. C# has easier ways of handling your scenario. One approach is:

File.ReadLines(path, Encoding.UTF8)
    .AsParallel().WithDegreeOfParallelism(10)
    .ForAll(doSomething);
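File.ReadLines enumerates the file lazily, so lines are read as they are consumed rather than loaded all at once, and WithDegreeOfParallelism(10) caps the number of concurrently executing tasks. ForAll returns only after every element has been processed, so code placed after it runs once the last line is done. For comparison, the asker's edit mentions a Parallel.ForEach version; a minimal sketch of what such a version could look like is shown below. ParallelLineProcessor, Run, and process are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper; the names are illustrative, not from the original post.
static class ParallelLineProcessor
{
    // Calls `process` once per line of `path`, using at most `maxParallel`
    // concurrent workers. Blocks until every line has been processed.
    public static void Run(string path, Action<string> process, int maxParallel = 10)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        // File.ReadLines is lazy: lines are pulled from disk as workers ask for them.
        Parallel.ForEach(File.ReadLines(path, Encoding.UTF8), options, process);
    }
}
```

Both variants reuse thread-pool threads instead of creating one thread per line, which is the inefficiency the answer points out.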

