Use multiple threads to achieve efficient search from a web server

Question

I want to access a web server using HttpWebRequest and fetch thousands of records from a given range of pages. Each request for a page returns 15 records, and there are roughly 8,000 to 10,000 pages on the server. That works out to about 120,000 to 150,000 records in total, fetched one page request at a time! Done sequentially in a single process, the task can be very time-consuming, so multithreading is the first solution that comes to mind.

Currently, I have created a worker class for searching, which spawns 5 sub-workers (threads) to search a given range. But as a novice in threading I am unable to make it work, because I am having trouble synchronizing the threads and making them all work together. I know about delegates, actions and events in .NET, but getting them to work with threads is confusing. This is the code I am using:

public void Start()
{
    this.totalRangePerThread = ((this.endRange - this.startRange) / this.subWorkerThreads.Length);
    for (int i = 0; i < this.subWorkerThreads.Length; ++i)
    {
        //theThreads[counter] = new Thread(new ThreadStart(MethodName));
        this.subWorkerThreads[i] = new Thread(() => searchItem(this.startRange, this.totalRangePerThread));
        //this.subWorkerThreads[i].Start();
        this.startRange = this.startRange + this.totalRangePerThread;
    }

    for (int threadIndex = 0; threadIndex < this.subWorkerThreads.Length; ++threadIndex)
        this.subWorkerThreads[threadIndex].Start();
}

The searchItem method:

public void searchItem(int start, int pagesToSearchPerThread)
{
    for (int count = 0; count < pagesToSearchPerThread; ++count)
    {
     //searching routine here
    }
}

The problem lies in the variables shared between the threads. Can anyone guide me on how to make this procedure thread-safe?

Recommended answer

The real problem you're facing is that the lambda expression in the Thread constructor captures the outer variable (startRange) itself, not the value it holds when the lambda is created. Since the threads are only started after the loop has finished, every thread reads the final value of startRange. One way to fix it is to make a copy of the variable, like this:

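for (int i = 0; i < this.subWorkerThreads.Length; ++i)
{
    // A local declared inside the loop body is a fresh variable on each iteration,
    // so every lambda captures its own starting page.
    var copy = startRange;
    this.subWorkerThreads[i] = new Thread(() => searchItem(copy, this.totalRangePerThread));
    this.startRange = this.startRange + this.totalRangePerThread;
}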
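
For context, here is how the fix might look inside a complete worker. This is only a minimal sketch under some assumptions: the class name SearchWorker, the Wait method, the remainder handling and the Interlocked counter are illustrative additions that do not appear in the question, and the real page request is left as a placeholder comment.

```csharp
using System;
using System.Threading;

public class SearchWorker
{
    private readonly int startRange;
    private readonly int endRange;
    private readonly Thread[] subWorkerThreads;
    private int pagesSearched; // shared state, only modified through Interlocked

    public SearchWorker(int startRange, int endRange, int threadCount)
    {
        this.startRange = startRange;
        this.endRange = endRange;
        this.subWorkerThreads = new Thread[threadCount];
    }

    public int PagesSearched => pagesSearched;

    public void Start()
    {
        int total = endRange - startRange;
        int perThread = total / subWorkerThreads.Length;
        int remainder = total % subWorkerThreads.Length;
        int next = startRange;

        for (int i = 0; i < subWorkerThreads.Length; ++i)
        {
            // Both locals are declared inside the loop body, so each lambda
            // captures its own start and count rather than a shared field.
            int start = next;
            int count = perThread + (i < remainder ? 1 : 0);
            subWorkerThreads[i] = new Thread(() => SearchItem(start, count));
            next += count;
        }

        foreach (Thread t in subWorkerThreads) t.Start();
    }

    // Blocks until every sub-worker has finished.
    public void Wait()
    {
        foreach (Thread t in subWorkerThreads) t.Join();
    }

    private void SearchItem(int start, int pagesToSearch)
    {
        for (int page = start; page < start + pagesToSearch; ++page)
        {
            // The actual request for this page would go here (placeholder).
            Interlocked.Increment(ref pagesSearched);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var worker = new SearchWorker(0, 10000, 5);
        worker.Start();
        worker.Wait();
        Console.WriteLine(worker.PagesSearched); // prints 10000
    }
}
```

Keeping the range fields read-only and handing each thread its own start and count means the threads share no mutable state except the counter, which is why Interlocked is enough here and no lock is needed.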

For more information on creating and starting threads, see Joe Albahari's excellent ebook (there is also a section on captured variables a bit further down). If you want to learn about closures, see this question.
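
The capture behaviour can also be observed without any threads. The following small sketch (the CaptureDemo class and its variable names are made up for illustration) stores delegates in a list and runs them only after the loop has finished, which mirrors starting the threads after the loop in the question.

```csharp
using System;
using System.Collections.Generic;

public static class CaptureDemo
{
    public static void Main()
    {
        var shared = new List<Func<int>>();
        var copied = new List<Func<int>>();
        int start = 0;

        for (int i = 0; i < 3; ++i)
        {
            shared.Add(() => start);   // captures the variable `start` itself
            int copy = start;          // fresh variable on every iteration
            copied.Add(() => copy);    // captures this iteration's copy
            start += 10;
        }

        // The delegates run only now, after `start` has reached its final value.
        foreach (var f in shared) Console.Write(f() + " ");   // 30 30 30
        Console.WriteLine();
        foreach (var f in copied) Console.Write(f() + " ");   // 0 10 20
        Console.WriteLine();
    }
}
```

The first list shows the bug from the question (every delegate sees the final value), and the second shows the effect of the per-iteration copy.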
