What is the fastest way for checking a big proxy list on a specific web site?

Question

I have a big list of proxy servers (a txt file with one ip:port entry per line) and wrote the code below to check them:

    public static void MyChecker()
    {
        string[] lines = File.ReadAllLines(txtProxyListPath.Text);
        List<string> list_lines = new List<string>(lines);
        List<string> list_lines_RemovedDup = new List<string>();
        HashSet<string> HS = new HashSet<string>();
        int Duplicate_Count = 0;
        int badProxy = 0;
        int CheckedCount = 0;

        foreach (string line in list_lines)
        {
            string[] line_char = line.Split(':');
            string ip = line_char[0];
            string port = line_char[1];
            if (CanPing(ip))
            {
                if (SoketConnect(ip, port))
                {
                    if (CheckProxy(ip, port))
                    {
                        string ipAndport = ip + ":" + port;
                        if (HS.Add(ipAndport))
                        {
                            list_lines_RemovedDup.Add(ipAndport);
                            CheckedCount++;
                        }
                        else
                        {
                            Duplicate_Count++;
                            CheckedCount++;
                        }
                    }
                    else
                    {
                        badProxy++;
                        CheckedCount++;
                    }
                }
                else
                {
                    badProxy++;
                    CheckedCount++;
                }
            }
            else
            {
                badProxy++;
                CheckedCount++;
            }
        }
    }

    public static bool CanPing(string ip)
    {
        Ping ping = new Ping();

        try
        {
            PingReply reply = ping.Send(ip, 2000);
            if (reply == null)
                return false;

            return (reply.Status == IPStatus.Success);
        }
        catch (PingException Ex)
        {
            return false;
        }
    }

    public static bool SoketConnect(string ip, string port)
    {
        var is_success = false;
        try
        {
            var connsock = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
            connsock.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.SendTimeout, 200);
            System.Threading.Thread.Sleep(500);
            var hip = IPAddress.Parse(ip);
            var ipep = new IPEndPoint(hip, int.Parse(port));
            connsock.Connect(ipep);
            if (connsock.Connected)
            {
                is_success = true;
            }
            connsock.Close();
        }
        catch (Exception)
        {
            is_success = false;
        }
        return is_success;
    }

    public static bool CheckProxy(string ip, string port)
    {
        try
        {
            WebClient WC = new WebClient();
            WC.Proxy = new WebProxy(ip, int.Parse(port));
            WC.DownloadString("http://SpecificWebSite.com");
            return true;
        }
        catch (Exception)
        {
            return false;
        }
    }

But I think I should rewrite this code because it is very slow.
I get bad delays on these lines:

    WC.DownloadString("http://SpecificWebSite.com");

and

    PingReply reply = ping.Send(ip, 2000);

and this is not good for a big list.
Am I going in the right direction with this code, or should I change it (and if so, which parts)?
How can I optimize it?

Thanks in advance.

Recommended answer

There are quite a few things you can improve.

  • Don't sleep the thread for half a second.
  • Drop the ping check (the proxy might be behind a firewall that does not respond to pings and still work fine).
  • Replace DownloadString with an HttpWebRequest that gets the HEAD only.
  • Set the timeout of your HttpWebRequest to something lower than the default (no need to wait that long; if a proxy doesn't respond within 10-20 seconds, you probably don't want to use it).
  • Split your big list into smaller ones and process them at the same time.

These alone should speed up your process by quite a bit.
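
The last point, processing parts of the list at the same time, could be sketched roughly as below. This is a minimal sketch, not the only way to do it: the class name `ParallelProxyCheck`, the helpers `LoadDistinct` and `CheckAll`, and the `maxParallel` parameter are all hypothetical names introduced for illustration. It also assumes you are fine with removing duplicates before checking rather than after, so that no proxy is tested twice. The actual check is passed in as a delegate, so any method that takes an "ip:port" string and returns a bool can be plugged in.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelProxyCheck
{
    // Keeps only well-formed, distinct "ip:port" lines (hypothetical helper).
    // Malformed lines are skipped instead of throwing on line.Split(':')[1].
    public static List<string> LoadDistinct(IEnumerable<string> lines)
    {
        var seen = new HashSet<string>();
        var result = new List<string>();
        foreach (var raw in lines)
        {
            var line = raw.Trim();
            var parts = line.Split(':');
            bool valid = parts.Length == 2
                && parts[0].Length > 0
                && int.TryParse(parts[1], out int port)
                && port >= 1 && port <= 65535;
            if (valid && seen.Add(line))
                result.Add(line);
        }
        return result;
    }

    // Runs the supplied check on many proxies at once and returns the ones
    // that passed. The order of the returned list is not guaranteed.
    public static List<string> CheckAll(
        IEnumerable<string> proxies, Func<string, bool> check, int maxParallel)
    {
        var good = new ConcurrentBag<string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        Parallel.ForEach(proxies, options, proxy =>
        {
            if (check(proxy))
                good.Add(proxy);
        });
        return good.ToList();
    }
}
```

A call might look like `ParallelProxyCheck.CheckAll(ParallelProxyCheck.LoadDistinct(File.ReadAllLines(path)), MyCheck, 50)`, where `MyCheck` is your own check method and 50 is an arbitrary starting value to tune. Because the work is almost entirely waiting on the network, a degree of parallelism well above the number of CPU cores is usually reasonable.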

As requested, here's an example of how to use HttpWebRequest:

    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.Proxy = null;     // set proxy here
    request.Timeout = 10000;  // milliseconds, i.e. 10 seconds
    request.Method = "HEAD";  // ask for headers only, not the page body

    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        Console.WriteLine(response.StatusCode);
    }
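
Applied to the `CheckProxy` method from the question, the same idea might look like the sketch below. The class name `ProxyProbe` and the `targetUrl` and `timeoutMs` parameters are assumptions added for illustration; the original method hard-codes the URL and takes the port as a string. Note that `GetResponse()` throws a `WebException` on timeouts, refused connections and HTTP error statuses, so all of those are treated as a bad proxy here. The sketch also assumes a runtime where `HttpWebRequest` is available (it is marked obsolete in recent .NET versions but still works).

```csharp
using System;
using System.Net;

public static class ProxyProbe
{
    // Returns true if a HEAD request to targetUrl succeeds through the
    // given proxy within timeoutMs milliseconds, false otherwise.
    public static bool CheckProxy(string ip, int port, string targetUrl, int timeoutMs)
    {
        try
        {
            var request = (HttpWebRequest)WebRequest.Create(targetUrl);
            request.Proxy = new WebProxy(ip, port);
            request.Method = "HEAD";        // headers only, no body download
            request.Timeout = timeoutMs;    // limit for GetResponse()

            using (var response = (HttpWebResponse)request.GetResponse())
            {
                int status = (int)response.StatusCode;
                return status >= 200 && status < 400;
            }
        }
        catch (WebException)
        {
            // Timeout, connection failure, or an HTTP error status.
            return false;
        }
    }
}
```

Some servers reject or mishandle HEAD requests; if the specific site you are testing against does, switching `Method` back to "GET" while keeping the short timeout is a reasonable fallback.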
