Java Proxy Discovering Bot


Question

I have written a class, ProxyFinder, which connects to random IPs. It first pings each one and, if it responds, attempts to create an HTTP proxy connection through common proxy ports.

Currently, it just connects to random IPs. This is relatively fast, discovering a few proxies an hour. However, I would like to check whether I have already connected to an IP. First I tried keeping them in a list, but that used over 10 GB of RAM. The code below includes a method I tried that writes the data to a cache using a RandomAccessFile, but searching the entire file for each connection becomes incredibly slow as the file grows.

I am storing the data in as small a format as possible, just four bytes per IP. Even so, that is 4 * 256 * 256 * 256 * 256 bytes = 16 GB of RAM, or a 16 GB file to search each time you want to test another IP.

I also tried creating a separate thread to generate IPs, check them against the file, and then add them to a queue that the probe threads could pull from. It could not keep up with the probe threads either.

How can I quickly check if I have already connected to an IP or not, without being incredibly slow or using ridiculous amounts of memory?

package net;

import java.io.File;
import java.io.RandomAccessFile;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

/**
 *
 * @author Colby
 */
public class ProxyFinder {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws Exception {

        int[] ports = {
            1080, 3128, 8080
        };

        System.out.println("Starting network probe");

        AtomicInteger counter = new AtomicInteger();
        for (int i = 0; i < 500; i++) {
            new Thread(() -> {

                do {
                    try {
                        byte[] addrBytes = randomAddress();//could be getNextAddress also
                        if (addrBytes == null) {
                            break;
                        }

                        InetAddress addr = InetAddress.getByAddress(addrBytes);
                        if (ping(addr)) {
                            float percent = (float) ((counter.get() / (256f * 256f * 256f * 256f)) * 100F);
                            if (counter.incrementAndGet() % 10000 == 0) {
                                System.out.println("Searching " + percent + "% network search");
                            }

                            for (int port : ports) {
                                try {
                                    Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(addr, port));

                                    HttpURLConnection con = (HttpURLConnection) new URL("http://google.com").openConnection(proxy);

                                    con.setConnectTimeout(1000);
                                    con.setReadTimeout(1000);
                                    con.setRequestMethod("GET");
                                    con.setRequestProperty("User-Agent", "Mozilla/5.0");

                                    con.getContent();
                                    con.disconnect();

                                    System.out.println("Proxy found!" + addr.getHostAddress() + ":" + port + "  Found at " + percent + "% network search");

                                } catch (Exception e) {
                                }
                            }

                            //
                            //System.out.println("Ping response: --" + addr.getHostAddress() + "-- Attempt: " + counter.get() + " Percent: " + percent + "%");
                        } else {
                            //System.out.println("Ping response failed: " + addr.getHostAddress() + " attempt " + counter.incrementAndGet());
                        }

                    } catch (Exception e) {
                        //e.printStackTrace();
                    }

                } while (true);

            }).start();
        }
    }

    private static RandomAccessFile cache;

    private static byte[] getNextAddress() throws Exception {
        if (cache == null) {
            cache = new RandomAccessFile(File.createTempFile("abc", ".tmp"), "rw");
        }

        byte[] check;
        checkFile:
        {
            byte[] addr = new byte[4];
            do {
                check = randomAddress();
                inner:
                {
                    cache.seek(0);
                    while (cache.length() - cache.getFilePointer() > 0) {
                        cache.readFully(addr);
                        if (Arrays.equals(check, addr)) {
                            break inner;
                        }
                    }
                    cache.write(check);
                    break checkFile;
                }

            } while (true);
        }
        return check;
    }

    private static byte[] randomAddress() {
        return new byte[]{(byte) (Math.random() * 256), (byte) (Math.random() * 256), (byte) (Math.random() * 256), (byte) (Math.random() * 256)};
    }

    private static boolean ping(InetAddress addr) throws Exception {
        return addr.isReachable(500);
    }
}

Also, in case anyone is wondering, I've had this running for 12 hours now and it has discovered about 50 proxies and pinged about 2.09664E-4% of the IP range, which is about 1.2 million IPs. Not bad for the bandwidth allocated (0.5 Mbps).

I am starting to think that the overhead of storing and checking all of these IPs might be even greater than simply connecting to many duplicates near the end of searching the IP range.
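
As a rough check on that intuition, here is a small sketch that estimates how many repeats to expect when addresses are drawn uniformly and independently at random, which is an assumption about how randomAddress() behaves. The class and method names are hypothetical. For n draws from a space of N addresses, the expected number of distinct addresses is N * (1 - (1 - 1/N)^n), so the expected repeats are n minus that, roughly n^2 / (2N) while n is much smaller than N.

```java
final class DuplicateEstimate {

    /**
     * Expected number of repeated picks after `draws` uniform random draws
     * from `space` equally likely addresses.
     */
    static double expectedDuplicates(double draws, double space) {
        // (1 - 1/space)^draws, computed via log1p/expm1 to stay accurate
        // when 1/space is tiny.
        double distinct = -space * Math.expm1(draws * Math.log1p(-1.0 / space));
        return draws - distinct;
    }

    public static void main(String[] args) {
        double space = 4294967296.0; // 2^32 IPv4 addresses
        System.out.println(expectedDuplicates(1.2e6, space));
    }
}
```

With the 1.2 million figure quoted above, this comes out to under 200 expected repeats, so duplicates only start to matter once a sizeable fraction of the range has been probed.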

Recommended Answer

I have ported code from another solution to fit this problem: Java- Mapping multi-dimensional arrays to single

The answer to that question gives an in-depth explanation of how the following code works. If anyone else would like to post a more in-depth answer on this thread, I will award it the answer.

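import java.util.BitSet;

// A BitSet index is a non-negative int, so one set holds at most 2^31 bits.
// Two sets, selected by the top bit of the address, cover all 2^32 IPv4
// addresses (512 MiB in total once fully populated).
static final BitSet[] sets = {new BitSet(), new BitSet()};

// Packs the four octets into one int. The & 0xFF undoes the sign extension
// of Java's signed bytes; multiplying raw bytes by 256*256*256 overflows.
static int pos(byte[] addr) {
    return ((addr[0] & 0xFF) << 24) | ((addr[1] & 0xFF) << 16)
            | ((addr[2] & 0xFF) << 8) | (addr[3] & 0xFF);
}

// BitSet is not thread-safe, so access is synchronized for the probe threads.
static synchronized boolean get(byte[] addr) {
    int p = pos(addr);
    return sets[p >>> 31].get(p & Integer.MAX_VALUE);
}

static synchronized void set(byte[] addr, boolean flag) {
    int p = pos(addr);
    sets[p >>> 31].set(p & Integer.MAX_VALUE, flag);
}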
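
With many probe threads, checking an address and marking it need to happen in one step. Otherwise two threads can both see "not visited" for the same address. Below is a minimal sketch of that idea, assuming a hypothetical class VisitedIps with a method markIfNew (neither name comes from the original answer). Instead of a BitSet it keeps one bit per address in plain long arrays, allocated lazily in pages of 65,536 addresses each, so memory grows only as new /16 blocks are touched and tops out at 512 MiB for the full IPv4 range.

```java
final class VisitedIps {

    // One page per /16 block: 65,536 addresses = 1,024 longs (8 KiB),
    // allocated the first time an address in that block is seen.
    private final long[][] pages = new long[1 << 16][];

    /** Packs four octets into an int; & 0xFF undoes the sign extension of byte. */
    static int toInt(byte[] addr) {
        return ((addr[0] & 0xFF) << 24) | ((addr[1] & 0xFF) << 16)
                | ((addr[2] & 0xFF) << 8) | (addr[3] & 0xFF);
    }

    /** Marks the address as visited; returns true only the first time it is seen. */
    synchronized boolean markIfNew(byte[] addr) {
        int ip = toInt(addr);
        int pageIndex = ip >>> 16;   // top 16 bits, 0..65535
        int offset = ip & 0xFFFF;    // low 16 bits within the page
        long[] page = pages[pageIndex];
        if (page == null) {
            page = new long[1024];
            pages[pageIndex] = page;
        }
        int word = offset >>> 6;            // which long holds the bit
        long mask = 1L << (offset & 63);    // which bit inside that long
        if ((page[word] & mask) != 0) {
            return false;                   // already visited
        }
        page[word] |= mask;
        return true;
    }
}
```

In the question's probe loop, a thread could keep calling randomAddress() until markIfNew returns true and then ping that address. A single synchronized method is the simplest choice here; since each probe waits on network timeouts, the lock is unlikely to be the bottleneck, though that is an assumption rather than something measured.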
