Implementation of crawler4j


Question

I am trying to get the basic form of crawler4j running. I have modified the first few lines by defining the rootFolder and numberOfCrawlers as follows:

public class BasicCrawlController {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("Needed parameters: ");
            System.out.println("\t rootFolder (it will contain intermediate crawl data)");
            System.out.println("\t numberOfCralwers (number of concurrent threads)");
            return;
        }

        /*
         * crawlStorageFolder is a folder where intermediate crawl data is
         * stored.
         */
        String crawlStorageFolder = args[0];

        args[0] = "/data/crawl/root";

        /*
         * numberOfCrawlers shows the number of concurrent threads that should
         * be initiated for crawling.
         */
        int numberOfCrawlers = Integer.parseInt(args[1]);

        args[1] = "7";

        CrawlConfig config = new CrawlConfig();

        config.setCrawlStorageFolder(crawlStorageFolder);

No matter how I define these values, I still get the following output:

Needed parameters: 
 rootFolder (it will contain intermediate crawl data)
 numberOfCralwers (number of concurrent threads)

I think that I need to set the parameters in the "Run Configurations" window, but I do not know what that means. How can I properly configure this basic crawler to get it up and running?

Recommended answer

After you compile the program with the javac command, you need to run it by typing the following, where "arg1" is the root folder and "arg2" is the number of crawlers:

java BasicCrawlController "arg1" "arg2"

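The output is telling you that you are not supplying args[0] and args[1] when you run the program: the args.length != 2 check runs first, prints the usage text and returns before any later line executes. Also, what is the purpose of args[1] = "7"; after you have already read the number of crawlers from args[1]? Assigning to the array at that point does not change numberOfCrawlers, and the same is true of args[0] and crawlStorageFolder.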
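The two effects described above can be reproduced without crawler4j. The following is a minimal sketch; the class ArgsOrderDemo and its method names are hypothetical and exist only for illustration.

```java
// Hypothetical demo class, not part of crawler4j.
class ArgsOrderDemo {

    // Mirrors the guard at the top of main in the question:
    // true means main would print the usage text and return.
    static boolean wouldPrintUsage(String[] args) {
        return args.length != 2;
    }

    // Mirrors the statement order in the question: read first, overwrite later.
    static String readThenOverwrite(String[] args) {
        String crawlStorageFolder = args[0]; // copies the value currently in args[0]
        args[0] = "/data/crawl/root";        // changes the array slot, not the local variable
        return crawlStorageFolder;
    }

    public static void main(String[] args) {
        // No arguments supplied: the guard fires before anything else runs.
        System.out.println(wouldPrintUsage(new String[0]));

        // Two arguments supplied: the local variable keeps the original value.
        System.out.println(readThenOverwrite(new String[] {"from-cli", "3"}));
    }
}
```

Running main prints true and then from-cli, which shows that the later assignment to args[0] never reaches crawlStorageFolder.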

From what it looks like you are trying to do, remove the args.length check at the top of main, because you are using hard-coded values anyway. Then set the crawlStorageFolder String to your directory path and numberOfCrawlers to 7. That way you do not have to pass any command-line parameters. If you do want to use command-line parameters, get rid of the hard-coded values above and specify them on the command line. If you launch from an IDE such as Eclipse, the Run Configurations dialog has an Arguments tab whose Program arguments box takes the same two values.
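One way to combine both options is to use the command-line values when exactly two are given and fall back to hard-coded defaults otherwise. This is only a sketch under assumptions: CrawlSettings and fromArgs are hypothetical names, not part of crawler4j, and the defaults are the values from the question.

```java
// Hypothetical helper, not part of crawler4j: chooses crawl settings
// from the command line when present, otherwise from defaults.
class CrawlSettings {

    // Defaults taken from the question; adjust the path for your machine.
    static final String DEFAULT_STORAGE_FOLDER = "/data/crawl/root";
    static final int DEFAULT_NUMBER_OF_CRAWLERS = 7;

    final String crawlStorageFolder;
    final int numberOfCrawlers;

    CrawlSettings(String crawlStorageFolder, int numberOfCrawlers) {
        this.crawlStorageFolder = crawlStorageFolder;
        this.numberOfCrawlers = numberOfCrawlers;
    }

    static CrawlSettings fromArgs(String[] args) {
        if (args.length == 2) {
            // Both values supplied: use them as given.
            return new CrawlSettings(args[0], Integer.parseInt(args[1]));
        }
        // Anything else: fall back to the hard-coded defaults.
        return new CrawlSettings(DEFAULT_STORAGE_FOLDER, DEFAULT_NUMBER_OF_CRAWLERS);
    }

    public static void main(String[] args) {
        CrawlSettings settings = fromArgs(args);
        System.out.println("crawlStorageFolder = " + settings.crawlStorageFolder);
        System.out.println("numberOfCrawlers = " + settings.numberOfCrawlers);
        // In the controller from the question, settings.crawlStorageFolder would be
        // passed to config.setCrawlStorageFolder(...) in place of args[0].
    }
}
```

With this shape the program starts with no arguments at all, and the same two values can still be overridden from the command line or from the IDE's program arguments.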
