Nutch 1.16 crawl example from NutchTutorial returns NoSuchMethodError on org.apache.commons.cli.OptionBuilder (Windows 10)

Problem description

I have been trying to run a Nutch 1.16 crawler using the code example and instructions from https://cwiki.apache.org/confluence/display/NUTCH/NutchTutorial, but no matter what I do, I get stuck when initiating the actual crawl. I'm running it through Cygwin64 on a Windows 10 machine, using a binary installation (I also tried compiling from source, with the same results). Initially, Nutch threw an UnsatisfiedLinkError (NativeIO$Windows.access0), which I fixed by adding libraries suggested in several other answers to that issue. After that I could at least start a server, but trying to crawl through nutch itself returned a NoSuchMethodError no matter what I did. My nutch-site.xml contains only the http.agent.name and plugin.includes options, both taken from the same example.

The following is the error message (I also tried to omit seed.txt):

$ bin/nutch inject crawl/crawldb urls/seed.txt
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.commons.cli.OptionBuilder.withArgPattern(Ljava/lang/String;I)Lorg/apache/commons/cli/OptionBuilder;
        at org.apache.hadoop.util.GenericOptionsParser.buildGeneralOptions(GenericOptionsParser.java:207)
        at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:370)
        at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
        at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:138)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:59)
        at org.apache.nutch.crawl.Injector.main(Injector.java:534)
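
A NoSuchMethodError like the one above means the class was found at run time but the copy that got loaded lacks the method the caller was compiled against. As a rough diagnostic sketch (not part of Nutch or Hadoop; the class name MethodProbe and its helper methods are made up for illustration), the following prints where a class was loaded from and whether it exposes a public method of a given name:

```java
import java.lang.reflect.Method;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Nutch, Hadoop or commons-cli.
final class MethodProbe {

    /** Returns the jar or directory a class was loaded from, or a marker for JDK classes. */
    static String loadedFrom(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap/JDK)";
        }
        return source.getLocation().toString();
    }

    /** Lists the signatures of all public methods with the given name. */
    static List<String> publicMethodsNamed(Class<?> type, String methodName) {
        List<String> found = new ArrayList<>();
        for (Method m : type.getMethods()) {
            if (m.getName().equals(methodName)) {
                found.add(m.toString());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Defaults are taken from the stack trace above.
        String className = args.length > 0 ? args[0] : "org.apache.commons.cli.OptionBuilder";
        String methodName = args.length > 1 ? args[1] : "withArgPattern";
        try {
            Class<?> type = Class.forName(className);
            System.out.println(className + " loaded from " + loadedFrom(type));
            List<String> methods = publicMethodsNamed(type, methodName);
            if (methods.isEmpty()) {
                System.out.println("No public method named " + methodName);
            } else {
                methods.forEach(System.out::println);
            }
        } catch (ClassNotFoundException e) {
            System.out.println(className + " is not on the classpath");
        }
    }
}
```

Compiled with javac and run with the jars from Nutch's lib directory on the classpath, this should show which commons-cli jar wins and whether it has withArgPattern at all.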

The following is the list of libraries currently present in the lib directory:

activation-1.1.jar
amqp-client-5.2.0.jar
animal-sniffer-annotations-1.14.jar
antlr-runtime-3.5.2.jar
antlr4-4.5.1.jar
aopalliance-1.0.jar
apache-nutch-1.16.jar
apacheds-i18n-2.0.0-M15.jar
apacheds-kerberos-codec-2.0.0-M15.jar
api-asn1-api-1.0.0-M20.jar
api-util-1.0.0-M20.jar
args4j-2.0.16.jar
ascii-utf-themes-0.0.1.jar
asciitable-0.3.2.jar
asm-3.3.1.jar
asm-7.1.jar
avro-1.7.7.jar
bootstrap-3.0.3.jar
cglib-2.2.1-v20090111.jar
cglib-2.2.2.jar
char-translation-0.0.2.jar
checker-compat-qual-2.0.0.jar
closure-compiler-v20130603.jar
commons-beanutils-1.7.0.jar
commons-beanutils-core-1.8.0.jar
commons-cli-1.2-sources.jar
commons-cli-1.2.jar
commons-codec-1.11.jar
commons-collections-3.2.2.jar
commons-collections4-4.2.jar
commons-compress-1.18.jar
commons-configuration-1.6.jar
commons-daemon-1.0.13.jar
commons-digester-1.8.jar
commons-el-1.0.jar
commons-httpclient-3.1.jar
commons-io-2.4.jar
commons-jexl-2.1.1.jar
commons-lang-2.6.jar
commons-lang3-3.8.1.jar
commons-logging-1.1.3.jar
commons-math3-3.1.1.jar
commons-net-3.1.jar
crawler-commons-1.0.jar
curator-client-2.7.1.jar
curator-framework-2.7.1.jar
curator-recipes-2.7.1.jar
cxf-core-3.3.3.jar
cxf-rt-bindings-soap-3.3.3.jar
cxf-rt-bindings-xml-3.3.3.jar
cxf-rt-databinding-jaxb-3.3.3.jar
cxf-rt-frontend-jaxrs-3.3.3.jar
cxf-rt-frontend-jaxws-3.3.3.jar
cxf-rt-frontend-simple-3.3.3.jar
cxf-rt-security-3.3.3.jar
cxf-rt-transports-http-3.3.3.jar
cxf-rt-transports-http-jetty-3.3.3.jar
cxf-rt-ws-addr-3.3.3.jar
cxf-rt-ws-policy-3.3.3.jar
cxf-rt-wsdl-3.3.3.jar
dom4j-1.6.1.jar
ehcache-3.3.1.jar
elasticsearch-0.90.1.jar
error_prone_annotations-2.1.3.jar
FastInfoset-1.2.16.jar
geronimo-jcache_1.0_spec-1.0-alpha-1.jar
gora-hbase-0.3.jar
gson-2.2.4.jar
guava-25.0-jre.jar
guice-3.0.jar
guice-servlet-3.0.jar
h2-1.4.197.jar
hadoop-0.20.0-ant.jar
hadoop-0.20.0-core.jar
hadoop-0.20.0-examples.jar
hadoop-0.20.0-test.jar
hadoop-0.20.0-tools.jar
hadoop-annotations-2.9.2.jar
hadoop-auth-2.9.2.jar
hadoop-common-2.9.2.jar
hadoop-core-1.2.1.jar
hadoop-core_0.20.0.xml
hadoop-core_0.21.0.xml
hadoop-core_0.22.0.xml
hadoop-hdfs-2.9.2.jar
hadoop-hdfs-client-2.9.2.jar
hadoop-mapreduce-client-common-2.2.0.jar
hadoop-mapreduce-client-common-2.9.2.jar
hadoop-mapreduce-client-core-2.2.0.jar
hadoop-mapreduce-client-core-2.9.2.jar
hadoop-mapreduce-client-jobclient-2.2.0.jar
hadoop-mapreduce-client-jobclient-2.9.2.jar
hadoop-mapreduce-client-shuffle-2.2.0.jar
hadoop-mapreduce-client-shuffle-2.9.2.jar
hadoop-yarn-api-2.9.2.jar
hadoop-yarn-client-2.9.2.jar
hadoop-yarn-common-2.9.2.jar
hadoop-yarn-registry-2.9.2.jar
hadoop-yarn-server-common-2.9.2.jar
hadoop-yarn-server-nodemanager-2.9.2.jar
hbase-0.90.0-tests.jar
hbase-0.90.0.jar
hbase-0.92.1.jar
hbase-client-0.98.0-hadoop2.jar
hbase-common-0.98.0-hadoop2.jar
hbase-protocol-0.98.0-hadoop2.jar
HikariCP-java7-2.4.12.jar
htmlparser-1.6.jar
htrace-core-2.04.jar
htrace-core4-4.1.0-incubating.jar
httpclient-4.5.6.jar
httpcore-4.4.9.jar
httpcore-nio-4.4.9.jar
icu4j-61.1.jar
istack-commons-runtime-3.0.8.jar
j2objc-annotations-1.1.jar
jackson-annotations-2.9.9.jar
jackson-core-2.9.9.jar
jackson-core-asl-1.9.13.jar
jackson-databind-2.9.9.jar
jackson-dataformat-cbor-2.9.9.jar
jackson-jaxrs-1.9.13.jar
jackson-jaxrs-base-2.9.9.jar
jackson-jaxrs-json-provider-2.9.9.jar
jackson-mapper-asl-1.9.13.jar
jackson-module-jaxb-annotations-2.9.9.jar
jackson-xc-1.9.13.jar
jakarta.activation-api-1.2.1.jar
jakarta.ws.rs-api-2.1.5.jar
jakarta.xml.bind-api-2.3.2.jar
jasper-compiler-5.5.12.jar
jasper-runtime-5.5.12.jar
java-xmlbuilder-0.4.jar
javassist-3.12.1.GA.jar
javax.annotation-api-1.3.2.jar
javax.inject-1.jar
javax.persistence-2.2.0.jar
javax.servlet-api-3.1.0.jar
jaxb-api-2.2.2.jar
jaxb-impl-2.2.3-1.jar
jaxb-runtime-2.3.2.jar
jcip-annotations-1.0-1.jar
jersey-client-1.19.4.jar
jersey-core-1.9.jar
jersey-guice-1.9.jar
jersey-json-1.9.jar
jersey-server-1.9.jar
jets3t-0.9.0.jar
jettison-1.1.jar
jetty-6.1.26.jar
jetty-client-6.1.22.jar
jetty-continuation-9.4.19.v20190610.jar
jetty-http-9.4.19.v20190610.jar
jetty-io-9.4.19.v20190610.jar
jetty-security-9.4.19.v20190610.jar
jetty-server-9.4.19.v20190610.jar
jetty-sslengine-6.1.26.jar
jetty-util-6.1.26.jar
jetty-util-9.4.19.v20190610.jar
joda-time-2.3.jar
jquery-2.0.3-1.jar
jquery-selectors-0.0.3.jar
jquery-ui-1.10.2-1.jar
jquerypp-1.0.1.jar
jsch-0.1.54.jar
json-smart-1.3.1.jar
jsp-2.1-6.1.14.jar
jsp-api-2.1-6.1.14.jar
jsp-api-2.1.jar
jsr305-3.0.0.jar
junit-3.8.1.jar
juniversalchardet-1.0.3.jar
leveldbjni-all-1.8.jar
log4j-1.2.17.jar
lucene-analyzers-common-4.3.0.jar
lucene-codecs-4.3.0.jar
lucene-core-4.3.0.jar
lucene-grouping-4.3.0.jar
lucene-highlighter-4.3.0.jar
lucene-join-4.3.0.jar
lucene-memory-4.3.0.jar
lucene-queries-4.3.0.jar
lucene-queryparser-4.3.0.jar
lucene-sandbox-4.3.0.jar
lucene-spatial-4.3.0.jar
lucene-suggest-4.3.0.jar
maven-parent-config-0.3.4.jar
metrics-core-3.0.1.jar
modernizr-2.6.2-1.jar
mssql-jdbc-6.2.1.jre7.jar
neethi-3.1.1.jar
netty-3.6.2.Final.jar
netty-all-4.0.23.Final.jar
nimbus-jose-jwt-4.41.1.jar
okhttp-2.7.5.jar
okio-1.6.0.jar
org.apache.commons.cli-1.2.0.jar
ormlite-core-5.1.jar
ormlite-jdbc-5.1.jar
oro-2.0.8.jar
paranamer-2.3.jar
protobuf-java-2.5.0.jar
reflections-0.9.8.jar
servlet-api-2.5-20081211.jar
servlet-api-2.5.jar
skb-interfaces-0.0.1.jar
slf4j-api-1.7.26.jar
slf4j-log4j12-1.7.25.jar
snappy-java-1.0.5.jar
spatial4j-0.3.jar
spring-aop-4.0.9.RELEASE.jar
spring-beans-4.0.9.RELEASE.jar
spring-context-4.0.9.RELEASE.jar
spring-core-4.0.9.RELEASE.jar
spring-expression-4.0.9.RELEASE.jar
spring-web-4.0.9.RELEASE.jar
ST4-4.0.8.jar
stax-api-1.0-2.jar
stax-ex-1.8.1.jar
stax2-api-3.1.4.jar
t-digest-3.2.jar
tika-core-1.22.jar
txw2-2.3.2.jar
typeaheadjs-0.9.3.jar
warc-hadoop-0.1.0.jar
webarchive-commons-1.1.5.jar
wicket-bootstrap-core-0.9.2.jar
wicket-bootstrap-extensions-0.9.2.jar
wicket-core-6.17.0.jar
wicket-extensions-6.13.0.jar
wicket-ioc-6.17.0.jar
wicket-request-6.17.0.jar
wicket-spring-6.17.0.jar
wicket-util-6.17.0.jar
wicket-webjars-0.4.0.jar
woodstox-core-5.0.3.jar
wsdl4j-1.6.3.jar
xercesImpl-2.12.0.jar
xml-apis-1.4.01.jar
xml-resolver-1.2.jar
xmlenc-0.52.jar
xmlParserAPIs-2.6.2.jar
xmlschema-core-2.2.4.jar
zookeeper-3.4.6.jar
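
The listing above contains several generations of the same libraries side by side (for example asm 3.3.1 and 7.1, or the hadoop-mapreduce-client jars in both 2.2.0 and 2.9.2), and a mixed classpath of this kind is a common source of NoSuchMethodError. The sketch below flags such duplicates by file name. It assumes the usual artifact-version.jar naming, and the class name DuplicateJars is invented for this example:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class DuplicateJars {

    // Heuristic: the artifact name ends at the first '-' that is followed by a digit.
    private static final Pattern JAR_NAME = Pattern.compile("(.+?)-(\\d.*)\\.jar");

    /** Maps each artifact that appears in more than one version to its versions. */
    static Map<String, List<String>> duplicates(List<String> jarNames) {
        Map<String, List<String>> byArtifact = new TreeMap<>();
        for (String name : jarNames) {
            Matcher m = JAR_NAME.matcher(name);
            if (m.matches()) {
                byArtifact.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(m.group(2));
            }
        }
        // Keep only artifacts that occur at least twice.
        byArtifact.values().removeIf(versions -> versions.size() < 2);
        return byArtifact;
    }

    public static void main(String[] args) {
        File libDir = new File(args.length > 0 ? args[0] : "lib");
        String[] names = libDir.list();
        if (names == null) {
            System.err.println("Not a directory: " + libDir);
            return;
        }
        duplicates(Arrays.asList(names))
                .forEach((artifact, versions) -> System.out.println(artifact + " -> " + versions));
    }
}
```

Because it only looks at file names, it cannot tell that commons-cli-1.2.jar and org.apache.commons.cli-1.2.0.jar carry the same packages; those still have to be compared by hand.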

This is my java version:

java version "1.8.0_241"
Java(TM) SE Runtime Environment (build 1.8.0_241-b07)
Java HotSpot(TM) 64-Bit Server VM (build 25.241-b07, mixed mode)

I'd also like to point out that, despite what another answer may have said, nutch 1.4 (or any other version of nutch for that matter) did NOT resolve the issue, at least on Windows.

Recommended answer

The following answer worked for me, but I left the original one because it may still be useful to someone working with other versions of nutch.

Again, thanks to Sebastian Nagel: to get around the NoSuchMethodError, just edit ivy\ivy.xml to reference a different version of the hadoop libraries. In my case I installed hadoop 3.1.3, and I also added the corresponding 3.1.3 versions of winutils.exe and hadoop.dll to the hadoop\bin directory referenced by HADOOP_HOME. I then ran bin/crawl, and it seems to be working correctly.
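
Since the fix depends on winutils.exe and hadoop.dll sitting in the bin directory under HADOOP_HOME, a quick check before starting a crawl can save some guesswork. This is only a sketch under that assumption, and the class name HadoopHomeCheck is made up for illustration:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class HadoopHomeCheck {

    // The two native files named in the answer above.
    private static final String[] REQUIRED = {"winutils.exe", "hadoop.dll"};

    /** Returns the required files that are missing from hadoopHome/bin. */
    static List<String> missingNativeFiles(Path hadoopHome) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!Files.isRegularFile(hadoopHome.resolve("bin").resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        if (home == null || home.isEmpty()) {
            System.out.println("HADOOP_HOME is not set");
            return;
        }
        List<String> missing = missingNativeFiles(Paths.get(home));
        if (missing.isEmpty()) {
            System.out.println("Found winutils.exe and hadoop.dll under " + home);
        } else {
            System.out.println("Missing from " + home + ": " + missing);
        }
    }
}
```

It only confirms that the files exist; it cannot verify that they were built for the same hadoop version as the jars.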

Outdated answer: Okay, after working on the source code itself (courtesy of https://github.com/apache/commons-cli) at the suggestion of Sebastian Nagel, I was able to find the (very simple) implementation of the method (https://github.com/marcelmaatkamp/EntityExtractorUtils/blob/master/src/main/java/org/apache/commons/cli/OptionBuilder.java):

    /**
     * The next Option created will have an argument pattern and
     * a limit on the number of pattern occurrences.
     *
     * @param argPattern string representing a pattern regex
     * @param limit the number of pattern occurrences in the argument
     * @return the OptionBuilder instance
     */
    public static OptionBuilder withArgPattern(String argPattern,
                                               int limit)
    {
        OptionBuilder.argPattern = argPattern;
        OptionBuilder.limit = limit;

        // The snippet is cut off at this point in the original post. Returning the
        // shared builder field (assumed to be named "instance") follows the Javadoc.
        return instance;
    }

Using maven I was then able to compile the code into a separate jar file, which I added to the lib folder of apache nutch. This still did not completely resolve my problem, as the nutch framework seems to use deprecated functions throughout, which probably means even more work under similar circumstances (for instance, right after using the new jar I got a NoSuchMethodError on org.apache.hadoop.mapreduce.Job.getInstance). I'm leaving this answer here as a temporary solution for anyone stuck on the same issue, but I surely wish there were an easier way to find out which methods appear in which jar file without exploring their entire file structure, although I may just have overlooked it.
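
On the wish for an easier way to see which jar holds which method: one possible approach, sketched below, is to scan every jar in a directory for the class file and then load the class from each hit in isolation to see whether it declares the method. The class name JarClassFinder and its methods are invented for this example, and the defaults in main are taken from the error in the question:

```java
import java.io.File;
import java.io.IOException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper for illustration only.
final class JarClassFinder {

    /** Converts "a.b.C" to the jar entry name "a/b/C.class". */
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns the jars in libDir that contain the given class. */
    static List<File> jarsContaining(File libDir, String className) throws IOException {
        List<File> hits = new ArrayList<>();
        File[] jars = libDir.listFiles((dir, name) -> name.endsWith(".jar"));
        if (jars == null) {
            return hits; // libDir is not a readable directory
        }
        for (File jar : jars) {
            try (JarFile jarFile = new JarFile(jar)) {
                if (jarFile.getEntry(entryName(className)) != null) {
                    hits.add(jar);
                }
            }
        }
        return hits;
    }

    /** Loads the class from this one jar only and checks for a declared method by name. */
    static boolean declaresMethod(File jar, String className, String methodName) throws IOException {
        // A null parent keeps other jars on the application classpath out of the lookup.
        try (URLClassLoader loader = new URLClassLoader(new URL[] {jar.toURI().toURL()}, null)) {
            for (Method m : loader.loadClass(className).getDeclaredMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            return false; // class missing or not loadable without its dependencies
        }
    }

    public static void main(String[] args) throws IOException {
        File libDir = new File(args.length > 0 ? args[0] : "lib");
        String className = args.length > 1 ? args[1] : "org.apache.commons.cli.OptionBuilder";
        String methodName = args.length > 2 ? args[2] : "withArgPattern";
        for (File jar : jarsContaining(libDir, className)) {
            System.out.println(jar.getName() + ": declares " + methodName + " = "
                    + declaresMethod(jar, className, methodName));
        }
    }
}
```

Classes whose method signatures refer to types from other jars may fail to load in isolation; the sketch reports those as not declaring the method, so a false result is a hint rather than proof.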
