Hive 1.2 Metastore Service doesn't start after configuring it to use S3 storage instead of HDFS


Problem description

I have an Apache Spark cluster (2.2.0) in standalone mode. Until now it has used HDFS to store the Parquet files. I'm using the Hive Metastore Service of Apache Hive 1.2 to access Spark over JDBC through the Thrift server.

Now I want to use S3 object storage instead of HDFS. I have added the following configuration to my hive-site.xml:

<property>
  <name>fs.s3a.access.key</name>
  <value>access_key</value>
  <description>Profitbricks Access Key</description>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>secret_key</value>
  <description>Profitbricks Secret Key</description>
</property>
<property>
  <name>fs.s3a.endpoint</name>
  <value>s3-de-central.profitbricks.com</value>
  <description>ProfitBricks S3 Object Storage Endpoint</description>
</property>
<property>
  <name>fs.s3a.endpoint.http.port</name>
  <value>80</value>
  <description>ProfitBricks S3 Object Storage Endpoint HTTP Port</description>
</property>
<property>
  <name>fs.s3a.endpoint.https.port</name>
  <value>443</value>
  <description>ProfitBricks S3 Object Storage Endpoint HTTPS Port</description>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>s3a://dev.spark.my_bucket/parquet/</value>
  <description>Profitbricks S3 Object Storage Hive Warehouse Location</description>
</property>

The Hive metastore is in a MySQL 5.7 database. I have added the following JAR files to the Hive lib folder:

  • aws-java-sdk-1.7.4.jar
  • hadoop-aws-2.7.3.jar

I deleted the old Hive metastore schema in MySQL and then started the metastore service with the command hive --service metastore &, which produces the following error:

java.lang.NoClassDefFoundError: com/fasterxml/jackson/databind/ObjectMapper
        at com.amazonaws.util.json.Jackson.<clinit>(Jackson.java:27)
        at com.amazonaws.internal.config.InternalConfig.loadfrom(InternalConfig.java:182)
        at com.amazonaws.internal.config.InternalConfig.load(InternalConfig.java:199)
        at com.amazonaws.internal.config.InternalConfig$Factory.<clinit>(InternalConfig.java:232)
        at com.amazonaws.ServiceNameFactory.getServiceName(ServiceNameFactory.java:34)
        at com.amazonaws.AmazonWebServiceClient.computeServiceName(AmazonWebServiceClient.java:703)
        at com.amazonaws.AmazonWebServiceClient.getServiceNameIntern(AmazonWebServiceClient.java:676)
        at com.amazonaws.AmazonWebServiceClient.computeSignerByURI(AmazonWebServiceClient.java:278)
        at com.amazonaws.AmazonWebServiceClient.setEndpoint(AmazonWebServiceClient.java:160)
        at com.amazonaws.services.s3.AmazonS3Client.setEndpoint(AmazonS3Client.java:475)
        at com.amazonaws.services.s3.AmazonS3Client.init(AmazonS3Client.java:447)
        at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:391)
        at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:371)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:235)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2811)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
        at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:104)
        at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:140)
        at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:146)
        at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:159)
        at org.apache.hadoop.hive.metastore.Warehouse.getDefaultDatabasePath(Warehouse.java:177)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:601)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5757)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:5990)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:5915)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.lang.ClassNotFoundException: com.fasterxml.jackson.databind.ObjectMapper
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

The missing class belongs to the Jackson library, so I copied over the jackson-*.jar files located in my spark-2.2.0-bin-hadoop2.7/jars/ folder, which are:

  • jackson-annotations-2.6.5.jar
  • jackson-core-2.6.5.jar
  • jackson-core-asl-1.9.13.jar
  • jackson-databind-2.6.5.jar
  • jackson-jaxrs-1.9.13.jar
  • jackson-mapper-asl-1.9.13.jar
  • jackson-module-paranamer-2.6.5.jar
  • jackson-module-scala_2.11-2.6.5.jar
  • jackson-xc-1.9.13.jar

But then I got the following error:

2018-01-05 17:51:00,819 ERROR [main]: metastore.HiveMetaStore (HiveMetaStore.java:main(5920)) - Metastore Thrift Server threw an exception...
java.lang.NumberFormatException: For input string: "100M"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:589)
        at java.lang.Long.parseLong(Long.java:631)
        at org.apache.hadoop.conf.Configuration.getLong(Configuration.java:1319)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:248)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2811)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
        at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:104)
        at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:140)
        at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:146)
        at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:159)
        at org.apache.hadoop.hive.metastore.Warehouse.getDefaultDatabasePath(Warehouse.java:177)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:601)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5757)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:5990)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:5915)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:148)

I think this error has something to do with a JAR version incompatibility, but I'm not able to find the correct versions.

Can someone help me here?

Recommended answer

  1. You absolutely cannot mix versions of hadoop-common, hadoop-aws, the AWS S3 SDK and Jackson that differ from what each component expects, or you will see stack traces.
  2. It's all open source, so if you download all the source JARs locally, your IDE will help you find what's causing the stack trace. This is what we all do. It's not magic; modern IDEs (IntelliJ IDEA) even have special stack trace debugging.
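As a runtime complement to reading the sources, the JVM can report which JAR a class was actually loaded from. The following is a minimal sketch using only the standard library; the class name ClassOrigin and the method locate are hypothetical names chosen for illustration, and the result is only meaningful when the program runs with the same classpath as the metastore service.

```java
import java.security.CodeSource;

final class ClassOrigin {
    // Returns the location a class was loaded from, without initializing the class.
    static String locate(String className) {
        try {
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader have no code source.
            return src == null ? "JDK/bootstrap" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the first stack trace; pass another name as an argument.
        String name = args.length > 0 ? args[0] : "com.fasterxml.jackson.databind.ObjectMapper";
        System.out.println(name + " -> " + locate(name));
    }
}
```

Running it for org.apache.hadoop.fs.s3a.S3AFileSystem and org.apache.hadoop.conf.Configuration should show whether those two classes come from JARs of the same Hadoop release.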

This one occurs because the value of fs.s3a.multipart.size set in hadoop-common's /core-default.xml resource is 100M. That default came in with HADOOP-13680, which introduced range parsing that handles values like "100M" in place of plain numbers like 104857600. The code in the trace still reads the value through Configuration.getLong, which cannot parse the suffix. This stack trace says "Hadoop 2.8+ configuration".
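The failure itself is easy to reproduce outside Hadoop. The sketch below shows that a plain Long.parseLong, which is the call at the top of the stack trace, rejects "100M", and that a suffix-aware parser yields 104857600. The helper parseBytes is a hypothetical illustration that assumes binary multipliers (K = 1024, M = 1024 * 1024, G = 1024^3); it is not Hadoop's own implementation.

```java
import java.util.Locale;

final class SizeSuffixDemo {
    // Hypothetical helper: converts values such as "100M" to a byte count.
    static long parseBytes(String value) {
        String v = value.trim().toUpperCase(Locale.ROOT);
        if (v.isEmpty()) {
            throw new NumberFormatException("empty size value");
        }
        long multiplier;
        switch (v.charAt(v.length() - 1)) {
            case 'K': multiplier = 1L << 10; break;
            case 'M': multiplier = 1L << 20; break;
            case 'G': multiplier = 1L << 30; break;
            default: return Long.parseLong(v); // plain number, no suffix
        }
        long number = Long.parseLong(v.substring(0, v.length() - 1));
        return Math.multiplyExact(number, multiplier); // throws on overflow
    }

    // True when the value is readable by a plain Long.parseLong.
    static boolean parsesAsPlainLong(String value) {
        try {
            Long.parseLong(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesAsPlainLong("100M"));      // false
        System.out.println(parsesAsPlainLong("104857600")); // true
        System.out.println(parseBytes("100M"));             // 104857600
    }
}
```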

You could try setting the property in your configs to that numeric value, but it's a warning sign that the JAR versions are out of sync, and you will probably only get a few lines further before something else breaks.

Fix: make sure that hadoop-common.jar and hadoop-aws.jar are in sync. It looks like you've got the Jackson and AWS ones lined up, though Jackson is complex enough that you can never take that for granted.
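One rough way to check whether the Hadoop JARs are in sync is to compare the version suffixes in their file names. The sketch below assumes the usual artifact-version.jar naming; the class HadoopJarVersions, its methods, and the sample file names in main are hypothetical and only illustrate the idea.

```java
import java.io.File;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HadoopJarVersions {
    // Matches names such as "hadoop-aws-2.7.3.jar": group 1 = artifact, group 2 = version.
    private static final Pattern JAR =
            Pattern.compile("^(hadoop-[a-z-]+?)-(\\d+(?:\\.\\d+)*)\\.jar$");

    // Maps each hadoop-* artifact found in the list to its version string.
    static Map<String, String> versions(List<String> jarNames) {
        Map<String, String> result = new TreeMap<>();
        for (String name : jarNames) {
            Matcher m = JAR.matcher(name);
            if (m.matches()) {
                result.put(m.group(1), m.group(2));
            }
        }
        return result;
    }

    // True when all hadoop-* JARs in the list carry the same version.
    static boolean inSync(List<String> jarNames) {
        return new HashSet<>(versions(jarNames).values()).size() <= 1;
    }

    public static void main(String[] args) {
        // With an argument, scan that directory; otherwise use made-up sample names.
        List<String> names = args.length > 0
                ? Arrays.asList(Objects.requireNonNull(new File(args[0]).list(), "not a directory"))
                : Arrays.asList("hadoop-aws-2.7.3.jar", "hadoop-common-2.8.1.jar");
        System.out.println(versions(names) + " inSync=" + inSync(names));
    }
}
```

Pointing it at each lib directory on the metastore's classpath gives a quick first check; it does not replace verifying which classes are actually loaded at runtime.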
