Drop Hive表& msck修复失败,表存储在谷歌云存储桶中 [英] Drop Hive Table & msck repair fails with Table stored in google cloud bucket

查看:583
本文介绍了Drop Hive表& msck修复失败,表存储在谷歌云存储桶中的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用以下SQL语句在Google Cloud Bucket中创建配置单元表.

I am creating hive table in Google Cloud Bucket using below SQL statement.

CREATE TABLE schema_name.table_name (column1 decimal(10,0), column2 int, column3 date) 
   PARTITIONED BY(column7 date) STORED AS ORC
   LOCATION 'gs://crazybucketstring/' 
   TBLPROPERTIES('ORC.COMPRESS'='SNAPPY');

然后我使用distcp命令将数据加载到该表中,现在当我尝试删除表时,它会失败,并显示以下错误消息,即使我尝试删除空表也会失败.

Then I loaded data into this table using distcp command, Now when I try to Drop table it fails with below error message, Even if I try to drop empty table it fails.

hive>>DROP TABLE schema_name.table_name; 

**Error:** Error while processing statement: 
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask.MetaException
(message:java.lang.IllegalArgumentException: `hadoopPath must not be null`)
(state=08S01,code=1)

我还使用gsutil rm -r gs://命令从Google Cloud Storage存储桶中删除了文件,但仍然无法删除表并给出相同的错误

I also removed files from Google Cloud Storage bucket using gsutil rm -r gs:// command but still not able to delete table and giving same error

在运行msck repair table时,也会出现以下错误.

Also on running msck repair table it is giving following error.

FAILED: 
Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1) 

任何主意可能是什么问题?

Any Idea what could be wrong?

推荐答案

问题与存储区位置有关.我将尝试逐步解释如何重新创建它以及如何解决它.同样的问题也会导致无法运行msck repair命令.

The problem is related to bucket location. I will try to explain it in step by step manner on how to recreate it and how to solve it. this same issue also result in unable to run msck repair command.

如何重新创建它:

  1. 首先,我创建了一个表(T1),其位置指向此处给出的存储桶:
    LOCATION 'gs://crazybucketstring/'

  1. First I created a table (T1) with location pointing to the bucket given here:
    LOCATION 'gs://crazybucketstring/'

然后我在子文件夹中用给定的位置创建了另一个表(T2)内存储桶 下面
LOCATION gs://crazybucketstring/schemname/tableaname/

Then I created another table (T2) in-side bucket in subfolder with location as given below
LOCATION gs://crazybucketstring/schemname/tableaname/

现在,当我尝试删除第一个表(T1)时,它将引发整个错误 存储桶的行为与表相同,它不能删除存储桶,它只能 删除文件.

Now when I try to drop first table (T1) it throws error as entire bucket is behaving as table and it can't delete bucket, it can just delete files.

当我尝试删除表(T2)时,我可以删除它,并且存储桶子目录中的文件也被删除,因为它是托管表.表T1仍然令人头疼.

When I try to drop table (T2) I am able to drop it and also files inside bucket subdirectory is deleted as it is managed table. Table T1 is still a headache.

为了绝望地删除表T1,我使用gsutil rm -r命令清空存储桶,并尝试msck repair table tablename,奇怪的是msck repair命令失败并显示以下错误消息

In a desperate bid to delete Table T1, I emptied the bucket using gsutil rm -r command and tried msck repair table tablename and strangely msck repair command failed with below error message

>>  msck repair table tablename
Error: Error while processing statement: FAILED: 
Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask (state=08S01,code=1)

像往常一样,DROP命令仍然无法正常工作.

As usual DROP command was still not working.

解决方案:

最终我得到了这个可行的想法.

Eventually I got this Idea which worked.

  1. 我更改了表T1并将其位置设置到存储桶(而不是裸存储桶)内的子目录中.
    ALTER TABLE TABLENAME SET LOCATION gs://crazybucketstring/schemname/tableaname/
  2. 现在我进行"msck修复",并且不会引发任何错误.
  3. 我发出了DROP Table命令,它起作用了.
  1. I Altered Table T1 and SET its location to subdirectory inside bucket instead of bare bucket.
    ALTER TABLE TABLENAME SET LOCATION gs://crazybucketstring/schemname/tableaname/
  2. Now I do 'msck repair' and it doesn't throw any error.
  3. I issued DROP Table command and it worked.

此问题与我们应该处理的表位置有关 在同一存储桶中创建多个表时,请谨慎操作.最好 实践是在存储桶中使用不同的子目录创建 不同的表,并避免仅使用存储桶路径作为表位置,特别是在必须在同一存储桶中创建多个表的情况下.谢谢,您可以随时与接触大数据问题.

This issue is related to Table Location which we should deal with carefully while creating more than 1 Table in same bucket. Best practice is to use different subdirectories inside bucket to create different tables and avoid using just bucket path as table location specially if you have to create multiple tables in same bucket. Thank you and feel free to reach out to Me for Big Data issues.

这篇关于Drop Hive表& msck修复失败,表存储在谷歌云存储桶中的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆