How to add ; at the end of each "show create table"


Problem description

I have a txt file with the following content:

CREATE EXTERNAL TABLE `table1`(
  `tab_id bigint COMMENT 'The unique identifier of thetable')
ROW FORMAT SERDE 
  *
STORED AS INPUTFORMAT 
* 
OUTPUTFORMAT 
  *
LOCATION
  *
TBLPROPERTIES (
  'transient_lastDdlTime'='1556u3ehw27')
CREATE TABLE `table2`(
  `count` bigint)
ROW FORMAT SERDE 
 * 
STORED AS INPUTFORMAT 
  *
OUTPUTFORMAT 
  *
LOCATION
  'hdfs://path/'
TBLPROPERTIES (
  'transient'='15407')

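As you can see, there is no ; at the end of each table's DDL. I am trying to write something that inserts a ; after each table's DDL, so the output should be: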

CREATE EXTERNAL TABLE `table1`(
  `tab_id bigint COMMENT 'The unique identifier of thetable')
ROW FORMAT SERDE 
  *
STORED AS INPUTFORMAT 
* 
OUTPUTFORMAT 
  *
LOCATION
  *
TBLPROPERTIES (
  'transient_lastDdlTime'='1556u3ehw27');
CREATE TABLE `table2`(
      `count` bigint)
    ROW FORMAT SERDE 
     * 
    STORED AS INPUTFORMAT 
      *
    OUTPUTFORMAT 
      *
    LOCATION
      'hdfs://path/'
    TBLPROPERTIES (
      'transient'='15407');

I tried two approaches: (1) adding a step to the DDL creation script, and (2) a Python program.

Below is my DDL creation .sh script, which runs through my database's tables and generates a single file for all the tables within the database. I tried to use the cat command shown on the last line (# cat ...) to add the terminator, but kept receiving errors.

hiveDBName=my_db;
showcreate="show create table "
showpartitions="show partitions "
terminate=";"

tables=`hive -e "use $hiveDBName;show tables;"`
tab_list=`echo "${tables}"`

rm -f ${hiveDBName}_all_table_partition_DDL.sql

for list in $tab_list
do
    echo "Generating table script for " ${hiveDBName}.${list}
    showcreatetable=${showcreatetable}${showcreate}${hiveDBName}.${list}${terminate}
done

echo " ====== Create Tables ======= : " $showcreatetable

## Remove the file
rm -f ${hiveDBName}_extract_all_tables.txt

hive -e "use $hiveDBName; ${showcreatetable}" > /home/path/filter_ddls/aa.sql
grep -v "WARN" /home/path/filter_ddls/aa.sql >/home/path/hive_db_ddls/${hiveDBName}_extract_all_tables.sql
# cat a1.sql + ";\n\n" >> ${hiveDBName}_extract_all_tables.sql

Below is my Python program. With this method the ; is added only after TBLPROPERTIES, and certain tables are skipped.

import re
f = open("/home/path/ddl.sql", 'rt', encoding='latin-1').read()

with open("/home/path/new_ddl.sql","w") as output:
    output.write(re.sub(r'(TBLPROPERTIES \(.*?\))', r'\1;', f, flags=re.DOTALL))

Any ideas or suggestions to make this happen? Preferably using the first option (the .sh script).

Recommended answer

Following our discussion in chat, the two solutions we arrived at are as follows:

If your format is consistent and transient is always present on the line that needs the closing ';', then a simple sed substitution is all that is needed, e.g.

sed '/transient/s/$/;/' file

(Add the -i option to edit the file in place, or -i.bak to edit in place while keeping an unchanged copy of the original with a .bak extension.)
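For readers who would rather stay in Python, as in the question's second attempt, the same line-based substitution can be sketched as below. This is only a minimal sketch under the same assumption as the sed command, namely that transient appears only on the line that closes each statement. The function name terminate_transient_lines and the sample text are hypothetical, introduced here for illustration.

```python
def terminate_transient_lines(text):
    """Append ';' to every line containing 'transient' (like sed '/transient/s/$/;/')."""
    out = []
    for line in text.splitlines():
        if "transient" in line:
            line += ";"
        out.append(line)
    # Rejoin with newlines and end the result with a single trailing newline.
    return "\n".join(out) + "\n"


# Hypothetical sample modelled on the DDL in the question.
sample = (
    "TBLPROPERTIES (\n"
    "  'transient_lastDdlTime'='1556u3ehw27')\n"
    "CREATE TABLE `table2`(\n"
    "  `count` bigint)\n"
)
print(terminate_transient_lines(sample), end="")
```

Like the sed version, this touches every line that mentions transient, so it is only appropriate when that word never appears elsewhere in the DDL.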

If, on the other hand, the content can change and transient may or may not be present, you can key off the TBLPROPERTIES tag, scan forward in the file to find the first closing ')' that follows it, and add the terminating ';' there.

awk provides a more robust solution here, since nothing guarantees how many lines lie between TBLPROPERTIES and the closing ')'. Below, awk is used with a simple variable look that serves as a flag indicating whether you are looking for the closing ')' after a TBLPROPERTIES (look=1) or not (look=0). The sub() call matches only a ')' at the very end of a line.

For example:

awk -v look=0 '
    /^TBLPROPERTIES/ { look=1 }
    look == 1 {
        if ( sub (/[)]$/,");") )
            look=0
    }1
' file
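The flag logic of the awk program above can also be expressed in Python, which may be convenient if the rest of the tooling is already written in Python. The following is a rough sketch rather than a drop-in replacement: the function name terminate_tblproperties is hypothetical, and it assumes, as the awk version does, that TBLPROPERTIES starts at the beginning of a line and that the closing ')' is the last character on its line.

```python
def terminate_tblproperties(text):
    """Add ';' after the first line-ending ')' that follows each TBLPROPERTIES line."""
    out = []
    look = False  # True while searching for the ')' that closes TBLPROPERTIES
    for line in text.splitlines():
        if line.startswith("TBLPROPERTIES"):
            look = True
        # The TBLPROPERTIES line itself is checked too, as in the awk program.
        if look and line.endswith(")"):
            line += ";"
            look = False
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical sample: the column list also ends in ')' but must stay untouched.
sample = (
    "CREATE TABLE `table2`(\n"
    "  `count` bigint)\n"
    "TBLPROPERTIES (\n"
    "  'transient'='15407')\n"
)
print(terminate_tblproperties(sample), end="")
```

Because the flag is only set by a TBLPROPERTIES line, a ')' that closes the column list is left alone, which is the behaviour the question asks for.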

GNU awk provides the -i inplace extension (gawk -i inplace), which allows editing the file in place much like sed -i. Otherwise, simply redirect the output to a temporary file and then copy or move it over the original filename.
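If the rewriting is done from Python instead, the "write to a temporary file, then move it over the original" approach can be sketched as follows. The helper names rewrite_in_place and add_semicolon are hypothetical, the latin-1 encoding is taken from the question's own script, and the demo runs inside a throwaway directory so that no real file is touched.

```python
import os
import tempfile


def rewrite_in_place(path, transform):
    """Apply transform(text) -> text to the file at path via a temporary file."""
    with open(path, "r", encoding="latin-1") as src:
        new_text = transform(src.read())
    directory = os.path.dirname(os.path.abspath(path))
    # Write next to the original so the final move stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="latin-1") as tmp:
        tmp.write(new_text)
    os.replace(tmp_path, path)


def add_semicolon(text):
    # Stand-in transform for the demo: terminate lines containing 'transient'.
    return "".join(
        line.rstrip("\n") + ";\n" if "transient" in line else line
        for line in text.splitlines(keepends=True)
    )


with tempfile.TemporaryDirectory() as workdir:
    demo_path = os.path.join(workdir, "demo.sql")
    with open(demo_path, "w", encoding="latin-1") as fh:
        fh.write("TBLPROPERTIES (\n  'transient'='15407')\n")
    rewrite_in_place(demo_path, add_semicolon)
    with open(demo_path, encoding="latin-1") as fh:
        print(fh.read(), end="")
```

Note that the replacement file carries the permissions of the temporary file created by tempfile.mkstemp, not those of the original, so restore them afterwards if that matters for your setup.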

Whether you use sed or awk as shown above, the output has the desired terminating ';', e.g.

CREATE EXTERNAL TABLE `table1`(
  `tab_id bigint COMMENT 'The unique identifier of thetable')
ROW FORMAT SERDE
  *
STORED AS INPUTFORMAT
*
OUTPUTFORMAT
  *
LOCATION
  *
TBLPROPERTIES (
  'transient_lastDdlTime'='1556u3ehw27');
CREATE TABLE `table2`(
  `count` bigint)
ROW FORMAT SERDE
 *
STORED AS INPUTFORMAT
  *
OUTPUTFORMAT
  *
LOCATION
  'hdfs://path/'
TBLPROPERTIES (
  'transient'='15407');

Let me know if you have further questions.
