How to add ; at the end of each "show create table"
Question
I have a txt file with the following content:
CREATE EXTERNAL TABLE `table1`(
`tab_id bigint COMMENT 'The unique identifier of thetable')
ROW FORMAT SERDE
*
STORED AS INPUTFORMAT
*
OUTPUTFORMAT
*
LOCATION
*
TBLPROPERTIES (
'transient_lastDdlTime'='1556u3ehw27')
CREATE TABLE `table2`(
`count` bigint)
ROW FORMAT SERDE
*
STORED AS INPUTFORMAT
*
OUTPUTFORMAT
*
LOCATION
'hdfs://path/'
TBLPROPERTIES (
'transient'='15407')
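As you can see, there is no ; at the end of each table's DDL. I am trying to write something that inserts a ; after each table's DDL, so the output should be: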
CREATE EXTERNAL TABLE `table1`(
`tab_id bigint COMMENT 'The unique identifier of thetable')
ROW FORMAT SERDE
*
STORED AS INPUTFORMAT
*
OUTPUTFORMAT
*
LOCATION
*
TBLPROPERTIES (
'transient_lastDdlTime'='1556u3ehw27');
CREATE TABLE `table2`(
`count` bigint)
ROW FORMAT SERDE
*
STORED AS INPUTFORMAT
*
OUTPUTFORMAT
*
LOCATION
'hdfs://path/'
TBLPROPERTIES (
'transient'='15407');
I tried two approaches: changing my DDL creation script, and a Python program.
Below is my DDL creation .sh script, which runs through my database's tables and generates a single file for all the tables in the database. I tried to use the cat command shown (commented out) on the last line to add the terminator, but kept receiving errors.
hiveDBName=my_db;
showcreate="show create table "
showpartitions="show partitions "
terminate=";"
tables=`hive -e "use $hiveDBName;show tables;"`
tab_list=`echo "${tables}"`
rm -f ${hiveDBName}_all_table_partition_DDL.sql
for list in $tab_list
do
    echo "Generating table script for " ${hiveDBName}.${list}
    showcreatetable=${showcreatetable}${showcreate}${hiveDBName}.${list}${terminate}
done
echo " ====== Create Tables ======= : " $showcreatetable
##Remove the file
rm -f ${hiveDBName}_extract_all_tables.txt
hive -e "use $hiveDBName; ${showcreatetable}" > /home/path/filter_ddls/aa.sql
grep -v "WARN" /home/path/filter_ddls/aa.sql >/home/path/hive_db_ddls/${hiveDBName}_extract_all_tables.sql
# cat a1.sql + ";\n\n" >> ${hiveDBName}_extract_all_tables.sql
Below is my Python program. With this method, a ; is added only after TBLPROPERTIES, and certain tables are skipped.
import re
f = open("/home/path/ddl.sql", 'rt', encoding='latin-1').read()
with open("/home/path/new_ddl.sql","w") as output:
    output.write(re.sub(r'(TBLPROPERTIES \(.*?\))', r'\1;', f, flags=re.DOTALL))
Any ideas or suggestions to make this happen? Preferably with the first option (the .sh script).
Recommended answer
Following our discussion in chat, the two solutions discussed were as follows:
If your format is consistent and transient is always present in the line where a closing ';' is required at the end, then a simple sed substitution is all that is needed, e.g.
sed '/transient/s/$/;/' file
(Add the -i option to edit the file in place, or add -i.bak to edit in place while preserving the original file unchanged with the .bak extension.)
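For readers who would rather stay in Python, as in the question's second attempt, the same substitution can be sketched as follows. This is a minimal illustration and not part of the original answer. The function name add_semicolons is hypothetical, and the sketch assumes, like the sed command, that transient appears only on lines that should receive the terminator.

```python
def add_semicolons(text):
    """Append ';' to every line containing 'transient',
    mirroring sed '/transient/s/$/;/'."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")      # line content without its line ending
        ending = line[len(body):]       # the original line ending, if any
        if "transient" in body:
            body += ";"
        out.append(body + ending)
    return "".join(out)


if __name__ == "__main__":
    sample = "TBLPROPERTIES (\n'transient'='15407')\n"
    print(add_semicolons(sample), end="")
```

Like the sed command, this is not idempotent: running it a second time on its own output appends another ';' to the same lines.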
If, on the other hand, the content can change and transient may or may not be present, then you can key off the TBLPROPERTIES tag, scan forward in the file to find the first closing ')' following TBLPROPERTIES, and add the closing ';' there.
awk provides a more robust solution there, as no guarantees have been given about the possible number of lines between TBLPROPERTIES and the closing ')'. Below, awk is used with a simple variable look that serves as a flag indicating whether you are looking for the closing ')' after a TBLPROPERTIES (look=1), or not (look=0).
For example:
awk -v look=0 '
    /^TBLPROPERTIES/ { look=1 }
    look == 1 {
        if ( sub (/[)]$/,");") )
            look=0
    }1
' file
GNU awk has the gawk -i inplace extension to allow editing the file in place, similar to sed; otherwise, you simply redirect the output to a temp file and then copy or move it to the original filename.
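The same flag-based scan can also be written in Python. The sketch below is an illustration only and not part of the original answer. The function name terminate_tblproperties is hypothetical, and it assumes, as the awk script does, that TBLPROPERTIES starts at the beginning of a line and that the closing ')' is the last character on its line.

```python
def terminate_tblproperties(text):
    """Add ';' after the first line-ending ')' that follows a line
    starting with TBLPROPERTIES (same logic as the awk 'look' flag)."""
    out = []
    look = False                        # True while searching for the closing ')'
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")      # line content without its line ending
        ending = line[len(body):]       # the original line ending, if any
        if body.startswith("TBLPROPERTIES"):
            look = True
        if look and body.endswith(")"):
            body += ";"
            look = False                # stop until the next TBLPROPERTIES
        out.append(body + ending)
    return "".join(out)


if __name__ == "__main__":
    ddl = (
        "CREATE TABLE `table2`(\n"
        "`count` bigint)\n"
        "LOCATION\n"
        "'hdfs://path/'\n"
        "TBLPROPERTIES (\n"
        "'transient'='15407')\n"
    )
    print(terminate_tblproperties(ddl), end="")
```

Because the function works on a string, the caller decides where the result goes: it can be written to a new file, or to a temporary file that is then moved over the original, as described above for awk.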
Whether you use sed or awk as shown above, the output has the desired terminating ';', e.g.
CREATE EXTERNAL TABLE `table1`(
`tab_id bigint COMMENT 'The unique identifier of thetable')
ROW FORMAT SERDE
*
STORED AS INPUTFORMAT
*
OUTPUTFORMAT
*
LOCATION
*
TBLPROPERTIES (
'transient_lastDdlTime'='1556u3ehw27');
CREATE TABLE `table2`(
`count` bigint)
ROW FORMAT SERDE
*
STORED AS INPUTFORMAT
*
OUTPUTFORMAT
*
LOCATION
'hdfs://path/'
TBLPROPERTIES (
'transient'='15407');
Let me know if you have further questions.