Extract hive count string using regex
Problem description
I am trying to get the total number of records in a Hive table using paramiko. I know we can use PyHive or pyhs2, but they require certain configuration, and it would take a lot of time to get that done by my IT team.
So I am using paramiko to execute the command below and get the count:
beeline -u jdbc:hive2://localhost:10000 -n hive -e 'select count(*) from table_name'
I get the following result:
+----------+--+
| _c0 |
+----------+--+
| 1232322 |
+----------+--+
I need to extract this count from the output.
I have tried the following code and regular expression, but it is not working:
pattern="""
+----------+--+
| _c0 |
+----------+--+
| [0-9]* |
+----------+--+
"""
import paramiko
si, so, se=ssh_con.exec_command("beeline -u jdbc:hive2://localhost:10000 -n hive -e 'select count(*) from table_name'")
print(so.read().decode())
print(re.match(pattern,so.read().decode()))
I am able to retrieve the output containing the count and print it. I am just looking for a regular expression to extract the count.
Recommended answer
In Beeline, results can be displayed in different formats. By default, the result is printed as a table with a header. You can remove the header and the table borders, so there is no need to parse the result with a regexp. Add these options:
--showHeader=false
--outputformat=tsv2
beeline --showHeader=false --outputformat=tsv2 -u jdbc:hive2://localhost:10000 -n hive -e 'select count(*) from table_name'
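On the Python side, the output could then be handled roughly as in the sketch below. It is not part of the original answer: the function names parse_count_tsv2 and parse_count_table are hypothetical, and it assumes the count is the only all-digit line in the tsv2 output. The second function is a fallback for the default table format; it uses re.search with an escaped `|`, because in the question's pattern the unescaped `+` and `|` act as regex metacharacters and re.match only matches at the start of the string. Note also that the question's code calls so.read() twice, and the second call returns an empty result because the stream has already been consumed, so the output should be read once and stored.

```python
import re


def parse_count_tsv2(output: str) -> int:
    """Return the count from output produced with
    --showHeader=false --outputformat=tsv2.

    Assumption: the count is the first line consisting only of digits.
    """
    for line in output.splitlines():
        line = line.strip()
        if line.isdecimal():
            return int(line)
    raise ValueError("no count found in output")


def parse_count_table(output: str) -> int:
    """Fallback for the default table format: find a row like '| 1232322 |'."""
    # '|' is escaped so it is matched literally; MULTILINE lets '^'
    # anchor at the start of every line, not only the whole string.
    match = re.search(r"^\|\s*(\d+)\s*\|", output, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no count row found in output")
    return int(match.group(1))


# Hypothetical usage with the question's paramiko call (read stdout once):
#   si, so, se = ssh_con.exec_command("beeline ... -e 'select count(*) from table_name'")
#   raw = so.read().decode()
#   count = parse_count_tsv2(raw)
```

Keeping the parsing in plain functions that take a string makes it possible to check them without an SSH connection or a Hive server.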