Adding comments to database columns and retrieving from AWS Glue
Question
I'm trying to incorporate an AWS Glue Data Catalog into the Data Lake I'm building out. I'm using a few different databases and would like to add comments to columns in some of their tables. These databases include Redshift and MySQL. I usually add a comment to a column with something along the lines of
COMMENT ON COLUMN table.column_name IS 'This is the comment';
I know that Glue has a comment field that shows in the GUI. Is there a way to sync the comment field in Glue with the comments I add to the columns in a database?
Recommended answer
In order to update meta information about a table that has been defined in the AWS Glue Data Catalog, you need to use a combination of the get_table() and update_table() methods, for example with boto3.
Here is the most naive approach:
import boto3
from pprint import pprint
glue_client = boto3.client('glue')
database_name = "__SOME_DATABASE__"
table_name = "__SOME_TABLE__"
response = glue_client.get_table(
    DatabaseName=database_name,
    Name=table_name
)
original_table = response['Table']
Here original_table adheres to the response syntax defined by get_table(). However, we need to remove some fields from it so that it passes validation when we use update_table(). The list of allowed keys can be obtained by passing original_table directly to update_table() without any changes: the resulting validation error names the keys that are accepted.
allowed_keys = [
    "Name",
    "Description",
    "Owner",
    "LastAccessTime",
    "LastAnalyzedTime",
    "Retention",
    "StorageDescriptor",
    "PartitionKeys",
    "ViewOriginalText",
    "ViewExpandedText",
    "TableType",
    "Parameters"
]
updated_table = dict()
for key in allowed_keys:
    if key in original_table:
        updated_table[key] = original_table[key]
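The same filtering can be wrapped in a small helper, which makes it easy to check against a sample dict without calling AWS. This is only a minimal sketch: to_table_input is a hypothetical name, and the sample dict is hand-written to resemble a get_table() response rather than taken from a real catalog.

```python
# Keys accepted by TableInput, copied from the allowed_keys list above.
ALLOWED_KEYS = [
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters",
]


def to_table_input(table, allowed_keys=ALLOWED_KEYS):
    """Return a shallow copy of `table` with only the allowed keys kept.

    Hypothetical helper: nested values such as StorageDescriptor are
    shared with the input dict, not deep-copied.
    """
    return {key: table[key] for key in allowed_keys if key in table}


# Hand-written stand-in for response['Table']; values are illustrative only.
sample_table = {
    "Name": "orders",
    "DatabaseName": "sales",
    "CreateTime": "2020-01-01T00:00:00",
    "UpdateTime": "2020-01-02T00:00:00",
    "CreatedBy": "arn:aws:iam::123456789012:user/example",
    "TableType": "EXTERNAL_TABLE",
    "StorageDescriptor": {"Columns": [{"Name": "id", "Type": "bigint"}]},
}

table_input = to_table_input(sample_table)
print(sorted(table_input))  # ['Name', 'StorageDescriptor', 'TableType']
```

Read-only fields such as DatabaseName, CreateTime, UpdateTime and CreatedBy are dropped, which is what lets the result pass as TableInput.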
For simplicity's sake, we will change the comment of the very first column of the table:
new_comment = "Foo Bar"
updated_table['StorageDescriptor']['Columns'][0]['Comment'] = new_comment
response = glue_client.update_table(
    DatabaseName=database_name,
    TableInput=updated_table
)
pprint(response)
Obviously, if you want to add a comment to a specific column, you would need to extend this to
new_comment = "Targeted Foo Bar"
target_column_name = "__SOME_COLUMN_NAME__"
for col in updated_table['StorageDescriptor']['Columns']:
    if col['Name'] == target_column_name:
        col['Comment'] = new_comment
response = glue_client.update_table(
    DatabaseName=database_name,
    TableInput=updated_table
)
pprint(response)
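To sync comments for many columns at once, the loop above could be generalised to take a mapping of column name to comment. The sketch below is an illustration under assumptions, not part of the original answer: apply_column_comments is a hypothetical helper, and the db_comments mapping is hard-coded here, whereas in practice it might be built from the source database (for MySQL, for instance, from the COLUMN_NAME and COLUMN_COMMENT columns of information_schema.COLUMNS). It only modifies the dict; update_table() would still have to be called afterwards, as in the code above.

```python
def apply_column_comments(table_input, comments):
    """Set 'Comment' on every column whose name appears in `comments`.

    Hypothetical helper. Mutates `table_input` in place and returns the
    sorted names from `comments` that matched no column, so that typos
    or dropped columns are easy to spot.
    """
    matched = set()
    for col in table_input["StorageDescriptor"]["Columns"]:
        if col["Name"] in comments:
            col["Comment"] = comments[col["Name"]]
            matched.add(col["Name"])
    return sorted(set(comments) - matched)


# Hand-written stand-in for updated_table; values are illustrative only.
example_table = {
    "Name": "orders",
    "StorageDescriptor": {
        "Columns": [
            {"Name": "id", "Type": "bigint"},
            {"Name": "status", "Type": "string", "Comment": "old text"},
        ]
    },
}

# Assumed to come from the source database in a real sync job.
db_comments = {
    "id": "Primary key",
    "status": "Order status",
    "missing": "No such column in Glue",
}

unmatched = apply_column_comments(example_table, db_comments)
print(unmatched)  # ['missing']
```

This sketch only looks at StorageDescriptor.Columns; partition columns are listed under PartitionKeys and would need the same treatment if they carry comments.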