Invalid syntax: Create table sortkey auto with initial sortkeys


Problem Description


I'm trying to use target-redshift to push data to AWS Redshift

https://pypi.org/project/target-redshift/

I am using Airflow to monitor ETL status

This is the error log and I have no clue what it means. Online documentation hardly exists for target-redshift. Is there any way to get around this error?

[2021-03-07 15:04:11,841] {bash_operator.py:126} INFO - ERROR Exception writing records
[2021-03-07 15:04:11,841] {bash_operator.py:126} INFO - Traceback (most recent call last):
[2021-03-07 15:04:11,841] {bash_operator.py:126} INFO -   File "/usr/local/airflow/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 300, in write_batch
[2021-03-07 15:04:11,841] {bash_operator.py:126} INFO -     {'version': target_table_version})
[2021-03-07 15:04:11,841] {bash_operator.py:126} INFO -   File "/usr/local/airflow/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/sql_base.py", line 840, in write_batch_helper
[2021-03-07 15:04:11,841] {bash_operator.py:126} INFO -     metadata)
[2021-03-07 15:04:11,841] {bash_operator.py:126} INFO -   File "/usr/local/airflow/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 588, in write_table_batch
[2021-03-07 15:04:11,841] {bash_operator.py:126} INFO -     table=sql.Identifier(remote_schema['name'])
[2021-03-07 15:04:11,841] {bash_operator.py:126} INFO -   File "/usr/local/airflow/.virtualenvs/target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 65, in execute
[2021-03-07 15:04:11,841] {bash_operator.py:126} INFO -     return super(_MillisLoggingCursor, self).execute(query, vars)
[2021-03-07 15:04:11,841] {bash_operator.py:126} INFO -   File "/usr/local/airflow/.virtualenvs/target-redshift/lib/python3.7/site-packages/psycopg2/extras.py", line 461, in execute
[2021-03-07 15:04:11,841] {bash_operator.py:126} INFO -     return super(LoggingCursor, self).execute(query, vars)
[2021-03-07 15:04:11,842] {bash_operator.py:126} INFO - psycopg2.errors.InternalError_: Invalid syntax: Create table sortkey auto with initial sortkeys.
[2021-03-07 15:04:11,842] {bash_operator.py:126} INFO - 
[2021-03-07 15:04:11,842] {bash_operator.py:126} INFO - CRITICAL ('Exception writing records', InternalError_('Invalid syntax: Create table sortkey auto with initial sortkeys.\n'))

Solution

Somewhere underneath your task, a query of this form is being run.

CREATE TABLE schema.tablename_copy 
(LIKE schema.tablename)

This is what is throwing the error Invalid syntax: Create table sortkey auto with initial sortkeys.

Fixing or removing the SORTKEY & DISTSTYLE of the origin table fixes the issue, e.g.:

ALTER TABLE schema.tablename ALTER SORTKEY NONE;
ALTER TABLE schema.tablename ALTER DISTSTYLE EVEN;

Depending on what your table actually needs (in terms of sort and dist keys) you might do something else. However, more than likely this is some loading table, and having no SORTKEY etc. shouldn't be a big deal.
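If the table genuinely does need keys, they can be re-applied explicitly once the copy works. A minimal sketch, assuming placeholder column names id and created_at (not taken from the original table):

-- Re-apply keys explicitly after the copy succeeds; column names are placeholders
ALTER TABLE schema.tablename ALTER DISTSTYLE KEY DISTKEY id;
ALTER TABLE schema.tablename ALTER COMPOUND SORTKEY (created_at);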

Other approaches: If the above fails, any number of ways of avoiding the LIKE will also work, since this LIKE is more than likely part of a load / upsert sequence, e.g. (a SQL sketch of the whole sequence follows the list below):

  • Make a temp copy of the destination table (using the LIKE statement)
  • Load new data into the temp table
  • Remove from the destination table the rows whose IDs are in the new data
  • Insert the new data into the destination table
  • Clean up the temp table.
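A minimal sketch of that sequence, assuming a placeholder id column and a placeholder S3 location / IAM role for the load (none of these names come from the original post):

-- 1. Temp copy of the destination table (this is where the LIKE appears)
CREATE TEMP TABLE tablename_staging (LIKE schema.tablename);
-- 2. Load the new data into the temp table (S3 path and IAM role are placeholders)
COPY tablename_staging FROM 's3://my-bucket/new_data/' IAM_ROLE 'arn:aws:iam::123456789012:role/my-role' FORMAT AS CSV;
-- 3. Remove rows from the destination whose IDs appear in the new data
DELETE FROM schema.tablename
USING tablename_staging
WHERE tablename.id = tablename_staging.id;
-- 4. Insert the new data
INSERT INTO schema.tablename SELECT * FROM tablename_staging;
-- 5. Clean up
DROP TABLE tablename_staging;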

Airflow is probably doing this sequence of SQL queries behind the scenes for you. You could dismantle the steps of the Airflow job into separate tasks and avoid the use of LIKE.

  1. DROP the origin table (might be possible depending on use case) - I found that on recreating the table, the issue with LIKE went away.
  2. Explicitly create the duplicate table using a normal CREATE statement instead of depending on a LIKE (a sketch follows this list)
  3. Consider doing an INSERT / APPEND / COPY directly to the table and handling duplicates later
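A hedged sketch of option 2, replacing the LIKE with an explicit definition (the columns shown are placeholders, not the real table definition):

-- Explicit definition instead of (LIKE schema.tablename); columns are placeholders
CREATE TABLE schema.tablename_copy (
    id         BIGINT,
    payload    VARCHAR(65535),
    created_at TIMESTAMP
)
DISTSTYLE EVEN;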

Extra info: This is a new error, and it is on the Redshift side. I'm not entirely sure why, but the LIKE statement is trying to transfer SORTKEYs and is unable to do so.

If you check the definition of the origin table schema.tablename you'll perhaps find something odd with Redshift's understanding of it.
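One place to peek at how Redshift currently records the keys is the standard svv_table_info system view (the original answer does not mention it; the schema and table names below are placeholders):

-- diststyle, sortkey1 and sortkey_num show what Redshift thinks the keys are
SELECT "schema", "table", diststyle, sortkey1, sortkey_num
FROM svv_table_info
WHERE "schema" = 'schema' AND "table" = 'tablename';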

The Amazon guys have also provided a view to allow you to get full table DDL statements (link).

Running against the view:

SELECT * FROM 
WHERE schemaname = 'schema' AND tablename = 'tablename'

This will maybe show something useful. In my case the SORTKEY and DISTKEY were both throwing errors, which to me indicated that Redshift had a flawed internal understanding of the keys. What is more, in my case I never set these keys, so I think they were implied by Redshift.
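Assuming the view referred to above is admin.v_generate_tbl_ddl from AWS's amazon-redshift-utils AdminViews (an assumption, since the link does not name it), the query would be along these lines:

-- Assumes the view is admin.v_generate_tbl_ddl from amazon-redshift-utils (not confirmed above)
SELECT ddl
FROM admin.v_generate_tbl_ddl
WHERE schemaname = 'schema'
  AND tablename = 'tablename'
ORDER BY seq;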
