How to store the result of a query in the current table without changing the table schema?

Problem description

I have a table with the following structure:

  {
    id: "123",
    scans:[{
       "scanid":"123",
       "status":"sleep"
      }]
  },
  {
    id: "123",
    scans:[{
       "scanid":"123",
       "status":"sleep"
      }]
  }

Query to remove duplicates:

    SELECT *
    FROM (
      SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY id) row_number
      FROM table1
    )
    WHERE row_number = 1
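
As a rough illustration of what this query is meant to do, the JavaScript sketch below keeps the first row seen for each id. It is not how BigQuery evaluates the query; `dedupeById` is a hypothetical helper, and the sample rows are copied from the structure above. Note that the window in the query has no ORDER BY, so which duplicate receives row number 1 is not guaranteed, whereas this sketch always keeps the first one in array order.

```javascript
// Illustration only: keep the first row seen for each id, which is the effect
// of filtering on ROW_NUMBER() OVER (PARTITION BY id) = 1.
// dedupeById is a hypothetical helper name, not part of BigQuery.
function dedupeById(rows) {
  var seen = {};
  return rows.filter(function (row) {
    if (Object.prototype.hasOwnProperty.call(seen, row.id)) {
      return false; // a row with this id was already kept
    }
    seen[row.id] = true;
    return true;
  });
}

// Two identical rows, as in the question.
var rows = [
  { id: '123', scans: [{ scanid: '123', status: 'sleep' }] },
  { id: '123', scans: [{ scanid: '123', status: 'sleep' }] }
];
var unique = dedupeById(rows);
console.log(unique.length);
```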

I specified table1 as the destination table.

In table1, scans is a repeated record whose fields scanid and status are both strings. When I run a query (here, the one that removes duplicates) and overwrite the existing table with the result, the table schema changes: the repeated record is replaced by two flat columns, scans_scanid (string) and scans_status (string). Where am I going wrong?
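
To make the symptom concrete, here is a small JavaScript sketch of what flattening does to one nested row. It only illustrates the effect described above and is not BigQuery's implementation; `flattenRow` is a hypothetical helper, and the underscore-joined column names are modeled on the scans_scanid and scans_status columns mentioned in the question.

```javascript
// Illustration only: emulates how a row with a repeated record ends up as
// flat rows with prefixed columns (scans_scanid, scans_status).
// flattenRow is a hypothetical helper, not a BigQuery API.
function flattenRow(row, repeatedField) {
  return row[repeatedField].map(function (item) {
    var flat = {};
    // Copy every top-level column except the repeated record unchanged.
    Object.keys(row).forEach(function (key) {
      if (key !== repeatedField) {
        flat[key] = row[key];
      }
    });
    // Prefix each nested field with the name of the repeated record.
    Object.keys(item).forEach(function (key) {
      flat[repeatedField + '_' + key] = item[key];
    });
    return flat;
  });
}

var nested = { id: '123', scans: [{ scanid: '123', status: 'sleep' }] };
var flatRows = flattenRow(nested, 'scans');
console.log(JSON.stringify(flatRows));
```

The nested scans record is gone from the output, which is the schema change the question describes.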

Solution

NEST() is known to be incompatible with the Unflatten Results output option; it is mostly used for intermediate results in a subquery.

Try the workaround below.
Note: I use INTEGER for id and scanid. If they should be STRING, you need to:
a. change the types in the output schema section, and
b. remove the parseInt() call in t = {scanid:parseInt(x[0]), status:x[1]}

SELECT id, scans.scanid, scans.status 
FROM JS(
  (      // input table
    SELECT id, NEST(CONCAT(STRING(scanid), ',', STRING(status))) AS scans
    FROM (
      SELECT id, scans.scanid, scans.status 
      FROM (
        SELECT id, scans.scanid, scans.status, 
               ROW_NUMBER() OVER (PARTITION BY id) AS dup
        FROM table1
      ) WHERE dup = 1  
    ) GROUP BY id
  ),
  id, scans,     // input columns
  "[{'name': 'id', 'type': 'INTEGER'},    // output schema
    {'name': 'scans', 'type': 'RECORD',
     'mode': 'REPEATED',
     'fields': [
       {'name': 'scanid', 'type': 'INTEGER'},
       {'name': 'status', 'type': 'STRING'}
     ]    
    }
  ]",
  "function(row, emit){    // function 
    var c = [];
    for (var i = 0; i < row.scans.length; i++) {
      var x = row.scans[i].toString().split(',');
      var t = {scanid:parseInt(x[0]), status:x[1]};
      c.push(t);
    }
    emit({id: row.id, scans: c});  
  }"
)
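
The row transformation inside the query can be tried outside BigQuery. The sketch below is a standalone JavaScript version of the same logic that runs in Node.js, with a flag covering the STRING case from the note above. `makeTransform`, `run`, the fake `emit` callback, and the sample rows (including the 'awake' status) are assumptions made for illustration; they are not part of the answer or of any BigQuery API.

```javascript
// Standalone version of the row transformation used in the query above, so it
// can be run outside BigQuery (for example with Node.js).
// makeTransform and run are hypothetical helpers for this illustration.
function makeTransform(scanidAsInteger) {
  return function (row, emit) {
    var scans = [];
    for (var i = 0; i < row.scans.length; i++) {
      // Each element looks like "scanid,status"; split it back into two parts.
      var parts = row.scans[i].toString().split(',');
      scans.push({
        // INTEGER schema: parse the number; STRING schema: keep the text as is.
        scanid: scanidAsInteger ? parseInt(parts[0], 10) : parts[0],
        status: parts[1]
      });
    }
    emit({ id: row.id, scans: scans });
  };
}

// Collect emitted rows in an array instead of handing them to BigQuery.
function run(transform, row) {
  var out = [];
  transform(row, function (emitted) { out.push(emitted); });
  return out;
}

// Made-up sample rows in the shape produced by NEST(CONCAT(...)).
var asInteger = run(makeTransform(true), { id: 123, scans: ['123,sleep', '124,awake'] });
var asString = run(makeTransform(false), { id: '123', scans: ['123,sleep'] });
console.log(JSON.stringify(asInteger));
console.log(JSON.stringify(asString));
```

One caveat of the comma-joined encoding: a status value that itself contains a comma would be split in the wrong place, so a delimiter that cannot occur in the data would be needed in that case.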

Here I use BigQuery User-Defined Functions. They are extremely powerful, but they have some limits and limitations to be aware of. Also keep in mind that queries using them are likely candidates for being classified as expensive High-Compute queries:

Complex queries can consume extraordinarily large computing resources relative to the number of bytes processed. Typically, such queries contain a very large number of JOIN or CROSS JOIN clauses or complex User-defined Functions.
