Find string from table in cell in BigQuery --> Query exceeded resource limits


Problem description

I have two tables in BigQuery:

  • City list: invertible-fin-XXX238.Reports.City
  • Station names: invertible-fin-XXX238.Reports.Station

Most of the station names contain a city name. Now I want to extract the city from the Station table. Here is some example data:

  • City: Berlin
  • Station name: inStore_Berlin_Alexanderplatz
  • Station name: Berlin Schönefeld Airport
  • Station name: Train Station Franchise Berlin

I tried the INSTR function, but without success (INSTR works only in Legacy SQL, and there I couldn't use subselects).

SELECT City,
INSTR((SELECT AdGroupName 
FROM [invertible-fin-XXX238.Reports.City]),City) AS Match 
FROM [invertible-fin-XXX238.Reports.Station]

Therefore I tried it with WHERE ... LIKE. The SQL code is below:

SELECT a.City
FROM [invertible-fin-XXX238.Reports.City] a
CROSS JOIN [invertible-fin-XXX238.Reports.Station] b
WHERE b.Name LIKE '%' + a.City + '%'
GROUP BY a.City

But now the query is too computationally intensive, and I get the error "Query exceeded resource limits for tier 1. Tier 18 or higher required."

Could someone please help me write a more resource-friendly query?

Thanks in advance, Philipp

Solution

Below are a few of the many possible versions for BigQuery Standard SQL:

#standardSQL
SELECT city, station
FROM `invertible-fin-XXX238.Reports.Station` AS s
JOIN `invertible-fin-XXX238.Reports.City` AS c
ON REPLACE(LOWER(station), LOWER(city), '') <> LOWER(station)

or

#standardSQL
SELECT city, station
FROM `invertible-fin-XXX238.Reports.Station` AS s
JOIN `invertible-fin-XXX238.Reports.City` AS c
ON LOWER(station) LIKE CONCAT('%',LOWER(city),'%')

You can remove the LOWER() calls if the city names are spelled in the same case in both tables.
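
Since the question started from INSTR: in Standard SQL the closest counterpart is STRPOS, which returns the 1-based position of a substring, or 0 when it is absent. The following is only a sketch of the same substring join written that way. The inline tables `sample_cities` and `sample_stations` are assumptions made so the query can be run without the real tables; the station names are taken from the question, and 'Hamburg' is a hypothetical extra city.

```sql
#standardSQL
-- Inline sample data (hypothetical table names, not the real Reports tables).
WITH sample_cities AS (
  SELECT 'Berlin' AS city UNION ALL
  SELECT 'Hamburg'  -- hypothetical city with no matching station
),
sample_stations AS (
  SELECT 'inStore_Berlin_Alexanderplatz' AS station UNION ALL
  SELECT 'Berlin Schönefeld Airport' UNION ALL
  SELECT 'Train Station Franchise Berlin'
)
SELECT c.city, s.station
FROM sample_stations AS s
JOIN sample_cities AS c
-- STRPOS > 0 means the lower-cased city occurs somewhere in the station name.
ON STRPOS(LOWER(s.station), LOWER(c.city)) > 0
```

With this sample data the three stations should each pair with Berlin, and Hamburg should not appear. Like the two versions above, this still compares every station with every city, so it is an alternative spelling rather than a cheaper query.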

While the versions above look more straightforward, I would prefer the one below, as it lets you control how the city is extracted from the station name. In the regular expression r'([^ _]+)' you should list all characters that you observe acting as delimiters in the station column. In this case a city is extracted only when it is a whole token, not part of a longer name.
Of course, you should first check whether you even need to worry about this.

#standardSQL
WITH tokens AS (
  SELECT token, station
  FROM `invertible-fin-XXX238.Reports.Station` AS s,
    UNNEST(REGEXP_EXTRACT_ALL(LOWER(station), r'([^ _]+)')) token
) 
SELECT city, station
FROM tokens AS s
JOIN `invertible-fin-XXX238.Reports.City` AS c
ON LOWER(city) = token
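
To see what "not part of a longer name" means in practice, here is a small self-contained sketch of the same token join. The inline tables `sample_cities` and `sample_stations` and the station 'Berliner_Strasse_Kiosk' are hypothetical and exist only for illustration.

```sql
#standardSQL
WITH sample_cities AS (
  SELECT 'Berlin' AS city
),
sample_stations AS (
  SELECT 'inStore_Berlin_Alexanderplatz' AS station UNION ALL
  SELECT 'Berliner_Strasse_Kiosk'  -- hypothetical: contains "berlin" only inside a longer word
),
tokens AS (
  -- Split each lower-cased station name on spaces and underscores,
  -- producing one row per token.
  SELECT token, s.station
  FROM sample_stations AS s,
    UNNEST(REGEXP_EXTRACT_ALL(LOWER(s.station), r'([^ _]+)')) AS token
)
SELECT c.city, t.station
FROM tokens AS t
JOIN sample_cities AS c
-- Equality on whole tokens: "berliner" does not equal "berlin".
ON LOWER(c.city) = t.token
```

With this sample data only inStore_Berlin_Alexanderplatz should be returned. Because the join condition is a plain equality, BigQuery can presumably treat it as an ordinary equi-join instead of testing every station against every city, which is likely what helps with the resource limit. One caveat under this approach: a city name that itself contains a delimiter (for example a two-word name) would never equal a single token, so such names would need separate handling.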
