Find string from table in cell in BigQuery --> Query exceeded resource limits
I have two tables in BigQuery:
- City list: invertible-fin-XXX238.Reports.City
- Station names: invertible-fin-XXX238.Reports.Station
Most of the station names contain a city name. Now I want to extract the city from the Station table. Here is some example data:
- City: Berlin
- Stationname: inStore_Berlin_Alexanderplatz
- Stationname: Berlin Schönefeld Airport
- Stationname: Train Station Franchise Berlin
I tried the INSTR function, but had no success (INSTR works only with Legacy SQL, and there I couldn't use subselects).
SELECT City,
INSTR((SELECT AdGroupName
FROM [invertible-fin-XXX238.Reports.City]),City) AS Match
FROM [invertible-fin-XXX238.Reports.Station]
Therefore I tried it with WHERE ... LIKE. The SQL code is below:
SELECT a.City
FROM [invertible-fin-XXX238.Reports.City] a
CROSS JOIN [invertible-fin-XXX238.Reports.Station] b
WHERE b.Name LIKE '%' + a.City + '%'
GROUP BY a.City
But now the query is too computationally intensive, and I got back the error "Query exceeded resource limits for tier 1. Tier 18 or higher required."
Could someone please help me write a more resource-friendly query?
Thanks in advance, Philipp
Below are a few of the many possible versions for BigQuery Standard SQL:
#standardSQL
SELECT city, station
FROM `invertible-fin-XXX238.Reports.Station` AS s
JOIN `invertible-fin-XXX238.Reports.City` AS c
ON REPLACE(LOWER(station), LOWER(city), '') <> LOWER(station)
or
#standardSQL
SELECT city, station
FROM `invertible-fin-XXX238.Reports.Station` AS s
JOIN `invertible-fin-XXX238.Reports.City` AS c
ON LOWER(station) LIKE CONCAT('%',LOWER(city),'%')
You can remove the LOWER() function if the city names are spelled in the same case in both tables.
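For reference, the INSTR idea from the question has a Standard SQL counterpart in STRPOS. The following is a minimal sketch rather than a tested solution: it uses inline sample rows taken from the question in place of the real tables, the CTE names `city_list` and `station_list` are hypothetical, and the column names `city` and `station` are assumptions carried over from the queries above.

```sql
#standardSQL
WITH city_list AS (
  -- sample city from the question
  SELECT 'Berlin' AS city
),
station_list AS (
  -- sample station names from the question
  SELECT 'inStore_Berlin_Alexanderplatz' AS station UNION ALL
  SELECT 'Berlin Schönefeld Airport' UNION ALL
  SELECT 'Train Station Franchise Berlin'
)
SELECT city, station
FROM station_list AS s
JOIN city_list AS c
-- STRPOS returns the 1-based position of the match, or 0 when there is none
ON STRPOS(LOWER(station), LOWER(city)) > 0
```

Like the LIKE version, this is a plain substring test, so it would also match a city name that appears inside a longer word.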
While the above versions look more straightforward, I would prefer the one below, as it lets you control how the city is extracted from the station name through the regular expression r'([^ _]+)'.
In that character class you should list all characters that you observe acting as delimiters in the station column. This way a city is matched only when it is a whole token and not part of a longer name.
Of course, you should first validate whether you even need to worry about this.
#standardSQL
WITH tokens AS (
SELECT token, station
FROM `invertible-fin-XXX238.Reports.Station` AS s,
UNNEST(REGEXP_EXTRACT_ALL(LOWER(station), r'([^ _]+)')) token
)
SELECT city, station
FROM tokens AS s
JOIN `invertible-fin-XXX238.Reports.City` AS c
ON LOWER(city) = token
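To see what the token-based version does, here is a small self-contained sketch that replaces the real tables with inline rows. The first three station names come from the question; the fourth row (`Store_Berlinchen_Mitte`) and the CTE names `city_list` and `station_list` are hypothetical and were added only to illustrate the "part of a longer name" case.

```sql
#standardSQL
WITH city_list AS (
  SELECT 'Berlin' AS city
),
station_list AS (
  SELECT 'inStore_Berlin_Alexanderplatz' AS station UNION ALL
  SELECT 'Berlin Schönefeld Airport' UNION ALL
  SELECT 'Train Station Franchise Berlin' UNION ALL
  SELECT 'Store_Berlinchen_Mitte'  -- hypothetical row, not from the question
),
tokens AS (
  -- split each lower-cased station name on spaces and underscores
  SELECT token, station
  FROM station_list AS s,
  UNNEST(REGEXP_EXTRACT_ALL(LOWER(station), r'([^ _]+)')) AS token
)
SELECT city, station
FROM tokens AS t
JOIN city_list AS c
-- equality on whole tokens, so 'berlinchen' does not match 'berlin'
ON LOWER(city) = token
```

The three station names from the question each contain the token `berlin` and should be returned, while the hypothetical fourth row should not. Because the join condition is a plain equality rather than a pattern match, it is also likely to be lighter on resources than the CROSS JOIN with LIKE. One caveat: with this delimiter set, a city name that itself contains a space or underscore is split into several tokens and would not match.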