在BQ公共数据集中获取顶级专利国家/地区的代码 [英] Get the top patent countries, codes in a BQ public dataset
问题描述
我正在尝试使用一种分析功能来获取专利申请量排名前2的国家,而在那些排名前2位的国家中,则申请专利的前2个国家.例如,答案将如下所示:
I am trying to use an analytic function to get the top 2 countries with patent applications, and within those top 2 countries, get the top 2 application kinds. For example, the answer will look something like this:
country - code
US P
US A
GB X
GB P
这是我正在使用的查询:
Here is the query I am using to get this:
SELECT
country_code,
MIN(count_country_code) count_country_code,
application_kind
FROM (
WITH
A AS (
SELECT
country_code,
COUNT(country_code) OVER (PARTITION BY country_code) AS count_country_code,
application_kind
FROM
`patents-public-data.patents.publications`),
B AS (
SELECT
country_code,
count_country_code,
DENSE_RANK() OVER(ORDER BY count_country_code DESC) AS country_code_num,
application_kind,
DENSE_RANK() OVER(PARTITION BY country_code ORDER BY count_country_code DESC) AS application_kind_num
FROM
A)
SELECT
country_code,
count_country_code,
application_kind
FROM
B
WHERE
country_code_num <= 2
AND application_kind_num <= 2) x
GROUP BY
country_code,
application_kind
ORDER BY
count_country_code DESC
但是,不幸的是,由于超量/订单/分区,我收到了内存超出"错误.这是消息:
However, unfortunately, I get a "memory exceeded" error due to the over/order/partition. Here is the message:
查询执行期间超出了资源:无法在分配的内存中执行查询.高峰使用:限制的112%.内存使用大户:用于分析OVER()子句的排序操作:98%其他/未分配:2%
Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 112% of limit. Top memory consumer(s): sort operations used for analytic OVER() clauses: 98% other/unattributed: 2%
如何进行上述查询(或其他类似的查询)而又不会遇到这些内存错误?可以在公共数据集
How would I go about doing the above query (or other similar queries) without running into these memory errors? This can be tested on the public dataset here.
一种简单的方法(仅当字段的基数为半低时才有效)是将其作为简单的聚合操作并将结果存储在数据库外部的内存中.例如:
One crude way to do it (which only works if the fields have a semi-low cardinality), would be to do it as a straightforward aggregation operation and sort the results in-memory outside the DB. For example:
推荐答案
以下是BigQuery标准SQL
Below is for BigQuery Standard SQL
#standardSQL
WITH A AS (
SELECT country_code
FROM `patents-public-data.patents.publications`
GROUP BY country_code
ORDER BY COUNT(1) DESC
LIMIT 2
), B AS (
SELECT
country_code,
application_kind,
COUNT(1) application_kind_count
FROM `patents-public-data.patents.publications`
WHERE country_code IN (SELECT country_code FROM A)
GROUP BY country_code, application_kind
), C AS (
SELECT
country_code,
application_kind,
application_kind_count,
DENSE_RANK() OVER(PARTITION BY country_code ORDER BY application_kind_count DESC) AS application_kind_rank
FROM B
)
SELECT
country_code,
application_kind,
application_kind_count
FROM C
WHERE application_kind_rank <= 2
有结果
这篇关于在BQ公共数据集中获取顶级专利国家/地区的代码的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!