BigQuery: Convert accented characters to their plain ASCII equivalents
Problem description
I have the following string:
brasília
And I need to convert to:
brasilia
Without the ´ accent!
How can I do this in BigQuery?
Thank you!
Solution
Try the following as a quick and simple option:
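#standardSQL
-- Parallel lists: the n-th entry of `accents` maps to the n-th entry of `latins`
WITH lookups AS (
SELECT
'ç,æ,œ,á,é,í,ó,ú,à,è,ì,ò,ù,ä,ë,ï,ö,ü,ÿ,â,ê,î,ô,û,å,ø,Ø,Å,Á,À,Â,Ä,È,É,Ê,Ë,Í,Î,Ï,Ì,Ò,Ó,Ô,Ö,Ú,Ù,Û,Ü,Ÿ,Ç,Æ,Œ,ñ' AS accents,
'c,ae,oe,a,e,i,o,u,a,e,i,o,u,a,e,i,o,u,y,a,e,i,o,u,a,o,O,A,A,A,A,A,E,E,E,E,I,I,I,I,O,O,O,O,U,U,U,U,Y,C,AE,OE,n' AS latins
),
-- Zip the two lists into (accent, latin) rows by matching their positions
pairs AS (
SELECT accent, latin FROM lookups,
UNNEST(SPLIT(accents)) AS accent WITH OFFSET AS p1,
UNNEST(SPLIT(latins)) AS latin WITH OFFSET AS p2
WHERE p1 = p2
),
-- Sample data; replace with your own table
yourTableWithWords AS (
SELECT word FROM UNNEST(
SPLIT('brasília,ångström,aperçu,barège, beau idéal, belle époque, béguin, bête noire, bêtise, Bichon Frisé, blasé, blessèd, bobèche, boîte, bombé, Bön, Boötes, boutonnière, bric-à-brac, Brontë Beyoncé,El Niño')
) AS word
)
SELECT
word AS word_with_accent,
-- Split the word into characters, swap each character found in `pairs`,
-- keep all others unchanged, and rejoin them; ORDER BY pos guarantees the
-- original character order (STRING_AGG order is otherwise unspecified)
(SELECT STRING_AGG(IFNULL(latin, char), '' ORDER BY pos)
FROM UNNEST(SPLIT(word, '')) AS char WITH OFFSET AS pos
LEFT JOIN pairs
ON char = accent) AS word_without_accent
FROM yourTableWithWords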
Output is:
word_with_accent   word_without_accent
blessèd            blessed
El Niño            El Nino
belle époque       belle epoque
boîte              boite
Boötes             Bootes
blasé              blase
ångström           angstrom
bobèche            bobeche
barège             barege
bric-à-brac        bric-a-brac
bête noire         bete noire
Bichon Frisé       Bichon Frise
Brontë Beyoncé     Bronte Beyonce
bêtise             betise
beau idéal         beau ideal
bombé              bombe
brasília           brasilia
boutonnière        boutonniere
aperçu             apercu
béguin             beguin
Bön                Bon
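The `pairs` CTE matches the two lists purely by position (`WHERE p1 = p2`), so if you extend them, for example with ã and õ, which are common in Portuguese names but are not in the lists above, both lists must keep the same number of entries or the mapping silently shifts. A minimal sanity check, assuming the same `lookups` CTE as above, might look like this:

```sql
#standardSQL
-- Sanity check: both comma-separated lists must have the same number of
-- entries, because pairs are formed by matching positions (p1 = p2)
WITH lookups AS (
SELECT
'ç,æ,œ,á,é,í,ó,ú,à,è,ì,ò,ù,ä,ë,ï,ö,ü,ÿ,â,ê,î,ô,û,å,ø,Ø,Å,Á,À,Â,Ä,È,É,Ê,Ë,Í,Î,Ï,Ì,Ò,Ó,Ô,Ö,Ú,Ù,Û,Ü,Ÿ,Ç,Æ,Œ,ñ' AS accents,
'c,ae,oe,a,e,i,o,u,a,e,i,o,u,a,e,i,o,u,y,a,e,i,o,u,a,o,O,A,A,A,A,A,E,E,E,E,I,I,I,I,O,O,O,O,U,U,U,U,Y,C,AE,OE,n' AS latins
)
SELECT
ARRAY_LENGTH(SPLIT(accents)) AS n_accents,
ARRAY_LENGTH(SPLIT(latins)) AS n_latins,
ARRAY_LENGTH(SPLIT(accents)) = ARRAY_LENGTH(SPLIT(latins)) AS lists_match
FROM lookups
```

For the lists as written, both counts are 53 and `lists_match` is true.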
UPDATE
Below is how to pack this logic into a SQL UDF, so that accent2latin(word) can be called to do the "magic":
#standardSQL
CREATE TEMP FUNCTION accent2latin(word STRING) AS
((
WITH lookups AS (
SELECT
'ç,æ,œ,á,é,í,ó,ú,à,è,ì,ò,ù,ä,ë,ï,ö,ü,ÿ,â,ê,î,ô,û,å,ø,Ø,Å,Á,À,Â,Ä,È,É,Ê,Ë,Í,Î,Ï,Ì,Ò,Ó,Ô,Ö,Ú,Ù,Û,Ü,Ÿ,Ç,Æ,Œ,ñ' AS accents,
'c,ae,oe,a,e,i,o,u,a,e,i,o,u,a,e,i,o,u,y,a,e,i,o,u,a,o,O,A,A,A,A,A,E,E,E,E,I,I,I,I,O,O,O,O,U,U,U,U,Y,C,AE,OE,n' AS latins
),
pairs AS (
SELECT accent, latin FROM lookups,
UNNEST(SPLIT(accents)) AS accent WITH OFFSET AS p1,
UNNEST(SPLIT(latins)) AS latin WITH OFFSET AS p2
WHERE p1 = p2
)
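-- Replace mapped characters, keep the rest; ORDER BY pos preserves character order
SELECT STRING_AGG(IFNULL(latin, char), '' ORDER BY pos)
FROM UNNEST(SPLIT(word, '')) AS char WITH OFFSET AS pos
LEFT JOIN pairs
ON char = accent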
));
WITH yourTableWithWords AS (
SELECT word FROM UNNEST(
SPLIT('brasília,ångström,aperçu,barège, beau idéal, belle époque, béguin, bête noire, bêtise, Bichon Frisé, blasé, blessèd, bobèche, boîte, bombé, Bön, Boötes, boutonnière, bric-à-brac, Brontë Beyoncé,El Niño')
) AS word
)
SELECT
word AS word_with_accent,
accent2latin(word) AS word_without_accent
FROM yourTableWithWords
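A different technique, not part of the answer above, avoids maintaining the lookup lists by using Unicode normalization: `NORMALIZE(word, NFD)` decomposes a precomposed letter such as í into a base letter followed by a combining mark, and `REGEXP_REPLACE` with the RE2 class `\pM` then deletes the marks. This is a sketch assuming BigQuery's `NORMALIZE` and RE2 behave as described; the function name `strip_accents` and the sample words are illustrative only.

```sql
#standardSQL
-- Hypothetical helper name. NFD splits e.g. 'í' into 'i' + U+0301 (combining
-- acute accent); \pM matches any Unicode mark, so the accents are removed.
CREATE TEMP FUNCTION strip_accents(word STRING) AS (
REGEXP_REPLACE(NORMALIZE(word, NFD), r'\pM', '')
);
SELECT
word AS word_with_accent,
strip_accents(word) AS word_without_accent
FROM UNNEST(['brasília', 'ångström', 'El Niño', 'São Paulo', 'œuvre']) AS word
```

This also handles letters missing from the lookup lists (the ã in São Paulo becomes a), but characters with no canonical decomposition, such as æ, œ, ø and Ø, pass through unchanged ('œuvre' stays 'œuvre'). For those you would still need a lookup like accent2latin.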