BigQuery: Convert accented characters to their plain ASCII equivalents

Problem description

I have the following string:

brasília

And I need to convert to:

brasilia

Without the ´ accent!

How can I do this in BigQuery?

Thank you!

Solution

Try the query below as a quick and simple option:

#standardSQL
WITH lookups AS (
  SELECT 
  'ç,æ,œ,á,é,í,ó,ú,à,è,ì,ò,ù,ä,ë,ï,ö,ü,ÿ,â,ê,î,ô,û,å,ø,Ø,Å,Á,À,Â,Ä,È,É,Ê,Ë,Í,Î,Ï,Ì,Ò,Ó,Ô,Ö,Ú,Ù,Û,Ü,Ÿ,Ç,Æ,Œ,ñ' AS accents,
  'c,ae,oe,a,e,i,o,u,a,e,i,o,u,a,e,i,o,u,y,a,e,i,o,u,a,o,O,A,A,A,A,A,E,E,E,E,I,I,I,I,O,O,O,O,U,U,U,U,Y,C,AE,OE,n' AS latins
),
pairs AS (
  SELECT accent, latin FROM lookups, 
    UNNEST(SPLIT(accents)) AS accent WITH OFFSET AS p1, 
    UNNEST(SPLIT(latins)) AS latin WITH OFFSET AS p2
  WHERE p1 = p2
),
yourTableWithWords AS (
  SELECT word FROM UNNEST(
        SPLIT('brasília,ångström,aperçu,barège, beau idéal, belle époque, béguin, bête noire, bêtise, Bichon Frisé, blasé, blessèd, bobèche, boîte, bombé, Bön, Boötes, boutonnière, bric-à-brac, Brontë Beyoncé,El Niño')
    ) AS word
)
SELECT 
  word AS word_with_accent, 
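  -- Split the word into characters, map each one through the lookup, and
  -- reassemble in the original order (ORDER BY pos keeps STRING_AGG deterministic).
  (SELECT STRING_AGG(IFNULL(latin, char), '' ORDER BY pos)
    FROM UNNEST(SPLIT(word, '')) AS char WITH OFFSET AS pos
    LEFT JOIN pairs
    ON char = accent) AS word_without_accent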
FROM yourTableWithWords   

The output is:

word_with_accent word_without_accent     
blessèd         blessed  
El Niño         El Nino  
belle époque    belle epoque     
boîte           boite    
Boötes          Bootes   
blasé           blase    
ångström        angstrom     
bobèche         bobeche  
barège          barege   
bric-à-brac     bric-a-brac  
bête noire      bete noire   
Bichon Frisé    Bichon Frise     
Brontë Beyoncé  Bronte Beyonce   
bêtise          betise   
beau idéal      beau ideal   
bombé           bombe    
brasília        brasilia     
boutonnière     boutonniere  
aperçu          apercu   
béguin          beguin   
Bön             Bon   
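
The pairs step in the query above relies on WITH OFFSET to match the two comma-separated lists by position. A minimal sketch of just that step, using a shortened three-item list chosen purely for illustration, might look like this:

```sql
#standardSQL
-- Illustrative only: pair two comma-separated lists positionally.
-- WITH OFFSET exposes each element's zero-based index in its array,
-- so joining on equal offsets lines up the n-th accent with the n-th replacement.
SELECT accent, latin
FROM UNNEST(SPLIT('ç,æ,é')) AS accent WITH OFFSET AS p1
JOIN UNNEST(SPLIT('c,ae,e')) AS latin WITH OFFSET AS p2
  ON p1 = p2
ORDER BY p1
```

This should pair ç with c, æ with ae, and é with e. The two lists must have the same number of items; any trailing items in the longer list are silently dropped by the join.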

UPDATE

Below is how to pack this logic into a SQL UDF, so that accent2latin(word) can be called to do the "magic":

#standardSQL
CREATE TEMP FUNCTION accent2latin(word STRING) AS
((
  WITH lookups AS (
    SELECT 
    'ç,æ,œ,á,é,í,ó,ú,à,è,ì,ò,ù,ä,ë,ï,ö,ü,ÿ,â,ê,î,ô,û,å,ø,Ø,Å,Á,À,Â,Ä,È,É,Ê,Ë,Í,Î,Ï,Ì,Ò,Ó,Ô,Ö,Ú,Ù,Û,Ü,Ÿ,Ç,Æ,Œ,ñ' AS accents,
    'c,ae,oe,a,e,i,o,u,a,e,i,o,u,a,e,i,o,u,y,a,e,i,o,u,a,o,O,A,A,A,A,A,E,E,E,E,I,I,I,I,O,O,O,O,U,U,U,U,Y,C,AE,OE,n' AS latins
  ),
  pairs AS (
    SELECT accent, latin FROM lookups, 
      UNNEST(SPLIT(accents)) AS accent WITH OFFSET AS p1, 
      UNNEST(SPLIT(latins)) AS latin WITH OFFSET AS p2
    WHERE p1 = p2
  )
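  -- Map each character through the lookup and reassemble in the original order.
  SELECT STRING_AGG(IFNULL(latin, char), '' ORDER BY pos)
  FROM UNNEST(SPLIT(word, '')) AS char WITH OFFSET AS pos
  LEFT JOIN pairs
  ON char = accent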
));

WITH yourTableWithWords AS (
  SELECT word FROM UNNEST(
        SPLIT('brasília,ångström,aperçu,barège, beau idéal, belle époque, béguin, bête noire, bêtise, Bichon Frisé, blasé, blessèd, bobèche, boîte, bombé, Bön, Boötes, boutonnière, bric-à-brac, Brontë Beyoncé,El Niño')
    ) AS word
)
SELECT 
  word AS word_with_accent, 
  accent2latin(word) AS word_without_accent
FROM yourTableWithWords 
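
As a possible alternative to maintaining the lookup list by hand, BigQuery's NORMALIZE function can decompose accented letters into a base letter plus combining marks (NFD form), and REGEXP_REPLACE can then strip those marks. The sketch below is not part of the original answer, and the function name strip_marks is hypothetical. Letters with no canonical decomposition, such as æ, œ and ø, pass through unchanged, so the lookup approach is still needed if those must become ae, oe and o.

```sql
#standardSQL
-- Hypothetical helper (not from the original answer): decompose to NFD,
-- then remove the Unicode combining marks (\pM) left by the decomposition.
CREATE TEMP FUNCTION strip_marks(word STRING) AS (
  REGEXP_REPLACE(NORMALIZE(word, NFD), r'\pM', '')
);

SELECT
  word AS word_with_accent,
  strip_marks(word) AS word_without_accent
FROM UNNEST(['brasília', 'El Niño', 'Brontë']) AS word
```

The two approaches can also be combined: apply the lookup first for ligatures and slashed letters, then strip any remaining marks.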
