How to find rows with non-UTF-8 characters in Snowflake?


Question

In my Snowflake database, a table contains non-UTF-8 characters. How can I create a view on top of it that contains only UTF-8 characters, either by excluding the rows with non-UTF-8 characters or by replacing those characters? Thanks.

Recommended answer

It should be possible to detect values that are not valid UTF-8 with a test like this:

MY_STRING IS NOT NULL AND TRY_HEX_DECODE_STRING(HEX_ENCODE(MY_STRING)) IS NULL
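
To make the idea behind that test concrete, here is a minimal local sketch in plain Node.js (not a Snowflake UDF). It assumes the check amounts to "do these bytes decode as strict UTF-8?". The helper name isValidUtf8 and the sample byte arrays are hypothetical and only illustrate the concept; they are not part of the Snowflake expression above.

```javascript
// Local illustration only (Node.js); this does not run inside Snowflake.
// isValidUtf8 is a hypothetical helper name used for this sketch.
function isValidUtf8(bytes) {
  try {
    // With fatal: true, decode() throws on any malformed UTF-8 sequence
    // instead of silently substituting U+FFFD.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (e) {
    return false;
  }
}

// "café" encoded as UTF-8: the é is the two bytes C3 A9.
const good = Uint8Array.from([0x63, 0x61, 0x66, 0xc3, 0xa9]);
// "café" encoded as Windows-1252: the é is the single byte E9,
// which is not a complete UTF-8 sequence on its own.
const bad = Uint8Array.from([0x63, 0x61, 0x66, 0xe9]);

console.log(isValidUtf8(good), isValidUtf8(bad));
```

Rows for which such a check fails are the ones the SQL predicate above is meant to flag.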

I don't have data to test the Snowflake expression with, though.
To re-encode the string as UTF-8, you can use a JavaScript function:

CREATE OR REPLACE FUNCTION TO_UTF8(BINARY_TEXT BINARY)
RETURNS TEXT LANGUAGE JAVASCRIPT STRICT IMMUTABLE AS '
  var win1252 = [ /* C1 controls */
    8364,  129, 8218,  402, 8222, 8230, 8224, 8225,
     710, 8240,  352, 8249,  338,  141,  381,  143,
     144, 8216, 8217, 8220, 8221, 8226, 8211, 8212,
     732, 8482,  353, 8250,  339,  157,  382,  376
  ];
  return String.fromCharCode(
    ...Array.from(BINARY_TEXT).map(x => (x < 128 || x > 159) ? x : (win1252[x - 128]))
  ); /* .map(...) can be removed if no conversion from win1252 needed */
';

SELECT NVL(TRY_HEX_DECODE_STRING(HEX_ENCODE(MY_STRING)),
           TO_UTF8(HEX_ENCODE(MY_STRING)::BINARY));
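
The byte mapping inside TO_UTF8 can be tried outside Snowflake. The following is a minimal sketch in plain Node.js that reuses the same lookup table; the function name toUtf8 and the sample bytes are assumptions made for illustration. It treats bytes 128 to 159 as Windows-1252 characters and every other byte as the code point of the same value (Latin-1).

```javascript
// Standalone sketch of the mapping used in the UDF above (Node.js).
// toUtf8 and the sample input are hypothetical names for illustration.
const win1252 = [
  8364,  129, 8218,  402, 8222, 8230, 8224, 8225,
   710, 8240,  352, 8249,  338,  141,  381,  143,
   144, 8216, 8217, 8220, 8221, 8226, 8211, 8212,
   732, 8482,  353, 8250,  339,  157,  382,  376
];

function toUtf8(bytes) {
  // Bytes 0x80..0x9F are looked up in the Windows-1252 table;
  // all other bytes map to the code point with the same numeric value.
  return String.fromCharCode(
    ...Array.from(bytes).map(x => (x < 128 || x > 159) ? x : win1252[x - 128])
  );
}

// Windows-1252 bytes: 0x80 is the euro sign, 0xE9 is "é".
const sample = Uint8Array.from([0x80, 0x31, 0x30, 0x20, 0x63, 0x61, 0x66, 0xe9]);
console.log(toUtf8(sample));
```

Because the argument list is spread into String.fromCharCode, very long inputs could exceed the engine's argument limit; processing the bytes in chunks would be one way around that.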
