Is Approximate String Matching / Fuzzy String Searching possible with BigQuery?

Question

Thanks to Google for delivering BigQuery, it's great!
Is Approximate String Matching / Fuzzy String Searching possible with BigQuery?
Does Google have plans to add this functionality to BigQuery?

Surely Google's proprietary approximate string matching algorithm could be used to deliver this capability to BigQuery while still protecting Google's intellectual property. We've searched all the BigQuery documentation and Stack Overflow questions. Of course there are many algorithms that do this, but how would they integrate with BigQuery?

Our need is simple: to compare two strings that will be mostly the same but could differ slightly. For example:

"Rhodes USA" vs. "Rhodes USA, LLC", vs. "Rhodes USA LLC".  

From our BigQuery tests, it appears two strings need to match EXACTLY for BigQuery to JOIN them, even down to the number of trailing spaces in each string. Adding this functionality, or guidance on integrating it with BigQuery, would be greatly appreciated. This is in support of Milwaukee Jets, a regional, innovative, fractional jet ownership company in Milwaukee, WI. Thanks again Google for delivering BigQuery.

Thank you very much and best regards, Andrew Paullin (414) 212-5372

Recommended answer

Unfortunately, approximate string matching is not supported. The closest you can get is by using regular expressions. Your best bet may be to normalize the data before it gets to BigQuery, i.e., transform "Rhodes USA" and "Rhodes, USA. " into the same string. I'll add a feature request for this support, however.
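
As a rough illustration of the "normalize before loading" suggestion, here is a minimal Python sketch. The function name `normalize_name` and the `LEGAL_SUFFIXES` list are hypothetical, chosen only for this example. The idea is to compute a canonical key for each name before the data reaches BigQuery and then JOIN on that key rather than on the raw string. It is a simple rule-based cleanup, not a true fuzzy-matching algorithm.

```python
import re

# Hypothetical list of legal suffixes to ignore; extend it for your own data.
LEGAL_SUFFIXES = {"llc", "inc", "corp", "ltd", "co"}


def normalize_name(raw: str) -> str:
    """Return a canonical join key for a company name (illustrative only)."""
    # Lowercase, then collapse every run of non-alphanumeric characters
    # (punctuation, repeated or trailing spaces) into a single space.
    cleaned = re.sub(r"[^a-z0-9]+", " ", raw.lower()).strip()
    tokens = cleaned.split()
    # Drop trailing legal suffixes such as "LLC", but never empty the name.
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


names = ["Rhodes USA", "Rhodes USA, LLC", "Rhodes USA LLC", "Rhodes, USA. "]
keys = {normalize_name(n) for n in names}
print(keys)  # all four variants collapse to the single key 'rhodes usa'
```

Storing the result in an extra column (for example a hypothetical `name_key`) alongside the original value keeps the raw data intact while giving the JOIN an exact-match column to work with.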
