Given a huge set of street names, what is the most efficient way to test whether a text contains one of the street names from the set?

Views: 80 Published: 2020/5/18 1:06:37 Tags: algorithm nlp

Problem description

I have an interesting problem that I need help with. I am currently working on a feature of my program and ran into this issue.

I have a huge list of street names in Indonesia (> 100k rows) stored in a database. Each street name may contain more than one word. For example, "Sudirman", "Gatot Subroto", and "Jalan Asia Afrika" are all legitimate street names.

I also have a large body of text (> 1 million rows) in the database, which I split into sentences. The function I need tests whether a sentence contains a street name, so it is just a true/false test.

I have tried to solve it with these steps:

a. Put the street names into a key-value hash

b. Split each sentence into words

c. Test whether each word is in the hash

This is fast, but it does not work for street names that span multiple words.
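For illustration, the hash-based attempt in steps a to c might look like the minimal sketch below. The function names `build_street_index` and `contains_single_word_street` are hypothetical, a Python set stands in for the key-value hash, and lowercasing plus whitespace splitting are assumptions that the question does not specify.

```python
def build_street_index(street_names):
    """Step a: store the street names in a hash-based set (lowercased)."""
    return {name.lower() for name in street_names}


def contains_single_word_street(sentence, street_index):
    """Steps b and c: split the sentence into words and look each one up."""
    return any(word in street_index for word in sentence.lower().split())


# Sample names taken from the question.
streets = build_street_index(["Sudirman", "Gatot Subroto", "Jalan Asia Afrika"])

# A single-word street name is found.
print(contains_single_word_street("Traffic jam on Sudirman today", streets))  # True

# A multi-word street name is missed, because "gatot" and "subroto"
# are looked up separately and neither is a key on its own.
print(contains_single_word_street("Accident near Gatot Subroto", streets))  # False
```

The second call shows the limitation described above: each lookup is constant time, but only one word is compared at a time, so "Gatot Subroto" never matches.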

Another alternative I thought of is to do these steps:

a. Split each sentence into words

b. Query the database with a LIKE statement (i.e. SELECT #### FROM street_table WHERE name LIKE '%word%')

c. If the query returns a row, the sentence contains a street name

However, this solution is going to be very I/O intensive.
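The LIKE-based alternative can be sketched roughly as follows. This is only an approximation under stated assumptions: an in-memory SQLite database stands in for the real one, the helper `sentence_matches_like` is hypothetical, and `SELECT 1 ... LIMIT 1` is used because the question leaves the selected columns unspecified.

```python
import sqlite3

# Assumption: an in-memory SQLite table replaces the real street database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE street_table (name TEXT)")
conn.executemany(
    "INSERT INTO street_table (name) VALUES (?)",
    [("Sudirman",), ("Gatot Subroto",), ("Jalan Asia Afrika",)],
)


def sentence_matches_like(conn, sentence):
    """Steps a to c: run one LIKE query per word of the sentence."""
    for word in sentence.split():
        row = conn.execute(
            "SELECT 1 FROM street_table WHERE name LIKE ? LIMIT 1",
            (f"%{word}%",),
        ).fetchone()
        if row is not None:
            return True
    return False


print(sentence_matches_like(conn, "Accident near Gatot Subroto"))  # True
print(sentence_matches_like(conn, "Nothing here"))  # False
# A word that is only part of a street name also matches.
print(sentence_matches_like(conn, "Visited Asia"))  # True
```

Each word costs one query, and a leading `%` wildcard generally prevents the use of an ordinary index, which is consistent with the I/O concern above. The last call also suggests that this test can report a match when only a fragment of a street name appears in the sentence.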

So my question is: what is the most efficient way to do this test, regardless of the programming language? I mainly work in Python, but any language will do as long as I can grasp the concepts.

============ EDIT 1 ============

Will this run periodically?

Yes, I will call this function at an interval of 1 minute. Each call will take at least 100 rows of text and test them against the street name database.
