Algorithm to be used for searching using multiple input variables in records containing ranges of values of different data types


Problem description

I have many entities, each containing thousands of records. The field values are diverse: some fields hold a range (BETWEEN two numbers), some hold an IN list with multiple values, and some hold a LIKE list with multiple patterns. The values are of different data types, such as integer, string and decimal. For example, a typical record looks like this:

A = New | Code = IN 101,102,103 | Values = LIKE A*,B*,C* | Quality = BETWEEN 1,9 |
Test = IN A


If I get an input for the fields A, Code, Values and Quality as above, I need to return the corresponding Test field for all matching records. The input can look like this:

A = New , Code = 102, Values = Alpha, Quality = 8.2

Given the above, I was thinking about using pattern matching algorithms to identify the matching records. Please suggest which algorithm would fit the bill to correctly identify the matching record(s) based on the set of values provided.
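To make the matching rules concrete, here is a minimal Python sketch of how one stored condition could be tested against one input value. The function name `matches` is hypothetical, and the sketch assumes that BETWEEN bounds are inclusive, that LIKE patterns use `*` as a wildcard, and that a condition with no keyword (such as `New`) means an exact match. None of this is confirmed by the question.

```python
from fnmatch import fnmatchcase

def matches(condition: str, value: str) -> bool:
    """Return True if `value` satisfies one condition such as 'IN 101,102,103'."""
    op, _, operand = condition.strip().partition(" ")
    args = [a.strip() for a in operand.split(",")]
    if op == "IN":
        return value in args
    if op == "LIKE":
        # Assumption: '*' is a shell-style wildcard, matched case-sensitively.
        return any(fnmatchcase(value, pattern) for pattern in args)
    if op == "BETWEEN":
        # Assumption: both bounds are inclusive and numeric.
        low, high = (float(a) for a in args)
        return low <= float(value) <= high
    # No operator keyword: treat the whole condition as a literal value.
    return value == condition.strip()

record = {"A": "New", "Code": "IN 101,102,103",
          "Values": "LIKE A*,B*,C*", "Quality": "BETWEEN 1,9"}
query = {"A": "New", "Code": "102", "Values": "Alpha", "Quality": "8.2"}
print(all(matches(record[name], query[name]) for name in query))  # True
```

A record matches only when every field accepts its input, so the per-field checks are combined with `all`.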

What I have tried:

I was thinking about using pattern matching algorithms to identify the matching records, and also about hashing the inputs so that they can be used as a lookup key when searching.
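The hashing idea could look roughly like the sketch below, which is only one possible reading of it. A hash lookup works for fields with discrete values (a literal or an IN list); LIKE and BETWEEN conditions cannot be looked up this way, so the records found here are only candidates that still need the remaining fields checked. The helper `build_index` and the second sample record are hypothetical.

```python
from collections import defaultdict

def build_index(records, field):
    """Map each value listed in `field` (literal or IN list) to the records listing it."""
    index = defaultdict(list)
    for record in records:
        condition = record[field]
        values = condition[len("IN "):] if condition.startswith("IN ") else condition
        for value in values.split(","):
            index[value.strip()].append(record)
    return index

records = [
    {"Code": "IN 101,102,103", "Test": "IN A"},
    {"Code": "IN 200,201", "Test": "IN B"},  # hypothetical second record
]
code_index = build_index(records, "Code")

# One dictionary lookup narrows the search to records whose Code list contains 102.
candidates = code_index.get("102", [])
```

The index has to be built once per entity, so it only pays off when the same records are searched many times.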

Recommended answer

Quote:

Please do suggest which algorithm will fit the bill to correctly identify a given record(s) based on the set of values provided.



You are in the worst possible situation: flat text data implies the use of brute force. Repeating the field names in every record makes it even worse than the CSV format.
The only optimization you can expect is to check first the field that filters out the most records. But it will not change the time to process the data by much.
So:
1) read all lines, no optimization
2) for each line, split on comma, no optimization
3) for each field, split on equal; modest optimization possible depending on the filter.
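The steps above can be sketched in Python as follows. This is an illustration under assumptions, not a definitive implementation: it assumes one record per line, splits fields on `|` (the separator used in the sample record of the question) rather than on commas, treats BETWEEN as inclusive, and uses hypothetical names (`parse_line`, `satisfies`, `search`, `check_order`).

```python
from fnmatch import fnmatchcase

def parse_line(line):
    """Split 'A = New | Code = IN 101,102 | ...' into {'A': 'New', 'Code': 'IN 101,102'}."""
    fields = {}
    for part in line.split("|"):
        if "=" in part:
            name, _, condition = part.partition("=")
            fields[name.strip()] = condition.strip()
    return fields

def satisfies(condition, value):
    """Test one input value against one IN / LIKE / BETWEEN / literal condition."""
    op, _, operand = condition.partition(" ")
    args = [a.strip() for a in operand.split(",")]
    if op == "IN":
        return value in args
    if op == "LIKE":
        return any(fnmatchcase(value, pattern) for pattern in args)
    if op == "BETWEEN":
        return float(args[0]) <= float(value) <= float(args[1])
    return value == condition

def search(lines, query, check_order):
    """Yield the Test field of every line whose conditions all accept the query."""
    for line in lines:                      # step 1: every line is visited
        fields = parse_line(line)           # steps 2 and 3: split into name/condition
        # all() stops at the first failing field, so listing the most
        # selective field first in check_order rejects records sooner.
        if all(satisfies(fields[name], query[name]) for name in check_order):
            yield fields["Test"]

lines = ["A = New | Code = IN 101,102,103 | Values = LIKE A*,B*,C* | "
         "Quality = BETWEEN 1,9 | Test = IN A"]
query = {"A": "New", "Code": "102", "Values": "Alpha", "Quality": "8.2"}
results = list(search(lines, query, ["Code", "A", "Values", "Quality"]))
```

Every line is still read and split, which is why the ordering of checks gives only a modest gain.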

