Fuzzy regex match using TRE
Question
I'm trying to use the TRE library in my C program to perform a fuzzy regex search. I've managed to piece together this code from reading the docs:
regex_t rx;
regcomp(&rx, "(January|February)", REG_EXTENDED);
int result = regexec(&rx, "January", 0, 0, 0);
However, this matches the regex only exactly (i.e. no spelling errors are allowed). I don't see any parameter in these functions that would let me set the fuzziness:
int regcomp(regex_t *preg, const char *regex, int cflags);
int regexec(const regex_t *preg, const char *string, size_t nmatch,
            regmatch_t pmatch[], int eflags);
How can I set the level of fuzziness (i.e. the maximum Levenshtein distance), and how do I get the Levenshtein distance of the match?
I forgot to mention that I'm using the Windows binaries from GnuWin32, which are available only for version 0.7.5. Binaries for 0.8.0 are available only for Linux.
Recommended answer
Thanks to @Wiktor Stribiżew, I found out which function I need to use, and I've successfully compiled a working example:
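#include <stdio.h>
#include "regex.h"

int main(void) {
    regex_t rx;
    /* Compile the pattern; regcomp returns nonzero on failure. */
    if (regcomp(&rx, "(January|February)", REG_EXTENDED) != 0) {
        printf("Failed to compile regex\n");
        return 1;
    }

    /* Every edit costs 1; allow at most 2 edits in total. */
    regaparams_t params = { 0 };
    params.cost_ins = 1;
    params.cost_del = 1;
    params.cost_subst = 1;
    params.max_cost = 2;
    params.max_del = 2;
    params.max_ins = 2;
    params.max_subst = 2;
    params.max_err = 2;

    /* No submatch offsets are requested, only the match cost. */
    regamatch_t match;
    match.nmatch = 0;
    match.pmatch = 0;

    /* regaexec returns 0 on a match and stores the total cost in match.cost. */
    if (!regaexec(&rx, "Janvary", &match, params, 0)) {
        printf("Levenshtein distance: %d\n", match.cost);
    } else {
        printf("Failed to match\n");
    }

    regfree(&rx);
    return 0;
}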
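To make the cost reported in match.cost easier to interpret, here is a minimal sketch of a plain Levenshtein distance with unit costs for insertion, deletion, and substitution. The function name levenshtein is a hypothetical helper and not part of TRE. TRE's matcher works differently internally and searches for the pattern anywhere in the input, so this only illustrates the metric, not the library's algorithm.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of TRE: returns the Levenshtein distance
   between a and b with unit costs, or -1 if memory allocation fails. */
int levenshtein(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t la = strlen(a), lb = strlen(b);

    /* row[j] is the distance between the processed prefix of a
       and the first j characters of b. */
    int *row = malloc((lb + 1) * sizeof *row);
    if (row == NULL)
        return -1;

    /* Turning the empty prefix of a into b[0..j) takes j insertions. */
    for (size_t j = 0; j <= lb; j++)
        row[j] = (int)j;

    for (size_t i = 1; i <= la; i++) {
        int diag = row[0];          /* distance for (i-1, j-1) */
        row[0] = (int)i;            /* i deletions to reach the empty string */
        for (size_t j = 1; j <= lb; j++) {
            int up = row[j];        /* distance for (i-1, j) */
            int subst = diag + (a[i - 1] != b[j - 1]);
            int del = up + 1;
            int ins = row[j - 1] + 1;
            int best = subst < del ? subst : del;
            row[j] = best < ins ? best : ins;
            diag = up;
        }
    }

    int distance = row[lb];
    free(row);
    return distance;
}
```

For instance, levenshtein("Janvary", "January") evaluates to 1 (a single substitution of 'v' for 'u'), which is within the max_cost of 2 set in the example.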