Linux C LibPCRE output unique results

Problem description

I have the following code, which matches a regex against a string that contains many duplicates. I want to print only the unique matches. How can I do that? Should I add the matches to an array, make it unique, and only then print the results? Thanks!

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <pcre.h>

int main(void) {
  pcre *myregexp;
  const char *error;
  int erroroffset;
  int offsetcount;
  int offsets[(0+1)*3]; // (max_capturing_groups+1)*3
  const char *result;
  const char *subject = "9,5,3,2,5,6,3,2,5,6,3,2,2,2,5,0,5,5,6,6,1,";
  int subject_len = (int)strlen(subject);

  myregexp = pcre_compile("\\d,", PCRE_MULTILINE|PCRE_DOTALL|PCRE_NEWLINE_ANYCRLF, &error, &erroroffset, NULL);

  if (myregexp != NULL) {
    offsetcount = pcre_exec(myregexp, NULL, subject, subject_len, 0, 0, offsets, (0+1)*3);

    // pcre_exec returns a negative value once there are no more matches
    while (offsetcount > 0) {

      if (pcre_get_substring(subject, offsets, offsetcount, 0, &result) >= 0) {
        printf("%s\n", result);
        pcre_free_substring(result); // release the copy made by pcre_get_substring
      }

      // resume the search at the end of the previous match
      offsetcount = pcre_exec(myregexp, NULL, subject, subject_len, offsets[1], 0, offsets, (0+1)*3);
    }

    pcre_free(myregexp);
  } else {
    printf("Syntax error in REGEX at offset %d: %s\n", erroroffset, error);
    return 1;
  }

  return 0;
}

Output:

bash$ ./regex
9,
5,
3,
2,
5,
6,
3,
2,
5,
6,
3,
2,
2,
2,
5,
0,
5,
5,
6,
6,
1,

What I need:

bash$ ./regex
0,
1,
2,
3,
5,
6,
9,

Recommended answer

Yes, add the matches to an array and deduplicate from there.

You cannot find unique values with a regex alone. A regex search-and-replace can collapse some kinds of adjacent repetition, such as doubled newlines or runs of spaces, but it does not work when deduplication has to compare values at arbitrary positions in the input.

Here is an example of how to deduplicate array a into array b:

#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *a[5];
    int a_len = 5;

    a[0] = "a";
    a[1] = "b";
    a[2] = "b";
    a[3] = "a";
    a[4] = "c";

    /* b receives each distinct string from a, in order of first appearance */
    const char *b[5];
    int b_len = 0;

    int already_exists;
    int i, j;
    for (i = 0; i < a_len; i++)
    {
        /* check whether a[i] has already been copied into b */
        already_exists = 0;
        for (j = 0; j < b_len; j++)
        {
            if (!strcmp(a[i], b[j]))
            {
                already_exists = 1;
                break;
            }
        }

        if (!already_exists)
        {
            b[b_len] = a[i];
            b_len++;
        }
    }

    for (i = 0; i < b_len; i++)
    {
        printf("%s\n", b[i]);
    }

    return 0;
}

For arrays this small, this is probably the fastest algorithm. For better performance on larger arrays, I would suggest sorting the array first and deduplicating the sorted result.
