如何将regexec与内存映射文件一起使用? [英] How to use regexec with memory mapped files?

查看:142
本文介绍了如何将regexec与内存映射文件一起使用?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试在大内存映射文件中查找正则表达式 通过使用 regexec()函数.我发现文件大小时程序崩溃 是页面大小的倍数.

I am trying to find a regular expression in a large memory mapped file by using regexec() function. I discovered that the program crashes when the file size is the multiple of the page size.

是否有一个具有字符串长度的 regexec()函数 作为额外的论点?

Is there a regexec() function that has the length of the string as additional argument?

或者:

如何在内存映射文件中找到正则表达式?

How to find a regex in a memory mapped file?

这是始终崩溃的最小示例 (如果我运行少于3个线程的程序不会崩溃):

Here is the minimal example that ALWAYS crashes (if I run less that 3 threads program doesn't crash):

ls -la ttt.txt 
-rwx------ 1 bob bob 409600 Jun 14 18:16 ttt.txt

gcc -Wall mal.c -o mal -lpthread -g && ./mal
[1]    11364 segmentation fault (core dumped)  ./mal

程序是:

#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

#include <stdio.h>
#include <assert.h>
#include <pthread.h>
#include <regex.h>

void* f(void*arg) {
  int size = 409600;
  int fd = open("ttt.txt", O_RDONLY);
  char* text = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);

  fd = open("/dev/zero", O_RDONLY);
  char* end = mmap(text + size, 4096, PROT_READ, MAP_PRIVATE | MAP_FIXED, fd, 0);
  close(fd);

  assert(text+size == end);

  regex_t myre;
  regcomp(&myre, "XXXXX", REG_EXTENDED);
  regexec(&myre, text, 0, NULL, 0);
  regfree(&myre);
  return NULL;
}

int main(int argc, char* argv[]) {
  int n = 10;
  int i;
  pthread_t t[n];
  for (i = 0; i < n; ++i) {
    pthread_create(&t[n], NULL, f, NULL);
  }
  for (i = 0; i < n; ++i) {
    pthread_join(t[n], NULL);
  }
  return 0;
}

P.S. 这是gdb的输出:

P.S. This is the output from gdb:

gdb ./mal 
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/bob/prog/c/mal...done.
(gdb) r

Starting program: /home/srdjan/prog/c/mal 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff77ff700 (LWP 11817)]
[New Thread 0x7ffff6ffe700 (LWP 11818)]
[New Thread 0x7ffff6799700 (LWP 11819)]
[New Thread 0x7fffeffff700 (LWP 11820)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff6799700 (LWP 11819)]
__strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:72
72  ../sysdeps/x86_64/multiarch/../strlen.S: No such file or directory.
(gdb) bt
#0  __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:72
#1  0x00007ffff78df254 in __regexec (preg=0x7ffff6798e80, string=0x7fffef79b000 'a' <repeats 200 times>..., nmatch=<optimized out>, 
pmatch=0x0, eflags=<optimized out>) at regexec.c:245
#2  0x00000000004008e6 in f (arg=0x0) at mal.c:24
#3  0x00007ffff7bc4e9a in start_thread (arg=0x7ffff6799700) at pthread_create.c:308
#4  0x00007ffff78f24bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#5  0x0000000000000000 in ?? ()
(gdb) 

推荐答案

Celada 可以正确识别问题-文件数据不一定包含空终止符.

Celada correctly identifies the problem - the file data does not necessarily include a null terminator.

您可以通过在文件后立即映射一个零页来解决该问题:

You could fix the problem by mapping a page of zeroes immediately after the file:

int fd;
char *text;

fd = open("ttt.txt", O_RDONLY);
text = mmap(NULL, 409600, PROT_READ, MAP_PRIVATE, fd, 0);
close(fd);

fd = open("/dev/zero", O_RDONLY);
mmap(text + 409600, 4096, PROT_READ, MAP_PRIVATE | MAP_FIXED, fd, 0);
close(fd);

(请注意,您可以在mmap()之后立即关闭fd,因为mmap()添加了对打开文件描述的引用).

(Note that you can close fd immediately after the mmap(), because mmap() adds a reference to the open file description).

您当然应该在上面添加错误检查.另外,许多UNIX系统支持MAP_ANONYMOUS标志,您可以使用它代替打开/dev/zero(但这在POSIX中不是).

You should of course add error-checking to the above. Also, many UNIX systems support a MAP_ANONYMOUS flag which you can use instead of opening /dev/zero (but this is not in POSIX).

这篇关于如何将regexec与内存映射文件一起使用?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆