How to parse a large amount of data passed to a kernel module through a /proc file?


Question

Edit: I have found seq_file, which eases writing a lot of data from the kernel to user space. What I am looking for is the opposite: an API that facilitates reading a lot of data (more than one page) from user space.

Edit 2: I am implementing a port of <stdio.h> as a kernel module that would be able to open /proc files (and later, files on other virtual file systems) much like FILEs and handle input and output much like <stdio.h> does. You can find the project here.

I have found a LOT of questions on how the kernel can write large amounts of data to /proc (for user-space programs to read), but nothing for the other way around. Let me elaborate:

This question is basically about the algorithm by which the input is tokenized (for example into ints, or a mixture of ints and strings), given that the data may be broken between multiple buffers.

For example, imagine the following data is being sent to the kernel module:

12345678 81234567 78123456 67812345 5678 1234 45678123 3456 7812 23456781

For the sake of this example, let's say the page size by which Linux feeds the /proc handler is 20 bytes (versus the real 4 KB).

The function that reads the data from /proc (in the kernel module) then sees the data like this:

call 1:
"12345678 81234567 78"
call 2:
"123456 67812345 5678"
call 3:
" 1234 45678123 3456 "
call 4:
"7812 23456781"

As you can see, when 78 is read in the first call, it shouldn't be processed until the next frame arrives, because only then can the module decide whether 78 was a whole number or one cut between frames.
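To make the frame-boundary issue concrete, here is a minimal user-space sketch (not kernel code, and not the solution described in this question) of a tokenizer that keeps the partially read number in a small state struct between calls. All names (`int_feed`, `int_feed_chunk`, `int_feed_finish`, `collect`) are hypothetical, and it assumes non-negative decimal integers that fit in a `long`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Parser state that survives between chunks (hypothetical names). */
struct int_feed {
    long value;     /* digits accumulated so far for the open number */
    int  in_number; /* nonzero while a number has not been closed yet */
};

/* Called once for every completed integer. */
typedef void (*int_sink)(long value, void *ctx);

/* Consume one chunk. A number that touches the end of the chunk stays
 * open, so the next chunk can either extend it or close it.
 * No overflow check is done here; this is only a sketch. */
static void int_feed_chunk(struct int_feed *st, const char *buf, size_t len,
                           int_sink emit, void *ctx)
{
    assert(st != NULL && emit != NULL);
    for (size_t i = 0; i < len; ++i) {
        unsigned char c = (unsigned char)buf[i];
        if (isdigit(c)) {
            st->value = st->value * 10 + (c - '0');
            st->in_number = 1;
        } else if (st->in_number) {
            emit(st->value, ctx); /* any non-digit closes the number */
            st->value = 0;
            st->in_number = 0;
        }
    }
}

/* Call once at end of input to flush a number with no trailing delimiter. */
static void int_feed_finish(struct int_feed *st, int_sink emit, void *ctx)
{
    if (st->in_number) {
        emit(st->value, ctx);
        st->value = 0;
        st->in_number = 0;
    }
}

/* Example sink: append each value to a fixed-size array. */
struct int_list {
    long   v[16];
    size_t n;
};

static void collect(long value, void *ctx)
{
    struct int_list *l = ctx;
    if (l->n < 16)
        l->v[l->n++] = value;
}
```

Feeding the four example frames one after another and then calling `int_feed_finish` delivers 78123456 as a single value, while 5678 is only emitted once the leading space of the third frame is seen.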

Now, I found seq_file, which apparently is only for when the kernel wants to write data to the user rather than read it (or it could be that the HOWTO is horribly written).

So far, I have come up with the following solution (I am writing from memory, so I may miss a couple of error checks, but bear with me):

In the initialization phase (say init_module):

initialize mutex1 to 1 and mutex2 to 0
create /proc entry
call data_processor

/proc reader:

1. down(mutex1)    /* down_interruptible of course, but let's not get into details */

2. copy_from_user to an internal buffer
   buffer_index = 0
   data_length = whatever the size is

3. strip spaces from end of buffer (except if all left from buffer is 1 space)
   if so, there_was_space_after = 1 else 0

4. up(mutex2)

I will explain later why I strip spaces.

get_int function:

wait_for_next = 0
number_was_cut = 0
last_number = 0

do
{
    1. down(mutex2)

    2. if (number_was_cut && !isdigit(buffer[buffer_index]))
           break     /* turns out it wasn't really cut
                        as beginning of next buffer is ' ' */
       number_was_cut = 0
       wait_for_next = 0

    3. while (buffer_index < data_length && !isdigit(buffer[buffer_index]))
           ++buffer_index;    /* skip white space */

    4. while (buffer_index < data_length && isdigit(buffer[buffer_index]))
           last_number = last_number * 10 + buffer[buffer_index++] - '0';

    5. if (buffer_index >= data_length && !there_was_space_after)
           number_was_cut = 1
           wait_for_next = 1
           up(mutex1)         /* let more data come in */
       else
           up(mutex2)         /* let get_int continue */
           break
} while (wait_for_next)

return last_number

data_processor function (for example):

int first_num = get_int()
int second_num = get_int()
for i = first_num to second_num
    do_whatever(get_int())

Explanation: First, look at data_processor. It doesn't get involved in the complications of how the data are read; it just gets integers and does whatever it wants with them. Now look at the /proc reader. It basically waits for data_processor to call get_int enough times for all current data to be consumed (step 1) and then copies the next buffer into internal memory, allowing data_processor to continue (step 2). It then strips trailing spaces so that get_int can be simplified a bit (step 3). Finally, it signals get_int that it can start reading the data (step 4).

The get_int function first waits for data to arrive (step 1). Ignoring step 2 for now, it skips any unwanted characters (step 3) and then starts reading the number (step 4). Reading the number can end in two ways: either the end of the buffer is reached (in which case, if the /proc reader had not stripped any spaces, the number could have been cut between frames) or white space is met. In the former case, it needs to signal the /proc reader to read in more data and wait for another cycle to append the rest of the number to the current one; in the latter case, it returns the number (step 5). When continuing from the last frame, it checks whether the new frame starts with a digit. If not, the previous number was actually a whole number and should be returned. Otherwise, it needs to continue appending digits to the last number (step 2).

The main problem with this method is that it is overly complicated. It gets much more complicated when get_string is added, or when the integer being read could be hex, and so on. Basically, you have to reinvent sscanf! Note that sscanf could be used in this simple example at step 4 of get_int instead of the while loop (or also in get_string), but that gets trickier when hex input is also possible (imagine the hex number being cut between 0 and x0212ae4). Even so, it only replaces step 4 of get_int, and the rest of the machinery still has to remain.
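One way to see why a cut hex number is awkward, and one possible way around it, is to carry the raw characters of the unfinished token across buffers instead of a partially computed value, and to convert only once the token is complete. The following is a user-space sketch under that assumption, not the approach used in this question; `tok_feed`, `tok_feed_chunk`, `tok_flush`, `parse_long`, `collect_long` and `TOK_MAX` are hypothetical names. It uses the standard `strtol` with base 0; in a kernel module the `kstrtol` helper would be the likely counterpart, though that substitution is not shown or tested here.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define TOK_MAX 32 /* assumed upper bound on the length of one token */

/* Raw characters of the token in progress; survives between chunks. */
struct tok_feed {
    char   tok[TOK_MAX + 1];
    size_t len;
};

/* Called once per complete, NUL-terminated token. */
typedef void (*tok_sink)(const char *tok, void *ctx);

/* Emit the pending token, if any. Also call this once at end of input. */
static void tok_flush(struct tok_feed *st, tok_sink emit, void *ctx)
{
    if (st->len > 0) {
        st->tok[st->len] = '\0';
        emit(st->tok, ctx);
        st->len = 0;
    }
}

/* Returns 0 on success, -1 if a token is longer than TOK_MAX. */
static int tok_feed_chunk(struct tok_feed *st, const char *buf, size_t n,
                          tok_sink emit, void *ctx)
{
    assert(st != NULL && emit != NULL);
    for (size_t i = 0; i < n; ++i) {
        unsigned char c = (unsigned char)buf[i];
        if (isspace(c))
            tok_flush(st, emit, ctx);   /* white space ends the token */
        else if (st->len < TOK_MAX)
            st->tok[st->len++] = (char)c;
        else
            return -1;
    }
    return 0;
}

/* Base 0 lets strtol accept decimal, 0x-prefixed hex, and 0-prefixed
 * octal. Returns 0 on success, -1 if the token is not a valid number. */
static int parse_long(const char *tok, long *out)
{
    char *end;
    errno = 0;
    long v = strtol(tok, &end, 0);
    if (errno != 0 || end == tok || *end != '\0')
        return -1;
    *out = v;
    return 0;
}

/* Example sink: parse each token and store it, flagging bad tokens. */
struct long_list {
    long   v[8];
    size_t n;
    int    bad;
};

static void collect_long(const char *tok, void *ctx)
{
    struct long_list *l = ctx;
    long v;
    if (l->n < 8 && parse_long(tok, &v) == 0)
        l->v[l->n++] = v;
    else
        l->bad = 1;
}
```

With this shape, a buffer ending in `0` followed by a buffer starting with `x0212ae4` is reassembled into `0x0212ae4` before any conversion happens, so the converter never sees half a prefix. The price is a fixed limit on token length.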

It actually cost me many bugs and heavy testing to get all the special cases right. That's another reason why it doesn't look elegant to me.

I would like to know if there is a better method to handle this. I am aware that using shared memory could be an option, but I'm looking for an algorithm for this task (more out of curiosity, since I already have a working solution). More specifically:

  • Is there an already implemented method in the Linux kernel that can be treated like a normal C FILE, from which you can take data and which handles the breaking of data into pages itself?
  • If not, am I over-complicating things and missing an obvious simple solution?
  • I believe fscanf faces a similar problem. How does it handle this?
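On the fscanf point: stdio implementations generally read through a buffered stream that is refilled whenever it runs empty, so a scanner that peeks one character at a time never notices where one buffer ends and the next begins. The sketch below illustrates that idea in user space only. It is not how any particular libc is written, and `kstream`, `ks_peek`, `ks_get_int`, `mem_src` and `mem_refill` are hypothetical names. In a kernel module the refill step would presumably have to block until the next write arrives, which is the role the two semaphores play in the scheme above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define KS_EOF (-1)

/* Refill callback: copy up to cap bytes into buf and return the count,
 * or 0 when the input is exhausted. */
typedef size_t (*ks_refill)(char *buf, size_t cap, void *ctx);

struct kstream {
    char      buf[20]; /* deliberately small, mirrors the 20-byte "page" */
    size_t    pos, len;
    ks_refill refill;
    void     *ctx;
};

/* Return the next character without consuming it, refilling if needed. */
static int ks_peek(struct kstream *s)
{
    assert(s != NULL && s->refill != NULL);
    if (s->pos == s->len) {
        s->len = s->refill(s->buf, sizeof s->buf, s->ctx);
        s->pos = 0;
        if (s->len == 0)
            return KS_EOF;
    }
    return (unsigned char)s->buf[s->pos];
}

/* Read one non-negative decimal integer. Returns 1 on success, 0 at end
 * of input or if the next token is not a number. No overflow check. */
static int ks_get_int(struct kstream *s, long *out)
{
    int c;
    long v = 0;

    while ((c = ks_peek(s)) != KS_EOF && isspace(c))
        s->pos++; /* skip leading white space */
    if (c == KS_EOF || !isdigit(c))
        return 0;
    while ((c = ks_peek(s)) != KS_EOF && isdigit(c)) {
        v = v * 10 + (c - '0');
        s->pos++; /* the peek above may have crossed a buffer boundary */
    }
    *out = v;
    return 1;
}

/* Example source: hands out an in-memory string in buffer-sized pieces. */
struct mem_src {
    const char *data;
    size_t      len, off;
};

static size_t mem_refill(char *buf, size_t cap, void *ctx)
{
    struct mem_src *m = ctx;
    size_t n = m->len - m->off;
    if (n > cap)
        n = cap;
    memcpy(buf, m->data + m->off, n);
    m->off += n;
    return n;
}
```

With the example input and a 20-byte buffer, the refills land exactly on the four frames shown earlier, yet a caller looping on `ks_get_int` simply sees ten integers. All the boundary handling lives in `ks_peek`, which is what keeps the number parser itself short.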

Side question: Is it a terrible thing that I'm blocking the /proc reader on a mutex? I mean, writing data can block, but I'm not sure whether that blocking normally happens in user space or kernel space.

Recommended answer

I finally decided to write something proper to solve this problem.

kio, in short, will be a port of C's standard stdio.h for kernel modules. It will support the /proc, /sys and /dev file systems in both read and write modes, whether text or binary. kio follows the standard closely, but has minor tweaks to ensure safety in kernel space.

Current status:

  • /proc files can be created
  • Read functions are implemented
  • Write functions are implemented
  • A file can only be opened by one user at a time
