How to avoid high CPU usage while reading/writing a character device?


Problem description

I need to write a linux kernel driver for a PCIe device with SRAM.

For the first attempt, I've written a driver to access SRAM from PCIe with a character device.

Everything works as expected, but there is one problem. The SRAM is slow: 1 MB takes about 2 seconds to read/write, which is a hardware limitation. The CPU is 100% busy while reading/writing, which is a problem. I don't need speed, reading/writing can be slow, but why does it take so much CPU?

The memory is mapped using pci_iomap:

  g_mmio_buffer[0] = pci_iomap(pdev, SRAM_BAR_H, g_mmio_length);

The read/write functions look like this:

static ssize_t dev_read(struct file *fp, char *buf, size_t len, loff_t *off) {
  unsigned long rval;
  size_t copied;

  rval = copy_to_user(buf, g_mmio_buffer[SRAM_BAR] + *off, len);

  if (rval < 0) return -EFAULT;

  copied = len - rval;
  *off += copied;

  return copied;
}

static ssize_t dev_write(struct file *fp, const char *buf, size_t len, loff_t *off) {
  unsigned long rval;
  size_t copied;

  rval = copy_from_user(g_mmio_buffer[SRAM_BAR] + *off, buf, len);

  if (rval < 0) return -EFAULT;

  copied = len - rval;
  *off += copied;

  return copied;
}

The question is: what can I do about the high CPU usage?

Should I rewrite the driver to use a block device instead of a character device?

Can I allow the CPU to work on another process while the data is being read/written?

Answer

As pointed out by @0andriy, you are not supposed to access iomem directly. There are functions such as memcpy_toio() and memcpy_fromio() that can copy between iomem and normal memory, but they only work on kernel virtual addresses.
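
For example, a minimal sketch of the simplest correct read path copies through an intermediate kernel bounce buffer (the bounds handling against g_mmio_length is an assumption about the rest of the driver; this fixes the direct iomem access but does not by itself reduce CPU usage):

static ssize_t dev_read(struct file *fp, char __user *buf, size_t len, loff_t *off)
{
  void *tmp;
  ssize_t ret;

  if (*off >= g_mmio_length)
    return 0;
  len = min_t(size_t, len, g_mmio_length - *off);

  /* Bounce buffer in ordinary kernel memory. */
  tmp = kmalloc(len, GFP_KERNEL);
  if (!tmp)
    return -ENOMEM;

  /* iomem -> kernel memory, then kernel memory -> user memory. */
  memcpy_fromio(tmp, g_mmio_buffer[SRAM_BAR] + *off, len);
  if (copy_to_user(buf, tmp, len)) {
    ret = -EFAULT;
  } else {
    *off += len;
    ret = len;
  }

  kfree(tmp);
  return ret;
}

The write path is analogous, with copy_from_user() followed by memcpy_toio().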

In order to copy from userspace addresses to iomem without using an intermediate data buffer, the userspace memory pages need to be "pinned" into physical memory. That can be done using get_user_pages_fast(). However, the pinned pages may be in "high memory" (highmem) which is outside the permanently mapped memory in the kernel. Such pages need to be temporarily mapped into kernel virtual address space for a short duration using kmap_atomic(). (There are rules governing the use of kmap_atomic(), and there are other functions for longer term mapping of highmem. Check the highmem documentation for details.)

Once a userspace page has been mapped into kernel virtual address space, memcpy_toio() and memcpy_fromio() can be used to copy between that page and iomem.

A page temporarily mapped by kmap_atomic() needs to be unmapped by kunmap_atomic().

User memory pages pinned by get_user_pages_fast() need to be unpinned individually by calling put_page(), but if the page memory has been written to (e.g. by memcpy_fromio()), it must first be flagged as "dirty" by set_page_dirty_lock() before calling put_page().

Putting all that together, the following functions may be used to copy between user memory and iomem:

#include <linux/kernel.h>
#include <linux/uaccess.h>
#include <linux/mm.h>
#include <linux/highmem.h>
#include <linux/io.h>

/**
 * my_copy_to_user_from_iomem - copy to user memory from MMIO
 * @to:     destination in user memory
 * @from:   source in remapped MMIO
 * @n:      number of bytes to copy
 * Context: process
 *
 * Returns number of uncopied bytes.
 */
long my_copy_to_user_from_iomem(void __user *to, const void __iomem *from,
                unsigned long n)
{
    might_fault();
    if (!access_ok(to, n))
        return n;
    while (n) {
        enum { PAGE_LIST_LEN = 32 };
        struct page *page_list[PAGE_LIST_LEN];
        unsigned long start;
        unsigned int p_off;
        unsigned int part_len;
        int nr_pages;
        int i;

        /* Determine pages to do this iteration. */
        p_off = offset_in_page(to);
        start = (unsigned long)to - p_off;
        nr_pages = min_t(int, PAGE_ALIGN(p_off + n) >> PAGE_SHIFT,
                 PAGE_LIST_LEN);
        /* Lock down (for write) user pages. */
        nr_pages = get_user_pages_fast(start, nr_pages, 1, page_list);
        if (nr_pages <= 0)
            break;

        /* Limit number of bytes to end of locked-down pages. */
        part_len =
            min(n, ((unsigned long)nr_pages << PAGE_SHIFT) - p_off);

        /* Copy from iomem to locked-down user memory pages. */
        for (i = 0; i < nr_pages; i++) {
            struct page *page = page_list[i];
            unsigned char *p_va;
            unsigned int plen;

            plen = min((unsigned int)PAGE_SIZE - p_off, part_len);
            p_va = kmap_atomic(page);
            memcpy_fromio(p_va + p_off, from, plen);
            kunmap_atomic(p_va);
            set_page_dirty_lock(page);
            put_page(page);
            to = (char __user *)to + plen;
            from = (const char __iomem *)from + plen;
            n -= plen;
            part_len -= plen;
            p_off = 0;
        }
    }
    return n;
}

/**
 * my_copy_from_user_to_iomem - copy from user memory to MMIO
 * @to:     destination in remapped MMIO
 * @from:   source in user memory
 * @n:      number of bytes to copy
 * Context: process
 *
 * Returns number of uncopied bytes.
 */
long my_copy_from_user_to_iomem(void __iomem *to, const void __user *from,
                unsigned long n)
{
    might_fault();
    if (!access_ok(from, n))
        return n;
    while (n) {
        enum { PAGE_LIST_LEN = 32 };
        struct page *page_list[PAGE_LIST_LEN];
        unsigned long start;
        unsigned int p_off;
        unsigned int part_len;
        int nr_pages;
        int i;

        /* Determine pages to do this iteration. */
        p_off = offset_in_page(from);
        start = (unsigned long)from - p_off;
        nr_pages = min_t(int, PAGE_ALIGN(p_off + n) >> PAGE_SHIFT,
                 PAGE_LIST_LEN);
        /* Lock down (for read) user pages. */
        nr_pages = get_user_pages_fast(start, nr_pages, 0, page_list);
        if (nr_pages <= 0)
            break;

        /* Limit number of bytes to end of locked-down pages. */
        part_len =
            min(n, ((unsigned long)nr_pages << PAGE_SHIFT) - p_off);

        /* Copy from locked-down user memory pages to iomem. */
        for (i = 0; i < nr_pages; i++) {
            struct page *page = page_list[i];
            unsigned char *p_va;
            unsigned int plen;

            plen = min((unsigned int)PAGE_SIZE - p_off, part_len);
            p_va = kmap_atomic(page);
            memcpy_toio(to, p_va + p_off, plen);
            kunmap_atomic(p_va);
            put_page(page);
            to = (char __iomem *)to + plen;
            from = (const char __user *)from + plen;
            n -= plen;
            part_len -= plen;
            p_off = 0;
        }
    }
    return n;
}
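
A hedged sketch of how the driver's file operations from the question might call these helpers (the bounds checks against g_mmio_length are assumptions about the rest of the driver):

static ssize_t dev_read(struct file *fp, char __user *buf, size_t len, loff_t *off)
{
  long uncopied;

  if (*off >= g_mmio_length)
    return 0;
  len = min_t(size_t, len, g_mmio_length - *off);

  uncopied = my_copy_to_user_from_iomem(buf, g_mmio_buffer[SRAM_BAR] + *off, len);
  if (uncopied == len)
    return -EFAULT;

  *off += len - uncopied;
  return len - uncopied;
}

static ssize_t dev_write(struct file *fp, const char __user *buf, size_t len, loff_t *off)
{
  long uncopied;

  if (*off >= g_mmio_length)
    return -ENOSPC;
  len = min_t(size_t, len, g_mmio_length - *off);

  uncopied = my_copy_from_user_to_iomem(g_mmio_buffer[SRAM_BAR] + *off, buf, len);
  if (uncopied == len)
    return -EFAULT;

  *off += len - uncopied;
  return len - uncopied;
}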

Secondly, you might be able to speed up memory access by mapping the iomem as "write combined" by replacing pci_iomap() with pci_iomap_wc().
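
For example, a one-line change to the mapping from the question (assuming the device and this BAR tolerate write-combined accesses):

  g_mmio_buffer[0] = pci_iomap_wc(pdev, SRAM_BAR_H, g_mmio_length);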

Thirdly, the only real way to avoid wait-stating the CPU when accessing slow memory is to not use the CPU and use DMA transfers instead. The details of that very much depend on your PCIe device's bus-mastering DMA capabilities (if it has any at all). User memory pages still need to be pinned (e.g. by get_user_pages_fast()) during the DMA transfer, but do not need to be temporarily mapped by kmap_atomic().
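
Programming the DMA engine itself is entirely device specific, so the following is only a rough sketch of the page pinning and DMA mapping around a hypothetical device-to-memory transfer of a single page; my_device_start_dma_to() and my_device_wait_dma() are placeholders for whatever your hardware actually requires, not real APIs:

#include <linux/dma-mapping.h>
#include <linux/mm.h>

/* Sketch: DMA up to one pinned user page from the device into user memory. */
static int dma_sram_to_user_page(struct device *dev, unsigned long user_addr,
                 size_t len)
{
    struct page *page;
    dma_addr_t dma_addr;
    unsigned int p_off = offset_in_page(user_addr);
    int ret;

    if (p_off + len > PAGE_SIZE)
        return -EINVAL;

    /* Pin the user page for writing, since the device writes into it. */
    ret = get_user_pages_fast(user_addr - p_off, 1, 1, &page);
    if (ret <= 0)
        return ret ? ret : -EFAULT;

    /* Map the page for device-to-memory DMA; no kmap_atomic() is needed. */
    dma_addr = dma_map_page(dev, page, p_off, len, DMA_FROM_DEVICE);
    if (dma_mapping_error(dev, dma_addr)) {
        put_page(page);
        return -EIO;
    }

    /* Placeholders: program the device's DMA engine and wait for completion. */
    my_device_start_dma_to(dma_addr, len);
    ret = my_device_wait_dma();

    dma_unmap_page(dev, dma_addr, len, DMA_FROM_DEVICE);
    set_page_dirty_lock(page);
    put_page(page);
    return ret;
}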
