Move memory pages per-thread in NUMA architecture


Question

I have two questions in one:

(i) Suppose thread X is running on CPU Y. Is it possible to use the syscall migrate_pages, or even better move_pages (or their libnuma wrappers), to move the pages associated with X to the node to which Y is attached?

This question arises because the first argument of both syscalls is a PID (and I need a per-thread approach for some research I'm doing).

(ii) If the answer to (i) is yes, how can I get all the pages used by a given thread? My aim is to move the page(s) that contain an array M[], for example. How do I "link" data structures with their memory pages so that I can use the syscalls above?

One extra piece of information: I'm using C with pthreads. Thanks in advance!

Recommended answer

Here's the code I use for pinning a thread to a single CPU and moving its stack to the corresponding NUMA node (slightly adapted to remove some constants defined elsewhere). Note that I first create the thread normally, and then call SetAffinityAndRelocateStack() below from within the thread. I think this is much better than trying to create your own stack, since stacks have special support for growing in case the bottom is reached.

The code can also be adapted to operate on a newly created thread from the outside, but this could give rise to race conditions (e.g. if the thread performs I/O into its stack), so I wouldn't recommend it.

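#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for pthread_*_np and the CPU_* macros */
#endif
#include <alloca.h>
#include <assert.h>
#include <numa.h>     /* numa_pagesize, numa_node_of_cpu; link with -lnuma */
#include <numaif.h>   /* mbind, MPOL_* */
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Touch the next pages of the stack so that they are faulted in under
   the memory policy installed by SetAffinityAndRelocateStack(). */
void PreFaultStack(void)
{
    const size_t NUM_PAGES_TO_PRE_FAULT = 50;
    const size_t pageSize = (size_t)numa_pagesize();
    const size_t size = NUM_PAGES_TO_PRE_FAULT * pageSize;
    /* volatile keeps the compiler from discarding the writes */
    volatile char *allocaBase = (volatile char *)alloca(size);
    for (size_t i = 0; i < size; i += pageSize)
        allocaBase[i] = 0;
}

void SetAffinityAndRelocateStack(int cpuNum)
{
    assert(-1 != cpuNum);
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(cpuNum, &cpuset);
    const int rc = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpuset);
    assert(0 == rc);

    pthread_attr_t attr;
    void *stackAddr = NULL;
    size_t stackSize = 0;
    if ((0 != pthread_getattr_np(pthread_self(), &attr)) || (0 != pthread_attr_getstack(&attr, &stackAddr, &stackSize))) {
        assert(0);
    }
    pthread_attr_destroy(&attr);  /* release what pthread_getattr_np() allocated */

    const int node = numa_node_of_cpu(cpuNum);
    /* stay below the top bit: Linux has historically ignored the last bit of the mask */
    assert(node >= 0 && node < (int)(8 * sizeof(unsigned long)) - 1);
    const unsigned long nodeMask = 1UL << node;
    /* maxnode is a bit count, not a byte count */
    const long bindRc = mbind(stackAddr, stackSize, MPOL_BIND, &nodeMask, 8 * sizeof(nodeMask), MPOL_MF_MOVE | MPOL_MF_STRICT);
    assert(0 == bindRc);

    PreFaultStack();
    /* TODO: Also lock the stack with mlock() to guarantee it stays resident in RAM */
}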
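
For the second part of the question, one way to "link" a data structure such as M[] to its pages is to round its address range out to page boundaries and pass each page address to move_pages. The sketch below is not part of the original answer. The helper names pages_spanned and move_range_to_node are hypothetical, it assumes Linux, and it invokes the raw syscall through syscall(2) so that it builds without libnuma (with libnuma installed, the move_pages() wrapper from <numaif.h> takes the same arguments). A pid of 0 means the calling process; because all threads of a process share one address space, pages are selected by address rather than by thread.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MPOL_MF_MOVE
#define MPOL_MF_MOVE (1 << 1) /* same value as in <numaif.h> */
#endif

/* Hypothetical helper: number of pages touched by [addr, addr + len). */
size_t pages_spanned(const void *addr, size_t len, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    if (len == 0)
        return 0;
    uintptr_t mask = ~((uintptr_t)page_size - 1);
    uintptr_t first = (uintptr_t)addr & mask;
    uintptr_t last = ((uintptr_t)addr + len - 1) & mask;
    return (size_t)((last - first) / page_size) + 1;
}

/*
 * Hypothetical helper: ask the kernel to move every page holding
 * [addr, addr + len) to target_node. Returns the number of pages that
 * are on target_node afterwards, or -1 with errno set if the syscall
 * itself failed.
 */
long move_range_to_node(void *addr, size_t len, int target_node)
{
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    size_t count = pages_spanned(addr, len, page_size);
    if (count == 0)
        return 0;

    void **pages = malloc(count * sizeof *pages);
    int *nodes = malloc(count * sizeof *nodes);
    int *status = malloc(count * sizeof *status);
    long on_target = -1;

    if (pages == NULL || nodes == NULL || status == NULL) {
        errno = ENOMEM;
    } else {
        uintptr_t first = (uintptr_t)addr & ~((uintptr_t)page_size - 1);
        for (size_t i = 0; i < count; i++) {
            pages[i] = (void *)(first + i * page_size);
            nodes[i] = target_node;
            status[i] = -1;
        }
        /* pid 0 = the calling process */
        long rc = syscall(SYS_move_pages, 0, (unsigned long)count,
                          pages, nodes, status, MPOL_MF_MOVE);
        if (rc >= 0) {
            /* status[i] is the node of page i, or a negative errno */
            on_target = 0;
            for (size_t i = 0; i < count; i++)
                if (status[i] == target_node)
                    on_target++;
        }
    }

    int saved_errno = errno;
    free(pages);
    free(nodes);
    free(status);
    errno = saved_errno;
    return on_target;
}
```

A thread could call move_range_to_node(M, sizeof M, node) after it has written to M, with node obtained from numa_node_of_cpu() as in SetAffinityAndRelocateStack(). Pages that have never been touched are not present yet and are reported with -ENOENT in their status slot instead of being moved. The call can also fail outright, for example with ENOSYS on a kernel built without NUMA support or with EPERM in a restricted container.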

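
As a small companion to the pinning code, a thread can also ask the kernel which CPU and NUMA node it is running on at that moment, for instance to check that the affinity call took effect. This is a sketch under the assumption of a Linux system; the helper name current_cpu_and_node is hypothetical, and the raw getcpu syscall is used so that no extra library is needed.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: store the CPU and NUMA node the calling thread
 * is on right now. Returns 0 on success, -1 with errno set on failure.
 */
int current_cpu_and_node(unsigned *cpu, unsigned *node)
{
    /* The third argument is an old cache pointer that the kernel ignores. */
    return (int)syscall(SYS_getcpu, cpu, node, NULL);
}
```

Unless the thread is pinned, the answer can be stale as soon as the call returns, because the scheduler is free to migrate the thread to another CPU.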
