Move memory pages per-thread in NUMA architecture
Question
I have two questions in one:
(i) Suppose thread X is running on CPU Y. Is it possible to use the syscalls migrate_pages or, even better, move_pages (or their libnuma wrappers) to move the pages associated with X to the node to which Y is attached?
This question arises because the first argument of both syscalls is a PID (and I need a per-thread approach for some research I'm doing).
(ii) If the answer to (i) is yes, how can I get all the pages used by a given thread? My aim is to move the page(s) that contain an array M[], for example. How do I "link" data structures with their memory pages so that I can use the syscalls above?
One extra piece of information: I'm using C with pthreads. Thanks in advance!
Recommended answer
Here's the code I use for pinning a thread to a single CPU and moving its stack to the corresponding NUMA node (slightly adapted to remove some constants defined elsewhere). Note that I first create the thread normally and then call SetAffinityAndRelocateStack() below from within the thread. I think this is much better than trying to create your own stack, since stacks have special support for growing when the bottom is reached.
The code can also be adapted to operate on a newly created thread from outside, but this could give rise to race conditions (e.g. if the thread performs I/O into its stack), so I wouldn't recommend it.
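    // C++ (uses nullptr/auto); link with -lnuma -lpthread.
    #include <alloca.h>
    #include <cassert>
    #include <cstddef>
    #include <numa.h>
    #include <numaif.h>
    #include <pthread.h>
    #include <sched.h>

    // Touch the first pages of the stack so they are faulted in on the bound node.
    static void PreFaultStack()
    {
        const size_t NUM_PAGES_TO_PRE_FAULT = 50;
        const size_t pageSize = numa_pagesize();
        volatile char *base = static_cast<volatile char *>(alloca(NUM_PAGES_TO_PRE_FAULT * pageSize));
        // One write per page; volatile keeps the compiler from dropping the writes.
        for (size_t i = 0; i < NUM_PAGES_TO_PRE_FAULT; ++i) {
            base[i * pageSize] = 0;
        }
    }

    // Call from inside the thread whose stack should be relocated.
    void SetAffinityAndRelocateStack(int cpuNum)
    {
        assert(-1 != cpuNum);

        // Pin the calling thread to the requested CPU.
        cpu_set_t cpuset;
        CPU_ZERO(&cpuset);
        CPU_SET(cpuNum, &cpuset);
        const int rc = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpuset);
        assert(0 == rc);

        // Look up the address range of the calling thread's stack.
        pthread_attr_t attr;
        void *stackAddr = nullptr;
        size_t stackSize = 0;
        if ((0 != pthread_getattr_np(pthread_self(), &attr)) || (0 != pthread_attr_getstack(&attr, &stackAddr, &stackSize))) {
            assert(false);
        }
        pthread_attr_destroy(&attr);

        // Bind the stack to the CPU's node and migrate the pages already allocated.
        const int node = numa_node_of_cpu(cpuNum);
        assert(node >= 0);
        const unsigned long nodeMask = 1UL << node;
        // maxnode counts bits in the mask, not bytes.
        const auto bindRc = mbind(stackAddr, stackSize, MPOL_BIND, &nodeMask, 8 * sizeof(nodeMask), MPOL_MF_MOVE | MPOL_MF_STRICT);
        assert(0 == bindRc);

        PreFaultStack();
        // TODO: Also lock the stack with mlock() to guarantee it stays resident in RAM
    }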
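For the array M[] from the question, a related approach is move_pages with pid 0, which addresses pages by virtual address within the calling process rather than by thread. The following is a minimal sketch under stated assumptions, not part of the code above: PagesCovering and MoveBufferToLocalNode are hypothetical helper names, the raw syscall() interface is used so that the sketch does not depend on libnuma, and the buffer is assumed to have been touched already (pages that are not present are reported as -ENOENT in the status array).

```cpp
// Sketch (Linux only): move the pages backing a buffer to the NUMA node of the
// CPU the calling thread is currently running on. Helper names are hypothetical.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#include <linux/mempolicy.h>   // MPOL_MF_MOVE
#include <sys/syscall.h>       // SYS_getcpu, SYS_move_pages
#include <unistd.h>            // syscall, sysconf

// Start address of every page that overlaps [base, base + len).
std::vector<void*> PagesCovering(const void* base, std::size_t len, std::size_t pageSize)
{
    assert(pageSize != 0 && (pageSize & (pageSize - 1)) == 0);  // power of two
    std::vector<void*> pages;
    if (len == 0) {
        return pages;
    }
    const std::uintptr_t mask = ~(static_cast<std::uintptr_t>(pageSize) - 1);
    const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(base);
    const std::uintptr_t first = start & mask;
    const std::uintptr_t last = (start + len - 1) & mask;
    for (std::uintptr_t p = first; p <= last; p += pageSize) {
        pages.push_back(reinterpret_cast<void*>(p));
    }
    return pages;
}

// Returns how many pages of the buffer sit on the local node after the call,
// or -1 if a syscall fails.
long MoveBufferToLocalNode(void* base, std::size_t len)
{
    unsigned cpu = 0;
    unsigned node = 0;
    if (syscall(SYS_getcpu, &cpu, &node, nullptr) != 0) {
        return -1;
    }

    const std::vector<void*> wanted = PagesCovering(base, len, sysconf(_SC_PAGESIZE));
    std::vector<void*> pages(wanted);
    std::vector<int> nodes(pages.size(), static_cast<int>(node));
    std::vector<int> status(pages.size(), 0);

    // pid 0 means "the calling process"; MPOL_MF_MOVE only moves pages that
    // are used exclusively by this process.
    const long rc = syscall(SYS_move_pages, 0, pages.size(), pages.data(),
                            nodes.data(), status.data(), MPOL_MF_MOVE);
    if (rc < 0) {
        return -1;
    }

    long onLocalNode = 0;
    for (int s : status) {
        if (s == static_cast<int>(node)) {  // negative values are per-page errors
            ++onLocalNode;
        }
    }
    return onLocalNode;
}
```

A thread would typically pin itself first (as SetAffinityAndRelocateStack() does) and then call MoveBufferToLocalNode(M, sizeof(M)) on its own data. Because all threads share one address space, what gets moved is decided by the addresses passed in, not by which thread makes the call; the call may also be refused in sandboxed environments that restrict move_pages.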