How does the internal implementation of memcpy work?

Question

How does the standard C function 'memcpy' work? It has to copy a (large) chunk of RAM to another area of RAM. As far as I know, you cannot move data straight from RAM to RAM in assembly (with the mov instruction), so I am guessing it uses a CPU register as intermediate storage while copying?

But how does it copy? By blocks (and how would it copy by blocks?), by individual bytes (char), or by the largest data type available (copying in units of long double, which is 12 bytes on my system)?

OK, apparently you can move data from RAM to RAM directly. I am not an assembly expert, and everything I have learnt about assembly comes from this document (X86 assembly guide), which says in the section on the mov instruction that you cannot move from RAM to RAM. Apparently this isn't true.

Recommended answer

It depends. In general, you couldn't physically copy anything larger than the largest usable register in a single cycle, but that's not really how machines work these days. In practice, you care less about what the CPU is doing and more about the characteristics of DRAM. The machine's memory hierarchy plays a crucial role in performing the copy as fast as possible (e.g., are you loading whole cache lines? How large is a DRAM row relative to the copy?). An implementation might instead choose to use some kind of vector instructions to implement memcpy. Without reference to a specific implementation, it's effectively a byte-for-byte copy with a one-place buffer.
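The "byte-for-byte copy with a one-place buffer" can be sketched in portable C. This is only an illustration, not how any particular C library implements memcpy: the names `copy_bytes`, `copy_words`, and `word_t` are made up for this example, and real implementations are usually hand-tuned per architecture. The second function shows one way of copying "by blocks", moving a machine word per step and finishing the tail one byte at a time. It uses fixed-size `memcpy` calls for the word loads and stores only because that is the portable way to avoid alignment and aliasing problems; compilers typically turn such a call into a single load or store.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical word type for this sketch; a real implementation would pick
   whatever width (or vector register) is cheapest on the target. */
typedef unsigned long word_t;

/* Plain byte-for-byte copy. Each byte passes through one temporary
   (in practice a CPU register) on its way from src to dst. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--) {
        unsigned char tmp = *s++;   /* load:  memory -> temporary */
        *d++ = tmp;                 /* store: temporary -> memory */
    }
    return dst;
}

/* Same result, but moves sizeof(word_t) bytes per step while enough
   bytes remain, then copies the leftover tail one byte at a time. */
void *copy_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= sizeof(word_t)) {
        word_t w;
        memcpy(&w, s, sizeof w);    /* word-sized load into the temporary */
        memcpy(d, &w, sizeof w);    /* word-sized store from the temporary */
        s += sizeof w;
        d += sizeof w;
        n -= sizeof w;
    }
    while (n--) {
        *d++ = *s++;                /* remaining tail bytes */
    }
    return dst;
}
```

Like memcpy itself, both functions assume the source and destination regions do not overlap. Whether the word-sized version is actually faster depends on the compiler, the alignment of the buffers, and the memory hierarchy discussed above.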

Here's a fun article that describes one person's adventure into optimizing memcpy. The main take-home point is that an implementation is always going to be targeted at a specific architecture and environment, based on the instructions you can execute inexpensively.
