Prefetch for Intel Core 2 Duo

Question

Has anyone had experience using prefetch instructions for the Core 2 Duo processor?

I've been using the (standard?) prefetch set (prefetchnta, prefetcht1, etc.) successfully on a series of P4 machines. When I run the code on a Core 2 Duo, however, the prefetcht(i) instructions seem to do nothing, and the prefetchnta instruction seems less effective.

My criterion for assessing performance is the timing of a BLAS 1 vector-vector (axpy) operation when the vectors are large enough that the data no longer fits in cache.
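
The sketch below is for readers who want to reproduce the comparison. It shows the kind of axpy loop being timed and rests on a few assumptions: double-precision vectors, 64-byte cache lines, and the SSE `_mm_prefetch` intrinsic from `<xmmintrin.h>` (available in GCC, Clang and MSVC). The function names and the `PREFETCH_AHEAD_BYTES` distance are hypothetical illustrations, not the poster's actual code, and the distance would need tuning on each machine.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_T0/T1/T2/NTA (SSE) */

/* Hypothetical tuning knob: prefetch this many bytes ahead of the current
   element. 512 bytes (8 lines) is an illustrative guess, not a recommendation. */
#define PREFETCH_AHEAD_BYTES 512
#define LINE_BYTES 64                                   /* assumed line size */
#define DOUBLES_PER_LINE (LINE_BYTES / sizeof(double))  /* 8 doubles */

/* Baseline axpy: y[i] += a * x[i], relying only on the hardware prefetcher. */
void daxpy_plain(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* axpy with software prefetch: one prefetch per line's worth of x and y,
   issued PREFETCH_AHEAD_BYTES ahead. The hint must be a compile-time
   constant; it is fixed to _MM_HINT_T0 (prefetcht0) here. Swap in
   _MM_HINT_T1, _MM_HINT_T2 or _MM_HINT_NTA to compare the variants. */
void daxpy_prefetch(size_t n, double a, const double *x, double *y)
{
    const size_t ahead = PREFETCH_AHEAD_BYTES / sizeof(double);
    for (size_t i = 0; i < n; ++i) {
        /* Guard keeps the prefetch address inside the arrays. */
        if (i % DOUBLES_PER_LINE == 0 && i + ahead < n) {
            _mm_prefetch((const char *)&x[i + ahead], _MM_HINT_T0);
            _mm_prefetch((const char *)&y[i + ahead], _MM_HINT_T0);
        }
        y[i] += a * x[i];
    }
}
```

One way to set up the comparison described above is to time `daxpy_plain` against `daxpy_prefetch` on vectors several times larger than the L2 cache, once per hint. Because the hint argument must be a compile-time constant, each hint needs its own build or its own copy of the function.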

Have Intel introduced new prefetch instructions?

Answer

From an Intel reference document on the Intel 64 and IA-32 architectures, see pages 163 and 77:

Pentium 4 and Intel Xeon processors based on Intel NetBurst microarchitecture introduced hardware prefetching in addition to software prefetching. The hardware prefetcher operates transparently to fetch data and instruction streams from memory without requiring programmer intervention. Subsequent microarchitectures continue to improve and add features to the hardware prefetching mechanisms. Earlier implementations of hardware prefetching mechanisms focus on prefetching data and instruction from memory to L2; more recent implementations provide additional features to prefetch data from L2 to L1. In Intel NetBurst microarchitecture, the hardware prefetcher can track 8 independent streams.
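
The deliberately simplified toy model below illustrates why a stream-tracking hardware prefetcher can make software prefetches redundant for simple sequential loops. It is an illustration only, not Intel's actual prefetcher logic. The model keeps a fixed table of stream slots (8 in the example, matching the NetBurst figure quoted above). It treats an access to the line immediately after a tracked line as a stream continuation, and it counts how many demand accesses had already been prefetched. The names `ToyPrefetcher`, `toy_access` and `toy_run_streams` are hypothetical, as are the one-line lookahead and LRU replacement policy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SLOTS 32

/* Toy model only: NOT Intel's real algorithm. */
typedef struct {
    int valid;
    uint64_t last_line;   /* most recent line number seen in this stream */
    int has_prefetch;     /* has the model prefetched for this stream yet? */
    uint64_t prefetched;  /* line the model prefetched (if has_prefetch) */
    uint64_t last_use;    /* logical timestamp for LRU replacement */
} StreamSlot;

typedef struct {
    StreamSlot slot[MAX_SLOTS];
    int nslots;           /* number of streams the model can track */
    uint64_t clock;
    size_t hits;          /* demand accesses covered by an earlier prefetch */
} ToyPrefetcher;

void toy_init(ToyPrefetcher *p, int nslots)
{
    memset(p, 0, sizeof *p);
    p->nslots = nslots < MAX_SLOTS ? nslots : MAX_SLOTS;
}

/* Feed one demand access (a cache-line number) to the model. */
void toy_access(ToyPrefetcher *p, uint64_t line)
{
    p->clock++;
    for (int i = 0; i < p->nslots; ++i) {
        StreamSlot *s = &p->slot[i];
        if (s->valid && line == s->last_line + 1) {  /* continues a stream */
            if (s->has_prefetch && s->prefetched == line)
                p->hits++;
            s->last_line = line;
            s->prefetched = line + 1;                 /* run one line ahead */
            s->has_prefetch = 1;
            s->last_use = p->clock;
            return;
        }
    }
    /* No stream matched: use a free slot, otherwise evict the LRU slot. */
    int victim = 0;
    for (int i = 0; i < p->nslots; ++i) {
        if (!p->slot[i].valid) { victim = i; break; }
        if (p->slot[i].last_use < p->slot[victim].last_use) victim = i;
    }
    StreamSlot *v = &p->slot[victim];
    v->valid = 1;
    v->last_line = line;
    v->has_prefetch = 0;
    v->last_use = p->clock;
}

/* Hypothetical workload: k interleaved sequential streams of n lines each,
   spaced 1,000,000 lines apart so they never look contiguous. */
size_t toy_run_streams(int nslots, int k, uint64_t n)
{
    ToyPrefetcher p;
    toy_init(&p, nslots);
    for (uint64_t j = 0; j < n; ++j)
        for (int s = 0; s < k; ++s)
            toy_access(&p, (uint64_t)s * 1000000u + j);
    return p.hits;
}
```

In this model with 8 slots, two interleaved streams (the x and y vectors of an axpy) are fully covered after a two-line warm-up. Nine round-robin streams, by contrast, thrash the LRU table and never hit. A real prefetcher is more elaborate, but the limit on tracked streams matters in a similar way.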

The Pentium M processor also provides a hardware prefetcher for data. It can track 12 separate streams in the forward direction and 4 streams in the backward direction. The processor’s PREFETCHNTA instruction also fetches 64-bytes into the first-level data cache without polluting the second-level cache.
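
The quoted text says PREFETCHNTA brings in a single 64-byte line. A natural pattern for data that is read exactly once is therefore one non-temporal prefetch per line, issued a few lines ahead. The sketch below assumes that pattern and a 64-byte line. `sum_once_nta` and its `lines_ahead` parameter are hypothetical. Whether NTA actually avoids polluting L2 depends on the processor; the quoted guarantee is stated for the Pentium M.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_NTA */

#define LINE_BYTES 64   /* assumed: one PREFETCHNTA covers one 64-byte line */

/* Hypothetical example: sum an array that will not be reused, issuing one
   non-temporal prefetch per cache line, lines_ahead lines in advance. */
double sum_once_nta(const double *a, size_t n, size_t lines_ahead)
{
    const size_t per_line = LINE_BYTES / sizeof(double);  /* 8 doubles */
    const size_t ahead = lines_ahead * per_line;
    double s = 0.0;
    for (size_t i = 0; i < n; i += per_line) {
        if (i + ahead < n)  /* keep the prefetch address inside the array */
            _mm_prefetch((const char *)(a + i + ahead), _MM_HINT_NTA);
        /* Consume one line's worth of elements (fewer at the tail). */
        size_t end = i + per_line < n ? i + per_line : n;
        for (size_t j = i; j < end; ++j)
            s += a[j];
    }
    return s;
}
```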

Intel Core Solo and Intel Core Duo processors provide more advanced hardware prefetchers for data than Pentium M processors. Key differences are summarized in Table 2-10.