Understanding TLB from CPUID results on Intel

Question

I'm exploring leaf 0x02 of the cpuid instruction and came up with a few questions. There is a table in the documentation which describes what cpuid results mean for the TLB configuration. Here they are:

56H TLB Data TLB0: 4 MByte pages, 4-way set associative, 16 entries
[...]
B4H TLB Data TLB1: 4 KByte pages, 4-way associative, 256 entries

Does it mean that there are only 2 levels of TLB? How to query the number of levels of TLB cache in case some x86 vendor decides to provide 3 levels of TLB?

57H TLB Data TLB0: 4 KByte pages, 4-way associative, 16 entries
[...] 
B4H TLB Data TLB1: 4 KByte pages, 4-way associative, 256 entries

Is "4-way associative" here just a typo for "4-way set associative"?

55H TLB Instruction TLB: 2-MByte or 4-MByte pages, fully associative, 7 entries
[...]
6AH Cache uTLB: 4 KByte pages, 8-way set associative, 64 entries
6BH Cache DTLB: 4 KByte pages, 8-way set associative, 256 entries

Does DTLB stand for Data TLB? What does uTLB mean? uops-TLB? Which TLB cache level is considered here?

C1H STLB Shared 2nd-Level TLB: 4 KByte/2MByte pages, 8-way associative, 1024 entries

Does this mean that in that case the 2nd level TLB is shared among all cores? So when not specified explicitly is the TLB cache core private?

Recommended answer

How to query the number of levels of TLB cache in case some x86 vendor decides to provide 3 levels of TLB?

Leaf 0x2 may return TLB information only on Intel processors. It's reserved on all current AMD processors. On all current Intel processors, there is no single number that tells you the number of TLB levels. The only way to determine the number of levels is by enumerating all the TLB-related cpuid leafs or subleafs. The following algorithm works on all current Intel processors that support the cpuid instruction (up to and including Ice Lake, Goldmont Plus, and Knights Mill):

  1. Check whether the value 0xFE exists in any of the four registers EAX, EBX, ECX, and EDX returned when cpuid is executed with EAX set to leaf 0x2.
  2. If 0xFE doesn't exist, enumerate all the bytes in the four registers. Based on Table 3-12 of the Intel manual Volume 2 (number 325383-070US), there will be either one or two descriptors of data TLBs that can cache 4KB translations. The Intel manual uses the following different names for TLBs that may cache data access translations: Data TLB, Data TLB0, Data TLB1, DTLB, uTLB, and Shared 2nd-Level TLB. If there are two such descriptors, then the number of levels is two. The descriptor with the larger number of TLB entries is the one for the second-level TLB. If there is only one such descriptor, the number of levels is one.
  3. If 0xFE exists, the TLB information needs to be obtained from cpuid leaf 0x18. Enumerate all the valid subleafs up to the maximum valid subleaf number. If there is at least one subleaf with the two least significant bits of EDX equal to binary 11, then the number of TLB levels is two. Otherwise, the number of TLB levels is one.
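
The three steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a complete implementation: the names `DATA_TLB_4K`, `descriptor_bytes`, and `count_tlb_levels` are hypothetical, the descriptor set holds only the 4KB data TLB descriptors quoted in this post rather than all of Table 3-12, and the register values are assumed to have been read by some other means, since Python cannot execute cpuid itself. Descriptor 0xC2 is left out, following the reading later in this answer that it describes large pages.

```python
# Partial, hypothetical table: only the 4KB-capable data TLB descriptors
# quoted in this post (not all of Table 3-12). 0xC2 is deliberately left out.
DATA_TLB_4K = {0x03, 0x57, 0x5B, 0x64, 0x6A, 0x6B, 0xA0, 0xB3, 0xB4, 0xC1}


def descriptor_bytes(eax, ebx, ecx, edx):
    """Collect the non-null descriptor bytes returned by leaf 0x2."""
    found = []
    for index, reg in enumerate((eax, ebx, ecx, edx)):
        if reg & 0x80000000:
            continue  # bit 31 set: this register carries no valid descriptors
        for shift in (0, 8, 16, 24):
            if index == 0 and shift == 0:
                continue  # the low byte of EAX (AL) is not a descriptor
            byte = (reg >> shift) & 0xFF
            if byte:  # 0x00 is the null descriptor
                found.append(byte)
    return found


def count_tlb_levels(leaf2, leaf18_edx=()):
    """leaf2: (EAX, EBX, ECX, EDX) of leaf 0x2.
    leaf18_edx: EDX of every leaf 0x18 subleaf, used only when 0xFE is present."""
    descs = descriptor_bytes(*leaf2)
    if 0xFE in descs:  # steps 1 and 3
        has_11 = any((edx & 0b11) == 0b11 for edx in leaf18_edx)
        return 2 if has_11 else 1
    matches = [d for d in descs if d in DATA_TLB_4K]  # step 2
    if len(matches) not in (1, 2):
        raise ValueError("descriptor table is incomplete for this processor")
    return len(matches)
```

With the Haswell dump discussed further down, `count_tlb_levels((0x76036301, 0x00F0B5FF, 0x00000000, 0x00C10000))` returns 2, because descriptors 0x03 and 0xC1 both match.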

The TLB information for Ice Lake and Goldmont Plus processors is present in leaf 0x18. This leaf provides more flexibility in encoding TLB information. The TLB information for all other current Intel processors is present in leaf 0x2. I don't know about Knights Mill (if someone has access to a Knights Mill, please consider sharing the cpuid dump).

Determining the number of TLB levels is not sufficient to fully describe how the levels are related to each other. Current Intel processors implement two different 2-level TLB hierarchies:

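  • The second-level TLB can cache translations for data loads (including prefetches), data stores, and instruction fetches. In this case, the second-level TLB is called "Shared 2nd-Level TLB".
  • The second-level TLB can cache translations for data loads and stores, but not instruction fetches. In this case, the second-level TLB is called any of the following: Data TLB, Data TLB1, or DTLB.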

I'll discuss a couple of examples based on the cpuid dumps from InstLatx64. On one of the Haswell processors with hyperthreading enabled, leaf 0x2 provides the following information in the four registers:

76036301-00F0B5FF-00000000-00C10000

There is no 0xFE, so the TLB information is present in this leaf itself. According to Table 3-12:

76: Instruction TLB: 2M/4M pages, fully associative, 8 entries
03: Data TLB: 4 KByte pages, 4-way set associative, 64 entries
63: Data TLB: 2 MByte or 4 MByte pages, 4-way set associative, 32 entries and a separate array with 1 GByte pages, 4-way set associative, 4 entries
B5: Instruction TLB: 4KByte pages, 8-way set associative, 64 entries
C1: Shared 2nd-Level TLB: 4 KByte/2MByte pages, 8-way associative, 1024 entries

The other bytes are not relevant to TLBs.
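
As a rough illustration of how the dump line above breaks down into descriptor bytes, here is a small Python sketch. The function name `parse_dump` and the table name `TLB_DESCRIPTORS` are hypothetical, and the table contains only the five descriptions listed above, not the full Table 3-12. The sketch assumes the usual leaf 0x2 conventions: the low byte of EAX is not a descriptor, a register with bit 31 set holds no valid descriptors, and 0x00 is the null descriptor.

```python
# Descriptions copied from the list above; this is not the full Table 3-12.
TLB_DESCRIPTORS = {
    0x76: "Instruction TLB: 2M/4M pages, fully associative, 8 entries",
    0x03: "Data TLB: 4 KByte pages, 4-way set associative, 64 entries",
    0x63: "Data TLB: 2 MByte or 4 MByte pages, 4-way set associative, "
          "32 entries and a separate array with 1 GByte pages, "
          "4-way set associative, 4 entries",
    0xB5: "Instruction TLB: 4KByte pages, 8-way set associative, 64 entries",
    0xC1: "Shared 2nd-Level TLB: 4 KByte/2MByte pages, 8-way associative, "
          "1024 entries",
}


def parse_dump(dump):
    """Split 'EAX-EBX-ECX-EDX' hex text into descriptor bytes, low byte first."""
    regs = [int(part, 16) for part in dump.split("-")]
    found = []
    for index, reg in enumerate(regs):
        if reg >> 31:
            continue  # bit 31 set: no valid descriptors in this register
        raw = reg.to_bytes(4, "little")
        start = 1 if index == 0 else 0  # skip AL in EAX
        found.extend(b for b in raw[start:] if b != 0)
    return found


for byte in parse_dump("76036301-00F0B5FF-00000000-00C10000"):
    text = TLB_DESCRIPTORS.get(byte, "(not a TLB descriptor in this partial table)")
    print(f"{byte:02X}: {text}")
```

The bytes FF and F0 fall through to the fallback text, which matches the statement that the remaining bytes are not relevant to TLBs.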

There is one discrepancy compared to Table 2-17 of the Intel optimization manual (number 248966-042b). Table 2-17 mentions that the instruction TLB for 4KB entries has 128 entries, 4-way associative, and is dynamically partitioned between the two hyperthreads. But the TLB dump says that it's 8-way associative and there are only 64 entries. There is actually no encoding for a 4-way ITLB with 128 entries, so I think the manual is wrong. Anyway, C1 shows that there are two TLB levels and the second level caches data and instruction translations.

On one of the Goldmont processors, leaf 0x2 provides the following information in the four registers:

6164A001-0000FFC4-00000000-00000000

Here is the interpretation of the TLB-relevant bytes:

61: Instruction TLB: 4 KByte pages, fully associative, 48 entries
64: Data TLB: 4 KByte pages, 4-way set associative, 512 entries
A0: DTLB: 4k pages, fully associative, 32 entries
C4: DTLB: 2M/4M Byte pages, 4-way associative, 32 entries

There are two data TLBs for 4KB pages, one has 512 entries and the other has 32 entries. This means that the processor has two levels of TLBs. The second level is called "Data TLB" and so it can only cache data translations.
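
The rule used here (of two 4KB data TLB descriptors, the one with more entries is the second level) can be written down as a short Python sketch. The names `DataTlb`, `GOLDMONT_4K_DATA_TLBS`, and `assign_levels` are hypothetical, and the entry counts are simply the ones from the Goldmont interpretation above.

```python
from typing import NamedTuple


class DataTlb(NamedTuple):
    code: int      # descriptor byte from leaf 0x2
    name: str      # name used in the descriptor text
    entries: int   # number of entries for 4KB pages


# The two 4KB data TLB descriptors from the Goldmont interpretation above.
GOLDMONT_4K_DATA_TLBS = [
    DataTlb(0x64, "Data TLB", 512),
    DataTlb(0xA0, "DTLB", 32),
]


def assign_levels(tlbs):
    """Order by capacity: the smallest TLB is level 1, the next one level 2."""
    ordered = sorted(tlbs, key=lambda tlb: tlb.entries)
    return {level: tlb for level, tlb in enumerate(ordered, start=1)}


levels = assign_levels(GOLDMONT_4K_DATA_TLBS)
for level, tlb in levels.items():
    print(f"level {level}: {tlb.name} (0x{tlb.code:02X}), {tlb.entries} entries")
```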

Table 19-4 of the optimization manual mentions that the ITLB in Goldmont supports large pages, but this information is not present in the TLB information. The data TLB information is consistent with Table 19-7 of the manual, except that the "Data TLB" and "DTLB" are called "DTLB" and "uTLB", respectively, in the manual.

On one of the Knights Landing processors, leaf 0x2 provides the following information in the four registers:

6C6B6A01-00FF616D-00000000-00000000
6C: DTLB: 2M/4M pages, 8-way set associative, 128 entries
6B: DTLB: 4 KByte pages, 8-way set associative, 256 entries
6A: uTLB: 4 KByte pages, 8-way set associative, 64 entries
61: Instruction TLB: 4 KByte pages, fully associative, 48 entries
6D: DTLB: 1 GByte pages, fully associative, 16 entries

So there are two TLB levels. The first one consists of multiple structures for different page sizes. The TLB for 4KB pages is called uTLB and the TLBs for the other page sizes are called DTLBs. The second-level TLB is called DTLB. These numbers and names are consistent with Table 20-3 from the manual.

Silvermont processors provide the following TLB information:

61B3A001-0000FFC2-00000000-00000000
61: Instruction TLB: 4 KByte pages, fully associative, 48 entries
B3: Data TLB: 4 KByte pages, 4-way set associative, 128 entries
A0: DTLB: 4k pages, fully associative, 32 entries
C2: DTLB: 4 KByte/2 MByte pages, 4-way associative, 16 entries

This information is consistent with the manual, except for C2. I think it should say "4 MByte/2 MByte" instead of "4 KByte/2 MByte." It's probably a typo in the manual.

The Intel Penryn microarchitecture is an example where the TLB information uses the names TLB0 and TLB1 to refer to the first and second level TLBs:

05: Data TLB1: 4 MByte pages, 4-way set associative, 32 entries
B0: Instruction TLB: 4 KByte pages, 4-way set associative, 128 entries
B1: Instruction TLB: 2M pages, 4-way, 8 entries or 4M pages, 4-way, 4 entries
56: Data TLB0: 4 MByte pages, 4-way set associative, 16 entries
57: Data TLB0: 4 KByte pages, 4-way associative, 16 entries
B4: Data TLB1: 4 KByte pages, 4-way associative, 256 entries

Older Intel processors have single-level TLB hierarchies. For example, here is the TLB information for Prescott:

5B: Data TLB: 4 KByte and 4 MByte pages, 64 entries
50: Instruction TLB: 4 KByte and 2-MByte or 4-MByte pages, 64 entries

All Intel 80386 processors and some Intel 80486 processors include a single-level TLB hierarchy, but don't support the cpuid instruction. On processors earlier than 80386, there is no paging. If you want the algorithm above to work on all Intel x86 processors, you'll have to consider these cases as well. The Intel document number 241618-025, titled "Processor Identification and the CPUID Instruction," discusses how to handle these cases in Chapter 7.

I'll discuss an example where the TLB information is present in leaf 0x18 rather than leaf 0x2. Like I said earlier, the only existing Intel processors that have the TLB information present in 0x18 are Ice Lake and Goldmont Plus processors (and maybe Knights Mill). The leaf 0x2 dump for an Ice Lake processor is:

00FEFF01-000000F0-00000000-00000000

There is an 0xFE byte, so the TLB information is present in the more powerful leaf 0x18. Subleaf 0x0 of leaf 0x18 specifies that the maximum valid subleaf is 0x7. Here are the dumps for subleafs 0x0 to 0x7:

00000007-00000000-00000000-00000000 [SL 00]
00000000-00080007-00000001-00004122 [SL 01]
00000000-0010000F-00000001-00004125 [SL 02]
00000000-00040001-00000010-00004024 [SL 03]
00000000-00040006-00000008-00004024 [SL 04]
00000000-00080008-00000001-00004124 [SL 05]
00000000-00080007-00000080-00004043 [SL 06]
00000000-00080009-00000080-00004043 [SL 07]

The Intel manual describes how to decode these bits. Each valid subleaf describes a single TLB structure. A subleaf is valid (i.e., describes a TLB structure) if the least significant five bits of EDX are not all zeros. Hence, subleaf 0x0 is invalid. The next seven subleafs are all valid, which means that there are 7 TLB descriptors in an Ice Lake processor. The least significant five bits of EDX specify the type of the TLB and the next three bits specify the level of the TLB. The following information is obtained by decoding the subleaf bits:

  • [SL 01]: Describes a first-level instruction TLB that is an 8-way fully associative cache capable of caching translations for 4KB, 2MB, and 4MB pages.
  • [SL 02]: The least significant five bits represent the number 5, which is a reserved encoding according to the most recent version of the manual (Volume 2). The other bits specify a TLB that is 16-way fully associative and capable of caching translations for all page sizes. Intel has provided information on the TLBs in Ice Lake in Table 2-5 of the optimization manual. The closest match shows that the reserved encoding 5 most likely represents a first-level TLB for data store translations.
  • [SL 03]: The least significant five bits represent the number 4, which is also a reserved encoding according to the most recent version of the manual. The closest match with Table 2-5 suggests that it represents a first-level TLB for data loads that can cache 4KB translations. The number of ways and sets matches Table 2-5.
  • [SL 04]: Similar to subleaf 0x3. The closest match with Table 2-5 suggests that it represents a first-level TLB for data loads that can cache 2MB and 4MB translations. The number of ways and sets matches Table 2-5.
  • [SL 05]: Similar to subleaf 0x3. The closest match with Table 2-5 suggests that it represents a first-level TLB for data loads that can cache 1GB translations. The number of ways and sets matches Table 2-5.
  • [SL 06]: Describes a second-level unified TLB consisting of 8 ways and 128 sets and capable of caching translations for 4KB, 2MB, and 4MB pages.
  • [SL 07]: Describes a second-level unified TLB consisting of 8 ways and 128 sets and capable of caching translations for 4KB and 1GB pages.
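
The decoding above can be reproduced with a short Python sketch. It assumes the leaf 0x18 field layout as I understand it from the Intel manual: EBX bits 0 to 3 flag the supported page sizes (4KB, 2MB, 4MB, 1GB), EBX bits 31:16 give the number of ways, ECX gives the number of sets, EDX bits 4:0 give the type, EDX bits 7:5 give the level, and EDX bit 8 marks a fully associative structure. The names `decode_subleaf`, `TLB_TYPES`, and `ICE_LAKE` are hypothetical, types 4 and 5 are labeled according to the inference made in the list above rather than an authoritative table, and the entry count is derived here as ways times sets.

```python
PAGE_SIZES = ("4KB", "2MB", "4MB", "1GB")  # EBX bits 0..3
TLB_TYPES = {
    1: "data",
    2: "instruction",
    3: "unified",
    4: "data loads (inferred)",   # labeled per the inference in this post
    5: "data stores (inferred)",  # labeled per the inference in this post
}


def decode_subleaf(ebx, ecx, edx):
    """Decode one leaf 0x18 subleaf; return None if it describes no TLB."""
    tlb_type = edx & 0x1F
    if tlb_type == 0:
        return None
    ways = ebx >> 16
    return {
        "type": TLB_TYPES.get(tlb_type, f"reserved ({tlb_type})"),
        "level": (edx >> 5) & 0x7,
        "fully_associative": bool(edx & 0x100),
        "pages": [name for bit, name in enumerate(PAGE_SIZES) if ebx & (1 << bit)],
        "ways": ways,
        "sets": ecx,
        "entries": ways * ecx,  # derived value, not a field of the leaf
    }


# (EAX, EBX, ECX, EDX) for subleafs 0x0 to 0x7, copied from the dump above.
ICE_LAKE = [
    (0x00000007, 0x00000000, 0x00000000, 0x00000000),
    (0x00000000, 0x00080007, 0x00000001, 0x00004122),
    (0x00000000, 0x0010000F, 0x00000001, 0x00004125),
    (0x00000000, 0x00040001, 0x00000010, 0x00004024),
    (0x00000000, 0x00040006, 0x00000008, 0x00004024),
    (0x00000000, 0x00080008, 0x00000001, 0x00004124),
    (0x00000000, 0x00080007, 0x00000080, 0x00004043),
    (0x00000000, 0x00080009, 0x00000080, 0x00004043),
]

for number, (_eax, ebx, ecx, edx) in enumerate(ICE_LAKE):
    info = decode_subleaf(ebx, ecx, edx)
    if info is not None:
        print(f"[SL {number:02d}] {info}")
```

Taking the maximum of the level field over the valid subleafs (2 for this dump) is another way to arrive at the number of TLB levels.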

Table 2-5 actually mentions that there is only one unified TLB structure, but half of the ways can only cache translations for 4KB, 2MB, and 4MB pages and the other half can only cache translations for 4KB and 1GB pages. So the TLB information for the second-level TLB is consistent with the manual. However, the TLB information for the instruction TLB is not consistent with Table 2-5. The manual is probably correct. The ITLB for 4KB pages seems to be mixed up with that for 2MB and 4MB pages in the TLB information dump.

On AMD processors, the TLB information for the first-level and second-level TLBs is provided in leafs 8000_0005 and 8000_0006, respectively. More information can be found in the AMD manual Volume 3. AMD processors earlier than the K5 don't support the cpuid instruction, and some of these processors include a single-level TLB. So if you care about these processors, you need an alternative mechanism to determine whether a TLB exists. Zen 2 adds 1GB support at both TLB levels. Information on these TLBs can be found in leaf 8000_0019.

AMD Zen has a three-level instruction TLB hierarchy according to AMD. This is the first core microarchitecture that I know of that uses a three-level TLB hierarchy. Most probably this is also the case on AMD Zen+ and AMD Zen 2 (but I couldn't find an AMD source that confirms this). There appears to be no documented cpuid information on the L0 ITLB. So you'll probably have to check whether the processor is AMD Zen or later and provide the L0 ITLB information (8 entries for all page sizes, probably fully associative) manually for these processors.

Is "4-way associative" here just a typo for "4-way set associative"?

It's not a typo. These terms are synonyms and both are commonly used.

Does DTLB stand for Data TLB? What does uTLB mean? uops-TLB? Which TLB cache level is considered here?

DTLB and uTLB are both names for data TLBs. The DTLB name is used for both the first-level and second-level TLBs. The uTLB name is only used for the first-level data TLB and is short for micro-TLB.

Does this mean that in that case the 2nd level TLB is shared among all cores? So when not specified explicitly is the TLB cache core private?

The term "shared" here means "unified" as in both data and instruction translations can be cached. Intel should have called it UTLB (capital U) or Unified TLB, which is the name used in the modern leaf 0x18.
