Does C have an equivalent of std::less from C++?

Question

I was recently answering a question on the undefined behaviour of doing p < q in C when p and q are pointers into different objects/arrays. That got me thinking: C++ has the same (undefined) behaviour of < in this case, but also offers the standard library template std::less which is guaranteed to return the same thing as < when the pointers can be compared, and return some consistent ordering when they cannot.

Does C offer something with similar functionality which would allow safely comparing arbitrary pointers (to the same type)? I tried looking through the C11 standard and didn't find anything, but my experience in C is orders of magnitude smaller than in C++, so I could have easily missed something.

Recommended answer

On implementations with a flat memory model (basically everything), casting to uintptr_t will Just Work.
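
A minimal sketch of that cast-based comparison follows. It assumes a flat memory model in which converting a pointer to uintptr_t yields its numeric address; ISO C only makes the conversion implementation-defined, and uintptr_t is an optional type. The name ptr_less is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: orders two object pointers by their integer value.
   Assumes a flat address space, so the uintptr_t values are plain addresses
   and compare in the same order as the pointers themselves would. */
static inline bool ptr_less(const void *p, const void *q)
{
    return (uintptr_t)p < (uintptr_t)q;
}
```

Within one array this agrees with `<`; for pointers to unrelated objects it returns some consistent order rather than invoking undefined behaviour, which is the property std::less provides in C++.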

(But see Should pointer comparisons be signed or unsigned in 64-bit x86? for discussion of whether you should treat pointers as signed or not, including issues of forming pointers outside of objects which is UB in C.)

But systems with non-flat memory models do exist, and thinking about them can help explain the current situation, like C++ having different specs for < vs. std::less.

Part of the point of < on pointers to separate objects being UB in C (or at least unspecified in some C++ revisions) is to allow for weird machines, including non-flat memory models.

A well-known example is x86-16 real mode where pointers are segment:offset, forming a 20-bit linear address via (segment << 4) + offset. The same linear address can be represented by multiple different seg:off combinations.
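
To make the aliasing concrete, here is a small model of the address calculation. The function name is invented for this sketch, and the arithmetic is done in 32 bits without modelling the 20-bit wraparound of real 8086 hardware.

```c
#include <stdint.h>

/* Linear address of a real-mode seg:off pair: (segment << 4) + offset.
   Computed in 32 bits; the wrap to 20 bits on an 8086 is not modelled. */
static inline uint32_t linear_address(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}
```

For example, 0x1234:0x0005 and 0x1000:0x2345 both yield the linear address 0x12345, so comparing the raw seg:off bits of two pointers says nothing reliable about whether they refer to the same byte.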

C++ std::less on pointers on weird ISAs might need to be expensive, e.g. "normalize" a segment:offset on x86-16 to have offset <= 15. However, there's no portable way to implement this. The manipulation required to normalize a uintptr_t (or the object-representation of a pointer object) is implementation-specific.

But even on systems where C++ std::less has to be expensive, < doesn't have to be. For example, assuming a "large" memory model where an object fits within one segment, < can just compare the offset part and not even bother with the segment part. (Pointers inside the same object will have the same segment, and otherwise it's UB in C. C++17 changed to merely "unspecified", which might still allow skipping normalization and just comparing offsets.) This is assuming all pointers to any part of an object always use the same seg value, never normalizing. This is what you'd expect an ABI to require for a "large" as opposed to "huge" memory model. (See discussion in comments).
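
As a sketch of what such a cheap `<` could look like, the following models a far pointer as a struct and compares offsets only. The type and function names are hypothetical, not taken from any real compiler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a "large"-model far pointer. */
struct far_ptr {
    uint16_t seg;
    uint16_t off;
};

static inline struct far_ptr far_make(uint16_t seg, uint16_t off)
{
    struct far_ptr p = { seg, off };
    return p;
}

/* Linear address, used here only to check the cheap comparison against. */
static inline uint32_t far_linear(struct far_ptr p)
{
    return ((uint32_t)p.seg << 4) + p.off;
}

/* What `<` could compile to: the segment part is ignored entirely.
   Meaningful only when both pointers point into the same object and
   therefore carry the same seg value. */
static inline bool far_less_same_object(struct far_ptr p, struct far_ptr q)
{
    return p.off < q.off;
}
```

For two pointers into one object (say 0x1000:0x0010 and 0x1000:0x0020) the result matches the true address order. For 0x2000:0x0010 versus 0x1000:0x0020 it reports "less" even though the first linear address is higher, which is exactly the case the language leaves undefined or unspecified.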

(Such a memory model might have a max object size of 64 KiB for example, but a much larger max total address space that has room for many such max-sized objects. ISO C allows implementations to have a limit on object size that's lower than SIZE_MAX, the max value the (unsigned) type size_t can represent. For example, even on flat memory model systems, GNU C limits max object size to PTRDIFF_MAX so size calculation can ignore signed overflow.) See this answer and discussion in comments.

If you want to allow objects larger than a segment, you need a "huge" memory model that has to worry about overflowing the offset part of a pointer when doing p++ to loop through an array, or when doing indexing / pointer arithmetic. This leads to slower code everywhere, but would probably mean that p < q would happen to work for pointers to different objects, because an implementation targeting a "huge" memory model would normally choose to keep all pointers normalized all the time. See What are near, far and huge pointers? - some real C compilers for x86 real mode did have an option to compile for the "huge" model where all pointers defaulted to "huge" unless declared otherwise.
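
The difference in pointer arithmetic can be sketched as follows. This is an illustrative model with invented names, assuming real-mode seg:off pointers and results that stay within the 1 MiB address range; it is not how any particular compiler implements huge pointers.

```c
#include <stdint.h>

/* Hypothetical model of a real-mode seg:off pointer. */
struct seg_off {
    uint16_t seg;
    uint16_t off;
};

static inline struct seg_off seg_off_make(uint16_t seg, uint16_t off)
{
    struct seg_off p = { seg, off };
    return p;
}

/* "Large"-model style: only the 16-bit offset changes, so it wraps
   around at the end of the segment. */
static inline struct seg_off far_add(struct seg_off p, uint16_t n)
{
    p.off = (uint16_t)(p.off + n);
    return p;
}

/* "Huge"-model style: go through the linear address and re-normalize
   so that off <= 15. Assumes the result stays below 1 MiB. */
static inline struct seg_off huge_add(struct seg_off p, uint32_t n)
{
    uint32_t linear = ((uint32_t)p.seg << 4) + p.off + n;
    struct seg_off r = { (uint16_t)(linear >> 4), (uint16_t)(linear & 0xFu) };
    return r;
}
```

Starting from 0x1000:0xFFFF, far_add by 1 gives 0x1000:0x0000 (back at the start of the segment), while huge_add by 1 gives 0x2000:0x0000. The extra work on every increment is the "slower code everywhere" cost, and the always-normalized result is why comparing huge pointers tends to work across objects.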

x86 real-mode segmentation isn't the only non-flat memory model possible, it's merely a useful concrete example to illustrate how it's been handled by C/C++ implementations. In real life, implementations extended ISO C with the concept of far vs. near pointers, allowing programmers to choose when they can get away with just storing / passing around the 16-bit offset part, relative to some common data segment.

But a pure ISO C implementation would have to choose between a small memory model (everything except code in the same 64 KiB, with 16-bit pointers) or large or huge with all pointers being 32-bit. Some loops could optimize by incrementing just the offset part, but pointer objects couldn't be optimized to be smaller.

If you knew what the magic manipulation was for any given implementation, you could implement it in pure C. The problem is that different systems use different addressing and the details aren't parameterized by any portable macros.

Or maybe not: it might involve looking something up from a special segment table or something, e.g. like x86 protected mode instead of real mode where the segment part of the address is an index, not a value to be left shifted. You could set up partially-overlapping segments in protected mode, and the segment selector parts of addresses wouldn't necessarily even be ordered in the same order as the corresponding segment base addresses. Getting a linear address from a seg:off pointer in x86 protected mode might involve a system call, if the GDT and/or LDT aren't mapped into readable pages in your process.

(Of course mainstream OSes for x86 use a flat memory model so the segment base is always 0 (except for thread-local storage using fs or gs segments), and only the 32-bit or 64-bit "offset" part is used as a pointer.)

You could manually add code for various specific platforms, e.g. by default assume flat, or #ifdef something to detect x86 real mode and split uintptr_t into 16-bit halves for seg += off>>4; off &= 0xf; then combine those parts back into a 32-bit number.
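
A rough sketch of that per-platform approach is below. The macro HYPOTHETICAL_X86_REAL_MODE is invented for this example (no compiler is known to predefine it), and the layout of segment in the high 16 bits and offset in the low 16 bits of the uintptr_t value is likewise an assumption about the implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Normalize a packed seg:off value (assumed layout: segment in the high
   16 bits, offset in the low 16 bits) so that the offset is at most 15.
   The segment is masked to 16 bits, which mirrors the address wraparound
   past 1 MiB on an 8086. */
static inline uint32_t normalize_seg_off(uint32_t packed)
{
    uint32_t seg = packed >> 16;
    uint32_t off = packed & 0xFFFFu;
    seg += off >> 4;   /* move whole 16-byte paragraphs into the segment */
    off &= 0xFu;       /* keep only the remainder in the offset */
    return ((seg & 0xFFFFu) << 16) | off;
}

/* Hypothetical total order over pointers, selected per platform. */
static inline bool any_ptr_less(const void *p, const void *q)
{
#ifdef HYPOTHETICAL_X86_REAL_MODE   /* made-up macro, for illustration only */
    return normalize_seg_off((uint32_t)(uintptr_t)p)
         < normalize_seg_off((uint32_t)(uintptr_t)q);
#else
    /* Default: assume a flat memory model. */
    return (uintptr_t)p < (uintptr_t)q;
#endif
}
```

With this layout, normalize_seg_off maps both 0x10002345 (1000:2345) and 0x12340005 (1234:0005) to 0x12340005, so aliases of the same byte compare equal. Every additional non-flat target would need its own branch, which is the maintenance burden the answer describes.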
