When should I just use "int" versus more sign-specific or size-specific types?

Question

I have a little VM for a programming language implemented in C. It supports being compiled under both 32-bit and 64-bit architectures as well as both C and C++.

I'm trying to make it compile cleanly with as many warnings enabled as possible. When I turn on CLANG_WARN_IMPLICIT_SIGN_CONVERSION, I get a cascade of new warnings.

I'd like to have a good strategy for when to use int versus either explicitly unsigned types, and/or explicitly sized ones. So far, I'm having trouble deciding what that strategy should be.

It's certainly true that mixing them—using mostly int for things like local variables and parameters and using narrower types for fields in structs—causes lots of implicit conversion problems.

I do like using more specifically sized types for struct fields because I like the idea of explicitly controlling memory usage for objects in the heap. Also, for hash tables, I rely on unsigned overflow when hashing, so it's nice if the hash table's size is stored as uint32_t.

But, if I try to use more specific types everywhere, I find myself in a maze of twisty casts everywhere.

What do other C projects do?

Answer

Just using int everywhere may seem tempting, since it minimizes the need for casting, but there are several potential pitfalls you should be aware of:

  • An int might be shorter than you expect. Even though an int is typically 32 bits on most desktop platforms, the C standard only guarantees a minimum width of 16 bits. Could your code ever need numbers larger than 2^15 - 1 = 32,767, even for temporary values? If so, don't use an int. (You may want to use a long instead; a long is guaranteed to be at least 32 bits.)

Even a long might not always be long enough. In particular, there is no guarantee that the length of an array (or of a string, which is a char array) fits in a long. Use size_t (or ptrdiff_t, if you need a signed difference) for those.
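
As an example, consider a hypothetical helper that reports where a character occurs in a string. It can keep the position in a ptrdiff_t, because that is the type C gives to the difference of two pointers into the same array. The same reasoning explains why strlen returns a size_t rather than an int. A minimal sketch:

```c
#include <stddef.h>  /* ptrdiff_t */
#include <string.h>  /* strchr */

/* Hypothetical helper: returns the index of the first occurrence of c
   in s, or -1 if c does not occur. The index is a pointer difference,
   whose type is ptrdiff_t, so it is never squeezed through an int. */
ptrdiff_t index_of(const char *s, char c)
{
    const char *p = strchr(s, c);
    return p != NULL ? p - s : -1;
}
```

Here index_of("hello", 'l') returns 2, and a missing character yields -1. That negative sentinel is why the signed ptrdiff_t is used here rather than size_t.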

In particular, a size_t is defined to be large enough to hold any valid array index, whereas an int or even a long might not be. Thus, for example, when iterating over an array, your loop counter (and its initial and final values) should generally be a size_t, unless you know for sure that the array is short enough for a smaller type to work. (But be careful when iterating backwards: size_t is unsigned, so for(size_t i = n-1; i >= 0; i--) is an infinite loop! Testing i != SIZE_MAX or i != (size_t) -1 instead should work, though; alternatively, use a do/while loop, but beware of the case n == 0!)
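
Here is a sketch of the i != SIZE_MAX approach described above. The name reverse_copy is hypothetical and used only for illustration. The approach also handles n == 0 correctly: n - 1 then wraps to SIZE_MAX, so the loop body never runs.

```c
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* SIZE_MAX */

/* Copies src[n-1], src[n-2], ..., src[0] into dst[0], dst[1], ..., dst[n-1].
   The size_t index counts down. Instead of the broken test i >= 0 (always
   true for an unsigned type), the loop stops when i wraps from 0 to
   SIZE_MAX. If n == 0, n - 1 is already SIZE_MAX and nothing is copied. */
void reverse_copy(int *dst, const int *src, size_t n)
{
    size_t out = 0;
    for (size_t i = n - 1; i != SIZE_MAX; i--) {
        dst[out++] = src[i];
    }
}
```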

An int is signed. In particular, this means that int overflow is undefined behavior. If there's ever any risk that your values might legitimately overflow, don't use an int; use an unsigned int (or an unsigned long, or uintNN_t) instead.
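
Hashing, as mentioned in the question, is a typical place where wraparound is wanted on purpose. As an illustration, here is the 32-bit FNV-1a hash written with uint32_t, so that the overflowing multiply is well defined. FNV-1a is chosen only as a familiar example; the question does not say which hash the VM uses.

```c
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uint32_t */

/* 32-bit FNV-1a. Unsigned arithmetic is reduced modulo 2^N, so the
   multiplication is well defined even though it overflows 32 bits on
   almost every byte; the same code with a signed int would be undefined
   behavior. The u suffix keeps the multiply unsigned even on a platform
   where uint32_t would be promoted to a wider int. The constants are the
   standard 32-bit FNV offset basis and FNV prime. */
uint32_t fnv1a_32(const unsigned char *data, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}
```

Empty input returns the offset basis 0x811c9dc5 unchanged, and the single byte "a" hashes to 0xe40c292c.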

Sometimes, you just need a fixed bit length. If you're interfacing with an ABI, or reading / writing a file format, that requires integers of a specific length, then that's the length you need to use. (Of course, in such situations, you may also need to worry about things like endianness, and so may sometimes have to resort to manually packing data byte-by-byte anyway.)
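
The following sketch shows what packing data byte-by-byte can look like. It assumes a hypothetical file format that stores 32-bit unsigned integers in little-endian order, and put_u32_le and get_u32_le are made-up names.

```c
#include <stdint.h>  /* uint32_t */

/* Writes v into buf as 4 bytes, least significant byte first
   (little-endian), regardless of the host's own byte order. */
void put_u32_le(unsigned char buf[4], uint32_t v)
{
    buf[0] = (unsigned char)(v & 0xFFu);
    buf[1] = (unsigned char)((v >> 8) & 0xFFu);
    buf[2] = (unsigned char)((v >> 16) & 0xFFu);
    buf[3] = (unsigned char)((v >> 24) & 0xFFu);
}

/* Reads 4 little-endian bytes back into a uint32_t. Each byte is
   converted to uint32_t before shifting, so the byte shifted left by 24
   cannot end up in the sign bit of a promoted int. */
uint32_t get_u32_le(const unsigned char buf[4])
{
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}
```

Only shifts and masks are used, so the bytes written are the same on little- and big-endian hosts. For example, 0x12345678 is stored as 78 56 34 12.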

All that said, there are also reasons to avoid using the fixed-length types all the time: not only is int32_t awkward to type all the time, but forcing the compiler to always use 32-bit integers is not always optimal, particularly on platforms where the native int size might be, say, 64 bits. You could use, say, C99 int_fast32_t, but that's even more awkward to type.
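
This minimal sketch shows what code using int_fast32_t looks like, including the <inttypes.h> format macro needed to print it. The names sum_to and print_sum are hypothetical, and n is assumed to be at most 65535 so that the sum fits in 32 bits.

```c
#include <inttypes.h>  /* PRIdFAST32 */
#include <stdint.h>    /* int_fast32_t */
#include <stdio.h>     /* printf */

/* Sums 1..n in an int_fast32_t: at least 32 bits, but the compiler may
   pick a wider type if that is faster on the target. The result must
   still fit in 32 bits, hence the assumption n <= 65535. */
int_fast32_t sum_to(int_fast32_t n)
{
    int_fast32_t total = 0;
    for (int_fast32_t i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}

/* Printing requires the matching <inttypes.h> macro, not a plain "%d". */
void print_sum(int_fast32_t n)
{
    printf("sum_to(%" PRIdFAST32 ") = %" PRIdFAST32 "\n", n, sum_to(n));
}
```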

Thus, here are my personal suggestions for maximum safety and portability:

  • Define your own integer types for casual use in a common header file, like this:

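#include <limits.h>
/* int and unsigned int are guaranteed to be at least 16 bits wide */
typedef int i16;
typedef unsigned int u16;
#if UINT_MAX >= 4294967295U
  /* unsigned int (and so int) is at least 32 bits wide here */
  typedef int i32;
  typedef unsigned int u32;
#else
  /* long is guaranteed to be at least 32 bits wide */
  typedef long i32;
  typedef unsigned long u32;
#endif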

Use these types for anything where the exact size of the type doesn't matter, as long as they're big enough. The type names I've suggested are both short and self-documenting, so they should be easy to use in casts where needed, and minimize the risk of errors due to using a too-narrow type.

Conveniently, the u32 and u16 types defined as above are guaranteed to be at least as wide as unsigned int, and thus can be used safely without having to worry about them being promoted to int and causing undefined overflow behavior.
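
The following sketch illustrates that claim. It repeats the unsigned typedefs so it compiles on its own, and square16 is a hypothetical name. Compare the alternative: with uint16_t x = 65535, x * x would be promoted to int on a typical platform with 32-bit int and overflow. With u16, the arithmetic stays unsigned.

```c
#include <limits.h>  /* UINT_MAX */

/* The unsigned typedefs suggested above, repeated so this compiles alone. */
typedef unsigned int u16;
#if UINT_MAX >= 4294967295U
typedef unsigned int u32;
#else
typedef unsigned long u32;
#endif

/* Squares a 16-bit quantity. Since u16 is unsigned int, never a type
   narrower than int, it is not promoted to signed int, so 65535 * 65535
   is computed in unsigned arithmetic. The operand is widened to u32
   first, which matters where unsigned int is only 16 bits wide. */
u32 square16(u16 x)
{
    return (u32)x * x;
}
```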

size_t 用于所有数组大小和索引,但在它与任何其他整数类型之间进行转换时要小心.或者,如果您不喜欢输入这么多下划线,typedef 也可以为它提供一个更方便的别名.

Use size_t for all array sizes and indexing, but be careful when casting between it and any other integer types. Optionally, if you don't like to type so many underscores, typedef a more convenient alias for it too.
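
This sketch covers both suggestions. It assumes usize as the alias name (any short name works) and uses a hypothetical helper that narrows a size to int only when the value fits. That is useful, for example, before handing a length to an older API that takes an int.

```c
#include <limits.h>  /* INT_MAX */
#include <stddef.h>  /* size_t */

/* Optional shorter alias for size_t; "usize" is only an example name. */
typedef size_t usize;

/* Converts a size to int only if it fits. Returns 1 and stores the value
   in *out on success; returns 0 and leaves *out unchanged if n is too
   large for an int. */
int usize_to_int(usize n, int *out)
{
    if (n > (usize)INT_MAX) {
        return 0;
    }
    *out = (int)n;
    return 1;
}
```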

For calculations that assume overflow at a specific number of bits, either use uintNN_t, or just use u16 / u32 as defined above and explicit bitmasking with &. If you choose to use uintNN_t, make sure to protect yourself against unexpected promotion to int; one way to do that is with a macro like:

#define u(x) (0U + (x))

which should let you safely write e.g.:

uint32_t a = foo(), b = bar();
uint32_t c = u(a) * u(b);  /* this is always unsigned multiply */
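
The self-contained sketch below contrasts the two options: the u() macro with uintNN_t, and explicit masking with &. The function names are hypothetical. mul16 relies on the conversion back to uint16_t to reduce the product modulo 2^16. mul32_masked uses unsigned long, which is what u32 becomes on platforms without a 32-bit int. The mask matters wherever unsigned long is 64 bits wide.

```c
#include <stdint.h>  /* uint16_t */

/* The promotion guard from the answer: adding 0U makes the operand at
   least unsigned int, so the arithmetic is never done in signed int. */
#define u(x) (0U + (x))

/* Without u(), a and b would be promoted to int where int is 32 bits,
   and 65535 * 65535 would overflow it (undefined behavior). With u(),
   the multiply is unsigned; the cast then keeps the low 16 bits. */
uint16_t mul16(uint16_t a, uint16_t b)
{
    return (uint16_t)(u(a) * u(b));
}

/* Explicit masking: the value lives in a type at least as wide as
   unsigned int, and & 0xFFFFFFFFu reduces the product to 32 bits even
   when unsigned long is 64 bits wide. */
unsigned long mul32_masked(unsigned long a, unsigned long b)
{
    return (a * b) & 0xFFFFFFFFu;
}
```

Here mul16(65535, 65535) returns 1, because 65535 squared is 2^32 - 2^17 + 1, which is 1 modulo 2^16.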

  • For external ABIs that require a specific integer length, again define a specific type, e.g.:

    typedef int32_t fooint32;  /* foo ABI needs 32-bit ints */
    

    Again, this type name is self-documenting, with regard to both its size and its purpose.

    If the ABI might actually require, say, 16- or 64-bit ints instead, depending on the platform and/or compile-time options, you can change the type definition to match (and rename the type to just fooint) — but then you really do need to be careful whenever you cast anything to or from that type, because it might overflow unexpectedly.
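
One way to be careful with such casts is to range-check the value in a wider type before converting it. This sketch uses a hypothetical to_fooint32 helper and the fooint32 type from the example above.

```c
#include <stdint.h>  /* int32_t, INT32_MIN, INT32_MAX */

typedef int32_t fooint32;  /* foo ABI needs 32-bit ints */

/* Converts a value computed in a wider type into the ABI type. It refuses
   out-of-range values rather than letting the cast silently change them.
   Returns 1 and stores the result on success; returns 0 otherwise. */
int to_fooint32(long long v, fooint32 *out)
{
    if (v < INT32_MIN || v > INT32_MAX) {
        return 0;
    }
    *out = (fooint32)v;
    return 1;
}
```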

If your code has its own structures or file formats that require specific bit lengths, consider defining custom types for those too, exactly as if they were an external ABI. Or you could just use uintNN_t instead, but you'll lose a little bit of self-documentation that way.

For all these types, don't forget to also define the corresponding _MIN and _MAX constants for easy bounds checking. This might sound like a lot of work, but it's really just a couple of lines in a single header file.
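
Below is a sketch of such a header for the i16/i32/u16/u32 typedefs suggested earlier. It assumes the constants should describe the actual range of each typedef, which can be wider than the nominal size in its name (for example, I16_MAX is INT_MAX, not 32,767). The function i32_add_ok is a hypothetical example of the kind of bounds check these constants make easy.

```c
#include <limits.h>  /* INT_*, UINT_MAX, LONG_*, ULONG_MAX */

/* Each typedef is a standard type, so its limits are the standard ones. */
typedef int i16;
typedef unsigned int u16;
#define I16_MIN INT_MIN
#define I16_MAX INT_MAX
#define U16_MAX UINT_MAX

#if UINT_MAX >= 4294967295U
typedef int i32;
typedef unsigned int u32;
#define I32_MIN INT_MIN
#define I32_MAX INT_MAX
#define U32_MAX UINT_MAX
#else
typedef long i32;
typedef unsigned long u32;
#define I32_MIN LONG_MIN
#define I32_MAX LONG_MAX
#define U32_MAX ULONG_MAX
#endif

/* Returns nonzero if a + b can be computed in i32 without overflow.
   Neither I32_MAX - b (for b > 0) nor I32_MIN - b (for b <= 0) can
   itself overflow. */
int i32_add_ok(i32 a, i32 b)
{
    return (b > 0) ? (a <= I32_MAX - b) : (a >= I32_MIN - b);
}
```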

Finally, remember to be careful with integer math, especially overflows. For example, keep in mind that the difference of two n-bit signed integers may not fit in an n-bit int. (It will fit into an n-bit unsigned int if you know it's non-negative; but remember that you need to cast the inputs to an unsigned type before taking their difference to avoid undefined behavior!) Similarly, to find the average of two integers lo <= hi (e.g. for a binary search), don't use avg = (lo + hi) / 2, but rather e.g. avg = lo + (hi + 0U - lo) / 2; the former will break if the sum overflows.
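
Both points are sketched below with hypothetical function names. The sketch assumes lo <= hi and the common case where UINT_MAX == 2 * INT_MAX + 1. The midpoint function is a slight variant of the expression above: it converts the halved difference back to int before adding, so the result never passes through an out-of-range unsigned-to-int conversion.

```c
#include <limits.h>  /* INT_MIN, INT_MAX, UINT_MAX */

/* Difference a - b of two ints, assuming a >= b. Both operands are
   converted to unsigned before subtracting, so the subtraction cannot
   overflow, and the true difference (at most INT_MAX - INT_MIN) fits
   in unsigned int under the platform assumption above. */
unsigned int nonneg_diff(int a, int b)
{
    return (unsigned int)a - (unsigned int)b;
}

/* Midpoint of lo and hi, assuming lo <= hi. The half-difference is at
   most INT_MAX, so converting it to int is safe, and lo + half never
   exceeds hi. */
int midpoint(int lo, int hi)
{
    unsigned int half = (hi + 0U - lo) / 2;
    return lo + (int)half;
}
```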
