Differentiate String Literal from Char Array

Question

I want to write some function that takes a string literal - and only a string literal:

template <size_t N>
void foo(const char (&str)[N]);


Unfortunately, that is too expansive and will match any array of char - whether or not it's a true string literal. While it's impossible to tell the difference between these at compile-time - without having to resort to requiring the caller to wrap the literal/array - at run-time, the two arrays will be in entirely different places in memory:

foo("Hello"); // at 0x400f81

const char msg[] = {'1', '2', '3'};
foo(msg); // at 0x7fff3552767f
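
As a minimal sketch of why the template above is too permissive (the names `char_array_extent` and `extents_demo` are hypothetical, not from the question): the same signature deduces N for a literal and for an ordinary char array alike, so overload resolution alone cannot separate the two.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns the deduced array extent N.
// It binds to ANY array of const char, string literal or not.
template <std::size_t N>
std::size_t char_array_extent(const char (&)[N])
{
    return N;
}

// Exercise both kinds of argument from the question.
inline void extents_demo()
{
    const char msg[] = {'1', '2', '3'};          // no terminating NUL
    assert(char_array_extent("Hello") == 6);     // 5 characters + NUL
    assert(char_array_extent(msg) == 3);         // plain array, same overload
}
```

Both calls compile and select the same template, which is the problem the question describes.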

Is there a way to know where in memory the string data could live (using gcc 4.7.3) so that I could at least assert that the function takes a string literal only?

Answer

You seem to assume that a necessary trait of a "true string literal" is that the compiler bakes it into the static storage of the executable.

This is not actually true. The C and C++ standards guarantee us that a string literal shall have static storage duration, so it must exist for the life of the program, but if a compiler can arrange this without placing the literal in static storage, it is free to do so, and some compilers sometimes do.
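
To make the lifetime guarantee concrete, here is a small sketch (the function names are hypothetical). It relies only on the standard's promise of static storage duration, and says nothing about which section, if any, the literal ends up in.

```cpp
#include <cassert>
#include <cstring>

// The literal has static storage duration, so the returned pointer
// stays valid after the function returns. Returning the address of a
// local char array instead would leave the caller with a dangling pointer.
inline const char* literal_greeting()
{
    return "Hello";
}

inline void greeting_demo()
{
    const char* p = literal_greeting();
    assert(std::strcmp(p, "Hello") == 0);
}
```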

However, it's clear that the property you want to test, for a given string literal, is whether it is in fact in static storage. And since it need not be in static storage, as far as the language standards guarantee, there can't be any solution of your problem founded solely on portable C/C++.

Whether a given string literal is in fact in static storage is the question of whether the address of the string literal lies within one of the address ranges that get assigned to linkage sections that qualify as static storage, in the nomenclature of your particular toolchain, when your program is built by that toolchain.

So the solution I suggest is that you enable your program to know the address ranges of those of its own linkage sections that qualify as static storage, and then it can test whether a given string literal is in static storage by obvious code.
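
The "obvious code" is a half-open range test on the address. A minimal sketch, assuming C++11 for `std::uintptr_t`; the names `SectionRange`, `in_section` and `range_demo` are hypothetical, and the demo uses a local buffer as a stand-in for a real section:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one section: start address and size in bytes.
struct SectionRange {
    std::uintptr_t start;
    std::uintptr_t size;
};

// True if p lies in the half-open range [start, start + size).
// Written as a subtraction so that start + size cannot overflow.
inline bool in_section(const void* p, SectionRange r)
{
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    return addr >= r.start && addr - r.start < r.size;
}

// Stand-in demo: treat a 16-byte buffer as if it were a section.
inline void range_demo()
{
    char buf[16] = {};
    const SectionRange r = {reinterpret_cast<std::uintptr_t>(buf), sizeof buf};
    assert(in_section(buf, r));        // first byte is inside
    assert(in_section(buf + 15, r));   // last byte is inside
    assert(!in_section(buf + 16, r));  // one past the end is outside
}
```

The remaining work is getting the real start and size of each section into the program.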

Here is an illustration of this solution for a toy C++ project, prog, built with the GNU/Linux x86_64 toolchain (C++98 or better will do, and the approach is only slightly more fiddly for C). In this setting, we link in ELF format, and the linkage sections we will deem static storage are .bss (0-initialized static data), .rodata (read-only static data) and .data (read/write static data).

Here are our source files:

section_bounds.h

#ifndef SECTION_BOUNDS_H
#define SECTION_BOUNDS_H
// Export delimiting values for our `.bss`, `.rodata` and `.data` sections
extern unsigned long const section_bss_start;
extern unsigned long const section_bss_size;
extern unsigned long const section_bss_end;
extern unsigned long const section_rodata_start;
extern unsigned long const section_rodata_size;
extern unsigned long const section_rodata_end;
extern unsigned long const section_data_start;
extern unsigned long const section_data_size;
extern unsigned long const section_data_end;
#endif

section_bounds.cpp

// Assign either placeholder or pre-defined values to 
// the section delimiting globals.
#ifndef BSS_START
#define BSS_START 0x0
#endif
#ifndef BSS_SIZE
#define BSS_SIZE 0xffff
#endif
#ifndef RODATA_START
#define RODATA_START 0x0
#endif
#ifndef RODATA_SIZE
#define RODATA_SIZE 0xffff
#endif
#ifndef DATA_START
#define DATA_START 0x0
#endif
#ifndef DATA_SIZE
#define DATA_SIZE 0xffff
#endif
extern unsigned long const 
    section_bss_start = BSS_START;
extern unsigned long const section_bss_size = BSS_SIZE;
extern unsigned long const 
    section_bss_end = section_bss_start + section_bss_size;
extern unsigned long const 
    section_rodata_start = RODATA_START;
extern unsigned long const 
    section_rodata_size = RODATA_SIZE;
extern unsigned long const 
    section_rodata_end = section_rodata_start + section_rodata_size;
extern unsigned long const 
    section_data_start = DATA_START;
extern unsigned long const 
    section_data_size = DATA_SIZE;
extern unsigned long const 
    section_data_end = section_data_start + section_data_size;

cstr_storage_triage.h

#ifndef CSTR_STORAGE_TRIAGE_H
#define CSTR_STORAGE_TRIAGE_H

// Classify the storage type addressed by `s` and print it on `cout`
extern void cstr_storage_triage(const char *s);

#endif

cstr_storage_triage.cpp

#include "cstr_storage_triage.h"
#include "section_bounds.h"
#include <iostream>

using namespace std;

void cstr_storage_triage(const char *s)
{
    unsigned long addr = (unsigned long)s;
    cout << "When s = " << (void*)s << " -> \"" << s << '\"' << endl;
    if (addr >= section_bss_start && addr < section_bss_end) {
        cout << "then s is in static 0-initialized data\n";
    } else if (addr >= section_rodata_start && addr < section_rodata_end) {
        cout << "then s is in static read-only data\n";     
    } else if (addr >= section_data_start && addr < section_data_end){
        cout << "then s is in static read/write data\n";
    } else {
        cout << "then s is on the stack/heap\n";
    }       
}

main.cpp

// Demonstrate storage classification of various arrays of char 

#include "cstr_storage_triage.h"

static char in_bss[1];
static char const * in_rodata = "In static read-only data";
static char in_rwdata[] = "In static read/write data";  

int main()
{
    char on_stack[] = "On stack";
    cstr_storage_triage(in_bss);
    cstr_storage_triage(in_rodata);
    cstr_storage_triage(in_rwdata);
    cstr_storage_triage(on_stack);
    cstr_storage_triage("Where am I?");
    return 0;
}

Here is our makefile:

.PHONY: all clean

SRCS = main.cpp cstr_storage_triage.cpp section_bounds.cpp 
OBJS = $(SRCS:.cpp=.o)
TARG = prog
MAP_FILE = $(TARG).map

ifdef AGAIN
BSS_BOUNDS := $(shell grep -m 1 '^\.bss ' $(MAP_FILE))
BSS_START := $(word 2,$(BSS_BOUNDS))
BSS_SIZE := $(word 3,$(BSS_BOUNDS))
RODATA_BOUNDS := $(shell grep -m 1 '^\.rodata ' $(MAP_FILE))
RODATA_START := $(word 2,$(RODATA_BOUNDS))
RODATA_SIZE := $(word 3,$(RODATA_BOUNDS))
DATA_BOUNDS := $(shell grep -m 1 '^\.data ' $(MAP_FILE))
DATA_START := $(word 2,$(DATA_BOUNDS))
DATA_SIZE := $(word 3,$(DATA_BOUNDS))
CPPFLAGS += \
    -DBSS_START=$(BSS_START) \
    -DBSS_SIZE=$(BSS_SIZE) \
    -DRODATA_START=$(RODATA_START) \
    -DRODATA_SIZE=$(RODATA_SIZE) \
    -DDATA_START=$(DATA_START) \
    -DDATA_SIZE=$(DATA_SIZE)
endif

all: $(TARG)

clean:
    rm -f $(OBJS) $(MAP_FILE) $(TARG)

ifndef AGAIN
$(MAP_FILE): $(OBJS)
    g++ -o $(TARG) $(CXXFLAGS) -Wl,-Map=$@ $(OBJS) $(LDLIBS)
    touch section_bounds.cpp

$(TARG): $(MAP_FILE)
    $(MAKE) AGAIN=1
else
$(TARG): $(OBJS)
    g++ -o $@ $(CXXFLAGS) $(OBJS) $(LDLIBS)
endif

Here is what make looks like:

$ make
g++    -c -o main.o main.cpp
g++    -c -o cstr_storage_triage.o cstr_storage_triage.cpp
g++    -c -o section_bounds.o section_bounds.cpp
g++ -o prog  -Wl,-Map=prog.map main.o cstr_storage_triage.o section_bounds.o 
touch section_bounds.cpp
make AGAIN=1
make[1]: Entering directory `/home/imk/develop/SO/string_lit_only'
g++  -DBSS_START=0x00000000006020c0 -DBSS_SIZE=0x118 -DRODATA_START=0x0000000000400bf0
 -DRODATA_SIZE=0x120 -DDATA_START=0x0000000000602070 -DDATA_SIZE=0x3a
  -c -o section_bounds.o section_bounds.cpp
g++ -o prog  main.o cstr_storage_triage.o section_bounds.o

Finally, here is what prog does:

$ ./prog
When s = 0x6021d1 -> ""
then s is in static 0-initialized data
When s = 0x400bf4 -> "In static read-only data"
then s is in static read-only data
When s = 0x602090 -> "In static read/write data"
then s is in static read/write data
When s = 0x7fffa1b053a0 -> "On stack"
then s is on the stack/heap
When s = 0x400c0d -> "Where am I?"
then s is in static read-only data

If it's obvious how this works, you need read no further.

The program will compile and link even before we know the addresses and sizes of its static storage sections. It would need to, wouldn't it!? In that case, the global section_* variables that ought to hold these values all get built with place-holder values.

When make is run, the recipes:

$(TARG): $(MAP_FILE)
    $(MAKE) AGAIN=1


and

$(MAP_FILE): $(OBJS)
    g++ -o $(TARG) $(CXXFLAGS) -Wl,-Map=$@ $(OBJS) $(LDLIBS)
    touch section_bounds.cpp


are operative, because AGAIN is undefined. They tell make that in order to build prog it must first build the linker map file of prog, as per the second recipe, and then re-timestamp section_bounds.cpp. After that, make is to call itself again, with AGAIN defined = 1.

Executing the makefile again, with AGAIN defined, make now finds that it must compute all the variables:

BSS_BOUNDS
BSS_START
BSS_SIZE
RODATA_BOUNDS
RODATA_START
RODATA_SIZE
DATA_BOUNDS
DATA_START
DATA_SIZE

For each static storage section S, it computes S_BOUNDS by grepping the linker map file for the line that reports the address and size of S. From that line, it assigns the 2nd word ( = the section address) to S_START, and the 3rd word ( = the size of the section) to S_SIZE. All the section delimiting values are then appended, via -D options, to the CPPFLAGS that will automatically be passed to compilations.
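
For readers less familiar with make's `$(word n,...)` function, the same extraction can be sketched in C++ (assuming C++11; `SectionBounds` and `parse_map_line` are hypothetical names, and the line layout "name, hex address, hex size" is assumed from the values visible in the make output above). This only illustrates what the makefile does; it is not part of the build.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical holder for what the makefile calls S_START and S_SIZE.
struct SectionBounds {
    unsigned long long start;
    unsigned long long size;
};

// Parse a map-file line of the form "<name> <hex address> <hex size>".
// Returns false if the line has fewer than three words.
// std::stoull throws std::invalid_argument if a word is not a number.
inline bool parse_map_line(const std::string& line, SectionBounds& out)
{
    std::istringstream words(line);
    std::string name, start, size;
    if (!(words >> name >> start >> size)) {
        return false;
    }
    out.start = std::stoull(start, nullptr, 16);  // word 2: section address
    out.size = std::stoull(size, nullptr, 16);    // word 3: section size
    return true;
}
```

With base 16, `std::stoull` accepts an optional `0x` prefix, so the values can be passed as they appear in the map file.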

Because AGAIN is defined, the operative recipe for $(TARG) is now the customary:

$(TARG): $(OBJS)
    g++ -o $@ $(CXXFLAGS) $(OBJS) $(LDLIBS)



But we touched section_bounds.cpp in the parent make; so it has to be recompiled, and therefore prog has to be relinked. This time, when section_bounds.cpp is compiled, all the section-delimiting macros:

BSS_START
BSS_SIZE
RODATA_START
RODATA_SIZE
DATA_START
DATA_SIZE

will have pre-defined values and will not assume their place-holder values.

And those predefined values will be correct because the second linkage adds no symbols to the linkage and removes none, and does not alter the size or storage class of any symbol. It just assigns different values to symbols that were present in the first linkage. Consequently, the addresses and sizes of the static storage sections will be unaltered and are now known to your program.
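
To tie this back to the question's `foo`, here is a rough sketch of an assertion on the argument's address. It is not the method above: to stay self-contained it swaps the map-file bounds for the `__executable_start` and `_end` symbols, which are assumed to be provided by the linker on GNU/Linux ELF builds, and `in_program_image` is a hypothetical name. The range is coarser than the per-section bounds, and it does not cover literals that live in shared libraries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: GNU/Linux, ELF. These symbols come from the linker,
// not from any header, so the program must declare them itself.
extern "C" char __executable_start;  // start of the program image
extern "C" char _end;                // first address past .bss

// True if p lies inside the executable's own image.
inline bool in_program_image(const void* p)
{
    const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
    return a >= reinterpret_cast<std::uintptr_t>(&__executable_start)
        && a < reinterpret_cast<std::uintptr_t>(&_end);
}

// The question's foo, with a run-time check on where str lives.
template <std::size_t N>
void foo(const char (&str)[N])
{
    assert(in_program_image(str) && "foo expects an array in static storage");
    (void)str;  // silence unused-parameter warnings when NDEBUG is set
}
```

Like the section-bounds test, this cannot tell a string literal from a static char array: both pass. It only rejects arrays on the stack or heap.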
