C ++字符串面试问题 [英] C++ String Interview Question
问题描述
我最近在一个C ++技术面试中,我被给了一些简单的字符串操作代码,这是为了接受一个字符串并返回一个由第一个和最后一个n字符组成的字符串,然后继续纠正任何错误,并使功能尽可能高效,我想出了下面的解决方案,但访问者声称有一个更快更好的方法:
原始代码:
std :: string first_last_n(int n,std :: string s)
{
std :: string first_n = s.substr(0,n);
std :: string last_n = s.substr(s.size() - n-1,n);
return first_n + last_n;
}
我的代码:
bool first_last_n(const std :: size_t& n,const std :: string& s,std :: string& r)
{
if size()< n)
return false;
r.reserve(2 * n);
r.resize(0);
r.append(s.data(),s.data()+ n);
r.append(s.data()+(s.size() - n),s.data()+ s.size());
return true;
}
我的更改摘要:
-
更改接口以接受返回字符串作为引用(假设RVO和rvalues尚未可用)
- p>
-
修复了last_n字符串中的off-by-1错误
-
将每个字符触及一次的次数减少了一次,或两次(在重叠情况下)
-
在字符串的大小小于n的情况下检查,返回false失败。
假设只允许使用本地C ++,还有其他方法可以更有效或更好地执行上述操作吗? p>
注意1:原始输入字符串实例不可修改。
b>注意2:所有解决方案必须通过以下测试用例,否则它们无效。
void test ()
{
{
std :: string s =0123456789;
std :: string r = first_last_n(10,s);
assert(r ==01234567890123456789);
}
{
std :: string s =0123456789ABC0123456789;
std :: string r = first_last_n(10,s);
assert(r ==01234567890123456789);
}
{
std :: string s =1234321;
std :: string r = first_last_n(5,s);
assert(r ==1234334321);
}
}
此实现应该是快速的:
inline std :: string first_last_n(std :: string :: size_type n,const std :: string& s)
{
n = std :: min(n,s.size());
std :: string ret;
ret.reserve(2 * n);
ret.append(s.begin(),s.begin()+ n);
ret.append(s.end() - n,s.end());
return ret;
}
当使用GNU libstdc ++时,声明& initializes ret
非常快,因为libstdc ++使用全局空字符串变量。因此,它只是一个指针副本。调用开始
和结束在
s
它们将解析为 begin
和 end
, begin()const
和
end()const
,因此 s
的内部表示不是leaked。使用libstdc ++, std :: string :: const_iterator
是 const char *
,它是一个指针类型和随机访问迭代器。因此,当 std :: string :: append< const char *>(const char *,const char *)
调用 std :: distance
以获取输入范围的长度,是指针差分运算。另外, std :: string :: append< const char *>(const char *,const char *)
会导致类似 memmove
。最后, reserve
操作确保有足够的内存可用于返回值。
EDIT: / strong>
对于好奇,这里是MinGW g ++ 4.5.0的汇编输出中 ret
的初始化:
movl $ __ ZNSs4_Rep20_S_empty_rep_storageE + 12,(%ebx)
$ b b
它只是将指针复制到全局的空表示。
EDIT2:
好吧。我现在测试了四个变体与g ++ 4.5.0和Visual C + + 16.00.30319.01:
变体1(c_str变体):
inline std :: string first_last_n(std :: string :: size_type n,const std :: string& s)
{
std :: string :: size_type s_size = s.size();
n = std :: min(n,s_size);
std :: string ret;
ret.reserve(2 * n);
const char * s_cStr = s.c_str(),* s_cStr_end = s_cStr + s_size;
ret.append(s_cStr,s_cStr + n);
ret.append(s_cStr_end - n,s_cStr_end);
return ret;
}
变体2(数据字符串变体):
inline std :: string first_last_n(std :: string :: size_type n,const std :: string& s)
{
std :: string :: size_type s_size = s.size();
n = std :: min(n,s_size);
std :: string ret;
ret.reserve(2 * n);
const char * s_data = s.data(),* s_data_end = s_data + s_size;
ret.append(s_data,s_data + n);
ret.append(s_data_end - n,s_data_end);
return ret;
}
变式3:
inline std :: string first_last_n(std :: string :: size_type n,const std :: string& s)
{
std :: string :: size_type s_size = s.size();
n = std :: min(n,s_size);
std :: string ret(s);
std :: string :: size_type d = s_size - n;
return ret.replace(n,d,s,d,n);
}
变体4(我的原始代码):
内联std :: string first_last_n(std :: string :: size_type n,const std :: string& s)
{
n = std :: min(n,s.size());
std :: string ret;
ret.reserve(2 * n);
ret.append(s.begin(),s.begin()+ n);
ret.append(s.end() - n,s.end());
return ret;
}
g ++ 4.5.0的结果是:
- 变体4是最快的
- 变体3是第二个(比变体4慢5%)
- 变体1是第三个(比变体3慢2%)
- 变体2是第四个(比变体1慢0.2%)
VC ++ 16.00.30319.01的结果是:
- 变体1是最快的
-
- 变体2是第二个变体(比变体1慢3%)
- 变体4是第三个$ b
- 变体3是第四种(比变体4慢17%)
不出意料,编译器。但是,不知道使用哪个编译器我认为我的变体是最好的,因为它是一个熟悉的风格的C ++,它是使用g ++的最快,并且它没有比使用VC ++变体1或2慢得多。 / p>
VC ++结果有趣的一点是,使用 c_str
而不是 data
更快。也许这就是为什么你的访问者说你的实现方式比实现更快。
EDIT3:
其实,我只是想到另一个变种:
变式5:
inline std :: string first_last_n(std :: string :: size_type n,const std :: string& s)
{
n = std :: min(n,尺寸());
std :: string ret;
ret.reserve(2 * n);
std :: string :: const_iterator s_begin = s.begin(),s_end = s.end();
ret.append(s_begin,s_begin + n)
ret.append(s_end - n,s_end);
return ret;
}
就像变体4,除了<$ c的开始和结束迭代器$ c> s 保存。
当测试变体5时,使用VC ++时,它实际上是变异2 :
- 变体1是最快的
- 变体5是第二个)
- 变体2是第三种(比变体5慢1.4%)
- 变体4是第三种>
- 变体3是第四种(比变体4慢17%)
I was recently in a C++ technical interview, where I was given a bit of simple string manipulation code, which is intended to take a string and return a string that is comprised of the first and last n-characters, and then proceed to correct any bugs and to also make the function as efficient as possible, I came up with the solution below, however the interviewer claimed there was an even faster more optimal way:
Original code:
std::string first_last_n(int n, std::string s)
{
std::string first_n = s.substr(0,n);
std::string last_n = s.substr(s.size()-n-1,n);
return first_n + last_n;
}
My code:
bool first_last_n(const std::size_t& n, const std::string& s, std::string& r)
{
if (s.size() < n)
return false;
r.reserve(2 * n);
r.resize(0);
r.append(s.data(),s.data() + n);
r.append(s.data() + (s.size() - n), s.data() + s.size());
return true;
}
Summary of my changes:
Changed the interface to take a return string as reference (assuming RVO and rvalues are not available yet)
Removed temporary strings being constructed via substr
Passed input string as a const reference inorder to get around temporary instantiation of input
Fixed off-by-1 error in last_n string
Reduced the number of times each character is touched down to once, or twice (in the event of an overlapping scenario)
Placed a check in the event the string s's size is less than n, returning false for failure.
Assuming only native C++ is allowed, is there some other way of doing the above more efficiently or optimally?
Note 1: The original input string instance is not to be modified.
Note 2: All solutions must pass the following test case, otherwise they are not valid.
void test()
{
{
std::string s = "0123456789";
std::string r = first_last_n(10,s);
assert(r == "01234567890123456789");
}
{
std::string s = "0123456789ABC0123456789";
std::string r = first_last_n(10,s);
assert(r == "01234567890123456789");
}
{
std::string s = "1234321";
std::string r = first_last_n(5,s);
assert(r == "1234334321");
}
}
This implementation should be fast:
inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
n = std::min(n, s.size());
std::string ret;
ret.reserve(2*n);
ret.append(s.begin(), s.begin() + n);
ret.append(s.end() - n, s.end());
return ret;
}
It passes all three unit tests.
When using GNU libstdc++, the line that declares & initializes ret
is extremely fast because libstdc++ uses a global "empty string" variable. Thus, it's simply a pointer copy. Calls to begin
and end
on s
are also fast because they will resolve to the const versions of begin
and end
, begin() const
and end() const
, so the internal representation of s
is not "leaked". With libstdc++, std::string::const_iterator
is const char*
, which is a pointer type and random access iterator. Thus, when std::string::append<const char*>(const char*, const char*)
calls std::distance
to obtain the length of the input range, it is a pointer difference operation. Also, std::string::append<const char*>(const char*, const char*)
results in something like a memmove
. Finally, the reserve
operation ensures that enough memory is available for the return value.
EDIT:
For the curious, here is the initialization of ret
in the assembly output of MinGW g++ 4.5.0:
movl $__ZNSs4_Rep20_S_empty_rep_storageE+12, (%ebx)
It's simply copying the pointer to the global "empty representation".
EDIT2: Okay. I have now tested four variants with g++ 4.5.0 and Visual C++ 16.00.30319.01:
Variant 1 (the "c_str variant"):
inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
std::string::size_type s_size = s.size();
n = std::min(n, s_size);
std::string ret;
ret.reserve(2*n);
const char *s_cStr = s.c_str(), *s_cStr_end = s_cStr + s_size;
ret.append(s_cStr, s_cStr + n);
ret.append(s_cStr_end - n, s_cStr_end);
return ret;
}
Variant 2 (the "data string" variant):
inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
std::string::size_type s_size = s.size();
n = std::min(n, s_size);
std::string ret;
ret.reserve(2*n);
const char *s_data = s.data(), *s_data_end = s_data + s_size;
ret.append(s_data, s_data + n);
ret.append(s_data_end - n, s_data_end);
return ret;
}
Variant 3:
inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
std::string::size_type s_size = s.size();
n = std::min(n, s_size);
std::string ret(s);
std::string::size_type d = s_size - n;
return ret.replace(n, d, s, d, n);
}
Variant 4 (my original code):
inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
n = std::min(n, s.size());
std::string ret;
ret.reserve(2*n);
ret.append(s.begin(), s.begin() + n);
ret.append(s.end() - n, s.end());
return ret;
}
The results for g++ 4.5.0 are:
- Variant 4 is the fastest
- Variant 3 is second (5% slower than variant 4)
- Variant 1 is third (2% slower than variant 3)
- Variant 2 is fourth (0.2% slower than variant 1)
The results for VC++ 16.00.30319.01 are:
- Variant 1 is the fastest
- Variant 2 is second (3% slower than variant 1)
- Variant 4 is third (4% slower than variant 2)
- Variant 3 is fourth (17% slower than variant 4)
Unsurprisingly, the variant that is fastest depends on the compiler. However, not knowing which compiler will be used I think that my variant is best because it is a familiar style of C++, it is the fastest when using g++, and it is not that much slower than variants 1 or 2 when using VC++.
One thing interesting from the VC++ results is that using c_str
rather than data
is faster. Perhaps that is why your interviewer said that there is a faster way than your implementation.
EDIT3:
Actually, I just thought about another variant:
Variant 5:
inline std::string first_last_n(std::string::size_type n, const std::string& s)
{
n = std::min(n, s.size());
std::string ret;
ret.reserve(2*n);
std::string::const_iterator s_begin = s.begin(), s_end = s.end();
ret.append(s_begin, s_begin + n);
ret.append(s_end - n, s_end);
return ret;
}
It's just like variant 4 except that the begin and end iterators for s
are saved.
When variant 5 is tested, it actually beats out variant 2 (the data string variant) when using VC++:
- Variant 1 is the fastest
- Variant 5 is second (1.6% slower than variant 1)
- Variant 2 is third (1.4% slower than variant 5)
- Variant 4 is third (4% slower than variant 2)
- Variant 3 is fourth (17% slower than variant 4)
这篇关于C ++字符串面试问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!