Extracting class from demangled symbol
Question
I'm trying to extract the (full) class names from the demangled symbol output of nm using boost::regex. This sample program
#include <vector>

namespace Ns1
{
namespace Ns2
{
    template<typename T, class Cont>
    class A
    {
    public:
        A() {}
        ~A() {}
        void foo(const Cont& c) {}
        void bar(const A<T,Cont>& x) {}
    private:
        Cont cont;
    };
}
}

int main()
{
    Ns1::Ns2::A<int,std::vector<int> > a;
    Ns1::Ns2::A<int,std::vector<int> > b;
    std::vector<int> v;
    a.foo(v);
    a.bar(b);
}
produces the following symbols for class A:
Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > >::A()
Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > >::bar(Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > > const&)
Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > >::foo(std::vector<int, std::allocator<int> > const&)
Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > >::~A()
I want to extract the class (instance) name Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > >, preferably using a single regular expression pattern, but I have trouble parsing the recursively occurring class specifiers within the <> pairs.
Does anyone know how to do this with a regular expression pattern (one that's supported by boost::regex)?
My solution (based on David Hammen's answer, thus the accept):
I don't use (single) regular expressions to extract class and namespace symbols. I have created a simple function that strips off bracketing character pairs (e.g. <> or ()) from the tail of symbol strings:
#include <string>

// If 'symbol' ends with closingBracket, cut it at the first openingBracket.
// The removed tail is returned through strippedPart.
std::string stripBracketPair(char openingBracket,char closingBracket,const std::string& symbol, std::string& strippedPart)
{
    std::string result = symbol;
    if(!result.empty() &&
       result[result.length() -1] == closingBracket)
    {
        size_t openPos = result.find_first_of(openingBracket);
        if(openPos != std::string::npos)
        {
            strippedPart = result.substr(openPos);
            result = result.substr(0,openPos);
        }
    }
    return result;
}
This is used in two other functions that extract the namespace / class from the symbol:
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>

// Defined further down.
std::string extractClass(const std::string& symbol);
std::string buildNamespaceFromSymbolPath(const std::vector<std::string>& symbolPathParts);
bool isClass(const std::string& classSymbol);

std::string extractNamespace(const std::string& symbol)
{
    std::string ns;
    std::string strippedPart;
    std::string cls = extractClass(symbol);
    if(!cls.empty())
    {
        cls = stripBracketPair('<','>',cls,strippedPart);
        std::vector<std::string> classPathParts;
        boost::split(classPathParts,cls,boost::is_any_of("::"),boost::token_compress_on);
        ns = buildNamespaceFromSymbolPath(classPathParts);
    }
    else
    {
        // Assume this symbol is a namespace global function/variable
        std::string globalSymbolName = stripBracketPair('(',')',symbol,strippedPart);
        globalSymbolName = stripBracketPair('<','>',globalSymbolName,strippedPart);
        std::vector<std::string> symbolPathParts;
        boost::split(symbolPathParts,globalSymbolName,boost::is_any_of("::"),boost::token_compress_on);
        ns = buildNamespaceFromSymbolPath(symbolPathParts);
        // Drop a leading return type or qualifier, keeping the last word
        std::vector<std::string> wsSplitted;
        boost::split(wsSplitted,ns,boost::is_any_of(" \t"),boost::token_compress_on);
        if(wsSplitted.size() > 1)
        {
            ns = wsSplitted[wsSplitted.size() - 1];
        }
    }
    if(isClass(ns))
    {
        ns = "";
    }
    return ns;
}

std::string extractClass(const std::string& symbol)
{
    std::string cls;
    std::string strippedPart;
    std::string fullSymbol = symbol;
    boost::trim(fullSymbol);
    // Strip from the trimmed copy; passing 'symbol' here would discard the trim
    fullSymbol = stripBracketPair('(',')',fullSymbol,strippedPart);
    fullSymbol = stripBracketPair('<','>',fullSymbol,strippedPart);
    size_t pos = fullSymbol.find_last_of(':');
    if(pos != std::string::npos)
    {
        // Step back over the first ':' of the "::" separator
        --pos;
        cls = fullSymbol.substr(0,pos);
        std::string untemplatedClassName = stripBracketPair('<','>',cls,strippedPart);
        if(untemplatedClassName.find('<') == std::string::npos &&
           untemplatedClassName.find(' ') != std::string::npos)
        {
            cls = "";
        }
    }
    if(!cls.empty() && !isClass(cls))
    {
        cls = "";
    }
    return cls;
}
The buildNamespaceFromSymbolPath() function simply concatenates the valid namespace parts:
#include <sstream>
#include <string>
#include <vector>

std::string buildNamespaceFromSymbolPath(const std::vector<std::string>& symbolPathParts)
{
    if(symbolPathParts.size() >= 2)
    {
        std::ostringstream oss;
        bool firstItem = true;
        // The last part is the symbol itself, so it is left out
        for(unsigned int i = 0;i < symbolPathParts.size() - 1;++i)
        {
            // Stop at the first part that carries template or function arguments
            if((symbolPathParts[i].find('<') != std::string::npos) ||
               (symbolPathParts[i].find('(') != std::string::npos))
            {
                break;
            }
            if(!firstItem)
            {
                oss << "::";
            }
            else
            {
                firstItem = false;
            }
            oss << symbolPathParts[i];
        }
        return oss.str();
    }
    return "";
}
At least the isClass() function uses a regular expression: it scans all symbols for a constructor (which unfortunately doesn't seem to work for classes that contain only member functions):
#include <set>
#include <sstream>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>
#include <boost/regex.hpp>

// NmRecord and allRecords (the parsed nm output) are defined elsewhere
// in the program and are not shown here.
std::set<std::string> allClasses;

bool isClass(const std::string& classSymbol)
{
    // Cache hit: this symbol was already identified as a class
    std::set<std::string>::iterator foundClass = allClasses.find(classSymbol);
    if(foundClass != allClasses.end())
    {
        return true;
    }
    std::string strippedPart;
    std::string constructorName = stripBracketPair('<','>',classSymbol,strippedPart);
    std::vector<std::string> constructorPathParts;
    boost::split(constructorPathParts,constructorName,boost::is_any_of("::"),boost::token_compress_on);
    if(constructorPathParts.size() > 1)
    {
        constructorName = constructorPathParts.back();
    }
    // Escape the characters that would otherwise be read as regex syntax
    boost::replace_all(constructorName,"(","[\\(]");
    boost::replace_all(constructorName,")","[\\)]");
    boost::replace_all(constructorName,"*","[\\*]");
    std::ostringstream constructorPattern;
    std::string symbolPattern = classSymbol;
    boost::replace_all(symbolPattern,"(","[\\(]");
    boost::replace_all(symbolPattern,")","[\\)]");
    boost::replace_all(symbolPattern,"*","[\\*]");
    // Matches "<class>::<constructor>(...)"
    constructorPattern << "^" << symbolPattern << "::" << constructorName << "[\\(].+$";
    boost::regex reConstructor(constructorPattern.str());
    for(std::vector<NmRecord>::iterator it = allRecords.begin();
        it != allRecords.end();
        ++it)
    {
        if(boost::regex_match(it->symbolName,reConstructor))
        {
            allClasses.insert(classSymbol);
            return true;
        }
    }
    return false;
}
As mentioned, the last function doesn't reliably find a class name if the class doesn't provide any constructor, and it is quite slow on big symbol tables. But at least this seems to cover what you can get out of nm's symbol information.
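One way to reduce the cost on big symbol tables, sketched here and not part of the original solution, is to build the set of class names once in a single pass instead of compiling a regular expression per lookup. The sketch below keys on destructor symbols instead of constructors; the function name collectClassesWithDestructors is made up for illustration, and it assumes that every class of interest has a destructor symbol in the table, so a class whose destructor was never emitted would still be missed.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns every scope that owns a destructor symbol.
// 'symbolNames' holds demangled names such as "Ns1::Ns2::A<int>::~A()".
std::set<std::string> collectClassesWithDestructors(
    const std::vector<std::string>& symbolNames)
{
    std::set<std::string> classes;
    for (const std::string& name : symbolNames) {
        // A demangled destructor has the form "Scope::~Name()"; everything
        // in front of the last "::~" is the class, template arguments included.
        const std::string::size_type tilde = name.rfind("::~");
        if (tilde != std::string::npos && tilde > 0) {
            classes.insert(name.substr(0, tilde));
        }
    }
    return classes;
}
```

Fed with the four symbols from the question, the set contains the single entry Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > >. A lookup in the resulting set could then stand in for the per-call scan in isClass().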
I have left the regex tag on the question so that other users may find out that regex is not the right approach.
Recommended answer
This is hard to do even with Perl's extended regular expressions, which are considerably more powerful than anything in C++. I suggest a different tack:
First get rid of the things that don't look like functions, such as data (look for the D designator). Stuff like virtual thunk to this, virtual table for that, etc., will also get in your way; get rid of them before you do the main parsing. This filtering is something where a regexp can help. What you should have left are functions. For each function:
- Get rid of the stuff after the final closing parenthesis. For example, Foo::Bar(int,double) const becomes Foo::Bar(int,double).
- Strip the function arguments. The problem here is that you can have parentheses inside the parentheses, e.g., functions that take function pointers as arguments, which might in turn take function pointers as arguments. Don't use a regexp. Use the fact that parentheses match. After this step, Foo::Bar(int,double) becomes Foo::Bar, while a::b::Baz<lots<of<template>, stuff>>::Baz(int, void (*)(int, void (*)(int))) becomes a::b::Baz<lots<of<template>, stuff>>::Baz.
- Now work on the front end. Use a similar scheme to parse through that template stuff. With this, that messy a::b::Baz<lots<of<template>, stuff>>::Baz becomes a::b::Baz::Baz.
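The filtering and the stripping steps above could be sketched roughly as follows. This is a minimal sketch, not the answerer's code: the names looksLikeFunction, stripArguments and stripTemplateArguments are made up for illustration, and the prefix list assumes the wording that GNU nm -C prints for compiler-generated symbols. Operator names that contain angle brackets themselves (operator<, operator->) and symbols that carry a return type are not handled.

```cpp
#include <string>

// Hypothetical filter: 'type' is the one-letter symbol class printed by nm,
// 'name' is the demangled symbol name.
bool looksLikeFunction(char type, const std::string& name)
{
    // Keep only text-section (T/t) and weak (W/w) symbols; data is dropped.
    if (std::string("TtWw").find(type) == std::string::npos) {
        return false;
    }
    // Assumed prefixes of compiler-generated symbols in demangled output.
    static const char* const kSpecialPrefixes[] = {
        "vtable for ", "VTT for ", "typeinfo for ", "typeinfo name for ",
        "virtual thunk to ", "non-virtual thunk to ", "guard variable for "
    };
    for (const char* p : kSpecialPrefixes) {
        const std::string prefix(p);
        if (name.compare(0, prefix.size(), prefix) == 0) {
            return false;
        }
    }
    return true;
}

// Drops everything from the '(' that matches the final ')' onwards, so
// trailing qualifiers such as " const" disappear together with the arguments.
std::string stripArguments(const std::string& symbol)
{
    const std::string::size_type close = symbol.rfind(')');
    if (close == std::string::npos) {
        return symbol;  // no argument list at all
    }
    int depth = 0;
    // Walk backwards from the final ')' and count nesting levels.
    for (std::string::size_type i = close + 1; i-- > 0; ) {
        if (symbol[i] == ')') {
            ++depth;
        } else if (symbol[i] == '(' && --depth == 0) {
            return symbol.substr(0, i);
        }
    }
    return symbol;  // unbalanced parentheses: leave the symbol untouched
}

// Removes every <...> group, however deeply nested.
std::string stripTemplateArguments(const std::string& name)
{
    std::string result;
    int depth = 0;
    for (char c : name) {
        if (c == '<') {
            ++depth;
        } else if (c == '>') {
            if (depth > 0) --depth;
        } else if (depth == 0) {
            result += c;  // copy only what lies outside all angle brackets
        }
    }
    return result;
}
```

Counting nesting depth is what a classic regular expression cannot do, which is why both stripping helpers are plain loops. Applied in sequence to the question's foo symbol, stripArguments yields Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > >::foo and stripTemplateArguments then yields Ns1::Ns2::A::foo.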
At this stage, your functions will look like a::b:: ... ::ClassName::function_name. There is a slight problem here with free functions in some namespace. Destructors are a dead giveaway of a class; there's no doubt that you have a class name if the function name starts with a tilde. Constructors are a near giveaway that you have a class at hand, so long as you don't have a namespace Foo in which you have defined a function Foo.
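The destructor and constructor heuristic might look like the sketch below. The function name enclosingClass is hypothetical, and the input is assumed to be a name from which arguments and template arguments have already been removed, such as a::b::Baz::~Baz.

```cpp
#include <string>

// Hypothetical helper: returns the enclosing class if 'plainName' names a
// destructor or a constructor, and an empty string otherwise.
std::string enclosingClass(const std::string& plainName)
{
    const std::string::size_type sep = plainName.rfind("::");
    if (sep == std::string::npos) {
        return "";  // unqualified name: nothing to conclude
    }
    const std::string scope = plainName.substr(0, sep);
    const std::string function = plainName.substr(sep + 2);

    // Last component of the scope, e.g. "Baz" for "a::b::Baz".
    const std::string::size_type prev = scope.rfind("::");
    const std::string last =
        (prev == std::string::npos) ? scope : scope.substr(prev + 2);

    if (function == "~" + last) {
        return scope;  // destructor: certainly a class
    }
    if (function == last) {
        return scope;  // constructor, unless namespace Foo holds a function Foo
    }
    return "";  // ordinary member or free function: undecided
}
```

An empty result does not mean the scope is a namespace; it only means this one symbol gives no evidence either way, so the results for all symbols sharing a scope would need to be combined.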
Finally, you may want to re-insert the template stuff you cut out.
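An alternative to cutting the template arguments out and re-inserting them, which is a different technique from the one described above, is to leave them in place and look for the last :: that sits outside every <> pair. The sketch below assumes the argument list has already been stripped; the name qualifiedScope is made up for illustration.

```cpp
#include <string>

// Hypothetical helper: returns everything in front of the last "::" that is
// not nested inside angle brackets, or an empty string if there is none.
std::string qualifiedScope(const std::string& name)
{
    int depth = 0;
    std::string::size_type lastSep = std::string::npos;
    for (std::string::size_type i = 0; i < name.size(); ++i) {
        const char c = name[i];
        if (c == '<') {
            ++depth;
        } else if (c == '>') {
            if (depth > 0) --depth;
        } else if (depth == 0 && c == ':' &&
                   i + 1 < name.size() && name[i + 1] == ':') {
            lastSep = i;  // remember this top-level separator
            ++i;          // skip the second ':' of the pair
        }
    }
    return lastSep == std::string::npos ? "" : name.substr(0, lastSep);
}
```

For the destructor symbol from the question with its argument list removed, this returns Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > >, which is the class (instance) name the question asks for.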