从解缠的符号中提取类 [英] Extracting class from demangled symbol

查看：57 发布时间：2020/9/22 6:45:28 c++ regex boost

本文介绍了从解缠的符号中提取类的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在尝试使用boost::regex从nm的已解散符号输出中提取(完整)类名. 这个示例程序

I'm trying to extract the (full) class names from demangled symbol output of nm using boost::regex. This sample program

#include <vector>

namespace Ns1
{
namespace Ns2
{
    template<typename T, class Cont>
    class A
    {
    public:
        A() {}
        ~A() {}
        void foo(const Cont& c) {}
        void bar(const A<T,Cont>& x) {}

    private:
        Cont cont;
    };
}
}

int main()
{
    Ns1::Ns2::A<int,std::vector<int> > a;
    Ns1::Ns2::A<int,std::vector<int> > b;
    std::vector<int> v;

    a.foo(v);
    a.bar(b);
}

将为A类产生以下符号

Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > >::A()
Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > >::bar(Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > > const&)
Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > >::foo(std::vector<int, std::allocator<int> > const&)
Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > >::~A()

我想最好使用单个正则表达式模式提取类(实例)名称Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > >，但是我在解析<>对中递归出现的类说明符时遇到问题.

I want to extract the class (instance) name Ns1::Ns2::A<int, std::vector<int, std::allocator<int> > > preferably using a single regular expression pattern, but I have problems to parse the recursively occuring class specifiers within the <> pairs.

有人知道如何使用正则表达式模式(boost::regex支持)吗?

Does anyone know how to do this with a regular expression pattern (that's supported by boost::regex)?

我的解决方案(基于 David Hammen 的回答，因此被接受):

My solution (based on David Hammen's answer, thus the accept):

我不使用(单个)正则表达式来提取类和名称空间符号.我创建了一个简单的函数，用于从符号字符串的尾部去除括号中的字符对(例如<>或()):

I don't use (single) regular expressions to extract class and namespace symbols. I have created a simple function that strips off bracketing character pairs (e.g. <> or ()) from the tail of symbol strings:

std::string stripBracketPair(char openingBracket,char closingBracket,const std::string& symbol, std::string& strippedPart)
{
    std::string result = symbol;

    if(!result.empty() &&
       result[result.length() -1] == closingBracket)
    {
        size_t openPos = result.find_first_of(openingBracket);
        if(openPos != std::string::npos)
        {
            strippedPart = result.substr(openPos);
            result = result.substr(0,openPos);
        }
    }
    return result;
}

这在其他两种从符号中提取名称空间/类的方法中使用:

This is used in two other methods that extract the namespace / class from the symbol:

std::string extractNamespace(const std::string& symbol)
{
    std::string ns;
    std::string strippedPart;
    std::string cls = extractClass(symbol);
    if(!cls.empty())
    {
        cls = stripBracketPair('<','>',cls,strippedPart);
        std::vector<std::string> classPathParts;

        boost::split(classPathParts,cls,boost::is_any_of("::"),boost::token_compress_on);
        ns = buildNamespaceFromSymbolPath(classPathParts);
    }
    else
    {
        // Assume this symbol is a namespace global function/variable
        std::string globalSymbolName = stripBracketPair('(',')',symbol,strippedPart);
        globalSymbolName = stripBracketPair('<','>',globalSymbolName,strippedPart);
        std::vector<std::string> symbolPathParts;

        boost::split(symbolPathParts,globalSymbolName,boost::is_any_of("::"),boost::token_compress_on);
        ns = buildNamespaceFromSymbolPath(symbolPathParts);
        std::vector<std::string> wsSplitted;
        boost::split(wsSplitted,ns,boost::is_any_of(" \t"),boost::token_compress_on);
        if(wsSplitted.size() > 1)
        {
            ns = wsSplitted[wsSplitted.size() - 1];
        }
    }

    if(isClass(ns))
    {
        ns = "";
    }
    return ns;
}

std::string extractClass(const std::string& symbol)
{
    std::string cls;
    std::string strippedPart;
    std::string fullSymbol = symbol;
    boost::trim(fullSymbol);
    fullSymbol = stripBracketPair('(',')',symbol,strippedPart);
    fullSymbol = stripBracketPair('<','>',fullSymbol,strippedPart);

    size_t pos = fullSymbol.find_last_of(':');
    if(pos != std::string::npos)
    {
        --pos;
        cls = fullSymbol.substr(0,pos);
        std::string untemplatedClassName = stripBracketPair('<','>',cls,strippedPart);
        if(untemplatedClassName.find('<') == std::string::npos &&
        untemplatedClassName.find(' ') != std::string::npos)
        {
            cls = "";
        }
    }

    if(!cls.empty() && !isClass(cls))
    {
        cls = "";
    }
    return cls;
}

buildNamespaceFromSymbolPath()方法仅连接有效的名称空间部分:

the buildNamespaceFromSymbolPath() method simply concatenates valid namespace parts:

std::string buildNamespaceFromSymbolPath(const std::vector<std::string>& symbolPathParts)
{
    if(symbolPathParts.size() >= 2)
    {
        std::ostringstream oss;
        bool firstItem = true;
        for(unsigned int i = 0;i < symbolPathParts.size() - 1;++i)
        {
            if((symbolPathParts[i].find('<') != std::string::npos) ||
               (symbolPathParts[i].find('(') != std::string::npos))
            {
                break;
            }
            if(!firstItem)
            {
                oss << "::";
            }
            else
            {
                firstItem = false;
            }
            oss << symbolPathParts[i];
        }
        return oss.str();
    }
    return "";
}

至少isClass()方法使用正则表达式扫描所有符号以查找构造函数方法(不幸的是，该方法似乎不适用于仅包含成员函数的类):

At least the isClass() method uses a regular expression to scan all symbols for a constructor method (which unfortunately doesn't seem to work for classes only containing member functions):

std::set<std::string> allClasses;

bool isClass(const std::string& classSymbol)
{
    std::set<std::string>::iterator foundClass = allClasses.find(classSymbol);
    if(foundClass != allClasses.end())
    {
        return true;
    }

std::string strippedPart;
    std::string constructorName = stripBracketPair('<','>',classSymbol,strippedPart);
    std::vector<std::string> constructorPathParts;

    boost::split(constructorPathParts,constructorName,boost::is_any_of("::"),boost::token_compress_on);
    if(constructorPathParts.size() > 1)
    {
        constructorName = constructorPathParts.back();
    }
    boost::replace_all(constructorName,"(","[\\(]");
    boost::replace_all(constructorName,")","[\\)]");
    boost::replace_all(constructorName,"*","[\\*]");

    std::ostringstream constructorPattern;
    std::string symbolPattern = classSymbol;
    boost::replace_all(symbolPattern,"(","[\\(]");
    boost::replace_all(symbolPattern,")","[\\)]");
    boost::replace_all(symbolPattern,"*","[\\*]");
    constructorPattern << "^" << symbolPattern << "::" << constructorName << "[\\(].+$";
    boost::regex reConstructor(constructorPattern.str());

    for(std::vector<NmRecord>::iterator it = allRecords.begin();
        it != allRecords.end();
        ++it)
    {
        if(boost::regex_match(it->symbolName,reConstructor))
        {
            allClasses.insert(classSymbol);
            return true;
        }
    }
    return false;
}

如前所述，如果该类未提供任何构造函数，则最后一个方法将无法安全地找到该类的名称，并且在大型符号表上运行速度很慢.但这至少似乎涵盖了您可以从nm的符号信息中获得的信息.

As mentioned the last method doesn't safely find a class name if the class doesn't provide any constructor, and is quite slow on big symbol tables. But at least this seems to cover what you can get out of nm's symbol information.

我已将 regex 标记的问题留给了

I have left the regex tag for the question, that other users may find regex is not the right approach.

从解缠的符号中提取类 [英] Extracting class from demangled symbol

问题描述

推荐答案

相关文章

C/C++开发最新文章

热门教程

热门工具

登录关闭

从解缠的符号中提取类 [英] Extracting class from demangled symbol

问题描述

推荐答案

相关文章

C/C++开发最新文章

热门教程

热门工具

登录 关闭

登录关闭