Clang for fuzzy parsing C++


Question

Is it at all possible to parse C++ with incomplete declarations using Clang and its existing libclang API? That is, to parse a .cpp file without including all the headers, deducing declarations on the fly. For example, given the following text:

A B::Foo(){return stuff();}

Clang would detect the unknown symbol A and call my callback, which deduces that A is a class using my magic heuristic, then call the callback the same way for B, Foo, and stuff. In the end I want to be able to infer that I saw a member Foo of class B returning A, and that stuff is a function, or something to that effect. Context: I want to see whether I can do sensible syntax highlighting and on-the-fly code analysis quickly, without parsing all the headers.

To clarify, I'm looking for very heavily restricted C++ parsing, possibly with some heuristics to lift some of the restrictions.

C++ grammar is full of context dependencies. Is Foo() a function call or the construction of a temporary of class Foo? Is Foo<Bar> stuff; an instantiation of the template Foo<Bar> and a declaration of the variable stuff, or is it two weird-looking calls to overloaded operator< and operator>? It's only possible to tell in context, and that context often comes from parsing the headers.
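To make the ambiguity concrete, here is a minimal sketch in which the same token sequence `Foo<Bar> stuff;` is given both readings, depending only on what was declared earlier. The namespace names and the `parse()` helpers are hypothetical, and the second reading uses built-in `int` comparisons rather than overloaded operators to keep the example short.

```cpp
#include <string>

// Reading 1: Foo is a class template and Bar is a type,
// so `Foo<Bar> stuff;` is a variable declaration.
namespace as_template {
template <typename T>
struct Foo {
  std::string kind() const { return "declaration"; }
};
struct Bar {};

inline std::string parse() {
  Foo<Bar> stuff;  // declares a variable named stuff
  return stuff.kind();
}
}  // namespace as_template

// Reading 2: Foo, Bar and stuff are all values,
// so the same tokens form the expression (Foo < Bar) > stuff.
namespace as_expression {
inline bool parse() {
  int Foo = 1, Bar = 2, stuff = 0;
  return Foo<Bar> stuff;  // two comparisons, no template involved
}
}  // namespace as_expression
```

A parser that has not seen the declarations of Foo and Bar cannot choose between the two, which is why a heuristic (or the headers) is needed.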

What I'm looking for is a way to plug in my own convention rules. For example, I know that I don't overload Win32 symbols, so I can safely assume that CreateFile is always a function, and I even know its signature. I also know that all my classes start with a capital letter and are nouns, and that functions are usually verbs, so I can reasonably guess that Foo and Bar are class names. In a more complex scenario, I know I don't write side-effect-free expressions like a < b > c; so I can assume that a is always a template instantiation. And so on.
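Convention rules like these could be sketched as a small standalone classifier, independent of Clang. Everything here is hypothetical: the `SymbolKind` enum, the `guessSymbolKind` function, the contents of the known-function table, and the fallback rule that a lowercase identifier is a function (an assumption taken from the `stuff()` example, not something the conventions above state).

```cpp
#include <cctype>
#include <set>
#include <string>

// Hypothetical result of the heuristic.
enum class SymbolKind { Function, Class, Unknown };

// Guess what an undeclared identifier is, using naming conventions only.
inline SymbolKind guessSymbolKind(const std::string &name) {
  // Assumed table of API names that are never overloaded or shadowed.
  static const std::set<std::string> knownFunctions = {"CreateFile",
                                                       "CloseHandle"};
  if (name.empty()) return SymbolKind::Unknown;

  // Rule 1: well-known API symbols are always functions.
  if (knownFunctions.count(name) != 0) return SymbolKind::Function;

  // Rule 2: classes start with a capital letter.
  const unsigned char first = static_cast<unsigned char>(name[0]);
  if (std::isupper(first)) return SymbolKind::Class;

  // Rule 3 (assumption): anything starting lowercase is a function.
  if (std::islower(first)) return SymbolKind::Function;

  return SymbolKind::Unknown;
}
```

The known-function table is checked first so that capitalized API names such as CreateFile are not misread as classes. A callback invoked on each unknown symbol could consult a function of this shape and report a parse failure when it returns `Unknown`.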

So the question is whether it's possible to have the Clang API call back every time it encounters an unknown symbol, and to give it an answer using my own non-C++ heuristic. If my heuristic fails, then the parse fails, obviously. And I'm not talking about parsing the Boost library :) I'm talking about very simple C++, probably without templates, restricted to some minimum that Clang can handle in this case.

Recommended answer

Here is another solution, which I think will suit the OP better than fuzzy parsing.

When parsing, Clang maintains semantic information through the Sema part of the analyzer. When it encounters an unknown symbol, Sema falls back to ExternalSemaSource to get some information about that symbol. Through this hook, you could implement what you want.

Here is a quick example of how to set it up. It is not entirely functional (I'm not doing anything in the LookupUnqualified method) and you might need to investigate further, but I think it is a good start.

// Declares clang::SyntaxOnlyAction.
#include <clang/Frontend/FrontendActions.h>
#include <clang/Tooling/CommonOptionsParser.h>
#include <clang/Tooling/Tooling.h>
#include <llvm/Support/CommandLine.h>
#include <clang/AST/AST.h>
#include <clang/AST/ASTConsumer.h>
#include <clang/AST/RecursiveASTVisitor.h>
#include <clang/Frontend/ASTConsumers.h>
#include <clang/Frontend/CompilerInstance.h>
#include <clang/Rewrite/Core/Rewriter.h>
#include <llvm/Support/raw_ostream.h>
#include <clang/Sema/ExternalSemaSource.h>
#include <clang/Sema/Sema.h>
#include "clang/Basic/DiagnosticOptions.h"
#include "clang/Frontend/TextDiagnosticPrinter.h"
#include "clang/Basic/TargetOptions.h"
#include "clang/Basic/TargetInfo.h"
#include "clang/Basic/FileManager.h"
#include "clang/Basic/SourceManager.h"
#include "clang/Lex/Preprocessor.h"
#include "clang/Basic/Diagnostic.h"
#include "clang/AST/ASTContext.h"
#include "clang/Parse/Parser.h"
#include "clang/Parse/ParseAST.h"
#include <clang/Sema/Lookup.h>

#include <iostream>
using namespace clang;
using namespace clang::tooling;
using namespace llvm;

class ExampleVisitor : public RecursiveASTVisitor<ExampleVisitor> {
private:
  ASTContext *astContext;

public:
  explicit ExampleVisitor(CompilerInstance *CI, StringRef file)
      : astContext(&(CI->getASTContext())) {}

  virtual bool VisitVarDecl(VarDecl *d) {
    std::cout << d->getNameAsString() << "@\n";
    return true;
  }
};

class ExampleASTConsumer : public ASTConsumer {
private:
  ExampleVisitor visitor;

public:
  explicit ExampleASTConsumer(CompilerInstance *CI, StringRef file)
      : visitor(CI, file) {}
  virtual void HandleTranslationUnit(ASTContext &Context) {
    // This applies the visitor to the whole translation unit.
    visitor.TraverseDecl(Context.getTranslationUnitDecl());
  }
};

class DynamicIDHandler : public clang::ExternalSemaSource {
public:
  DynamicIDHandler(clang::Sema *Sema)
      : m_Sema(Sema), m_Context(Sema->getASTContext()) {}
  ~DynamicIDHandler() = default;

  /// \brief Provides last resort lookup for failed unqualified lookups
  ///
  /// If there is failed lookup, tell sema to create an artificial declaration
  /// which is of dependent type. So the lookup result is marked as dependent
  /// and the diagnostics are suppressed. After that it's an interpreter's
  /// responsibility to fix all these fake declarations and lookups.
  /// It is done by the DynamicExprTransformer.
  ///
  /// @param[out] R The recovered symbol.
  /// @param[in] S The scope in which the lookup failed.
  virtual bool LookupUnqualified(clang::LookupResult &R, clang::Scope *S) {
    DeclarationName Name = R.getLookupName();
    std::cout << Name.getAsString() << "\n";
    // IdentifierInfo *II = Name.getAsIdentifierInfo();
    // SourceLocation Loc = R.getNameLoc();
    // VarDecl *Result =
    //     // VarDecl::Create(m_Context, R.getSema().getFunctionLevelDeclContext(),
    //     //                 Loc, Loc, II, m_Context.DependentTy,
    //     //                 /*TypeSourceInfo*/ 0, SC_None, SC_None);
    // if (Result) {
    //   R.addDecl(Result);
    //   // Say that we can handle the situation. Clang should try to recover
    //   return true;
    // } else{
    //   return false;
    // }
    return false;
  }

private:
  clang::Sema *m_Sema;
  clang::ASTContext &m_Context;
};

// *****************************************************************************/

LangOptions getFormattingLangOpts(bool Cpp03 = false) {
  LangOptions LangOpts;
  LangOpts.CPlusPlus = 1;
  LangOpts.CPlusPlus11 = Cpp03 ? 0 : 1;
  LangOpts.CPlusPlus14 = Cpp03 ? 0 : 1;
  LangOpts.LineComment = 1;
  LangOpts.Bool = 1;
  LangOpts.ObjC1 = 1;
  LangOpts.ObjC2 = 1;
  return LangOpts;
}

int main() {
  using clang::CompilerInstance;
  using clang::TargetOptions;
  using clang::TargetInfo;
  using clang::FileEntry;
  using clang::Token;
  using clang::ASTContext;
  using clang::ASTConsumer;
  using clang::Parser;
  using clang::DiagnosticOptions;
  using clang::TextDiagnosticPrinter;

  CompilerInstance ci;
  ci.getLangOpts() = getFormattingLangOpts(false);
  DiagnosticOptions diagnosticOptions;
  ci.createDiagnostics();

  std::shared_ptr<clang::TargetOptions> pto = std::make_shared<clang::TargetOptions>();
  pto->Triple = llvm::sys::getDefaultTargetTriple();

  TargetInfo *pti = TargetInfo::CreateTargetInfo(ci.getDiagnostics(), pto);

  ci.setTarget(pti);
  ci.createFileManager();
  ci.createSourceManager(ci.getFileManager());
  ci.createPreprocessor(clang::TU_Complete);
  ci.getPreprocessorOpts().UsePredefines = false;
  ci.createASTContext();

  ci.setASTConsumer(
      llvm::make_unique<ExampleASTConsumer>(&ci, "../src/test.cpp"));

  ci.createSema(TU_Complete, nullptr);
  auto &sema = ci.getSema();
  sema.Initialize();
  DynamicIDHandler handler(&sema);
  sema.addExternalSource(&handler);

  const FileEntry *pFile = ci.getFileManager().getFile("../src/test.cpp");
  ci.getSourceManager().setMainFileID(ci.getSourceManager().createFileID(
      pFile, clang::SourceLocation(), clang::SrcMgr::C_User));
  ci.getDiagnosticClient().BeginSourceFile(ci.getLangOpts(),
                                           &ci.getPreprocessor());
  clang::ParseAST(sema,true,false);
  ci.getDiagnosticClient().EndSourceFile();

  return 0;
}

The idea and the DynamicIDHandler class come from the cling project, where unknown symbols are treated as variables (hence the comments and the commented-out code).
