How do I embed source into pdb, and have debugger(s) use it?

Question

NOTE: my main concern is C# targeting the CLR with regular MSIL, in case there's something that works for that but not in the more general case(s).

There was recently a release of the Sourcepack project which allows a user to rewrite the source paths in a pdb file to point at different locations. This is very useful when you have the source for the assembly, but don't want to try and get it into the exact same filesystem location(s) as when it was built.

http://lowleveldesign.wordpress.com/2011/08/26/sourcepack-released/

For open-source projects, using http://www.symbolsource.org/ as a way of making it simple for users of your project to get symbols and source is an excellent idea.

However, very often there are projects where either for legal or convenience reasons, using such an approach isn't very feasible. Also, the set of people that might be debugging the project may be relatively small or contained.

By default, the PDBs for a project include pointers to the files on disk (IIRC). Source indexing can then embed pointers to the source locations (for instance, in a version control system), and a source server uses those pointers to actually fetch the source.
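For reference, the source-indexing pointers are stored in a PDB stream named `srcsrv`, a small INI-like text block. The sketch below builds that text for an HTTP-hosted repository. `BASE_URL`, the paths, and the exact variable values are assumptions (modeled on what HTTP-based indexing tools commonly emit, so check them against the srcsrv documentation), and writing the text into a real PDB is a separate step not shown here (for example with the `pdbstr` utility from the Debugging Tools for Windows).

```python
from __future__ import annotations

# Hypothetical HTTP location serving raw files at one fixed revision.
BASE_URL = "https://example.com/repo/raw/abc123"


def build_srcsrv_stream(files: dict[str, str], base_url: str) -> str:
    """Build srcsrv stream text.

    files maps the path recorded at build time (what the PDB already
    contains) to the repository-relative path the server understands.
    """
    lines = [
        "SRCSRV: ini ------------------------------------------------",
        "VERSION=2",
        "SRCSRV: variables ------------------------------------------",
        # Names the "version control system"; HTTP-based indexers often use https.
        "SRCSRVVERCTRL=https",
        # %var2% is replaced by the second '*'-separated field of a file line.
        f"SRCSRVTRG={base_url}/%var2%",
        "SRCSRV: source files ---------------------------------------",
    ]
    for local_path, repo_path in files.items():
        # Field 1 (%var1%) must match the path stored in the PDB.
        lines.append(f"{local_path}*{repo_path}")
    lines.append("SRCSRV: end ------------------------------------------------")
    return "\n".join(lines) + "\n"
```

Each build would point the base URL at the exact revision it was built from. That per-build pointer is exactly the part that can go stale if the repository later moves.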

It seems like it could be simpler (for certain builds, like debug and/or internal-only ones) to just put the actual source into the pdb, effectively dereferencing the pointer currently written in the PDB. In theory, you could then skip the entire source server part and eliminate a few dependencies from the debug-time story. Whether to store the source compressed is largely orthogonal, but a first pass would probably store it uncompressed to make it simpler to implement for existing debuggers.

Since the PDB-matching-binary story is already very good, putting the source into the PDB would be even better than a source server pointer, since the pointer can break over time (source control system moves, or changes to a different system, or whatever), but the actual source sitting in the PDB is good 'forever'.

(This part was added after Tigran commented asking whether this would actually bring any benefit.)

The 'baseline' scenario to compare this against is a 'normal' debugging experience using a 'normal' source server instance today. In that scenario, (AFAIK) the debugging engine gets a pointer from the PDB (via an alternate stream) and then uses the registered source server(s) to try to get the source via that pointer. Since a given assembly typically includes multiple source files, there's either a single pointer that includes a base location or multiple pointers in the PDB (or something else), but that should be orthogonal to this discussion.
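To make the baseline concrete, here is a rough model of the lookup against a `srcsrv` stream: find the source-file line whose first field matches the path recorded in the PDB, then expand the `%varN%` references in `SRCSRVTRG`. This is a simplified sketch (the real srcsrv engine supports more variables, nested expansion, and running a `SRCSRVCMD` command), and the stream text and URL are hypothetical.

```python
import re
from typing import Optional

# Hypothetical stream as it might be stored in a source-indexed PDB.
STREAM = """\
SRCSRV: ini ------------------------------------------------
VERSION=2
SRCSRV: variables ------------------------------------------
SRCSRVTRG=https://example.com/repo/raw/abc123/%var2%
SRCSRV: source files ---------------------------------------
C:\\src\\Foo.cs*src/Foo.cs
SRCSRV: end ------------------------------------------------
"""


def resolve(stream: str, pdb_path: str) -> Optional[str]:
    """Return the expanded SRCSRVTRG for pdb_path, or None if not indexed."""
    variables: dict = {}
    section = None
    for line in stream.splitlines():
        if line.startswith("SRCSRV:"):
            # Headers look like "SRCSRV: variables ----"; keep the section word.
            section = line.split()[1]
            continue
        if section == "variables" and "=" in line:
            name, value = line.split("=", 1)
            variables[name.strip().upper()] = value.strip()
        elif section == "source":
            fields = line.split("*")
            # Compare case-insensitively, as Windows paths are.
            if fields[0].lower() == pdb_path.lower():
                target = variables.get("SRCSRVTRG")
                if target is None:
                    return None
                # Substitute %var1%, %var2%, ... with this line's fields.
                return re.sub(
                    r"%var(\d+)%",
                    lambda m: fields[int(m.group(1)) - 1],
                    target,
                    flags=re.IGNORECASE,
                )
    return None
```

The result is only as good as the URL baked into the stream, which is the dependency the proposal wants to remove.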

For a project where keeping the source hidden/inaccessible is desirable (most Microsoft products, for instance, including Windows, Office, Visual Studio, etc.), then having the PDB contain pointers is FAR superior to including actual source (even if it were encrypted). Such pointers are meaningless without the necessary network access and permissions, so such an approach means you can ship the PDB to anyone on the planet without worrying about them being able to access your source (worst-case, they get a glimpse into how your source tree is arranged, I would think).

However, there are 2 large sets of projects (and specifically, builds) where this 'hide the source' benefit doesn't exist.

The first are builds that are only used by people who have access to the source anyway. Builds done on your own machine that will never leave it are a great example: an attacker would need to read files from your filesystem anyway to get the source, so reading from one file (.cs) vs. another (.pdb) is a relatively small difference in terms of attack difficulty/vector. The same goes for builds pushed to a test/staging environment where the people who can access the pdb on the machine are the same as, or a subset of, the people who can access the source 'normally'.

The second are (somewhat obviously) open-source projects, where the source for the project is already open for everyone anyway, so there's no benefit to hiding the source from anyone.

Note that this could be relatively easily extended to include the source in an encrypted form instead (since we're already talking about having to store format/encoding data as well), but the added complexity of that would make such a scenario likely less useful than just using a 'normal' source server.

With the above descriptions out of the way, the potential benefits of allowing this include (but are not limited to :) these, which pop into my head at the moment:


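  • No need to deal with setting up source server support. It Just Works (IJW), at least if/when the debugger knows to look in the pdb.
    • That said, you could still have a fixed source server that's just a dummy: it extracts the source and feeds it back to the caller. Such a configuration would be identical for everyone (using localhost, for instance) and would still eliminate today's need to actually configure a source server.
  • No build-time perf hit: since the build already reads the source files and writes the pdb file, we'd only be changing what gets written into the pdb, without making network calls or reading data that isn't already in memory.
    • Until native build support for putting the source in there exists, this could be a simple post-build step, probably implemented at first as a small fork of the Sourcepack project, since it already does the work of reading/modifying PDB files :)
  • No dependency on the specific source control system. For instance, in the DVCS case, the PDB pointer might point at some random instance of git or mercurial, not necessarily one you have access to.
    • Nor on source server tooling that fetches that version from a source control server instance you actually do have access to (if that version even exists there); AFAIK such tooling doesn't exist yet.
  • No problem if the project moves from one host to another, for example between self-hosted, sourceforge, github, bitbucket, codeplex, code.google.com, etc.
  • Works even when the machine being debugged can't reach your source control server (for instance, if you're using a network KVM into a box to debug an issue, but the box has no network, or only a restricted connection, so it can't access your source control server).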
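The 'fixed' dummy source server idea could be prototyped along these lines: a tiny localhost HTTP server that hands back source text, so every machine could use the same `SRCSRVTRG` of the form `http://localhost:<port>/%var2%`. This is a sketch under assumptions: `EMBEDDED` is a hypothetical stand-in for sources already pulled out of a PDB, since actually reading embedded source from a real PDB isn't shown.

```python
import http.server
import threading
import urllib.parse

# Hypothetical stand-in for sources extracted from a PDB:
# repository-relative path -> file contents.
EMBEDDED = {
    "src/Foo.cs": "class Foo { }\n",
}


class EmbeddedSourceHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # "/src/Foo.cs" -> "src/Foo.cs"
        key = urllib.parse.unquote(self.path.lstrip("/"))
        text = EMBEDDED.get(key)
        if text is None:
            self.send_error(404, "not embedded")
            return
        body = text.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_server(port: int = 0) -> http.server.ThreadingHTTPServer:
    """Serve EMBEDDED on localhost; port 0 lets the OS pick a free port."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), EmbeddedSourceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Binding to 127.0.0.1 keeps the server local to the debugging machine, which fits the "same configuration for everyone" goal.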

NOTE: another approach would be including the source in the actual assembly (for instance, as a resource), but the pdb is a better choice (it's easy to ship a build without pdb's, and there's no normal runtime perf hit if the source is in the pdb, since the assembly has the same code and size, etc.)

On the surface, this kind of support doesn't seem like it would be too difficult to add, but I get the feeling that's because I don't know enough about the mechanics involved, rather than because it's actually simple to implement. :)

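My guess is:

1. Add a post-build step that works like Sourcepack, but instead of changing the pointer, replaces it with the actual source.
  • Depending on what the source server needs, this might require a prefix; otherwise, the actual source would live in some other alternate data stream and the pointer would be updated to something like source-in-pdb:ads-foo.cs. The prefix or pointer could include how the source file is stored (uncompressed, gzip, bzip2, etc., plus the file encoding).
• I'm not sure whether the source server API gets enough information to find the PDB's location, let alone whether it has permission to read its contents.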
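The idea of a prefix that records how the source is stored could look something like this. It is a made-up container layout (a one-line ASCII header naming the compression and text encoding, followed by the payload), not any existing PDB format; the method names are hypothetical.

```python
import bz2
import gzip

# Hypothetical compression methods a prefix could name: (compress, decompress).
_COMPRESSORS = {
    "none": (lambda b: b, lambda b: b),
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
}


def pack_source(text: str, compression: str = "none", encoding: str = "utf-8") -> bytes:
    """Encode and compress text, prepending a header line such as 'gzip;utf-8'."""
    compress, _ = _COMPRESSORS[compression]
    header = f"{compression};{encoding}\n".encode("ascii")
    return header + compress(text.encode(encoding))


def unpack_source(blob: bytes) -> str:
    """Read the header back and reverse the steps applied by pack_source."""
    # The header never contains a newline, so the first one ends it.
    header, _, payload = blob.partition(b"\n")
    compression, encoding = header.decode("ascii").split(";")
    _, decompress = _COMPRESSORS[compression]
    return decompress(payload).decode(encoding)
```

Defaulting to "none" matches the "first pass stores it uncompressed" suggestion, while leaving room to add compression later without changing readers that understand the header.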



Sanity check?

With the babble above out of the way, the questions are really:


• Does this kind of thing already exist? (and if so, please provide pointers!)
• Assuming it doesn't exist yet, does the above make sense as a first-pass implementation? Are there pitfalls or complexities the above skips over?
• Assuming "no" and "yes" for the above, is there an existing project that makes sense in terms of taking this on (i.e., it's close to or within their existing scope)?

Answer

I've read over this and wanted to summarize my understanding for clarity:


Today the debugger uses the PDB to get the disk path and checksum of the file that was compiled to create a given section of an executable. The debugger then attempts to load the file using both the local disk and the available symbol server. Under this proposal, we would skip the middle man by just embedding the file itself in the PDB. Eureka, no more searching for source!

As someone who's done their fair share of digging for source code this way, I like the idea of having one package for all your debugging needs. There are a couple of facets to consider about this proposal, though.

The first is actually embedding the source code into the PDB. This is very doable. The PDB is essentially a lightweight file database. There is structure to what it encodes, but AFAIK you can put whatever you want into certain slots (local variable values/types, for example). Some slots may have size limits, but I'm sure you could invent an encoding scheme that breaks large files up into chunks.
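The chunking idea could be as simple as this: split the file's bytes into fixed-size pieces, tag each with its index and the total count, and reassemble them in order. The chunk size and the tuple layout are arbitrary assumptions for illustration; they don't reflect any actual PDB slot limit.

```python
from typing import List, Tuple

Chunk = Tuple[int, int, bytes]  # (index, total_chunks, payload)


def split_into_chunks(data: bytes, chunk_size: int = 4096) -> List[Chunk]:
    """Break data into chunk_size pieces (a hypothetical per-slot limit)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # An empty file still gets one (empty) chunk so it round-trips.
    pieces = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    total = len(pieces)
    return [(index, total, piece) for index, piece in enumerate(pieces)]


def join_chunks(chunks: List[Chunk]) -> bytes:
    """Reassemble chunks in index order, rejecting missing or duplicate ones."""
    ordered = sorted(chunks, key=lambda c: c[0])
    total = ordered[0][1] if ordered else 0
    if [c[0] for c in ordered] != list(range(total)):
        raise ValueError("missing or duplicate chunks")
    return b"".join(c[2] for c in ordered)
```

Storing the total count in every chunk lets a reader detect truncation without a separate index record.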

The second facet is having the debugger actually load the file from the PDB instead of searching for it on disk. I'm not as familiar with that part of the debugger, but from what I understand it only uses 2 pieces of information to locate the file:


1. The path of the file on disk
2. The checksum of the file (used to disambiguate files with the same name)
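The path-plus-checksum lookup can be sketched like this: among candidate files with the recorded name, accept the one whose hash matches. Which hash a PDB records depends on the compiler and its settings (MD5, SHA-1 and SHA-256 all show up in practice), so the algorithm is a parameter here. The directory search is an assumption about how a lookup might work, not a description of what Visual Studio actually does.

```python
import hashlib
import ntpath
from pathlib import Path
from typing import Iterable, Optional


def find_matching_source(
    recorded_path: str,
    recorded_checksum: str,
    search_dirs: Iterable[Path],
    algorithm: str = "sha256",
) -> Optional[Path]:
    """Return the first file named like recorded_path whose hash matches."""
    # The PDB records a Windows path, so take the file name with ntpath.
    name = ntpath.basename(recorded_path)
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            digest = hashlib.new(algorithm, candidate.read_bytes()).hexdigest()
            # The checksum tells apart same-named files from other builds.
            if digest.lower() == recorded_checksum.lower():
                return candidate
    return None
```

Note that nothing in this lookup needs the PDB itself, which is why a server that only receives these two values can't read source embedded in the PDB.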

I'm fairly certain this is the only information it passes on to a symbol server. That makes it infeasible to implement this as a symbol server, because the server won't have access to the PDB (assuming, of course, that I'm right).

I dug around hoping to find a VS COM component you could override to intercept loading the file for a given path, but I couldn't find one.

One approach I think would be feasible, though:


1. Embed the source code in the PDB
2. Have a tool that both extracts the source to a known location and rewrites the PDB to point at that location
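The extraction half of that approach could look roughly like this: write each embedded file under a known root, keeping its relative directory layout, and return the original-path to new-path map that a PDB path-rewriting tool (such as Sourcepack) would then apply. Reading from and writing to real PDBs isn't shown; the `embedded` dict and its keys are hypothetical.

```python
import ntpath
from pathlib import Path
from typing import Dict


def extract_sources(embedded: Dict[str, bytes], root: Path) -> Dict[str, Path]:
    """Write embedded sources under root; return original path -> new path."""
    remap: Dict[str, Path] = {}
    for original, content in embedded.items():
        # Drop the drive ("C:") and keep the directory layout, so two files
        # with the same name in different folders don't collide.
        _, rest = ntpath.splitdrive(original)
        # Skip ".." and "." so an entry can't escape the root directory.
        parts = [p for p in rest.replace("\\", "/").split("/") if p not in ("", ".", "..")]
        target = root.joinpath(*parts)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        remap[original] = target
    return remap
```

Keeping the relative layout means the rewritten PDB paths stay recognizable, which helps when comparing against the original build tree.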

This wouldn't be quite what you want, though.
