What's the best way to turn a Subversion diff into JSON?

Question

I have a bunch of sed/Unix fu that I'm beginning to suspect isn't the best way to complete the task, given the variance in the lines coming out of 'svn diff' ...

svn diff -r 1:9 | 
expand | 
sed -e 's/^Index: \(.*\)/]}, { "index":"\1", /g' | 
sed -e 's/^--- \(.*\)/"from":"\1", /g' | 
sed -e 's/^+++ \(.*\)/"to":"\1", "chunks":[/g' | 
sed -e 's/^@@ \(.*\) @@/]},{"locn":"\1", "lines": [/g' | 
sed -e 's/^-\(.*\)/"-\1",/g' | 
sed -e 's/^+\(.*\)/"+\1",/g' | 
sed -e 's/^ \(.*\)/" \1",/g' | 
sed -e 's/^==============.*//g' | 
tr -d '\n' | 
sed -e 's/"chunks":\[\]},{/"chunks":\[{/g' | 
sed -e 's/^]}, \(.*\)/{"changes":[ \1]}]}]}/g' | 
sed -e 's/,\]}/]}/g' |
jshon

It reliably turns ...

Index: file1.txt
===================================================================
--- file1.txt   (revision 8)
+++ file1.txt   (revision 9)
@@ -1,3 +1,5 @@
+zzz
+
 aaa

 Efficiently Blah blah
@@ -7,3 +9,5 @@
 functional solutions.

 bbb
+
+www   

into ...

{
 "changes": [
  {
   "index": "file1.txt",
   "to": "file1.txt   (revision 9)",
   "from": "file1.txt   (revision 8)",
   "chunks": [
    {
     "locn": "-1,3 +1,5",
     "lines": [
      "+zzz",
      "+",
      " aaa",
      " ",
      " Efficiently Blah blah"
     ]
    },
    {
     "locn": "-7,3 +9,5",
     "lines": [
      " functional solutions.",
      " ",
      " bbb",
      "+",
      "+www"
     ]
    }
   ]
  }
 ]
}

But there's way more that could come out of 'svn diff' than I'm handling, and I wonder if it's foolish to carry on in this direction.

Recommended answer

I'd probably use the diff parser in libsvn_diff. I'm not sure if it's been wrapped by the bindings but it's likely that it works from the Python bindings.

Start with svn_diff_open_patch_file() and then iterate over the patches in the file by calling svn_diff_parse_next_patch() until it gives you NULL for the svn_patch_t.

Once you have the struct for each file it should be trivial to generate your JSON.

Fair warning: there may be bugs in that diff parser. It was written for svn patch, which I find buggy (though I think most of the bugs are in the patch application, not the parsing). On the other hand, doing it this way should mean that even if we adjust the patch format output, you should always have a good parser. And of course your bug reports (if you end up having any) could improve our parser.

The only other thing that occurs to me is that the API is not streamy (it works on files), which may not be what you want. Also, if you really want to go down the rabbit hole, you could drive the WC/RA layer directly and act as a receiver of the editor drive, generating your JSON output instead of a unified diff. But that's probably way more than what you want, because there's a ton of code just to handle all the different variations of diff target types (local to local, repo to repo, local to repo, repo to local).
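Because the parser wants a file rather than a stream, one way to keep using it at the end of a pipeline such as `svn diff -r 1:9 | ...` is to spool the piped input to a temporary file first. The following is only a minimal sketch of that idea: the helper name `spooled_patch_file` is hypothetical and is not part of Subversion or its bindings, and it uses nothing beyond the Python standard library.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def spooled_patch_file(stream):
    """Copy a binary stream to a temporary file and yield the file's path.

    Hypothetical helper (not part of the Subversion bindings). The file is
    closed before the path is yielded and removed when the with-block exits.
    """
    tmp = tempfile.NamedTemporaryFile(suffix=".patch", delete=False)
    try:
        with tmp:
            # Copy everything from the input stream into the temporary file.
            shutil.copyfileobj(stream, tmp)
        yield tmp.name
    finally:
        os.remove(tmp.name)
```

Inside the `with` block, the yielded path can be handed to svn_diff_open_patch_file() like any other patch file. On Python 3 the stream would typically be `sys.stdin.buffer`, so that the bytes are copied unchanged.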

Example

So I decided to play with the diff parser. I ended up writing the following Python script to use it and produce almost the same JSON output as your example. Note that the parser throws away the Index line, so I don't have that in my output.

I had to make one small change to the Python SWIG bindings to make this work (the hunks field of svn_patch_t wasn't being properly converted to a Python list), which I fixed in r1548379 on Subversion trunk (I suspect that patch will apply cleanly to 1.8).

Note that the documentation for svn_diff_hunk_readline_diff_text() says the first line will be the hunk header, but that doesn't seem to be true. You can, however, reconstruct the hunk header data you wanted with the svn_diff_hunk_get_{original,modified}_{start,length} functions.
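To illustrate the reconstruction, here is a small sketch that formats a full `@@ ... @@` header from the four values those functions return. The helper name `hunk_header` is made up for this example. It always writes both the start and the length, whereas real diff output may omit a length of 1.

```python
def hunk_header(orig_start, orig_len, mod_start, mod_len):
    """Format a unified-diff hunk header from the four range values.

    Hypothetical helper: the arguments are the integers returned by the
    svn_diff_hunk_get_{original,modified}_{start,length} functions.
    """
    return "@@ -%d,%d +%d,%d @@" % (orig_start, orig_len, mod_start, mod_len)
```

For the first hunk in the question's sample diff, the values 1, 3, 1, 5 give back the header line shown there.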

I didn't bother to mess with the property change parsing or the operation parsing (I don't think the support for this is really complete, but if you want it, I leave it as an exercise for you).

My apologies if this isn't the most Pythonic code. Part of that is because the wrapped C APIs aren't conducive to it, and part is that I'm simply not super comfortable with Python. I did it in Python since those bindings are closer to being complete in this respect.

You can run the following script with just: python scriptname.py patchfile

import sys
from svn import diff, core
import json

class UDiff:
  def convert_svn_patch_t(self, patch, pool):
    data = {}
    data['from'] = patch.old_filename
    data['to'] = patch.new_filename
    iter_pool = core.Pool(pool)
    chunks = []
    for hunk in patch.hunks:
      iter_pool.clear()
      chunk = {}
      orig_start = diff.svn_diff_hunk_get_original_start(hunk)
      orig_len = diff.svn_diff_hunk_get_original_length(hunk)
      mod_start = diff.svn_diff_hunk_get_modified_start(hunk)
      mod_len = diff.svn_diff_hunk_get_modified_length(hunk)
      chunk['locn'] = "-%d,%d +%d,%d" % \
                      (orig_start, orig_len, mod_start, mod_len)
      lines = []
      while True:
        text, eol, eof = diff.svn_diff_hunk_readline_diff_text(hunk,
                                                               iter_pool,
                                                               iter_pool)
        if eof:
          break
        lines.append("%s%s" % (text, eol))
      chunk['lines'] = lines
      chunks.append(chunk)
    data['chunks'] = chunks
    self.data = data

  def as_dict(self):
    return self.data

  def __init__(self, patch, pool):
    self.convert_svn_patch_t(patch, pool)

class UDiffAsJson:
  def __init__(self):
    self.pool = core.Pool()

  def convert(self, fname):
    patch_file = diff.svn_diff_open_patch_file(fname, self.pool)
    iter_pool = core.Pool(self.pool)
    changes = []
    while True:
      iter_pool.clear()
      patch = diff.svn_diff_parse_next_patch(patch_file,
                                             False, # reverse
                                             False, # ignore_whitespace
                                             iter_pool, iter_pool)
      if not patch:
        break
      udiff = UDiff(patch, iter_pool)
      changes.append(udiff.as_dict())
    data = {}
    data['changes'] = changes
    diff.svn_diff_close_patch_file(patch_file, iter_pool)
    return json.dumps(data, indent=1)  # one-space indent, as in the question's sample

if __name__ == "__main__":
  udiffasjson = UDiffAsJson()
  sys.stdout.write(udiffasjson.convert(sys.argv[1]))
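One caveat if this is run under Python 3: I am assuming, without having checked every bindings version, that the filenames and hunk lines may come back as bytes rather than str there, and `json.dumps` refuses bytes values. A hedged sketch of a normalizing helper follows. The name `to_text` is hypothetical, and UTF-8 is only a guess at the patch encoding.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a text string, decoding it first if it is bytes.

    Hypothetical helper. Undecodable bytes are replaced rather than raising,
    so one oddly encoded line does not abort the whole conversion.
    """
    if isinstance(value, bytes):
        return value.decode(encoding, errors="replace")
    return value
```

Applying it to patch.old_filename, patch.new_filename, and each text/eol pair before they are stored should keep the dictionary JSON-serializable.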
