Searching and Handling git objects


Question

I'm trying to filter through the historical content of a file in my git repository. Some of the files contain a line with the string 'BEAM:A_BOOK', and the 7th comma-separated value on that line is the one I want to retrieve for further processing. Ideally, I'd end up with something like a dictionary that maps the SHA-1 hash of each commit to the A_BOOK value in that past version of the file.

Example of the first few lines of a file. Note that the value I'd hope to retrieve from this version of the file is '56.0':

# Date: 2018-12-21 01:49:16.888

PV,SELECTED,TIMESTAMP,STATUS,SEVERITY,VALUE_TYPE,VALUE,READBACK,READBACK_VALUE,DELTA,READ_ONLY

REA_EXP:LINE,0,1544047322.881066957,NO_ALARM,NONE,enum,"JENSA~[UDF;AT-TPC;GPL;JENSA]",,"---",,true

REA_BTS19:BEAM:OPTICSFILE,0,1541798820.065952460,NO_ALARM,NONE,string,"BTS19_test3.data",,"---",,true

REA_BTS19:BEAM:A_BOOK,0,1545322510.562031883,NO_ALARM,NONE,double,"56.0",,"---",,true

Ultimately, I'll extend this to retrieve a couple of values and do some math to perform more complicated filtering. More background: we store the atomic mass and charge values for the ion beams we deliver to nuclear physics experiments in text files under version control. These text files act as our 'save sets' and hold much more than the mass and charge info, since they also include the machine values we would restore if we wanted to run that beam again. My goal is to filter these files by the charge-to-mass ratio of the beams we ran with them.

So far, this seems to get me most of my information:

git grep 'BTS19:BEAM:A_BOOK' $(git rev-list --all) | grep RFQ-JENSA_Setpoint.snp

Which spits out something like this:

16eca44985214b790eb6ca8241ad86728b4fd3ae:RFQ-JENSA_Setpoints.snp:REA_BTS19:BEAM:A_BOOK,0,1531323944.085330133,NO_ALARM,NONE,double,"2.0",,"---",,true

6e585c905444f25e18edfe1eeb32ced2de72ed7c:RFQ-JENSA_Setpoints.snp:REA_BTS19:BEAM:A_BOOK,0,1531323944.085330133,NO_ALARM,NONE,double,"2.0",,"---",,true

bc202d5f21f9829fa3701ca636657ee1b0a73e25:RFQ-JENSA_Setpoints.snp:REA_BTS19:BEAM:A_BOOK,0,1531323944.085330133,NO_ALARM,NONE,double,"2.0",,"---",,true

etc...

However, I'd like to see something like:

<hash>:<Retrieved A_BOOK Value>

Or, based on the output I just showed, I'd hope to see something like this:

16eca44985214b790eb6ca8241ad86728b4fd3ae:2.0

6e585c905444f25e18edfe1eeb32ced2de72ed7c:2.0

bc202d5f21f9829fa3701ca636657ee1b0a73e25:2.0

etc...

And eventually include some math to show something more meaningful:

<hash>:<Retrieved Q_BOOK Value>/<Retrieved A_BOOK Value>

Is there a better way to go about this? What's a good way to retrieve this information?

Thanks!

Recommended answer

Given that you're interested in a particular file within each revision, consider adding -- <pathspec> to the git grep invocation. That is, instead of:

git grep 'BTS19:BEAM:A_BOOK' $(git rev-list --all) | grep RFQ-JENSA_Setpoint.snp

you could start with:

git grep 'BTS19:BEAM:A_BOOK' $(git rev-list --all) -- RFQ-JENSA_Setpoint.snp

You will still get the lines, but faster, since git grep can skip every file whose path does not match RFQ-JENSA_Setpoint.snp. (Note that a <pathspec> is not a regular expression. In your second grep the unescaped . matches any character, so it would also accept file names such as RFQ-JENSA_SetpointXsnp and RFQ-JENSA_SetpointYsnp; to allow those with a pathspec you'd have to write -- 'RFQ-JENSA_Setpoint?snp' here. I'm guessing your second grep was more permissive than you intended. REs are more expressive in general than path globs, but for this particular case, even if you really did mean "any character", glob has ? to allow that.)
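To make the pathspec-versus-regex point concrete, here is a minimal sketch. It builds a throwaway repository with two made-up file names (a.snp and aXsnp, both hypothetical and unrelated to the real save sets) and runs git grep with a literal pathspec and with a ? glob:

```shell
# Throwaway repository, used only to illustrate pathspec matching.
demo=$(mktemp -d) && cd "$demo" && git init -q .

# Two hypothetical tracked files that differ only in the character before "snp".
printf 'x\n' > a.snp
printf 'x\n' > aXsnp
git add a.snp aXsnp

# In a pathspec, '.' is an ordinary character: only a.snp is searched.
literal=$(git grep -l x -- a.snp)

# The glob '?' stands for any single character: both files are searched.
globbed=$(git grep -l x -- 'a?snp')

echo "literal: $literal"
echo "globbed: $globbed"
```

The second call lists both files, which is roughly what the unescaped . in the original grep filter would have allowed.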

Complicating matters, you may find that in a large repository, $(git rev-list --all) produces enough strings to overflow the argv limit. (What that limit is on your system is not something I can guess.) In that case, you may need to pipe git rev-list --all through xargs:

git rev-list --all | xargs -I % git grep 'BTS19:BEAM:A_BOOK' % -- RFQ-JENSA_Setpoint.snp

Annoyingly, this spawns a separate git grep for each revision, which will slow you right back down. (If you have a BSD-style xargs, you can use -J instead of -I; alternatively, consider the GNU parallel command.)
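One portable way to keep the batching is to let xargs append the revisions to a small sh -c wrapper, which can then place them before the -- itself. This is a sketch rather than part of the original suggestion; the function name grep_all_revs and its two parameters are made up for illustration:

```shell
# Hypothetical helper: grep one path in every revision, letting xargs pack
# as many revisions into each `git grep` call as the argv limit allows.
#   $1 = pattern to search for
#   $2 = pathspec of the file of interest
grep_all_revs() {
    git rev-list --all |
        xargs sh -c '
            pattern=$1
            path=$2
            shift 2          # what remains in "$@" is one batch of revisions
            git grep -e "$pattern" "$@" -- "$path"
        ' sh "$1" "$2"
}
```

Inside a repository it would be called as grep_all_revs 'BTS19:BEAM:A_BOOK' RFQ-JENSA_Setpoint.snp. Two caveats: a batch with no matches makes git grep, and therefore xargs, exit non-zero, and if only the commits that actually changed the file matter, git rev-list --all -- <path> shortens the revision list before it reaches xargs.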

To break these up and extract the 7th comma-separated value, consider replacing the first : with , and using awk:

... | sed 's/:/,/' | awk -F, '{print $1 ":" $8}'

although if you need proper CSV quote handling, a separate tool is probably more appropriate. (Given your example, this would print <hash>:"2.0", with the quotes included.)
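If the quotes are unwanted and the earlier fields never contain commas (an assumption that holds for the sample rows shown in the question), awk can strip them itself. The sketch below also shows one possible shape for the Q/A ratio from the question. Both function names are hypothetical, and the PV suffix :Q_BOOK is assumed, since the question does not show that line:

```shell
# Hypothetical filter: reads `git grep` output (<hash>:<file>:<csv row>) on
# stdin and prints <hash>:<value> with the double quotes removed.
hash_and_value() {
    sed 's/:/,/' | awk -F, '{ v = $8; gsub(/"/, "", v); print $1 ":" v }'
}

# Hypothetical filter: expects lines for both PVs on stdin and prints
# <hash>:<Q_BOOK / A_BOOK>. The ":Q_BOOK" suffix is an assumption.
charge_mass_ratio() {
    sed 's/:/,/' | awk -F, '
        { v = $8; gsub(/"/, "", v) }      # unquoted 7th CSV value of this row
        $2 ~ /:A_BOOK$/ { a[$1] = v }     # remember A per commit hash
        $2 ~ /:Q_BOOK$/ { q[$1] = v }     # remember Q per commit hash
        END {
            for (h in a)
                if ((h in q) && a[h] + 0 != 0)
                    print h ":" q[h] / a[h]
        }
    '
}
```

The first filter would go at the end of the pipeline in place of the sed and awk stages above; the second needs the git grep to search for both PVs, for example with two -e patterns. This is still not real CSV parsing, so a quoted field containing a comma would shift the columns.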
