Accidentally committed sensitive information - GitLab


Question

I accidentally committed a file with sensitive data. I need to update that file by removing the sensitive data and ensure the older version doesn't show up in the history.

I understand that those who have the repo cloned locally will still have access to it. But once they pull the latest, can it be setup in a way that they will not see the sensitive data moving forward or will not be able to see it in the logs?

Recommended answer

While GitLab is not generally as public as GitHub, the general rules about data apply here: if you've given sensitive / secret data to someone who cannot be trusted, your secret is already out and you should stop depending on it.

That means the key question is not—or at least, not yet—"how do I convince GitLab to forget my secrets" but rather "do I completely, totally trust both the GitLab server(s) and everyone else that has had access to those server(s) all this time?" If the answer is "no" you must stop depending on this secret anyway.

That said, here are rules about how Git itself stores the data. Assuming your GitLab server(s) is/are using only Git (and not some additional things built atop them that may add yet more ways to access the data that provide even more ways for your sensitive / secret data to leak), all you have to do is convince the GitLab servers to do the same thing you would do in your own Git.

Git's underlying storage model is that a repository is a collection of what Git calls objects. Each object has a unique hash ID, and is one of four types: blob, tree, commit and annotated tag. A blob is, roughly, file data. If the sensitive / secret data are inside a file, they are actually inside a blob object. A tree pairs up (well, more than pairs, but let's use that for now¹) each file's name with its blob hash ID, so if the file's name is the sensitive / secret data, your secret is actually inside a tree object. A commit object contains your name, email address, time stamp, log message, and the hash ID of some previous or parent commit(s), along with the hash ID of the tree that holds the files that make up the snapshot that is that commit. An annotated tag object holds much the same as a commit except that instead of a tree object, it usually has the hash ID of a commit; this is where one usually stores a PGP signature marking some particular commit as "blessed" and, say, called version 2.3.4 or whatever.
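
To make "a blob is file data with a hash ID" concrete, here is a small sketch that computes a blob's hash ID by hand. It assumes a repository using SHA-1 object names and that the `sha1sum` utility is installed; the file content `hello` is made up for illustration. Git hashes the header `blob <size>`, a NUL byte, and then the raw content, so the same content always gets the same blob ID, whichever commit or file name refers to it.

```shell
# Hypothetical file content; Git sees it as the 6 bytes "hello\n".
content='hello'
size=$(( ${#content} + 1 ))        # + 1 for the trailing newline

# A blob's ID is the SHA-1 of: "blob <size>" NUL <content>
blob_id=$( { printf 'blob %d\0' "$size"; printf '%s\n' "$content"; } \
           | sha1sum | cut -d' ' -f1 )

echo "$blob_id"
```

Inside a repository, `printf 'hello\n' | git hash-object --stdin` should report the same ID, which is why removing a secret means getting rid of that one blob object everywhere it is referenced.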

Assuming your secrets are in one particular file, whose name itself is not secret, your goal at this point is to cause your Git to stop using the blob that holds that particular file's data. To do so, you must cause the object itself to become unreferenced, and then use git gc to make Git physically remove the unreferenced object. At this point, a long aside into reachability in general is useful, but I'll outsource it to Think Like (a) Git. Let's just say here that in general, right after you've accidentally committed some secret file, the way that Git finds the commit object is using a branch name:

... <-F <-G <-H   <--master

The name master contains the hash ID of commit H. Commit H contains the hash ID of its parent commit, commit G, so for Git to find commit G, it starts by reading the name master (which produces hash ID H) and then reading the commit object from the database (which produces one tree object and one parent commit hash, G, along with the log message and your name and email address and so on), throws out all but the hash of G, and then reads the actual commit object G from the database. If you have asked Git to get some particular file—or more precisely, that file's content—from commit G, it then uses G's tree to find the hash ID of the blob containing that file, then gets the blob object from the database, and now Git has the content.
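
The walk from name to commit to tree to blob can be replayed by hand. The following is only a sketch in a throwaway repository: the file name `notes.txt`, the commit messages, and the `demo_commit` helper are all invented for the demonstration.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master      # make sure the branch is named master

# Hypothetical helper so the demo does not depend on your Git identity settings.
demo_commit() {
  git -c user.name=Demo -c user.email=demo@example.com \
      -c commit.gpgsign=false commit -q -m "$1"
}

echo 'version G' > notes.txt && git add notes.txt && demo_commit G
echo 'version H' > notes.txt && git add notes.txt && demo_commit H

h=$(git rev-parse master)                    # the name master yields H's hash ID
g=$(git rev-parse "${h}^")                   # H's parent hash ID is G
tree_g=$(git rev-parse "${g}^{tree}")        # G's tree object
blob_g=$(git rev-parse "${g}:notes.txt")     # the blob that G's tree lists for notes.txt
content=$(git cat-file -p "$blob_g")         # the file's content as of G

git cat-file -p "$h"                         # shows H's tree, parent, author, and message
echo "$content"
```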

So, suppose your secret data are in a blob attached to a tree attached to commit H, and those same data are not in any other file—so that no tree attached to any other commit will have the hash ID of that blob. Then, to make H itself unreferenced, just make the name master point to G instead of H:

git checkout master
git reset --hard HEAD~1

Now you have:

...--E--F--G   <-- master
            \
             H   [abandoned]

But while H has no obvious name holding its hash ID, we're not yet done: git gc won't—at least not yet—remove H, and here's where things start to get complicated.
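
A quick way to see that the reset alone is not enough is to try it in a scratch repository. This is a sketch only: the file names, the fake secret, and the `demo_commit` helper are assumptions made up for the demo.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master

# Hypothetical helper so the demo does not depend on your Git identity settings.
demo_commit() {
  git -c user.name=Demo -c user.email=demo@example.com \
      -c commit.gpgsign=false commit -q -m "$1"
}

echo 'public notes' > notes.txt && git add notes.txt && demo_commit G
echo 'not-a-real-password' > secret.txt && git add secret.txt && demo_commit H

h=$(git rev-parse master)                 # hash ID of the bad commit H
blob=$(git rev-parse master:secret.txt)   # hash ID of the blob holding the secret

git reset -q --hard HEAD~1                # master now names G; H is abandoned
git gc -q                                 # an ordinary gc, with default grace periods

git cat-file -t "$h"                      # H is still in the object database
leaked=$(git cat-file -p "$blob")         # and so is the secret blob
echo "$leaked"
```

Anyone who knows (or can find) the hash ID can still read the secret at this point.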

If there are valuable files in H, we can push H aside, using git commit --amend, to make a new commit I whose parent is G instead of H, and have master point to I:

... edit files, git add, git commit --amend ...

giving:

             H   [abandoned]
            /
...--E--F--G--I   <-- master

¹ Technically, each tree entry has:

  • the entry's mode, a text string like 100755 or 100644. The string is 40000 if the entry is for a sub-tree.
  • a string of bytes holding the file's name, generally in UTF-8 encoding
  • the hash ID that goes with the entry

(The mode and name are separated by a space, and the name is terminated by an ASCII NUL, while the hash ID is encoded in 20 binary bytes. This is going to have to change when Git switches to SHA-256. I don't think the new format is as-yet decided, but it could be as simple as, say, using a mode of 0n where n is a version number, as the mode is in octal with leading zeros suppressed, so no existing tree will have 01 as a mode. Or, perhaps it might be a NUL byte followed by a version number, since that too is currently an invalid tree entry.) Hence for sub-directories, the tree just lists sub-trees, and for regular files there are two values plus a hash. For symlinks, the hash ID is still that of a blob, but the blob's content is the target of the symbolic link; and for gitlinks for submodules, the hash ID is that of the commit Git should git checkout in the submodule.

The part of Git that does remember H for you, even after you git reset it away, is what Git calls reflogs. A reflog remembers the previous values of a reference. That is, the branch name master might point to H right now, before we git reset it. Then it points to G or I right now, after we use git reset --hard or git commit --amend to discard commit H. But it used to point to H, so H's hash ID is in the reflog for the name master.

The @{1} or @{yesterday} syntax is how you tell Git to look up these reflog values. Writing master@{1} tells your Git: look in my master reflog, and get me the immediately-previous value of master. The fact that this entry exists will make your Git retain commit H which will make your Git retain the blob containing the secret.

There are in fact at least two reflogs containing the hash ID of commit H: one for master, in master@{1}, and one for HEAD itself. So if you are to convince your Git to really discard commit H, and hence discard the tree for H, and hence discard any blobs unique to the tree for H, you must make these reflog entries go away.
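
The two reflog entries are easy to inspect. As before, this is a sketch in a throwaway repository with invented file names and a made-up `demo_commit` helper; it assumes reflogs are enabled, which is the default in a normal (non-bare) clone.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master

# Hypothetical helper so the demo does not depend on your Git identity settings.
demo_commit() {
  git -c user.name=Demo -c user.email=demo@example.com \
      -c commit.gpgsign=false commit -q -m "$1"
}

echo 'public notes' > notes.txt && git add notes.txt && demo_commit G
echo 'not-a-real-password' > secret.txt && git add secret.txt && demo_commit H
h=$(git rev-parse master)

git reset -q --hard HEAD~1                     # abandon H

from_branch_reflog=$(git rev-parse 'master@{1}')   # previous value of master
from_head_reflog=$(git rev-parse 'HEAD@{1}')       # previous value of HEAD

git reflog show master                         # lists the entries, newest first
echo "$from_branch_reflog $from_head_reflog"
```

Both lookups should report H's hash ID, which is what keeps H, its tree, and the secret blob alive.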

Normally, they go away on their own, generally after about 30 days. This happens because each reflog entry has a time-stamp as well, and git reflog expire will expire, and remove, old reflog entries based on this time-stamp versus the current time on your computer. The master git gc command runs git reflog expire for you, and sets it up to expire unreachable commits² in 30 days by default. (Reachable commits get 90 days by default.) So on your own Git, you would need to run:

git reflog expire --expire-unreachable=now --all

to tell your Git: Find all unreachable commits like H and expire their reflog entries now.

² Technically, it's unreachable from the current value of the reference. That is, Git is not going to test a global reachability here, but rather do a somewhat simpler test: does this reflog entry point to a commit that is an ancestor of the commit to which the reference itself points right now?

Even after expiring the reflog entries from both HEAD and the branch name, you'll find that your own git gc does not immediately discard the blob object. The reason is that all Git objects have a grace period during which git gc won't prune them away. The default grace period is 14 days. This gives all Git commands some time during which they can create objects without worrying about them, as long as they finish all their work within that 14 day period by linking all those objects up into a commit or tag object or whatever, and making an appropriate reference name (such as a branch or tag name) record the hash ID of that object.

To make the blob you accidentally committed with H go away, then, you not only need to expire the unreachable reflog entries, but also tell Git to prune objects even if they're zero days old:

git prune --expire=now

This prune step is the part of git gc that actually removes the object, so by running git prune, you remove the need to run git gc. (git gc also runs the reflog expire and so on, but coordinates everything to make sure Git has these grace periods. Since we're bypassing all the grace periods, we just bypass git gc as well.)

Make sure no other Git commands are running when you do this, since they may be creating objects that they expect to persist for 14 days while they get their work done.
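
Put together, the local cleanup for a still-loose secret might look like the sketch below. It runs in a scratch repository so nothing real is at risk; the file names, the fake secret, and the `demo_commit` helper are invented, and it assumes the secret lives only in the most recent commit and has not been packed yet.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master

# Hypothetical helper so the demo does not depend on your Git identity settings.
demo_commit() {
  git -c user.name=Demo -c user.email=demo@example.com \
      -c commit.gpgsign=false commit -q -m "$1"
}

echo 'public notes' > notes.txt && git add notes.txt && demo_commit G
echo 'not-a-real-password' > secret.txt && git add secret.txt && demo_commit H
blob=$(git rev-parse master:secret.txt)

git reset -q --hard HEAD~1                          # 1. stop naming H
git reflog expire --expire-unreachable=now --all    # 2. drop the reflog entries for H
git prune --expire=now                              # 3. delete unreferenced loose objects

# git cat-file -e exits non-zero when the object does not exist.
if git cat-file -e "$blob" 2>/dev/null; then status=present; else status=gone; fi
echo "$status"
```

If the script reports `present` in a real repository, something else (another branch, a tag, a stash, or a pack file) is still holding on to the blob.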

If your secret is stored in what Git calls a loose object, the above steps suffice: the object will be completely gone, and:

git cat-file -t <hash-ID>

will no longer find the object at all. It's no longer available anywhere in this Git repository.

But not all objects are loose. Eventually, to save space, Git packs these loose objects into pack files. Objects stored inside pack files get compressed against other objects in the same pack file.³ In this case, if your secret data have become packed, it's possible to retrieve them from the pack file.

This usually doesn't happen quickly so it's rare to have a just-committed secret wind up in a pack file. But if it has happened, the only way to clean it up is to make Git re-pack all the existing pack files. That is, you would have Git explode the packs into their constituent loose objects, then toss the unwanted object, then build a new (usually single) pack file—or use a process that has that effect, at least. The Git command to rebuild the packs is git repack and it has a lot of options. I'm not going to go into any more detail here as I'm out of time.
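
For the packed case, one combination that should have that effect is `git repack -a -d` followed by a prune. This is an illustrative sketch, not the only option and not something the answer above prescribes: with `-a`, objects that are no longer referenced are left out of the new pack, and `-d` deletes the old packs. The scratch repository, file names, and `demo_commit` helper are invented for the demo.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master

# Hypothetical helper so the demo does not depend on your Git identity settings.
demo_commit() {
  git -c user.name=Demo -c user.email=demo@example.com \
      -c commit.gpgsign=false commit -q -m "$1"
}

echo 'public notes' > notes.txt && git add notes.txt && demo_commit G
echo 'not-a-real-password' > secret.txt && git add secret.txt && demo_commit H
blob=$(git rev-parse master:secret.txt)

git gc -q                                           # simulate the secret getting packed

git reset -q --hard HEAD~1                          # stop naming H
git reflog expire --expire-unreachable=now --all    # drop the reflog entries for H
git repack -a -d -q                                 # new pack with referenced objects only
git prune --expire=now                              # remove any loose leftovers

if git cat-file -e "$blob" 2>/dev/null; then status=present; else status=gone; fi
echo "$status"
```

Running `git gc --prune=now` after the reflog expiry is a commonly used shorthand for the last two steps.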

³ In thin packs, objects may be compressed against other objects in the repository that are not in the pack file, but thin packs are used only for fetch and push operations, after which they're "fattened up" by adding the missing bases back.

To deal with all of this, you need to be able to log into your GitLab server(s), as none of these maintenance Git commands (nor the BFG, see below) can be invoked via fetch or push. In particular, while you can use git push -f from your client to make the name master on the server no longer point to commit H, you cannot invoke git prune to make a loose object go away.
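
The limits of a force push can be demonstrated with a local bare repository standing in for the GitLab server. Everything here (the paths under a temporary directory, the file names, the `demo_commit` helper) is made up for the sketch; a real GitLab instance may also keep additional references of its own.

```shell
work=$(mktemp -d)
git init -q --bare "$work/server.git"          # stand-in for the server-side repository
git init -q "$work/client"
cd "$work/client" || exit 1
git symbolic-ref HEAD refs/heads/master

# Hypothetical helper so the demo does not depend on your Git identity settings.
demo_commit() {
  git -c user.name=Demo -c user.email=demo@example.com \
      -c commit.gpgsign=false commit -q -m "$1"
}

echo 'public notes' > notes.txt && git add notes.txt && demo_commit G
echo 'not-a-real-password' > secret.txt && git add secret.txt && demo_commit H
h=$(git rev-parse master)

git push -q "$work/server.git" master          # the accident reaches the server
git reset -q --hard HEAD~1                     # fix the local branch
git push -q -f "$work/server.git" master       # move the server's master back to G

server_master=$(git --git-dir="$work/server.git" rev-parse refs/heads/master)
server_has_h=$(git --git-dir="$work/server.git" cat-file -t "$h")
echo "$server_master $server_has_h"
```

The server's master now names G, yet the server still answers for H's hash ID, because nothing on the client side can trigger a prune over there.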

If and when you do log into the server, you can check whether reflogs are enabled for your repository there. If not, there's no need to do any reflog expiry. You can also see whether your object is loose or packed by looking into the .git/objects directory. If your blob hash ID is, say, 0123456789... it will live in a file named .git/objects/01/23456789.... Once it's unreferenced and pruned, the file will be gone and you will be done.
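
The mapping from hash ID to loose-object file is simple enough to script. The function name `loose_object_path` and the example hash are assumptions for illustration; on a server the repository is typically bare, so the Git directory is the repository directory itself rather than `.git`.

```shell
# Print the file where Git would keep a loose object:
#   <git-dir>/objects/<first two hex digits>/<remaining digits>
loose_object_path() {
  git_dir=$1
  id=$2
  rest=${id#??}                      # the hash ID without its first two characters
  printf '%s/objects/%s/%s\n' "$git_dir" "${id%"$rest"}" "$rest"
}

# Example with a made-up Git directory and the blob ID of "hello\n":
path=$(loose_object_path .git ce013625030ba8dba906f756967f9e9ca394464a)
echo "$path"

# If that file exists, the object is loose; if the object exists but the file
# does not, it has been packed.
if [ -f "$path" ]; then state=loose; else state=not-loose; fi
```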

You can avoid a lot of complications by using the BFG repo cleaner. BFG does not respect any of the grace periods anyway, since it has a different purpose. That also takes care of any pack file issues. Like the other method, this must be run on the server, and it has its own quirks (see the linked question and answers).
