Is it possible to sync multiple clients over a central server using just rsync and POSIX shell scripting?



The scenario

I have a file server that acts as a master storage for the files to sync, and I have several clients that each hold a local copy of the master storage. Each client may alter files from the master storage, add new ones or delete existing ones. I would like all of them to stay in sync as well as possible by regularly performing a sync operation, yet the only tool I have available everywhere for that is rsync, and I can only run script code on the clients, not on the server.

The problem

rsync doesn't perform a bi-directional sync, so I have to sync from server to client as well as from client to server. This works okay for files that merely changed, by running two rsync operations, but it fails when files have been added or deleted. If I don't use rsync with a delete option, clients can never delete files, as the sync from the server to the client restores them. If I use a delete option, then either the sync from server to client runs first and deletes all new files the client has added, or the sync from client to server runs first and deletes all new files other clients have added to the server.

The question

Apparently rsync alone cannot handle this situation, since it is only supposed to bring one location in sync with another. I surely need to write some code, but I can only rely on POSIX shell scripting, which seems to make achieving my goal impossible. So can it even be done with rsync?

The solution

What is required for this scenario are three sync operations and awareness of which files the local client has added/deleted since the last sync. This awareness is essential and establishes a state, which rsync doesn't have, as rsync is stateless; when it runs it knows nothing about previous or future operations. And yes, it can be done with some simple POSIX scripting.

We will assume three variables are set:

  1. metaDir is a directory where the client can persistently store files related to the sync operations; the content itself is not synced.

  2. localDir is the local copy of the files to be synced.

  3. remoteStorage is any valid rsync source/target (can be a mounted directory or an rsync protocol endpoint, with or w/o SSH tunneling).

After every successful sync, we create a file in the meta dir that lists all files in the local dir; we need it to track files getting added or deleted between two syncs. If no such file exists, we have never run a successful sync. In that case we just sync all files from remote storage, build such a file, and we are done:

filesAfterLastSync="$metaDir/files_after_last_sync.txt"

if [ ! -f "$filesAfterLastSync" ]; then
    rsync -a "$remoteStorage/" "$localDir"
    ( cd "$localDir" && find . ) | sed "s/^\.//" | sort > "$filesAfterLastSync"
    exit 0
fi

Why ( cd "$localDir" && find . ) | sed "s/^\.//"? Files need to be rooted at $localDir for rsync later on. If a file $localDir/test.txt exists, the generated output file line must be /test.txt and nothing else. Without the cd (i.e. running find with an absolute path), each line would contain /..abspath../test.txt, and without the sed it would contain ./test.txt. Why the explicit sort call? See further down.
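To see what the pipeline produces, here is a minimal self-contained sketch using a throw-away directory (the file names are purely illustrative):

```shell
# Build a small demo tree and run the same list-building pipeline.
demoDir=$( mktemp -d )
mkdir -p "$demoDir/sub"
touch "$demoDir/test.txt" "$demoDir/sub/nested.txt"

listing=$( ( cd "$demoDir" && find . ) | sed "s/^\.//" | sort )
printf '%s\n' "$listing"
# The lines are rooted at the demo dir: /sub, /sub/nested.txt, /test.txt
# (plus one empty line produced by "." itself), with no absolute prefix
# and no leading "./".

rm -rf "$demoDir"
```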

If that isn't our initial sync, we should create a temporary directory that auto-deletes itself when the script terminates, no matter which way:

tmpDir=$( mktemp -d )
trap 'rm -rf "$tmpDir"' EXIT

Then we create a file list of all files currently in local dir:

filesForThisSync="$tmpDir/files_for_this_sync.txt"
( cd "$localDir" && find . ) | sed "s/^\.//" | sort  > "$filesForThisSync"

Now why is there that sort call? The reason is that I need the file list to be sorted below. Okay, but then why not tell find to sort the list? That's because find does not guarantee to sort the same way as sort does (this is explicitly documented in the man page), and I need exactly the order that sort produces.
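As a concrete illustration (with made-up paths): a directory traversal can emit /a/b.txt before /a.txt, while a byte-wise sort puts /a.txt first, because "." collates before "/". The demo pins LC_ALL=C for a deterministic byte order; in the real script what matters is simply that the same sort, in the same locale, produces both lists:

```shell
# Order in which a traversal might emit two paths:
unsorted='/a/b.txt
/a.txt'

# Byte-wise collation ("." is 0x2E, "/" is 0x2F) reverses them:
sorted=$( printf '%s\n' "$unsorted" | LC_ALL=C sort )
printf '%s\n' "$sorted"
# → /a.txt
#   /a/b.txt
```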

Now we need to create two special file lists, one containing all files that were added since the last sync and one containing all files that were deleted since the last sync. Doing so is a bit tricky with just POSIX, but various possibilities exist. Here's one of them:

newFiles="$tmpDir/files_added_since_last_sync.txt"
join -t "" -v 2 "$filesAfterLastSync" "$filesForThisSync" > "$newFiles"

deletedFiles="$tmpDir/files_removed_since_last_sync.txt"
join -t "" -v 1 "$filesAfterLastSync" "$filesForThisSync" > "$deletedFiles"

By setting the delimiter to an empty string, join compares whole lines. Usually the output would contain all lines that exist in both files, but we instruct join to only output the lines of one file that cannot be matched with lines of the other file. Lines that only exist in the second file must be from files that have been added, and lines that only exist in the first file must be from files that have been deleted. And that's why I use sort above, as join can only work correctly if the lines were sorted by sort.
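Here is a small demonstration of the two join calls with hypothetical before/after lists. Note that join -t "" comparing whole lines relies on the behavior of common implementations such as GNU join; comm -13 / comm -23 on the same sorted files would be a strictly POSIX alternative:

```shell
# Hypothetical file lists: state after the last sync vs. state now.
before=$( mktemp )
after=$( mktemp )
printf '%s\n' /a.txt /b.txt /c.txt | sort > "$before"
printf '%s\n' /b.txt /c.txt /d.txt | sort > "$after"

added=$( join -t "" -v 2 "$before" "$after" )    # unpairable lines of file 2
removed=$( join -t "" -v 1 "$before" "$after" )  # unpairable lines of file 1
printf 'added: %s\nremoved: %s\n' "$added" "$removed"

rm -f "$before" "$after"
```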

Finally we perform three sync operations. First we sync all new files to the remote storage to ensure these are not getting lost when we start working with delete operations:

rsync -aum --files-from="$newFiles" "$localDir/" "$remoteStorage"

What is -aum? -a means archive, which means sync recursive, keep symbolic links, keep file permissions, keep all timestamps, try to keep ownership and group and some other (it's a shortcut for -rlptgoD). -u means update, which means if a file already exists at the destination, only sync if the source file has a newer last modification date. -m means prune empty directories (you can leave it out, if that isn't desired).

Next we sync from remote storage to local with deletion, to pick up all changes and file deletions performed by other clients, yet we exclude the files that have been deleted locally, as otherwise those would get restored, which we don't want:

rsync -aum --delete --exclude-from="$deletedFiles" "$remoteStorage/" "$localDir"

And finally we sync from local to remote storage with deletion, to update files that were changed locally and delete files that were deleted locally.

rsync -aum --delete "$localDir/" "$remoteStorage" 

Some people might think that this is too complicated and it can be done with just two syncs. First sync remote to local with deletion and exclude all files that were either added or deleted locally (that way we also only need to produce a single special file, which is even easier to produce). Then sync local to remote with deletion and exclude nothing. Yet this approach is faulty. It requires a third sync to be correct.

Consider this case: Client A creates FileX but hasn't synced yet. Client B also creates FileX a bit later and syncs at once. When client A now performs the two syncs above, FileX on the remote storage is newer and should replace FileX on client A, but that won't happen. The first sync explicitly excludes FileX; it was added on client A and thus must be excluded so the first sync doesn't delete it (client A cannot know that FileX was also added and uploaded to remote by client B). And the second sync only uploads to remote and excludes FileX, as the remote one is newer. After the sync, client A has an outdated FileX, despite the fact that a newer one exists on remote.

To fix that, a third sync from remote to local without any exclusion is required. So you would also end up with three sync operations, and compared to the three I presented above, I think the ones above are always at least as fast and sometimes even faster, so I would prefer them; however, the choice is yours. Also, if you don't need to support that edge case, you can skip the last sync operation; the problem will then resolve itself automatically on the next sync.

Before the script quits, don't forget to update our file list for the next sync:

 ( cd "$localDir" && find . ) | sed "s/^\.//" | sort > "$filesAfterLastSync"

Finally, --delete implies --delete-before or --delete-during, depending on your version of rsync. You may prefer another, explicitly specified delete operation.
