Implementing find node on torrent kademlia routing table

Problem description

I have already reviewed a number of documents on this topic, but something is still not entirely clear to me. For example, the BitTorrent document (http://www.bittorrent.org/beps/bep_0005.html) states:

The routing table is subdivided into "buckets" that each cover a portion of the space. An empty table has one bucket with an ID space range of min=0, max=2^160. When a node with ID "N" is inserted into the table, it is placed within the bucket that has min <= N < max. An empty table has only one bucket so any node must fit within it. Each bucket can only hold K nodes, currently eight, before becoming "full." When a bucket is full of known good nodes, no more nodes may be added unless our own node ID falls within the range of the bucket. In that case, the bucket is replaced by two new buckets each with half the range of the old bucket and the nodes from the old bucket are distributed among the two new ones. For a new table with only one bucket, the full bucket is always split into two new buckets covering the ranges 0..2^159 and 2^159..2^160.
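
Read literally, that paragraph maps onto a fairly small amount of code. Below is a minimal sketch of just the insert/split rule, assuming node IDs are plain 160-bit integers and ignoring the good/questionable/bad node bookkeeping that BEP 5 also describes; the class and method names are my own, not anything from the spec:

K = 8  # bucket capacity, per BEP 5

class Bucket:
    def __init__(self, lo, hi):
        # the bucket covers node IDs with lo <= id < hi
        self.lo, self.hi, self.nodes = lo, hi, []

class RoutingTable:
    def __init__(self, own_id):
        self.own_id = own_id
        self.buckets = [Bucket(0, 1 << 160)]   # empty table: one bucket over the whole ID space

    def find_bucket(self, node_id):
        return next(b for b in self.buckets if b.lo <= node_id < b.hi)

    def insert(self, node_id):
        b = self.find_bucket(node_id)
        if node_id in b.nodes:
            return
        if len(b.nodes) < K:
            b.nodes.append(node_id)
        elif b.lo <= self.own_id < b.hi:
            # full bucket whose range contains our own ID: split in half and redistribute
            mid = (b.lo + b.hi) // 2
            left, right = Bucket(b.lo, mid), Bucket(mid, b.hi)
            for n in b.nodes:
                (left if n < mid else right).nodes.append(n)
            i = self.buckets.index(b)
            self.buckets[i:i + 1] = [left, right]
            self.insert(node_id)   # retry; the node now falls into one of the smaller buckets
        # else: bucket is full and does not contain our ID; BEP 5 then pings
        #       questionable entries and only replaces them if they turn out to be bad

Note that a bucket only ever splits when it contains our own ID, which is what keeps the table fine-grained near our own ID and coarse everywhere else.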

This is somewhat different from other documents on the kademlia routing table, where buckets are arranged according to the bit prefix of the node ID, but there is another confusing thing. When we reply to a "find node" request we have to find the 8 nodes closest to the requested one using the XOR operation. I have seen some implementations that simply go through every item in the routing table, executing the XOR operation, and thus find the 8 closest items. That seems too CPU-wasteful to me.

Everything is already in buckets. Even if we use the system suggested by the BitTorrent document, we can find the bucket that could potentially contain the requested node ID more quickly, simply by enumerating the buckets and checking their minimum and maximum bounds. That bucket should then potentially contain the close nodes, but they are the value-closest nodes, not the XOR-closest nodes (as I understand it), which is somewhat different yet somewhat similar.

I ran a simple test using numbers from 0 to 99 in which I wanted to find the 8 XOR-closest numbers; they were in the proximity of the sought number but not right next to it. Now, thinking about our buckets, I guess it is possible that all node IDs in the bucket are the closest ones, with minor exceptions. So, for example, if we take this bucket, take one bucket from the left and one from the right, and search those for the XOR-closest node IDs, we will find what we are looking for, and there is no point in going through ALL nodes in the routing table.
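
That 0-to-99 experiment is easy to reproduce. A throwaway sketch (the target value 37 is just an arbitrary pick of mine) shows that the XOR-closest values do cluster around the target but are not simply the numerically adjacent ones:

target = 37
closest = sorted(range(100), key=lambda n: n ^ target)[:8]
print(closest)  # [37, 36, 39, 38, 33, 32, 35, 34] - close to 37, but not the 8 numerically nearest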

Am I right or am I missing something?

Answer

"This is somewhat different from other documents on the kademlia routing table, where buckets are arranged according to the bit prefix of the node ID, but there is another confusing thing."

The bittorrent specification describes a routing table implementation that only approximates the one described in the kademlia paper. It is easier to implement but has some shortcomings.

"So, for example, if we take this bucket, take one bucket from the left and one from the right, and search those for the XOR-closest node IDs, we will find what we are looking for, and there is no point in going through ALL nodes in the routing table."

In both - the full, tree-like routing table implementation and the simplified BEP5-variant - each bucket can be thought of as having a CIDR-like prefix (see this SO answer) consisting of prefix bits that the bucket covers and a mask bit count.
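
To make the "prefix bits plus mask bit count" idea concrete, here is a small illustration of my own, assuming IDs are handled as 160-bit Python integers: a bucket prefix is just a (bits, length) pair, and the coverage test is a shift and a comparison.

ID_BITS = 160

def covers(prefix_bits, prefix_len, key):
    # True if `key` falls under the bucket's CIDR-like prefix
    return (key >> (ID_BITS - prefix_len)) == prefix_bits

# e.g. a bucket with prefix 00100 (5 bits) covers every key whose top 5 bits are 00100
assert covers(0b00100, 5, 0b00100 << (ID_BITS - 5))

Finding the bucket that covers a given key then amounts to scanning (or binary-searching) the bucket list with this test.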

In the BEP5-variant each bucket's prefix is simply derived from the array index and your node's own ID. In the tree-like table buckets have to track their prefix due to bucket split/merge operations.
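
As an illustration of the "derived from the array index and your own ID" part, here is one common array layout, sketched under my own assumption that bucket i holds contacts sharing exactly i leading bits with our ID and that the last bucket is the home bucket covering our ID itself (BEP 5 does not spell this layout out):

ID_BITS = 160

def bucket_prefix(own_id, index, num_buckets):
    # Returns (prefix_bits, prefix_len) for bucket `index` under the assumed layout:
    # bucket i = contacts whose IDs share exactly i leading bits with own_id.
    if index == num_buckets - 1:                   # home bucket: just our own prefix
        return own_id >> (ID_BITS - index), index
    shared = own_id >> (ID_BITS - index)           # the i bits the contacts share with us
    flipped = 1 ^ ((own_id >> (ID_BITS - index - 1)) & 1)
    return (shared << 1) | flipped, index + 1      # i shared bits plus the differing bit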

Using those prefixes you can find the bucket that covers the target key.

The thing is that buckets don't have to be full, and if you want to send, let's say, 20 nodes in a response, a single bucket will not suffice.

So you have to traverse the routing table (either sorted based on your own node ID or by the natural distance) in ascending distance (XOR) order relative to the target key to visit multiple buckets.

Since the XOR distance metric folds at each bit-carry (XOR == carry-less addition) it does not map nicely to any routing table layout. In other words, visiting the nearest buckets won't do.

Here's my implementation for finding the N closest nodes to a specific target key from a tree-like routing table.

I figure that many people simply iterate over the whole routing table, because for regular nodes it will only contain a few dozen buckets at most, and a DHT node does not see much traffic, so it only has to execute this operation a few times per second. And if you implement it in a dense, cache-friendly data structure, then the lion's share of the cost may actually be the memory traffic, not the CPU instructions doing a few XORs and comparisons.

I.e. a full-table-scan is just easy to implement.
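
For comparison, the brute-force variant really is only a couple of lines; a sketch, assuming the routing table can be flattened into an iterable of (node_id, contact) pairs:

import heapq

def closest_nodes_brute_force(entries, target, n=8):
    # scan every entry, XOR it against the target and keep the n smallest distances
    return heapq.nsmallest(n, entries, key=lambda e: e[0] ^ target)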

Let's assume we have a routing table with the following bit prefixes for each bucket (the letters just serve as convenient names):

A 000... 
B 00100... 
C 0010100... 
D 0010101000... 
E 0010101001...
F 001010101... 
G 00101011... 
H 001011... 
I 0011... 
J 01... 
K 1... 

Now let's say we're looking for this target key:

T 00100110101111110[...]

Also, the buckets aren't entirely full or we need more entries than whatever fits into a single bucket, so we'll have to visit more than one bucket to obtain the desired amount.

Now, the first bucket to visit is fairly obvious, it's B since its prefix covers the target key.

Since B's prefix is 5 bits long, any entry in that bucket will have an XOR distance to T of 00000???????..., i.e. 5 shared prefix bits.

B is the closest bucket to T, which means that there cannot be any routing table entries closer than the relative distance 00000.... Conversely, the minimal distance that any entry outside of B can have is 00001.... That means the next-closest bucket must cover T xor 00001... -> 00101110101111110[...].

The bucket that covers this is H.

Ultimately - assuming we'll have to visit 6 buckets - the order will look like this:

00100...      -> B
001011...     -> H
001010101...  -> F
0010101000... -> D
0010101001... -> E
00101011...   -> G

This seems fairly chaotic. But if we plot the prefix-to-target-key distances for each bucket visited it becomes a little more obvious:

00000...
000010...
000011000...
0000110010...
0000110011...
00001101...

So the algorithm is as follows (a rough code sketch follows the list):

  1. find the initial bucket covering the target key
  2. XOR the bucket's prefix with the target key (zero-mask the trailing bits)
  3. increment the distance by the least significant bit of that prefix
  4. XOR the incremented distance with the target key
  5. find the next bucket covering that XORed key
  6. go to 2
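
Here is a rough sketch of that loop, under the same assumptions as the earlier snippets: IDs are 160-bit integers and each bucket is a (prefix_bits, prefix_len, nodes) tuple. The helper names are mine, and a real implementation needs more care (deduplication, replacement caches, and so on), but the comments map onto the numbered steps:

ID_BITS = 160

def find_covering_bucket(buckets, key):
    # buckets: iterable of (prefix_bits, prefix_len, nodes) tuples
    return next((b for b in buckets
                 if (key >> (ID_BITS - b[1])) == b[0]), None)

def collect_closest(buckets, target, want=20):
    result = []
    bucket = find_covering_bucket(buckets, target)                  # step 1
    while bucket is not None and len(result) < want:
        prefix_bits, prefix_len, nodes = bucket
        result.extend(nodes)
        shift = ID_BITS - prefix_len
        distance = (prefix_bits ^ (target >> shift)) << shift       # step 2: prefix XOR target, trailing bits zeroed
        distance += 1 << shift                                      # step 3: add the prefix's least significant bit
        if distance >> ID_BITS:                                     # walked off the end of the keyspace: stop
            break
        bucket = find_covering_bucket(buckets, distance ^ target)   # steps 4 and 5, then repeat (step 6)
    # entries inside a bucket are unordered, so the caller still XOR-sorts
    # `result` against `target` and trims it to `want` entries
    return result

Walking the example table above with this loop (using T's leading bits) collects from B, H, F, D, E and G in exactly the order listed.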

TL;DR: "just look one bucket to the left, one bucket to the right" is not sufficient. The correct algorithm is fairly involved; a linear scan over the whole table is easier to implement.
