Implementing find node on torrent kademlia routing table


Problem Description


I have already reviewed a number of documents on this topic, but some things are still not entirely clear. For example, the BitTorrent document (http://www.bittorrent.org/beps/bep_0005.html) states:

The routing table is subdivided into "buckets" that each cover a portion of the space. An empty table has one bucket with an ID space range of min=0, max=2^160. When a node with ID "N" is inserted into the table, it is placed within the bucket that has min <= N < max. An empty table has only one bucket so any node must fit within it. Each bucket can only hold K nodes, currently eight, before becoming "full." When a bucket is full of known good nodes, no more nodes may be added unless our own node ID falls within the range of the bucket. In that case, the bucket is replaced by two new buckets each with half the range of the old bucket and the nodes from the old bucket are distributed among the two new ones. For a new table with only one bucket, the full bucket is always split into two new buckets covering the ranges 0..2^159 and 2^159..2^160.

It is somewhat different from other documents on the Kademlia routing table, where buckets are arranged according to the bit prefix of the node ID, but there is another confusing thing. When we reply to a "find node" request we have to find the 8 nodes closest to the requested ID using the XOR operation. I saw some implementations simply going through every item in the routing table, executing the XOR operation on each, and thus finding the 8 closest items. That seems like a waste of CPU to me.
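
For illustration, a brute-force lookup of that kind might look like the following Python sketch (the names routing_table and find_closest are made up here, not taken from any particular client):

# Minimal sketch of the brute-force approach: XOR every entry against the
# target and keep the k smallest distances. Assumes node IDs are integers
# and routing_table is any iterable of (node_id, contact) pairs.
def find_closest(routing_table, target_id, k=8):
    return sorted(routing_table, key=lambda entry: entry[0] ^ target_id)[:k]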

Everything is already in buckets. Even if we use the scheme suggested by the BitTorrent document, we can find the bucket that could potentially contain the requested node ID more quickly, simply by enumerating the buckets and checking the minimum and maximum of each. That bucket should then contain the close nodes, but they are the numerically closest nodes, not the XOR-closest nodes (as I understand it), which is somewhat different yet somewhat similar.

I ran a simple test using numbers from 0 to 99 where I wanted to find the 8 XOR-closest numbers, and they were in the proximity of the sought number but not right next to it. Now, thinking about our buckets, I guess it is possible that all node IDs in a bucket are the closest ones, with only minor exceptions. So, for example, if we take this bucket, take one from the left and one from the right, and search those for the XOR-closest node IDs, we'll find what we are looking for, and there is no point in going through ALL the nodes in the routing table.
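
Such an experiment can be reproduced with a few lines of Python (an illustrative sketch, not necessarily the exact test described above):

# Find the 8 numbers in 0..99 that are XOR-closest to a target value.
target = 42
candidates = [n for n in range(100) if n != target]
closest = sorted(candidates, key=lambda n: n ^ target)[:8]
print(closest)  # [43, 40, 41, 46, 47, 44, 45, 34] - near 42, but not simply 42±1..4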

Am I right or am I missing something?

Solution

It is somewhat different from other documents on the Kademlia routing table, where buckets are arranged according to the bit prefix of the node ID, but there is another confusing thing.

The bittorrent specification describes a routing table implementation that only approximates the one described in the kademlia paper. It is easier to implement but has some shortcomings.

So, for example, if we take this bucket, take one from the left and one from the right, and search those for the XOR-closest node IDs, we'll find what we are looking for, and there is no point in going through ALL the nodes in the routing table.

In both - the full, tree-like routing table implementation and the simplified BEP5-variant - each bucket can be thought of as having a CIDR-like prefix (see this SO answer) consisting of prefix bits that the bucket covers and a mask bit count.
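
As an illustration, such a prefix can be modeled as the significant leading bits plus a bit count; the following Python sketch (field names are made up, not taken from any specific implementation) is assumed by the later examples in this answer:

ID_BITS = 160

class Prefix:
    # CIDR-like bucket prefix: the top `bits` bits of `base` are significant,
    # everything below them is "don't care". `base` keeps its low bits at zero.
    def __init__(self, base, bits):
        self.base = base
        self.bits = bits

    def covers(self, key):
        # A key belongs to this bucket iff it shares the leading `bits` bits.
        shift = ID_BITS - self.bits
        return (key >> shift) == (self.base >> shift)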

In the BEP5-variant each bucket's prefix is simply derived from the array index and your node's own ID. In the tree-like table buckets have to track their prefix due to bucket split/merge operations.
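
For the BEP5-variant, one common layout is that bucket i holds contacts sharing exactly i leading bits with our own ID, while the deepest bucket holds everything sharing at least len(buckets)-1 bits. Under that assumption (it is only one possible layout), the prefix can be derived like this, building on the Prefix sketch above:

def bep5_bucket_prefix(own_id, index, num_buckets):
    # Deepest bucket: same leading bits as our own ID, no flipped bit.
    if index == num_buckets - 1:
        bits = num_buckets - 1
        shift = ID_BITS - bits
        return Prefix((own_id >> shift) << shift, bits)
    # Bucket `index`: the first `index` bits of our own ID, then the opposite bit.
    bits = index + 1
    shift = ID_BITS - bits
    base = (own_id >> shift) << shift   # copy the leading bits of own_id
    base ^= 1 << shift                  # flip bit `index` so it differs from us
    return Prefix(base, bits)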

Using those prefixes you can find the bucket that covers the target key.
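
A covering lookup is then a simple scan over the bucket prefixes (a sketch; it assumes the prefixes partition the ID space and that each bucket exposes a Prefix as above):

def find_covering_bucket(buckets, target_key):
    # Exactly one bucket should cover any given key.
    for bucket in buckets:
        if bucket.prefix.covers(target_key):
            return bucket
    raise LookupError("bucket prefixes should partition the ID space")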

The thing is that buckets don't have to be full, and if you want to send, let's say, 20 nodes in a response, a single bucket will not suffice.

So you have to traverse the routing table (either sorted based on your own node ID or by the natural distance) in ascending distance (XOR) order relative to the target key to visit multiple buckets.

Since the XOR distance metric folds at each bit-carry (XOR == carry-less addition) it does not map nicely to any routing table layout. In other words, visiting the nearest buckets won't do.
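
A two-line example makes the folding visible: numerically adjacent IDs can be far apart under XOR.

a, b = 0b0111, 0b1000   # 7 and 8 differ by 1 numerically...
print(a ^ b)            # ...but their XOR distance is 15, the maximum for 4 bits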

Here's my implementation for finding the N closest nodes to a specific target key from a tree-like routing table.

I figure that many people simply iterate over the whole routing table, because for regular nodes it will only contain a few dozen buckets at most, and a DHT node does not see much traffic, so it only has to execute this operation a few times per second. And if you implement it in a dense, cache-friendly data structure, then the lion's share of the cost might actually be the memory traffic, not the CPU instructions doing a few XORs and comparisons.

I.e. a full-table-scan is just easy to implement.


Let's assume we have a routing table with the following bit prefixes for each bucket. (The letters serve as convenient names.)

A 000... 
B 00100... 
C 0010100... 
D 0010101000... 
E 0010101001...
F 001010101... 
G 00101011... 
H 001011... 
I 0011... 
J 01... 
K 1... 

Now let's say we're looking for this target key:

T = 0010011010111111001111100101011000001010010100001100100010011100000000100100000111001101100110110110101100010100111010111001101111110101001001001000100000001001

Also, the buckets aren't entirely full or we need more entries than whatever fits into a single bucket, so we'll have to visit more than one bucket to obtain the desired amount.

Now, the first bucket to visit is fairly obvious: it's B, since its prefix covers the target key.

Since B's prefix is 5 bits long, any entry in that bucket will have a XOR distance to T of the form 00000???????..., i.e. 5 shared prefix bits.

B is the closest bucket to T, which means that there cannot be any routing table entries closer than the relative distance 00000.... Conversely that means that the minimal distance that any entry outside of B can have is 00001.... Which means the next-closest bucket must cover T xor 00001... -> 00101110101111110[...].

The bucket that covers this is H.
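
That step can be checked directly with Python integers (a worked sketch using the example values above):

# T from the example above as a 160-bit integer.
T = int("0010011010111111" "0011111001010110" "0000101001010000" "1100100010011100"
        "0000001001000001" "1100110110011011" "0110101100010100" "1110101110011011"
        "1111010100100100" "1000100000001001", 2)
min_outside_B = 1 << (160 - 5)         # distance 00001...: the last bit of B's 5-bit prefix
next_key = T ^ min_outside_B
print(format(next_key, "0160b")[:17])  # 00101110101111110 -> covered by H's prefix 001011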

Note that H is not adjacent to B.

Ultimately - assuming we'll have to visit 6 buckets - the order will look like this:

00100...      -> B
001011...     -> H
001010101...  -> F
0010101000... -> D
0010101001... -> E
00101011...   -> G

This seems fairly chaotic. But if we plot the prefix-to-target-key distances for each bucket visited it becomes a little more obvious:

00000...
000010...
000011000...
0000110010...
0000110011...
00001101...

So the algorithm is as follows (a Python sketch of this loop follows the list):

  1. find the initial bucket covering the target key
  2. XOR the bucket's prefix with the target key (zero-mask trailing bits)
  3. increment the distance by the least significant bit of that prefix
  4. XOR incremented distance with target key
  5. find next bucket covering the XORed key
  6. goto 2
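
Under the same assumptions as the earlier sketches (a Prefix with base, bits and covers(), and prefixes that partition the ID space), the loop could be written roughly like this; it is a sketch of the steps above, not a reference implementation:

def visit_buckets_by_distance(buckets, target, id_bits=160):
    # Yield buckets in ascending XOR distance from `target`.
    key = target
    while True:
        # steps 1 and 5: the bucket covering the current key
        bucket = next(b for b in buckets if b.prefix.covers(key))
        yield bucket
        shift = id_bits - bucket.prefix.bits
        # step 2: XOR the prefix with the target, zero-masking the trailing bits
        distance = ((bucket.prefix.base ^ target) >> shift) << shift
        # step 3: increment by the least significant bit of the prefix
        distance += 1 << shift
        if distance >= 1 << id_bits:
            return                      # wrapped around: every bucket has been visited
        # step 4: map the incremented distance back to a key
        key = target ^ distance

# Usage sketch: pull buckets in order until 8 candidate nodes are collected,
# then do the final XOR sort only on that small candidate set.
# candidates = []
# for bucket in visit_buckets_by_distance(buckets, target_key):
#     candidates.extend(bucket.nodes)
#     if len(candidates) >= 8:
#         break
# closest = sorted(candidates, key=lambda n: n.node_id ^ target_key)[:8]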

TL;DR: "just look one bucket left, one bucket right" is not sufficient. The correct algorithm is fairly involved; a linear scan over the whole table is easier to implement.

