Aerospike Design | Request Flow Internals | Resources


Question

Where can I find information about how a read/write request flows through the cluster once it is fired from the client API?

The Aerospike configuration doc ( http://www.aerospike.com/docs/reference/configuration ) mentions transaction queues, service threads, transaction threads and so on, but the architecture document does not discuss them. I want to understand how they work so that I can configure them accordingly.

Solution

From client to cluster node

In your application, a record's key is the 3-tuple (namespace, set, identifier). The key is passed to the client for all key-value methods (such as get and put).

The client then hashes the (set, identifier) portion of the key through RIPEMD-160, resulting in a 20B digest. This digest is the actual unique identifier of the record within the specified namespace of your Aerospike cluster. Each namespace has 4096 partitions, which are distributed across the nodes of the cluster.

The client uses 12 bits of the digest to determine the partition ID of this specific key. Using the partition map, the client looks up the node that owns the master partition corresponding to the partition ID. As the cluster grows, the cost of finding the correct node stays constant (O(1)), as it does not depend on the number of records or the number of nodes.
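The lookup described above can be sketched in a few lines of Python. This is only an illustration: the choice of which 12 bits are used (here, the low bits of the first two digest bytes read little-endian), the node names, and the round-robin partition map are assumptions, not taken from the answer.

```python
N_PARTITIONS = 4096  # from the text: each namespace has 4096 partitions


def partition_id(digest: bytes) -> int:
    """Derive a 12-bit partition ID from a 20-byte digest."""
    if len(digest) != 20:
        raise ValueError("expected a 20-byte RIPEMD-160 digest")
    # Assumption: the 12 bits are the low bits of the first two bytes,
    # read as a little-endian integer.
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)


# Hypothetical partition map: one master node per partition ID.
nodes = ["node-A", "node-B", "node-C"]
partition_map = [nodes[p % len(nodes)] for p in range(N_PARTITIONS)]


def master_node(digest: bytes) -> str:
    """A constant-time list index, independent of record or node count."""
    return partition_map[partition_id(digest)]


example_digest = bytes(range(20))  # made-up digest, not a real hash
print(partition_id(example_digest), master_node(example_digest))
```

Because the map is indexed directly by partition ID, adding nodes only changes the values stored in the map, not the cost of the lookup.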

The client converts the operation and its data into an Aerospike wire protocol message, then uses an existing TCP connection from its pool (or creates a new one) to send the message to the correct node (the one holding this partition ID's master replica).

Service threads and transaction queues

When an operation message comes in as a NIC transmit/receive queue interrupt, a service thread picks up the message from the NIC. What happens next depends on the namespace this operation is supposed to execute against. If it is an in-memory namespace, the service thread will perform all of the following steps. If it's a namespace whose data is stored on SSD, the service thread will place the operation on a transaction queue. One of the queue's transaction threads will perform the following steps.
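As a rough model of this hand-off, the sketch below uses Python's standard `queue` and `threading` modules. The namespace names, the `process` helper, and the single transaction thread are hypothetical; the real server is not written this way, the code only mirrors the routing decision described above.

```python
import queue
import threading

# Hypothetical namespaces and their storage type.
NAMESPACES = {"cache": "memory", "users": "ssd"}

transaction_queue = queue.Queue()
handled = []  # (role of the thread that did the work, operation)
handled_lock = threading.Lock()


def process(role, op):
    """Stand-in for the index lookup and read/write steps that follow."""
    with handled_lock:
        handled.append((role, op))


def service_thread_receive(namespace, op):
    """What a service thread does with a message it picked up."""
    if NAMESPACES[namespace] == "memory":
        process("service", op)      # in-memory: do all the work inline
    else:
        transaction_queue.put(op)   # data on SSD: hand off to the queue


def transaction_thread():
    """Drain the transaction queue until a None sentinel arrives."""
    while True:
        op = transaction_queue.get()
        if op is None:
            break
        process("transaction", op)


worker = threading.Thread(target=transaction_thread)
worker.start()
service_thread_receive("cache", "get k1")
service_thread_receive("users", "get k2")
transaction_queue.put(None)  # shut the worker down
worker.join()
print(handled)
```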

Primary index lookup

Every record has a 64B metadata entry in the in-memory primary index. The primary index is organized as a collection of sprigs per partition, with each sprig implemented as a red-black tree.

The thread (either a transaction thread or the service thread, as mentioned above) finds the partition ID from the record's digest, and skips to the correct sprig of the partition.
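A minimal sketch of this two-level lookup follows. Python has no built-in red-black tree, so a sorted list searched with `bisect` stands in for one. The number of sprigs per partition and the way digest bits select a partition and a sprig are assumptions made for illustration only.

```python
import bisect

N_PARTITIONS = 4096
SPRIGS_PER_PARTITION = 16  # hypothetical value


class Sprig:
    """Stand-in for a red-black tree: digests kept in sorted order."""

    def __init__(self):
        self.digests = []
        self.entries = []

    def insert(self, digest, meta):
        i = bisect.bisect_left(self.digests, digest)
        if i < len(self.digests) and self.digests[i] == digest:
            self.entries[i] = meta          # overwrite existing entry
        else:
            self.digests.insert(i, digest)
            self.entries.insert(i, meta)

    def find(self, digest):
        i = bisect.bisect_left(self.digests, digest)
        if i < len(self.digests) and self.digests[i] == digest:
            return self.entries[i]
        return None


def locate(digest):
    """Return (partition ID, sprig number) for a digest."""
    # Assumption: partition bits come from the first two bytes,
    # sprig bits from the third byte.
    pid = int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)
    sprig_no = digest[2] % SPRIGS_PER_PARTITION
    return pid, sprig_no


index = {}  # (partition ID, sprig number) -> Sprig, created lazily


def sprig_for(digest):
    return index.setdefault(locate(digest), Sprig())


d = bytes(range(20))  # made-up digest
sprig_for(d).insert(d, {"generation": 1})
print(locate(d), sprig_for(d).find(d))
```

Splitting each partition into several small trees keeps every tree shallow and means a lock on one sprig blocks only the records that share it.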

Exist, Read, Update, Replace

If the operation is an exists, a read, an update or a replace, the thread acquires a record lock, during which other operations wait to access the specific sprig. This is a very short-lived lock. The thread walks the red-black tree to find the entry with this digest. If the operation is an exists, and the metadata entry does exist, the thread will package the appropriate message and respond. For a read, the thread will use the pointer metadata to read the record from the namespace storage.

An update needs to read the record as described above, and then merge in the bin data. A replace is similar to an update, but it skips first reading the current record. If the namespace is in-memory, the service thread will write the modified record to memory. If the namespace stores its data on SSD, the merged record is placed in a streaming write buffer, pending a flush to the storage device. The metadata entry in the primary index is adjusted, updating its pointer to the new location of the record. Aerospike performs a copy-on-write for create/update/replace.

Updates and replaces also need to be communicated to the replica(s) if the replication factor of the namespace is greater than 1. After the record locking process, the operation will also be parked in the RW Hash (Serializer) while the replica write completes. This is where other transactions on the same record will queue up until they hit the transaction pending limit (AKA a hot key). The replica writes are handled by a different thread (rw-receive), releasing the transaction or service thread to move on to the next operation. When the replica writes complete, the RW Hash lock is released, and the rw-receive thread will package the reply message and send it back to the client.
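The copy-on-write behaviour can be illustrated with a toy model. Here an append-only Python list stands in for the streaming write buffer and device, and a dict stands in for the primary index; both names and the `write` signature are hypothetical.

```python
storage = []        # append-only stand-in for the write buffer / device
primary_index = {}  # digest -> offset in storage (the "pointer" metadata)


def read(digest):
    """Follow the index pointer to the current copy of the record."""
    return storage[primary_index[digest]]


def write(digest, bins, replace=False):
    """Create, update or replace a record without overwriting in place."""
    if replace or digest not in primary_index:
        record = dict(bins)                   # replace/create: no prior read
    else:
        record = {**read(digest), **bins}     # update: read, then merge bins
    storage.append(record)                    # new copy at a new location
    primary_index[digest] = len(storage) - 1  # repoint the index entry
    return primary_index[digest]


write(b"k", {"a": 1, "b": 2})
write(b"k", {"b": 3})                # update merges into the old bins
print(read(b"k"), storage[0])
write(b"k", {"c": 9}, replace=True)  # replace drops the old bins
print(read(b"k"))
```

The old copy is left untouched at its previous location; only the index pointer changes, which is why the update is safe to expose the moment the pointer is swapped.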
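The parking behaviour can be modelled as a dict of per-digest wait queues. The sketch below is an assumption-laden illustration: the class and method names, the `HotKeyError` exception, and the limit of 2 are invented here, and the real limit is a server configuration setting.

```python
from collections import deque

PENDING_LIMIT = 2  # hypothetical, kept small so the example is short


class HotKeyError(Exception):
    """Raised when too many transactions wait on the same record."""


class RWHash:
    def __init__(self, pending_limit=PENDING_LIMIT):
        self.pending_limit = pending_limit
        self.in_flight = {}  # digest -> deque of waiting transactions

    def park(self, digest, txn):
        """Return True if txn may start its replica write now,
        False if it was queued behind an in-flight write."""
        if digest not in self.in_flight:
            self.in_flight[digest] = deque()
            return True
        waiting = self.in_flight[digest]
        if len(waiting) >= self.pending_limit:
            raise HotKeyError(f"hot key, rejecting {txn}")
        waiting.append(txn)
        return False

    def replica_write_done(self, digest):
        """Release the entry; return the next waiting txn, or None."""
        waiting = self.in_flight[digest]
        if waiting:
            return waiting.popleft()  # this txn is now the in-flight one
        del self.in_flight[digest]
        return None


rw = RWHash()
print(rw.park(b"d", "t1"), rw.park(b"d", "t2"), rw.park(b"d", "t3"))
try:
    rw.park(b"d", "t4")
except HotKeyError as err:
    print(err)
```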

Create and Delete

If the operation is a new record being written, or a record being deleted, the partition sprig needs to be modified.

Like update/replace, these operations acquire the record-level lock and will go through the RW hash. Because they add or remove a metadata entry from the red-black tree representing the sprig, they must also acquire the index tree reduction lock. This process also happens when the namespace supervisor thread finds expired records and removes them from the primary index. A create operation will add an element to the partition sprig.

If the namespace stores its data on SSD, the create will load the record into a streaming write buffer, pending a flush to SSD, and ahead of the replica write. It will update the metadata entry in the primary index, adjusting its pointer to the new block.

A delete removes the metadata entry from the partition sprig of the primary index.

Summary

  • exists/read grab the record-level lock and hold it for the shortest amount of time. That is also the case for update/replace when the replication factor is 1.
  • update/replace also grab the RW hash lock when the replication factor is higher than 1.
  • create/delete also grab the index tree reduction lock.
  • For in-memory namespaces, the service thread does all the work, up to the point of a potential replica write.
  • For namespaces storing data on SSD, the service thread throws the operation onto a transaction queue, after which one of the queue's transaction threads handles things such as loading the record into a streaming write buffer for writes, up until the potential replica write.
  • The rw-receive thread deals with replica writes and with returning the message after an update/replace/create/delete write operation.
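The locking rules in this summary can be restated as a small lookup function. The function name and lock labels are made up for illustration, and treating create/delete as taking the RW hash lock only when the replication factor exceeds 1 is a reading of the text rather than something it states outright.

```python
OPERATIONS = {"exists", "read", "update", "replace", "create", "delete"}


def locks_for(op, replication_factor=1):
    """Return the locks an operation takes, in acquisition order."""
    if op not in OPERATIONS:
        raise ValueError(f"unknown operation: {op}")
    locks = ["record"]  # every operation takes the record-level lock
    is_write = op not in ("exists", "read")
    if is_write and replication_factor > 1:
        locks.append("rw_hash")  # parked while the replica write runs
    if op in ("create", "delete"):
        locks.append("index_tree_reduction")  # the sprig itself changes
    return locks


for op in ("read", "update", "create"):
    print(op, locks_for(op, replication_factor=2))
```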

