DynamoDB get_item to read 400KB data in milliseconds


Question

I have a DynamoDB table called events in which I store all user event details, such as product_view, add_to_cart, and product_purchase.

In this events table, I have some items whose size has reached the 400KB limit.
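
Note that 400KB is DynamoDB's hard limit for a single item (attribute names plus values). A rough client-side guard can warn before writes start failing; the sketch below is an assumption on my part (compact JSON length only approximates DynamoDB's real size formula, and the helper names are made up):

    import json

    # Rough guard (assumption: compact JSON length roughly tracks
    # DynamoDB's item-size formula of attribute-name + value bytes).
    DYNAMODB_ITEM_LIMIT_BYTES = 400 * 1024

    def rough_item_size_bytes(item: dict) -> int:
        return len(json.dumps(item, separators=(",", ":")).encode("utf-8"))

    def is_near_limit(item: dict, headroom: float = 0.9) -> bool:
        # True once the item uses more than 90% of the 400KB limit
        return rough_item_size_bytes(item) > headroom * DYNAMODB_ITEM_LIMIT_BYTES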

Problem:

        response = self._table.get_item(
            Key={
                PARTITION_KEY: <pk>,
                SORT_KEY: <sk>,
            },
            ConsistentRead=False,
        )

When I use the DynamoDB get_item method to access this item (400KB), it takes around 5 seconds to return the result.

I have already used DAX.

Goal

I want to read the 400KB item in less than 1 second.

Important information:

The data in DynamoDB is stored in this format:

{
 "partition_key": "user_id1111",
 "sort_key": "version_1",
 "attributes": {
  "events": [
   {
    "t": "1614712316",  
    "a": "product_view",   
    "i": "1275"
   },
   {
    "t": "1614712316",  
    "a": "product_add",   
    "i": "1275"
   },
   {
    "t": "1614712316",  
    "a": "product_purchase",   
    "i": "1275"
   },
    ...

  ]
 }
}

  • t is a timestamp
  • a may be product_view, product_add, or product_purchase
  • i is the product_id

As you can see in the item above, events is a list, and it is appended to with new events.
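
For illustration, appending a new event to that list would look something like this minimal sketch (the table and key names come from the question; the update expression and the event payload are my assumptions, not the asker's code):

    import boto3

    # Minimal sketch: append one new event to the nested
    # attributes.events list with list_append.
    table = boto3.resource("dynamodb").Table("events")

    new_event = {"t": "1614712316", "a": "product_view", "i": "1275"}

    table.update_item(
        Key={"partition_key": "user_id1111", "sort_key": "version_1"},
        # #a.#e aliases attributes.events to stay clear of reserved words
        UpdateExpression="SET #a.#e = list_append(#a.#e, :evt)",
        ExpressionAttributeNames={"#a": "attributes", "#e": "events"},
        ExpressionAttributeValues={":evt": [new_event]},
    )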

I have an item that has grown to 400KB because of the number of events in its events list.

I wrote a script to measure the read time; the results are given below.

    import datetime

    import boto3

    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('events')

    pk = "user_id1111"
    sk = "version_1"

    # Measure the wall-clock time of a single get_item call
    t_load_start = datetime.datetime.now()

    response = table.get_item(
        Key={
            "partition_key": pk,
            "sort_key": sk,
        },
        ReturnConsumedCapacity="TOTAL",
    )
    capacity_units = response["ConsumedCapacity"]["CapacityUnits"]

    t_load_end = datetime.datetime.now()
    seconds = (t_load_end - t_load_start).total_seconds()

    print(f"Elapsed time is::{seconds}sec and {capacity_units} capacity units")

This is the output I got:

      Elapsed time is::5.676799sec and 50.0 capacity units
      

Can anyone suggest a solution for this?

Answer

tl;dr: Increase your function's memory to at least 1024MB, see update 2 below.

I was curious, so I did some measurements. I created a script that creates a big boi item of pretty much exactly 400KB in a fresh table.

Then I test two reads from Python - one with the resource API and the other with the lower-level client - eventually consistent reads in both cases.

This is what I measured:

      Reading Big Boi from a Table Resource took 0.366508s and consumed 50.0 RCUs
      Reading Big Boi from a Client took 0.301585s and consumed 50.0 RCUs
      

If we extrapolate from the RCUs, the item that was read was about 50 * 2 * 4KB = 400KB in size (an eventually consistent read consumes 0.5 RCUs per 4KB).
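
As a quick sanity check of that arithmetic:

    # Eventually consistent reads cost 0.5 RCU per 4KB chunk,
    # so each consumed RCU corresponds to 8KB read.
    consumed_rcus = 50.0
    item_size_kb = consumed_rcus * 2 * 4
    print(item_size_kb)  # 400.0 -> matches the ~400KB item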

I ran it a few times locally from Germany against eu-central-1 (Frankfurt, Germany) and the highest latency I saw was about 900ms. (This is without DAX.)

That's why I think you should show us how you measure.

    from datetime import datetime

    import boto3

    TABLE_NAME = "big-boi-test"
    BIG_BOI_PK = "f0ba8d6c"

    TABLE_RESOURCE = boto3.resource("dynamodb").Table(TABLE_NAME)
    DDB_CLIENT = boto3.client("dynamodb")

    def create_table():
        DDB_CLIENT.create_table(
            AttributeDefinitions=[{"AttributeName": "PK", "AttributeType": "S"}],
            TableName=TABLE_NAME,
            KeySchema=[{"AttributeName": "PK", "KeyType": "HASH"}],
            BillingMode="PAY_PER_REQUEST"
        )

    def create_big_boi_item() -> dict:
        # based on calculations here: https://zaccharles.github.io/dynamodb-calculator/
        template = {
            "PK": {
                "S": BIG_BOI_PK
            },
            "bigBoi": {
                "S": ""
            }
        }  # This is 16 bytes

        # Pad the item to exactly 400KB
        big_boi = "X" * (1024 * 400 - 16)
        template["bigBoi"]["S"] = big_boi
        return template

    def store_big_boi():
        big_boi = create_big_boi_item()

        DDB_CLIENT.put_item(
            Item=big_boi,
            TableName=TABLE_NAME
        )

    def get_big_boi_with_table_resource():
        start = datetime.now()
        response = TABLE_RESOURCE.get_item(
            Key={"PK": BIG_BOI_PK},
            ReturnConsumedCapacity="TOTAL"
        )
        end = datetime.now()
        seconds = (end - start).total_seconds()
        capacity_units = response["ConsumedCapacity"]["CapacityUnits"]

        print(f"Reading Big Boi from a Table Resource took {seconds}s and consumed {capacity_units} RCUs")

    def get_big_boi_with_client():
        start = datetime.now()
        response = DDB_CLIENT.get_item(
            Key={"PK": {"S": BIG_BOI_PK}},
            ReturnConsumedCapacity="TOTAL",
            TableName=TABLE_NAME
        )
        end = datetime.now()
        seconds = (end - start).total_seconds()
        capacity_units = response["ConsumedCapacity"]["CapacityUnits"]

        print(f"Reading Big Boi from a Client took {seconds}s and consumed {capacity_units} RCUs")

    if __name__ == "__main__":
        # create_table()
        # store_big_boi()
        get_big_boi_with_table_resource()
        get_big_boi_with_client()

Update

I did the same measurements again with an item that looks more like the one you're using, and I'm still below 1000ms on average no matter which way I request it:

      Reading Big Boi from a Table Resource took 1.492829s and consumed 50.0 RCUs
      Reading Big Boi from a Table Resource took 0.871583s and consumed 50.0 RCUs
      Reading Big Boi from a Table Resource took 0.857513s and consumed 50.0 RCUs
      Reading Big Boi from a Table Resource took 0.769432s and consumed 50.0 RCUs
      Reading Big Boi from a Table Resource took 0.690172s and consumed 50.0 RCUs
      Reading Big Boi from a Table Resource took 0.670099s and consumed 50.0 RCUs
      Reading Big Boi from a Table Resource took 0.633489s and consumed 50.0 RCUs
      Reading Big Boi from a Table Resource took 0.605999s and consumed 50.0 RCUs
      Reading Big Boi from a Table Resource took 0.598635s and consumed 50.0 RCUs
      Reading Big Boi from a Table Resource took 0.606553s and consumed 50.0 RCUs
      Reading Big Boi from a Client took 1.66636s and consumed 50.0 RCUs
      Reading Big Boi from a Client took 0.921605s and consumed 50.0 RCUs
      Reading Big Boi from a Client took 0.831735s and consumed 50.0 RCUs
      Reading Big Boi from a Client took 0.707082s and consumed 50.0 RCUs
      Reading Big Boi from a Client took 0.668602s and consumed 50.0 RCUs
      Reading Big Boi from a Client took 0.648401s and consumed 50.0 RCUs
      Reading Big Boi from a Client took 0.5695s and consumed 50.0 RCUs
      Reading Big Boi from a Client took 0.592073s and consumed 50.0 RCUs
      Reading Big Boi from a Client took 0.611436s and consumed 50.0 RCUs
      Reading Big Boi from a Client took 0.553827s and consumed 50.0 RCUs
      Average latency over 10 requests with the table resource: 0.7796304s
      Average latency over 10 requests with the client: 0.7770621s
      

This is what the item looks like (it is built by create_big_boi_item in the script below).

Here is the full test script for you to verify:

    import statistics
    from datetime import datetime

    import boto3

    TABLE_NAME = "big-boi-test"
    BIG_BOI_PK = "NestedBoi"

    TABLE_RESOURCE = boto3.resource("dynamodb").Table(TABLE_NAME)
    DDB_CLIENT = boto3.client("dynamodb")

    def create_table():
        DDB_CLIENT.create_table(
            AttributeDefinitions=[{"AttributeName": "PK", "AttributeType": "S"}],
            TableName=TABLE_NAME,
            KeySchema=[{"AttributeName": "PK", "KeyType": "HASH"}],
            BillingMode="PAY_PER_REQUEST"
        )

    def create_big_boi_item() -> dict:
        # based on calculations here: https://zaccharles.github.io/dynamodb-calculator/
        template = {
            "PK": {
                "S": BIG_BOI_PK
            },
            "bigBoiContainer": {
                "M": {
                    "bigBoiList": {
                        "L": []
                    }
                }
            }
        }  # 43 bytes

        item = {
            "M": {
                "t": {
                    "S": "1614712316"
                },
                "a": {
                    "S": "product_view"
                },
                "i": {
                    "S": "1275"
                }
            }
        }  # 36 bytes

        # Fill the list until the item is just under the 400KB limit
        number_of_items = int((1024 * 400 - 43) / 36)

        for _ in range(number_of_items):
            template["bigBoiContainer"]["M"]["bigBoiList"]["L"].append(item)

        return template

    def store_big_boi():
        big_boi = create_big_boi_item()

        DDB_CLIENT.put_item(
            Item=big_boi,
            TableName=TABLE_NAME
        )

    def get_big_boi_with_table_resource():
        start = datetime.now()
        response = TABLE_RESOURCE.get_item(
            Key={"PK": BIG_BOI_PK},
            ReturnConsumedCapacity="TOTAL"
        )
        end = datetime.now()
        seconds = (end - start).total_seconds()
        capacity_units = response["ConsumedCapacity"]["CapacityUnits"]

        print(f"Reading Big Boi from a Table Resource took {seconds}s and consumed {capacity_units} RCUs")

        return seconds

    def get_big_boi_with_client():
        start = datetime.now()
        response = DDB_CLIENT.get_item(
            Key={"PK": {"S": BIG_BOI_PK}},
            ReturnConsumedCapacity="TOTAL",
            TableName=TABLE_NAME
        )
        end = datetime.now()
        seconds = (end - start).total_seconds()
        capacity_units = response["ConsumedCapacity"]["CapacityUnits"]

        print(f"Reading Big Boi from a Client took {seconds}s and consumed {capacity_units} RCUs")

        return seconds

    if __name__ == "__main__":
        # create_table()
        # store_big_boi()

        n_experiments = 10
        experiments_with_table_resource = [get_big_boi_with_table_resource() for i in range(n_experiments)]
        experiments_with_client = [get_big_boi_with_client() for i in range(n_experiments)]
        print(f"Average latency over {n_experiments} requests with the table resource: {statistics.mean(experiments_with_table_resource)}s")
        print(f"Average latency over {n_experiments} requests with the client: {statistics.mean(experiments_with_client)}s")
      

If I increase n_experiments, it tends to get even faster, probably because DDB caches internally.

Still: can't reproduce.

Update 2

After learning that you're running Lambda functions, I ran the tests again inside of Lambda with different memory configurations:

    Memory   n_experiments   Avg. time with resource   Avg. time with client
    128MB    10              6.28s                     5.06s
    256MB    10              3.26s                     2.61s
    512MB    10              1.62s                     1.33s
    1024MB   10              0.84s                     0.68s
    2048MB   10              0.52s                     0.43s
    4096MB   10              0.51s                     0.41s

As mentioned in the comments, CPU and network performance scale with the amount of memory you assign to a function. You can solve your problem by throwing money at it :-)
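
If you don't manage the function through an IaC tool, raising the memory is a single API call; here is a minimal sketch (the function name is a placeholder):

    import boto3

    # Raise the Lambda function's memory so it gets a larger share of
    # CPU and network. "my-events-function" is a made-up name.
    lambda_client = boto3.client("lambda")
    lambda_client.update_function_configuration(
        FunctionName="my-events-function",
        MemorySize=1024,  # MB; see the table above for measured latencies
    )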
