Is there a way to turn on SageMaker model endpoints only when I am receiving inference requests

Question

I have created a model endpoint which is InService and deployed on an ml.m4.xlarge instance. I am also using API Gateway to create a RESTful API.

Questions:

  1. Is it possible to have my model endpoint InService (or on standby) only when I receive inference requests? Maybe by writing a Lambda function or something that turns off the endpoint, so that it does not keep accumulating per-hour charges (see the sketch below).
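
A minimal sketch of that idea, assuming a Lambda function whose execution role is allowed to call SageMaker; the endpoint and config names here are hypothetical. Deleting an endpoint stops the per-hour billing but leaves the endpoint config and model in place, so the endpoint can be recreated from the same config later:

```python
import boto3

sm = boto3.client("sagemaker")

# Hypothetical names: deleting an endpoint does not delete its endpoint
# config or model, so the endpoint can be recreated on demand.
ENDPOINT_NAME = "my-endpoint"
ENDPOINT_CONFIG_NAME = "my-endpoint-config"

def lambda_handler(event, context):
    """Tear the endpoint down or bring it back up on request."""
    action = event.get("action")
    if action == "stop":
        # Stops the per-hour instance billing.
        sm.delete_endpoint(EndpointName=ENDPOINT_NAME)
        return {"status": "deleting"}
    if action == "start":
        # Recreation takes several minutes before the endpoint is InService.
        sm.create_endpoint(
            EndpointName=ENDPOINT_NAME,
            EndpointConfigName=ENDPOINT_CONFIG_NAME,
        )
        return {"status": "creating"}
    # Default: report the current status (raises if the endpoint is deleted).
    status = sm.describe_endpoint(EndpointName=ENDPOINT_NAME)["EndpointStatus"]
    return {"status": status}
```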

If q1 is possible, would this cause weird latency issues for the end users? It usually takes a couple of minutes for model endpoints to be created when I configure them for the first time.

If q1 is not possible, how would choosing a cheaper instance type affect the time it takes to perform inference? (Say I'm only using the endpoints for an application that has a low number of users.)

I am aware of this site that compares different instance types (https://aws.amazon.com/sagemaker/pricing/instance-types/).

But does having a moderate network performance mean that the time to perform real-time inference may be longer?

Any recommendations are much appreciated. The goal is not to burn money when users are not requesting predictions.

Answer

How large is your model? If it is under AWS Lambda's 50 MB (zipped) deployment package size limit and the dependencies are small enough, there could be a way to rely directly on Lambda as the execution engine.
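
A minimal sketch of that approach, assuming a scikit-learn model serialized with joblib and bundled as model.joblib inside the deployment package (the library choice and file name are assumptions, not something from the question):

```python
import json

import joblib  # assumes scikit-learn/joblib are bundled in the package

# Hypothetical artifact shipped inside the deployment zip.
# Loaded once at cold start; warm invocations reuse the same object.
model = joblib.load("model.joblib")

def lambda_handler(event, context):
    """Run inference directly in Lambda, e.g. behind API Gateway."""
    features = json.loads(event["body"])["features"]
    prediction = model.predict([features]).tolist()
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```

With this setup there is no per-hour charge at all; you pay per invocation, at the cost of occasional cold-start latency.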

If your model is larger than 50 MB, there might still be a way to run it by storing it on EFS. See EFS for Lambda.
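
A sketch of the EFS variant, assuming the function is configured with an EFS access point mounted at /mnt/ml (the mount path and file name are hypothetical):

```python
import json

import joblib

# Hypothetical model file on the attached EFS mount.
MODEL_PATH = "/mnt/ml/model.joblib"

model = None  # lazy-load so the EFS read happens once per warm container

def lambda_handler(event, context):
    global model
    if model is None:
        model = joblib.load(MODEL_PATH)  # read the large model from EFS
    features = json.loads(event["body"])["features"]
    prediction = model.predict([features]).tolist()
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```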
