What advantage does a crawler have over connecting directly to a DB and retrieving data?

Question

In AWS Glue jobs, there are two ways to retrieve data from a DB or S3: 1) using a crawler, or 2) using a direct connection to the DB or S3.

So my question is: how is a crawler better than connecting directly to a database and retrieving the data?

Recommended answer

AWS Glue crawlers do not retrieve the actual data. A crawler accesses your data stores and progresses through a prioritized list of classifiers to extract the schema of your data and other statistics, and then populates the Glue Data Catalog with this metadata. Crawlers can be scheduled to run periodically so that they detect the availability of new data as well as changes to existing data, including changes to table definitions. Crawlers automatically add new tables, new partitions to existing tables, and new versions of table definitions.
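As a rough illustration of a scheduled crawler, the sketch below assembles the arguments for the Glue `CreateCrawler` API and hands them to a client object. The crawler name, role ARN, database name, S3 path, and cron expression are all placeholder assumptions, not values from the question. The helper names `build_crawler_request` and `register_crawler` are hypothetical.

```python
from typing import Any, Dict


def build_crawler_request(name: str, role_arn: str, database: str,
                          s3_path: str, schedule: str) -> Dict[str, Any]:
    """Assemble keyword arguments for the Glue CreateCrawler API.

    All values are supplied by the caller; nothing here touches AWS.
    """
    return {
        "Name": name,
        "Role": role_arn,
        # Catalog database that receives the tables the crawler discovers.
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Glue cron syntax, e.g. "cron(0 2 * * ? *)" for 02:00 UTC daily.
        "Schedule": schedule,
        # Update table definitions on schema change; only log deletions.
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }


def register_crawler(glue_client: Any, request: Dict[str, Any]) -> None:
    """Create the crawler using a boto3 Glue client (or a test double)."""
    glue_client.create_crawler(**request)
```

With boto3 installed and credentials configured, the client would typically be `boto3.client("glue")`. Once registered, the crawler runs on its schedule and writes only metadata (tables, partitions, schema versions) into the Data Catalog.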

The AWS Glue Data Catalog becomes a common metadata repository shared across Amazon Athena, Amazon Redshift Spectrum, and Amazon S3. AWS Glue crawlers help build this metadata repository.
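To tie this back to the two approaches in the question, here is a minimal sketch of how a Glue job might read data either through a catalog table (which a crawler can keep up to date) or directly from S3. It assumes `glue_context` is an `awsglue.context.GlueContext` created inside the job. The wrapper names and all argument values are hypothetical.

```python
from typing import Any


def read_via_catalog(glue_context: Any, database: str, table_name: str) -> Any:
    """Read using schema and location stored in the Glue Data Catalog."""
    return glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table_name,
    )


def read_direct_from_s3(glue_context: Any, s3_path: str, data_format: str) -> Any:
    """Read straight from S3; the job supplies the path and format itself."""
    return glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": [s3_path]},
        format=data_format,
    )
```

In both cases the job itself retrieves the data. The difference is where the schema and location come from: the catalog entry in the first case, the job's own arguments in the second.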
