Real-time data standardization / normalization with Spark structured streaming


Problem Description

Standardizing / normalizing data is an essential, if not crucial, step when it comes to implementing machine learning algorithms. Doing so in a real-time manner using Spark structured streaming is a problem I've been trying to tackle for the past couple of weeks.

Using the StandardScaler estimator ((value(i) - mean) / standard deviation) on historical data proved to work well, and in my use case it is the best way to get reasonable clustering results, but I'm not sure how to fit a StandardScaler model on real-time data; structured streaming does not allow it. Any advice would be highly appreciated!
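For context, here is a minimal sketch of the batch (historical) fitting step, assuming PySpark; the column names ("x1", "x2", "x3"), the input path, and the vector column names are placeholders, not from the original question:

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler, StandardScaler

spark = SparkSession.builder.appName("batch-scaling").getOrCreate()

# Historical data; path and column names are placeholders.
historical = spark.read.parquet("/data/historical")

# Assemble the numeric columns into a single vector column.
assembler = VectorAssembler(inputCols=["x1", "x2", "x3"], outputCol="features_raw")
assembled = assembler.transform(historical)

# (value(i) - mean) / standard deviation
scaler = StandardScaler(inputCol="features_raw", outputCol="features",
                        withMean=True, withStd=True)
scaler_model = scaler.fit(assembled)        # fit() requires a static DataFrame
scaled = scaler_model.transform(assembled)  # input for clustering, e.g. KMeans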

In other words, how do you fit models in Spark structured streaming?

Recommended Answer

I got an answer for this. It's not possible at the moment to do real-time machine learning with Spark structured streaming, including normalization; however, for some algorithms, making real-time predictions is possible if an offline model was built/fitted first (see the sketch after the links below).

See:

JIRA - Add support for Structured Streaming to the ML Pipeline API

Google Doc - Machine Learning on Structured Streaming
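As a minimal sketch of the offline-fit / streaming-predict pattern mentioned above (assuming PySpark): the column names, paths, cluster count, and the toy rate source are placeholders, and whether every stage's transform() is supported on a streaming DataFrame depends on the Spark version.

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline, PipelineModel
from pyspark.ml.feature import VectorAssembler, StandardScaler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("offline-fit-online-predict").getOrCreate()

# 1) Offline: fit the whole pipeline on historical data and persist the model.
historical = spark.read.parquet("/data/historical")  # placeholder path
pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["x1", "x2", "x3"], outputCol="features_raw"),
    StandardScaler(inputCol="features_raw", outputCol="features",
                   withMean=True, withStd=True),
    KMeans(featuresCol="features", k=5),
])
model = pipeline.fit(historical)
model.write().overwrite().save("/models/clustering")  # placeholder path

# 2) Online: load the frozen model and apply transform() to the stream.
#    fit() is never called on the streaming DataFrame.
stream = (spark.readStream.format("rate").load()  # toy source; use Kafka/files in practice
          .selectExpr("CAST(value AS DOUBLE) AS x1",
                      "CAST(value AS DOUBLE) AS x2",
                      "CAST(value AS DOUBLE) AS x3"))

predictions = PipelineModel.load("/models/clustering").transform(stream)

query = (predictions.select("features", "prediction")
         .writeStream.format("console").outputMode("append").start())
query.awaitTermination()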

