Pytrends trend results differ from manually downloaded data


Question

I use pytrends to automatically download data from Google Trends as CSV. The code I used is below. In this case, I am downloading monthly Google Trends data from 2008 to the present.

from urllib.parse import unquote

from pytrends.request import TrendReq

# Google account credentials (placeholders)
google_username = "xxxxx@gmail.com"
google_password = "xxxxx"

# URL-decode the topic ID: '%2Fm%2F07gyp7' -> '/m/07gyp7'
search_term = unquote('%2Fm%2F07gyp7')

# Log in, then request the news-search trend for the topic as a DataFrame
google_trend = TrendReq(google_username, google_password, custom_useragent='Pytrends')
google_trend_payload = {'gprop': 'news', 'q': search_term}
# Call trend() on the logged-in instance, not on the TrendReq class
trendresult = google_trend.trend(google_trend_payload, return_type='dataframe')
print(trendresult)

The results from the Google Trends website for the first five months, compared with the results from pytrends:

Date          Pytrends data          Manual csv data
2008-01       21.0                   28.0
2008-02       16.0                   19.0
2008-03       16.0                   21.0
2008-04       15.0                   18.0
2008-05       22.0                   31.0

Does anyone know the reason? Thank you.

Recommended answer

I had the same issue, so I had to download the data manually during my project. I now know the reason: it is Google's sampling method. Google returns a different trend series each day. Imagine Google has 10 million servers; each day, for each query, it samples perhaps only 10k of them. So, to get a consistent series, you can download the series 30 (or even 50) times and take the average. For series whose values are not too small (say, a minimum above 30), the standard deviation is around 5%, which is acceptable.
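The averaging idea can be sketched as follows. This is a minimal illustration using only the standard library, not the answerer's actual code: average_series and collect_samples are hypothetical helpers, each download is assumed to be a dict mapping a period label such as "2008-01" to a trend value, and the fetch function that actually talks to Google Trends is left to the caller.

```python
from statistics import mean, stdev
from typing import Callable, Dict, List

# One download: maps a period label such as "2008-01" to a trend value.
Series = Dict[str, float]


def collect_samples(fetch: Callable[[], Series], n: int = 30) -> List[Series]:
    """Call a user-supplied fetch function n times (hypothetical helper)."""
    return [fetch() for _ in range(n)]


def average_series(samples: List[Series]) -> Dict[str, Dict[str, float]]:
    """Average repeated downloads of the same series, period by period."""
    if not samples:
        raise ValueError("need at least one sample")
    # Keep only periods present in every download, so that each mean
    # is computed over the same number of values.
    common = set(samples[0])
    for sample in samples[1:]:
        common &= set(sample)
    summary: Dict[str, Dict[str, float]] = {}
    for period in sorted(common):
        values = [sample[period] for sample in samples]
        avg = mean(values)
        # The sample standard deviation needs at least two values.
        sd = stdev(values) if len(values) > 1 else 0.0
        summary[period] = {
            "mean": avg,
            "std": sd,
            # Relative spread, comparable to the "around 5%" figure above.
            "rel_std": sd / avg if avg else float("nan"),
        }
    return summary
```

Since the series is said to change from day to day, the samples would presumably need to come from downloads made on different days; calling fetch repeatedly within a single day may return the same values.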

The difference between the manual download and the gtrend download may be because they do not extract the data in the same way. gtrend downloads a URL of the form https://www.google.com/trends/fetchContent.... I do not know how the manual download is processed, but I do know there is another way to extract data, such as https://www.google.com/trends/trendsReport... . The latter returns weekly series for everything (pretty rich).

At the moment, there seems to be a quota limit problem.
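One common way to cope with a request quota is to retry with increasing pauses. The sketch below is a generic illustration under stated assumptions, not something the answer describes: call_with_backoff is a hypothetical helper, it assumes a quota rejection surfaces as a Python exception from the request function, and the default delay is an arbitrary guess rather than a documented Google limit.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `request`, doubling the wait after each failed attempt."""
    for attempt in range(max_attempts):
        try:
            return request()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

The sleep parameter exists only so the waiting can be replaced in tests; in normal use the default time.sleep applies.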
