Searching XML Feeds for Keywords


Question


I'm building a site which will gather news stories from about 35 different RSS feeds, storing them in an array. I'm using a foreach() loop over the articles, checking each title and description against about 40 keywords with substr(). If a search succeeds, that article is stored in a DB and will ultimately appear on the site.
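Calling a substring search once per keyword per article means roughly 40 checks for every title/description pair. Compiling all keywords into a single case-insensitive alternation lets each article be scanned in one pass. A minimal sketch (in Python rather than the asker's PHP; the keyword list and feed XML are made-up placeholders):

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical keyword list; the real site uses ~40 keywords.
KEYWORDS = ["python", "rss", "security"]

# One compiled alternation replaces a per-keyword substring search.
KEYWORD_RE = re.compile(
    "|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE
)

def matching_items(feed_xml):
    """Yield (title, description) pairs that contain any keyword."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        desc = item.findtext("description", default="")
        if KEYWORD_RE.search(title + " " + desc):
            yield (title, desc)
```

In PHP the same idea would be a single `preg_match()` with an alternation of `preg_quote()`d keywords, instead of one `stripos()`/`substr()` call per keyword.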


The script runs every 30 mins. Trouble is, it takes 1-3 mins depending on how many stories are returned. Not 'terrible', but in a shared hosting environment I can see this causing plenty of issues, especially as the site grows and more feeds/keywords are added.


Are there any ways that I can optimize the 'searching' of keywords, so that I can speed up the 'indexing'?

Thanks!

Answer


35-40 RSS feeds are a lot of requests for one script to fetch and parse all at once. Your bottleneck is most likely the requests, not the parsing. You should separate the concerns: have one script that requests the feeds one at a time, every minute or so, and stores the results locally. Then a second script can parse and save/remove the cached results every 15-30 minutes.
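The two-job split above can be sketched as follows (Python rather than the site's PHP; the feed URLs and cache directory are placeholder assumptions). Job A downloads one feed at a time into a local cache; job B drains the cache on its own schedule, so slow network requests never block parsing:

```python
import os
import tempfile

# Placeholder feed list and cache location for illustration only.
FEED_URLS = ["https://example.com/feed1.xml"]
CACHE_DIR = os.path.join(tempfile.gettempdir(), "feed_cache")

def fetch_one(url, index):
    """Cron job A: download a single feed and store it locally.
    Run for one feed at a time, every minute or so."""
    import urllib.request
    os.makedirs(CACHE_DIR, exist_ok=True)
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = resp.read()
    with open(os.path.join(CACHE_DIR, f"feed_{index}.xml"), "wb") as f:
        f.write(data)

def drain_cache():
    """Cron job B (every 15-30 min): read each cached feed,
    return its raw bytes for parsing, and remove the temp file."""
    results = []
    if not os.path.isdir(CACHE_DIR):
        return results
    for name in sorted(os.listdir(CACHE_DIR)):
        path = os.path.join(CACHE_DIR, name)
        with open(path, "rb") as f:
            results.append(f.read())
        os.remove(path)  # temporary results are removed after parsing
    return results
```

Because job A touches the network and job B only touches local files, a slow or dead feed delays at most one fetch rather than the whole 30-minute indexing run.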

