Intelligently grab first paragraph/starting text

Question

I'd like to have a script where I can input a URL and it will intelligently grab the first paragraph of the article... I'm not sure where to begin other than just pulling text from within <p> tags. Do you know of any tips/tutorials on how to do this kind of thing?

Update

For further clarification, I'm building a section of my site where users can submit links, like on Facebook. It'll grab an image from the linked site as well as text to go with the link. I'm using PHP and trying to determine the best method of doing this.

I say "intelligently" because I'd like to try to get content on that page that's important, not just the first paragraph, but the first paragraph of the most important content.

Recommended answer

If the page you want to grab is foreign, or even if it is local but you don't know its structure in advance, I'd say the best way to achieve this is to use the PHP DOM functions.

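function get_first_paragraph($url)
{
  $page = file_get_contents($url);
  /* Bail out if the fetch failed or returned nothing */
  if ($page === false || $page === '') {
    return null;
  }
  $doc = new DOMDocument();
  /* Collect parser warnings internally; real-world HTML is often malformed */
  libxml_use_internal_errors(true);
  $doc->loadHTML($page);
  libxml_clear_errors();
  /* Gets all the paragraphs */
  $paragraphs = $doc->getElementsByTagName('p');
  /* Extracts the first one; the method is item(), and it returns null if there is none */
  $first = $paragraphs->item(0);
  /* Returns the paragraph's text content, or null when the page has no <p> */
  return $first !== null ? trim($first->textContent) : null;
}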
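
The function above returns the very first <p> on the page, which on many sites is navigation text or a notice rather than the article itself. One possible way to get closer to "the first paragraph of the most important content" is to look inside likely content containers first and to skip very short paragraphs. The following is only a minimal sketch under those assumptions: the name get_main_paragraph, the $minLength parameter, the 80-character threshold, and the choice of <article> and <main> as containers are all hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: the name, the container list and the length
// threshold are illustrative assumptions, not an established algorithm.
function get_main_paragraph(string $html, int $minLength = 80): ?string
{
    if (trim($html) === '') {
        return null;
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate malformed markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Try likely content containers first, then any paragraph on the page.
    $queries = ['//article//p', '//main//p', '//p'];
    foreach ($queries as $query) {
        foreach ($xpath->query($query) as $p) {
            // Collapse whitespace so the length check measures real text.
            $text = trim(preg_replace('/\s+/', ' ', $p->textContent));
            if (strlen($text) >= $minLength) {
                return $text;
            }
        }
    }
    // Nothing long enough: fall back to the very first paragraph, if any.
    $first = $doc->getElementsByTagName('p')->item(0);
    return $first !== null ? trim($first->textContent) : null;
}
```

It takes the HTML as a string, so it could be fed the result of file_get_contents($url). The threshold would likely need tuning per site, and strlen counts bytes rather than characters, so pages with multi-byte text reach the threshold sooner.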
