Filtering JavaScript from site content in PHP

Views: 156 | Published: 2018/6/21 13:08:57 | Tags: javascript, php, jquery, html, keyword


Problem Description

So I'm making a script that checks the keyword density of a page based on the URL the user submits. I have been using strip_tags(), but it doesn't seem to completely filter the JavaScript and other code out of the actual word content on the site. Is there a better way to separate the code on a page from its actual word content?

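if (isset($_POST['url'])) {
    $url = $_POST['url'];

    // Only fetch http(s) URLs: file_get_contents() would otherwise also open
    // local files or php:// streams named in the form field.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if ($scheme !== 'http' && $scheme !== 'https') {
        exit('Please submit an http:// or https:// URL.');
    }

    $str        = strip_tags(file_get_contents($url));
    $words      = str_word_count(strtolower($str), 1);
    $word_count = array_count_values($words);
    $total      = count($words); // count once, not on every loop iteration

    foreach ($word_count as $key => $val) {
        $density = ($val / $total) * 100;
        echo "$key - COUNT: $val, DENSITY: " . number_format($density, 2) . "%<br/>\n";
    }
}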
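A minimal demonstration of the behaviour described in the question, using a made-up HTML snippet: strip_tags() drops the tags but not the text between `<script>` and `</script>`, so script tokens end up in the word list that the density calculation counts.

```php
<?php
// strip_tags() removes the <p>, <script> and </script> tags themselves,
// but the JavaScript source between <script> and </script> stays as text.
$html = '<p>Hello world</p><script>var tracker = 1;</script>';

$text = strip_tags($html);
echo $text, "\n"; // Hello worldvar tracker = 1;

// The question's word split then counts script tokens as page words.
// "world" and "var" also fused, because strip_tags() does not insert a
// space where it removes a tag.
print_r(str_word_count(strtolower($text), 1)); // words: hello, worldvar, tracker
```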

Solution

I have written 2 functions for this:

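/**
 * Removes the given elements, including everything between their opening
 * and closing tags, from an HTML string
 *
 * @param string   $str    The HTML string
 * @param string[] $tagArr Names of the elements to remove, e.g. ['script', 'style']
 *
 * @return string The HTML string without those elements
 */
function removeTags($str, $tagArr)
{
    foreach ($tagArr as $tag) {
        // preg_quote() escapes the tag name for use inside the pattern;
        // \b keeps 'b' from also matching <body> or <br>, and \s* accepts "</script >".
        $tag = preg_quote($tag, '#');
        $str = preg_replace('#<' . $tag . '\b(.*?)>(.*?)</' . $tag . '\s*>#is', '', $str);
    }
    return $str;
}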
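To show why the pattern carries the `i` and `s` modifiers, here is a stand-alone sketch that applies a pattern of the kind removeTags() builds for 'script' to a made-up, multi-line, upper-case script element, once with the modifiers and once without.

```php
<?php
// The "i" modifier makes <SCRIPT> match <script>; the "s" modifier lets "."
// also match newlines, so a multi-line script body is removed in one piece.
$html = "<h1>Title</h1>\n<SCRIPT type=\"text/javascript\">\nvar a = 1;\n</SCRIPT>\n<p>Body</p>";

$withFlags    = preg_replace('#<script(.*?)>(.*?)</script>#is', '', $html);
$withoutFlags = preg_replace('#<script(.*?)>(.*?)</script>#', '', $html);

var_dump($withFlags);              // the <SCRIPT> element and its body are gone
var_dump($withoutFlags === $html); // bool(true): nothing matched without the modifiers
```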

/**
 * Strips markup from an HTML string and returns its visible text
 *
 * @param string $str An HTML string
 *
 * @return string The text, with single spaces between words
 */
function filterHtml($str)
{
    // Remove <script> and <style> elements together with their contents
    $str = removeTags($str, ['script', 'style']);

    // Remove all remaining tags but keep their contents; the space keeps
    // words from adjacent elements apart
    $str = preg_replace('/<[^>]*>/', ' ', $str);

    // Replace line breaks and tabs with spaces
    $str = str_replace(["\n", "\t", "\r"], ' ', $str);

    // Collapse runs of spaces into a single space
    $str = preg_replace('/ {2,}/', ' ', $str);

    // Return trimmed
    return trim($str);
}
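Regular expressions can trip over real-world markup, for example an attribute value containing `>` such as `<img alt="a > b">`, where `/<[^>]*>/` stops at the first `>` and leaves `b">` behind as text. A different technique, not the answer's method, is to let PHP's DOM extension parse the page and keep only text nodes outside `<script>`, `<style>` and `<noscript>`. This is a sketch under assumptions: the function name htmlToVisibleText() is hypothetical, it needs the dom extension (enabled in most PHP builds), and loadHTML() treats input without a declared charset as ISO-8859-1.

```php
<?php
/**
 * Hypothetical helper: parse HTML with DOMDocument and return the text of
 * every text node that is not inside <script>, <style> or <noscript>.
 */
function htmlToVisibleText(string $html): string
{
    if (trim($html) === '') {
        return ''; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real pages are rarely valid HTML: keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $query = '//text()[not(ancestor::script) and not(ancestor::style) and not(ancestor::noscript)]';

    $parts = [];
    foreach ($xpath->query($query) as $textNode) {
        // nodeValue is already entity-decoded (&amp; becomes &).
        $parts[] = $textNode->nodeValue;
    }

    // Join with spaces so text from adjacent elements doesn't fuse into one
    // word, then collapse every whitespace run to a single space.
    return trim(preg_replace('/\s+/', ' ', implode(' ', $parts)));
}

$html = '<html><head><style>p{color:red}</style></head>'
      . '<body><p>Hello</p><script>var x = 1;</script><p>World &amp; more</p></body></html>';

echo htmlToVisibleText($html), "\n"; // Hello World & more
```

Comments and attribute values are never text nodes, so they drop out without any extra pattern.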

Working Example

$fileContent     = file_get_contents('http://stackoverflow.com/questions/25537377/filtering-html-from-site-content-php');
$filteredContent = filterHtml($fileContent);
var_dump($filteredContent);
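The working example stops at the cleaned string. As a sketch of how the pieces could fit together, the density step from the question is wrapped here in a hypothetical keywordDensity() function that takes already-cleaned text; in the full script its input would be the return value of filterHtml() instead of strip_tags(). Sorting by frequency is an added assumption, not part of the question's code.

```php
<?php
/**
 * Hypothetical helper: count each word and its density (percent of all
 * words) using the same str_word_count()/array_count_values() steps as
 * the question's script. Expects text already stripped of markup.
 */
function keywordDensity(string $text): array
{
    $words = str_word_count(strtolower($text), 1);
    $total = count($words);
    if ($total === 0) {
        return []; // no words: avoid dividing by zero
    }

    $result = [];
    foreach (array_count_values($words) as $word => $count) {
        $result[$word] = ['count' => $count, 'density' => $count / $total * 100];
    }

    // Most frequent words first.
    uasort($result, function (array $a, array $b): int {
        return $b['count'] <=> $a['count'];
    });

    return $result;
}

foreach (keywordDensity('The cat sat on the mat. The end.') as $word => $stats) {
    printf("%s - COUNT: %d, DENSITY: %s%%\n", $word, $stats['count'], number_format($stats['density'], 2));
}
// First line printed: the - COUNT: 3, DENSITY: 37.50%
```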

