How to prevent crawlers that rely on XPath from getting page contents


Question

There is a PHP library that lets anybody scrape my site (something like cURL). To prevent it, I had an idea: use dynamic class names for my elements. Look at this:

<div class="<?php echo $ClassName; ?>">anything</div> <!-- $ClassName is taken from the database -->




Note: $ClassName will be different every time the page is loaded.

This way, nobody knows my class names, so they can't select my elements and copy my data. Now I have two questions:


  1. How can I connect $ClassName in PHP with .$ClassName in my CSS file? In other words, how can I use a PHP variable as a CSS class name (dynamic CSS classes)?
  2. Is it efficient to fetch all the class names from the database?


Recommended answer

Fetching the class names from a database is not optimal when the same thing can be done locally. Instead, define an array of all the class names and pick one of them with array_rand, something like this:

// php code
   <?php
     $classes = array('class1','class2','class3','class4'); 
     $class_name = $classes[array_rand($classes)];
   ?>


// html code
     <div class="<?php echo $class_name; ?>">anything</div>


// css code
   <style>
     .<?php echo $class_name; ?> {
      /* your CSS rules */
     }
   </style>
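A minimal, self-contained sketch of the same idea in one place: the class is chosen once, and that single variable feeds both the CSS rule and the `class` attribute, which is what keeps the CSS and the HTML in sync (question 1) without any database lookup (question 2). The function name `render_block` and the `color: #333` rule are illustrative placeholders, not part of the original answer.

```php
<?php
// Sketch: choose the class once, then use the same variable in the CSS rule
// and in the markup so they always agree. No database is involved.

function render_block(array $classes, string $content): string
{
    // Pick one class name at random from the local list.
    $class_name = $classes[array_rand($classes)];
    // Escape the content before putting it into HTML.
    $text = htmlspecialchars($content, ENT_QUOTES, 'UTF-8');

    return "<style>\n.{$class_name} { color: #333; }\n</style>\n"
        . "<div class=\"{$class_name}\">{$text}</div>\n";
}

echo render_block(['class1', 'class2', 'class3', 'class4'], 'anything');
```

Because the class name is picked per request from a short in-memory array, there is no extra query on each page load.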

Note: PHP code is not executed inside a plain .css file, so all the CSS that needs to be dynamic has to go into your .php file, inside a <style> ... </style> block.
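If you would rather keep the dynamic rules in a separate stylesheet, one common workaround (not something the original answer proposes) is to have a PHP script serve the CSS. This sketch assumes a hypothetical `style.php` that the page links to with `<link rel="stylesheet" href="style.php">`, and assumes the page has stored its chosen class in `$_SESSION['class_name']`. The header and session calls are skipped on the command line so the file can also be run directly.

```php
<?php
// style.php (hypothetical name): a PHP script that is served as CSS.
// Assumption: the page that links here already called session_start() and
// stored its randomly chosen class in $_SESSION['class_name'].

function dynamic_css(string $className): string
{
    // Only accept a plain CSS identifier so nothing else can be injected.
    if (!preg_match('/^[A-Za-z_][A-Za-z0-9_-]*$/', $className)) {
        $className = 'class1';
    }
    return ".{$className} {\n    color: #333;\n}\n";
}

if (PHP_SAPI !== 'cli') {
    session_start();
    header('Content-Type: text/css; charset=utf-8');
    // The class changes per session, so the browser must not cache this file.
    header('Cache-Control: no-store');
    echo dynamic_css($_SESSION['class_name'] ?? 'class1');
}
```

On the page side, the matching step would be `session_start(); $_SESSION['class_name'] = $class_name;` before any output is sent.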


Also, as @sємsєм said, you can create dynamic HTML tags.

Something like this (full code):

// php code
   <?php
     // dynamic class
     $classes = array('class1','class2','class3','class4'); 
     $class_name = $classes[array_rand($classes)];

     // dynamic tags: each closing string closes its tags in reverse order
     // (a <p> cannot contain a <div>, so <section> is used instead)
     $tags_start = array('', '<div>', '<div><div>', '<div><section>', '<span><div>');
     $tags_end   = array('', '</div>', '</div></div>', '</section></div>', '</div></span>');
     $numb = array_rand($tags_start);
   ?>


// html code
     <?php echo $tags_start[$numb]; ?>
     <div class="<?php echo $class_name; ?>">anything</div>
     <?php echo $tags_end[$numb]; ?>


// css code
   <style>
     .<?php echo $class_name; ?> {
      /* your CSS rules */
     }
   </style>
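Writing each opening and closing string by hand makes it easy to close nested tags in the wrong order, for example closing `<div><p>` with `</div></p>`. A small sketch that avoids this by deriving the closing string from the reversed list of tag names; `random_wrappers` and the tag lists are illustrative, not from the answer.

```php
<?php
// Hypothetical helper: picks one wrapper (a list of tag names, outermost first)
// and builds the matching closing string from it, so the nesting always matches.

function random_wrappers(array $wrapperSets): array
{
    $tags = $wrapperSets[array_rand($wrapperSets)];

    $open = '';
    foreach ($tags as $tag) {
        $open .= "<{$tag}>";
    }

    // Close the innermost tag first.
    $close = '';
    foreach (array_reverse($tags) as $tag) {
        $close .= "</{$tag}>";
    }

    return [$open, $close];
}

// Wrapper shapes similar to the answer's arrays, written as tag-name lists.
$wrapperSets = [[], ['div'], ['div', 'div'], ['div', 'section'], ['span', 'div']];
$classes = ['class1', 'class2', 'class3', 'class4'];

[$open, $close] = random_wrappers($wrapperSets);
$class_name = $classes[array_rand($classes)];

echo $open, '<div class="', $class_name, '">anything</div>', $close, "\n";
```

Storing only tag names also makes it easy to add new wrapper shapes without touching a second array.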

For more security, you can also wrap your content (here 'anything') in a dynamic tag, in addition to the dynamic outer tags. For example:

<span1>anything</span1> <!-- <span1> changes to <span2>, <span3>, <span4>, ... -->

In this case the tag directly around the data is dynamic too, which makes it harder for crawlers.
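To see what this buys you, the sketch below uses PHP's `DOMDocument` and `DOMXPath` (assuming the dom extension is installed, as it usually is) to run two XPath queries against "yesterday's" and "today's" markup. The HTML strings and the queries are made up for illustration. A query tied to the old class stops matching, but a query that searches for the text itself still finds it, which is why this approach raises the bar rather than stopping a scraper.

```php
<?php
// Sketch: what a scraper's XPath sees before and after the markup is randomized.

function count_matches(string $html, string $query): int
{
    $doc = new DOMDocument();
    // Collect (and then discard) warnings about unknown tags such as <span1>.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    return (new DOMXPath($doc))->query($query)->length;
}

$yesterday = '<html><body><div class="class1"><span1>anything</span1></div></body></html>';
$today     = '<html><body><div><div class="class3"><span2>anything</span2></div></div></body></html>';

// A query written against yesterday's class name...
$byClass = '//div[@class="class1"]';
// ...and one that ignores class and tag names entirely.
$byText = '//*[normalize-space(text())="anything"]';

printf("by class: %d -> %d\n", count_matches($yesterday, $byClass), count_matches($today, $byClass));
printf("by text:  %d -> %d\n", count_matches($yesterday, $byText), count_matches($today, $byText));
```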

Finally, you can't prevent crawling completely; you can only make it harder. If you really want to protect your data, you can do things like:


  • Add restrictions for users (e.g. only registered users can see important information).
  • Monitor the IP addresses that use your website (and block any that look suspicious).
  • Use appropriate software (e.g. to limit the number of searches per IP per day).
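The last point, limiting how much one IP can fetch per day, can be sketched as a fixed-window counter. This is an assumption-heavy illustration: `DailyIpLimiter` is a hypothetical class, and it keeps its counts in a PHP array, which only lives for a single request. A real site would store the counters somewhere shared, such as a database table, APCu or Redis.

```php
<?php
// Hypothetical fixed-window limiter: at most $maxPerDay requests per IP per calendar day.
// Counts are kept in memory here only for illustration.

final class DailyIpLimiter
{
    /** @var array<string, int> request counts keyed by "Y-m-d|ip" */
    private array $counts = [];
    private int $maxPerDay;

    public function __construct(int $maxPerDay)
    {
        $this->maxPerDay = $maxPerDay;
    }

    // Records one request and returns false once the IP is over the day's limit.
    public function allow(string $ip, ?string $day = null): bool
    {
        $key = ($day ?? date('Y-m-d')) . '|' . $ip;
        $this->counts[$key] = ($this->counts[$key] ?? 0) + 1;
        return $this->counts[$key] <= $this->maxPerDay;
    }
}

// Demo with a documentation IP address and a fixed day.
$limiter = new DailyIpLimiter(3);
foreach (range(1, 5) as $i) {
    $ok = $limiter->allow('203.0.113.7', '2024-01-01');
    echo "request {$i}: ", $ok ? 'allowed' : 'blocked', "\n";
}
```

When `allow()` returns false, a typical response is HTTP 429 (Too Many Requests). A fixed daily window is the simplest choice; a sliding window smooths out bursts at midnight but needs more bookkeeping.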
