.htaccess mod-rewrite regex apache confusion results in 10k 404s per day

Question

I have reviewed many of the questions posted here about .htaccess, Apache, mod-rewrite and regex, but I'm just not getting it. I tried a few different things, but either I am overcomplicating things or making beginner mistakes. Regardless, I've been at it for a few days now and have completely scrambled things somewhere, as the 10,000 404s per day show.

My site

I have a WordPress site that contains over 23,000 posts in just over 1,200 categories. The site features streaming video files, industry news, show reviews, movies, phpBB forums, etc., and is structured like this:

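  • site/base category (0 and a-z)/subcategory (series name)/posts for all streaming episodes (episode name.html)
  • site/movies/post title.html for all streaming movies
  • site/news/posttitle.html
  • site/reviews/posttitle.html
  • site/page.html for category pages
  • site/forums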

The permalink structure is /%category%/%postname%.html

I am using the Yoast WordPress SEO plugin and have enabled the option to append a trailing slash for directories and categories.

Here is the current .htaccess:

    # BEGIN WordPress
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
    </IfModule>

    # END WordPress

My examples

Many inbound links from our old site structure use "/episode title/". This is wrong. We need these incoming links to redirect to "/watch-anime/letter, number or symbol (exactly 1 character long)/series title/episode title.html".

/one-piece-episode-528/

should be

/watch-anime/o/one-piece/one-piece-episode-528.html

A mistake I made caused this problem: links of the form "/watch-anime/letter/series title/episode title/" need to become "/watch-anime/letter/series title/episode title.html". So we need to remove the trailing slash from single posts and add .html.

/watch-anime/w/welcome-to-the-nhk/welcome-to-the-nhk-episode-14/

should be

/watch-anime/w/welcome-to-the-nhk/welcome-to-the-nhk-episode-14.html

The same mistake, combined with the old site structure issue, caused this problem: "/episode title.html" needs to be "/watch-anime/letter/series title/episode title.html".

/one-piece-episode-528.html

must be

/watch-anime/o/one-piece/one-piece-episode-528.html

As you can see, I've made a mess of things between migrating the site's post structure and my attempts to fix it. I am now asking for any help you can provide in getting a proper .htaccess file that will take care of these 301 redirects.

Thanks for any assistance you can provide!

Recommended answer

RewriteMap cannot be declared in an .htaccess file (the directive is only allowed in the server config or a virtual host), so here is my solution for a virtual host, which should work.

Create a RewriteMap file. See the Apache RewriteMap documentation for more information. This is a very simple text file with one entry per line: first the wrong URL without the slashes, then at least one space, and then the right URL, like this:

one-piece-episode-528 /watch-anime/o/one-piece/one-piece-episode-528.html
dexter-season-6-episode-1 /watch-interesting-stuff/d/dexter/dexter-season-6-episode-1.html
breaking-bad-full-season-3 /watch-interesting-stuff/b/breaking-bad/breaking-bad-full-season-3.html

and so on.

Convert this simple text file into a hash map. For example:

httxt2dbm -i mapanime.txt -o mapanime.map

Now declare it in your vhost:

RewriteMap mapanime \
    dbm:/pathtofile/mapanime.map
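
While the map is still being edited, the text file can also be used directly through the txt: map type, which skips the httxt2dbm conversion step. A minimal sketch, assuming the same placeholder path as above. The dbm form is indexed and is the better choice once the map holds thousands of entries:

```apache
# Assumption: /pathtofile/mapanime.txt is the plain text source file
# (the same placeholder path used for the dbm file above).
# Use either this line or the dbm: declaration, not both.
RewriteMap mapanime txt:/pathtofile/mapanime.txt
```

The map name and the lookup syntax in the rules stay the same; only the declaration changes.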

So all in all your vhost should look like:

<VirtualHost *>
    RewriteEngine On
    RewriteMap mapanime \
        dbm:/pathtofile/mapanime.map
    # don't touch the URL, but look up the last path segment in mapanime
    # (VARANIME becomes "notfound" when the segment is not in the map)
    RewriteRule /([^/]*)/$ - [QSA,NC,E=VARANIME:${mapanime:$1|notfound}]
    # if VARANIME is not empty *and*
    #   VARANIME is different from "notfound"
    #   (note the leading "!", which negates the pattern):
    RewriteCond %{ENV:VARANIME} !^(notfound|)$
    # then redirect it to the right URL:
    # QSA = query string append
    # R = redirect, 301 = permanent redirect
    # L = last = don't go further
    RewriteRule . %{ENV:VARANIME} [QSA,R=301,L]
</VirtualHost>
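
The question also lists old links of the form /episode title.html, which the rules above do not match because they only look at paths ending in a slash. A more compact variant that covers both forms could look like the sketch below. It assumes the same mapanime map and is meant to sit inside the same virtual host, after the RewriteMap declaration. The pattern is anchored to a single top-level segment on purpose: if it also matched the final, correct URL, the map would redirect that URL to itself.

```apache
# Sketch only: place inside the <VirtualHost> block, after the RewriteMap line.
# Matches one top-level segment ending in "/" or ".html", for example
#   /one-piece-episode-528/   or   /one-piece-episode-528.html
# The condition looks the segment up in mapanime. An unknown key expands
# to an empty string, which fails the ^(/.+)$ test, so the request falls
# through to the remaining rules untouched.
RewriteCond ${mapanime:$1} ^(/.+)$
# %1 is the target URL captured by the condition above.
RewriteRule ^/([^/]+?)(?:/|\.html)$ %1 [R=301,L]
```

Root-level pages such as /page.html are not in the map, so they are left alone.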

Hope this helps.

I don't see a simpler solution, but I'm pretty sure this one will work.
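
The second case in the question (a post URL under /watch-anime/ that ends in a slash instead of .html) follows a fixed pattern, so it may not need the map at all. A minimal sketch for the site's .htaccess, assuming the file lives in the document root, that post URLs always have exactly four segments, and that the block is placed above the "# BEGIN WordPress" section so it runs before the WordPress catch-all rule:

```apache
# Hypothetical addition to .htaccess, placed above "# BEGIN WordPress".
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
# /watch-anime/<1 char>/<series>/<episode>/  ->  same path, slash replaced by .html
# In .htaccess the pattern is matched without the leading slash.
# Exactly four segments are required, so category URLs such as
# /watch-anime/w/welcome-to-the-nhk/ keep their trailing slash.
RewriteRule ^(watch-anime/[^/]/[^/]+/[^/]+)/$ /$1.html [R=301,L]
</IfModule>
```

With this rule, /watch-anime/w/welcome-to-the-nhk/welcome-to-the-nhk-episode-14/ would be redirected to /watch-anime/w/welcome-to-the-nhk/welcome-to-the-nhk-episode-14.html.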

If it doesn't work: read my usual "two hints", and add the rewrite log to your question.

Please try to use the RewriteLog directive: it helps you track down such problems:

# Trace:
# (!) file gets big quickly, remove in prod environments:
RewriteLog "/web/logs/mywebsite.rewrite.log"
RewriteLogLevel 9
RewriteEngine On
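
RewriteLog and RewriteLogLevel exist only in Apache 2.2 and earlier; they were removed in Apache 2.4, where the same tracing is configured through LogLevel. A minimal equivalent, assuming Apache 2.4 and that trace output in the regular error log is acceptable:

```apache
# Apache 2.4+: mod_rewrite trace output is written to the ErrorLog.
# Levels run from trace1 (least) to trace8 (most verbose).
# (!) the log gets big quickly, remove in prod environments:
LogLevel alert rewrite:trace3
RewriteEngine On
```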

My favorite tool for checking regular expressions:

http://www.quanetic.com/Regex (Apache 2.x's mod_rewrite uses PCRE, so choose preg(PCRE) rather than ereg(POSIX)!)
