Sanitizing strings to make them URL and filename safe?

Question

I am trying to come up with a function that does a good job of sanitizing certain strings so that they are safe to use in the URL (like a post slug) and also safe to use as file names. For example, when someone uploads a file I want to make sure that I remove all dangerous characters from the name.

So far I have come up with the following function which I hope solves this problem and allows foreign UTF-8 data also.

/**
 * Convert a string to the file/URL safe "slug" form
 *
 * @param string $string the string to clean
 * @param bool $is_filename TRUE will allow additional filename characters
 * @return string
 */
function sanitize($string = '', $is_filename = FALSE)
{
 // Replace all weird characters with dashes
 $string = preg_replace('/[^\w\-'. ($is_filename ? '~_\.' : ''). ']+/u', '-', $string);

 // Only allow one dash separator at a time (and make string lowercase)
 return mb_strtolower(preg_replace('/--+/u', '-', $string), 'UTF-8');
}

Does anyone have any tricky sample data I can run against this, or know of a better way to safeguard our apps from bad names?

$is_filename allows some additional characters, such as those found in temp vim file names.

Update: removed the star character since I could not think of a valid use for it.

Recommended answer

Some observations on your solution:

  1. The 'u' at the end of your pattern makes PCRE treat the pattern as UTF-8. The current PHP manual says the subject string is treated as UTF-8 too, and that an invalid subject makes the preg_* function match nothing.
  2. \w matches the underscore character. You specifically include it for filenames, which suggests you don't want underscores in URLs, but as written the code also permits underscores in URLs.
  3. Whether "foreign UTF-8" characters are included seems to be locale-dependent, and it's not clear whether that means the locale of the server or of the client. From the PHP docs:

A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word". The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w.
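Because the behaviour of \w depends on the modifier, the locale and the PCRE build, it may be worth checking it on the server that will run the code. The snippet below is only a quick probe, not part of the original answer; the variable name $e is made up for illustration, and the last two results are deliberately left without expected output because they can differ between installations.

```php
<?php
// The underscore is a "word" character regardless of the u modifier.
var_dump(preg_match('/^\w$/', '_')); // int(1)

// "é" encoded as UTF-8 is the two bytes 0xC3 0xA9.
$e = "\xC3\xA9";

// Without u the subject is matched byte by byte, so the result
// depends on the character tables (locale) PCRE is using.
var_dump(preg_match('/^\w+$/', $e));

// With u the subject is read as UTF-8; whether \w then accepts
// non-ASCII letters depends on Unicode support in the PCRE build.
var_dump(preg_match('/^\w+$/u', $e));
```

If the last call reports a match, sanitize() from the question will keep accented letters rather than replace them with dashes.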

Creating the slug

You probably shouldn't include accented etc. characters in your post slug since, technically, they should be percent encoded (per URL encoding rules) so you'll have ugly looking URLs.
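To see what "ugly" means here, this small sketch percent-encodes a slug that still contains an accented letter, using PHP's built-in rawurlencode(). The sample slug is made up for illustration.

```php
<?php
// rawurlencode() percent-encodes every byte outside the RFC 3986
// unreserved set (ASCII letters, digits, "-", "_", "." and "~").
// "\xC3\xA9" is the UTF-8 encoding of "é".
$slug = "caf\xC3\xA9-au-lait";

echo rawurlencode($slug), "\n"; // caf%C3%A9-au-lait
```

The single letter "é" turns into six characters, which is the argument for transliterating before building the slug.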

So, if I were you, after lowercasing, I'd convert any 'special' characters to their equivalent (e.g. é -> e) and replace non [a-z] characters with '-', limiting to runs of a single '-' as you've done. There's an implementation of converting special characters here: https://web.archive.org/web/20130208144021/http://neo22s.com/slug
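A minimal sketch of those steps might look like the following. It is not the linked implementation: the function name make_slug() is hypothetical, the transliteration table is a tiny illustrative sample rather than a complete mapping, and trimming dashes from the ends is an extra step assumed here for tidiness.

```php
<?php
// Hypothetical helper, written only to illustrate the steps described above.
function make_slug(string $text): string
{
    // 1. Lowercase (mb_strtolower also handles non-ASCII letters).
    $text = mb_strtolower($text, 'UTF-8');

    // 2. Map "special" characters to ASCII equivalents.
    //    Illustrative and deliberately incomplete table.
    $map = [
        'à' => 'a', 'á' => 'a', 'â' => 'a', 'ä' => 'a',
        'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
        'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i',
        'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'ö' => 'o',
        'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u',
        'ç' => 'c', 'ñ' => 'n', 'ß' => 'ss',
    ];
    $text = strtr($text, $map);

    // 3. Replace every run of characters outside a-z with one dash,
    //    which also limits separators to a single '-'.
    $text = preg_replace('/[^a-z]+/', '-', $text);

    // 4. Assumption of this sketch: drop dashes left at either end.
    return trim($text, '-');
}

echo make_slug('Café Münchner Straße!'), "\n"; // cafe-munchner-strasse
```

Because the character class is strictly a-z, digits are replaced as well; widening it to a-z0-9 would be a small change if numbers should survive in slugs. Characters missing from the table simply become dashes.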

OWASP have a PHP implementation of their Enterprise Security API which among other things includes methods for safe encoding and decoding input and output in your application.

The Encoder interface provides:

canonicalize (string $input, [bool $strict = true])
decodeFromBase64 (string $input)
decodeFromURL (string $input)
encodeForBase64 (string $input, [bool $wrap = false])
encodeForCSS (string $input)
encodeForHTML (string $input)
encodeForHTMLAttribute (string $input)
encodeForJavaScript (string $input)
encodeForOS (Codec $codec, string $input)
encodeForSQL (Codec $codec, string $input)
encodeForURL (string $input)
encodeForVBScript (string $input)
encodeForXML (string $input)
encodeForXMLAttribute (string $input)
encodeForXPath (string $input)

https://github.com/OWASP/PHP-ESAPI
https://www.owasp.org/index.php/Category:OWASP_Enterprise_Security_API
