Is it safe to validate a URL with a regexp?


Question

In my web app I've got a form field where the user can enter a URL. I'm already doing some preliminary client-side validation, and I was wondering whether I could use a regexp to check that the entered string is a valid URL. So, two questions:


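  1. Is it safe to do this with a regexp? A URL is a complex beast, and just as you shouldn't use a regexp to parse HTML, I'm worried that a regexp may be unsuitable for URLs as well.

  2. If it can be done, what would be a good regexp for the task? (I know that Google turns up countless regexps, but I'm worried about their quality.)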

My goal is to prevent a situation where the URL appears in the web page and is unusable by the browser.

Recommended answer

Well... maybe. People often ask a similar question about email addresses, and for those you would need a horrendously complicated regular expression (a couple of pages long, at least) to validate them correctly. I don't think URLs are quite as complicated (the W3C has a document describing their format), but still, any reasonably short regexp you come up with will probably block some valid URLs.
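
To make the "short regexp blocks valid URLs" point concrete, here is a minimal sketch in Python. The pattern `SHORT_URL` and the helper `short_regex_accepts` are made up for illustration (they are not taken from any answer in this thread); the pattern is just typical of the short expressions a search turns up.

```python
import re

# Hypothetical "reasonably short" pattern, invented for this illustration:
# scheme, a dotted host name ending in letters, then an optional path.
SHORT_URL = re.compile(r"^https?://[\w.-]+\.[a-z]{2,}(/\S*)?$")


def short_regex_accepts(url: str) -> bool:
    """Return True if the short pattern matches the whole string."""
    return SHORT_URL.match(url) is not None


# All of these are valid URLs that a browser can open,
# yet the short pattern only accepts the first one.
candidates = [
    "http://example.com/page",       # plain host and path
    "http://example.com:8080/page",  # explicit port
    "http://localhost/",             # host without a dot
    "http://192.168.0.1/",           # IPv4 literal
    "http://[::1]/",                 # IPv6 literal
]

for url in candidates:
    print(url, short_regex_accepts(url))
```

Each rejected case could be patched into the pattern, but every patch makes it longer and harder to review, which is the trade-off the answer describes.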

I would suggest thinking about what kinds of URLs you need to accept. Maybe for your purposes, blocking the occasional valid-but-weird submission is fine; in that case you can use a simple regex that matches most URLs, like the one in Dobiatowski's answer. Or you could use a regex that accepts all valid URLs and a few invalid ones, if that works for you. But I'd be wary of trying to find a regular expression that accepts exactly the valid URLs and nothing else. If you want 100% foolproof verification, I'd suggest using client-side validation of the second type (one that accepts a few invalid URLs) and doing a more comprehensive check on the server side, using a library in whatever language you use to process the form data.
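
The two-tier approach might look roughly like the sketch below. It assumes Python on the server, since the question does not say which language processes the form, and it uses the standard library's `urllib.parse.urlsplit` as the "library" check. The names `LENIENT_URL`, `looks_like_url`, and `is_usable_http_url` are hypothetical. In a real app the lenient pattern would run in the browser's JavaScript; it is shown in Python here only to keep the example in one language.

```python
import re
from urllib.parse import urlsplit

# Tier 1 (lenient): accepts every http(s) URL without whitespace,
# and knowingly lets some invalid strings through.
LENIENT_URL = re.compile(r"^https?://\S+$", re.IGNORECASE)


def looks_like_url(text: str) -> bool:
    """Cheap first-pass check of the kind suited to client-side use."""
    return LENIENT_URL.match(text.strip()) is not None


def is_usable_http_url(text: str) -> bool:
    """Stricter second-pass check using a URL parser instead of a regex."""
    try:
        parts = urlsplit(text.strip())
        # Reading .port raises ValueError for a non-numeric or
        # out-of-range port, so touch it inside the try block.
        _ = parts.port
    except ValueError:
        return False
    # urlsplit lowercases the scheme; hostname is None when the host is empty.
    return parts.scheme in ("http", "https") and bool(parts.hostname)


# A string that slips past the lenient check but fails the stricter one.
suspect = "http://example.com:99999/"
print(looks_like_url(suspect), is_usable_http_url(suspect))
```

The parser-based check is still not a full validator (it does not verify, for example, that the host name resolves), so treat it as one reasonable definition of "usable by the browser" rather than the only one.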
