Resolving absolute path from relative path

Views: 192 | Published: 2018/5/2 18:45:50 | Tags: php, ruby-on-rails, go, web-crawler, relative-path


Problem description


I'm making a web crawler and I'm trying to figure out how to get an absolute path from a relative path. I'm testing against two sites: one built with Ruby on Rails (ROR) and one built with Pyro CMS.

On the Pyro CMS site, I found href attributes whose value is "index.php". If I'm currently crawling http://example.com/xyz, my crawler appends the link and produces http://example.com/xyz/index.php. The problem is that the link should have been resolved against the root, i.e. it should have been http://example.com/index.php. Worse, when I then crawl http://example.com/xyz/index.php, I find another "index.php" link, which gets appended again.

On the ROR site it was easier: when a relative path starts with '/', I know it is relative to the site root.

I can handle the case of index.php, but there might be so many rules that I need to take care of if I start doing it manually. I'm sure there's an easier way to get this done.

Solution

In Go, package path is your friend.

You can get the directory (folder) part of a path with path.Dir(), e.g.

    p := "/xyz/index.php"
    dir := path.Dir(p)
    fmt.Println("dir:", dir) // prints: dir: /xyz

If you find a link with a rooted path (one that starts with a slash), you can use it as-is.

If it is relative, you can join it with the dir above using path.Join(). Join() will also "clean" the resulting path:

    p2 := path.Join(dir, "index.php")
    fmt.Println("p2:", p2)
    p3 := path.Join(dir, "./index.php")
    fmt.Println("p3:", p3)
    p4 := path.Join(dir, "../index.php")
    fmt.Println("p4:", p4)

Output:

    p2: /xyz/index.php
    p3: /xyz/index.php
    p4: /index.php

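The two cases above (rooted link vs. relative link) can be combined into one small function. This is only a sketch: resolvePath is a hypothetical helper name, not something from the path package, and it works on URL paths only, not on full URLs with a scheme and host.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// resolvePath is a hypothetical helper for illustration.
// pagePath is the path of the page being crawled; href is a link found on it.
func resolvePath(pagePath, href string) string {
	if strings.HasPrefix(href, "/") {
		// Rooted link: use it as-is.
		return href
	}
	// Relative link: join it with the directory of the current page.
	// path.Join also cleans the result ("." and ".." are resolved).
	return path.Join(path.Dir(pagePath), href)
}

func main() {
	fmt.Println(resolvePath("/xyz/index.php", "index.php"))    // /xyz/index.php
	fmt.Println(resolvePath("/xyz/index.php", "../index.php")) // /index.php
	fmt.Println(resolvePath("/xyz/index.php", "/about"))       // /about
	fmt.Println(resolvePath("/xyz", "index.php"))              // /index.php
}
```

The last call is the situation from the question: because path.Dir("/xyz") is "/", a link to "index.php" found on /xyz resolves to /index.php rather than /xyz/index.php.
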
The "cleaning" that path.Join() performs is done by path.Clean(), which you can also call yourself on any path. It applies these rules:

Replace multiple slashes with a single slash.

Eliminate each . path name element (the current directory).

Eliminate each inner .. path name element (the parent directory) along with the non-.. element that precedes it.

Eliminate .. elements that begin a rooted path: that is, replace "/.." by "/" at the beginning of a path.
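
To see the four rules in action, here is a minimal sketch that runs path.Clean() on one input per rule. The input paths and the cleanEach helper are made up for this illustration.

```go
package main

import (
	"fmt"
	"path"
)

// cleanEach is a hypothetical helper that applies path.Clean to every input.
func cleanEach(paths []string) []string {
	out := make([]string, len(paths))
	for i, p := range paths {
		out[i] = path.Clean(p)
	}
	return out
}

func main() {
	inputs := []string{
		"/xyz//index.php",       // multiple slashes
		"/xyz/./index.php",      // "." element
		"/xyz/abc/../index.php", // inner ".." element
		"/../index.php",         // ".." at the start of a rooted path
	}
	for i, cleaned := range cleanEach(inputs) {
		fmt.Println(inputs[i], "->", cleaned)
	}
}
```

The first three inputs all clean to /xyz/index.php, and the last one cleans to /index.php.
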

And if you have a "full" url (with scheme, host, etc.), you can use the url.Parse() function to obtain a url.URL value from the raw url string. It tokenizes the url for you, so you can get the path like this:

    uraw := "http://example.com/xyz/index.php"
    u, err := url.Parse(uraw)
    if err != nil {
        fmt.Println("Invalid url:", err)
        return // u is not usable if parsing failed
    }
    fmt.Println("Path:", u.Path)

Output:

    Path: /xyz/index.php

Try all the examples on the Go Playground.
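
The answer above stops at extracting u.Path. As a complement that is not part of the original answer, the net/url package can also do the whole resolution on full urls: the ResolveReference() method of url.URL resolves a link against a base url following RFC 3986. The sketch below assumes the crawler knows the url of the page on which each link was found; resolveLink is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveLink is a hypothetical helper: it resolves href against the url of
// the page it was found on and returns the absolute url as a string.
func resolveLink(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	page := "http://example.com/xyz/index.php"
	for _, href := range []string{"index.php", "../index.php", "/about"} {
		abs, err := resolveLink(page, href)
		if err != nil {
			fmt.Println("Invalid url:", err)
			continue
		}
		fmt.Println(href, "->", abs)
	}
}
```

With this approach the case from the question works without special rules: resolving "index.php" against http://example.com/xyz yields http://example.com/index.php, because the last path segment of the base is dropped before the link is appended.
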
