How do I convert a directory path to a unique numerical identifier (Linux/C++)?

Problem description

I am investigating ways to take a directory (folder) and derive some form of unique numerical identifier. I have investigated "string to hash" methods; however, the pigeonhole principle means that a fixed-size hash can never give a truly unique number for every possible string.

String to unique hash is no good.

I have recently been investigating other means of achieving my goal and thus have the following question to ask:

Directory time stamps: how 'unique' are they? To what resolution are the time stamps reported by 'stat' as described here (second post)? If the resolution is small enough, is it still possible for more than one folder to share the exact same time stamp on a Linux system?
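To illustrate why a timestamp cannot serve as a unique key, here is a minimal sketch assuming Linux with glibc, where `struct stat` exposes the POSIX.1-2008 field `st_mtim` (seconds plus nanoseconds). How much of that resolution is actually stored depends on the filesystem (FAT, for example, keeps modification times only to 2 seconds). Resolution aside, nothing stops two directories from sharing a timestamp: the owner of a directory can set it explicitly with `utimensat()`, which the sketch does for two directories under `/tmp`. The helper names `mtime_of` and `demo_identical_mtimes` are hypothetical.

```cpp
#include <fcntl.h>     // AT_FDCWD
#include <sys/stat.h>  // stat, mkdir, utimensat
#include <unistd.h>    // rmdir
#include <cstdlib>     // mkdtemp
#include <ctime>       // struct timespec
#include <string>

// Hypothetical helper: reads the modification time of `path` into `ts`.
// st_mtim is the POSIX.1-2008 timespec field (seconds + nanoseconds).
bool mtime_of(const std::string& path, struct timespec& ts) {
    struct stat st {};
    if (stat(path.c_str(), &st) != 0) return false;
    ts = st.st_mtim;
    return true;
}

// Hypothetical demo: creates two sibling directories, stamps both with the
// same times, and reports whether stat() then returns identical mtimes.
bool demo_identical_mtimes() {
    char tmpl[] = "/tmp/mtime_demo_XXXXXX";
    if (mkdtemp(tmpl) == nullptr) return false;
    const std::string base = tmpl;
    const std::string a = base + "/a";
    const std::string b = base + "/b";

    bool same = false;
    struct timespec ta {}, tb {};
    if (mkdir(a.c_str(), 0700) == 0 && mkdir(b.c_str(), 0700) == 0) {
        // times[0] is the access time, times[1] the modification time.
        const struct timespec times[2] = {{1000000000, 123456789},
                                          {1000000000, 123456789}};
        if (utimensat(AT_FDCWD, a.c_str(), times, 0) == 0 &&
            utimensat(AT_FDCWD, b.c_str(), times, 0) == 0 &&
            mtime_of(a, ta) && mtime_of(b, tb)) {
            same = ta.tv_sec == tb.tv_sec && ta.tv_nsec == tb.tv_nsec;
        }
    }
    // Clean up; rmdir on a path that was never created simply fails.
    rmdir(a.c_str());
    rmdir(b.c_str());
    rmdir(base.c_str());
    return same;
}
```

If the filesystem stores coarser timestamps, both values are truncated the same way, so they still compare equal.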

If anyone has other methods/techniques they'd like to share, I'd be happy to listen :)

Edit 1: To clarify my use case in response to the answers posted so far: I am working on Android platforms, so the filesystem is not linked to any other (except, of course, for removable media such as Micro SD cards).

I am inserting each path into a database but trying to avoid string comparisons when querying the table. Using maps/hashmaps is not an option here. Yes, the path itself is unique, but ideally I need a numerical identifier that can be used to query the table instead of the path itself. The identifier must also be unique per path. I experimented with std::collate but found many collisions in the hashes (a dataset of 20,000 paths yields approximately 100 collisions). Even more surprising, the hashes appeared to be largely different each time my application runs. I wonder if it's seeded somehow?
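As a sanity check on those numbers, the standard birthday-bound calculation gives the expected number of colliding pairs when $n$ distinct paths are fed to an ideal $b$-bit hash. This assumes outputs that are uniformly distributed and independent, which real string hashes only approximate; each of the $\binom{n}{2}$ pairs then collides with probability $2^{-b}$:

$$
E[\text{colliding pairs}] = \binom{n}{2}\,2^{-b} = \frac{n(n-1)}{2^{b+1}}
$$

$$
n = 20\,000,\ b = 32:\quad \frac{20\,000 \times 19\,999}{2^{33}} = \frac{399\,980\,000}{8\,589\,934\,592} \approx 0.047,
\qquad b = 64:\quad \approx 1.1 \times 10^{-11}
$$

So roughly 100 collisions among 20,000 paths is far more than an ideal 32-bit hash would produce, which points to a weak hash function rather than the pigeonhole limit. A wider, well-mixed hash makes collisions improbable, but, as the pigeonhole principle says, never impossible.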

Many thanks, P

Solution

On any UNIX-based system, you can use the inode number as a unique identifier within that file system. Combining it with the device number will make it unique within the machine. If you wanted it to be globally unique, you could throw in the system's primary MAC address.
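A minimal sketch of reading the (device, inode) pair with POSIX `stat()`, assuming a Linux build; `DirId` and `dir_id()` are hypothetical names, not an existing API. The pair is kept as two fields because packing `st_dev` and `st_ino` into one integer is not guaranteed to stay unique (on Linux/glibc both types can be 64 bits wide).

```cpp
#include <sys/stat.h>   // stat, struct stat, S_ISDIR
#include <sys/types.h>  // dev_t, ino_t
#include <string>

// Hypothetical identifier: the (device, inode) pair described in the answer.
struct DirId {
    dev_t dev;  // device the file system lives on
    ino_t ino;  // inode number within that file system
    bool operator==(const DirId& o) const { return dev == o.dev && ino == o.ino; }
    bool operator!=(const DirId& o) const { return !(*this == o); }
};

// Fills `out` with the (st_dev, st_ino) pair of `path`. Returns false if
// stat() fails (errno is left set) or if `path` is not a directory.
// stat() follows symbolic links, so a symlink yields its target's identifier.
bool dir_id(const std::string& path, DirId& out) {
    struct stat st {};
    if (stat(path.c_str(), &st) != 0 || !S_ISDIR(st.st_mode)) return false;
    out = DirId{st.st_dev, st.st_ino};
    return true;
}
```

For example, `dir_id("/", a)` and `dir_id("/.", b)` produce equal pairs, because both paths name the same directory. In the database, the pair could be stored as two integer columns; that schema is an assumption, not something the answer prescribes.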

Keep in mind, however, that:

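  1. The inode number will "follow" the directory if it is moved or renamed within the same file system. It will generally change if the directory is deleted and replaced, although a freed inode number can later be reused, so an old identifier may come to refer to a different directory.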

  2. The inode number will not be stable across systems, beyond one or two really special directories. (For instance, / is usually inode 2.)
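Point 1 can be checked directly. This sketch (Linux, POSIX calls; `inode_of` and `inode_survives_rename` are hypothetical names) creates a directory under `/tmp`, renames it with `std::rename`, and compares the inode numbers before and after. It only covers a rename within one file system: moving a directory to a different file system (for instance onto a removable card) is done as a copy followed by a delete, so the copy gets a new inode.

```cpp
#include <sys/stat.h>  // stat, mkdir
#include <unistd.h>    // rmdir
#include <cstdio>      // std::rename
#include <cstdlib>     // mkdtemp
#include <string>

// Hypothetical helper: stores the inode number of `path` in `out`.
bool inode_of(const std::string& path, ino_t& out) {
    struct stat st {};
    if (stat(path.c_str(), &st) != 0) return false;
    out = st.st_ino;
    return true;
}

// Hypothetical demo: creates a directory, renames it within the same
// file system, and reports whether its inode number stayed the same.
bool inode_survives_rename() {
    char tmpl[] = "/tmp/inode_demo_XXXXXX";
    if (mkdtemp(tmpl) == nullptr) return false;
    const std::string base = tmpl;
    const std::string before = base + "/old_name";
    const std::string after = base + "/new_name";

    bool kept = false;
    ino_t ino_before = 0, ino_after = 0;
    if (mkdir(before.c_str(), 0700) == 0 &&
        inode_of(before, ino_before) &&
        std::rename(before.c_str(), after.c_str()) == 0 &&
        inode_of(after, ino_after)) {
        kept = (ino_before == ino_after);
    }
    // Clean up whichever name exists; the other rmdir simply fails.
    rmdir(after.c_str());
    rmdir(before.c_str());
    rmdir(base.c_str());
    return kept;
}
```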
