Extracting strings from any non-binary file irrespective of their location within the file


Question

Here is a problem I have been trying, unsuccessfully, to solve with a batch script. Suppose I have a file containing, say, some YouTube addresses (for example, an HTML file with links to YouTube pages).

The content of the file may look like this:

Blaaaa blaa
blaa blaa blaa <a href=https://www.youtube.com/watch?v=9bZkp7q19f0>Gangnam1</a> blaaa blaa
<a href=https://www.youtube.com/watch?v=kYtGl1dX5qI&list=RD9bZkp7q19f0>Scream and shout</a> blaa blaa
blaaaaa <a href=https://www.youtube.com/watch?v=lWA2pjMjpBs&list=RD9bZkp7q19f0>Diamonds</a> blaa
blaa bla bla

The strings would be found using a wildcard mask, like this:

https://www.youtube.com/watch\?v=*> 

(or something like that)

The output, saved to another file, should look as follows:

https://www.youtube.com/watch?v=9bZkp7q19f0>
https://www.youtube.com/watch?v=kYtGl1dX5qI&list=RD9bZkp7q19f0>
https://www.youtube.com/watch?v=lWA2pjMjpBs&list=RD9bZkp7q19f0>

The search may of course also target other strings, not only YouTube-related ones.

Simple commands like FIND or FINDSTR cannot be used, as they return the whole line containing the string. Similarly, FOR with tokens and delimiters seems to be of little use here, because the strings to be found are scattered irregularly throughout the file, sometimes several on the same line.

I really do not know how to solve this problem. It may seem simple, yet I have never found a script or program that produces output like this. Perhaps a ready-made, compiled program for it even exists. I would be very grateful for any help.

Recommended answer

I would use a scripting language other than batch for this. Here is a small example I wrote in AutoIt:

StringBetween.au3

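#include <String.au3>
; Usage: StringBetween.exe <InputFile> <StringLeft> <StringRight>
If $CmdLine[0] < 3 Then Exit 1
Local $sText = FileRead($CmdLine[1]) ; read the whole input file
Local $hOutFile = FileOpen("output.txt", 2) ; mode 2: overwrite existing content
; Array of every substring found between the two delimiters
Local $aFind = _StringBetween($sText, $CmdLine[2], $CmdLine[3])
For $i = 0 To UBound($aFind) - 1
    FileWrite($hOutFile, $aFind[$i] & @CRLF)
Next
FileClose($hOutFile)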

You can compile it yourself or download it already compiled here:

StringBetween.rar

Usage:

Stringbetween [InputFile] [StringLeft] [StringRight]

Output: "output.txt"

In your case:

Stringbetween.exe "example.html" "<a href=" ">"

A file output.txt will be created:

https://www.youtube.com/watch?v=9bZkp7q19f0
https://www.youtube.com/watch?v=kYtGl1dX5qI&list=RD9bZkp7q19f0
https://www.youtube.com/watch?v=lWA2pjMjpBs&list=RD9bZkp7q19f0
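
For readers without AutoIt, the same idea can be sketched in Python. This is not the answer's code, only a rough equivalent under a few assumptions: the function names `strings_between` and `extract_to_file` and the `keep_delimiters` option are made up for illustration, the input is assumed to be UTF-8 text, and matches are assumed not to span line breaks. Both delimiters are treated as literal text, and the non-greedy pattern means several matches on one line are returned separately.

```python
import re


def strings_between(text, left, right, keep_delimiters=False):
    """Return every substring of text that lies between left and right.

    Hypothetical helper: left and right are literal delimiters, not patterns.
    """
    # re.escape makes characters such as "?" and "." literal; "(.*?)" is
    # non-greedy, so each match stops at the first following right delimiter.
    pattern = re.compile(re.escape(left) + "(.*?)" + re.escape(right))
    if keep_delimiters:
        # group(0) is the whole match, including both delimiters
        return [m.group(0) for m in pattern.finditer(text)]
    return [m.group(1) for m in pattern.finditer(text)]


def extract_to_file(in_path, left, right, out_path="output.txt"):
    """Read in_path, write one match per line to out_path, return the matches."""
    # errors="replace" keeps the script going if a few bytes are not valid UTF-8
    with open(in_path, encoding="utf-8", errors="replace") as src:
        found = strings_between(src.read(), left, right)
    with open(out_path, "w", encoding="utf-8") as dst:
        for item in found:
            dst.write(item + "\n")
    return found
```

Calling `extract_to_file("example.html", "<a href=", ">")` would mirror the command line above. To reproduce the output format requested in the question, where each line keeps its trailing `>`, one option is `strings_between(text, "https://www.youtube.com/watch?v=", ">", keep_delimiters=True)`.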
