Java: Read in text files from a directory, from the internet
Problem description
Does anybody know how to recursively read in files from a specific directory on the internet, in Java? I want to read in all the text files from this web directory: http://www.cs.ucdavis.edu/~davidson/courses/170-S11/Female/
I know how to read in multiple files that are in a folder on my computer, and I know how to read in a single file from the internet. But how can I read in multiple files on the internet, without hardcoding the URLs in?
Stuff I tried:
// List the files on my Desktop
final File folder = new File("/Users/crystal/Desktop");
File[] listOfFiles = folder.listFiles();
for (int i = 0; i < listOfFiles.length; i++) {
    File fileEntry = listOfFiles[i];
    if (!fileEntry.isDirectory()) {
        System.out.println(fileEntry.getName());
    }
}
Another thing I tried:
// Reading data from the web
try
{
    // Create a URL object
    URL url = new URL("http://www.cs.ucdavis.edu/~davidson/courses/170-S11/Female/5_1_1.txt");
    // Read all of the text returned by the HTTP server
    BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
    String htmlText; // String that holds current file line
    // Read through file one line at a time. Print each line
    while ((htmlText = in.readLine()) != null)
    {
        System.out.println(htmlText);
    }
    in.close();
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    // If another exception is generated, print a stack trace
    e.printStackTrace();
}
Thanks!
Since the URL you mentioned has indexes enabled, you're in luck. You've got a few options here.
- Parse the HTML to find the href attributes of the a tags, using SAX2 or any other XML parser. htmlunit would also work, I think.
- Use a little regexp magic to match every string between <a href=" and ">, and use those as the URLs to read from.
Once you've got a list of all the URLs you need, then the second piece of code should work just fine. Just iterate over your list, and construct your URL from that list.
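As a sketch of that iteration: the class, the buildUrl/printUrl helper names, and the hardcoded file names below are all hypothetical stand-ins for whatever your index-parsing step actually produces; the reading logic itself is just the question's second snippet wrapped in a method.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.List;

public class DirectoryReader {
    // Directory listing URL from the question.
    static final String BASE = "http://www.cs.ucdavis.edu/~davidson/courses/170-S11/Female/";

    // Join the directory URL with a relative file name taken from the index page.
    static String buildUrl(String base, String name) {
        return base.endsWith("/") ? base + name : base + "/" + name;
    }

    // Read one URL line by line and print it, as in the question's second snippet.
    static void printUrl(String urlString) throws IOException {
        URL url = new URL(urlString);
        try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }

    public static void main(String[] args) {
        // These names would come from parsing the index page; hardcoded here only to show the loop.
        List<String> fileNames = List.of("5_1_1.txt", "5_1_2.txt");
        for (String name : fileNames) {
            System.out.println(buildUrl(BASE, name));
            // printUrl(buildUrl(BASE, name)); // would fetch and print each file's contents
        }
    }
}
```

The try-with-resources block replaces the explicit in.close() so the stream is closed even if a read fails partway through.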
Here's a sample regex that should match what you want. It does catch a few extra links, but you should be able to filter those out.
<a\ href="(.+?)">
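To show how that regex behaves in Java: the class name and the sample index line below are assumed for illustration, and the escaped space in the pattern is written as a plain space, which matches the same thing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IndexLinkExtractor {
    // The answer's sample regex; (.+?) is reluctant, so each match stops at the first ">.
    static final Pattern LINK = Pattern.compile("<a href=\"(.+?)\">");

    // Collect every href value found in the index page's HTML.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Hypothetical fragment of an index page; note the extra sort link the regex also catches.
        String sample = "<a href=\"5_1_1.txt\">5_1_1.txt</a> <a href=\"?C=N;O=D\">Name</a>";
        System.out.println(extractLinks(sample)); // prints [5_1_1.txt, ?C=N;O=D]
    }
}
```

The second match above is one of the "extra links" the answer warns about; keeping only entries that end in .txt filters those out.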