Java: Read in text files from a directory, from the internet


Problem description

Does anybody know how to recursively read in files from a specific directory on the internet, in Java? I want to read in all the text files from this web directory: http://www.cs.ucdavis.edu/~davidson/courses/170-S11/Female/

I know how to read in multiple files that are in a folder on my computer, and I how to read in a single file from the internet. But how can I read in multiple files on the internet, without hardcoding the URLs in?

Stuff I tried:

// List the files on my Desktop
final File folder = new File("/Users/crystal/Desktop");
File[] listOfFiles = folder.listFiles();

for (int i = 0; i < listOfFiles.length; i++) {
    File fileEntry = listOfFiles[i];
    if (!fileEntry.isDirectory()) {
        System.out.println(fileEntry.getName());
    }
}

Another thing I tried:

// Reading data from the web 
try 
{
    // Create a URL object
    URL url = new URL("http://www.cs.ucdavis.edu/~davidson/courses/170-S11/Female/5_1_1.txt");

    // Read all of the text returned by the HTTP server
    BufferedReader in = new BufferedReader (new InputStreamReader(url.openStream()));

    String htmlText;      // String that holds current file line

    // Read through file one line at a time. Print line
    while ((htmlText = in.readLine()) != null) 
    {
        System.out.println(htmlText);
    }
    in.close();
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    // If another exception is generated, print a stack trace
    e.printStackTrace();
}

Thanks!

Solution

Since the URL you mentioned has indexes enabled, you're in luck. You've got a few options here.

  1. Parse the HTML to find the href attributes of the a tags, using SAX2 or any other XML parser. HtmlUnit would also work, I think.
  2. Use a little regex magic to match the strings between <a href=" and "> and use those as the URLs to read from.
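Option 2 could be sketched as below. The LinkExtractor class name, the extractHrefs helper, and the sample input line are my own illustration, not from the original answer; the pattern is essentially the regex suggested at the end of this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractor {
    // Pull every href value out of the anchor tags in a chunk of HTML.
    // A directory index page puts one <a href="filename"> per listed file.
    static List<String> extractHrefs(String html) {
        List<String> hrefs = new ArrayList<>();
        Matcher m = Pattern.compile("<a href=\"(.+?)\">").matcher(html);
        while (m.find()) {
            hrefs.add(m.group(1));   // group 1 is the text between the quotes
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // A line shaped like the ones an auto-generated directory index produces
        String line = "<a href=\"5_1_1.txt\">5_1_1.txt</a> <a href=\"5_1_2.txt\">5_1_2.txt</a>";
        for (String href : extractHrefs(line)) {
            System.out.println(href);
        }
    }
}
```

You would feed each line of the index page (read with the same BufferedReader loop as in the question) through extractHrefs and collect the results.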

Once you've got a list of all the URLs you need, then the second piece of code should work just fine. Just iterate over your list, and construct your URL from that list.
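Putting that together might look like the sketch below. The DirectoryReader class, the buildUrl helper, and the hard-coded example file names are assumptions for illustration; in practice the names would come from parsing the index page as described above, and printFile reuses the question's own BufferedReader approach.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.List;

public class DirectoryReader {
    // Join the index page's base URL with a relative file name from an <a href>.
    static String buildUrl(String base, String name) {
        return base.endsWith("/") ? base + name : base + "/" + name;
    }

    // Read one text file over HTTP and print it, as in the question's second snippet.
    static void printFile(String fileUrl) throws IOException {
        URL url = new URL(fileUrl);
        try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String base = "http://www.cs.ucdavis.edu/~davidson/courses/170-S11/Female/";
        // Stand-in list; really this would come from the extracted hrefs.
        List<String> names = List.of("5_1_1.txt", "5_1_2.txt");
        for (String name : names) {
            if (name.endsWith(".txt")) {   // skip parent-directory and sort-order links
                System.out.println(buildUrl(base, name));
                // printFile(buildUrl(base, name));   // uncomment to actually fetch
            }
        }
    }
}
```

The .txt filter is one way to drop the extra links the regex catches (parent directory, column-sorting links on the index page).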

Here's a sample regex that should match what you want. It does catch a few extra links, but you should be able to filter those out.

<a\ href="(.+?)">
