Can't read UTF-8 filenames when launched as an Upstart service

Question

My Java program reads the contents of a directory recursively. This is a sample tree (note the non-ASCII characters):

./sviluppo
./sviluppo/ciaò
./sviluppo/ciaò/subdir
./sviluppo/pippo
./sviluppo/pippo/prova2.txt <-file
./sviluppo/così

The program is started as an Upstart service, with a configuration file named like /init/myservice.conf

description "Private Service"
author "AD"
start on runlevel [2345]
stop on runlevel [!2345]
exec java -jar /home/mainFind.jar >> /tmp/log.txt

When I start the service:

root@mdr:/tmp#  service myservice start
myservice start/running, process 15344

It doesn't log the names that contain non-ASCII characters:

root@mdr:/tmp#  cat /tmp/log.txt
Found dir: /mnt/sviluppo/pippo

Instead, when I run the command (as root, to mimic what happens when it's started as a service) it works fine, with and without exec:

root@mdr:/tmp# java -jar /home/mainFind.jar  >> /tmp/log.txt
root@mdr:/tmp# exec java -jar /home/mainFind.jar  >> /tmp/log.txt

root@mdr:/tmp#  cat /tmp/log.txt
Found dir: /mnt/sviluppo/ciaò
Found dir: /mnt/sviluppo/ciaò/subdir
Found dir: /mnt/sviluppo/pippo
Found dir: /mnt/sviluppo/così

Why does the same program, run by the same user, fail as an Upstart service but correctly process all of the filenames when run from the command line? Here is the Java code:

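// requires: import java.io.File;
public static void aggiungiFileDir(File f) {
  File[] lista = f.listFiles();
  if (lista == null) return; // f is not a directory or cannot be read
  for (File entry : lista) {
    if (entry.isDirectory()) {
      System.out.println("Found dir: " + entry);
      aggiungiFileDir(entry); // recurse into each subdirectory
    }
  }
}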

The parameter f is the root directory; the method is called recursively on each subdirectory.

EDIT 2: output of ls

root@mdr:/tmp# ls -al /mnt/sviluppo
totale 20
drwx------ 5 root root 4096 nov 15 15:10 .
drwxr-xr-x 7 root root 4096 nov  9 10:43 ..
drwxr-xr-x 2 root root 4096 nov 15 15:10 ciaò
drwxr-xr-x 2 root root 4096 nov 15 11:23 così
drwxr-xr-x 2 root root 4096 nov 15 17:57 pippo


Answer

Java uses a native call to list the contents of a directory. The underlying C runtime relies on the locale to turn the raw bytes that the filesystem stores as a filename into a Java String.
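To make the byte-to-String step concrete, here is a small self-contained sketch (not part of the original answer) that mimics what happens to the name ciaò when filenames are decoded with a non-UTF-8 charset. It assumes the fallback charset is US-ASCII, which is what the C/POSIX locale typically maps to; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FilenameDecodingDemo {

    // Decode raw filename bytes with the given charset, as the JVM would.
    // Bytes that are invalid in that charset become U+FFFD.
    static String decode(byte[] raw, Charset cs) {
        return new String(raw, cs);
    }

    // True if decoding and re-encoding gives back the exact bytes on disk,
    // i.e. the String could still be used to reach the original file.
    static boolean roundTrips(byte[] raw, Charset cs) {
        return Arrays.equals(raw, decode(raw, cs).getBytes(cs));
    }

    public static void main(String[] args) {
        // Bytes a UTF-8 filesystem stores for "ciaò" (ò is 0xC3 0xB2).
        byte[] onDisk = "cia\u00f2".getBytes(StandardCharsets.UTF_8);

        for (Charset cs : new Charset[] {StandardCharsets.UTF_8, StandardCharsets.US_ASCII}) {
            System.out.println(cs + ": round-trips=" + roundTrips(onDisk, cs));
        }
    }
}
```

Under UTF-8 the name survives intact; under US-ASCII each of the two bytes of ò becomes U+FFFD, and re-encoding produces cia?? instead of the bytes on disk, so a path built from that String points at a file that does not exist.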

When you execute a Java program from a shell (either as a privileged user or an unprivileged one), it inherits the shell's environment variables. The locale variables (LC_ALL, LC_CTYPE and LANG, in that order of precedence) decide how the stream of bytes is transcoded into a Java String, and on Ubuntu LANG is set to a UTF-8 locale by default.
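One way to check this difference is to run a tiny diagnostic both from a root shell and from the Upstart job, then compare the two outputs. The sketch below is an assumption-laden helper, not part of the original answer: the class name is hypothetical, and sun.jnu.encoding (the charset OpenJDK uses for filenames) is an implementation property rather than a standard API, so other JVMs may report null for it.

```java
import java.util.Locale;

public class LocaleReport {

    // Collects the values that influence how filenames are decoded.
    // System.getenv returns null for unset variables, printed as "null".
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (String var : new String[] {"LC_ALL", "LC_CTYPE", "LANG"}) {
            sb.append(var).append('=').append(System.getenv(var)).append('\n');
        }
        // OpenJDK-specific: charset used for file names and other native strings.
        sb.append("sun.jnu.encoding=").append(System.getProperty("sun.jnu.encoding")).append('\n');
        sb.append("file.encoding=").append(System.getProperty("file.encoding")).append('\n');
        sb.append("default locale=").append(Locale.getDefault()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

If the Upstart run shows LANG=null and a non-UTF-8 sun.jnu.encoding while the shell run shows a UTF-8 locale, the environment is the culprit.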

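Note that a process need not be run from a shell at all, but looking at the code it seems that Upstart is smart enough to recognize when the command in the configuration file is meant to be executed through a shell (here the >> redirection requires one). So even though the JVM is invoked through a shell, Upstart starts jobs with a minimal environment in which LANG is not set, and the C runtime falls back to the default "C"/POSIX locale, whose charset (typically US-ASCII) is not UTF-8. The bytes of names such as ciaò and così cannot be decoded, the resulting String no longer names an existing path, and isDirectory() returns false for it, which is presumably why only pippo shows up in the log. The solution is to set LANG in the Upstart stanza: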

description "List UTF-8 encoded filenames"
author "Raffaele Sgarro"
env LANG=en_US.UTF-8
script
  cd /workspace
  java -jar list.jar test > log.txt
end script

I used en_US.UTF-8 as the locale, but any UTF-8 locale will do just as well. Here is the source of the test program in list.jar:

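import java.io.File;
public class ListFiles {
    public static void main(String[] args) {
        // listFiles() returns null if args[0] is not a readable directory
        File[] files = new File(args[0]).listFiles();
        if (files == null) return;
        for (File file : files) System.out.println(file.getName());
    }
}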

The directory /workspace/test contains filenames like ààà, èèè and so on. Now you can move on to the database part ;)
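If the service should fail loudly when the env LANG line is missing, rather than silently skipping entries, a startup check along these lines could help. This is a hedged sketch, not part of the original answer: the class and method names are hypothetical, and it relies on OpenJDK's sun.jnu.encoding property, treating an absent or unknown value as "not UTF-8" to stay conservative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FilenameCharsetCheck {

    // Returns true only if the given charset name resolves to UTF-8.
    // null (property missing) and unknown names are treated as not UTF-8.
    static boolean filenamesAreUtf8(String jnuEncoding) {
        if (jnuEncoding == null) {
            return false;
        }
        try {
            return Charset.forName(jnuEncoding).equals(StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) { // illegal or unsupported charset name
            return false;
        }
    }

    public static void main(String[] args) {
        String jnu = System.getProperty("sun.jnu.encoding");
        if (!filenamesAreUtf8(jnu)) {
            System.err.println("Filename charset is " + jnu
                    + ", not UTF-8; non-ASCII names may be lost. Is LANG set in the Upstart job?");
            System.exit(1);
        }
        System.out.println("Filename charset is UTF-8");
    }
}
```

Exiting with a non-zero status makes the problem visible in Upstart's job state instead of only in an incomplete log.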
