Locating a file descriptor leak in an OS X application


Question

I have a very complex application composed of a couple of libraries. The QA team has found a problem (something reports an error).
From the logs I can see that the application is leaking file descriptors (+1000 after 7 hours of automated tests). The QA team delivered the "Open Files and Ports" report from Activity Monitor, so I know exactly which server the unclosed connections go to.

The full application logs show that the leak is quite systematic (there is no sudden burst), but I have been unable to reproduce the issue, not even as a small leak of file descriptors.

Even though I know which server's connections are never closed, I cannot find the code responsible, and I cannot reproduce the issue.
The logs show that all resources my library maintains are properly freed, yet the server address suggests that the leak is either my responsibility or NSURLSession's (and the session is invalidated).

Since other libraries and the application code itself are also involved, there is a small chance that the leak is caused by third-party code.

How can I locate the code responsible for leaking file descriptors? The best candidate seems to be dtruss, which looks very promising. According to the documentation, it can print stack backtraces (-s) whenever a system call is made.
The problem is that I do not know how to use it without being flooded with information. I only need to know who created each open file descriptor and whether it was later closed. Since I can't reproduce the issue, I need a script that the QA team can run so that they can send me the output.

If there are other ways to find the source of a file descriptor leak, please let me know.

There are a bunch of predefined scripts that use dtruss, but I don't see anything that matches my needs.

What is strange is that the only code I'm aware of that uses the problematic connection does not use file descriptors directly. It uses a custom NSURLSession (configured as: one connection per host, minimum TLS 1.0, cookies disabled, custom certificate validation). The logs show that the NSURLSession is invalidated properly. I doubt NSURLSession is the source of the leak, but currently it is the only candidate.

Recommended answer

OK, I found out how to do it, on Solaris 11 anyway. I get this output (and yes, I needed root on Solaris 11):

bash-4.1# dtrace -s fdleaks.d -c ./fdLeaker
open( './fdLeaker' ) returned 3
open( './fdLeaker' ) returned 4
open( './fdLeaker' ) returned 5
falloc fp: ffffa1003ae56590, fd: 3, saved fd: 3
falloc fp: ffffa10139d28f58, fd: 4, saved fd: 4
falloc fp: ffffa10030a86df0, fd: 5, saved fd: 5

opened file: ./fdLeaker
leaked fd: 3


              libc.so.1`__systemcall+0x6
              libc.so.1`__open+0x29
              libc.so.1`open+0x84
              fdLeaker`main+0x2b
              fdLeaker`_start+0x72

opened file: ./fdLeaker
leaked fd: 4


              libc.so.1`__systemcall+0x6
              libc.so.1`__open+0x29
              libc.so.1`open+0x84
              fdLeaker`main+0x64
              fdLeaker`_start+0x72

The fdleaks.d DTrace script that finds leaked file descriptors:

#!/usr/sbin/dtrace -s

/* this will probably need tuning
   note there can be significant performance
   impacts if you make these large */
#pragma D option nspec=4
#pragma D option specsize=128k

#pragma D option quiet

syscall::open*:entry
/ pid == $target /
{
    /* arg1 might not have a physical mapping yet so
       we can't call copyinstr() until open() returns
       and we don't have a file descriptor yet -
       we won't get that until open() returns anyway */
    self->path = arg1;
}

/* arg0 is the file descriptor being returned */
syscall::open*:return
/ pid == $target && arg0 >= 0  && self->path /
{
    /* get a speculation ID tied to this
       file descriptor and start speculative
       tracing */
    openspec[ arg0 ] = speculation();
    speculate( openspec[ arg0 ] );

    /* this output won't appear unless the associated
       speculation id is committed */
    printf( "\nopened file: %s\n", copyinstr( self->path ) );
    printf( "leaked fd: %d\n\n", arg0 );
    ustack();

    /* free the saved path */
    self->path = 0;
}

syscall::close:entry
/ pid == $target && arg0 >= 0 /
{
    /* closing the fd, so discard the speculation
       and free the id by setting it to zero */
    discard( openspec[ arg0 ] );
    openspec[ arg0 ] = 0;
}

/* Solaris uses falloc() to open a file and associate
   the fd with an internal file_t structure

    When the kernel closes file descriptors that the
    process left open, it uses the closeall() function
    which walks the internal structures then calls
    closef() using the file_t *, so there's no way
    to get the original process file descriptor in
    closeall() or closef() DTrace probes.

    falloc() is called on open() to associate the
    file_t * with a file descriptor, so this
    saves the pointers passed to falloc()
    that are used to return the file_t * and
    file descriptor once they're filled in
    when falloc() returns */
fbt::falloc:entry
/ pid == $target /
{
   self->fpp = args[ 2 ];
   self->fdp = args[ 3 ];
}


/* Clause-local variables to make casting clearer */
this int fd;
this uint64_t fp;

/* array to associate a file descriptor with its file_t *
   structure in the kernel */
int fdArray[ uint64_t fp ];

fbt::falloc:return
/ pid == $target && self->fpp && self->fdp /
{
    /* get the fd and file_t * values being
       returned to the caller */
    this->fd = ( * ( int * ) self->fdp );
    this->fp = ( * ( uint64_t * ) self->fpp );

    /* associate the fd with its file_t * */
    fdArray[ this->fp ] = ( int ) this->fd;

    /* verification output */
    printf( "falloc fp: %x, fd: %d, saved fd: %d\n", this->fp, this->fd, fdArray[ this->fp ] );
}

/* if this gets called and the dereferenced
   openspec array element is a still-valid
   speculation id, the fd associated with
   the file_t * passed to closef() was never
   closed by the process itself */
fbt::closef:entry
/ pid == $target /
{
    /* commit the speculative tracing since
       this file descriptor was leaked */
    commit( openspec[ fdArray[ arg0 ] ] );
}
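
The bookkeeping the script relies on (record each descriptor when it is opened, forget it when it is closed, report whatever is left at exit) can be modelled in plain C. This is only a sketch of that idea, not a replacement for the DTrace script: track_open(), track_close(), report_leaks() and the MAX_TRACKED limit are hypothetical names invented for illustration, and the origin string stands in for the ustack() output.

```c
#include <stdio.h>

/* Hypothetical upper bound on descriptor numbers we track (assumption). */
#define MAX_TRACKED 1024

/* origin[fd] holds a note about who opened fd; an empty string means
   "not open / already closed". This plays the role of openspec[]. */
static char origin[MAX_TRACKED][64];

/* Record an open, like starting a speculation for the fd. */
void track_open(int fd, const char *who)
{
    if (fd < 0 || fd >= MAX_TRACKED)
        return;
    /* an empty note would look like "closed", so substitute a marker */
    snprintf(origin[fd], sizeof origin[fd], "%s",
             (who != NULL && who[0] != '\0') ? who : "?");
}

/* Forget a descriptor, like discard() on close. */
void track_close(int fd)
{
    if (fd < 0 || fd >= MAX_TRACKED)
        return;
    origin[fd][0] = '\0';
}

/* Print everything still recorded, like commit() at process exit.
   Returns the number of descriptors that were never closed. */
int report_leaks(FILE *out)
{
    int leaked = 0;
    for (int fd = 0; fd < MAX_TRACKED; fd++) {
        if (origin[fd][0] != '\0') {
            fprintf(out, "leaked fd: %d opened by %s\n", fd, origin[fd]);
            leaked++;
        }
    }
    return leaked;
}
```

In an application, calls to track_open() and track_close() would have to be placed next to every open and close that the code controls, which is exactly what the DTrace probes avoid.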

First, I wrote this little C program to leak fds:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#include <stdio.h>

#include <unistd.h>

int main( int argc, char **argv )
{
    int ii;

    for ( ii = 0; ii < argc; ii++ )
    {
        int fd = open( argv[ ii ], O_RDONLY );
        fprintf( stderr, "open( '%s' ) returned %d\n", argv[ ii ], fd );
        fd = open( argv[ ii ], O_RDONLY );
        fprintf( stderr, "open( '%s' ) returned %d\n", argv[ ii ], fd );
        fd = open( argv[ ii ], O_RDONLY );
        fprintf( stderr, "open( '%s' ) returned %d\n", argv[ ii ], fd );
        close( fd );
    }
    return( 0 );
}
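
A cheap way to confirm that a program like this one really leaves descriptors open is to count them from inside the process. The sketch below uses only POSIX calls (sysconf() and fcntl() with F_GETFD). The function name count_open_fds() and the 65536 cap are assumptions made for illustration.

```c
#include <fcntl.h>
#include <unistd.h>

/* Count the descriptors currently open in this process by probing
   each possible descriptor number with fcntl(F_GETFD), which fails
   with EBADF for numbers that are not open. */
int count_open_fds(void)
{
    long max_fd = sysconf(_SC_OPEN_MAX);

    /* Assumption: cap the scan so a very large (or unknown) limit
       does not turn this into a long loop. Descriptors numbered
       above the cap are not counted. */
    if (max_fd < 0 || max_fd > 65536)
        max_fd = 65536;

    int count = 0;
    for (long fd = 0; fd < max_fd; fd++) {
        if (fcntl((int)fd, F_GETFD) != -1)
            count++;
    }
    return count;
}
```

Logging this count periodically, or once before exit, shows whether the number of open descriptors keeps growing, although it does not say who opened them.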

Then I ran it under this DTrace script to figure out what the kernel does to close orphaned file descriptors (dtrace -s exit.d -c ./fdLeaker):

#!/usr/sbin/dtrace -s

#pragma D option quiet

syscall::rexit:entry
{
    self->exit = 1;
}

syscall::rexit:return
/ self->exit /
{
    self->exit = 0;
}

fbt:::entry
/ self->exit /
{
    printf( "---> %s\n", probefunc );
}

fbt:::return
/ self->exit /
{
    printf( "<--- %s\n", probefunc );
}

That produced a lot of output. I noticed the closeall() and closef() functions, examined the source code, and wrote the DTrace script.

Note also that the process-exit DTrace probe on Solaris 11 is the rexit one; that is probably different on OS X.

The biggest problem on Solaris is getting the file descriptor for the file inside the kernel code that closes orphaned file descriptors. Solaris doesn't close by file descriptor. It closes by the file_t pointers held in the kernel's open-file structures for the process. So I had to examine the Solaris source to figure out where the fd is associated with the file_t *, and that is in the falloc() function. The DTrace script associates each file_t * with its fd in an associative array.

None of that is likely to work on OS X.

If you're lucky, the OS X kernel will close orphaned file descriptors by the file descriptor itself, or at least provide something that tells you the fd is being closed, perhaps an auditing function.
