ftrace: system crash when changing current_tracer from function_graph via echo


Question

I have been using ftrace recently to monitor some behavior characteristics of my system, switching tracing on and off via a small script. After running the script, my system would crash and reboot itself. Initially I believed the script itself might contain an error, but I have since determined that the crash and reboot result from echoing any tracer name to /sys/kernel/debug/tracing/current_tracer while current_tracer is set to function_graph.

That is, the following sequence of commands produces the crash/reboot:

echo "function_graph" > /sys/kernel/debug/tracing/current_tracer
echo "function" > /sys/kernel/debug/tracing/current_tracer

During the reboot after the crash caused by the above echo statements, I see a lot of output that reads:

Clearing orphaned inode <inode>

I tried to reproduce this problem from a C program by switching current_tracer from function_graph to another tracer:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>

int openCurrentTracer()
{
        int fd = open("/sys/kernel/debug/tracing/current_tracer", O_WRONLY);
        if(fd < 0)
                exit(1);

        return fd;
}

int writeTracer(int fd, char* tracer)
{
        if(write(fd, tracer, strlen(tracer)) != strlen(tracer)) {
                printf("Failure writing %s\n", tracer);
                return 0;
        }

        return 1;
}

int main(int argc, char* argv[])
{
        int fd = openCurrentTracer();

        char* blockTracer = "blk";
        if(!writeTracer(fd, blockTracer))
                return 1;
        close(fd);

        fd = openCurrentTracer();
        char* graphTracer = "function_graph";
        if(!writeTracer(fd, graphTracer))
                return 1;
        close(fd);

        printf("Preparing to fail!\n");

        fd = openCurrentTracer();
        if(!writeTracer(fd, blockTracer))
                return 1;
        close(fd);

        return 0;
}

Oddly enough, the C program does not crash my system.

I originally encountered this problem on Ubuntu 16.04 LTS (Unity environment) and confirmed that it occurs on the 4.4.0 and 4.5.5 kernels. I also tested on a machine running Ubuntu 15.10 (Mate environment) with the 4.2.0 and 4.5.5 kernels, but could not reproduce the issue there. This has only confused me further.

Can anyone give me insight into what is happening? Specifically, why would I be able to write() but not echo to /sys/kernel/debug/tracing/current_tracer?
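One difference between the two reproductions can be narrowed down: a shell `>` redirection opens the target with O_WRONLY | O_CREAT | O_TRUNC and echo appends a trailing newline, whereas the C program above opens with O_WRONLY only and writes no newline. The following is a minimal sketch of a helper that imitates the shell more closely. The name write_like_echo is hypothetical and not part of the original post, and nothing in the thread establishes that this difference is what triggers the crash.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not from the original post): write `value` followed by
 * a newline to `path`, opening the file the way a shell `>` redirection does.
 * Returns 0 on success, -1 on any error. */
int write_like_echo(const char *path, const char *value)
{
        char buf[128];
        int len, fd, rc;
        ssize_t written;

        assert(path != NULL && value != NULL);

        /* echo appends a newline to its argument */
        len = snprintf(buf, sizeof(buf), "%s\n", value);
        if (len < 0 || (size_t)len >= sizeof(buf))
                return -1;

        /* same flags a shell uses for `> path` */
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
        if (fd < 0) {
                perror(path);
                return -1;
        }

        written = write(fd, buf, (size_t)len);
        rc = (written == (ssize_t)len) ? 0 : -1;
        if (rc != 0)
                fprintf(stderr, "Failure writing %s\n", value);

        if (close(fd) != 0)
                rc = -1;
        return rc;
}
```

Calling write_like_echo("/sys/kernel/debug/tracing/current_tracer", "function_graph") and then the same call with "function" mirrors the two echo commands; it needs root and may crash an affected kernel, so it is best tried on a disposable machine.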

Update

As vielmetti pointed out, others have had a similar issue (as seen here).

The ftrace_disable_ftrace_graph_caller() modifies the jmp instruction at ftrace_graph_call assuming it's a 5-byte near jmp (e9 <offset>). However, it's a short jmp consisting of only 2 bytes (eb <offset>). And ftrace_stub() is located just below ftrace_graph_caller, so the modification above breaks the instruction, resulting in a kernel oops on ftrace_stub() with an invalid opcode.
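The overwrite described in that quote can be illustrated with a toy byte buffer. This is only a sketch under stated assumptions, not kernel code: the function names, the 8-byte buffer, and the displacement value are all made up for illustration. The buffer holds a jmp followed directly by a retq (c3), and a patcher that always writes 5 bytes is applied to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NEAR_JMP_SIZE 5 /* e9 + 32-bit displacement */

/* Hypothetical patcher: overwrite 5 bytes at text[at] with a near jmp,
 * as code that assumes a 5-byte instruction would. */
void patch_near_jmp(uint8_t *text, size_t text_len, size_t at, int32_t disp)
{
        assert(at + NEAR_JMP_SIZE <= text_len);
        text[at] = 0xe9;
        memcpy(&text[at + 1], &disp, sizeof(disp));
}

/* Toy layout: a jmp at offset 0, then retq (c3) right after it, then nops.
 * Returns 1 if the retq opcode is still intact after the 5-byte patch. */
int retq_survives_patch(int jmp_is_short)
{
        uint8_t text[8];
        size_t jmp_len = jmp_is_short ? 2 : 5;

        memset(text, 0x90, sizeof(text));     /* nop padding */
        if (jmp_is_short) {
                text[0] = 0xeb;               /* short jmp, 8-bit displacement */
                text[1] = 0x00;
        } else {
                text[0] = 0xe9;               /* near jmp, 32-bit displacement */
                memset(&text[1], 0x00, 4);
        }
        text[jmp_len] = 0xc3;                 /* retq of the stub below the jmp */

        patch_near_jmp(text, sizeof(text), 0, 0x11223344); /* arbitrary value */

        return text[jmp_len] == 0xc3;
}
```

With a 5-byte jmp the patch stays inside the instruction and retq survives. With a 2-byte jmp the last three patched bytes land on the stub, which matches the invalid-opcode oops described above.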

The patch (shown below) solved the echo issue, but I still do not understand why echo was breaking previously when write() was not.

diff --git a/arch/x86/kernel/mcount_64.S b/arch/x86/kernel/mcount_64.S
index ed48a9f465f8..e13a695c3084 100644
--- a/arch/x86/kernel/mcount_64.S
+++ b/arch/x86/kernel/mcount_64.S
@@ -182,7 +182,8 @@ GLOBAL(ftrace_graph_call)
     jmp ftrace_stub
 #endif

-GLOBAL(ftrace_stub)
+/* This is weak to keep gas from relaxing the jumps */
+WEAK(ftrace_stub)
     retq
 END(ftrace_caller)

via https://lkml.org/lkml/2016/5/16/493
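The patch comment refers to the assembler "relaxing" a jump, that is, choosing the 2-byte short form when the target is close enough. The following is a simplified model of that choice, assuming that a jump to a weak symbol keeps the 5-byte form because the symbol's final definition may be overridden at link time. The function and parameter names are hypothetical, and the real logic in gas is more involved.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of jump relaxation: returns the encoded length in bytes
 * of an x86-64 jmp with the given displacement. */
int jmp_encoded_len(int64_t disp, int target_may_be_overridden)
{
        assert(disp >= INT32_MIN && disp <= INT32_MAX);

        /* The short form is only safe when the target is fixed and near. */
        if (!target_may_be_overridden && disp >= INT8_MIN && disp <= INT8_MAX)
                return 2; /* eb + 8-bit displacement */

        return 5;         /* e9 + 32-bit displacement */
}
```

In this model, a jmp to a stub placed immediately after it has a displacement of 0 and is encoded in 2 bytes unless the target is treated as overridable, in which case it stays at the 5 bytes that the patching code expects.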

Recommended answer

Looks like you are not the only person to notice this behavior. I see one thread as a report of the problem, and another as a patch to the kernel that addresses it. Reading through that whole thread, it appears that the issue comes from some compiler optimizations (going by the patch comment, the assembler relaxing the jmp to its short form).
