How to debug SEGV_ACCERR
Question
I have an app that streams video using Kickflip and the ButterflyTV libRTMP library.
About 99% of the time the app works fine, but from time to time I get a native segmentation fault that I am not able to debug, since the messages are too cryptic:
01-24 10:52:25.576 199-199/? A/DEBUG: *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
01-24 10:52:25.576 199-199/? A/DEBUG: Build fingerprint: 'google/hammerhead/hammerhead:6.0.1/M4B30Z/3437181:user/release-keys'
01-24 10:52:25.576 199-199/? A/DEBUG: Revision: '11'
01-24 10:52:25.576 199-199/? A/DEBUG: ABI: 'arm'
01-24 10:52:25.576 199-199/? A/DEBUG: pid: 14302, tid: 14382, name: MuxerThread >>> tv.myapp.broadcast.dev <<<
01-24 10:52:25.576 199-199/? A/DEBUG: signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0x9fef1000
01-24 10:52:25.636 199-199/? A/DEBUG: Abort message: 'Setting to ready!'
01-24 10:52:25.636 199-199/? A/DEBUG: r0 9c6f9500 r1 9c6f94fc r2 9fee900c r3 00007ff4
01-24 10:52:25.636 199-199/? A/DEBUG: r4 9fee9010 r5 9fef0ffd r6 00007ff1 r7 9fef0d88
01-24 10:52:25.636 199-199/? A/DEBUG: r8 cfe40980 r9 9e0a6900 sl 00007ff4 fp 9c6f94fc
01-24 10:52:25.636 199-199/? A/DEBUG: ip 9c6f9058 sp 9c6f94dc lr 000000e9 pc b3a33cb6 cpsr 800f0030
01-24 10:52:25.650 199-199/? A/DEBUG: backtrace:
01-24 10:52:25.651 199-199/? A/DEBUG: #00 pc 00004cb6 /data/app/tv.myapp.broadcast.dev-2/lib/arm/librtmp-jni.so
01-24 10:52:25.651 199-199/? A/DEBUG: #01 pc 00005189 /data/app/tv.myapp.broadcast.dev-2/lib/arm/librtmp-jni.so (rtmp_sender_write_video_frame+28)
01-24 10:52:25.651 199-199/? A/DEBUG: #02 pc 00005599 /data/app/tv.myapp.broadcast.dev-2/lib/arm/librtmp-jni.so (Java_net_butterflytv_rtmp_1client_RTMPMuxer_writeVideo+60)
01-24 10:52:25.651 199-199/? A/DEBUG: #03 pc 014e84e7 /data/app/tv.myapp.broadcast.dev-2/oat/arm/base.odex (offset 0xa66000) (int net.butterflytv.rtmp_client.RTMPMuxer.writeVideo(byte[], int, int, int)+122)
01-24 10:52:25.651 199-199/? A/DEBUG: #04 pc 014dbd55 /data/app/tv.myapp.broadcast.dev-2/oat/arm/base.odex (offset 0xa66000) (void io.kickflip.sdk.av.muxer.RtmpMuxerMix.writeThread()+2240)
01-24 10:52:25.651 199-199/? A/DEBUG: #05 pc 014d8c41 /data/app/tv.myapp.broadcast.dev-2/oat/arm/base.odex (offset 0xa66000) (void io.kickflip.sdk.av.muxer.RtmpMuxerMix.access$000(io.kickflip.sdk.av.muxer.RtmpMuxerMix)+60)
01-24 10:52:25.651 199-199/? A/DEBUG: #06 pc 014d819f /data/app/tv.myapp.broadcast.dev-2/oat/arm/base.odex (offset 0xa66000) (void io.kickflip.sdk.av.muxer.RtmpMuxerMix$1.run()+98)
01-24 10:52:25.651 199-199/? A/DEBUG: #07 pc 721e78d1 /data/dalvik-cache/arm/system@framework@boot.oat (offset 0x1ed6000)
Again, in a 2-hour stream this might never happen, or it might happen 10 minutes into the stream. It is very hard to debug because I cannot force the bug to happen.
Is there any way to improve the debugging information I get? What exactly does SEGV_ACCERR mean? I've read that this "means you tried to access an address that you don't have permission to access", but I am unsure what that means, as I can stream for hours without the bug happening.
Is there any way to catch the signal and just continue?
To add more information, this is the part of the native library where the app crashes (found using ndk-stack):
JNIEXPORT jint JNICALL
Java_net_butterflytv_rtmp_1client_RTMPMuxer_writeVideo(JNIEnv *env, jobject instance,
                                                       jbyteArray data_, jint offset, jint length,
                                                       jint timestamp) {
    jbyte *data = (*env)->GetByteArrayElements(env, data_, NULL);
    jint result = rtmp_sender_write_video_frame(data, length, timestamp, 0, 0);
    (*env)->ReleaseByteArrayElements(env, data_, data, 0);
    return result;
}

int rtmp_sender_write_video_frame(uint8_t *data,
                                  int size,
                                  uint64_t dts_us,
                                  int key,
                                  uint32_t abs_ts)
{
    uint8_t * buf;
    uint8_t * buf_offset;
    int val = 0;
    int total;
    uint32_t ts;
    uint32_t nal_len;
    uint32_t nal_len_n;
    uint8_t *nal;
    uint8_t *nal_n;
    char *output ;
    uint32_t offset = 0;
    uint32_t body_len;
    uint32_t output_len;
    buf = data;
    buf_offset = data;
    total = size;
    ts = (uint32_t)dts_us;
    //ts = RTMP_GetTime() - start_time;
    offset = 0;
    nal = get_nal(&nal_len, &buf_offset, buf, total);
    (...)
}

static uint8_t * get_nal(uint32_t *len, uint8_t **offset, uint8_t *start, uint32_t total)
{
    uint32_t info;
    uint8_t *q ;
    uint8_t *p = *offset;
    *len = 0;
    if ((p - start) >= total)
        return NULL;
    while(1) {
        info = find_start_code(p, 3);
        if (info == 1)
            break;
        p++;
        if ((p - start) >= total)
            return NULL;
    }
    q = p + 4;
    p = q;
    while(1) {
        info = find_start_code(p, 3);
        if (info == 1)
            break;
        p++;
        if ((p - start) >= total)
            //return NULL;
            break;
    }
    *len = (p - q);
    *offset = p;
    return q;
}

static uint32_t find_start_code(uint8_t *buf, uint32_t zeros_in_startcode)
{
    uint32_t info;
    uint32_t i;
    info = 1;
    if ((info = (buf[zeros_in_startcode] != 1)? 0: 1) == 0)
        return 0;
    for (i = 0; i < zeros_in_startcode; i++)
        if (buf[i] != 0)
        {
            info = 0;
            break;
        };
    return info;
}
The crash happens at buf[zeros_in_startcode] in find_start_code. I have removed a few android_log lines as well (I don't think this matters).
To my understanding, this buffer should be accessible, so it makes no sense that it crashes only "sometimes".
PS. This is where I call the native code from Java:
private void writeThread() {
    while (true) {
        Frame frame = null;
        synchronized (mBufferLock) {
            if (!mConfigBuffer.isEmpty()) {
                frame = mConfigBuffer.peek();
            } else if (!mBuffer.isEmpty()) {
                frame = mBuffer.remove();
            }
            if (frame == null) {
                try {
                    mBufferLock.wait();
                } catch (InterruptedException e) {
                }
            }
        }
        if (frame == null) {
            continue;
        } else if (frame instanceof Sentinel) {
            break;
        }
        int writeResult = 0;
        synchronized (mWriteFence) {
            if (!mConnected) {
                debug(WARN, "Skipping frame due to disconnection");
                continue;
            }
            if (frame.getFrameType() == Frame.VIDEO_FRAME) {
                writeResult = mRTMPMuxer.writeVideo(frame.getData(), frame.getOffset(), frame.getSize(), frame.getTime());
            } else if (frame.getFrameType() == Frame.AUDIO_FRAME) {
                writeResult = mRTMPMuxer.writeAudio(frame.getData(), frame.getOffset(), frame.getSize(), frame.getTime());
            }
            if (writeResult < 0) {
                mRtmpListener.onDisconnected();
                mConnected = false;
            } else {
                //Now we remove the config frame, only if sending was successful!
                if (frame.isConfig()) {
                    synchronized (mBufferLock) {
                        mConfigBuffer.remove();
                    }
                }
            }
        }
    }
}
Note that the crash happens even when I don't send audio at all.
Recommended answer
"You can store the data in a byte[]. This allows very fast access from managed code. On the native side, however, you're not guaranteed to be able to access the data without having to copy it."
See https://developer.android.com/training/articles/perf-jni.html
Some musings and things to try:
- The code where it falls over is very generic, so there is probably no bug there.
- It must be that the frame data has been removed, damaged, locked or moved. Has the Java garbage collector removed or relocated the data?
- You could write detailed debug output to a file, overwriting it on every frame, so you only have a small log with the last debug info.
- Send a local copy of the frame data (using a direct ByteBuffer) to mRTMPMuxer.writeVideo. Unlike regular byte buffers, the storage of a direct ByteBuffer is not allocated on the managed heap, and can always be accessed directly from native code.
// Allocates memory outside the managed heap
byte[] src = frame.getData();
ByteBuffer data = ByteBuffer.allocateDirect(src.length);
// Copy the frame bytes into the direct buffer (put writes into it; get would read from it)
data.put(src, 0, src.length);
// Alternative: data = (frame.getData() == null) ? null : frame.getData().clone();
int offset = frame.getOffset();
int size = frame.getSize();
int time = frame.getTime();
// writeVideo has to be redeclared to take a ByteBuffer instead of a byte[]
writeResult = mRTMPMuxer.writeVideo(data, offset, size, time);
JNIEXPORT jint JNICALL
Java_net_butterflytv_rtmp_1client_RTMPMuxer_writeVideo(
        JNIEnv *env,
        jobject instance,
        jobject data_,  // direct ByteBuffer, NOT jbyteArray
        jint offset,
        jint length,
        jint timestamp)
{
    // GetDirectBufferAddress, NOT GetByteArrayElements
    uint8_t *data = (uint8_t *) (*env)->GetDirectBufferAddress(env, data_);
    if (data == NULL) {
        // Not a direct buffer, or direct access is not supported by the VM
        return -1;
    }
    jint result = rtmp_sender_write_video_frame(data, length, timestamp, 0, 0);
    // No ReleaseByteArrayElements call is needed for a direct buffer
    return result;
}
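The "write detailed debug to a file" suggestion from the list above could be sketched in C roughly as follows. The helper name log_last_frame, the log format and the file path are assumptions made for illustration; nothing like this exists in librtmp itself. The file is reopened with "w" on every call, so it only ever holds the details of the most recent frame.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not part of librtmp): overwrite a small log file with
 * details of the frame that is about to be parsed.
 * Returns 0 on success, -1 if the file could not be written. */
int log_last_frame(const char *path, const uint8_t *data,
                   uint32_t size, uint32_t timestamp)
{
    FILE *f = fopen(path, "w");   /* "w" truncates, so the log stays small */
    if (f == NULL)
        return -1;

    fprintf(f, "size=%u ts=%u data=%p\n",
            (unsigned)size, (unsigned)timestamp, (const void *)data);

    /* Dump at most the first 16 bytes, never more than the frame holds. */
    uint32_t n = size < 16 ? size : 16;
    fprintf(f, "head=");
    for (uint32_t i = 0; i < n; i++)
        fprintf(f, "%02x", data[i]);
    fprintf(f, "\n");

    /* fclose flushes the stdio buffer, so the contents are handed to the
     * kernel before the risky parsing code runs. */
    return fclose(f) == 0 ? 0 : -1;
}
```

Calling it at the top of rtmp_sender_write_video_frame, with a path inside the app's own files directory, would leave the size and address of the offending frame on disk after a crash, which can then be compared with the fault address in the tombstone.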
Debugging
// Requires compiling the file as C++; env has to be passed down from the JNI entry point
static uint32_t find_start_code(JNIEnv *env, uint8_t *buf, uint32_t zeros_in_startcode) {
    //...
    try {
        if ((info = (buf[zeros_in_startcode] != 1) ? 0 : 1) == 0) return 0; // your code
    }
    // Catches C++ exceptions only; a SIGSEGV is a signal and will not arrive here
    catch (const std::exception &e) {
        throwJavaException(env, e.what()); // see method below
    }
    //...
}
Then a new method:
void throwJavaException(JNIEnv *env, const char *msg)
{
    // You can put your own exception here
    jclass c = env->FindClass("java/lang/RuntimeException");
    if (NULL == c)
    {
        // Plan B: null pointer ...
        c = env->FindClass("java/lang/NullPointerException");
    }
    env->ThrowNew(c, msg);
}
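One more thing that may be worth ruling out while debugging is a read past the end of the frame. In the question's get_nal, the check (p - start) >= total lets p reach start + total - 1, while find_start_code(p, 3) still reads p[0] through p[3]; the second loop also starts at q = p + 4 without a check. In the crash dump the fault address 0x9fef1000 is page-aligned and exactly 3 bytes above r5 (0x9fef0ffd), which would fit such an over-read if r5 holds the scan pointer (an assumption, since frame #00 is not symbolized). A minimal bounds-checked sketch, with hypothetical names start_code_at and get_nal_bounded that are not part of librtmp, might look like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the 4-byte start code 00 00 00 01 begins at buf[pos].
 * Never reads at or beyond buf[len]. */
static int start_code_at(const uint8_t *buf, size_t len, size_t pos)
{
    if (len < 4 || pos > len - 4)
        return 0;                       /* fewer than 4 bytes left */
    return buf[pos] == 0 && buf[pos + 1] == 0 &&
           buf[pos + 2] == 0 && buf[pos + 3] == 1;
}

/* Bounded variant of get_nal (hypothetical): finds the next NAL unit at or
 * after *offset. On success returns a pointer to the NAL payload, stores its
 * length in *nal_len and advances *offset to the following start code (or to
 * len). Returns NULL when no further start code exists. */
const uint8_t *get_nal_bounded(const uint8_t *buf, size_t len,
                               size_t *offset, size_t *nal_len)
{
    size_t p = *offset;
    *nal_len = 0;

    /* Look for the start code that opens this NAL unit. */
    while (p < len && !start_code_at(buf, len, p))
        p++;
    if (p >= len)
        return NULL;

    size_t q = p + 4;                   /* payload starts here; q <= len */

    /* Look for the next start code, or stop at the end of the buffer. */
    p = q;
    while (p < len && !start_code_at(buf, len, p))
        p++;

    *nal_len = p - q;
    *offset = p;
    return buf + q;
}
```

Using indices and an explicit length, instead of advancing a raw pointer, keeps every access provably inside [buf, buf + len). If the crashes stop with a variant like this, the problem was in the parser and not in how the byte[] is handed to native code.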
Don't get too hung up on SEGV_ACCERR; you have a segmentation fault, SIGSEGV (caused by a program trying to read or write an illegal memory location; a read in your case).
From siginfo.h:
SEGV_MAPERR means you tried to access an address that doesn't map to anything. SEGV_ACCERR means you tried to access an address that you don't have permission to access.
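To make the distinction concrete, here is a small POSIX sketch that provokes a SEGV_ACCERR on purpose and reads the code back from siginfo_t. It maps two pages, revokes all access to the second with mprotect, and probes a byte on each side of the boundary. The names probe_read and map_with_guard_page are made up for this example, and the code assumes a single-threaded Linux or Android process. Jumping out of a SIGSEGV handler like this is only reasonable for a diagnostic probe. It is not a safe way to "catch the signal and just continue" in the real app, because the state of the interrupted code is unknown and the Android runtime installs its own SIGSEGV handling.

```c
#define _GNU_SOURCE          /* for MAP_ANONYMOUS and siginfo_t under strict modes */
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf probe_env;
static volatile sig_atomic_t probe_code;

/* SIGSEGV handler: remember why the access failed, then jump back. */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    probe_code = info->si_code;          /* SEGV_MAPERR or SEGV_ACCERR */
    siglongjmp(probe_env, 1);
}

/* Tries to read one byte at addr. Returns 0 if the read succeeded, the
 * si_code of the fault if it raised SIGSEGV, or -1 on setup failure.
 * Not thread-safe: it uses process-wide handler state. */
int probe_read(const volatile unsigned char *addr)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;            /* ask for the siginfo_t argument */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    int result = 0;
    if (sigsetjmp(probe_env, 1) == 0) {  /* 1: also save the signal mask */
        unsigned char v = *addr;         /* the access under test */
        (void)v;
    } else {
        result = probe_code;             /* we arrived here from on_segv */
    }

    sigaction(SIGSEGV, &old, NULL);      /* restore the previous handler */
    return result;
}

/* Maps two pages and makes the second one inaccessible, so that any read
 * past the end of the first page faults immediately.
 * Returns the start of the first page, or NULL on failure. */
unsigned char *map_with_guard_page(size_t *page_size_out)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return NULL;
    unsigned char *base = mmap(NULL, 2 * (size_t)ps, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    if (mprotect(base + ps, (size_t)ps, PROT_NONE) != 0) {
        munmap(base, 2 * (size_t)ps);
        return NULL;
    }
    *page_size_out = (size_t)ps;
    return base;
}
```

The same guard-page trick can help to force an intermittent over-read to show up: copy a test frame so that its last byte sits at base + page_size - 1 and run the parser on it. Any read beyond the frame then lands on the protected page and faults on every run, not only when the real buffer happens to end next to an inaccessible page.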
This may be of interest:
Q: I noticed that there was RTMP support. But a patch which remove RTMP had been merged.
Q: Could you tell me why ?
A: We don't think RTMP serves the mobile broadcasting use case as well as HLS,
A: and so we don't want to dedicate our limited resources towards supporting it.
See: https://github.com/Kickflip/kickflip-android-sdk/issues/33
I suggest you register an issue with:
https://github.com/Kickflip/kickflip-android-sdk/issues
https://github.com/ButterflyTV/LibRtmp-Client-for-Android/issues