7021006: EDAC unable to process memory exception leading to panic

This document (7021006) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 12 Service Pack 1 (SLES 12 SP1)

Situation

When this issue occurs, the following two lines are often seen in the kernel panic output:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
    IP: [<ffffffffa03746aa>] sbridge_mce_output_error+0x36a/0xdf0 [sb_edac]
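To check a saved kernel log (for example, a kdump dmesg or a serial console capture) for this signature, a small Python sketch can look for both lines. The function name and regular expressions are illustrative assumptions. Only the stable parts of each line are matched, because addresses and offsets are likely to differ between systems and kernel builds.

```python
import re

# Stable parts of the two signature lines quoted in this TID. Addresses and
# offsets are matched loosely (an assumption: they vary between builds).
NULL_DEREF = re.compile(r"BUG: unable to handle kernel NULL pointer dereference")
SB_EDAC_IP = re.compile(
    r"IP: \[<[0-9a-f]+>\] sbridge_mce_output_error\+0x[0-9a-f]+/0x[0-9a-f]+ \[sb_edac\]"
)


def matches_sb_edac_panic(log_text: str) -> bool:
    """Hypothetical helper: True if log_text contains both signature lines."""
    return bool(NULL_DEREF.search(log_text)) and bool(SB_EDAC_IP.search(log_text))


if __name__ == "__main__":
    import sys
    # Usage: python check_log.py < saved-dmesg.txt
    print(matches_sb_edac_panic(sys.stdin.read()))
```

A NULL pointer dereference on its own is not enough; the `sbridge_mce_output_error` frame in the sb_edac module is what ties a crash to this issue.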

Resolution

The required patches to avoid this issue are included in kernel 3.12.62-60.62.1 (released 2016-08-19) and later.
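A quick way to tell whether a SLES 12 SP1 host already runs a fixed kernel is to compare its release string with 3.12.62-60.62.1. The Python sketch below assumes the usual SUSE format, where `uname -r` reports something like `3.12.62-60.62-default` (the RPM build number dropped, a flavor appended). The helper names are hypothetical, and the comparison is only meaningful within the SLES 12 SP1 kernel stream.

```python
import platform
import re

# First fixed SLES 12 SP1 kernel per this TID is 3.12.62-60.62.1. The trailing
# RPM build number is left out because `uname -r` usually omits it (assumption).
FIXED = (3, 12, 62, 60, 62)


def version_tuple(release: str) -> tuple:
    """Collect the leading numeric fields of a kernel release string."""
    fields = []
    for part in re.split(r"[.-]", release.strip()):
        if not part.isdigit():
            break  # stop at the flavor suffix, e.g. "default"
        fields.append(int(part))
    return tuple(fields)


def has_sb_edac_fix(release: str) -> bool:
    """Hypothetical check: True if release is at or above the fixed kernel."""
    return version_tuple(release) >= FIXED


if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r`.
    print(has_sb_edac_fix(platform.release()))
```

Passing a package version string such as `3.12.62-60.62.1` works the same way, which can be handy for checking an installed but not yet booted kernel.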

As a workaround until the affected servers can be patched, you may be able to avoid the issue by blacklisting the sb_edac module.
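A minimal sketch of the blacklist, assuming a new file under `/etc/modprobe.d/` (the file name below is hypothetical; modprobe reads any file in that directory ending in `.conf`):

```
# /etc/modprobe.d/99-blacklist-sb_edac.conf (hypothetical file name)
# Prevent sb_edac from being loaded automatically via its module alias.
blacklist sb_edac
```

A `blacklist` entry only stops automatic loading; the module can still be loaded explicitly with `modprobe sb_edac`. The change takes effect at the next boot, and EDAC decoding of memory errors on the affected platform should be expected to be unavailable while the module is blacklisted.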

SLES 12 SP2 shipped with the required patches, so this issue should not be seen on that version of the OS.

Cause

This issue occurs when SUSE Linux Enterprise Server 12 SP1 runs on newer hardware for which the EDAC driver lacks the platform information it needs to correctly decode memory errors.

Specifically, the SLES 12 SP1 kernel lacks several EDAC patches for the Intel Broadwell-EP and Broadwell-EX platforms.

As a result, decoding a memory error in the sb_edac module (sbridge_mce_output_error) fails with a NULL pointer dereference, causing a kernel panic.

Disclaimer

This Support Knowledgebase provides a valuable tool for NetIQ/Novell/SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented “AS IS” WITHOUT WARRANTY OF ANY KIND.
