7017833: SLES12 SP1 ESXi guest with kernel 3.12.57-60.35-default crashes with an invalid RIP

This document (7017833) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 12 Service Pack 1 (SLES 12 SP1)

Kernel: 3.12.57-60.35-default or older.

Situation

SLES12 SP1 guests running on VMware ESXi 6.0 may crash.

After a core dump has been saved, the ‘dmesg’ log shows entries like the following (edited for readability):

bad: scheduling from the idle thread!

CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 3.12.57-60.35-default #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
0000000000000000 ffffffff815202d0 ffff88083fc11f80 ffff88083fc03c60
ffffffff8108f997 ffffffff81c11460 ffffffff81523a01 ffffffff81c11460
ffffffff81c01fd8 ffffffff81c01fd8 ffffffff81c01fd8 ffffffff81c11460
Call Trace:
[<ffffffff8100475d>] dump_trace+0x7d/0x2d0
[<ffffffff81004a44>] show_stack_log_lvl+0x94/0x170
[<ffffffff81005d01>] show_stack+0x21/0x50
[<ffffffff815202d0>] dump_stack+0x5d/0x78
[<ffffffff8108f997>] dequeue_task_idle+0x27/0x30
[<ffffffff81523a01>] thread_return+0x232/0x4a1
[<ffffffff81521ed9>] schedule_timeout+0x269/0x300
[<ffffffff81522c7f>] wait_for_completion+0x9f/0x110
[<ffffffff8107883b>] wait_rcu_gp+0x4b/0x60
[<ffffffffa017b715>] vmci_event_unsubscribe+0x75/0xb0 [vmw_vmci]
[<ffffffffa02c55cd>] vmci_transport_destruct+0x1d/0xe0 [vmw_vsock_vmci_transport]
[<ffffffffa02db7e3>] vsock_sk_destruct+0x13/0x60 [vsock]
[<ffffffff8142438a>] __sk_free+0x1a/0x170
[<ffffffffa02c61f8>] vmci_transport_recv_stream_cb+0x1e8/0x2d0 [vmw_vsock_vmci_transport]
[<ffffffffa017acda>] vmci_datagram_invoke_guest_handler+0xaa/0xd0 [vmw_vmci]
[<ffffffffa017bb71>] vmci_dispatch_dgs+0xc1/0x200 [vmw_vmci]
[<ffffffff8105e03c>] tasklet_action+0x11c/0x130
[<ffffffff8105e61d>] __do_softirq+0xed/0x280
[<ffffffff8153009c>] call_softirq+0x1c/0x30
[<ffffffff810046a5>] do_softirq+0x55/0x90
[<ffffffff8105e905>] irq_exit+0x95/0xa0
[<ffffffff8153088e>] do_IRQ+0x4e/0xb0
[<ffffffff8152686d>] common_interrupt+0x6d/0x6d
[<ffffffff81042762>] native_safe_halt+0x2/0x10
[<ffffffff8100b2a9>] default_idle+0x19/0xd0
[<ffffffff810b1181>] cpu_startup_entry+0xe1/0x2b0
[<ffffffff81d50ea1>] start_kernel+0x43e/0x449
[<ffffffff81d506a3>] x86_64_start_kernel+0x10c/0x11b
BUG: scheduling while atomic: swapper/0/0/0x00000100

This is followed by a second stack trace:

BUG: unable to handle kernel NULL pointer dereference at (null)

IP: [< (null)>] (null)
PGD b6202067 PUD b6066067 PMD 0
Oops: 0010 [#1] SMP
Modules linked in: binfmt_misc nfsv3 nfs_acl nfs lockd sunrpc fscache af_packet iscsi_ibft iscsi_boot_sysfs msr vmw_vsock_vmci_transport vsock coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel(X) aesni_intel aes_x86_64 ppdev lrw gf128mul vmw_balloon glue_helper ablk_helper cryptd pcspkr serio_raw vmxnet3 parport_pc parport vmw_vmci shpchp battery processor button ac xfs libcrc32c sr_mod cdrom ata_generic sd_mod ata_piix ahci libahci vmwgfx ttm drm libata vmw_pvscsi dm_mod sg scsi_mod autofs4
Supported: Yes, External
CPU: 4 PID: 28 Comm: ksoftirqd/4 Tainted: G W X 3.12.49-11-default #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
task: ffff880813f788c0 ti: ffff880813f7c000 task.ti: ffff880813f7c000
RIP: 0010:[<0000000000000000>] [< (null)>] (null)
RSP: 0018:ffff880813f7dcf0 EFLAGS: 00010082
RAX: 0000000000000000 RBX: ffff880813e3c380 RCX: 000000000000a7f0
RDX: 0000000000000005 RSI: ffff880813e3c380 RDI: ffff88083fd11f40
RBP: ffff880813f7dd08 R08: 0000000000000004 R09: 0000000000000004
R10: 0000000000000550 R11: 0000000000000550 R12: ffff88083fd11f40
R13: 0000000000000000 R14: 0000000000000046 R15: ffff88083fd11f40
FS: 0000000000000000(0000) GS:ffff88083fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000007fb72f000 CR4: 00000000001407e0
Stack:
ffffffff81088de3 ffff880813e3c380 ffff880813e3cde4 ffff880813f7dd58
ffffffff8108aee2 ffff88083fd0c778 ffff88083fd0c7b8 0000d31500000004
0000000000000001 0000000000000000 0000000000000000 0000000000000003
Call Trace:
Inexact backtrace:
[<ffffffff81088de3>] ? ttwu_do_activate.constprop.88+0x33/0x70
[<ffffffff8108aee2>] ? try_to_wake_up+0x1f2/0x2e0
[<ffffffff8107ba67>] ? __wake_up_common+0x57/0x90
[<ffffffff81085087>] ? complete+0x37/0x50
[<ffffffff810ff52a>] ? rcu_process_callbacks+0x1ca/0x530
[<ffffffff8105e1e5>] ? __do_softirq+0xe5/0x230
[<ffffffff8105e35d>] ? run_ksoftirqd+0x2d/0x50
[<ffffffff81082cbc>] ? smpboot_thread_fn+0xfc/0x1a0
[<ffffffff81082bc0>] ? smpboot_unregister_percpu_thread+0x60/0x60
[<ffffffff8107ad94>] ? kthread+0xb4/0xc0
[<ffffffff8107ace0>] ? __kthread_parkme+0x70/0x70
[<ffffffff8152a518>] ? ret_from_fork+0x58/0x90
[<ffffffff8107ace0>] ? __kthread_parkme+0x70/0x70
Code: Bad RIP value.
RIP [< (null)>] (null)
RSP <ffff880813f7dcf0>
CR2: 0000000000000000
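
To compare a saved log against the traces above, a short script can search for a few of their distinctive strings. The following Python is a minimal sketch and not part of the original analysis. The marker strings are copied from the traces in this document, while the names SIGNATURE_MARKERS and find_markers are hypothetical. A match only suggests that a log resembles this crash; it does not prove the same root cause.

```python
import sys

# Strings copied from the two traces in this document.
# Assumption: these are distinctive enough to be useful as a first filter.
SIGNATURE_MARKERS = (
    "bad: scheduling from the idle thread!",
    "vmci_event_unsubscribe",
    "wait_rcu_gp",
    "Bad RIP value",
)


def find_markers(log_text):
    """Return the markers that occur in log_text, in the order listed above."""
    return [marker for marker in SIGNATURE_MARKERS if marker in log_text]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass the path of a saved dmesg text file as the first argument.
    with open(sys.argv[1], errors="replace") as handle:
        found = find_markers(handle.read())
    print("markers found:", found)
```

A log that contains all four markers matches both traces shown here. A log with only some of them needs a closer manual comparison.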

Resolution

A fix for this bug should be included in kernel 3.12.62-60.62.1. Update affected guests to this kernel version or a newer one.
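
To check whether a guest still runs a kernel that predates the fix, the running release string can be compared with the fixed version. The following Python is a minimal sketch under stated assumptions, not an official SUSE tool. It assumes that the fixed package 3.12.62-60.62.1 reports itself through `uname -r` as 3.12.62-60.62-default, so only the first five numeric fields are compared. The names FIXED_RELEASE, release_fields and predates_fix are hypothetical. The comparison is only meaningful for SLES 12 SP1 kernels.

```python
import platform
import re

# Assumption: `uname -r` on the fixed kernel prints 3.12.62-60.62-default,
# without the trailing package rebuild counter (.1).
FIXED_RELEASE = (3, 12, 62, 60, 62)


def release_fields(release):
    """Return the leading numeric fields of a kernel release string."""
    numbers = tuple(int(n) for n in re.findall(r"\d+", release))
    return numbers[:len(FIXED_RELEASE)]


def predates_fix(release):
    """True if the release sorts before the assumed fixed release."""
    return release_fields(release) < FIXED_RELEASE


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    state = "predates" if predates_fix(running) else "does not predate"
    print(f"{running} {state} the fixed kernel")
```

Both kernel versions that appear in the traces above, 3.12.57-60.35-default and 3.12.49-11-default, sort before the fixed release in this comparison.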

Disclaimer

This Support Knowledgebase provides a valuable tool for NetIQ/Novell/SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented “AS IS” WITHOUT WARRANTY OF ANY KIND.
