cgroup oom-killer causes a kernel panic
I'm using cgroups with libvirt to limit the memory that a group of qemu-kvm guests can use, on a custom Linux 4.7.8 kernel. After running a couple of tests, I started seeing kernel panics after the oom-killer is invoked when this libvirt group runs out of memory. This happens even when I set the cgroup memory limit below the system total and the system is idle apart from running the VMs (plenty of memory is left for tasks outside the cgroup). For reference, my system has 32 GB and I allotted 20 GB to the guest group. Here is part of the crash log (it is very long, but I can link to the full log later):
May 11 15:53:57 kernel: [ 6380.727959] qemu-system-x86 invoked oom-killer: gfp_mask=0x24000c0(GFP_KERNEL), order=0, oom_score_adj=0
May 11 15:53:57 kernel: [ 6380.727966] CPU: 1 PID: 15433 Comm: qemu-system-x86 Tainted: G O 4.7.8 #25
May 11 15:53:57 kernel: [ 6380.727968] Hardware name: ADLINK TECHNOLOGY Inc. Express-SL/, BIOS 1.22.10.KA08 05/03/2017
May 11 15:53:57 kernel: [ 6380.727971] ffff880802cbb700 ffff8803259d7a08 ffffffff81336719 ffff88081948fc00
May 11 15:53:57 kernel: [ 6380.727975] 0000000000000296 ffff8801f1638dc0 ffffffff811b50f6 ffff88083dff9800
May 11 15:53:57 kernel: [ 6380.727979] ffff8803259d7cf8 ffff8803259d7b38 ffffffff8115fe5f 024200ca00000003
May 11 15:53:57 kernel: [ 6380.727982] Call Trace:
May 11 15:53:57 kernel: [ 6380.727989] [<ffffffff81336719>] dump_stack+0x65/0x8c
May 11 15:53:57 kernel: [ 6380.727994] [<ffffffff811b50f6>] ? mem_cgroup_select_victim_node+0x17d/0x1ac
May 11 15:53:57 kernel: [ 6380.727999] [<ffffffff8115fe5f>] dump_header+0x5e/0x286
May 11 15:53:57 kernel: [ 6380.728003] [<ffffffff81171d2c>] ? try_to_free_mem_cgroup_pages+0x10d/0x16a
May 11 15:53:57 kernel: [ 6380.728007] [<ffffffff811608ce>] oom_kill_process+0xc2/0x487
May 11 15:53:57 kernel: [ 6380.728010] [<ffffffff811b53f5>] ? task_in_mem_cgroup+0xc9/0xd6
May 11 15:53:57 kernel: [ 6380.728014] [<ffffffff812e22c2>] ? selinux_capable+0x1f/0x21
May 11 15:53:57 kernel: [ 6380.728018] [<ffffffff812d95ac>] ? security_capable_noaudit+0x2b/0x46
May 11 15:53:57 kernel: [ 6380.728023] [<ffffffff8111469e>] ? css_next_descendant_pre+0x32/0x53
May 11 15:53:57 kernel: [ 6380.728025] [<ffffffff811afd1c>] ? css_put+0x18/0x1a
May 11 15:53:57 kernel: [ 6380.728028] [<ffffffff811b235b>] ? mem_cgroup_iter+0x250/0x265
May 11 15:53:57 kernel: [ 6380.728031] [<ffffffff811607c1>] ? oom_badness+0x10f/0x15a
May 11 15:53:57 kernel: [ 6380.728036] [<ffffffff811af46d>] ? get_mem_cgroup_from_mm+0x52/0x71
May 11 15:53:57 kernel: [ 6380.728039] [<ffffffff811b4039>] mem_cgroup_out_of_memory+0x2c7/0x311
May 11 15:53:57 kernel: [ 6380.728044] [<ffffffff810da147>] ? finish_wait+0x65/0x70
May 11 15:53:57 kernel: [ 6380.728047] [<ffffffff811b43bb>] mem_cgroup_oom_synchronize+0x1ed/0x27b
May 11 15:53:57 kernel: [ 6380.728051] [<ffffffff811b14d2>] ? mem_cgroup_count_precharge_pte_range+0xe8/0xe8
May 11 15:53:57 kernel: [ 6380.728054] [<ffffffff81160ffd>] pagefault_out_of_memory+0x1f/0x76
May 11 15:53:57 kernel: [ 6380.728058] [<ffffffff8118e399>] ? vma_adjust+0x4b5/0x58b
May 11 15:53:57 kernel: [ 6380.728061] [<ffffffff810999bf>] mm_fault_error+0x66/0x103
May 11 15:53:57 kernel: [ 6380.728064] [<ffffffff81099e4c>] __do_page_fault+0x3f0/0x4d8
May 11 15:53:57 kernel: [ 6380.728068] [<ffffffff81191a7d>] ? SyS_mremap+0x46c/0x4cf
May 11 15:53:57 kernel: [ 6380.728071] [<ffffffff8109a043>] do_page_fault+0x26/0x2f
May 11 15:53:57 kernel: [ 6380.728074] [<ffffffff81a3dca8>] page_fault+0x28/0x30
May 11 15:53:57 kernel: [ 6380.728077] Task in /machine/ubc3.libvirt-qemu killed as a result of limit of /machine
May 11 15:53:57 kernel: [ 6380.728082] memory: usage 31457280kB, limit 31457280kB, failcnt 0
May 11 15:53:57 kernel: [ 6380.728084] memory+swap: usage 31457280kB, limit 31457280kB, failcnt 129072
May 11 15:53:57 kernel: [ 6380.728086] kmem: usage 2200kB, limit 9007199254740988kB, failcnt 0
May 11 15:53:57 kernel: [ 6380.728088] Memory cgroup stats for /machine: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
May 11 15:53:57 kernel: [ 6380.728102] Memory cgroup stats for /machine/ubc4.libvirt-qemu: cache:12KB rss:0KB rss_huge:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:8KB active_file:4KB unevictable:0KB
May 11 15:53:57 kernel: [ 6380.728114] Memory cgroup stats for /machine/ubc1.libvirt-qemu: cache:32KB rss:6312484KB rss_huge:0KB mapped_file:20KB dirty:0KB writeback:0KB swap:0KB inactive_anon:16KB active_anon:6312488KB inactive_file:12KB active_file:0KB unevictable:0KB
May 11 15:53:57 kernel: [ 6380.728126] Memory cgroup stats for /machine/ubc2.libvirt-qemu: cache:60KB rss:8408808KB rss_huge:0KB mapped_file:20KB dirty:0KB writeback:0KB swap:0KB inactive_anon:16KB active_anon:8408804KB inactive_file:20KB active_file:20KB unevictable:0KB
May 11 15:53:57 kernel: [ 6380.728137] Memory cgroup stats for /machine/ubc3.libvirt-qemu: cache:60KB rss:8410240KB rss_huge:0KB mapped_file:20KB dirty:0KB writeback:0KB swap:0KB inactive_anon:16KB active_anon:8410244KB inactive_file:24KB active_file:16KB unevictable:0KB
May 11 15:53:57 kernel: [ 6380.728149] Memory cgroup stats for /machine/ubc4.libvirt-qemu: cache:88KB rss:8323296KB rss_huge:0KB mapped_file:20KB dirty:0KB writeback:0KB swap:0KB inactive_anon:16KB active_anon:8323260KB inactive_file:32KB active_file:36KB unevictable:0KB
May 11 15:53:57 kernel: [ 6380.728161] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
May 11 15:53:57 kernel: [ 6380.728189] [ 5493] 0 5493 1626653 1580390 3170 9 0 0 qemu-system-x86
May 11 15:53:57 kernel: [ 6380.728193] [15427] 0 15427 2151137 2104431 4195 11 0 0 qemu-system-x86
May 11 15:53:57 kernel: [ 6380.728197] [17683] 0 17683 2151136 2104812 4195 11 0 0 qemu-system-x86
May 11 15:53:57 kernel: [ 6380.728202] [18273] 0 18273 2152955 2083131 4156 12 0 0 qemu-system-x86
May 11 15:53:57 kernel: [ 6380.728205] Memory cgroup out of memory: Kill process 17683 (qemu-system-x86) score 260 or sacrifice child
May 11 15:53:57 kernel: [ 6380.728217] Killed process 17683 (qemu-system-x86) total-vm:8604544kB, anon-rss:8409956kB, file-rss:9272kB, shmem-rss:20kB
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379203] CPU: 4 PID: 121 Comm: kworker/4:1 Tainted: G O 4.7.8 #25
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379206] Hardware name: ADLINK TECHNOLOGY Inc. Express-SL/, BIOS 1.22.10.KA08 05/03/2017
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379222] 0000000000000296 000000000000039a 0000000000000000 ffffffff81f4b6bb
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379227] 0000000000000000 ffff8808038f7748 ffffffff810a938a 000000091d19fce0
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379232] Call Trace:
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379244] [<ffffffff810a938a>] __warn+0xdc/0xf7
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379239] [<ffffffff81336719>] dump_stack+0x65/0x8c
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379254] [<ffffffff81034a38>] mmu_spte_clear_track_bits+0xe6/0x147
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379272] [<ffffffff810353e7>] drop_spte+0x15/0xa4
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379267] [<ffffffff810d3d7c>] ? update_group_capacity+0x25/0x1d0
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379282] [<ffffffff81071dbe>] ? sched_clock+0x9/0xd
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379287] [<ffffffff81035703>] kvm_mmu_prepare_zap_page+0x177/0x2ef
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379293] [<ffffffff81069828>] ? __switch_to+0x458/0x4ea
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379298] [<ffffffff810d04da>] ? sched_clock_cpu+0x21/0xb4
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379304] [<ffffffff81a39517>] ? __schedule+0x56f/0x594
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379316] [<ffffffff810363a7>] kvm_mmu_invalidate_zap_all_pages+0xcc/0x104
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379321] [<ffffffff813498c2>] ? percpu_ref_put+0x2e/0x2e
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379327] [<ffffffff8102800e>] kvm_arch_flush_shadow_all+0x9/0xb
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379346] [<ffffffff81349bab>] ? percpu_ref_kill_and_confirm+0x60/0x65
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379337] [<ffffffff811a600f>] __mmu_notifier_release+0x4d/0xe3
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379341] [<ffffffff811ab39c>] ? kfree+0x167/0x178
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379357] [<ffffffff811f4e9b>] ? exit_aio+0xc6/0xd5
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379367] [<ffffffff810a6f30>] __mmput+0x19/0xbc
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379371] [<ffffffff81a39517>] ? __schedule+0x56f/0x594
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379376] [<ffffffff810a6fe3>] mmput_async_fn+0x10/0x12
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379387] [<ffffffff81a396c6>] ? schedule+0x98/0xa6
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379407] [<ffffffff810cf9d4>] ? default_wake_function+0xd/0xf
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379392] [<ffffffff810c0ace>] worker_thread+0x36d/0x43c
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379397] [<ffffffff81a39517>] ? __schedule+0x56f/0x594
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379413] [<ffffffff810c0761>] ? process_one_work+0x353/0x353
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379417] [<ffffffff81a396c6>] ? schedule+0x98/0xa6
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379422] [<ffffffff810c0761>] ? process_one_work+0x353/0x353
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379426] [<ffffffff810c49e8>] kthread+0xc8/0xd2
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379436] [<ffffffff81a3bf3f>] ret_from_fork+0x1f/0x40
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379431] [<ffffffff810c0761>] ? process_one_work+0x353/0x353
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379441] [<ffffffff810c4920>] ? kthread_freezable_should_stop+0x61/0x61
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379457] Modules linked in: bridge stp llc ipip ip_gre vfio_iommu_type1 vfio_pci vfio vfio_virqfd qcserial qmi_wwan usbnet cdc_wdm clear_stats(O) fusion(O) gpio_pca953x i2c_i801 i2c_acpi_sbus(O) gpio_exar e1000e
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379481] Hardware name: ADLINK TECHNOLOGY Inc. Express-SL/, BIOS 1.22.10.KA08 05/03/2017
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379489] ffffffff81f4b6bb ffff8808038f7708 ffffffff81336719 ffff880800294f00
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379486] Workqueue: events mmput_async_fn
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379494] 0000000000000296 dead000000000100 0000000000000000 ffffffff81f4b6bb
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379499] 0000000000000000 ffff8808038f7748 ffffffff810a938a 0000000900000100
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379504] Call Trace:
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379523] [<ffffffff81034a38>] mmu_spte_clear_track_bits+0xe6/0x147
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379259] [<ffffffff810d258f>] ? check_preempt_wakeup+0x115/0x1b4
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379263] [<ffffffff81032058>] ? gfn_to_rmap+0x27/0x5a
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379527] [<ffffffff810d258f>] ? check_preempt_wakeup+0x115/0x1b4
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379351] [<ffffffff8118cfb3>] exit_mmap+0x22/0x102
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379331] [<ffffffff8101a7f5>] kvm_mmu_notifier_release+0x2e/0x41
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379536] [<ffffffff810d3d7c>] ? update_group_capacity+0x25/0x1d0
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379541] [<ffffffff810353e7>] drop_spte+0x15/0xa4
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379550] [<ffffffff81071dbe>] ? sched_clock+0x9/0xd
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379555] [<ffffffff81035703>] kvm_mmu_prepare_zap_page+0x177/0x2ef
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379249] [<ffffffff810a93bd>] warn_slowpath_null+0x18/0x1a
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379532] [<ffffffff81032058>] ? gfn_to_rmap+0x27/0x5a
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379565] [<ffffffff810d04da>] ? sched_clock_cpu+0x21/0xb4
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379382] [<ffffffff810c0620>] process_one_work+0x212/0x353
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379570] [<ffffffff81a39517>] ? __schedule+0x56f/0x594
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379580] [<ffffffff810363a7>] kvm_mmu_invalidate_zap_all_pages+0xcc/0x104
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379585] [<ffffffff813498c2>] ? percpu_ref_put+0x2e/0x2e
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379590] [<ffffffff8102800e>] kvm_arch_flush_shadow_all+0x9/0xb
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379362] [<ffffffff810d04da>] ? sched_clock_cpu+0x21/0xb4
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379595] [<ffffffff8101a7f5>] kvm_mmu_notifier_release+0x2e/0x41
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379518] [<ffffffff810a93bd>] warn_slowpath_null+0x18/0x1a
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379609] [<ffffffff81349bab>] ? percpu_ref_kill_and_confirm+0x60/0x65
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379604] [<ffffffff811ab39c>] ? kfree+0x167/0x178
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379619] [<ffffffff811f4e9b>] ? exit_aio+0xc6/0xd5
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379614] [<ffffffff8118cfb3>] exit_mmap+0x22/0x102
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379624] [<ffffffff810d04da>] ? sched_clock_cpu+0x21/0xb4
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379637] [<ffffffff810a6fe3>] mmput_async_fn+0x10/0x12
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379632] [<ffffffff81a39517>] ? __schedule+0x56f/0x594
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379642] [<ffffffff810c0620>] process_one_work+0x212/0x353
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379652] [<ffffffff810c0ace>] worker_thread+0x36d/0x43c
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379647] [<ffffffff81a396c6>] ? schedule+0x98/0xa6
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379277] [<ffffffff81035513>] mmu_page_zap_pte+0x48/0xc1
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379600] [<ffffffff811a600f>] __mmu_notifier_release+0x4d/0xe3
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379667] [<ffffffff810cf9d4>] ? default_wake_function+0xd/0xf
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379672] [<ffffffff810c0761>] ? process_one_work+0x353/0x353
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379661] [<ffffffff810cf9ab>] ? try_to_wake_up+0x240/0x25c
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379681] [<ffffffff810c0761>] ? process_one_work+0x353/0x353
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379574] [<ffffffff814585e7>] ? extract_buf+0xf7/0x106
kernel: [ 6381.379696] [<ffffffff81a3bf3f>] ret_from_fork+0x1f/0x40
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379686] [<ffffffff810c49e8>] kthread+0xc8/0xd2
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379715] Modules linked in: bridge stp llc ipip ip_gre vfio_iommu_type1 vfio_pci vfio vfio_virqfd qcserial qmi_wwan usbnet cdc_wdm clear_stats(O) fusion(O) gpio_pca953x i2c_i801 i2c_acpi_sbus(O) gpio_exar e1000e
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379508] [<ffffffff81336719>] dump_stack+0x65/0x8c
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379701] [<ffffffff810c4920>] ? kthread_freezable_should_stop+0x61/0x61
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379545] [<ffffffff81035513>] mmu_page_zap_pte+0x48/0xc1
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379736] CPU: 4 PID: 121 Comm: kworker/4:1 Tainted: G W O 4.7.8 #25
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379656] [<ffffffff81a39517>] ? __schedule+0x56f/0x594
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379739] Hardware name: ADLINK TECHNOLOGY Inc. Express-SL/, BIOS 1.22.10.KA08 05/03/2017
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379743] Workqueue: events mmput_async_fn
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379760] Call Trace:
Message from syslogd@ at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379750] 0000000000000296 dead000000000100 0000000000000000 ffffffff81f4b6bb
.
.
.
May 11 15:54:13 kernel: [ 6396.890938] Workqueue: events mmput_async_fn
May 11 15:54:13 kernel: [ 6396.890940] ffffffff81f4b6bb ffff8808038f7708 ffffffff81336719 0000000000294f00
May 11 15:54:13 kernel: [ 6396.890945] 0000000000000296 ffffea0002d58220 0000000000000000 ffffffff81f4b6bb
May 11 15:54:13 kernel: [ 6396.890950] 0000000000000000 ffff8808038f7748 ffffffff810a938a 0000000902d58220
May 11 15:54:13 kernel: [ 6396.890954] Call Trace:
May 11 15:54:13 kernel: [ 6396.890958] [<ffffffff81336719>] dump_stack+0x65/0x8c
May 11 15:54:13 kernel: [ 6396.890963] [<ffffffff810a938a>] __warn+0xdc/0xf7
May 11 15:54:13 kernel: [ 6396.890968] [<ffffffff810a93bd>] warn_slowpath_null+0x18/0x1a
May 11 15:54:13 kernel: [ 6396.890972] [<ffffffff81034a38>] mmu_spte_clear_track_bits+0xe6/0x147
May 11 15:54:13 kernel: [ 6396.890977] [<ffffffff81032058>] ? gfn_to_rmap+0x27/0x5a
May 11 15:54:13 kernel: [ 6396.890981] [<ffffffff810353e7>] drop_spte+0x15/0xa4
May 11 15:54:13 kernel: [ 6396.890986] [<ffffffff81035513>] mmu_page_zap_pte+0x48/0xc1
May 11 15:54:13 kernel: [ 6396.890990] [<ffffffff8112b110>] ? kprobe_flush_task+0x8d/0xe8
May 11 15:54:13 kernel: [ 6396.890995] [<ffffffff81035703>] kvm_mmu_prepare_zap_page+0x177/0x2ef
May 11 15:54:13 kernel: [ 6396.891000] [<ffffffff810cee3a>] ? finish_task_switch+0x19f/0x1d5
May 11 15:54:13 kernel: [ 6396.891005] [<ffffffff81a39517>] ? __schedule+0x56f/0x594
May 11 15:54:13 kernel: [ 6396.891009] [<ffffffff81a39517>] ? __schedule+0x56f/0x594
May 11 15:54:13 kernel: [ 6396.891014] [<ffffffff81a39562>] ? preempt_schedule_common+0x26/0x31
May 11 15:54:13 kernel: [ 6396.891020] [<ffffffff810363a7>] kvm_mmu_invalidate_zap_all_pages+0xcc/0x104
May 11 15:54:13 kernel: [ 6396.891024] [<ffffffff813498c2>] ? percpu_ref_put+0x2e/0x2e
May 11 15:54:13 kernel: [ 6396.891029] [<ffffffff8102800e>] kvm_arch_flush_shadow_all+0x9/0x
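As a sanity check on the numbers in the OOM report above, the following is a minimal sketch that parses the three counter lines and converts the `/machine` limit to GiB. The sample text and the per-guest `rss` values are copied from the log; the function names (`parse_counters`, `kib_to_gib`) are hypothetical helpers, and the sketch assumes the kernel's `kB` means 1024 bytes.

```python
import re

# Counter lines copied from the OOM report above (syslog prefix dropped).
REPORT = """\
memory: usage 31457280kB, limit 31457280kB, failcnt 0
memory+swap: usage 31457280kB, limit 31457280kB, failcnt 129072
kmem: usage 2200kB, limit 9007199254740988kB, failcnt 0
"""

# Non-zero per-guest rss values (in kB) from the "Memory cgroup stats" lines.
GUEST_RSS_KB = [6312484, 8408808, 8410240, 8323296]

LINE_RE = re.compile(
    r"^(?P<name>[\w+]+): usage (?P<usage>\d+)kB, "
    r"limit (?P<limit>\d+)kB, failcnt (?P<failcnt>\d+)$"
)


def parse_counters(text):
    """Return {counter name: {"usage", "limit", "failcnt"}} with int values."""
    counters = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            counters[match["name"]] = {
                key: int(match[key]) for key in ("usage", "limit", "failcnt")
            }
    return counters


def kib_to_gib(kib):
    """Convert KiB to GiB (assumes the kernel's 'kB' is 1024 bytes)."""
    return kib / (1024 * 1024)


counters = parse_counters(REPORT)
limit_gib = kib_to_gib(counters["memory"]["limit"])
guest_rss_total = sum(GUEST_RSS_KB)

print(f"/machine limit: {limit_gib} GiB")
print(f"sum of guest rss: {guest_rss_total} kB "
      f"({kib_to_gib(guest_rss_total):.2f} GiB)")
```

Read this way, the group is at its limit almost entirely because of the guests' anonymous memory; the page cache and kmem figures in the report are negligible by comparison.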
I decided to try disabling the oom-killer in the cgroup and exhausting the cgroup's memory to see what would happen. I expected the guests to hang until I manually killed one of them (as described in the cgroup documentation). Surprisingly, one of the guests (seemingly at random) gets killed every time I repeat the test. I'm completely baffled: if the oom-killer is disabled, what is killing the processes?
Here are the messages I get from the kernel in this case, and also in the case where the oom-killer is enabled but the system does not crash:
kernel: [ 1143.934857] cache: task_struct(10:ubc2.libvirt-qemu), object size: 3520, buffer size: 3520, default order: 3, min order: 0
kernel: [ 1143.934860] node 0: slabs: 3, objs: 27, free: 0
kernel: [ 1143.944535] SLUB: Unable to allocate memory on node -1, gfp=0x24000c0(GFP_KERNEL)
kernel: [ 1143.944541] cache: cred_jar(10:ubc2.libvirt-qemu), object size: 168, buffer size: 192, default order: 0, min order: 0
kernel: [ 1143.944545] node 0: slabs: 2, objs: 42, free: 0
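For anyone reproducing the oom-killer-disabled test, this is a minimal sketch of how the cgroup v1 `memory.oom_control` file can be written and read back. The mount point and group path in `OOM_CONTROL` are assumptions (they are not taken from this system), and the helper names are hypothetical. The file format (`oom_kill_disable` and `under_oom` as `key value` lines) is the one described in the cgroup v1 memory documentation.

```python
from pathlib import Path

# Assumed cgroup v1 mount point and group name; adjust to the real hierarchy.
OOM_CONTROL = Path("/sys/fs/cgroup/memory/machine/memory.oom_control")


def parse_oom_control(text):
    """Parse 'key value' lines from memory.oom_control into a dict of ints."""
    state = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            state[parts[0]] = int(parts[1])
    return state


def disable_oom_killer(path=OOM_CONTROL):
    """Write 1 to memory.oom_control (needs root and a cgroup v1 hierarchy)."""
    path.write_text("1\n")


def read_oom_control(path=OOM_CONTROL):
    """Read and parse the current state of the control file."""
    return parse_oom_control(path.read_text())


def describe(state):
    """Summarise the two fields relevant to the test."""
    disabled = state.get("oom_kill_disable") == 1
    under_oom = state.get("under_oom") == 1
    return (f"oom-killer {'disabled' if disabled else 'enabled'}, "
            f"group {'is' if under_oom else 'is not'} under OOM")


# Demonstration on sample file contents, so it runs without a cgroup mount.
sample = "oom_kill_disable 1\nunder_oom 0\n"
print(describe(parse_oom_control(sample)))
```

Reading the file back after a guest dies shows whether `oom_kill_disable` was still `1` for that group at the time, which helps rule out the setting having been reset.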
From my observations, it seems that something kills the processes before the oom-killer is invoked (when it is enabled), and in that case the system recovers normally. But when the oom-killer actually is invoked, the system crashes and the machine needs a reboot.
So, my questions are:
- What could be causing the oom-killer to crash the machine?
- What is killing the guests when the oom-killer is disabled?
It would be great if anyone has any hints on this! Thanks!
Note: I'm using kernel v4.7.8, built with buildroot and compiled against uClibc on an x86 platform. Also, there is no swap on this system.