
Softlockup vs. Hardlockup

Demystifying Softlockup & Hardlockup

A 'softlockup' is defined as a bug that causes the kernel to loop in kernel mode for more than 20 seconds without giving other tasks a chance to run. The current stack trace is displayed upon detection and, by default, the system will stay locked up.
A 'hardlockup' is defined as a bug that causes the CPU to loop in kernel mode for more than 10 seconds without letting other interrupts have a chance to run. As in the softlockup case, the current stack trace is displayed upon detection, and the system will stay locked up unless the default behavior is changed.

Details:

A lockup is a stretch of kernel code that holds on to the CPU; a severe lockup can make the entire system unresponsive. Lockups have a few defining features:
  • First, only kernel code can cause a lockup. User code can be preempted, so it cannot form a lockup. (There is one exception: a real-time process running at SCHED_FIFO priority 99 can, even in user mode, prevent the [watchdog/x] kernel threads from getting the CPU and thereby trigger a soft lockup report. See "Real-Time Process Causes System Lockup?")
  • Second, the kernel code must be running with preemption disabled. Linux is a preemptive kernel, and preemption is forbidden only in certain code regions; a lockup can only form in those regions.
There are two types of lockup: soft lockup and hard lockup. The difference is that a hard lockup occurs while the CPU has interrupts masked.
  • A soft lockup means the CPU is occupied by kernel code so that other processes cannot run. It is detected by assigning each CPU a periodically scheduled kernel thread, [watchdog/x]; if that thread does not get to run within the set period, a soft lockup has occurred. [watchdog/x] is a SCHED_FIFO real-time process with the highest priority, 99, so it should always be able to run.
  • A hard lockup is more serious: the CPU not only cannot run other processes, it no longer responds to interrupts either. Detection relies on an NMI perf event driven by the PMU; because NMI interrupts are not maskable, they are still delivered even when the CPU has stopped responding to ordinary interrupts. The NMI handler checks whether the clock-interrupt counter hrtimer_interrupts is still incrementing; if it has stalled, the timer interrupt is not being serviced, which means a hard lockup has occurred.
The Linux kernel implements a mechanism to detect lockups, called the NMI Watchdog, which relies on the NMI interrupt. NMI is used because a lockup may occur while interrupts are masked; in that case the only way to interrupt the CPU is via NMI, since NMI interrupts cannot be masked. The NMI Watchdog contains both a soft lockup detector and a hard lockup detector. In kernels after 2.6 it is implemented as follows.
The trigger mechanism of the NMI Watchdog consists of two parts:
  1. A high-resolution timer (hrtimer), whose interrupt handler is kernel/watchdog.c: watchdog_timer_fn(). This routine:
    • increments the counter hrtimer_interrupts, which the hard lockup detector uses to determine whether the CPU is still responding to interrupts;
    • wakes up the [watchdog/x] kernel thread, whose only job is to update a timestamp;
    • runs the soft lockup detector, which checks that timestamp: if it has not been updated within the soft lockup threshold, [watchdog/x] has not had a chance to run, meaning the CPU is occupied, i.e. a soft lockup has occurred.
  2. An NMI perf event based on the PMU: when the PMU counter overflows, an NMI interrupt is raised, whose handler is kernel/watchdog.c: watchdog_overflow_callback(). The hard lockup detector lives here; it checks whether the hrtimer interrupt count (hrtimer_interrupts) is still incrementing. If it has stalled, the hrtimer interrupt is not being serviced, i.e. a hard lockup has occurred.
The hrtimer period is softlockup_thresh/5.
Note:
  • In the 2.6 kernel, softlockup_thresh is equal to the kernel parameter kernel.watchdog_thresh, with a default of 60 seconds.
  • In the 3.10 kernel, the kernel parameter is still named kernel.watchdog_thresh, but its meaning has changed to the hard lockup threshold, with a default of 10 seconds. The soft lockup threshold is (2 * kernel.watchdog_thresh), i.e. 20 seconds by default.
The NMI perf event is based on the PMU. In the 2.6 kernel the hard lockup threshold is fixed at 60 seconds and cannot be adjusted manually. In the 3.10 kernel it can be tuned, because it corresponds directly to the kernel parameter kernel.watchdog_thresh, with a default of 10 seconds.
What happens when a lockup is detected? The kernel can either panic automatically or just print the information and carry on; this is controlled by kernel parameters:
  • kernel.softlockup_panic: determines whether the kernel panics when a soft lockup is detected. The default value is 0.
  • kernel.nmi_watchdog: determines whether the NMI watchdog is enabled, and whether a hard lockup causes a panic. The format of this kernel parameter is "=[panic,][nopanic,][num]".
    (Note: newer kernels introduce a separate parameter, kernel.hardlockup_panic; you can check whether your kernel supports it by looking for the existence of /proc/sys/kernel/hardlockup_panic.)
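These knobs can be set persistently through sysctl. A hedged example of a configuration fragment; the values here are illustrative choices, not recommendations, and defaults vary by kernel version:

```
# /etc/sysctl.conf (or a file under /etc/sysctl.d/)
kernel.softlockup_panic = 1   # panic on soft lockup (default 0)
kernel.watchdog_thresh = 10   # hard lockup threshold in seconds (3.10+ meaning)
kernel.nmi_watchdog = 1       # enable the NMI watchdog
```

Apply with `sysctl -p`; on kernels that support it, kernel.hardlockup_panic can be set the same way.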
