
Spinlock implementation in ARM architecture

SEV and WFE are the two key instructions used to implement spinlocks on the ARM architecture.


Let's look briefly at these two instructions before diving into the actual spinlock implementation.


SEV
SEV causes an event to be signaled to all cores within a multiprocessor system. If SEV is implemented, WFE must also be implemented.


WFE

If the Event Register is not set, WFE suspends execution until one of the following events occurs:
  • an IRQ interrupt, unless masked by the CPSR I-bit
  • an FIQ interrupt, unless masked by the CPSR F-bit
  • an Imprecise Data abort, unless masked by the CPSR A-bit
  • a Debug Entry request, if Debug is enabled
  • an Event signaled by another processor using the SEV instruction.
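
To make the WFE/SEV pairing concrete, here is a minimal, self-contained sketch of one core waiting on a shared flag with WFE while another core publishes the flag and signals with SEV. This is not kernel code; the flag variable and function names are made up for illustration.

/* Illustrative sketch only: the variable and function names are
 * hypothetical; this is not how the kernel spells it. */
static volatile int flag;

static void wait_for_flag(void)
{
        while (flag == 0)                                 /* flag not set yet */
                __asm__ __volatile__("wfe" ::: "memory"); /* sleep until an event */
}

static void set_flag_and_signal(void)
{
        flag = 1;                                        /* publish the flag */
        __asm__ __volatile__("dsb\n\tsev" ::: "memory"); /* make the store visible, then wake waiters */
}

Note the DSB before SEV: without it, a woken core could observe the event before it observes the store that the event is supposed to advertise.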

In case of spin_lock_irq( )/spin_lock_irqsave( ),

  • as IRQs are disabled, the only way to resume after the WFE instruction has executed is for some other core to execute the SEV instruction.

In case of spin_lock( ),

  • If IRQs were enabled before we called spin_lock( ) and execution got suspended after WFE:
    • Scenario 1: An interrupt occurred and was handled; we resume, but as the lock is still held, we loop back and execute WFE again.
    • Scenario 2: Some other core executed SEV after releasing some other lock (not ours); we resume, but as our lock is still held, we loop back and execute WFE again.
    • Scenario 3: Some other core executed SEV after releasing this lock; we resume and, as the lock is now free, we acquire it.
  • If IRQs were disabled before calling spin_lock( ), the situation is the same as for spin_lock_irqsave( ).

In case of spin_unlock( ),

  • the lock is released and the SEV instruction is executed, waking any cores suspended in WFE (see the usage sketch after this list).
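
Before looking at the implementation, here is a minimal usage sketch of these primitives through the standard kernel API; my_lock, shared_counter, and update_counter are made-up names for illustration.

#include <linux/spinlock.h>

/* Usage sketch only: the names below are illustrative, not from the kernel. */
static DEFINE_SPINLOCK(my_lock);
static int shared_counter;

static void update_counter(void)
{
        unsigned long flags;

        spin_lock_irqsave(&my_lock, flags);      /* disable local IRQs, then spin for the lock */
        shared_counter++;                        /* critical section */
        spin_unlock_irqrestore(&my_lock, flags); /* release the lock (SEV), restore IRQ state */
}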
Check out the following code snippets for actual implementation:


static inline void arch_spin_lock(arch_spinlock_t *lock)
{
        unsigned long tmp;

        __asm__ __volatile__(
"1:     ldrex   %0, [%1]\n"     /* load lock->lock, mark the address exclusive */
"       teq     %0, #0\n"       /* is the lock free (value == 0)? */
        WFE("ne")               /* lock held: sleep until an interrupt or event */
"       strexeq %0, %2, [%1]\n" /* lock free: try to store 1; tmp = 0 on success */
"       teqeq   %0, #0\n"       /* did the exclusive store succeed? */
"       bne     1b"             /* held or store failed: retry from the load */
        : "=&r" (tmp)
        : "r" (&lock->lock), "r" (1)
        : "cc");

        smp_mb();               /* barrier: critical-section accesses stay after the lock */
}



static inline void arch_spin_unlock(arch_spinlock_t *lock)
{
        smp_mb();               /* barrier: critical-section accesses complete before release */

        __asm__ __volatile__(
"       str     %1, [%0]\n"     /* store 0 to lock->lock: the lock is now free */
        :
        : "r" (&lock->lock), "r" (0)
        : "cc");

        dsb_sev();              /* data synchronization barrier, then SEV to wake waiters */
}


static inline void dsb_sev(void)
{
#if __LINUX_ARM_ARCH__ >= 7
        __asm__ __volatile__ (
                "dsb\n"         /* ARMv7: data synchronization barrier instruction */
                SEV             /* signal the event to all cores */
        );
#else
        __asm__ __volatile__ (
                "mcr p15, 0, %0, c7, c10, 4\n" /* pre-v7: DSB via the CP15 register */
                SEV
                : : "r" (0)
        );
#endif
}


The ARM architecture provides two instructions, LDREX and STREX, for implementing spinlocks. The basic semantics are simple. When you perform LDREX (load exclusive), you load from a memory location into a register and mark that memory location as exclusive. Other processors can still execute LDREX on the same location at the same time, allowing multiple concurrent owners of the same memory location. The important part is done by STREX (store exclusive). If multiple processors concurrently execute STREX, only one of them will succeed. The successful store returns 0 and an unsuccessful one returns 1. The processors that failed the STREX must retry the LDREX/STREX sequence to regain exclusive access to the memory location. The actual code from the Linux kernel is shown above.
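
As a side note, the same LDREX/STREX pair can be used to build other atomic primitives. The following sketch, assuming an ARMv6+ core and GCC inline assembly, implements an atomic increment; it is for illustration only and is not the kernel's atomic_inc().

/* Illustrative atomic increment built on LDREX/STREX. */
static inline void atomic_inc_sketch(volatile int *counter)
{
        int tmp, failed;

        __asm__ __volatile__(
"1:     ldrex   %0, [%2]\n"     /* load *counter, mark the address exclusive */
"       add     %0, %0, #1\n"   /* increment the loaded value */
"       strex   %1, %0, [%2]\n" /* try to store back; %1 = 0 on success */
"       teq     %1, #0\n"       /* lost exclusivity to another core? */
"       bne     1b"             /* yes: retry the whole sequence */
        : "=&r" (tmp), "=&r" (failed)
        : "r" (counter)
        : "cc", "memory");
}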

Let's walk through arch_spin_lock() instruction by instruction. The ldrex instruction loads the lock->lock value and teq checks whether it is 0. If the value is not zero, the conditional wfene puts the CPU into a power-saving state until it receives an interrupt or an event. If the value is zero, strexeq tries to store 1 into lock->lock to mark the lock as held. teqeq then checks whether the exclusive store succeeded; if it did not, bne repeats the sequence from the load. On success, smp_mb() performs a memory barrier so that all memory updates made by other processors before the lock was released are visible to the acquiring processor.
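
For readers less comfortable with ARM assembly, the loop behaves roughly like the C-level pseudocode below; load_exclusive, store_exclusive, and wait_for_event are hypothetical helpers standing in for LDREX, STREX, and WFE.

/* Rough pseudocode equivalent of arch_spin_lock; the helpers are hypothetical. */
void arch_spin_lock_sketch(arch_spinlock_t *lock)
{
        unsigned long tmp;

        do {
                tmp = load_exclusive(&lock->lock);             /* ldrex */
                if (tmp != 0)
                        wait_for_event();                      /* wfene: lock held, sleep */
                else
                        tmp = store_exclusive(&lock->lock, 1); /* strexeq: returns 0 on success */
        } while (tmp != 0);                                    /* bne 1b: retry until we own the lock */

        smp_mb();
}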

Unlocking is much simpler. smp_mb() first performs a memory barrier so that all changes made inside the critical section are visible to other processors before the lock is released. The str instruction then stores 0 into lock->lock to mark the lock as free. Finally, dsb_sev() signals an event to the other processors that are waiting in WFE inside arch_spin_lock().
