
Spinlock implementation in ARM architecture

SEV and WFE are the two key instructions used to implement spinlocks on the ARM architecture.


Let's look briefly at these two instructions before diving into the actual spinlock implementation.


SEV
SEV causes an event to be signaled to all cores within a multiprocessor system. If SEV is implemented, WFE must also be implemented.


WFE

If the Event Register is not set, WFE suspends execution until one of the following events occurs:
  • an IRQ interrupt, unless masked by the CPSR I-bit
  • an FIQ interrupt, unless masked by the CPSR F-bit
  • an Imprecise Data abort, unless masked by the CPSR A-bit
  • a Debug Entry request, if Debug is enabled
  • an Event signaled by another processor using the SEV instruction.
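
To make the WFE/SEV pairing concrete, here is a minimal, self-contained sketch of one core waiting on a shared flag with WFE while another core publishes the flag and signals with SEV. This is not kernel code; the flag variable and function names are made up for illustration.

/* Illustrative sketch only: the variable and function names are
 * hypothetical; this is not how the kernel spells it. */
static volatile int flag;

static void wait_for_flag(void)
{
        while (flag == 0)                                 /* flag not set yet */
                __asm__ __volatile__("wfe" ::: "memory"); /* sleep until an event */
}

static void set_flag_and_signal(void)
{
        flag = 1;                                        /* publish the flag */
        __asm__ __volatile__("dsb\n\tsev" ::: "memory"); /* make the store visible, then wake waiters */
}

Note the DSB before SEV: without it, a woken core could observe the event before it observes the store that the event is supposed to advertise.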

In case of spin_lock_irq( )/spin_lock_irqsave( ),

  • as IRQs are disabled, the only way to resume after the WFE instruction has executed is for some other core to execute the SEV instruction.

In case of spin_lock( ),

  • If IRQs were enabled before we called spin_lock( ) and execution got suspended after WFE:
    • Scenario 1: An interrupt occurred and was handled; we resume, but as the lock is still held, we loop back and execute WFE again.
    • Scenario 2: Some other core executed SEV after releasing some other lock (not ours); we resume, but as our lock is still held, we loop back and execute WFE again.
    • Scenario 3: Some other core executed SEV after releasing this lock; we resume and, as the lock is now free, we acquire it.
  • If IRQs were disabled before calling spin_lock( ), the situation is the same as for spin_lock_irqsave( ).

In case of spin_unlock( ),

  • the lock is released and the SEV instruction is executed, waking any cores suspended in WFE (see the usage sketch after this list).
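
Before looking at the implementation, here is a minimal usage sketch of these primitives through the standard kernel API; my_lock, shared_counter, and update_counter are made-up names for illustration.

#include <linux/spinlock.h>

/* Usage sketch only: the names below are illustrative, not from the kernel. */
static DEFINE_SPINLOCK(my_lock);
static int shared_counter;

static void update_counter(void)
{
        unsigned long flags;

        spin_lock_irqsave(&my_lock, flags);      /* disable local IRQs, then spin for the lock */
        shared_counter++;                        /* critical section */
        spin_unlock_irqrestore(&my_lock, flags); /* release the lock (SEV), restore IRQ state */
}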
Check out the following code snippets for actual implementation:


static inline void arch_spin_lock(arch_spinlock_t *lock)
{
        unsigned long tmp;

        __asm__ __volatile__(
"1:     ldrex   %0, [%1]\n"     /* load lock->lock, mark the address exclusive */
"       teq     %0, #0\n"       /* is the lock free (value == 0)? */
        WFE("ne")               /* lock held: sleep until an interrupt or event */
"       strexeq %0, %2, [%1]\n" /* lock free: try to store 1; tmp = 0 on success */
"       teqeq   %0, #0\n"       /* did the exclusive store succeed? */
"       bne     1b"             /* held or store failed: retry from the load */
        : "=&r" (tmp)
        : "r" (&lock->lock), "r" (1)
        : "cc");

        smp_mb();               /* barrier: critical-section accesses stay after the lock */
}



static inline void arch_spin_unlock(arch_spinlock_t *lock)
{
        smp_mb();               /* barrier: critical-section accesses complete before release */

        __asm__ __volatile__(
"       str     %1, [%0]\n"     /* store 0 to lock->lock: the lock is now free */
        :
        : "r" (&lock->lock), "r" (0)
        : "cc");

        dsb_sev();              /* data synchronization barrier, then SEV to wake waiters */
}


static inline void dsb_sev(void)
{
#if __LINUX_ARM_ARCH__ >= 7
        __asm__ __volatile__ (
                "dsb\n"         /* ARMv7: data synchronization barrier instruction */
                SEV             /* signal the event to all cores */
        );
#else
        __asm__ __volatile__ (
                "mcr p15, 0, %0, c7, c10, 4\n" /* pre-v7: DSB via the CP15 register */
                SEV
                : : "r" (0)
        );
#endif
}


The ARM architecture provides two instructions, LDREX and STREX, for implementing spinlocks. The basic semantics are simple. When you perform LDREX (load exclusive), you load from a memory location into a register and mark that memory location as exclusive. Other processors can still execute LDREX on the same location at the same time, allowing multiple concurrent owners of the same memory location. The important part is done by STREX (store exclusive). If multiple processors concurrently execute STREX, only one of them will succeed. The successful store returns 0 and an unsuccessful one returns 1. The processors that failed the STREX must retry the LDREX/STREX sequence to regain exclusive access to the memory location. The actual code from the Linux kernel is shown above.
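
As a side note, the same LDREX/STREX pair can be used to build other atomic primitives. The following sketch, assuming an ARMv6+ core and GCC inline assembly, implements an atomic increment; it is for illustration only and is not the kernel's atomic_inc().

/* Illustrative atomic increment built on LDREX/STREX. */
static inline void atomic_inc_sketch(volatile int *counter)
{
        int tmp, failed;

        __asm__ __volatile__(
"1:     ldrex   %0, [%2]\n"     /* load *counter, mark the address exclusive */
"       add     %0, %0, #1\n"   /* increment the loaded value */
"       strex   %1, %0, [%2]\n" /* try to store back; %1 = 0 on success */
"       teq     %1, #0\n"       /* lost exclusivity to another core? */
"       bne     1b"             /* yes: retry the whole sequence */
        : "=&r" (tmp), "=&r" (failed)
        : "r" (counter)
        : "cc", "memory");
}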

Let's walk through arch_spin_lock() instruction by instruction. The ldrex instruction loads the lock->lock value and teq checks whether it is 0. If the value is not zero, the conditional wfene puts the CPU into a power-saving state until it receives an interrupt or an event. If the value is zero, strexeq tries to store 1 into lock->lock to mark the lock as held. teqeq then checks whether the exclusive store succeeded; if it did not, bne repeats the sequence from the load. On success, smp_mb() performs a memory barrier so that all memory updates made by other processors before the lock was released are visible to the acquiring processor.
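
For readers less comfortable with ARM assembly, the loop behaves roughly like the C-level pseudocode below; load_exclusive, store_exclusive, and wait_for_event are hypothetical helpers standing in for LDREX, STREX, and WFE.

/* Rough pseudocode equivalent of arch_spin_lock; the helpers are hypothetical. */
void arch_spin_lock_sketch(arch_spinlock_t *lock)
{
        unsigned long tmp;

        do {
                tmp = load_exclusive(&lock->lock);             /* ldrex */
                if (tmp != 0)
                        wait_for_event();                      /* wfene: lock held, sleep */
                else
                        tmp = store_exclusive(&lock->lock, 1); /* strexeq: returns 0 on success */
        } while (tmp != 0);                                    /* bne 1b: retry until we own the lock */

        smp_mb();
}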

Unlocking is much simpler. smp_mb() first performs a memory barrier so that all changes made inside the critical section are visible to other processors before the lock is released. The str instruction then stores 0 into lock->lock to mark the lock as free. Finally, dsb_sev() signals an event to the other processors that are waiting in WFE inside arch_spin_lock().
