Locks

Posted by Beetle B. on Mon 22 June 2020

Normally, we create threads, but the OS schedules them however it likes. Locking/unlocking gives the programmer some semblance of control over this scheduling.

One can have several locks to control different aspects of concurrency. The more locks, the more complexity!

Qualities of a Good Lock

We want locks with the following qualities:

  1. Mutual exclusion
  2. Fairness: Will a given blocked thread get a fair chance of acquiring the lock?
  3. Performance

To evaluate performance, look at three cases:

  • When a single thread grabs and releases the lock without waiting (baseline)
  • When multiple threads compete on a single CPU
  • When multiple threads compete on multiple CPUs

Approach: Controlling Interrupts

An old way to provide mutual exclusion was to let code disable interrupts during critical sections.
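
A sketch of the idea - DisableInterrupts and EnableInterrupts here stand in for whatever privileged instructions the hardware provides:

void lock()   { DisableInterrupts(); }
void unlock() { EnableInterrupts(); }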

This is a bad idea:

  1. It is easy for buggy (or malicious) code to block all interrupts, causing the system to hang.
  2. It doesn’t work with multiple CPUs.
  3. Interrupts can be lost if they are turned off for a long time.
  4. It is inefficient: Masking/unmasking interrupts isn’t cheap.

Buggy Approach: Loads/Stores

typedef struct __lock_t { int flag; } lock_t;

void init(lock_t *mutex) {
    // 0 -> lock is available, 1 -> held
    mutex->flag = 0;
}

void lock(lock_t *mutex) {
    while (mutex->flag == 1) // TEST the flag
        ; // spin-wait (do nothing)
    mutex->flag = 1; // now SET it!
}

void unlock(lock_t *mutex) {
    mutex->flag = 0;
}

This has two problems. The first is that it is just wrong: if a thread gets interrupted after checking that the flag is 0 but before setting it to 1, then two threads can both set it to 1 and both end up in the critical section.
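
For example, here is one bad interleaving on a single CPU, with both threads calling lock() on the same mutex:

Thread 1                            Thread 2
--------                            --------
while (mutex->flag == 1)  // sees 0, exits loop
  [interrupt: switch to Thread 2]
                                    while (mutex->flag == 1)  // also sees 0
                                    mutex->flag = 1;  // enters critical section
  [interrupt: switch back]
mutex->flag = 1;          // also enters the critical section!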

The second problem is that the spin-wait can be expensive - especially on a single-CPU machine.

Test and Set Approach

We solve the problem from the prior section by making the check and the set a single atomic operation, which CPUs provide. It is called test-and-set.

The C equivalent would be:

int TestAndSet(int *old_ptr, int new) {
    int old = *old_ptr; // fetch old value at old_ptr
    *old_ptr = new;     // store 'new' into old_ptr
    return old;         // return the old value
}

Now the locking code is simple:

typedef struct __lock_t {
    int flag;
} lock_t;

void init(lock_t *lock) {
    // 0: lock is available, 1: lock is held
    lock->flag = 0;
}

void lock(lock_t *lock) {
    while (TestAndSet(&lock->flag, 1) == 1)
        ; // spin-wait (do nothing)
}

void unlock(lock_t *lock) {
    lock->flag = 0;
}

The subtle difference is that here we check what the value of flag was before we modified it. If it was already 1, then some other thread holds the lock, so we continue to wait.

Note that this isn’t simply C code allowing it - the support comes from the CPU.

Note that this is still a spin lock. On a single-core CPU, you need a preemptive scheduler to ensure the spinning thread eventually gives up the CPU; a preemptive scheduler interrupts a running thread via a timer.

This spin lock fails the fairness test: nothing stops one thread from spinning forever while others keep grabbing the lock.

For performance, the spin lock does poorly on a single-core CPU - with \(N\) threads contending, each waiting thread can burn an entire time slice just spinning - but it's not too bad on multicore, as long as the number of threads doesn't vastly exceed the number of CPUs.

Compare and Swap

Compare-and-swap is another hardware instruction. It compares the value at a pointer location with an expected value and, if they match, updates the location with a new value. Either way, it returns the value it found.

You can trivially code up a lock equivalent to the test-and-set one with this operation (see the sketch below), but you can do more with it, as described in later chapters.
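
A C sketch of the instruction's semantics, plus a spin lock built on it (as with test-and-set, the real thing is a single atomic hardware instruction):

int CompareAndSwap(int *ptr, int expected, int new) {
    int original = *ptr;    // fetch the current value
    if (original == expected)
        *ptr = new;         // update only if it matched what we expected
    return original;        // either way, return what was found
}

void lock(lock_t *lock) {
    while (CompareAndSwap(&lock->flag, 0, 1) == 1)
        ; // spin until we are the one to change flag from 0 to 1
}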

Load Linked and Store Conditional

Load Linked and Store Conditional are two HW instructions. Load-linked works just like a load - it copies a value from memory into a register. The store-conditional, though, succeeds only if no intervening store to that memory location has occurred since the load-linked.

Here is a lock that uses these operations:

void lock(lock_t *lock) {
    while (1) {
        while (LoadLinked(&lock->flag) == 1)
            ; // spin until it’s zero
        if (StoreConditional(&lock->flag, 1) == 1)
            return; // if set-it-to-1 was a success: all done
                    // otherwise: try it all over again
    }
}

void unlock(lock_t *lock) {
    lock->flag = 0;
}

Here StoreConditional returns 1 only if this thread was the one to update the flag.

A more concise way of writing this:

void lock(lock_t *lock) {
    while (LoadLinked(&lock->flag) ||
           !StoreConditional(&lock->flag, 1))
        ; // spin
}

Fetch-And-Add and the Ticket Lock

The fetch-and-add instruction atomically fetches the value at an address, adds 1 to it, and returns the old value.
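
In C pseudocode (again, the hardware performs this atomically):

int FetchAndAdd(int *ptr) {
    int old = *ptr;   // fetch the current value
    *ptr = old + 1;   // add one to it
    return old;       // return the old value
}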

A lock based on this is below:

typedef struct __lock_t {
    int ticket;
    int turn;
} lock_t;

void lock_init(lock_t *lock) {
    lock->ticket = 0;
    lock->turn = 0;
}

void lock(lock_t *lock) {
    int myturn = FetchAndAdd(&lock->ticket);
    while (lock->turn != myturn)
        ; // spin
}

void unlock(lock_t *lock) {
    lock->turn = lock->turn + 1;
}

Why does this ticket lock work? When a thread wants the lock, it grabs its own unique “ticket” value. It spins while the current turn isn’t its turn. When a thread unlocks, it increments the turn value, handing the lock to the next ticket holder. This is essentially like taking a number at a restaurant and waiting for your turn.

The nice thing about this lock is that it guarantees progress: each thread holds a place in line, so no thread can get stuck waiting forever.

Test and Set With Yield

Another approach is to yield (similar in spirit to yield in Python): instead of spinning, a thread gives up the CPU to another thread. If you have \(N\) threads waiting, there is still some waste, and frequent context switching is expensive - if threads end up context switching constantly, could this be more expensive than simply spinning?

This approach is still unfair - there is no guarantee you’ll get to execute.
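
A sketch of this approach, reusing the test-and-set lock from above and assuming POSIX sched_yield() as the yield primitive:

#include <sched.h>

void lock(lock_t *lock) {
    while (TestAndSet(&lock->flag, 1) == 1)
        sched_yield(); // give up the CPU instead of burning it spinning
}

void unlock(lock_t *lock) {
    lock->flag = 0;
}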

A Queue Based Lock

typedef struct __lock_t {
    int flag;
    int guard;
    queue_t *q;
} lock_t;

void lock_init(lock_t *m) {
    m->flag = 0;
    m->guard = 0;
    queue_init(m->q);
}

void lock(lock_t *m) {
    while (TestAndSet(&m->guard, 1) == 1)
        ; //acquire guard lock by spinning
    if (m->flag == 0) {
        m->flag = 1; // lock is acquired
        m->guard = 0;
    } else {
        queue_add(m->q, gettid());
        m->guard = 0;
        park();
    }
}

void unlock(lock_t *m) {
    while (TestAndSet(&m->guard, 1) == 1)
        ; //acquire guard lock by spinning
    if (queue_empty(m->q))
        m->flag = 0; // let go of lock; no one wants it
    else
        unpark(queue_remove(m->q)); // hold lock
                                    // (for next thread!)
    m->guard = 0;
}

Why does this work? With some help from the OS: the park() call puts the calling thread to sleep, and unpark(threadID) wakes that particular thread up.

We effectively have two flags here: flag and guard. A given thread spins until it can set guard to 1. If the lock (flag) is free, it takes it and releases the guard. If not, it adds itself to a queue, releases the guard, and parks (goes to sleep).

Now the thread that has the lock, when unlocking, will wake up the next thread in the queue. That thread will not have to wait! The lock holder is basically passing the torch to the next thread (note that flag is never set back to 0 in this case).

There is a race condition here: if a thread is interrupted between releasing the guard and calling park, another thread might unpark it first, and that wakeup is lost - the later park then sleeps forever. Solaris added setpark to solve this. Another approach is to let the kernel handle parking/unparking as an atomic operation.
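
A sketch of the setpark() fix - only the queue branch of lock() changes; setpark announces the upcoming park, so if another thread unparks us in the window before we call park, the park returns immediately instead of sleeping:

    } else {
        queue_add(m->q, gettid());
        setpark();     // announce that we are about to park
        m->guard = 0;
        park();        // returns right away if we were already unparked
    }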

Two-Phase Lock

There is another type of lock: A thread will spin for a short while in an attempt to acquire the lock. If it fails, it goes to sleep and is woken up only when the lock is available.
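
A sketch of the idea, reusing the queue-based lock above (MAX_SPIN is an arbitrary tuning knob, not a standard value):

#define MAX_SPIN 100

void two_phase_lock(lock_t *m) {
    for (int i = 0; i < MAX_SPIN; i++)
        if (TestAndSet(&m->flag, 1) == 0)
            return;   // phase one: grabbed the lock while spinning
    lock(m);          // phase two: fall back to the park()-based slow path
}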

tags : concurrency