summaryrefslogtreecommitdiff
path: root/lock/Design
diff options
context:
space:
mode:
Diffstat (limited to 'lock/Design')
-rw-r--r--lock/Design301
1 files changed, 0 insertions, 301 deletions
diff --git a/lock/Design b/lock/Design
deleted file mode 100644
index e423ff7..0000000
--- a/lock/Design
+++ /dev/null
@@ -1,301 +0,0 @@
-# $Id$
-
-Synchronization in the Locking Subsystem
-
-This is a document that describes how we implemented fine-grain locking
-in the lock manager (that is, locking on a hash bucket level instead of
-locking the entire region). We found that the increase in concurrency
-was not sufficient to warrant the increase in complexity or the additional
-cost of performing each lock operation. Therefore, we don't use this
-any more. Should we have to do fine-grain locking in a future release,
-this would be a reasonable starting point.
-
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
-1. Data structures
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
-
-The lock manager maintains 3 different structures:
-
-Objects (__db_lockobj):
- Describes an object that is locked. When used with DB, this consists
- of a __db_ilock (a file identifier and a page number).
-
-Lockers (__db_locker):
- Identifies a specific locker ID and maintains the head of a list of
- locks held by a locker (for using during transaction commit/abort).
-
-Locks (__db_lock):
- Describes a particular object lock held on behalf of a particular
- locker id.
-
-Objects and Lockers reference Locks.
-
-These structures are organized via two synchronized hash tables. Each
-hash table consists of two physical arrays: the array of actual hash
-buckets and an array of mutexes so we can lock individual buckets, rather
-than the whole table.
-
-One hash table contains Objects and the other hash table contains Lockers.
-Objects contain two lists of locks, waiters and holders: holders currently
-hold a lock on the Object, waiters are lock waiting to be granted.
-Lockers are a single linked list that connects the Locks held on behalf
-of the specific locker ID.
-
-In the diagram below:
-
-Locker ID #1 holds a lock on Object #1 (L1) and Object #2 (L5), and is
-waiting on a lock on Object #1 (L3).
-
-Locker ID #2 holds a lock on Object #1 (L2) and is waiting on a lock for
-Object #2 (L7).
-
-Locker ID #3 is waiting for a lock on Object #2 (L6).
-
- OBJECT -----------------------
- HASH | |
- ----|------------- |
- ________ _______ | | ________ | |
- | |-->| O1 |--|---|-->| O2 | | |
- |_______| |_____| | | |______| V |
- | | W H--->L1->L2 W H--->L5 | holders
- |_______| | | | | V
- | | ------->L3 \ ------->L6------>L7 waiters
- |_______| / \ \
- . . / \ \
- . . | \ \
- . . | \ -----------
- |_______| | -------------- |
- | | ____|____ ___|_____ _|______
- |_______| | | | | | |
- | | | LID1 | | LID2 | | LID3 |
- |_______| |_______| |_______| |______|
- ^ ^ ^
- | | |
- ___|________________________|________|___
- LOCKER | | | | | | | | |
- HASH | | | | | | | | |
- | | | | | | | | |
- |____|____|____|____|____|____|____|____|
-
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
-2. Synchronization
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
-
-There are four types of mutexes in the subsystem.
-
-Object mutexes;
- These map one-to-one to each bucket in the Object hash table.
- Holding a mutex on an Object bucket secures all the Objects in
- that bucket as well as the Lock structures linked from those
- Objects. All fields in the Locks EXCEPT the Locker links (the
- links that attach Locks by Locker ID) are protected by these
- mutexes.
-
-Locker mutexes:
- These map one-to-one to each bucket in the Locker hash table.
- Holding a mutex on a Locker bucket secures the Locker structures
- and the Locker links in the Locks.
-
-Memory mutex:
- This mutex allows calls to allocate/free memory, i.e. calls to
- __db_shalloc and __db_shalloc_free, as well as manipulation of
- the Object, Locker and Lock free lists.
-
-Region mutex:
- This mutex is currently only used to protect the locker ids.
- It may also be needed later to provide exclusive access to
- the region for deadlock detection.
-
-Creating or removing a Lock requires locking both the Object lock and the
-Locker lock (and eventually the shalloc lock to return the item to the
-free list).
-
-The locking hierarchy is as follows:
-
- The Region mutex may never be acquired after any other mutex.
-
- The Object mutex may be acquired after the Region mutex.
-
- The Locker mutex may be acquired after the Region and Object
- mutexes.
-
- The Memory mutex may be acquired after any mutex.
-
-So, if both and Object mutex and a Locker mutex are going to be acquired,
-the Object mutex must be acquired first.
-
-The Memory mutex may be acquired after any other mutex, but no other mutexes
-can be acquired once the Memory mutex is held.
-
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
-3. The algorithms:
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
-The locking subsystem supports four basic operations:
- Get a Lock (lock_get)
-
- Release a Lock (lock_put)
-
- Release all the Locks on a specific Object (lock_vec)
-
- Release all the Locks for a specific Locker (lock_vec)
-
-Get a lock:
- Acquire Object bucket mutex.
- Acquire Locker bucket mutex.
-
- Acquire Memory mutex.
- If the Object does not exist
- Take an Object off the freelist.
- If the Locker doesn't exist
- Take a Locker off the freelist.
- Take a Lock off the free list.
- Release Memory mutex.
-
- Add Lock to the Object list.
- Add Lock to the Locker list.
- Release Locker bucket mutex
-
- If the lock cannot be granted
- Release Object bucket mutex
- Acquire lock mutex (blocks)
-
- Acquire Object bucket mutex
- If lock acquisition did not succeed (e.g, deadlock)
- Acquire Locker bucket mutex
- If locker should be destroyed
- Remove locker from hash table
- Acquire Memory mutex
- Return locker to free list
- Release Memory mutex
- Release Locker bucket mutex
-
- If object should be released
- Acquire Memory mutex
- Return object to free list
- Release Memory mutex
-
- Release Object bucket mutex
-
-Release a lock:
- Acquire Object bucket mutex.
- (Requires that we be able to find the Object hash bucket
- without looking inside the Lock itself.)
-
- If releasing a single lock and the user provided generation number
- doesn't match the Lock's generation number, the Lock has been reused
- and we return failure.
-
- Enter lock_put_internal:
- if the Lock is still on the Object's lists:
- Increment Lock's generation number.
- Remove Lock from the Object's list (NULL link fields).
- Promote locks for the Object.
-
- Enter locker_list_removal
- Acquire Locker bucket mutex.
- If Locker doesn't exist:
- Release Locker bucket mutex
- Release Object bucket mutex
- Return error.
- Else if Locker marked as deleted:
- dont_release = TRUE
- Else
- Remove Lock from Locker list.
- If Locker has no more locks
- Remove Locker from table.
- Acquire Memory mutex.
- Return Locker to free list
- Release Memory mutex
- Release Locker bucket mutex.
- Exit locker_list_removal
-
- If (!dont_release)
- Acquire Memory mutex
- Return Lock to free list
- Release Memory mutex
-
- Exit lock_put_internal
-
- Release Object bucket mutex
-
-Release all the Locks on a specific Object (lock_vec, DB_PUT_ALL_OBJ):
-
- Acquire Object bucket mutex.
-
- For each lock on the waiter list:
- lock_put_internal
- For each lock on the holder list:
- lock_put_internal
-
- Release Object bucket mutex.
-
-Release all the Locks for a specific Locker (lock_vec, DB_PUT_ALL):
-
- Acquire Locker bucket mutex.
- Mark Locker deleted.
- Release Locker mutex.
-
- For each lock on the Locker's list:
- Remove from locker's list
- (The lock could get put back on the free list in
- lock_put and then could get reallocated and the
- act of setting its locker links could clobber us.)
- Perform "Release a Lock" above: skip locker_list_removal.
-
- Acquire Locker bucket mutex.
- Remove Locker
- Release Locker mutex.
-
- Acquire Memory mutex
- Return Locker to free list
- Release Memory mutex
-
-Deadlock detection (lock_detect):
-
- For each bucket in Object table
- Acquire the Object bucket mutex.
- create waitsfor
-
- For each bucket in Object table
- Release the Object mutex.
-
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
-FAQ:
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
-Q: Why do you need generation numbers?
-A: If a lock has been released due to a transaction abort (potentially in a
- different process), and then lock is released by a thread of control
- unaware of the abort, the lock might have potentially been re-allocated
- to a different object. The generation numbers detect this problem.
-
- Note, we assume that reads/writes of lock generation numbers are atomic,
- if they are not, it is theoretically possible that a re-allocated lock
- could be mistaken for another lock.
-
-Q: Why is is safe to walk the Locker list without holding any mutexes at
- all?
-A: Locks are created with both the Object and Locker bucket mutexes held.
- Once created, they removed in two ways:
-
- a) when a specific Lock is released, in which case, the Object and
- Locker bucket mutexes are again held, and
-
- b) when all Locks for a specific Locker Id is released.
-
- In case b), the Locker bucket mutex is held while the Locker chain is
- marked as "destroyed", which blocks any further access to the Locker
- chain. Then, each individual Object bucket mutex is acquired when each
- individual Lock is removed.
-
-Q: What are the implications of doing fine grain locking?
-
-A: Since we no longer globally lock the entire region, lock_vec will no
- longer be atomic. We still execute the items in a lock_vec in order,
- so things like lock-coupling still work, but you can't make any
- guarantees about atomicity.
-
-Q: How do I configure for FINE_GRAIN locking?
-
-A: We currently do not support any automatic configuration for FINE_GRAIN
- locking. When we do, will need to document that atomicity discussion
- listed above (it is bug-report #553).