Database table locking and concurrent access

I know that table locking sucks and it doesn’t scale, but I’m not sure the best way to handle this particular problem, so maybe one of you SQL dudes might have a good idea. FWIW, I’m using SQLAlchemy instead of SQL statements directly, but I can drop down to SQL statements if necessary.

So I have a table full of prioritized stuff, like a Netflix Queue. Multiple people can be editing this queue simultaneously. Each item in the queue as an order # from 1 to N, where N is the # of items in the queue.

For simplicity, I’m ensuring that the ordering is contiguous, so every time an operation is made on the table elements I go ahead and update the ordering. This is slow, yeah, but it works for now (if anyone has a better idea, I’m all ears). In other words, if you have 100 items, and you delete the 50th item, I just go regrab the table, sort, and then renumber.

The potential problem I’m running into is that multiple people might be monkeying with the table order, which means all kinds of fun and fucked up race conditions. While I’m reorganizing the table, someone else can log in and delete one of the things I’m working on.

What is the standard way to handle this type of thing in DB land? I could lock a range of rows or an entire table. Locking a range of rows has a problem because it can still introduce a hiccup if someone is merely adding an item to the top of the queue, so when they try to reorder things automatically it’ll stall waiting for the lock to release.

I’m guessing there might be a nifty way to express this in SQL and throw it in as a transaction?

I’m sure this is a common problem, but just googling “SQL table locking” didn’t show up much useful information.

I’m using SQLite as the backend for now but will probably migrate to something else* eventually.

  • Which is not up for religious debate yet

Just use a transaction. It’ll lock what it needs to (should be just the relevant rows), and not affect the rest of the table.

Table locking would mean that I couldn’t edit my queueueueue if anyone on Netflix was editing theirs, which is obviously untenable.

Also, you should be able to do it in a single statement, which’ll happen atomically anyay.

update queueitems set position = position + 1 where listno = @list and position > @addedrowposition

Do you need to keep things contiguous? If you delete the 50th element, is there some reason why you need to renumber all 50 rows above it? Maybe you can just order them at runtime, like a list of search results, based on a priority field (something like an int with a default space of 100 between items). Then you’d only have to reorder things once in a blue moon.

Hmmm, I’ve hardly ever used SQLLite so I don’t know if it’s a real ACID compliant database that promises transaction integrity. I’ll assume it is for the sake of this answer but if it’s not then disregard the following :)

First off, which is more common, updates to the order or needing to access an item by absolute order/position? If the former then it’s probably more efficient to recalculate the ordering upon access instead of recalculating absolute list position upon every item update. Might cut down the re-order calculations. If every item in the list contains a relative list position, say references to immediate neighbors, then you can get away with a few row level locks of the affected record and it’s immediate neighbors instead of a table level lock. Then calculate absolute position only when absolute position is needed.

If absolute position is needed frequently/always and any given row update might scramble the whole table then it’s tougher. At that point it’s a table locking issue basically. However, speaking from years of DB admin, an application that frequently blocks itself and harms its own performance with table level locks is flawed not at the SQL level but at a higher design level.

Also, is the number that’s being reordered your primary key? That can be rather dangerous in cases where the entry #50 displayed to the user isn’t the same #50 when he finally hits the ‘delete’ button a half hour later. The primary key should be something that’s guaranteed to be unique and permanent (e.g., an autoincrementing integer column).

The app is very similar to a netflix queue, the only real difference is that properties for each row might be edited (e.g. rating, maybe who has it checked out, etc.).

So given that, in general I’d say that the frequency order is probably:

READ > UPDATE > SORT > INSERTS > DELETES

The priority doesn’t need to be contiguous (in Netflix it is however), and it is not the primary key (primary keys are independent and static).

My guess is that there’s a magical select statement that would allow me to do the reordering, but I just don’t know SQL well enough to say.

The most common operation would be shifting priority, e.g. saying “Actually, I want this movie before this other one”. Since DB interfaces are so wacky to me, I’m not sure how to express that in DB speak, but basically:

target_row
source_row

source_row.priority = target_row.priority

p = source_row.priority + 1
for all rows after source_row:
row.priority = p
p = p + 1

That’s probably the single most common operation I’d need to do. I’m currently doing that manually, which I’m guessing totally sucks.

An ugly solution, but then not so ugly when you actually get down to it: Implement your queues as a strings containing a comma separated list of the item ids. You can store those on a separate queues table in a “sequence” blob field, easily read and manipulate them on your program as arrays and then update them back into their single row.
Two people editing the same list? Last to finish wins… why complicate it further? If you really need locking, now you can lock a single row in a table for the whole list.
Should work great if your queues aren’t going to be huge.

This might not make any sense now that I think I don’t know how the hell a netflix queue looks

“Mostly” ACID-compliant!

Really, just use transactions. It’ll be much less work than getting anything else working reliably.

The standard way to handle multiple users inserting into a linked list like this is to use a method called “spider locking” or “chain locking” or “hand-over-hand” locking.

Basically, you traverse the queue with two locks, as follows:

Lock A
Lock B
Unlock A
Lock C
Unlock B
Lock D

You can then insert between C and D

For a deletion, you need a third lock when you reach the object you want to remove. For sorts, you need three locks also, and leave one at the start of the area you are sorting, and one at the end.

I don’t know if that would be prohibitively slow on a database.

Just change everything between the old and new positions appropriately. Two update statements, no freakin’ comma separated rows, no table locking.

UPDATE rows
SET priority = priority - 1
WHERE priority > @OldPriority AND priority <= @NewPriority

UPDATE rows
SET priority = priority +1
WHERE priority >= @NewPriority AND priority < @OldPriority

I have maintained your code, and I hate you.

You have no way of knowing that. Netflix displays it to you as an in-order sequence with no holes, but the database values don’t have to be the same as the presentation values. It’s trivial to return non-contiguous things in priority order and then have the UI display them as contiguous values.

That’s not thread-safe, I think. You also need to change the priority of the object you’re moving, and another user doing a swap between those two ops will cause corruption.

Also, it would be hard to detect stale writes that way. Multiple writers to a data set means you need locking. Even something as simple as an insert onto the end of a table is going to cause locking internally in the DB. You should probably lock the table prior to sorting. The dirty cache issues will be painful if you don’t.

Sorry, you’re supposed to put a transaction around the whole thing. I was just telling him how to do what he wants to do without elaborate cursors and actually digging Codd’s dead body up and shooting it in the head.

And I don’t get why people keep insisting on “table locking.” Just do the transaction, and let the DBMS figure out what level it’s going to lock at. If you’ve got multiple queueueueues like this in your table, it’s unlikely to ever actually have to lock the whole thing.

SQLAlchemy can be made transactional by default (all actions are part of a transaction that is flushed upon an explicit commit). I’ll give that a whack, although I’m not sure what the semantics are if the necessary change is invalid due to a change in relevant data that occurred while the transaction was being built.

Looks like it’ll throw an exception, which you can catch and issue a transaction.rollback().

There’s a reason why Oracle is a multi-billion dollar company. A lot of that money went into making transactions work. Now you can make transactions work for you.

You enterprise guys, always so stressed…

I did a little quick research and it seems that SQLite supports exactly one kind of lock, a global DB lock. So the OP really can’t improve things anyway as long as SQLite is the back end for this app. If he moves to a real DB then things change of course.

That’s still not guaranteed safe though in the case of multiple apps attempting to do this simultaneously. Yeah a transactional approach will guarantee that one fires completely and commits before the next one gets to start changing rows, but by the time the second fires the values for NewPriority and OldPriority will contain incorrect values and cause odd results. The beginning of the transaction would have to contain a select for update or similar to guarantee a static snapshot of the affected data. Also, since the affected data in this hypothetical transaction is the entire fricking table all we’ve really accomplished is recommending to the OP a more complicated way to recreate his original problem wherein an update requires locking an entire table briefly. It works, but it’s not performance friendly for a busy table with lots of minor updates trickling in.

I think the key here is not to store the priority but rather calculate it in on-demand fashion and have each item in the queue store only relative positioning data rather than absolute. So in more practical terms, something like:

create table items
(
id number,
item_data char(40),
. . .
);

create table item_queue
(
queue_id number,
item_id number REFERENCES items.id,
neighbor_above number,
neighbor_below number,
. . .
);

Just to illustrate the idea, we might see the following data in a queue using the example schema I listed.

queue_id = 1, item_id = 22, neighbor_above=NULL, neighbor_below=45
queue_id = 1, item_id = 45, neighbor_above=22, neighbor_below=19
queue_id = 1, item_id = 19, neighbor_above=45, neighbor_below=37
queue_id = 1, item_id = 37, neighbor_above=19, neighbor_below=108
queue_id = 1, item_id = 108, neighbor_above=37, neighbor_below=64
queue_id = 1, item_id = 64, neighbor_above=108, neighbor_below=20
queue_id = 1, item_id = 20, neighbor_above=108, neighbor_below=NULL

Now I see several possible scenarios here that affect queue ordering. They are

  1. New item insertion
  2. Item deletion
  3. Item moves one position, so just swaps with immediate neighbor
  4. Item moves multiple positions.

Given those scenarios, here are the locking requirements

  1. Select for update or otherwise lock the two rows that will be the new above and below neighbors of the newly inserted row. Total locking requirement: 3 rows max.
  2. Select for update or otherwise lock the two rows that are the existing above and below neighbors of the row targeted for deletion. Total locking requirement: 3 rows max.
  3. Select for udpate or otherwise lock the immediate neighbors of the two rows swapping positions. Total locking requirement: 4 rows max.
  4. Select for udpate or otherwise lock the immediate neighbors of the original above/below neighbors and the new above/below neighbors in addition to the moving row. . Total locking requirement: 5 rows max.

Now this solves the locking problem but does require a little more work to assign absolute position to items. A hierarchical query can solve it at the SQL level, but I think the hierarchical queries I’m familiar with aren’t ANSI standard SQL but rather something custom that Oracle cooked up. More practically I suspect that it’s rather trivial at the code level to translate the result set from a “select * from item_queue” into a data structure that lends itself to ordering.