
Concurrency, Consistency and Locking

Contents

  1. Overview - preserving consistency with locks
  2. VITA locking API
  3. Locks, transactions, connections and exceptions
  4. Locking Test Fixture
  5. Implementation SQL: MySql, Postgres
  6. Implementation for MS SQL Server
  7. NoLock option
  8. Real-life story - concurrency troubles with a single user

Overview - preserving consistency with locks

We store data documents in relational databases. A logical document in a normalized database is split between several linked tables, so its information resides in different places. With concurrent access to the data comes the problem of consistency. Each update process is supposed to leave the document in a consistent state, with all internal constraints satisfied. But what about two or more concurrent updates? Running updates in transactions guarantees consistency, but brings the risk of deadlocks. As for reads, we need to ensure that we do not catch the document in the middle of an update by another process and end up with a temporarily inconsistent snapshot.
To sum it up, we need to ensure that:
  • No two client processes can update the same document concurrently
  • No concurrent read is allowed while the document is being updated
  • Reads do not block other concurrent reads and can run in parallel - this is an extra requirement for performance
The standard solution for these problems is explicit database locks. Database servers give you a range of different locks, but we will use just two record-level locks:
  • Shared Read - allows multiple clients to acquire the lock and read the data.
  • Update Lock - exclusive lock for update operations
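The semantics of these two lock types can be sketched with a minimal reader-writer lock. This is only an illustration in Python - in VITA the real locks live inside the database server, not in app code - but it shows the contract we want: multiple shared readers may hold the lock at once, while an updater waits for exclusive access.

```python
import threading

class SharedUpdateLock:
    """Toy reader-writer lock: many shared readers, one exclusive updater."""
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_shared(self):
        # shared read: wait only while an updater holds the lock
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_update(self):
        # exclusive update: wait until no updater and no readers remain
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_update(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

Reads run in parallel (multiple `acquire_shared` calls succeed at once), while `acquire_update` blocks until all readers are gone - exactly the three requirements listed above.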
The control logic of the locks resides in the application - only the app code knows that a SELECT statement starts a multi-step update of a particular document, so all other concurrent operations should wait until it's done. The actual locks, on the other hand, reside in the database. VITA, being a bridge between an application and a database, provides locking methods that allow locking from the app code.
We assume that any 'logical' document has a root entity (header) that we will use as a target of locks. The locking strategy for consistency is straightforward:
  • Update - obtain an exclusive update lock on the document header before starting an update operation.
  • Read - obtain a shared read lock on the document header. Release the lock when the read is completed.
Note that we do not need to lock all records that we are going to load or update - just a single record, document header.
For an update lock, we need to obtain the lock at the 'first read', when we are only preparing to update and start loading the target document. Otherwise, we risk getting an inconsistent read, just like with regular reads.
For all database servers, the lifetime of a lock is the enclosing transaction. To make a lock last longer than a single statement, we have to enclose it in an explicit transaction. For an update lock, the transaction is committed and the lock released when we call session.SaveChanges() at the end of the update operation - so update locks are released automatically, to some extent. For a read lock, we must release the lock explicitly at the end of the read operation.
A few words about particular servers. MySql and Postgres offer a direct, 'standard' implementation for both lock types, with special FOR ... clauses for SELECT statements, and everything works as expected. SQL Server is different and, surprisingly, presented quite a challenge in implementation. It does not offer FOR clauses; instead, you use table hints. Plus, there are several other options that all must be set properly to make locking work. We discuss the SQL Server case in a separate section.

VITA locking API

The lock-related methods are implemented as extensions in a static class in the Vita.Entities.Locking namespace. If you add a using clause for this namespace, you will see extra overloads of the core IEntitySession methods and a method to release locks:
  session.GetEntity<TEntity>(object pk, LockOptions options);
  session.EntitySet<TEntity>(LockOptions options);
  session.ReleaseLocks();
The LockOptions flag enumeration has the following members:
  [Flags]
  public enum LockOptions {
    None = 0, 
    NoLock = 1, 
    SharedRead = 1 << 1, 
    ForUpdate = 1 << 2,
  }
The last two members are the lock types we are interested in here. The NoLock value is for a different situation and is discussed in a separate section below.
The GetEntity method starts a transaction and executes a select-by-id SQL statement with an extra locking clause, dependent on the server type. Use this method to load and lock the document header. For a READ operation, load the document details, then release the locks with the ReleaseLocks() method; ReleaseLocks() simply aborts the transaction. Just as with completing transactions, a good practice is to use a try/finally block and put the ReleaseLocks() call inside the finally block. For an UPDATE operation, after setting the update lock with the GetEntity() method, load/change the document details (and header), finishing with session.SaveChanges() - it will save the changes and commit the transaction, releasing the locks.
The EntitySet(...) method is for locking multiple records. The method returns a queryable object with a 'trigger' inside. It does not do any locking immediately; but when you execute a LINQ query based on this object, it starts a transaction and adds a locking clause to the SQL, thus locking all records touched by the query.
We recommend using the GetEntity() method in most cases. When you need to update multiple documents selected by some criteria, the preferred way is to first find the IDs of all documents using a LINQ query without locks, and then to lock/update the documents one by one.
The following sample code shows a typical use of locking methods - for locked update and locked document load:
  public void AddBookToOrder(IEntitySession session, Guid orderId, 
                             Guid bookId, int quantity) {
    var order = session.GetEntity<IBookOrder>(orderId, LockOptions.ForUpdate);
    var book = session.GetEntity<IBook>(bookId);
    var orderLine = session.NewEntity<IBookOrderLine>();
    orderLine.Order = order;
    orderLine.Book = book;
    orderLine.Quantity = quantity;
    orderLine.Price = book.Price;
    order.Lines.Add(orderLine);
    // the Total aggregate spanning multiple rows is why we need the lock
    order.Total = order.Lines.Sum(ol => ol.Price * ol.Quantity);
    session.SaveChanges(); 
  }
  public BookOrderStats GetBookOrderStats(IEntitySession session, Guid orderId) {
    try {
      var order = session.GetEntity<IBookOrder>(orderId, LockOptions.SharedRead);
      return new BookOrderStats() { 
        OrderId = orderId, LineCount = order.Lines.Count, 
        MaxPrice = order.Lines.Max(ol => ol.Price) };
    } finally {
      session.ReleaseLocks(); 
    }
  }

Locks, transactions, connections and exceptions

The scope of a database lock (its lifetime) is the enclosing transaction. Any select statement is enclosed in an implicit transaction if there is no explicit transaction already. This means that a standalone select-with-lock would release the lock as soon as it completes. To make a lock last longer, we need to start an explicit transaction before we execute the select-with-lock. VITA does just that.
By default, SELECTs are executed on a connection that is freshly opened, or acquired from the pool, and then closed or returned to the pool when the SELECT completes. You can make a connection permanently associated with a session by using the direct DB access facility (the session.GetDirectDbConnector method). The connection and transaction opened in this case are attached to the entity session and stay with it until you close them.
This facility is used in locking. Whenever you request a lock, a connection is opened and attached to the entity session. The system starts a transaction and then executes the select-with-lock statement. All subsequent operations (reads and writes) are executed in the context of this transaction. The transaction is committed when you call session.SaveChanges() or session.ReleaseLocks().
Note that this behavior - auto-commit on SaveChanges - is slightly different from what you get with direct db connector operations. If you start a transaction through the direct connector and then call SaveChanges(), the method does not introduce its own transaction; it recognizes that there is already one on the connection, and it does not commit it. It assumes that the code that started the transaction will commit it explicitly. For a transaction started by a locking method, it is different: the transaction is marked with a CommitOnSave flag, so the SaveChanges() call will commit it automatically.
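The two transaction lifecycles described above can be sketched as a toy state model. This is a Python illustration only - the class, method, and flag names here are hypothetical, not VITA's actual internals.

```python
class SessionConnection:
    """Toy model of a session-attached connection and transaction.
    All names are illustrative, not VITA's actual internals."""
    def __init__(self):
        self.in_transaction = False
        self.commit_on_save = False
        self.log = []

    def begin_lock_transaction(self):
        # a read-with-lock starts a transaction marked for auto-commit
        self.in_transaction = True
        self.commit_on_save = True
        self.log.append("BEGIN (lock)")

    def begin_explicit_transaction(self):
        # started via the direct db connector: the caller commits explicitly
        self.in_transaction = True
        self.commit_on_save = False
        self.log.append("BEGIN (explicit)")

    def save_changes(self):
        self.log.append("UPDATE ...")
        if self.in_transaction and self.commit_on_save:
            # lock-started transactions are committed automatically
            self.log.append("COMMIT")
            self.in_transaction = False
        # explicit transactions are left open for the caller to commit

    def release_locks(self):
        # aborting the transaction releases all locks held by it
        if self.in_transaction:
            self.log.append("ROLLBACK")
            self.in_transaction = False
```

A lock-started transaction commits on save_changes(); an explicitly started one stays open, mirroring the CommitOnSave behavior described above.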
Exceptions - what happens to lock transactions and connections if an exception is thrown? We all know that db connections should be closed and transactions ended explicitly after you are done using them; in case of an exception, you normally close the connection in the catch block. The read-with-lock operations in VITA do open connections and start transactions, so the error scenario depends on where the exception happens.
If an exception is thrown inside VITA's data access methods, the transaction is always aborted automatically. As for the connection, it has an associated lifetime attribute, which essentially tells the engine what to do with it in case of an exception; if the connection was opened by a lock statement, it will be closed.
If an exception occurs in your code, outside of VITA's data access methods, your code should handle it properly. For a read lock, a 'finally' block like the one shown in the example above is sufficient - put a ReleaseLocks() call inside it.
In Web service projects, VITA makes all this handling a bit easier for you. Any connection you open is registered in the Disposables list of the OperationContext object. When the web call completes (with or without an exception), VITA's HTTP handler closes/disposes all the disposable objects in the default operation context associated with the call. In other words, VITA explicitly closes all connections opened during web call handling, as long as they were opened with the default operation context of the call. So you do not need to worry too much about open connections in case of an error - they will not live long; the framework closes them automatically at the end of the web call.

Locking Test Fixture

To verify that locking works as expected, we need to put it under a really stressful test. Such a test was created; it is in the TestLocking method in the Basic unit test project. The data model contains two tables:

[Image: LockTestSchema.jpg]
Figure 1: Test data model

We have a document header (Doc) and details (DocDetail) representing a virtual document. The document details are just a set of parameters - Name/Value (string/int) pairs, with a unique constraint on (DocId, Name), so a parameter name is unique within a document. The internal consistency constraint is that Total in the document header equals the sum of all Value fields in the related DocDetail records. We will use this constraint to verify the consistency of the loaded document in the doc-read operation. The names in DocDetail are limited to just 5 values, "N0".."N4", and the Value column values are in the 0..9 range.
The test proceeds as follows:
  1. Create 5 documents named D0..D4
  2. Initialize two error counters with zero: update failures (deadlocks) and read failures (consistency check failures)
  3. Launch 30 threads, each repeating the following operation 50 times:
  4. Randomly pick a document out of the 5 created; randomly pick the action type - UPSERT (update/insert), DELETE, or LOAD
    1. For UPSERT: load the document header with an update lock; generate a random integer value (0..9); randomly pick a value name (N0..N4); if a DocDetail with this name exists, update its value to the generated random value; otherwise insert a new DocDetail row; update the total in the document header; save changes.
    2. For DELETE: lock the document header for update; randomly pick a DocDetail name (N0..N4) and delete the DocDetail row with this name if it exists; update the total in the document header; save changes.
    3. For LOAD: load the document header with a shared read lock; load the document detail records; release the locks. Check the loaded document for consistency: doc.Total == Sum(DocDetail.Value). If the consistency check fails, increment the read error count.
  5. Wait for all threads to complete the execution.
  6. Check that error counts are zero.
Note: any exception thrown inside the loop is caught; the catch block increments the corresponding error count, and the loop continues.
This is quite a heavy test, with unrealistically crowded concurrent access - 30 threads beating on 5 documents with random reads/writes. The code is sprinkled with Thread.Yield() calls to force more thread switches in the middle of operations. The test passed without problems with VITA locking on MySql and Postgres, but failed with the initial implementation for MS SQL Server, clearly proving itself a good test fixture. With the final, fixed implementation for MS SQL Server, the test passes on all 3 servers.
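The shape of this harness can be sketched in Python. This is a simplified in-memory stand-in (the real fixture is C# running against a real database): a single per-document mutex plays the role of both database lock types, and the dict below stands in for the Doc/DocDetail tables.

```python
import random
import threading

# In-memory stand-in for the Doc/DocDetail tables; one mutex per document
# plays the role of both database lock types in this sketch.
DOCS = {f"D{i}": {"lock": threading.Lock(), "details": {}, "total": 0}
        for i in range(5)}
NAMES = [f"N{i}" for i in range(5)]
errors = {"update": 0, "read": 0}

def worker(iterations=50):
    for _ in range(iterations):
        doc = DOCS[random.choice(list(DOCS))]
        action = random.choice(["UPSERT", "DELETE", "LOAD"])
        try:
            with doc["lock"]:                        # lock the 'header'
                if action == "UPSERT":
                    doc["details"][random.choice(NAMES)] = random.randint(0, 9)
                    doc["total"] = sum(doc["details"].values())
                elif action == "DELETE":
                    doc["details"].pop(random.choice(NAMES), None)
                    doc["total"] = sum(doc["details"].values())
                else:  # LOAD: verify the consistency invariant
                    if doc["total"] != sum(doc["details"].values()):
                        errors["read"] += 1
        except Exception:
            errors["update"] += 1

threads = [threading.Thread(target=worker) for _ in range(30)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the per-document lock in place, both error counters stay at zero; remove the `with doc["lock"]:` line and the invariant can be observed mid-update.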

Implementation SQL: MySql, Postgres

Both MySql and Postgres directly implement special SELECT clauses for setting locks. VITA simply adds an appropriate clause to the SELECT-with-lock statement. The following samples show the clauses for both servers.
-- MySql
  SELECT * FROM "SomeTable" WHERE "Id" = 123 FOR UPDATE;  
  SELECT * FROM "SomeTable" WHERE "Id" = 123 LOCK IN SHARE MODE;  
-- Postgres 
  SELECT * FROM "SomeTable" WHERE "Id" = 123 FOR UPDATE;  
  SELECT * FROM "SomeTable" WHERE "Id" = 123 FOR SHARE;  
Use of these clauses results in exactly the behavior we are looking for, and the heavy-hit tests prove it. MS SQL Server turned out to be a bit different.

Implementation for MS SQL Server

SQL Server does not provide special lock clauses for SELECT statements. Instead, you have table hints. On the surface, it appears there are hints just for the job, based on what you can derive from the documentation:
  SELECT * FROM "SomeTable" WITH(UpdLock) WHERE "Id" = 123;  -- exclusive update lock
  SELECT * FROM "SomeTable" WITH(Serializable) WHERE "Id" = 123; -- shared read lock
The Serializable hint has an old synonym, HoldLock, which works the same. An attempt to run the lock test brings a surprise: a lot of errors reporting transaction deadlocks, with one of the processes killed as a victim. Not good. Trying to change/adjust the hints did not bring anything - still deadlocks and inconsistent loads. It appears SQL Server escalates the locks easily, and the only escalation level above row is table - so entire tables get locked (or lock attempts are made), resulting in deadlocks. Attempts to disable escalation on a table (by setting LOCK_ESCALATION = DISABLE) did not help - still deadlocks.
One workaround was found: if we use the exclusive update lock (the UpdLock hint) both for updates and reads, everything works fine. However, this is not a good solution: we lose parallel execution for reads, and this can hurt the performance and throughput of the application.
I had to turn to experts for advice and posted a question in a TSQL forum, which resulted in quite a long thread. We finally figured it out.
Instead of using the With(Serializable) table hint for the document read/lock, we should use a transaction with the Snapshot isolation level. Using this level requires a database-level option to be enabled; it is off by default:
  ALTER DATABASE VitaTest SET ALLOW_SNAPSHOT_ISOLATION ON;
Next, remember that before setting the read lock we need to start a transaction, which defines the lifetime of the lock? This transaction should be started with isolation level = Snapshot. After that, when we load the document header, we do not need to set any locks with hints. With snapshot isolation, the server pins the time of the transaction start and ensures that all reads in the transaction return the data as it was at the beginning of the transaction - isolated from concurrent updates. So we get a consistent data view without explicit locks.
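The idea behind snapshot reads can be sketched with a tiny multi-version store. This is a toy model of the multi-versioning concept, not SQL Server's actual row-versioning implementation: each write gets a version stamp, and a reader pinned to a snapshot version sees only values written at or before that version.

```python
import itertools

class MvccStore:
    """Toy multi-version store: a reader pinned to a snapshot version sees
    the data as of its transaction start, regardless of later writes."""
    def __init__(self):
        self._clock = itertools.count(1)     # global version counter
        self._versions = {}                  # key -> list of (version, value)

    def write(self, key, value):
        v = next(self._clock)
        self._versions.setdefault(key, []).append((v, value))
        return v

    def begin_snapshot(self):
        # pin the current version; all reads in this 'transaction' use it
        return next(self._clock)

    def read(self, key, snapshot):
        # newest value written at or before the snapshot version
        for v, value in reversed(self._versions.get(key, [])):
            if v <= snapshot:
                return value
        return None
```

A reader that begins a snapshot before a concurrent update still sees the header Total and the detail values agreeing with each other - a consistent view without any explicit locks.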
So the final arrangements for SQL Server are:
  • UPDATE: transaction isolation level = ReadCommitted, Hint = UpdLock
  • READ: transaction isolation level = Snapshot, Hint = (no hints)
A little side note. What is interesting - and disturbing - is that a few of the initial suggestions on how to fix the problem, coming from real experts (MS SQL Server MVPs), proved to be bad guesses. Which says something about the severity of the problem: this very common and very important scenario, one that practically any serious app uses some variation of, is not easy to implement correctly. Many of us would have implemented it the way I did at the beginning, and it would appear that things work great - we have isolation and consistency. It takes a heavy-hit test (30 threads beating on 5 docs) to prove it wrong. I am afraid many developers out there have this wrong implementation running in production and do not know it. Which illuminates 2 points:
  • MS needs to fix it somehow
  • Yet another reason to use a framework like VITA - you get a correct implementation of locking, verified by a really heavy test.
We have one remaining problem: enabling snapshot isolation at the database level. You can always do it manually, in a production or test environment, but it would be nice to have a way to do it automatically on new databases - to handle databases on individual developers' machines. Data migrations provide a way to do it: you just tie a migration action to version 1.0.0.0 of your app/module. See an example in the TestLocking test, and also in the BooksStore sample project - file BooksModule_EnableSnapshot.cs.

NoLock option

This option has an effect only in MS SQL Server.
There's an extra value in the LockOptions enum: NoLock. Note that it is different from LockOptions.None: with None, no hints are added to table references in the SQL, so GetEntity(pk, LockOptions.None) behaves the same as the regular GetEntity method. With LockOptions.NoLock, the actual SQL will have the hint 'With(NOLOCK)' added to table references. Roughly, it instructs the server engine to read the data without establishing any locks or waiting for any locks already in place.
The NoLock option is useful in statistical queries against large, heavily used transactional tables with a constant flow of updates/inserts/deletes. If you run a statistical query against such a table - count rows or sum values based on some filter - it would normally have to wait until all writes are completed, and then it would most likely lock the entire table while it runs, making all incoming writes wait. This is necessary behavior if you want the result to be a correct representation of the data at some point in time, but it can really disturb the flow of concurrent operations, especially if the statistical query is complex and takes some time to complete.
However, in most cases you do not need absolute precision in statistical queries; you are OK with a ballpark figure, accurate within a window of a few seconds. In this case you can use the NoLock option: the query will run without blocking any writes and will use whatever data is there, committed or not. The heavy transactional load goes on undisturbed, and you get your stats - everybody's happy.

Real-life story - concurrency troubles with a single user

This is a story from my own experience. I had always thought that you need locks only when multiple users might try to edit the same document. It turned out that even with a single user there are cases where locks are the right solution.
We have a Web application that allows users to edit a data document, which has an attached child table storing a list of name/value pairs (document parameters). This table has a unique constraint on the (docId, name) pair, so parameter names must be unique within a document. The user can add a parameter by filling out the name and value boxes on a web page and clicking the Submit button. The server-side code first looks up the parameter record (by name) in the database. If the record exists, its value is updated; otherwise, a new record is inserted.
So we went to test a prototype of the app (without locks - they weren't there yet), and suddenly started seeing strange errors on the server. Submits were failing occasionally with a 'unique index violation' error. The web call log clearly showed that in each case there were two identical submits, from the same client/machine, within a fraction of a second. Then the cause of the error is clear: two requests come in at the same time, both try to look up the existing name, do not find it, and then both try to insert a new name/value pair. The first wins and goes through OK, while the second results in a unique constraint violation.
OK, the picture on the server is clear. Why then two calls from the same client? Simple: an occasional double-click on the Submit button. We also had clients using tablet devices with touch (Safari on iPad), and it turns out touches often result in doubles or even multiples.
Next, what happens on a click in the browser? The click event is handled by JavaScript code, so the obvious fix is to disable the button once it is clicked. We tried it; the errors became less frequent but were still happening. The reason is that the UI is controlled by a client-side framework, and things like enable/disable are done indirectly, by manipulating button styles through element/attribute bindings to data. It takes some time for a client-side framework like Angular to propagate the change and actually, visually disable the button. As a result, double-clicks do go through. This is a standard situation in modern single-page web apps - don't expect the UI to change instantly.
Rather than continuing to fight the issue on the client side, we opted for a fix on the server - locking the document on update. In any case, the server code should be able to handle requests correctly, whether they come far apart or almost at the same time. Locking provides such a solution.
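The server-side principle can be sketched as follows. This is a toy in-memory Python model - the class and method names are hypothetical, and in the real app the lock is VITA's update lock on the document header - but it shows why serializing the lookup-then-insert sequence makes a duplicate submit harmless.

```python
import threading

class ParamTable:
    """Toy 'document parameters' table with a unique (doc_id, name)
    constraint; a per-document lock serializes lookup-then-write."""
    def __init__(self):
        self._rows = {}                   # (doc_id, name) -> value
        self._doc_locks = {}
        self._guard = threading.Lock()

    def _doc_lock(self, doc_id):
        with self._guard:
            return self._doc_locks.setdefault(doc_id, threading.Lock())

    def _insert(self, key, value):
        # models the database unique index
        if key in self._rows:
            raise ValueError("unique index violation")
        self._rows[key] = value

    def submit(self, doc_id, name, value):
        # the whole lookup-then-insert/update runs under the document lock,
        # so a duplicate submit sees the first one's row and updates it
        with self._doc_lock(doc_id):
            key = (doc_id, name)
            if key in self._rows:
                self._rows[key] = value       # update existing parameter
            else:
                self._insert(key, value)      # insert new parameter
```

Without the lock, two simultaneous identical submits can both pass the existence check and race into `_insert`, reproducing the unique-violation error from the story.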
This explains why the record-locking functionality suddenly popped up to the top of my TODO list.





Last edited Mar 23, 2016 at 5:20 AM by rivantsov, version 17
