Wednesday, March 19, 2008

Transactions and Locks in SQL Server

• What is a “Database Transactions “?
A database transaction is a unit of work performed against a database management system or similar system that is treated in a coherent and reliable way independent of other transactions. A database transaction, by definition, must be atomic, consistent, isolated and durable. These properties of database transactions are often referred to by the acronym ACID.

Transactions provide an "all-or-nothing" proposition stating that work units performed in a database must be completed in their entirety or take no effect whatsoever. Further, transactions must be isolated from other transactions, results must conform to existing constraints in the database and transactions that complete successfully must be committed to durable storage.

In some systems, transactions are also called LUWs for Logical Units of Work.

• What is ACID?
The ACID model is one of the oldest and most important concepts of database theory. It sets forward four goals that every database management system must strive to achieve: atomicity, consistency, isolation and durability. No database that fails to meet any of these four goals can be considered reliable.

Let’s take a moment to examine each one of these characteristics in detail:

Atomicity states that database modifications must follow an “all or nothing” rule. Each transaction is said to be “atomic.” If one part of the transaction fails, the entire transaction fails. It is critical that the database management system maintain the atomic nature of transactions in spite of any DBMS, operating system or hardware failure.

Consistency states that only valid data will be written to the database.If, for some reason, a transaction is executed that violates the database’s consistency rules, the entire transaction will be rolled back and the database will be restored to a state consistent with those rules. On the other hand, if a transaction successfully executes, it will take the database from one state that is consistent with the rules to another state that is also consistent with the rules.

Isolation requires that multiple transactions occurring at the same time not impact each other’s execution. For example, if Joe issues a transaction against a database at the same time that Mary issues a different transaction, both transactions should operate on the database in an isolated manner. The database should either perform Joe’s entire transaction before executing Mary’s or vice-versa. This prevents Joe’s transaction from reading intermediate data produced as a side effect of part of Mary’s transaction that will not eventually be committed to the database. Note that the isolation property does not ensure which transaction will execute first, merely that they will not interfere with each other.

Durability ensures that any transaction committed to the database will not be lost. Durability is ensured through the use of database backups and transaction logs that facilitate the restoration of committed transactions in spite of any subsequent software or hardware failures.

• What is “Begin Trans”, “Commit Tran”, “Rollback Tran” and “Save Tran”?
Transactions group a set of tasks into a single execution unit. Each transaction begins with a specific task and ends when all the tasks in the group successfully complete. If any of the tasks fails, the transaction fails. Therefore, a transaction has only two results: success or failure. Incomplete steps result in the failure of the transaction.

Users can group two or more Transact-SQL statements into a single transaction using the following statements:

* Begin Transaction
* Rollback Transaction
* Commit Transaction

Begin Transaction
Marks the starting point of an explicit, local transaction. BEGIN TRANSACTION increments @@TRANCOUNT by 1.

Rollback Transaction
If anything goes wrong with any of the grouped statements, all changes need to be aborted. The process of reversing changes is called rollback in SQL Server terminology.
A ROLLBACK, on the other hand, works regardless of the level at which it is issued, but rolls back all transactions, regardless of the nesting level

Commit Transaction
If everything is in order with all statements within a single transaction, all changes are recorded together in the database. In SQL Server terminology, we say that these changes are committed to the database.
A COMMIT issued against any transaction except the outermost one doesn't commit any changes to disk - it merely decrements the@@TRANCOUNT automatic variable.

Save Tran
Savepoints offer a mechanism to roll back portions of transactions. A user can set a savepoint, or marker, within a transaction. The savepoint defines a location to which a transaction can return if part of the transaction is conditionally canceled. SQL Server allows you to use savepoints via the SAVE TRAN statement, which doesn't affect the @@TRANCOUNT value. A rollback to a savepoint (not a transaction) doesn't affect the value returned by @@TRANCOUNT, either. However, the rollback must explicitly name the savepoint: using ROLLBACK TRAN without a specific name will always roll back the entire transaction.

• What are “Checkpoint’s” in SQL Server?
Forces all dirty pages for the current database to be written to disk. Dirty pages are data or log pages modified after entered into the buffer cache, but the modifications have not yet been written to disk.

Syntax
CHECKPOINT

• What are “Implicit Transactions”?
Microsoft SQL Server operates in three transaction modes:
Autocommit transactions
Each individual statement is a transaction.
Explicit transactions
Each transaction is explicitly started with the BEGIN TRANSACTION statement and explicitly ended with a COMMIT or ROLLBACK statement.
Implicit transactions
A new transaction is implicitly started when the prior transaction completes, but each transaction is explicitly completed with a COMMIT or ROLLBACK statement.

• Is it good to use “Implicit Transactions”?
If you want all your commands to require an explicit COMMIT or ROLLBACK in order to finish, you can issue the command SET IMPLICIT_TRANSACTIONS ON. By default, SQL Server operates in the autocommit mode; it does not operate with implicit transactions. Any time you issue a data modification command such as INSERT, UPDATE, or DELETE, SQL Server automatically commits the transaction. However, if you use the SET IMPLICIT_TRANSACTIONS ON command, you can override the automatic commitment so that SQL Server will wait for you to issue an explicit COMMIT or ROLLBACK statement to do anything with the transaction. This can be handy when you issue commands interactively, mimicking the behavior of other databases such as Oracle.

What's distinctive about implicit transactions is that reissuing SET IMPLICIT_TRANSACTIONS ON does not increase the value of @@TRANCOUNT. Also, neither COMMIT nor ROLLBACK reduce the value of @@TRANCOUNT until after you issue the command SET IMPLICIT_TRANSACTIONS OFF. Developers do not often use implicit transactions; however, there is an interesting exception in ADO. See the sidebar, Implicit Transactions and ADO Classic.

• What is Concurrency?
When many people attempt to modify data in a database at the same time, a system of controls must be implemented so that modifications made by one person do not adversely affect those of another person. This is called concurrency control.

Concurrency control theory has two classifications for the methods of instituting concurrency control:
Pessimistic concurrency control
A system of locks prevents users from modifying data in a way that affects other users. After a user performs an action that causes a lock to be applied, other users cannot perform actions that would conflict with the lock until the owner releases it. This is called pessimistic control because it is mainly used in environments where there is high contention for data, where the cost of protecting data with locks is less than the cost of rolling back transactions if concurrency conflicts occur.
Optimistic concurrency control
In optimistic concurrency control, users do not lock data when they read it. When an update is performed, the system checks to see if another user changed the data after it was read. If another user updated the data, an error is raised. Typically, the user receiving the error rolls back the transaction and starts over. This is called optimistic because it is mainly used in environments where there is low contention for data, and where the cost of occasionally rolling back a transaction outweighs the costs of locking data when read.

• What are “Dirty reads”?
Uncommitted dependency occurs when a second transaction selects a row that is being updated by another transaction. The second transaction is reading data that has not been committed yet and may be changed by the transaction updating the row.

• What are “Unrepeatable reads”?
Inconsistent Analysis (Nonrepeatable Read)
Inconsistent analysis occurs when a second transaction accesses the same row several times and reads different data each time. Inconsistent analysis is similar to uncommitted dependency in that another transaction is changing the data that a second transaction is reading. However, in inconsistent analysis, the data read by the second transaction was committed by the transaction that made the change. Also, inconsistent analysis involves multiple reads (two or more) of the same row and each time the information is changed by another transaction; thus, the term nonrepeatable read.

• What are “Phantom rows”?
Phantom reads occur when an insert or delete action is performed against a row that belongs to a range of rows being read by a transaction. The transaction's first read of the range of rows shows a row that no longer exists in the second or succeeding read, as a result of a deletion by a different transaction. Similarly, as the result of an insert by a different transaction, the transaction's second or succeeding read shows a row that did not exist in the original read.

For example, an editor makes changes to a document submitted by a writer, but when the changes are incorporated into the master copy of the document by the production department, they find that new unedited material has been added to the document by the author. This problem could be avoided if no one could add new material to the document until the editor and production department finish working with the original document.

• What are “Lost Updates”?
Lost updates occur when two or more transactions select the same row and then update the row based on the value originally selected. Each transaction is unaware of other transactions. The last update overwrites updates made by the other transactions, which results in lost data.

• What are different levels of granularity of locking resources?
Microsoft SQL Server 2000 has multigranular locking that allows different types of resources to be locked by a transaction. To minimize the cost of locking, SQL Server locks resources automatically at a level appropriate to the task. Locking at a smaller granularity, such as rows, increases concurrency, but has a higher overhead because more locks must be held if many rows are locked. Locking at a larger granularity, such as tables, are expensive in terms of concurrency because locking an entire table restricts access to any part of the table by other transactions, but has a lower overhead because fewer locks are being maintained.

SQL Server can lock these resources (listed in order of increasing granularity).
RID: Row identifier. Used to lock a single row within a table.
Key: Row lock within an index. Used to protect key ranges in serializable transactions.
Page: 8 kilobyte –(KB) data page or index page.
Extent: Contiguous group of eight data pages or index pages.
Table: Entire table, including all data and indexes.
DB: Database.

• What are different types of Isolation levels in SQL Server?
READ COMMITTED
Specifies that shared locks are held while the data is being read to avoid dirty reads, but the data can be changed before the end of the transaction, resulting in nonrepeatable reads or phantom data. This option is the SQL Server default.

READ UNCOMMITTED
Implements dirty read, or isolation level 0 locking, which means that no shared locks are issued and no exclusive locks are honored. When this option is set, it is possible to read uncommitted or dirty data; values in the data can be changed and rows can appear or disappear in the data set before the end of the transaction. This option has the same effect as setting NOLOCK on all tables in all SELECT statements in a transaction. This is the least restrictive of the four isolation levels.

REPEATABLE READ
Locks are placed on all data that is used in a query, preventing other users from updating the data, but new phantom rows can be inserted into the data set by another user and are included in later reads in the current transaction. Because concurrency is lower than the default isolation level, use this option only when necessary.

SERIALIZABLE
Places a range lock on the data set, preventing other users from updating or inserting rows into the data set until the transaction is complete. This is the most restrictive of the four isolation levels. Because concurrency is lower, use this option only when necessary. This option has the same effect as setting HOLDLOCK on all tables in all SELECT statements in a transaction.

• If you are using COM+, what “Isolation” level is set by default?
SERIALIZABLE transaction isolation level is the default isolation level for the COM+ application.

• What are “Lock” hints?
A range of table-level locking hints can be specified using the SELECT, INSERT, UPDATE, and DELETE statements to direct Microsoft SQL Server 2000 to the type of locks to be used. Table-level locking hints can be used when a finer control of the types of locks acquired on an object is required. These locking hints override the current transaction isolation level for the session.

• What is a “Deadlock”?
Deadlocking occurs when two user processes have locks on separate objects and each process is trying to acquire a lock on the object that the other process has. When this happens, SQL Server identifies the problem and ends the deadlock by automatically choosing one process and aborting the other process, allowing the other process to continue. The aborted transaction is rolled back and an error message is sent to the user of the aborted process. Generally, the transaction that requires the least amount of overhead to rollback is the transaction that is aborted.

• What are the steps you can take to avoid “Deadlocks”?
Here are some tips on how to avoid deadlocking on your SQL Server:
* Ensure the database design is properly normalized.
* Have the application access server objects in the same order each time.
* During transactions, don't allow any user input. Collect it before the transaction begins.
* Avoid cursors.
* Keep transactions as short as possible. One way to help accomplish this is to reduce the number of round trips between your application and SQL Server by using stored procedures or keeping transactions with a single batch. Another way of reducing the time a transaction takes to complete is to make sure you are not performing the same reads over and over again. If your application does need to read the same data more than once, cache it by storing it in a variable or an array, and then re-reading it from there, not from SQL Server.
* Reduce lock time. Try to develop your application so that it grabs locks at the latest possible time, and then releases them at the very earliest time.
* If appropriate, reduce lock escalation by using the ROWLOCK or PAGLOCK.
* Consider using the NOLOCK hint to prevent locking if the data being locked is not modified often.
* If appropriate, use as low of an isolation level as possible for the user connection running the transaction.
* Consider using bound connections.

• What is Bound Connection?
Bound connections allow two or more connections to share the same transaction and locks. Bound connections can work on the same data without lock conflicts. Bound connections can be created from multiple connections within the same application, or from multiple applications with separate connections. Bound connections make coordinating actions across multiple connections easier.

To participate in a bound connection, a connection calls sp_getbindtoken or srv_getbindtoken (Open Data Services) to get a bind token. A bind token is a character string that uniquely identifies each bound transaction. The bind token is then sent to the other connections participating in the bound connection. The other connections bind to the transaction by calling sp_bindsession, using the bind token received from the first connection.

• Specity the types of Bound Connections
Local bound connection
Allows bound connections to share the transaction space of a single transaction on a single server.
Distributed bound connection
Allows bound connections to share the same transaction across two or more servers until the entire transaction is either committed or rolled back by using Microsoft Distributed Transaction Coordinator (MS DTC).

• How can I know what locks are running on which resource?
Use SP_Locks system stored procedure

9 comments:

  1. i can't describe how i feel
    i really appreciate this article
    it's very useful
    full of interesting data
    and the most important is its simplicity
    thank you sooooooooo much for your effort.

    ReplyDelete
  2. Hi Amr,

    You can use SET XACT_ABORT ON option. and to find the longest running transaction use the DBCC OPENTRAN command

    ReplyDelete
  3. can u please give a simple example with the help of a query in a transaction and how the result is affected in the output about how to set shared,exclusive,update locks

    ReplyDelete
  4. Thanks Yaar its too Good

    ReplyDelete
  5. superb bro...
    could you put whole T-SQL in best arranged way...so begineer like me can go step by step.....
    thank you
    very much

    ReplyDelete
  6. Hi Jugal,

    Can you please suggest, is there a shared lock when a table is read ? How using NOLOCK improves the performance ?

    ReplyDelete
  7. NoLock will allow you to read un-commited data. using it, rest of the queries will be able to perform read/write operation (reduced blocking)

    ReplyDelete
  8. Really Very Helpful Information.

    ReplyDelete