Paul Randal and Kimberly Tripp came to the Redmond .NET User Group last night and gave an animated talk on indexing--a subject both of them know cold. I thought I'd share a few tidbits that struck me as important. Keep in mind that Kimberly and Paul are SQL Server experts and don't always worry about "developer" issues--when asked what they thought of the Entity Framework, their reaction was... less than enthusiastic. If you ever get a chance to hear this married couple speak, jump in and take a seat. They are informative and fun.
- Optimize Row Size: SQL Server stores data on 8K pages, and SQL Server 2005 (and later) supports large-value columns, so a row can be, well, very large indeed. Does this make sense? Ah, not usually. Managing row size is as important as ever to anyone interested in efficiency. The trick to good performance and good use of space is to make sure that when an 8K page is filled with rows, there is little wasted space. This means that if the row size is just over (about) 4K, only one row will fit on a page and (about) 4K of space is wasted on every page. A secondary problem is that the number of index pages also has to increase to address all those extra pages.
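One way to check how well your rows pack onto pages is the physical-stats DMV. A quick sketch (`dbo.Orders` is a hypothetical table name--substitute your own):

```sql
-- Inspect average row size and page density for a table.
-- 'DETAILED' mode is required to populate the record-size columns.
SELECT index_level,
       avg_record_size_in_bytes,      -- average row size in bytes
       avg_page_space_used_in_percent -- how full each 8K page actually is
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.Orders'), NULL, NULL, 'DETAILED');
```

If `avg_page_space_used_in_percent` hovers near 50%, you are likely hitting the "one 4K-plus row per page" problem described above.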
- Selectivity: When the query optimizer (QO) studies your SQL, the degree of selectivity determines whether an index should be used to perform the operation. By processing the statistics (or "stats") associated with an index--and a table can have many (many) sets of stats--the QO can estimate the selectivity. Basically, it's weighing the choice of using the index to walk through the selected rows against doing a table scan. The example Kimberly used made it pretty clear how this works, but we were surprised to learn:
"When the number of rows (selected by the query) is around 1/4 of the number of data pages in the table, the index is not useful and it is more efficient to perform a table scan to fetch the selected rows. This often turns out to be less than 5% of the rows in the table..."
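If you want to see the stats the QO is working from, you can dump them directly. A sketch using hypothetical table and index names:

```sql
-- Show the histogram and density information the optimizer uses
-- to estimate selectivity (table and stats names are hypothetical).
DBCC SHOW_STATISTICS ('dbo.Orders', 'IX_Orders_CustomerId');
```

The histogram in the output is what the QO consults when it decides between an index seek and a table scan.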
- Rebuilding Statistics: SQL Server automatically rebuilds statistics for you, but consider that statistics are invalidated once 20% or more of the data has changed. There are times when it's important to update the statistics manually.
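Updating stats by hand is a one-liner. A sketch against a hypothetical table:

```sql
-- Refresh all statistics on a table using a full scan of the data
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;

-- Or refresh just one statistics object (here, an index's stats):
UPDATE STATISTICS dbo.Orders IX_Orders_CustomerId;
```

`FULLSCAN` is more expensive than the default sampled update, but it gives the optimizer the most accurate picture after a large data change.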
- Filtered Indexes: New in SQL Server 2008, filtered indexes permit you to add a WHERE clause to an index, thus focusing the index on the most important rows.
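The syntax is just a WHERE clause tacked onto CREATE INDEX. A sketch with hypothetical names:

```sql
-- Index only the rows most queries care about;
-- rows where Status <> 'Open' are simply not in the index.
CREATE NONCLUSTERED INDEX IX_Orders_Open
ON dbo.Orders (OrderDate)
WHERE Status = 'Open';
```

The resulting index is smaller, cheaper to maintain, and its statistics cover only the filtered rows.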
- Choosing the right column to index: Kimberly showed several examples of how GUID or overly wide key values can materially affect performance. She suggested using a simple identity value (an incrementing integer) for clustered indexes--just avoid wide keys on clustered indexes.
- "The best (clustered) index is unique, narrow and static"
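A "unique, narrow and static" clustered key usually looks like an identity column. A sketch (table and columns are hypothetical):

```sql
CREATE TABLE dbo.Orders
(
    OrderId    int IDENTITY(1,1) NOT NULL,  -- unique, narrow (4 bytes), static
    CustomerId uniqueidentifier  NOT NULL,  -- fine as data, poor as a clustered key
    OrderDate  datetime          NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderId)
);
```

Because the clustered key is embedded in every nonclustered index row, a 4-byte integer keeps all of the table's indexes narrow, while a 16-byte GUID would bloat them.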
- Using Order By: Even when a table has a clustered index (which stores the data in physical order), SQL Server does not guarantee that rows will be returned in that (or any particular) order unless an ORDER BY clause is used.
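In other words, the only way to guarantee ordering is to ask for it. A trivial sketch (hypothetical table):

```sql
-- Even if dbo.Orders is clustered on OrderId, the engine is free to
-- return rows in any order unless you say otherwise:
SELECT OrderId, OrderDate
FROM dbo.Orders
ORDER BY OrderId;  -- the ORDER BY is what guarantees the ordering
```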
- Avoiding Index and Page Fragmentation: When an index page is filled, a new page must be created and about half of the index entries on that page are copied to the new page (a "page split"). This (necessary) process can be minimized by setting the Fill Factor to an appropriate value when first building the table or rebuilding the index. A Fill Factor of 100 is appropriate for read-only data, but 50 is a better choice for read/write data. Note that the default is "0", which (like 100) fills pages completely and leaves no room for expansion.
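Fill Factor is set per index at build or rebuild time. A sketch with hypothetical names:

```sql
-- Rebuild an index leaving 30% free space on each leaf page,
-- so inserts have room to land without causing page splits.
ALTER INDEX IX_Orders_CustomerId ON dbo.Orders
REBUILD WITH (FILLFACTOR = 70);
```

The free space is only honored at rebuild time; as rows are inserted, the pages fill back up, so heavily updated indexes need periodic rebuilds.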
- Page Locks: There are any number of reasons pages can be locked while performing queries or updates. Paul reminded us that if you use the NOLOCK hint in T-SQL, your queries might return the same row(s) more than once. He suggested finding the root cause of the locking problem before resorting to different isolation levels or lock hints.
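For reference, this is the hint Paul was cautioning about (hypothetical table):

```sql
-- NOLOCK reads at the READ UNCOMMITTED isolation level:
-- the query won't block on locks, but it can return the same row
-- twice (or read rows that are later rolled back).
SELECT OrderId, Status
FROM dbo.Orders WITH (NOLOCK);
```

His point stands: treat this as a diagnostic last resort, not a performance fix.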
I asked a number of questions (as usual):
Question: Does the QO know which data or index pages are already in the cache?
Answer: No. While the QO assumes that root-level index pages are cached, it does not know whether leaf-level or data pages are cached.
Question: Since SQL Server Developer Edition implements Enterprise Edition functionality, is there a way to tell it (the SQL Server engine) to pretend to implement the Workgroup, Standard or even Express edition subset functionality? This way developers could make sure their applications were tuned for the targeted server.
Answer: No--but this is a good suggestion. (Sadly, no one from the Microsoft product team was there to hear it.)
Question: Since it takes time and resources to sort a returned rowset, does it make sense to do so when the target display element is a DataGrid that sorts the data again? Consider that best practices dictate that the number of rows returned be 500 or fewer.
Answer: Probably not.
Kimberly and Paul mentioned the "DTA" (Database Engine Tuning Advisor) as a useful tool for assessing the health of your SQL Server database.
http://blogs.msdn.com/sqlserverstorageengine/archive/2007/11/12/types-of-data-compression-in-sql-server-2008.aspx (compression details)