Curing A Plethora Of Performance Problems
November 26, 2004 • Vol.26 Issue 48
Page(s) 21 in print issue
One of the perennial problems perpetrated by pedantic programmers is returning far too many rows from a server, processing them, and sending them back to be stored in the database. When working with home databases like JET, this might be the only option you have as there is no "server-side" processor to handle in situ edits. However, when you're working with a serious system that's not hooked up to a voting machine, it's usually a good idea to avoid round trips to the server to fetch the data, especially if you intend to simply change the data and send it back. (A round trip refers to the process of opening a connection, revalidating security credentials, submitting a query to the server, compiling the query, executing the query, building a rowset, and sending it back up the wire to the client. It's an expensive process, akin to going to the video store in Cleveland when you live in Redmond.)
Sure, if you don't know how to write a stored procedure, you're really forced to use the programming language or application you're familiar with to make the changes on the client. Consider that ADO is terrible at bulk copy, but so is DAO, RDO, and all of the other data access interfaces designed as query interfaces-designed to execute queries and return a few rows. Asking them to return a few thousand or a few million rows is, well, a challenge.
Why Waste Round Trips?
Consider what happens when you ask SQL Server to execute a SELECT * FROM Customers. The query optimizer immediately recognizes this as a crude query that simply returns all of the rows from the table so the server applies share locks to the table (or extents thereof) to prevent updates until the table is transferred to the client. This can materially affect both server and user (the end user) performance as they wait for the rowset to be spooled to the client. The server then spools the entire table over the wire as quickly as the net protocol and the other traffic on the wire permits. This assumes that the client program making the request is actively processing the rows as they arrive. If a user is asked to intervene as each row is retrieved, this process can take quite some time, which means the server-side locks are not released expeditiously.
Using DTS To Manage Bulk Data
Sure, there are cases where you need to extract data from one server and send it to another. And no, you can't just cram the new data into the production tables without validating and munging it. For example, when pulling daily results or totals from a corporate mainframe to be accumulated on your departmental server, you might feel compelled to write an ADO application to do so. If you do, take your hands off the keyboard, lean back, and keep on reading. There are far better tools to move bulk data from place to place, for example Microsoft Distributed Transaction Services (DTS).
While DTS uses the same low-level interfaces used by ADO, it's designed to eliminate virtually all of the overhead involved with bulk-data transfers. It's like using a 40-ton truck to haul the data instead of a picnic basket (and it doesn't crush the sandwiches). Sure, DTS won't do much to the data it's moving. It's not designed to, but it can do some filtering and data mapping. If your inbound data needs complex validation, you're probably going to want to write a server-side procedure to do so. This is accomplished by importing the raw data into a "holding" table on your production server designed specifically for this purpose. Here's a tip: Create this table without indexes to speed up the load process. Next, you execute a stored procedure that intelligently validates, collates, massages, and moves data from the holding table to the "real" table and truncate the holding table when you are done.
BCP: Another Alternative
Another approach you might take is executing the Bulk Copy utility (BCP) directly from your code. BCP, like DTS, is designed to move bulk data from place to place. However, in this case the data might originate at the client. Instead of building a block of INSERT statements and submitting them in a batch, you can use BCP right from your application by simply executing a T-SQL query that contains the BCP operators. This is a handy way to build intelligent queries, as well.
Suppose your application collects 50 or so items that the user wants to fetch or modify. Instead of making 50 (or 5,000) round trips or trying to build a WHERE clause that contains the IDs needed, you could write a delimited file to the local disk and BCP it to a temporary table. Once on the server, your WHERE clause could address the temp table as a source for the rows to change. I'll be illustrating this technique in my next book and (hopefully) in a future article. This time next year you'll find that BCP is incorporated in the .NET Framework 2.0.