UPDATE STATISTICS WITH SAMPLE take longer then FULLSCAN and consumes more tempdb

Recently there was a question in MSDN forum about time and tempdb taken by update statistics with full scan and sampling.
Question was why full scan completes faster than sampling

I was trying to reproduce this behavior in one of my lab server.

1.When I was updating the statistics with n% sampling for the Clustered Index on Table(Which had clustered and non-clustered index ), SQL Server choses a non-clustered index to do the scan. Here the data scanned/fetched was in the order of the non-clustered index keys. Hence we need to SORT them once based on the clustered index keys to generate the statistics. Sort was consuming time and we also used tempdb to create the worktables.

2 .On the other side when I was using update stats with fullscan SQL Server chooses a “clustered index scan” to do the scan. Since the data was already sorted in clustered index, sort was avoided and update stats completed faster.

Why optimizer used non-clustered index to do the scan when I was using n% sampling ?
As we are using a non-clustered index to scan the rows, we effectively get lot lesser rows into the memory from disk when compared with using a clustered index to scan the rows. The decision about which index to use was cost based. As we know SQL Server is a cost based optimizer it chooses the index that can provide the results with lesser subtree cost.
If I wouldn’t have got that non-clustered index , then sampling might have run faster.
There is no straight answer “It depends“.You have to look at the plan of update stats and troubleshoot/Identify the cause like you do for any other slow queries.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s