Recently there was a question in MSDN forum about time and tempdb taken by update statistics with full scan and sampling.
Question was why full scan completes faster than sampling
I was trying to reproduce this behavior in one of my lab server.
1.When I was updating the statistics with n% sampling for the Clustered Index on Table(Which had clustered and non-clustered index ), SQL Server choses a non-clustered index to do the scan. Here the data scanned/fetched was in the order of the non-clustered index keys. Hence we need to SORT them once based on the clustered index keys to generate the statistics. Sort was consuming time and we also used tempdb to create the worktables.
2 .On the other side when I was using update stats with fullscan SQL Server chooses a “clustered index scan” to do the scan. Since the data was already sorted in clustered index, sort was avoided and update stats completed faster.
Why optimizer used non-clustered index to do the scan when I was using n% sampling ?
As we are using a non-clustered index to scan the rows, we effectively get lot lesser rows into the memory from disk when compared with using a clustered index to scan the rows. The decision about which index to use was cost based. As we know SQL Server is a cost based optimizer it chooses the index that can provide the results with lesser subtree cost.
If I wouldn’t have got that non-clustered index , then sampling might have run faster.
There is no straight answer “It depends“.You have to look at the plan of update stats and troubleshoot/Identify the cause like you do for any other slow queries.