sql server physical operations

the Physical Operation and Logical Operation.

The logical operators are the results of the optimizer’s calculations for what should happen when the query executes.
The physical operators represent what actually occurred.
The logical and physical operators are usually the same, but not always

SQL Server bases all estimations on the statistics available on the columns and indexes in any table.

The I/O cost and CPU cost are not actual values, but rather the estimated cost numbers assigned by the query optimizer during its calculations. These numbers can be useful when considering whether most of the estimated cost is I/O-based (as in this case), or if we’re potentially putting a load on the CPU

most physical operations

  1. Select (Result)
  2. Clustered Index Scan
  3. NonClustered Index Scan
  4. Clustered Index Seek
  5. NonClustered Index Seek
  6. Hash Match
  7. Nested Loops
  8. Merge Join
  9. Sort 17.
  10. Key Lookup
  11. Compute Scalar
  12. Constant Scan
  13. Table Scan
  14. RID Lookup
  15. Filter
  16. Lazy Spool
  17. Spool
  18. Eager Spool
  19. Stream Aggregate
  20. Distribute Streams
  21. Repartition Streams
  22. Gather Streams
  23. Bitmap
  24. Split

Most operators behave in one of two ways, non-blocking or blocking. A non-blocking operator creates output data at the same time as it receives the input. A blocking operator has to get all the data prior to producing its output. A blocking operator might contribute to concurrency problems, hurting performance.

相关选项

Show text version of the estimated plan

SET SHOWPLAN_TEXT ON

Show xml version of the estimated plan

SET SHOWPLAN_XML ON

In order to turn the text version of the actual execution plan on or off, use the code

SET STATISTICS PROFILE ON;

To return the amount of time it takes to compile and run a query

SET STATISTICS TIME ON

Clustered Index Scan

One of the more common operators is the Clustered Index Scan. This operation occurs when a Seek against the clustered index or some other index, can’t satisfy the needs of the query. In that event, SQL Server must walk through, i.e. scan, the entire data set.

This means that a Clustered Index Scan is very similar in concept to a Table Scan. The entire index, or a large percentage of it, is being traversed, row by row, in order to retrieve the data needed by the query.

Clustered Index Seek

A Clustered Index Seek operator occurs when a query uses the index to access only one row, or a few contiguous rows. It’s one of the faster ways to retrieve data from the system.

Index seeks are completely different from scans, where the engine walks through all the rows to find what it needs. Clustered and NonClustered Index Seeks occur when the optimizer is able to locate an index that it can use to retrieve the required records. Therefore, it tells the storage engine to look up the values based on the keys of the given index by assigning the Seek operation instead of a scan.

NonClustered Index Seek

A NonClustered Index Seek operation is a seek against a non-clustered index. This operation is effectively no different than the Clustered Index Seek but the only data available is that which is stored in the index itself.

covering index: the Lookup operator for non-clustered indexes to a clustered index is the key value of the clustered index.

非聚集索引的键包含了聚集索引的键,且排在非聚集索引键之前。only contains key information, the clustered key, and any lookup columns。

Key Lookup

A Key Lookup operator (there are two, RID and Key) is required to get data from the heap or the clustered index, respectively, when a non-clustered index is used, but is not a covering index.

Since this index is not a covering index, the query optimizer is forced to not only read the non-clustered index, but also to read the clustered index to gather all the data required to process the query. This is a Key Lookup and, essentially, it means that the optimizer cannot retrieve the rows in a single operation, and has to use a clustered key (or a row ID from a heap table) to return the corresponding rows from a clustered index (or from the table itself).

The presence of a Key Lookup is an indication that query performance might benefit from the presence of a covering index.

A join operation, which combines the results of the two operations, always accompanies a Key Lookup.

Because Nested Loop was involved, along with the Key Lookup operator, at least two additional operations are required for every row returned from the non-clustered index. This is what can make a Key Lookup operation a very expensive process in terms of performance.

If this table had been a heap, a table without a clustered index, the operator would have been a RID Lookup operator. RID stands for row identifier, the means by which rows in a heap table are uniquely marked and stored within a table. The basics of the operation of a RID Lookup are the same as a Key Lookup.

Table Scan

This operator is self-explanatory and is one we encountered in Chapter 1. Table Scans only occur against heap tables, tables without clustered indexes. With a clustered index, we’d get a Clustered Index Scan, which is the equivalent of a Table Scan operation.

A Table Scan can occur for several reasons, but it’s often because there are no useful indexes on the table, and the query optimizer has to search through every row in order to identify the rows to return. Another common cause of a Table Scan is a query that requests all the rows of a table.

When all (or the majority) of the rows of a table are returned then, whether an index exists or not, it is often faster for the query optimizer to scan through each row and return them than look up each row in an index. This commonly occurs in tables with few rows.

RID Lookup

RID Lookup is the heap equivalent of the Key Lookup operation. As was mentioned before, non-clustered indexes don’t always have all the data needed to satisfy a query. When they do not, an additional operation is required to get that data. When there is a clustered index on the table, it uses a Key Lookup operator as described above. When there is no clustered index, the table is a heap and must look up data using an internal identifier known as the Row ID or RID.

Hash Match join

A Hash Match operator appears in the plan when SQL Server puts two data sets into temporary tables, hash tables, and then uses these structures to compare data and arrive at the matching set.

Before we can talk about what a Hash Match join is, we need to understand two new concepts: hashing and a hash table. Hashing is a programmatic technique where data is converted into a symbolic form that makes searching for that data much more efficient. For example, SQL Server programmatically converts a row of data in a table into a unique value that represents the contents of the row. In many ways, it is like taking a row of data and encrypting it. Like encryption, a hashed value can be converted back to the original data.

A hash table, on the other hand, is a data structure that divides all of the elements into equal-sized categories, or buckets, to allow quick access to the elements. The hashing function determines into which bucket an element goes. For example, SQL Server can take a row from a table, hash it into a hash value, and then store the hash value in a hash table, in tempdb.

Now that we understand these terms, we can discuss the Hash Match join operator. It occurs when SQL Server has to join two large data sets, and decides to do so by first hashing the rows from the smaller of the two data sets, and inserting them into a hash table. It then processes the larger data set, one row at a time, against the hash table, looking for matches, indicating the rows to be joined.

If both tables are very large, a Hash Match join can be very inefficient as compared to other types of joins. All the data for the hash table is stored within tempdb, so excessive use of Hash Joins in your queries can lead to a heavier load on tempdb.

Hash Match joins also work well for tables that are not sorted on JOIN columns; if they are, then Merge Joins tend to work better. Hash Match joins can be efficient in cases where there are no useable indexes.

For example, seeing a Hash Match join in an execution plan sometimes indicates:

  • a missing or unusable index
  • a missing WHERE clause
  • a WHERE clause with a calculation or conversion that makes it non-sargable (a commonly used term meaning that the search argument, “sarg” can’t be used). This means it won’t use an existing index.

Nested Loops join

A Nested Loops join functions by taking a set of data, referred to as the outer set, and comparing it, one row at a time to another set of data, called the inner set. This sounds like a cursor, and effectively, it is one but, with the appropriate data set, it can be a very efficient operation.

Another name for the Nested Loops join is a nested iteration. This operation takes the input from two sets of data and joins them by scanning the outer data set once for each row in the inner set.The number of rows in each of the two data sets was small, making this a very efficient operation. As long as the inner data set is small and the outer data set, small or not, is indexed, then this is an extremely efficient join mechanism. Except in cases of very large data sets, this is the best type of join to see in an execution plan.

Compute Scalar

Merge Join

A Merge Join operator works from sorted data, and sorted data only. It takes the data from two different data sets and uses the fact that the data is sorted to simply merge it together, combining based on the matching values, which it can do very easily because the order of the values will be identical. If the data is sorted, this can be one of the most efficient join operations. However, the data is frequently not sorted, so sorting it for a Merge Join requires the addition of a Sort operator to ensure it works; this can make this join operation less efficient.

The key to the performance of a Merge Join is that the joined columns are pre-sorted. If they are not, and the query optimizer chooses to sort the data in a separate operation before it performs a Merge Join, this might be an indication that a Merge Join is not an ideal way to join the tables, or it might indicate that you need to reconsider your indexing strategy.

Sort

As a rule of thumb, if sorting takes more than 25% of a query’s total execution time, then you need to review it carefully and optimize it, if possible.

If an execution plan has multiple Sort operators, review the query to see if they are all necessary, or if you can rewrite the code so that fewer sorts will accomplish the goal of the query.

If you have no choice but to sort a lot of data, you should consider using trace events, Performance:Showplan XML, to see if any Sort Warnings are generated. To boost performance, SQL Server attempts to perform sorting in memory instead of on disk, since sorting in RAM is much faster than sorting on disk. However, if the Sort operation is large, SQL Server may have to write data to the tempdb database and sort on disk. Whenever this occurs, SQL Server generates a Sort Warning event, which we can capture using trace events. If your server is performing many Sorts, and generating many Sort Warnings, then you may need to add more RAM to your server, or to speed up tempdb access.

Hash Match (aggregate)

Earlier in this chapter, we looked at the Hash Match operator for joins. This same Hash Match operator can also occur when aggregations are present within a query.

Quite often, aggregations within queries can be expensive operations. About the only way to “speed” the performance of an aggregation via code is to ensure that you have a restrictive WHERE clause to limit the number of rows that need to be aggregated, thus reducing the amount of aggregation that needs to be done. You can also pre-aggregate data by using an indexed view.

The optimizer does not apply the HAVING clause until all the aggregation of the data is complete.

A brief aside on rebinds and rewinds

In order to understand what these values mean, we need some background. Whenever a physical operator, such as the Sort operator in an execution plan occurs, three things happen.

  • First, the physical operator is initialized and any required data structures are set up. This is called the Init() method. In all cases, this happens once for an operator, although it is possible for it to happen many times.
  • Second, the physical operator gets (or receives) the rows of data that it is to act on. This is called the GetNext() method. Depending on the type of operator, it may receive none, or many GetNext() calls.
  • Third, once the operator has performed its function, it needs to clean itself up and shut itself down. This is called the Close() method. A physical operator only ever receives a single Close() call.

A rebind or rewind is a count of the number of times the Init() method is called by an operator. A rebind and a rewind both count the number of times the Init() method is called, but do so under different circumstances.

Only certain operators record values for rebind and rewind:

  1. Non-Clustered Index Spool
  2. Remote Query
  3. Row Count Spool
  4. Sort
  5. Table Spool
  6. Table-Valued Function
  7. Assert (only StartupExpression for the physical operation is set to TRUE)
  8. Filter (only StartupExpression for the physical operation is set to TRUE)

For the operators affected, multiple rebind or rewind events only occur in relation to a Nested Loops join operation, specifically on the inner (lower) set of data. A rebind occurs, increasing the rebind count, when one or more of the correlated parameters of the Nested Loops join change and the inner side must be reevaluated. A rewind occurs, increasing the rewind count, when none of the correlated parameters change and the prior inner result set may be reused.

If you see an operator where rebind equals one and rewinds equals zero, this means that an Init() method was called one time on a physical operator that is not on the inner side of a loop join. If the physical operator is on the inner side of a loop join used by an operator, then the sum of the rebinds and rewinds will equal the number of rows processed on the outer side of a join used by the operator.

INSERT, UPDATE and DELETE operations

Constant Scan

This operator introduces a constant number of rows into a query.

Compute Scalar

Top

In an execution plan for an UPDATE statement, the Top operator enforces row count limits, if there are any (top clause).

Table Spool

a form of a Eager Spool. This obscure sounding operator essentially takes each of the rows to be updated and stores them in a hidden temporary object stored in the tempdb database. The rewind is put in place in an update query like this as part of prevention for the Halloween problem (for a good definition of the Halloween Problem, read the document at http://en.wikipedia.org/ wiki/Halloween_Problem). The Eager Spool is in place in order to facilitate rollback operations.

Clustered Index Insert

Clustered Index Update

Clustered Index DELETE

发表评论

电子邮件地址不会被公开。