Thursday, 28 August 2014

Common Table Expressions

A common table expression (CTE) can be thought of as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. Unlike a derived table, a CTE can be self-referencing and can be referenced multiple times in the same query.

A CTE can be used to:

  • Create a recursive query.
  • Substitute for a view when the general use of a view is not required; that is, you do not have to store the definition in metadata.
  • Enable grouping by a column that is derived from a scalar subselect, or a function that is either not deterministic or has external access.
  • Reference the resulting table multiple times in the same statement.
     
Using a CTE offers the advantages of improved readability and ease in maintenance of complex queries. The query can be divided into separate, simple, logical building blocks. These simple blocks can then be used to build more complex, interim CTEs until the final result set is generated.

CTEs can be defined in user-defined routines, such as functions, stored procedures, triggers, or views.


Guidelines for Creating and Using Common Table Expressions

The following guidelines apply to nonrecursive common table expressions.
  • A CTE must be followed by a single SELECT, INSERT, UPDATE, or DELETE statement that references some or all the CTE columns. A CTE can also be specified in a CREATE VIEW statement as part of the defining SELECT statement of the view.
  • Multiple CTE query definitions can be defined in a nonrecursive CTE. The definitions must be combined by one of these set operators: UNION ALL, UNION, INTERSECT, or EXCEPT.
  • A CTE can reference itself and previously defined CTEs in the same WITH clause. Forward referencing is not allowed.
  • Specifying more than one WITH clause in a CTE is not allowed. For example, if a CTE_query_definition contains a subquery, that subquery cannot contain a nested WITH clause that defines another CTE.
  • The following clauses cannot be used in the CTE_query_definition:
    • ORDER BY (except when a TOP clause is specified)
    • INTO
    • OPTION clause with query hints
    • FOR BROWSE
  • When a CTE is used in a statement that is part of a batch, the statement before it must be followed by a semicolon.
  • A query referencing a CTE can be used to define a cursor.
  • Tables on remote servers can be referenced in the CTE.
  • When executing a CTE, any hints that reference a CTE may conflict with other hints that are discovered when the CTE accesses its underlying tables, in the same manner as hints that reference views in queries. When this occurs, the query returns an error.
Example: total number of sales orders per year for each salesperson
USE AdventureWorks2008R2;
GO
-- Define the CTE expression name and column list.
WITH Sales_CTE (SalesPersonID, SalesOrderID, SalesYear)
AS
-- Define the CTE query.
(
    SELECT SalesPersonID, SalesOrderID, YEAR(OrderDate) AS SalesYear
    FROM Sales.SalesOrderHeader
    WHERE SalesPersonID IS NOT NULL
)
-- Define the outer query referencing the CTE name.
SELECT SalesPersonID, COUNT(SalesOrderID) AS TotalSales, SalesYear
FROM Sales_CTE
GROUP BY SalesYear, SalesPersonID
ORDER BY SalesPersonID, SalesYear;
GO

 
SalesPersonID TotalSales  SalesYear
------------- ----------- -----------
274           4           2001
274           20          2002
274           14          2003
274           10          2004
275           56          2001
275           139         2002
275           169         2003


Example:

WITH Sales_CTE (SalesPersonID, TotalSales, SalesYear)
AS
-- Define the first CTE query.
(
    SELECT SalesPersonID, SUM(TotalDue) AS TotalSales, YEAR(OrderDate) AS SalesYear
    FROM Sales.SalesOrderHeader
    WHERE SalesPersonID IS NOT NULL
       GROUP BY SalesPersonID, YEAR(OrderDate)

)
,   -- Use a comma to separate multiple CTE definitions.

-- Define the second CTE query, which returns sales quota data by year for each sales person.
Sales_Quota_CTE (BusinessEntityID, SalesQuota, SalesQuotaYear)
AS
(
       SELECT BusinessEntityID, SUM(SalesQuota)AS SalesQuota, YEAR(QuotaDate) AS SalesQuotaYear
       FROM Sales.SalesPersonQuotaHistory
       GROUP BY BusinessEntityID, YEAR(QuotaDate)
)

-- Define the outer query by referencing columns from both CTEs.
SELECT SalesPersonID
  , SalesYear
  , FORMAT(TotalSales,'C','en-us') AS TotalSales
  , SalesQuotaYear
  , FORMAT (SalesQuota,'C','en-us') AS SalesQuota
  , FORMAT (TotalSales -SalesQuota, 'C','en-us') AS Amt_Above_or_Below_Quota
FROM Sales_CTE
JOIN Sales_Quota_CTE ON Sales_Quota_CTE.BusinessEntityID = Sales_CTE.SalesPersonID
                    AND Sales_CTE.SalesYear = Sales_Quota_CTE.SalesQuotaYear

ORDER BY SalesPersonID, SalesYear;

SalesPersonID SalesYear   TotalSales    SalesQuotaYear SalesQuota  Amt_Above_or_Below_Quota
------------- ---------   -----------   -------------- ---------- ---------------------------------- 

274           2005        $32,567.92    2005           $35,000.00  ($2,432.08)
274           2006        $406,620.07   2006           $455,000.00 ($48,379.93)
274           2007        $515,622.91   2007           $544,000.00 ($28,377.09)

274           2008        $281,123.55   2008           $271,000.00  $10,123.55

Guidelines for Defining and Using Recursive Common Table Expressions

The following guidelines apply to defining a recursive common table expression:
  • The recursive CTE definition must contain at least two CTE query definitions, an anchor member and a recursive member. Multiple anchor members and recursive members can be defined; however, all anchor member query definitions must be put before the first recursive member definition. All CTE query definitions are anchor members unless they reference the CTE itself.
  • Anchor members must be combined by one of these set operators: UNION ALL, UNION, INTERSECT, or EXCEPT. UNION ALL is the only set operator allowed between the last anchor member and first recursive member, and when combining multiple recursive members.
  • The number of columns in the anchor and recursive members must be the same.
  • The data type of a column in the recursive member must be the same as the data type of the corresponding column in the anchor member.
  • The FROM clause of a recursive member must refer only one time to the CTE expression_name.
  • The following items are not allowed in the CTE_query_definition of a recursive member:
    • SELECT DISTINCT
    • GROUP BY
    • PIVOT (When the database compatibility level is 110 or higher. 
    • HAVING
    • Scalar aggregation
    • TOP
    • LEFT, RIGHT, OUTER JOIN (INNER JOIN is allowed)
    • Subqueries
    • A hint applied to a recursive reference to a CTE inside a CTE_query_definition.
The following guidelines apply to using a recursive common table expression:
  • All columns returned by the recursive CTE are nullable regardless of the nullability of the columns returned by the participating SELECT statements.
  • An incorrectly composed recursive CTE may cause an infinite loop. For example, if the recursive member query definition returns the same values for both the parent and child columns, an infinite loop is created. To prevent an infinite loop, you can limit the number of recursion levels allowed for a particular statement by using the MAXRECURSION hint and a value between 0 and 32,767 in the OPTION clause of the INSERT, UPDATE, DELETE, or SELECT statement. This lets you control the execution of the statement until you resolve the code problem that is creating the loop. The server-wide default is 100. When 0 is specified, no limit is applied. Only one MAXRECURSION value can be specified per statement. For more information, see Query Hints (Transact-SQL).
  • A view that contains a recursive common table expression cannot be used to update data.
  • Cursors may be defined on queries using CTEs. The CTE is the select_statement argument that defines the result set of the cursor. Only fast forward-only and static (snapshot) cursors are allowed for recursive CTEs. If another cursor type is specified in a recursive CTE, the cursor type is converted to static.
  • Tables on remote servers may be referenced in the CTE. If the remote server is referenced in the recursive member of the CTE, a spool is created for each remote table so the tables can be repeatedly accessed locally. If it is a CTE query, Index Spool/Lazy Spools is displayed in the query plan and will have the additional WITH STACK predicate. This is one way to confirm proper recursion.
  • Analytic and aggregate functions in the recursive part of the CTE are applied to the set for the current recursion level and not to the set for the CTE. Functions like ROW_NUMBER operate only on the subset of data passed to them by the current recursion level and not the entire set of data pased to the recursive part of the CTE. For more information, see example K. Using analytical functions in a recursive CTE that follows.

Structure of a Recursive CTE


The structure of the recursive CTE in Transact-SQL is similar to recursive routines in other programming languages. Although a recursive routine in other languages returns a scalar value, a recursive CTE can return multiple rows.
A recursive CTE consists of three elements:
  1. Invocation of the routine.
    The first invocation of the recursive CTE consists of one or more CTE_query_definitions joined by UNION ALL, UNION, EXCEPT, or INTERSECT operators. Because these query definitions form the base result set of the CTE structure, they are referred to as anchor members.
    CTE_query_definitions are considered anchor members unless they reference the CTE itself. All anchor-member query definitions must be positioned before the first recursive member definition, and a UNION ALL operator must be used to join the last anchor member with the first recursive member.
  2. Recursive invocation of the routine.
    The recursive invocation includes one or more CTE_query_definitions joined by UNION ALL operators that reference the CTE itself. These query definitions are referred to as recursive members.
  3. Termination check.
    The termination check is implicit; recursion stops when no rows are returned from the previous invocation.
The semantics of the recursive execution is as follows:
  1. Split the CTE expression into anchor and recursive members.
  2. Run the anchor member(s) creating the first invocation or base result set (T0).
  3. Run the recursive member(s) with Ti as an input and Ti+1 as an output.
  4. Repeat step 3 until an empty set is returned.
  5. Return the result set. This is a UNION ALL of T0 to Tn.
Example:

Below query returns hierarchical list of employees, starting with the highest ranking employee, in the Adventure Works Cycles company. 

-- Create an Employee table.
CREATE TABLE dbo.MyEmployees
(
 EmployeeID smallint NOT NULL,
 FirstName nvarchar(30)  NOT NULL,
 LastName  nvarchar(40) NOT NULL,
 Title nvarchar(50) NOT NULL,
 DeptID smallint NOT NULL,
 ManagerID int NULL,
 CONSTRAINT PK_EmployeeID PRIMARY KEY CLUSTERED (EmployeeID ASC) 
);

-- Populate the table with values.
INSERT INTO dbo.MyEmployees VALUES 
 (1, N'Ken', N'Sánchez', N'Chief Executive Officer',16,NULL)
,(273, N'Brian', N'Welcker', N'Vice President of Sales',3,1)
,(274, N'Stephen', N'Jiang', N'North American Sales Manager',3,273)
,(275, N'Michael', N'Blythe', N'Sales Representative',3,274)
,(276, N'Linda', N'Mitchell', N'Sales Representative',3,274)
,(285, N'Syed', N'Abbas', N'Pacific Sales Manager',3,273)
,(286, N'Lynn', N'Tsoflias', N'Sales Representative',3,285)
,(16,  N'David',N'Bradley', N'Marketing Manager', 4, 273)
,(23,  N'Mary', N'Gibson', N'Marketing Specialist', 4, 16);

USE AdventureWorks2008R2;
GO
WITH DirectReports (ManagerID, EmployeeID, Title, DeptID, Level)(
-- Anchor member definitionSELECT e.ManagerID, e.EmployeeID, e.Title, e.Deptid, 0 AS LevelFROM dbo.MyEmployees eWHERE e.ManagerID IS NULLUNION ALL
-- Recursive member definitionSELECT e.ManagerID, e.EmployeeID, e.Title, e.Deptid,Level + 1FROM dbo.MyEmployees AS eINNER JOIN DirectReports AS ON e.ManagerID = d.EmployeeID)
-- Statement that executes the CTE SELECT ManagerID, EmployeeID, Title, DeptID, Level FROM DirectReports INNER JOIN HumanResources.Department AS dp ON DirectReports.DeptID = dp.DepartmentID WHERE dp.GroupName = N'Sales and Marketing' OR Level = 0; GO
Explanation
  1. The recursive CTE, DirectReports, defines one anchor member and one recursive member.
  2. The anchor member returns the base result set T0. This is the highest ranking employee in the company; that is, an employee who does not report to a manager.
    Here is the result set returned by the anchor member:
    ManagerID EmployeeID Title                         Level
    --------- ---------- ----------------------------- ------
    NULL      1          Chief Executive Officer        0
    
  3. The recursive member returns the direct subordinate(s) of the employee in the anchor member result set. This is achieved by a join operation between the Employeetable and the DirectReports CTE. It is this reference to the CTE itself that establishes the recursive invocation. Based on the employee in the CTE DirectReports as input (Ti), the join (MyEmployees.ManagerID = DirectReports.EmployeeID) returns as output (Ti+1), the employees who have (Ti) as their manager. Therefore, the first iteration of the recursive member returns this result set:
    ManagerID EmployeeID Title                         Level
    --------- ---------- ----------------------------- ------
    1         273        Vice President of Sales       1
    
  4. The recursive member is activated repeatedly. The second iteration of the recursive member uses the single-row result set in step 3 (containing EmployeeID 273) as the input value, and returns this result set:
    ManagerID EmployeeID Title                         Level
    --------- ---------- ----------------------------- ------
    273       16         Marketing Manager             2
    273       274        North American Sales Manager  2
    273       285        Pacific Sales Manager         2
    
    The third iteration of the recursive member uses the result set above as the input value, and returns this result set:
    ManagerID EmployeeID Title                         Level
    --------- ---------- ----------------------------- ------
    16        23         Marketing Specialist          3
    274       275        Sales Representative          3
    274       276        Sales Representative          3
    285       286        Sales Representative          3
    
  5. The final result set returned by the running query is the union of all result sets generated by the anchor and recursive members.
    Here is the complete result set returned by the example:
    ManagerID EmployeeID Title                         Level
    --------- ---------- ----------------------------- ------
    NULL      1          Chief Executive Officer       0
    1         273        Vice President of Sales       1
    273       16         Marketing Manager             2
    273       274        North American Sales Manager  2
    273       285        Pacific Sales Manager         2
    16        23         Marketing Specialist          3
    274       275        Sales Representative          3
    274       276        Sales Representative          3
    285       286        Sales Representative          3

Using MAXRECURSION to cancel a statement

MAXRECURSION can be used to prevent a poorly formed recursive CTE from entering into an infinite loop.
USE AdventureWorks2012;
GO
--Creates an infinite loop
WITH cte (EmployeeID, ManagerID, Title) as
(
    SELECT EmployeeID, ManagerID, Title
    FROM dbo.MyEmployees
    WHERE ManagerID IS NOT NULL
  UNION ALL
    SELECT cte.EmployeeID, cte.ManagerID, cte.Title
    FROM cte 
    JOIN  dbo.MyEmployees AS e 
        ON cte.ManagerID = e.EmployeeID
)
--Uses MAXRECURSION to limit the recursive levels to 2
SELECT EmployeeID, ManagerID, Title
FROM cte
OPTION (MAXRECURSION 2);
GO

Example:
The following example shows the hierarchy of product assemblies and components that are required to build the bicycle for ProductAssemblyID = 800.
USE AdventureWorks2012;
GO
WITH Parts(AssemblyID, ComponentID, PerAssemblyQty, EndDate, ComponentLevel) AS
(
    SELECT b.ProductAssemblyID, b.ComponentID, b.PerAssemblyQty,
        b.EndDate, 0 AS ComponentLevel
    FROM Production.BillOfMaterials AS b
    WHERE b.ProductAssemblyID = 800
          AND b.EndDate IS NULL
    UNION ALL
    SELECT bom.ProductAssemblyID, bom.ComponentID, p.PerAssemblyQty,
        bom.EndDate, ComponentLevel + 1
    FROM Production.BillOfMaterials AS bom 
        INNER JOIN Parts AS p
        ON bom.ProductAssemblyID = p.ComponentID
        AND bom.EndDate IS NULL
)
SELECT AssemblyID, ComponentID, Name, PerAssemblyQty, EndDate,
        ComponentLevel 
FROM Parts AS p
    INNER JOIN Production.Product AS pr
    ON p.ComponentID = pr.ProductID
ORDER BY ComponentLevel, AssemblyID, ComponentID;
GO

Using multiple anchor and recursive members

The following example uses multiple anchor and recursive members to return all the ancestors of a specified person. A table is created and values inserted to establish the family genealogy returned by the recursive CTE.
-- Genealogy table
IF OBJECT_ID('dbo.Person','U') IS NOT NULL DROP TABLE dbo.Person;
GO
CREATE TABLE dbo.Person(ID int, Name varchar(30), Mother int, Father int);
GO
INSERT dbo.Person 
VALUES(1, 'Sue', NULL, NULL)
      ,(2, 'Ed', NULL, NULL)
      ,(3, 'Emma', 1, 2)
      ,(4, 'Jack', 1, 2)
      ,(5, 'Jane', NULL, NULL)
      ,(6, 'Bonnie', 5, 4)
      ,(7, 'Bill', 5, 4);
GO
-- Create the recursive CTE to find all of Bonnie's ancestors.
WITH Generation (ID) AS
(
-- First anchor member returns Bonnie's mother.
    SELECT Mother 
    FROM dbo.Person
    WHERE Name = 'Bonnie'
UNION
-- Second anchor member returns Bonnie's father.
    SELECT Father 
    FROM dbo.Person
    WHERE Name = 'Bonnie'
UNION ALL
-- First recursive member returns male ancestors of the previous generation.
    SELECT Person.Father
    FROM Generation, Person
    WHERE Generation.ID=Person.ID
UNION ALL
-- Second recursive member returns female ancestors of the previous generation.
    SELECT Person.Mother
    FROM Generation, dbo.Person
    WHERE Generation.ID=Person.ID
)
SELECT Person.ID, Person.Name, Person.Mother, Person.Father
FROM Generation, dbo.Person
WHERE Generation.ID = Person.ID;
GO

No comments:

Post a Comment