A User Defined Function (UDF) is a set or batch of code in which you can apply any SQL logic and return either a single scalar value or a record set.
According to MS BOL, UDFs are subroutines made up of one or more Transact-SQL statements that can be used to encapsulate code for reuse. These reusable subroutines can be used:
- In TSQL SELECT statements at column level.
- To create a parameterized view or improve the functionality of an indexed view.
- To define computed columns and CHECK constraints while creating a table.
- To replace stored procedures and views.
- To join complex logic with a table, where a stored procedure cannot be used.
- For faster execution: like stored procedures, they reduce compilation cost by caching query execution plans.
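As a rough sketch of the CHECK constraint use listed above: the function name dbo.ufn_IsValidDiscount, the table dbo.DemoOrderLines and the 0 to 0.50 range are all hypothetical, chosen only for illustration.

```sql
-- Hypothetical scalar UDF: returns 1 when the discount is within an assumed valid range.
CREATE FUNCTION dbo.ufn_IsValidDiscount (@Discount decimal(5, 2))
RETURNS bit
AS
BEGIN
    RETURN (CASE WHEN @Discount >= 0 AND @Discount <= 0.50 THEN 1 ELSE 0 END)
END
GO

-- Hypothetical table whose CHECK constraint delegates to the UDF.
CREATE TABLE dbo.DemoOrderLines
(
    LineID   int IDENTITY(1, 1) PRIMARY KEY,
    Discount decimal(5, 2) NOT NULL,
    CONSTRAINT CK_DemoOrderLines_Discount CHECK (dbo.ufn_IsValidDiscount(Discount) = 1)
)
GO

INSERT INTO dbo.DemoOrderLines (Discount) VALUES (0.10)  -- accepted
INSERT INTO dbo.DemoOrderLines (Discount) VALUES (0.75)  -- rejected by the CHECK constraint
GO

-- Clean up: the table must go first because its constraint references the function.
DROP TABLE dbo.DemoOrderLines
DROP FUNCTION dbo.ufn_IsValidDiscount
GO
```

Keeping the rule in a function means the same check can be reused in queries and in other tables.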
Apart from these benefits, UDFs have certain limitations:
- Cannot modify any database objects; a UDF can only update table variables local to the function.
- Cannot contain an OUTPUT INTO clause that targets a table.
- Can call only extended stored procedures, no other procedures.
- Cannot define a TRY-CATCH block.
- Some non-deterministic built-in functions are not allowed, such as NEWID() and RAND(), because their value changes every time they are called. (GETDATE() was also disallowed in SQL Server 2000, but is permitted from SQL Server 2005 onward.) Deterministic functions such as DATEADD() are always allowed, because they return the same result when called with the same argument values.
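A minimal sketch of the determinism limitation, using hypothetical function names (dbo.ufn_AddDays and dbo.ufn_NewKey). NEWID() stands in here as the non-deterministic function. The first function is accepted; the second is rejected when it is created.

```sql
-- Accepted: DATEADD is deterministic, so it can be used freely inside a UDF.
CREATE FUNCTION dbo.ufn_AddDays (@StartDate datetime, @Days int)
RETURNS datetime
AS
BEGIN
    RETURN DATEADD(day, @Days, @StartDate)
END
GO

SELECT dbo.ufn_AddDays('20080101', 30) AS DueDate  -- 2008-01-31 00:00:00.000
GO

-- Rejected at CREATE time: NEWID() is a side-effecting, non-deterministic function.
CREATE FUNCTION dbo.ufn_NewKey ()
RETURNS uniqueidentifier
AS
BEGIN
    RETURN NEWID()
END
GO

-- Clean up the function that was created successfully.
DROP FUNCTION dbo.ufn_AddDays
GO
```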
A UDF can take from 0 up to 1024 parameters and returns either a scalar value or a table record set, depending on its type.
SQL Server supports mainly 3 types of UDFs:
1. Scalar function
2. Inline table-valued function
3. Multistatement table-valued function
1. Scalar function: Returns a single value of any datatype except text, ntext, image, cursor & timestamp.
-- Example:
--// Create Scalar UDF [dbo].[ufn_GetContactOrders]
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[ufn_GetContactOrders](@ContactID int)
RETURNS varchar(500)
AS
BEGIN
    DECLARE @Orders varchar(500)

    -- Build a comma-separated list of the contact's order IDs
    SELECT @Orders = COALESCE(@Orders + ', ', '') + CAST(SalesOrderID as varchar(10))
    FROM Sales.SalesOrderHeader
    WHERE ContactID = @ContactID

    RETURN (@Orders)
END
GO

--// Usage:
-- Used at COLUMN level with SELECT
SELECT ContactID, dbo.ufn_GetContactOrders(ContactID)
FROM Person.Contact
WHERE ContactID between 100 and 105
-- Output below

-- Used while defining a computed column while creating a table.
CREATE TABLE tempCustOrders (CustID int, Orders as (dbo.ufn_GetContactOrders(CustID)))

INSERT INTO tempCustOrders (CustID)
SELECT ContactID
FROM Person.Contact
WHERE ContactID between 100 and 105

SELECT * FROM tempCustOrders
-- Output below

DROP TABLE tempCustOrders
Output of both the selects above:
ContactID OrdersCSV
100 51702, 57021, 63139, 69398
101 47431, 48369, 49528, 50744, 53589, 59017, 65279, 71899
102 43874, 44519, 46989, 48013, 49130, 50274, 51807, 57113, 63162, 69495
103 43691, 44315, 45072, 45811, 46663, 47715, 48787, 49887, 51144, 55310, 61247, 67318
104 43866, 44511, 45295, 46052, 46973, 47998, 49112, 50215, 51723, 57109, 63158, 69420
105 NULL
Note: If this were a temp (#) table, the function would also need to be created in tempdb, because the temp table belongs to tempdb. The tables referenced in the function would then also need the database name prefixed, i.e. [AdventureWorks].[Sales].[SalesOrderHeader]
2. Inline table-valued function: Returns a table, i.e. a record-set. The function body contains just a single TSQL SELECT statement, and its result set is what the function returns.
-- Example:
--// Create Inline table-valued UDF [dbo].[ufn_itv_GetContactSales]
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[ufn_itv_GetContactSales](@ContactID int)
RETURNS TABLE
AS
RETURN (
    SELECT h.[ContactID], h.[SalesOrderID], p.[ProductID], p.[Name], h.[OrderDate], h.[DueDate],
           h.[ShipDate], h.[TotalDue], h.[Status], h.[SalesPersonID]
    FROM Sales.SalesOrderHeader AS h
    JOIN Sales.SalesOrderDetail AS d
    ON d.SalesOrderID = h.SalesOrderID
    JOIN Production.Product AS p
    ON p.ProductID = d.ProductID
    WHERE ContactID = @ContactID
)
GO

--// Usage:
SELECT * FROM ufn_itv_GetContactSales(100)
3. Multistatement table-valued function: Also returns a table (record-set), but can contain multiple TSQL statements and is defined in a BEGIN...END block. The final set of rows is then returned.
-- Example:
--// Create Multistatement table-valued UDF [dbo].[ufn_mtv_GetContactSales]
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[ufn_mtv_GetContactSales](@ContactID int)
RETURNS @retSalesInfo TABLE (
    [ContactID] INT NOT NULL,
    [SalesOrderID] INT NULL,
    [ProductID] INT NULL,
    [Name] NVARCHAR(50) NULL,
    [OrderDate] DATETIME NULL,
    [DueDate] DATETIME NULL,
    [ShipDate] DATETIME NULL,
    [TotalDue] MONEY NULL,
    [Status] TINYINT NULL,
    [SalesPersonID] INT NULL)
AS
BEGIN
    IF @ContactID IS NOT NULL
    BEGIN
        INSERT @retSalesInfo
        SELECT h.[ContactID], h.[SalesOrderID], p.[ProductID], p.[Name], h.[OrderDate], h.[DueDate],
               h.[ShipDate], h.[TotalDue], h.[Status], h.[SalesPersonID]
        FROM Sales.SalesOrderHeader AS h
        JOIN Sales.SalesOrderDetail AS d
        ON d.SalesOrderID = h.SalesOrderID
        JOIN Production.Product AS p
        ON p.ProductID = d.ProductID
        WHERE ContactID = @ContactID
    END

    -- Return the recordsets
    RETURN
END
GO

--// Usage:
SELECT * FROM ufn_mtv_GetContactSales(100)
CROSS APPLY vs OUTER APPLY
UDFs can be used in queries at column level, at table level, and in column definitions while creating tables.
They can also be joined with other tables, but not with simple joins. They use a special join operator called APPLY.
According to MS BOL an APPLY operator allows you to invoke a table-valued function for each row returned by an outer table expression of a query. The table-valued function acts as the right input and the outer table expression acts as the left input. The right input is evaluated for each row from the left input and the rows produced are combined for the final output. The list of columns produced by the APPLY operator is the set of columns in the left input followed by the list of columns returned by the right input.
There are 2 forms of APPLY:
- CROSS APPLY acts like an INNER JOIN: it returns only the rows from the outer table that produce a result set from the table-valued function.
- OUTER APPLY acts like an OUTER JOIN: it returns both the rows that produce a result set and the rows that do not, with NULL values in the columns produced by the table-valued function.
Let's take 2 tables: Person.Contact and Sales.SalesOrderHeader
SELECT * FROM Person.Contact WHERE ContactID = 100
SELECT * FROM Sales.SalesOrderHeader WHERE ContactID = 100
Suppose you have a UDF that returns the sales order details of a particular contact. Now you want to use that UDF to find out which contacts have ordered what, along with other details. Let's see:
First, create a UDF to test with JOINs and APPLY:
--// Create Multiline UserDefinedFunction [dbo].[ufn_mtv_GetContactSales]
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[ufn_mtv_GetContactSales](@ContactID int)
RETURNS @retSalesInfo TABLE (
    [ContactID] INT NOT NULL,
    [SalesOrderID] INT NULL,
    [ProductID] INT NULL,
    [Name] NVARCHAR(50) NULL,
    [OrderDate] DATETIME NULL,
    [DueDate] DATETIME NULL,
    [ShipDate] DATETIME NULL,
    [TotalDue] MONEY NULL,
    [Status] TINYINT NULL,
    [SalesPersonID] INT NULL)
AS
BEGIN
    IF @ContactID IS NOT NULL
    BEGIN
        INSERT @retSalesInfo
        SELECT h.[ContactID], h.[SalesOrderID], p.[ProductID], p.[Name], h.[OrderDate], h.[DueDate],
               h.[ShipDate], h.[TotalDue], h.[Status], h.[SalesPersonID]
        FROM Sales.SalesOrderHeader AS h
        JOIN Sales.SalesOrderDetail AS d
        ON d.SalesOrderID = h.SalesOrderID
        JOIN Production.Product AS p
        ON p.ProductID = d.ProductID
        WHERE ContactID = @ContactID
    END

    -- Return the recordsets
    RETURN
END
GO

--// Test the UDF
SELECT * FROM dbo.ufn_mtv_GetContactSales(100)
Now try to JOIN the UDF with a table. The problem is that you need to supply a parameter, and it cannot be a column, only a value:
--// UDF with JOIN, try it out!!!
SELECT *
FROM Person.Contact c
JOIN dbo.ufn_mtv_GetContactSales(100) f -- You will have to pass the ContactID parameter, so no use of joining.
ON f.ContactID = c.ContactID
Testing with CROSS APPLY:
--// CROSS APPLY
-- 279 records (All matched records, 1 missing out of 280)
SELECT c.[ContactID], c.[FirstName], c.[LastName], c.[EmailAddress], c.[Phone], s.*
FROM Person.Contact AS c
CROSS APPLY ufn_mtv_GetContactSales(c.ContactID) AS s
WHERE c.ContactID between 100 and 105

-- Same equivalent query without cross apply, using JOINs
-- 279 records
SELECT c.[ContactID], c.[FirstName], c.[LastName], c.[EmailAddress], c.[Phone],
       h.[ContactID], h.[SalesOrderID], p.[ProductID], p.[Name], h.[OrderDate], h.[DueDate],
       h.[ShipDate], h.[TotalDue], h.[Status], h.[SalesPersonID]
FROM Person.Contact AS c
JOIN Sales.SalesOrderHeader AS h
ON c.ContactID = h.ContactID
JOIN Sales.SalesOrderDetail AS d
ON d.SalesOrderID = h.SalesOrderID
JOIN Production.Product AS p
ON p.ProductID = d.ProductID
WHERE c.ContactID between 100 and 105
Testing with OUTER APPLY:
--// OUTER APPLY
-- 280 records (All 280 records with 1 not matched)
SELECT c.[ContactID], c.[FirstName], c.[LastName], c.[EmailAddress], c.[Phone], s.*
FROM Person.Contact AS c
OUTER APPLY ufn_mtv_GetContactSales(c.ContactID) AS s
WHERE c.ContactID between 100 and 105

-- Same equivalent query without OUTER APPLY, using LEFT JOINs
-- 280 records
SELECT c.[ContactID], c.[FirstName], c.[LastName], c.[EmailAddress], c.[Phone],
       h.[ContactID], h.[SalesOrderID], p.[ProductID], p.[Name], h.[OrderDate], h.[DueDate],
       h.[ShipDate], h.[TotalDue], h.[Status], h.[SalesPersonID]
FROM Person.Contact AS c
LEFT JOIN Sales.SalesOrderHeader AS h
ON c.ContactID = h.ContactID
LEFT JOIN Sales.SalesOrderDetail AS d
ON d.SalesOrderID = h.SalesOrderID
LEFT JOIN Production.Product AS p
ON p.ProductID = d.ProductID
WHERE c.ContactID between 100 and 105
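APPLY is not limited to multistatement functions. As a small sketch, assuming the inline function dbo.ufn_itv_GetContactSales from the inline table-valued example has been created in the same database, the same pattern looks like this:

```sql
-- CROSS APPLY against the inline table-valued function (assumed to exist already).
SELECT c.[ContactID], c.[FirstName], c.[LastName],
       s.[SalesOrderID], s.[Name], s.[TotalDue]
FROM Person.Contact AS c
CROSS APPLY dbo.ufn_itv_GetContactSales(c.ContactID) AS s
WHERE c.ContactID BETWEEN 100 AND 105;

-- OUTER APPLY keeps contacts without any sales, with NULLs in the function's columns.
SELECT c.[ContactID], c.[FirstName], c.[LastName],
       s.[SalesOrderID], s.[Name], s.[TotalDue]
FROM Person.Contact AS c
OUTER APPLY dbo.ufn_itv_GetContactSales(c.ContactID) AS s
WHERE c.ContactID BETWEEN 100 AND 105;
```

An inline function is expanded into the calling query much like a view, whereas a multistatement function first fills its table variable. That can affect the plan the optimizer chooses. Which one performs better is worth checking on your own data.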
More Details about CROSS APPLY and OUTER APPLY
My first introduction to the APPLY operator was using the DMVs. For quite a while after first being introduced, I didn’t understand it or see a use for it. While it is undeniable that it has some required uses when dealing with table valued functions, its other uses evaded me for a while. Luckily, I started seeing some code that used it outside of table valued functions. It finally struck me that it could be used as a replacement for correlated subqueries and derived tables. That’s what we’ll discuss today.
I never liked correlated subqueries because it always seemed like adding full blown queries in the select list was confusing and improper.
SELECT
SalesOrderID = soh.SalesOrderID
,OrderDate = soh.OrderDate
,MaxUnitPrice = (SELECT MAX(sod.UnitPrice) FROM Sales.SalesOrderDetail sod WHERE soh.SalesOrderID = sod.SalesOrderID)
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
It always seemed to me that these operations should go below the FROM clause. So to get around this, I would typically create a derived table. Which didn’t completely feel right either, but it was still just a bit cleaner:
SELECT
soh.SalesOrderID
,soh.OrderDate
,sod.max_unit_price
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
JOIN
(
SELECT
max_unit_price = MAX(sod.UnitPrice),
SalesOrderID
FROM Sales.SalesOrderDetail AS sod
GROUP BY sod.SalesOrderID
) sod
ON sod.SalesOrderID = soh.SalesOrderID
What made this ugly was the need to use the GROUP BY clause because we could not correlate. Also, even though SQL Server almost always generates the same execution plan as for a correlated subquery, there were times when the logic inside the derived table got so complex that the optimizer would not limit the result set of the derived table by inferring the correlation first. This sometimes made this kind of query impractical.
Luckily, this is where the CROSS APPLY steps in so nicely. It gives us the best of both worlds by allowing us to correlate AND not have the query embedded in the select list:
SELECT
soh.SalesOrderID
,soh.OrderDate
,sod.max_unit_price
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
CROSS APPLY
(
SELECT
max_unit_price = MAX(sod.UnitPrice)
FROM Sales.SalesOrderDetail AS sod
WHERE soh.SalesOrderID = sod.SalesOrderID
) sod
The other advantage this has over the correlated subquery is that when we want to add more columns to our SELECT list, we do not have to repeat the entire query. We still have it in one place, making it somewhat modular. So instead of this:
SELECT
SalesOrderID = soh.SalesOrderID
,OrderDate = soh.OrderDate
,MaxUnitPrice = (SELECT MAX(sod.UnitPrice) FROM Sales.SalesOrderDetail sod WHERE soh.SalesOrderID = sod.SalesOrderID) -- 1
,SumLineTotal = (SELECT SUM(LineTotal) FROM Sales.SalesOrderDetail sod WHERE soh.SalesOrderID = sod.SalesOrderID) -- 2
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
We have this:
SELECT
soh.SalesOrderID
,soh.OrderDate
,sod.max_unit_price
,sod.sum_line_total
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
CROSS APPLY
(
SELECT
max_unit_price = MAX(sod.UnitPrice)
,sum_line_total = SUM(sod.LineTotal)
FROM Sales.SalesOrderDetail AS sod
WHERE soh.SalesOrderID = sod.SalesOrderID
) sod
As for the execution plans, in my experience CROSS APPLY has always won. Not always by a lot, but it still wins.
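Plan comparisons like this depend on the data and the indexes, so it is worth measuring on your own system. One way to do that, sketched here on the assumption that the AdventureWorks tables used above are available, is to switch on SET STATISTICS IO and SET STATISTICS TIME and run both forms in the same session:

```sql
SET STATISTICS IO ON
SET STATISTICS TIME ON
GO

-- Form 1: correlated subquery in the select list
SELECT
    soh.SalesOrderID
    ,MaxUnitPrice = (SELECT MAX(sod.UnitPrice)
                     FROM AdventureWorks.Sales.SalesOrderDetail AS sod
                     WHERE soh.SalesOrderID = sod.SalesOrderID)
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
GO

-- Form 2: CROSS APPLY
SELECT
    soh.SalesOrderID
    ,sod.max_unit_price
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
CROSS APPLY
(
    SELECT max_unit_price = MAX(sod.UnitPrice)
    FROM AdventureWorks.Sales.SalesOrderDetail AS sod
    WHERE soh.SalesOrderID = sod.SalesOrderID
) sod
GO

SET STATISTICS IO OFF
SET STATISTICS TIME OFF
GO
```

The messages output then reports logical reads and CPU and elapsed time for each statement, which gives a simple basis for comparing the two forms alongside the actual execution plans.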
So what is OUTER APPLY? It’s equivalent to a left join on the derived table.
SELECT
soh.SalesOrderID
,soh.OrderDate
,sod.max_unit_price
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
LEFT JOIN
(
SELECT
max_unit_price = MAX(sod.UnitPrice),
SalesOrderID
FROM Sales.SalesOrderDetail AS sod
GROUP BY sod.SalesOrderID
) sod
ON sod.SalesOrderID = soh.SalesOrderID
The same query written with OUTER APPLY:
SELECT
soh.SalesOrderID
,soh.OrderDate
,sod.max_unit_price
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
OUTER APPLY
(
SELECT
max_unit_price = MAX(sod.UnitPrice)
FROM Sales.SalesOrderDetail AS sod
WHERE soh.SalesOrderID = sod.SalesOrderID
) sod
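One caveat is worth sketching. In the aggregate examples above the applied subquery has no GROUP BY, so it always returns exactly one row (with NULL when no detail rows match). CROSS APPLY and OUTER APPLY would therefore return the same rows there. The two forms only differ when the right side can return zero rows. The sketch below assumes the same AdventureWorks tables and uses a hypothetical alias, top_line. It picks the most expensive detail line per order with TOP (1), so an order without detail rows would be dropped by CROSS APPLY but kept by OUTER APPLY.

```sql
-- CROSS APPLY: an order with no detail rows produces no output row.
SELECT
    soh.SalesOrderID
    ,soh.OrderDate
    ,top_line.ProductID
    ,top_line.UnitPrice
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
CROSS APPLY
(
    SELECT TOP (1)
        sod.ProductID
        ,sod.UnitPrice
    FROM AdventureWorks.Sales.SalesOrderDetail AS sod
    WHERE sod.SalesOrderID = soh.SalesOrderID
    ORDER BY sod.UnitPrice DESC
) AS top_line;

-- OUTER APPLY: the same order is kept, with NULL in ProductID and UnitPrice.
SELECT
    soh.SalesOrderID
    ,soh.OrderDate
    ,top_line.ProductID
    ,top_line.UnitPrice
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
OUTER APPLY
(
    SELECT TOP (1)
        sod.ProductID
        ,sod.UnitPrice
    FROM AdventureWorks.Sales.SalesOrderDetail AS sod
    WHERE sod.SalesOrderID = soh.SalesOrderID
    ORDER BY sod.UnitPrice DESC
) AS top_line;
```

This "top row per group" shape is also something a plain derived table cannot express directly, because the TOP has to be evaluated once per outer row.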