Grouping Rows with GROUP BY

Accessing and Changing Relational Data

Accessing and Changing Relational Data

Grouping Rows with GROUP BY

The GROUP BY clause is used to produce aggregate values for each row in the result set. When used without a GROUP BY clause, aggregate functions report only one aggregate value for a SELECT statement.

This example returns the number of units sold for each product in category 2:

USE Northwind
SELECT OrdD.ProductID AS ProdID,
       SUM(OrdD.Quantity) AS AmountSold
FROM [Order Details] AS OrdD JOIN Products as Prd
     ON OrdD.ProductID = Prd.ProductID
     AND Prd.CategoryID = 2
GROUP BY OrdD.ProductID

Here is the result set:

ProdID      AmountSold  
----------- ----------- 
3           328         
4           453         
5           298         
6           301         
8           372         
15          122         
44          601         
61          603         
63          445         
65          745         
66          239         
77          791         

(12 row(s) affected)

The GROUP BY keywords are followed by a list of columns, known as the grouping columns. The GROUP BY clause restricts the rows of the result set; there is only one row for each distinct value in the grouping column or columns. Each result set row contains summary data related to the specific value in its grouping columns.

There are restrictions on the items that can be specified in the select list when a SELECT statement contains a GROUP BY. Items allowed in the select list are:

  • The grouping columns.

  • Expressions that return only one value for each value in the grouping columns, such as aggregate functions that have a column name as one of their parameters. These are known as vector aggregates.

For example, TableX contains:

ColumnA ColumnB ColumnC
------- ------- -------
1 abc 5
1 def 4
1 ghi 9
2 jkl 8
2 mno 3

If ColumnA is the grouping column, there will be two rows in the result set, one summarizing the information for the value 1, and the other summarizing the information for value 2.

When ColumnA is the grouping column, the only way ColumnB or ColumnC can be referenced is if they are parameters in an aggregate function that can return a single value for each value in ColumnA. It is legal for the select list to include expressions such as MAX(ColumnB), SUM(ColumnC), or AVG(ColumnC):

SELECT ColumnA,
       MAX(ColumnB) AS MaxB,
       SUM(ColumnC) AS SumC
FROM TableX
GROUP BY ColumnA

This select returns two rows, one for each unique value in ColumnA:

ColumnA     MaxB SumC        
----------- ---- ----------- 
1           ghi  18          
2           mno  11          

(2 row(s) affected)

It is not legal, however, to have just the expression ColumnB in the select list:

SELECT ColumnA,
       ColumnB,
       SUM(ColumnC) AS SumC
FROM TableX
GROUP BY ColumnA

Because the GROUP BY can return only one row with a value of 1 in ColumnA, there is no way to return the three values of ColumnB (abc, def, and ghi) associated with the value 1 in ColumnA.

You cannot use GROUP BY or HAVING on ntext,text,image, or bit columns unless they are in a function that returns a value having another data type. Examples of such functions are SUBSTRING and CAST.

See Also

CAST and CONVERT

SUBSTRING

SELECT