Grouping Rows with GROUP BY
The GROUP BY clause is used to produce aggregate values for each row in the result set. When used without a GROUP BY clause, aggregate functions report only one aggregate value for a SELECT statement.
This example returns the number of units sold for each product in category 2:
USE Northwind
SELECT OrdD.ProductID AS ProdID,
SUM(OrdD.Quantity) AS AmountSold
FROM [Order Details] AS OrdD JOIN Products as Prd
ON OrdD.ProductID = Prd.ProductID
AND Prd.CategoryID = 2
GROUP BY OrdD.ProductID
Here is the result set:
ProdID AmountSold
----------- -----------
3 328
4 453
5 298
6 301
8 372
15 122
44 601
61 603
63 445
65 745
66 239
77 791
(12 row(s) affected)
The GROUP BY keywords are followed by a list of columns, known as the grouping columns. The GROUP BY clause restricts the rows of the result set; there is only one row for each distinct value in the grouping column or columns. Each result set row contains summary data related to the specific value in its grouping columns.
There are restrictions on the items that can be specified in the select list when a SELECT statement contains a GROUP BY. Items allowed in the select list are:
- The grouping columns.
- Expressions that return only one value for each value in the grouping columns, such as aggregate functions that have a column name as one of their parameters. These are known as vector aggregates.
For example, TableX contains:
ColumnA |
ColumnB |
ColumnC |
------- |
------- |
------- |
1 |
abc |
5 |
1 |
def |
4 |
1 |
ghi |
9 |
2 |
jkl |
8 |
2 |
mno |
3 |
If ColumnA is the grouping column, there will be two rows in the result set, one summarizing the information for the value 1, and the other summarizing the information for value 2.
When ColumnA is the grouping column, the only way ColumnB or ColumnC can be referenced is if they are parameters in an aggregate function that can return a single value for each value in ColumnA. It is legal for the select list to include expressions such as MAX(ColumnB), SUM(ColumnC), or AVG(ColumnC):
SELECT ColumnA,
MAX(ColumnB) AS MaxB,
SUM(ColumnC) AS SumC
FROM TableX
GROUP BY ColumnA
This select returns two rows, one for each unique value in ColumnA:
ColumnA MaxB SumC
----------- ---- -----------
1 ghi 18
2 mno 11
(2 row(s) affected)
It is not legal, however, to have just the expression ColumnB in the select list:
SELECT ColumnA,
ColumnB,
SUM(ColumnC) AS SumC
FROM TableX
GROUP BY ColumnA
Because the GROUP BY can return only one row with a value of 1 in ColumnA, there is no way to return the three values of ColumnB (abc, def, and ghi) associated with the value 1 in ColumnA.
You cannot use GROUP BY or HAVING on ntext,text,image, or bit columns unless they are in a function that returns a value having another data type. Examples of such functions are SUBSTRING and CAST.