Eliminating Duplicates with DISTINCT

Accessing and Changing Relational Data

Accessing and Changing Relational Data

Eliminating Duplicates with DISTINCT

The DISTINCT keyword eliminates duplicate rows from the results of a SELECT statement. If DISTINCT is not specified, all rows are returned, including duplicates. For example, if you select all the author IDs in titleauthor without DISTINCT, the following rows are returned (with some duplicate listings):

USE pubs
SELECT au_id
FROM titleauthor

Here is the result set:

au_id       
----------- 
172-32-1176 
213-46-8915 
213-46-8915 
238-95-7766 
267-41-2394 
267-41-2394 
274-80-9391 
409-56-7008 
427-17-2319 
472-27-2349 
486-29-1786 
486-29-1786 
648-92-1872 
672-71-3249 
712-45-1867 
722-51-5454 
724-80-9391 
724-80-9391 
756-30-7391 
807-91-6654 
846-92-7186 
899-46-2035 
899-46-2035 
998-72-3567 
998-72-3567 

(25 row(s) affected)

With DISTINCT, you can eliminate duplicates and see only the unique author IDs:

USE pubs
SELECT DISTINCT au_id
FROM titleauthor

Here is the result set:

au_id       
----------- 
172-32-1176 
213-46-8915 
238-95-7766 
267-41-2394 
274-80-9391 
409-56-7008 
427-17-2319 
472-27-2349 
486-29-1786 
648-92-1872 
672-71-3249 
712-45-1867 
722-51-5454 
724-80-9391 
756-30-7391 
807-91-6654 
846-92-7186 
899-46-2035 
998-72-3567 

(19 row(s) affected)

Important  The output for statements involving DISTINCT depends on the collation of the column or expression on which the DISTINCT is applied. For more information about the effects of different collations, see SQL Server Collation Fundamentals.

For the DISTINCT keyword, null values are considered to be duplicates of each other. When DISTINCT is included in a SELECT statement, only one NULL is returned in the results, regardless of how many null values are encountered.

Note  For compatibility with the SQL-92 standard and other implementations of Microsoft® SQL Server™, the ALL keyword can explicitly ask for all rows. However, there is no need to specify ALL because it is the default.