12.10. Functions and Modifiers for Use with GROUP BY Clauses

MySQL 5.0

12.10. Functions and Modifiers for Use with GROUP BY Clauses

12.10.1. GROUP BY (Aggregate) Functions

This section describes group (aggregate) functions that operate on sets of values. Unless otherwise stated, group functions ignore values.

If you use a group function in a statement containing no clause, it is equivalent to grouping on all rows.

The and aggregate functions do not work with temporal values. (They convert the values to numbers, which loses the part after the first non-numeric character.) To work around this problem, you can convert to numeric units, perform the aggregate operation, and convert back to a temporal value. Examples:

SELECT SEC_TO_TIME(SUM(TIME_TO_SEC())) FROM ;
SELECT FROM_DAYS(SUM(TO_DAYS())) FROM ;
  • )

    Returns the average value of . The option can be used as of MySQL 5.0.3 to return the average of the distinct values of .

    returns if there were no matching rows.

    mysql> 
        ->        
        ->        
    
  • )

    Returns the bitwise of all bits in . The calculation is performed with 64-bit () precision.

    This function returns if there were no matching rows. (This is the value of an unsigned value with all bits set to 1.)

  • )

    Returns the bitwise of all bits in . The calculation is performed with 64-bit () precision.

    This function returns if there were no matching rows.

  • )

    Returns the bitwise of all bits in . The calculation is performed with 64-bit () precision.

    This function returns if there were no matching rows.

  • )

    Returns a count of the number of non- values in the rows retrieved by a statement. The result is a value.

    returns if there were no matching rows.

    mysql> 
        ->        
        ->        
        ->        
    

    is somewhat different in that it returns a count of the number of rows retrieved, whether or not they contain values.

    is optimized to return very quickly if the retrieves from one table, no other columns are retrieved, and there is no clause. For example:

    mysql> 
    

    This optimization applies only to tables only, because an exact row count is stored for this storage engine and can be accessed very quickly. For transactional storage engines such as and , storing an exact row count is more problematic because multiple transactions may be occurring, each of which may affect the count.

  • ,[...])

    Returns a count of the number of different non- values.

    returns if there were no matching rows.

    mysql> 
    

    In MySQL, you can obtain the number of distinct expression combinations that do not contain by giving a list of expressions. In standard SQL, you would have to do a concatenation of all expressions inside .

  • )

    This function returns a string result with the concatenated non- values from a group. It returns if there are no non- values. The full syntax is as follows:

    GROUP_CONCAT([DISTINCT]  [, ...]
                 [ORDER BY { |  | }
                     [ASC | DESC] [, ...]]
                 [SEPARATOR ])
    
    mysql> 
        ->     
        ->     
        ->     
    

    Or:

    mysql> 
        ->     
        ->               
        ->     
        ->     
    

    In MySQL, you can get the concatenated values of expression combinations. You can eliminate duplicate values by using . If you want to sort values in the result, you should use clause. To sort in reverse order, add the (descending) keyword to the name of the column you are sorting by in the clause. The default is ascending order; this may be specified explicitly using the keyword. is followed by the string value that should be inserted between values of result. The default is a comma (‘’). You can eliminate the separator altogether by specifying .

    You can set a maximum allowed length with the system variable. (The default value is 1024.) The syntax to do this at runtime is as follows, where is an unsigned integer:

    SET [SESSION | GLOBAL] group_concat_max_len = ;
    

    If a maximum length has been set, the result is truncated to this maximum length.

    Beginning with MySQL 5.0.19, the type returned by is always unless is greater than 512, in which case, it returns a . (Previously, it returned a with greater than 512 only if the query included an clause.)

    See also and : Section 12.3, “String Functions”.

  • ), )

    Returns the minimum or maximum value of . and may take a string argument; in such cases they return the minimum or maximum string value. See Section 7.4.5, “How MySQL Uses Indexes”. The keyword can be used to find the minimum or maximum of the distinct values of , however, this produces the same result as omitting .

    and return if there were no matching rows.

    mysql> 
        ->        
        ->        
    

    For , , and other aggregate functions, MySQL currently compares and columns by their string value rather than by the string's relative position in the set. This differs from how compares them. This is expected to be rectified in a future MySQL release.

  • ) )

    Returns the population standard deviation of . This is an extension to standard SQL. The form of this function is provided for compatibility with Oracle. As of MySQL 5.0.3, the standard SQL function can be used instead.

    These functions return if there were no matching rows.

  • )

    Returns the population standard deviation of (the square root of ). This function was added in MySQL 5.0.3. Before 5.0.3, you can use or , which are equivalent but not standard SQL.

    returns if there were no matching rows.

  • )

    Returns the sample standard deviation of (the square root of . This function was added in MySQL 5.0.3.

    returns if there were no matching rows.

  • )

    Returns the sum of . If the return set has no rows, returns . The keyword can be used in MySQL 5.0 to sum only the distinct values of .

    returns if there were no matching rows.

  • )

    Returns the population standard variance of . It considers rows as the whole population, not as a sample, so it has the number of rows as the denominator. This function was added in MySQL 5.0.3. Before 5.0.3, you can use , which is equivalent but is not standard SQL.

    returns if there were no matching rows.

  • )

    Returns the sample variance of . That is, the denominator is the number of rows minus one. This function was added in MySQL 5.0.3.

    returns if there were no matching rows.

  • )

    Returns the population standard variance of . This is an extension to standard SQL. As of MySQL 5.0.3, the standard SQL function can be used instead.

    returns if there were no matching rows.

12.10.2. GROUP BY Modifiers

The clause allows a modifier that causes extra rows to be added to the summary output. These rows represent higher-level (or super-aggregate) summary operations. thus allows you to answer questions at multiple levels of analysis with a single query. It can be used, for example, to provide support for OLAP (Online Analytical Processing) operations.

Suppose that a table named has , , , and columns for recording sales profitability:

CREATE TABLE sales
(
    year    INT NOT NULL,
    country VARCHAR(20) NOT NULL,
    product VARCHAR(32) NOT NULL,
    profit  INT
);

The table's contents can be summarized per year with a simple like this:

mysql> 
+------+-------------+
| year | SUM(profit) |
+------+-------------+
| 2000 |        4525 |
| 2001 |        3010 |
+------+-------------+

This output shows the total profit for each year, but if you also want to determine the total profit summed over all years, you must add up the individual values yourself or run an additional query.

Or you can use , which provides both levels of analysis with a single query. Adding a modifier to the clause causes the query to produce another row that shows the grand total over all year values:

mysql> 
+------+-------------+
| year | SUM(profit) |
+------+-------------+
| 2000 |        4525 |
| 2001 |        3010 |
| NULL |        7535 |
+------+-------------+

The grand total super-aggregate line is identified by the value in the column.

has a more complex effect when there are multiple columns. In this case, each time there is a “break” (change in value) in any but the last grouping column, the query produces an extra super-aggregate summary row.

For example, without , a summary on the table based on , , and might look like this:

mysql> 
    -> 
    -> 
+------+---------+------------+-------------+
| year | country | product    | SUM(profit) |
+------+---------+------------+-------------+
| 2000 | Finland | Computer   |        1500 |
| 2000 | Finland | Phone      |         100 |
| 2000 | India   | Calculator |         150 |
| 2000 | India   | Computer   |        1200 |
| 2000 | USA     | Calculator |          75 |
| 2000 | USA     | Computer   |        1500 |
| 2001 | Finland | Phone      |          10 |
| 2001 | USA     | Calculator |          50 |
| 2001 | USA     | Computer   |        2700 |
| 2001 | USA     | TV         |         250 |
+------+---------+------------+-------------+

The output indicates summary values only at the year/country/product level of analysis. When is added, the query produces several extra rows:

mysql> 
    -> 
    -> 
+------+---------+------------+-------------+
| year | country | product    | SUM(profit) |
+------+---------+------------+-------------+
| 2000 | Finland | Computer   |        1500 |
| 2000 | Finland | Phone      |         100 |
| 2000 | Finland | NULL       |        1600 |
| 2000 | India   | Calculator |         150 |
| 2000 | India   | Computer   |        1200 |
| 2000 | India   | NULL       |        1350 |
| 2000 | USA     | Calculator |          75 |
| 2000 | USA     | Computer   |        1500 |
| 2000 | USA     | NULL       |        1575 |
| 2000 | NULL    | NULL       |        4525 |
| 2001 | Finland | Phone      |          10 |
| 2001 | Finland | NULL       |          10 |
| 2001 | USA     | Calculator |          50 |
| 2001 | USA     | Computer   |        2700 |
| 2001 | USA     | TV         |         250 |
| 2001 | USA     | NULL       |        3000 |
| 2001 | NULL    | NULL       |        3010 |
| NULL | NULL    | NULL       |        7535 |
+------+---------+------------+-------------+

For this query, adding causes the output to include summary information at four levels of analysis, not just one. Here's how to interpret the output:

  • Following each set of product rows for a given year and country, an extra summary row is produced showing the total for all products. These rows have the column set to .

  • Following each set of rows for a given year, an extra summary row is produced showing the total for all countries and products. These rows have the and columns set to .

  • Finally, following all other rows, an extra summary row is produced showing the grand total for all years, countries, and products. This row has the , , and columns set to .

Other Considerations When using

The following items list some behaviors specific to the MySQL implementation of :

When you use , you cannot also use an clause to sort the results. In other words, and are mutually exclusive. However, you still have some control over sort order. in MySQL sorts results, and you can use explicit and keywords with columns named in the list to specify sort order for individual columns. (The higher-level summary rows added by still appear after the rows from which they are calculated, regardless of the sort order.)

can be used to restrict the number of rows returned to the client. is applied after , so the limit applies against the extra rows added by . For example:

mysql> 
    -> 
    -> 
    -> 
+------+---------+------------+-------------+
| year | country | product    | SUM(profit) |
+------+---------+------------+-------------+
| 2000 | Finland | Computer   |        1500 |
| 2000 | Finland | Phone      |         100 |
| 2000 | Finland | NULL       |        1600 |
| 2000 | India   | Calculator |         150 |
| 2000 | India   | Computer   |        1200 |
+------+---------+------------+-------------+

Using with may produce results that are more difficult to interpret, because you have less context for understanding the super-aggregate rows.

The indicators in each super-aggregate row are produced when the row is sent to the client. The server looks at the columns named in the clause following the leftmost one that has changed value. For any column in the result set with a name that is a lexical match to any of those names, its value is set to . (If you specify grouping columns by column number, the server identifies which columns to set to by number.)

Because the values in the super-aggregate rows are placed into the result set at such a late stage in query processing, you cannot test them as values within the query itself. For example, you cannot add to the query to eliminate from the output all but the super-aggregate rows.

On the other hand, the values do appear as on the client side and can be tested as such using any MySQL client programming interface.

12.10.3. GROUP BY and HAVING with Hidden Fields

MySQL extends the use of so that you can use non-aggregated columns or calculations in the list that do not appear in the clause. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. For example, you do not need to group on in the following query:

SELECT order.custid, customer.name, MAX(payments)
  FROM order,customer
  WHERE order.custid = customer.custid
  GROUP BY order.custid;

In standard SQL, you would have to add to the clause. In MySQL, the name is redundant.

Do not use this feature if the columns you omit from the part are not constant in the group. The server is free to return any value from the group, so the results are indeterminate unless all values are the same.

A similar MySQL extension applies to the clause. The SQL standard does not allow the clause to name any column that is not found in the clause if it is not enclosed in an aggregate function. MySQL allows the use of such columns to simplify calculations. This extension assumes that the non-grouped columns will have the same group-wise values. Otherwise, the result is indeterminate.

If the SQL mode is enabled, the MySQL extension to does not apply. That is, columns not named in the clause cannot be used in the list or clause if not used in an aggregate function.

The select list extension also applies to . That is, you can use non-aggregated columns or calculations in the clause that do not appear in the clause. This extension does not apply if the SQL mode is enabled.

In some cases, you can use and to obtain a specific column value even if it isn't unique. The following gives the value of from the row containing the smallest value in the column:

SUBSTR(MIN(CONCAT(RPAD(sort,6,' '),column)),7)

See Section 3.6.4, “The Rows Holding the Group-wise Maximum of a Certain Field”.

Note that if you are trying to follow standard SQL, you can't use expressions in clauses. You can work around this limitation by using an alias for the expression:

SELECT id,FLOOR(value/100) AS val
  FROM 
  GROUP BY id, val;

MySQL does allow expressions in clauses. For example:

SELECT id,FLOOR(value/100)
  FROM 
  GROUP BY id, FLOOR(value/100);