summaryBy(doBy) | R Documentation |

## Function to calculate groupwise summary statistics

### Description

Calculates groupwise summary statistics, much like the SUMMARY procedure of SAS.

### Usage

    summaryBy(formula, data = parent.frame(), id = NULL, FUN = mean,
              keep.names = FALSE, p2d = FALSE, order = TRUE, ...)

### Arguments

| Argument | Description |
|---|---|
| `formula` | A formula object; see Examples. |
| `data` | A data frame. |
| `id` | A formula specifying variables that the data are not grouped by but that should appear in the output; see Examples. |
| `FUN` | A function, or a list of functions, to be applied; see Examples. |
| `keep.names` | If TRUE and there is only ONE function in `FUN`, the variables in the output have the same names as the variables in the input; see Examples. |
| `p2d` | Should parentheses in output variable names be replaced by dots? |
| `order` | Should the resulting data frame be ordered by the variables on the right-hand side of the formula (using `orderBy`)? |
| `...` | Additional arguments passed to the functions in `FUN`, for example arguments controlling how NAs are handled. |

### Details

Extra arguments (`...`) are passed on to every function in `FUN`. Care must therefore be taken that all functions in `FUN` accept these arguments; alternatively, one can write a wrapper function that forwards them only where they are accepted. This is particularly an issue when handling NAs. See Examples.
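The NA-handling pitfall described above can be reproduced without doBy. The following is a minimal base-R sketch, not part of the package: the vector `x` and the helper `n_obs` are made up for illustration, while `sumfun` mirrors the wrapper used in the Examples section.

```r
# Illustrative data (not from the package): one missing value.
x <- c(1, 2, NA, 4)

# mean() and var() accept na.rm, but length() takes exactly one argument,
# so passing an extra argument to it fails. The error message is captured
# here instead of stopping the script.
err <- tryCatch(length(x, na.rm = TRUE),
                error = function(e) conditionMessage(e))

# Wrapper that forwards '...' only to the functions that understand it.
sumfun <- function(x, ...) {
  c(m = mean(x, ...), v = var(x, ...), l = length(x))
}
res <- sumfun(x, na.rm = TRUE)

# Hypothetical helper: count only the non-missing observations.
# It accepts '...' and ignores it, so extra arguments do no harm.
n_obs <- function(x, ...) sum(!is.na(x))

print(err)
print(res)
print(n_obs(x, na.rm = TRUE))
```

In this sketch the `l` component counts every element, including the NA. If the number of non-missing values is what is wanted, a helper along the lines of `n_obs` could be used in place of `length`.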

Some code for this function has been suggested by Jim Robison-Cox.

### Value

A data frame

### Author(s)

Søren Højsgaard

### See Also

`orderBy`, `transformBy`, `splitBy`, `lapplyBy`

### Examples

    library(doBy)

    data(dietox)
    dietox12 <- subset(dietox, Time == 12)

    summaryBy(Weight + Feed ~ Evit + Cu, data = dietox12,
              FUN = c(mean, var, length))

    summaryBy(Weight + Feed ~ Evit + Cu + Time, data = subset(dietox, Time > 1),
              FUN = c(mean, var, length))

    ## Calculations on transformed data:
    summaryBy(log(Weight) + Feed ~ Evit + Cu, data = dietox12)

    ## Calculations on all numerical variables (not mentioned elsewhere):
    summaryBy(. ~ Evit + Cu, data = dietox12, id = ~Litter, FUN = mean)

    ## There are missing values in the 'airquality' data, so we remove these
    ## before calculating mean and variance with 'na.rm=TRUE'. However, the
    ## length function does not accept any such argument. Hence we get
    ## around this by defining our own summary function in which length is
    ## not supplied with this argument while mean and var are:
    sumfun <- function(x, ...) {
      c(m = mean(x, ...), v = var(x, ...), l = length(x))
    }
    summaryBy(Ozone + Solar.R ~ Month, data = airquality, FUN = sumfun, na.rm = TRUE)

    ## Using '.' on the right hand side of a formula means to stratify by
    ## all variables not used elsewhere:
    data(warpbreaks)
    summaryBy(breaks ~ wool + tension, warpbreaks)
    summaryBy(breaks ~ ., warpbreaks)
    summaryBy(. ~ wool + tension, warpbreaks)

    ## Keep the names of the variables (works only if FUN only returns one
    ## value):
    summaryBy(Ozone + Wind ~ Month, data = airquality, FUN = c(mean), na.rm = TRUE,
              keep.names = TRUE)

[Package *doBy* version 3.0]