Resampling-based false discovery rate controlling multiple test procedures for correlated test statistics

https://doi.org/10.1016/S0378-3758(99)00041-5

Abstract

A new false discovery rate controlling procedure is proposed for multiple hypotheses testing. The procedure makes use of resampling-based p-value adjustment, and is designed to cope with correlated test statistics. Some properties of the proposed procedure are investigated theoretically, and further properties are investigated using a simulation study. According to the results of the simulation study, the new procedure offers false discovery rate control and greater power. The motivation for developing this resampling-based procedure was an actual problem in meteorology, in which almost 2000 hypotheses are tested simultaneously using highly correlated test statistics. When applied to this problem the increase in power was evident. The same procedure can be used in many other large problems of multiple testing, for example multiple endpoints. The procedure is also extended to serve as a general diagnostic tool in model selection.

Introduction

The common approach in simultaneous testing of multiple hypotheses is to construct a multiple comparison procedure (MCP) (Hochberg and Tamhane, 1987) that controls the probability of making one or more type I errors, the familywise error rate (FWE). Benjamini and Hochberg (1995) introduced another measure of the erroneous rejection of true null hypotheses, namely the false discovery rate (FDR). The FDR is the expected proportion of true null hypotheses which are erroneously rejected, out of the total number of hypotheses rejected. When some of the tested hypotheses are in fact false, FDR control is less strict than FWE control; therefore FDR-controlling MCPs are potentially more powerful. While there are situations in which FWE control is needed, in other cases FDR control is sufficient.
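To make the two error measures concrete, the following minimal sketch computes, for a single realisation, the proportion of false rejections (whose expectation over repeated experiments is the FDR) and the indicator of at least one false rejection (whose probability is the FWE). The function names and the toy data are illustrative assumptions; the convention that the proportion is 0 when nothing is rejected follows Benjamini and Hochberg (1995).

```python
from typing import Sequence


def false_discovery_proportion(rejected: Sequence[bool],
                               null_is_true: Sequence[bool]) -> float:
    """V/R for one realisation; defined as 0 when no hypothesis is rejected."""
    if len(rejected) != len(null_is_true):
        raise ValueError("inputs must have equal length")
    r = sum(1 for x in rejected if x)                               # total rejections R
    v = sum(1 for x, t in zip(rejected, null_is_true) if x and t)   # false rejections V
    return v / r if r > 0 else 0.0


def any_false_rejection(rejected: Sequence[bool],
                        null_is_true: Sequence[bool]) -> bool:
    """The event whose probability is the FWE: at least one type I error."""
    return any(x and t for x, t in zip(rejected, null_is_true))


# Toy realisation (made-up): six hypotheses, the first three nulls are true.
null_is_true = [True, True, True, False, False, False]
rejected = [True, False, False, True, True, True]

fdp = false_discovery_proportion(rejected, null_is_true)
fwe_event = any_false_rejection(rejected, null_is_true)
print(fdp, fwe_event)
```

In this toy case one of the four rejections is false, so the FWE event has occurred even though only a quarter of the discoveries are erroneous.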

The correlation map, a data analytic tool used in meteorology, is a case in point. For example, the correlation between mean January pressure and January precipitation in Israel over some 40 years is estimated at 1977 grid points over the northern hemisphere, and drawn on a map with the aid of iso-correlation lines (Manes, 1994). On this map, correlation centers are identified, and their orientation and location are analyzed to provide synopticians with the insight needed to construct forecasting schemes. If we treat the correlation map as (partially) a problem of testing independence at 1977 locations, we immediately encounter the major difficulty. Applying no control of multiplicity at all, which is the ongoing practice, would result in many spurious correlation centers. But since the multiplicity problem is large, we should be careful about loss of power.

If such centers are identified we can bear a few erroneous ones, as long as they are a small proportion of those identified. If all we face is noise we need full protection. Thus FDR control offers the appropriate mode of protection. Moreover, using data on an even finer grid is highly disadvantageous if we take the traditional approach to multiplicity control, although it is obviously advantageous from a meteorological point of view. A finer grid increases the number of true and non-true null hypotheses approximately by the same proportion. Because the FDR is the proportion of true null hypotheses rejected among the rejected, FDR controlling MCPs should approximately retain their power as resolution increases.

The major problem we still face is that the test statistics are highly correlated. So far, all FDR controlling procedures were designed in the realm of independent test statistics. Most were shown to control the FDR even in cases of dependency (Benjamini et al., 1995; Benjamini and Yekutieli, 1997), but they were not designed to make use of the dependency structure in order to gain more power when possible. Here we design new FDR controlling procedures for general dependent test statistics by developing a resampling approach along the lines of Westfall and Young (1993) for FWE control. This new procedure is not specific to the meteorological problem, and can be modified to tackle many problems where dependency is suspected to be high, yet of unknown structure. An important example of such a problem is multiple endpoints in clinical trials, reviewed in this issue by Wassmer et al. (1997).

Combining FDR control and resampling is not a straightforward matter. When designing resampling-based p-value adjustments, Westfall and Young relied on the fact that the probability of making any type I error depends on the distribution of the true null hypotheses only, and treating more than necessary as true is always conservative in terms of FWE. This is not generally true for FDR control: the FDR depends on the distribution of both the true and false null hypotheses, and failing to reject a false null hypothesis can make the FDR larger.

The approach taken here is as follows: we limit ourselves to a family of “generic” MCPs, each of which rejects a hypothesis if its p-value is less than p. For each p we estimate the FDR of the generic MCP; these estimates are the FDR local estimators. As a “global” q level FDR controlling MCP we suggest the most powerful of the generic MCPs whose FDR local estimate is less than q. The resulting MCP is adaptive, since the FDR local estimators are based on the set of observed p-values. It is also a step-up procedure, since the specific generic MCP is chosen in a step-up fashion. Unlike the stepwise resampling procedures suggested by Westfall and Young (1993) and Troendle (1995), in which the entire resampling simulation is repeated in a step-up fashion, the resampling simulation in our proposed method is performed only once on the entire set of hypotheses.
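The selection rule described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the local estimator is passed in as an arbitrary function, the candidate thresholds are restricted to the observed p-values, rejection uses `<=` so that the hypothesis sitting exactly at the threshold is rejected, and the estimate values in the example are made up.

```python
from typing import Callable, List, Optional, Sequence


def generic_mcp(p_values: Sequence[float], threshold: float) -> List[bool]:
    """Generic MCP: reject every hypothesis whose p-value is at most `threshold`."""
    return [p <= threshold for p in p_values]


def select_threshold(p_values: Sequence[float],
                     local_estimator: Callable[[float], float],
                     q: float) -> Optional[float]:
    """Largest observed p-value whose FDR local estimate is at most q, else None."""
    best = None
    for p in sorted(set(p_values)):
        if local_estimator(p) <= q:
            best = p  # keep scanning: the step-up rule wants the largest such p
    return best


# Hypothetical local estimates, one per observed p-value (made-up numbers).
p_values = [0.001, 0.004, 0.02, 0.03, 0.4]
toy_estimates = {0.001: 0.01, 0.004: 0.02, 0.02: 0.08, 0.03: 0.04, 0.4: 0.5}

threshold = select_threshold(p_values, toy_estimates.get, q=0.05)
rejected = generic_mcp(p_values, threshold) if threshold is not None else [False] * len(p_values)
print(threshold, rejected)
```

Note the step-up behaviour in the toy data: the estimate at 0.02 exceeds q, but because the estimate at the larger value 0.03 does not, the hypothesis with p-value 0.02 is still rejected.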

In Section 2 the framework for defining the MCPs is laid out, and FWE and FDR controlling MCPs, as well as the relationship between p-value adjustments and local estimators, are discussed. In Section 3, the p-value resampling approach is reviewed. In Section 4, two types of FDR local estimators are introduced: a local estimator based on Benjamini and Hochberg's FDR controlling MCP, and two resampling-based FDR local estimators. In Section 5, the use of FDR local estimators for inference is presented, and the advantages of the local point of view are discussed, especially using the suggested “local estimates plot”. Sections 2, 3 and the beginning of Section 5 (with its references to Section 4) suffice in order to understand the main features of the new procedure and apply it in practice. The results of applying the new MCPs to a correlation map are presented in Section 6, and the use of the p-value adjustment plot is demonstrated. In that example the new MCPs proved to be most powerful, and revealed some new patterns. A simulation study was used to show the global FDR control of the suggested MCP. It was also used to show that the newly proposed procedures are more powerful than the existing MCP. Results of the simulation study are presented in Section 7, and proofs of the propositions are given in Section 8.

Section snippets

Multiple comparison procedures

The family of $m$ hypotheses which are tested simultaneously includes $m_0$ true null hypotheses and $m_1$ false ones. For each hypothesis $H_i$ a test statistic is available, with the corresponding p-value $P_i$. Denote by $\{H_{01},\ldots,H_{0m_0}\}$ the hypotheses for which the null hypothesis is true, and by $\{H_{11},\ldots,H_{1m_1}\}$ the false ones. The corresponding vectors of p-values are $P_0$ and $P_1$. If the test statistics are continuous, $P_{0i}\sim U[0,1]$. The marginal distribution of each $P_{1i}$ is unknown, but if the tests are unbiased,

p-value resampling

The construction of powerful MCPs requires knowledge of the distribution of P0. p-value resampling is a method to approximate the distribution of P0 using data gathered in a single experiment. Since the number and identity of the true null hypotheses is not known, p-value resampling is conducted under the complete null hypothesis, i.e. under the assumption that all the hypotheses are in fact true.
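As an illustration of resampling under the complete null hypothesis, the sketch below permutes a response series and recomputes a correlation p-value against every field. Everything specific here is an assumption made for the example rather than the scheme of the paper: the permutation design, the Fisher z normal approximation for the p-value, the function names, and the synthetic data.

```python
import math
import random
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_p_value(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided p-value via the Fisher z normal approximation (an assumption)."""
    r = max(min(pearson_r(x, y), 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(len(x) - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def resample_p_values(fields: Sequence[Sequence[float]],
                      response: Sequence[float],
                      n_resamples: int,
                      seed: int = 0) -> List[List[float]]:
    """One vector of p-values per resample, generated under the complete null."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_resamples):
        shuffled = list(response)
        rng.shuffle(shuffled)  # breaks the association with every field at once
        out.append([correlation_p_value(f, shuffled) for f in fields])
    return out


# Synthetic data: three strongly correlated fields observed over 40 "years".
rng = random.Random(1)
base = [rng.gauss(0, 1) for _ in range(40)]
fields = [[b + 0.3 * rng.gauss(0, 1) for b in base] for _ in range(3)]
response = [rng.gauss(0, 1) for _ in range(40)]

p_star = resample_p_values(fields, response, n_resamples=200)
print(len(p_star), len(p_star[0]))
```

Because the same shuffled response is used for all fields within a resample, the dependence among the test statistics is carried over into each resampled p-value vector.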

The basic setup for p-value resampling: Let Y denote a data set gathered to test an ensemble of

FDR local estimators

FDR local estimators are estimators of the FDR correction. The first is based on the FDR controlling MCP in Benjamini and Hochberg (1995). The BH FDR local estimator is defined as

$$
Q^{est}_{BH}(p) = \begin{cases} m \cdot p / r(p) & \text{if } r(p) \geq 1, \\ 0 & \text{otherwise.} \end{cases}
$$

Let $R^{-1}(p)$ denote the reciprocal of $R(p)$, taking the value 0 if $R(p)$ is 0. The expected value of the BH local estimator is then $m \cdot p \cdot E R^{-1}(p)$, hence greater than $EV(p) \cdot E R^{-1}(p)$; the FDR correction is $E(V(p) R^{-1}(p))$, therefore the bias of the BH local estimator as an estimator of the FDR
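The BH local estimator is simple enough to compute directly. The sketch below is a minimal illustration; it assumes that r(p) counts the observed p-values that are at most p, and the p-values in the example are made up.

```python
from typing import Sequence


def bh_local_estimate(p: float, p_values: Sequence[float]) -> float:
    """BH FDR local estimator: m * p / r(p) if r(p) >= 1, and 0 otherwise."""
    m = len(p_values)
    r = sum(1 for x in p_values if x <= p)  # assumed: r(p) counts p-values <= p
    return m * p / r if r >= 1 else 0.0


# Made-up observed p-values for m = 5 hypotheses.
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
q = 0.05

estimates = {p: bh_local_estimate(p, p_values) for p in p_values}

# Largest observed p-value whose local estimate does not exceed q.
threshold = max((p for p in p_values if estimates[p] <= q), default=None)
print(estimates, threshold)
```

Choosing the largest observed p-value with m·p/r(p) ≤ q is the same comparison as p(i) ≤ i·q/m in the Benjamini and Hochberg step-up procedure, so on these data both select the two smallest p-values.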

Use of local estimates in multiple-hypotheses testing

With an eye towards the user we now outline the multiple testing procedure.

  1. Construct a p-value resampling scheme as described in Section 3.

  2. Choose the set of p-values for inquiry. If the purpose is testing, it is enough to consider the set of observed p-values p. Drawing the FDR local estimates plot, described at the end of this section, might require computing the FDR local estimators on a grid of p-values.

  3. For each p-value p, in the set of p-values specified in step 2, compute the resample

The problem

The Israeli Meteorological Service has routinely issued seasonal forecasts of precipitation since 1983. These forecasts were constructed by the Seasonal Forecast Research Group, see Manes et al. (1994). The successful forecasting effort had always involved modeling the association between anomalies in the pressure field over the northern hemisphere and precipitation in Israel.

Within the ongoing forecasting effort, models and methods are ever changing. Recently interest grew in the forecasting

Simulation study

In the previous sections we showed that FDR local estimates are conservative estimators of the FDR correction if the vectors of p-values corresponding to true and non-true null hypotheses are independent. In order to show that the MCPs based on the FDR local estimates control the FDR, we resort to a simulation study.
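The general shape of such a simulation can be sketched as a Monte Carlo estimate of the FDR of one procedure under correlated test statistics. The design below is entirely an assumption for illustration and is not the configuration used in the paper: equicorrelated normal statistics built from a common factor, one-sided z-tests, a fixed shift for the false nulls, and the Benjamini and Hochberg step-up procedure as the only MCP evaluated.

```python
import math
import random
from typing import List, Sequence


def bh_reject(p_values: Sequence[float], q: float) -> List[bool]:
    """Benjamini-Hochberg step-up: reject the k smallest p-values, where k is
    the largest rank with p_(k) <= k * q / m."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


def simulate_fdr(m: int = 20, m0: int = 15, shift: float = 3.0, rho: float = 0.5,
                 q: float = 0.05, n_rep: int = 500, seed: int = 0) -> float:
    """Average false discovery proportion over n_rep simulated experiments."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rep):
        common = rng.gauss(0, 1)  # shared factor inducing correlation rho
        p_values = []
        for i in range(m):
            z = math.sqrt(rho) * common + math.sqrt(1 - rho) * rng.gauss(0, 1)
            if i >= m0:            # the last m - m0 null hypotheses are false
                z += shift
            p_values.append(0.5 * math.erfc(z / math.sqrt(2)))  # upper-tail p-value
        rejected = bh_reject(p_values, q)
        r = sum(rejected)
        v = sum(rejected[:m0])     # rejections among the true nulls
        total += v / r if r else 0.0
    return total / n_rep


fdr_hat = simulate_fdr()
print(fdr_hat)
```

Power could be estimated in the same loop by averaging the fraction of the false null hypotheses that are rejected, which allows several MCPs to be compared on identical simulated data.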

The performance of the three MCPs is compared in terms of FDR control and power. They are also compared to a fourth MCP based on the FDR p-value correction Qe (REAL), which can be

Proofs of propositions

Proof of Proposition 4.2.

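As defined, $Q_\beta(p) \geq E_R\, R(p)/(R(p)+r(p)-r_\beta(p))$. Therefore, assuming subset pivotality, so that the distributions of $V(p)$ and $V_0(p)$ are identical,

$$
\begin{aligned}
\Pr\{Q_\beta(p) \geq Q_V(p)\} &\geq \Pr\left\{E_R \frac{R}{R+r-r_\beta} \geq E_V \frac{V}{V+s}\right\} \\
&\geq \Pr\left\{E_{V_0} \frac{V_0}{V_0+r-r_\beta} \geq E_{V_0} \frac{V_0}{V_0+s}\right\} \\
&\geq \Pr\{r-r_\beta \leq s\} = \Pr\{s+v-r_\beta \leq s\} = \Pr\{v \leq r_\beta\} \\
&= \Pr\{V_0 \leq r_\beta\} \geq \Pr\{R(p) \leq r_\beta(p)\} \geq 1-\beta.
\end{aligned}
$$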

Proof of Proposition 4.3.

If S(p) and V(p) are independent, the distributions of V(p)|S(p)=s(p) and V(p) are identical, and, assuming subset pivotality, the distributions of V0(p) and V(p) are identical; thus QV|s(p)=EV(p)V(p)s(p)+V0(p)=EV(p)ER(p)V0(p)

Acknowledgements

We are thankful to Dr. Manes from the Israeli Meteorological Service (IMS) and to Prof. Alpert from Tel-Aviv University for introducing us to the meteorological problem, and making the data accessible. We are also thankful for the many comments of one of the referees, which have helped us improve the style of the presentation.

References (15)

  • Y. Benjamini et al. Conditional versus unconditional analysis in some regression models. Comm. Statist. Theory Methods (1990)
  • Y. Benjamini et al. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Stat. Soc. Ser. B (1995)
  • Benjamini, Y., Hochberg, Y., Kling, Y., 1995. False discovery rate controlling procedures for pairwise comparisons....
  • Benjamini, Y., Yekutieli, D., 1997. The control of the false discovery rate in multiple testing under positive...
  • C.W. Dunnett et al. A step-up multiple test procedure. J. Amer. Statist. Assoc. (1992)
  • J.F. Heyse et al. Adjusting for multiplicity of statistical tests in the analysis of carcinogenicity studies. Biomet. J. (1988)
  • Hochberg, Y., Tamhane, A.C., 1987. Multiple Comparison Procedures. Wiley, New...
There are more references available in the full text version of this article.
