Batch-sequential design and heteroskedastic surrogate modeling for delta smelt conservation
Delta smelt is an endangered fish species in the San Francisco estuary that have shown an overall population decline over the past 30 years. Researchers have developed a stochastic, agent-based simulator to virtualize the system, with the goal of understanding the relative contribution of natural and anthropogenic factors suggested as playing a role in their decline. However, the input configuration space is high-dimensional, running the simulator is time-consuming, and its noisy outputs change nonlinearly in both mean and variance. Getting enough runs to effectively learn input–output dynamics requires both a nimble modeling strategy and parallel supercomputer evaluation. Recent advances in heteroskedastic Gaussian process (HetGP) surrogate modeling helps, but little is known about how to appropriately plan experiments for highly distributed simulator evaluation. We propose a batch sequential design scheme, generalizing one-at-a-time variance-based active learning for HetGP surrogates, as a means of keeping multi-core cluster nodes fully engaged with expensive runs. Our acquisition strategy is carefully engineered to favor selection of replicates which boost statistical and computational efficiencies when training surrogates to isolate signal in high noise regions. Design and modeling performance is illustrated on a range of toy examples before embarking on a large-scale smelt simulation campaign and downstream high-fidelity input sensitivity analysis.
READ FULL TEXT