Multivariate Functional Data Modeling with Time-varying Clustering
We consider the situation where multivariate functional data has been collected over time at each of a set of sites. Our illustrative setting is bivariate, monitoring ozone and PM_10 levels as a function of time over the course of a year at a set of monitoring sites. The data we work with is from 24 monitoring sites in Mexico City which record hourly ozone and PM_10 levels. We use the data for the year 2017. Hence, we have 48 functions to work with. Our objective is to implement model-based clustering of the functions across the sites. Using our example, such clustering can be considered for ozone and PM_10 individually or jointly. It may occur differentially for the two pollutants. More importantly for us, we allow that such clustering can vary with time. We model the multivariate functions across sites using a multivariate Gaussian process. With many sites and several functions at each site, we use dimension reduction to provide a stochastic process specification for the distribution of the collection of multivariate functions over the say n sites. Furthermore, to cluster the functions, either individually by component or jointly with all components, we use the Dirichlet process which enables shared labeling of the functions across the sites. Specifically, we cluster functions based on their response to exogenous variables. Though the functions arise in continuous time, clustering in continuous time is extremely computationally demanding and not of practical interest. Therefore, we employ a partitioning of the time scale to capture time-varying clustering.
READ FULL TEXT