Phase Transition of Community Detection Under Efficient Algorithms, Expressive Generative Models, and Confidentiality Constraints
Abstract
We formulate a semidefinite relaxation for the maximum likelihood estimation of node labels, given both graph and non-graph observations. This formulation is distinct from the semidefinite programming solution of standard community detection, but maintains its desirable properties. We calculate the exact recovery threshold for three types of non-graph information, referred to as side information: partially revealed labels, noisy labels, and multiple observations (features) per node with arbitrary but finite cardinality. We find that semidefinite programming has the same exact recovery threshold in the presence of side information as maximum likelihood with side information.
Empirical observations suggest that in practice, community membership does not completely explain the dependency between the edges of an observation graph. The residual dependence of the graph edges is modeled in this dissertation, to first order, by auxiliary node latent variables that affect the statistics of the graph edges but carry no information about the communities of interest. We then study community detection in graphs obeying the stochastic block model and the censored block model with auxiliary latent variables. We analyze the conditions for exact recovery when these auxiliary latent variables are unknown, representing unknown nuisance parameters or model mismatch. We also analyze exact recovery when these secondary latent variables have been either fully or partially revealed. Finally, we propose a semidefinite programming algorithm for recovering the desired labels when the secondary labels are either known or unknown. We show that exact recovery is possible by semidefinite programming down to the respective maximum likelihood exact recovery threshold.
Releasing graph structures containing nodes with multiple latent variables may raise privacy concerns and leak users' confidential information. This dissertation investigates confidentiality in community detection in networks with multiple latent variables. Focusing on the stochastic block model and the censored block model with multiple latent variables, we address the leakage of confidential information by changing the connectivity of nodes. To this end, we first propose a new metric for evaluating confidentiality based on the Chernoff-Hellinger divergence. An optimization problem is then introduced to minimize the required changes to the edges of the graph realization.
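As background for the exact recovery thresholds mentioned in the abstract, the following is a minimal Python sketch of the simplest well-known case only: the binary symmetric stochastic block model with two equal-sized communities and no side information or auxiliary latent variables. It assumes the usual logarithmic-degree scaling, with within-community edge probability p = a log(n)/n and cross-community probability q = b log(n)/n, for which exact recovery is known to be possible when (sqrt(a) - sqrt(b))^2 > 2. The function names `exact_recovery_possible` and `sample_sbm` and the parameters `a`, `b`, and `seed` are illustrative choices, not notation from the dissertation, and the thresholds derived there for side information and multiple latent variables differ from this baseline.

```python
import math
import random


def exact_recovery_possible(a: float, b: float) -> bool:
    """Baseline threshold check for the binary symmetric SBM (assumption:
    two equal communities, no side information).

    With p = a*log(n)/n and q = b*log(n)/n, exact recovery is
    possible when (sqrt(a) - sqrt(b))**2 > 2.
    """
    return (math.sqrt(a) - math.sqrt(b)) ** 2 > 2.0


def sample_sbm(n: int, a: float, b: float, seed: int = 0):
    """Draw one graph from the binary symmetric SBM.

    Returns (labels, edges): labels are +1/-1 with the first n // 2
    nodes in community +1; edges is a set of pairs (i, j) with i < j.
    """
    rng = random.Random(seed)
    labels = [1 if i < n // 2 else -1 for i in range(n)]
    # Clamp to 1.0 so small n cannot produce an invalid probability.
    p = min(1.0, a * math.log(n) / n)
    q = min(1.0, b * math.log(n) / n)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            prob = p if labels[i] == labels[j] else q
            if rng.random() < prob:
                edges.add((i, j))
    return labels, edges


if __name__ == "__main__":
    labels, edges = sample_sbm(n=200, a=9.0, b=1.0)
    print("edges sampled:", len(edges))
    print("above baseline threshold:", exact_recovery_possible(9.0, 1.0))
```

A sampled graph of this kind could serve as input to a maximum likelihood or semidefinite programming estimator; the estimators themselves are not sketched here because the abstract does not specify their form.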