Author: Issa, Ibrahim
Title: An Operational Approach to Information Leakage
Type: dissertation or thesis
Dates: 2018-04-26; 2019-09-11; 2017-08-30
Identifiers: Issa_cornellgrad_0058F_10519; http://dissertations.umi.com/cornellgrad:10519; bibid: 10361398
Handle: https://hdl.handle.net/1813/56721
DOI: https://doi.org/10.7298/X49G5JZZ
Language: en-US
Subjects: Electrical engineering; Information science; Guessing; Information leakage; Maximal leakage; Sibson mutual information

Abstract: Given two random variables $X$ and $Y$, how much information does $Y$ ``leak'' about $X$? An operational approach is undertaken to answer this question, which is fundamental to the study of communication security. The resulting measure ${\mathcal{L}(X \!\! \to \!\! Y)}$ is called \emph{maximal leakage}, and is defined as the multiplicative increase, upon observing $Y$, of the probability of correctly guessing a randomized function of $X$, maximized over all such randomized functions. A closed-form expression for $\mathcal{L}(X \!\! \to \!\! Y)$ is given for discrete $X$ and $Y$, and it is subsequently generalized to handle a large class of random variables. The resulting properties are shown to be consistent with an axiomatic view of a leakage measure, and the definition is shown to be robust to variations in the setup. Moreover, the \emph{guessing} framework is used to give operational definitions to commonly used leakage measures, such as Shannon capacity, maximal correlation, and local differential privacy. Counter-intuitively, it is shown that Shannon capacity \emph{underestimates} leakage. Furthermore, a variant of the Shannon cipher system is studied, in which the performance of an encryption scheme is measured using maximal leakage. A single-letter characterization of the optimal limit of (normalized) maximal leakage is derived, and asymptotically optimal encryption schemes are demonstrated. The Shannon cipher system is also studied when there is a known distortion function up to which the adversary is interested in $X$, in which case the relevant metric is the (exponent of the) probability of a successful guess. A single-letter characterization of the highest achievable exponent is provided, and asymptotically optimal strategies for both the primary user and the adversary are demonstrated. Finally, the sample complexity of estimating maximal leakage from data is studied. It is shown that the task is possible only if the minimum strictly positive probability of a source symbol, $\theta$, is known. In that case, $\mathcal{O} \left( \frac{|\mathcal{Y}| \log |\mathcal{X}|}{\theta} \right)$ samples are sufficient, and $\Omega(|\mathcal{Y}|^{1-\eta}/\theta)$ samples, for any $\eta > 0$, are necessary.
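As a concrete illustration of the discrete closed form referenced in the abstract, the following Python sketch computes maximal leakage from a joint pmf under the assumption that, for discrete $X$ and $Y$, $\mathcal{L}(X \!\to\! Y) = \log \sum_{y} \max_{x:\, P_X(x)>0} P_{Y|X}(y|x)$ (the Sibson mutual information of order infinity). The function name and the binary-symmetric-channel example are illustrative choices, not taken from the dissertation.

import numpy as np

def maximal_leakage(p_xy, base=2.0):
    # Maximal leakage L(X -> Y) for a discrete joint pmf.
    # p_xy: 2-D array with p_xy[i, j] = P(X = x_i, Y = y_j).
    # Assumes the closed form log sum_y max_{x: P_X(x)>0} P_{Y|X}(y|x).
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1)                       # marginal of X
    support = p_x > 0                            # keep only x with P_X(x) > 0
    p_y_given_x = p_xy[support] / p_x[support, None]
    # For each y, take the largest conditional probability over x, then sum over y.
    return np.log(p_y_given_x.max(axis=0).sum()) / np.log(base)

# Example (hypothetical): binary symmetric channel with crossover 0.1, uniform input.
p = 0.1
joint = 0.5 * np.array([[1 - p, p],
                        [p, 1 - p]])
print(maximal_leakage(joint))   # ~0.85 bits

On this example the leakage (about 0.85 bits) exceeds the channel's mutual information and capacity (about 0.53 bits), consistent with the abstract's remark that Shannon capacity underestimates leakage.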