Lab Machine learning on encrypted data

This course is listed

in Aachen RWTHonline as Lab Machine learning on encrypted data,
in Bonn Basis as MA-INF 4322 - Machine Learning on encrypted data.

Final presentations

Friday, 19 February 2021, 10⁰⁰-12⁰⁰, digital seminar room.
Marie-Theres Schier, Penelope Mueck, Robert Logiewa.
Extended Cryptotree: Training and Inference on Encrypted Data.
Literature:
- Daniel Huynh (2020). Cryptotree: fast and accurate predictions on encrypted structured data. arXiv:2006.08299 [cs.LG].
- Adi Akavia, Max Leibovich, Yehezkel S. Resheff, Roey Ron, Moni Shahar & Margarita Vald (2020). Privacy-Preserving Decision Tree Training and Prediction against Malicious Server. IACR ePrint 2019/1282.
- Microsoft SEAL. Webpage.
tba: Talk PDF. Report PDF.
Friday, 26 March 2021, 10⁰⁰-12⁰⁰, digital seminar room.
Gerd Mund, Van Thong Nguyen, Yat Wai Wong.
Training a logistic regression model on encrypted data.
Literature:
- tba.
tba: Talk PDF. Report PDF.

Lecture

Michael Nüsken

Time & Place

Send an email to apply for participation in this lab.

We meet by appointment.

Tuesday, 14⁰⁰-16⁰⁰, digital seminar room.
Friday, 10⁰⁰-12⁰⁰, digital seminar room.

Kick-off meeting: Tuesday, 3 November 2020, 14⁰⁰, digital seminar room.

As more and more mechanisms and installations apply data science methodology to automatically analyze large amounts of possibly privacy-infringing data, we have to understand carefully how to protect our data. Also, more and more fake data shows up, and we have to find ways to distinguish fake from trustworthy data. At the same time, we want insightful research and life-easing analyses to remain possible. This seeming contradiction has led to various efforts to reconcile both goals: protecting data and allowing analyses, at least to some extent and possibly under some restrictions. See Munn et al. (2019) for a review of challenges and options.

Some methods use supervised machine learning, where sample data is classified manually and then used as training data. Others use unsupervised machine learning, where data is classified without any manual training. The latter include clustering and outlier detection methods.
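To make the notion of clustering concrete, here is a minimal sketch of a classical clustering algorithm (Lloyd-style k-means) on plain, unencrypted data. The function name, the sample points and the fixed starting centroids are illustrative assumptions and are not taken from any of the papers listed on this page.

```python
from math import dist

def kmeans(points, centroids, rounds=10):
    """Plain (unencrypted) Lloyd-style k-means on tuples of floats.

    Illustrative sketch only: fixed number of rounds, no convergence test.
    """
    for _ in range(rounds):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical sample data: two visually separate groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

No labels are supplied at any point: the grouping emerges from the distances alone. Note that the algorithm needs comparisons (nearest centroid) and divisions (mean), which are exactly the operations that tend to be expensive once the data is encrypted.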

Jäschke & Armknecht (2018) employ fully homomorphic encryption (FHE) to analyze data with real-valued dimensions and perform a classical clustering algorithm on encrypted data. This is a first major step towards employing unsupervised machine learning methods in a fully privacy-protected scenario. A variety of other clustering methods have not yet been investigated at all. The aim of this lab is to understand the given approach and apply it to further methods.
Crawford, Gentry, Halevi, Platt & Shoup (2018) show how to get the best out of present FHE.
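As a rough illustration of what computing on encrypted data means, the following sketch implements a toy version of the Paillier cryptosystem. This is a swapped-in, simpler technique: Paillier is only additively homomorphic, not fully homomorphic, and it is not the scheme used in the papers above. The primes are assumed toy values that offer no security whatsoever, and all function names are hypothetical.

```python
import math
import random

# Toy primes (assumption): far too small for any real security.
P, Q = 1789, 1861
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)   # private key
MU = pow(LAM, -1, N)           # valid because the generator is g = N + 1

def encrypt(m):
    """Encrypt an integer 0 <= m < N under the public key N."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    """Decrypt a ciphertext with the private key (LAM, MU)."""
    u = pow(c, LAM, N2)
    return ((u - 1) // N * MU) % N

def add(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N2

# A server could sum encrypted coordinates without ever seeing them.
values = [3, 5, 10]
ciphertexts = [encrypt(v) for v in values]
total = ciphertexts[0]
for c in ciphertexts[1:]:
    total = add(total, c)
print(decrypt(total))
```

Summing values is one ingredient of, for example, a centroid update in clustering. The remaining ingredients, such as comparisons and divisions on ciphertexts, are where fully homomorphic schemes come in.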
Results from last winter.
tbc.

The target of the lab is to understand how unsupervised machine learning on encrypted data may work. Ideally, we can come up with a novel solution for an algorithm that has not yet been considered. We study the tasks and tools, select a clustering algorithm, find a protocol, prototype an implementation, perform a security analysis, present an evaluation, ... The named core papers and the experience from last winter's lab shall serve as a guide on how to proceed.

We will plan and distribute work shares together in our meetings.

Literature

Angela Jäschke & Frederik Armknecht (2018). Unsupervised Machine Learning on Encrypted Data. Selected Areas in Cryptography – SAC 2018. DOI 10.1007/978-3-030-10970-7_21. Also number 2018/411 in Cryptology ePrint Archive.
Jung Hee Cheon, Duhyeong Kim & Jai Hyun Park (2019). Towards a Practical Cluster Analysis over Encrypted Data. Number 2019/465 in Cryptology ePrint Archive.
Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal & Karn Seth (2017). Practical Secure Aggregation for Privacy-Preserving Machine Learning. DOI 10.1145/3133956.3133982. Among source materials you find the CCS Talk by Aaron Segal.
Jack L. H. Crawford, Craig Gentry, Shai Halevi, Daniel Platt & Victor Shoup (2018). Doing Real Work with FHE: The Case of Logistic Regression. WAHC'18 version. Full version.
Xianrui Meng, Dimitrios Papadopoulos, Alina Oprea & Nikos Triandopoulos (2019). Privacy-Preserving Hierarchical Clustering: Formal Security and Efficient Approximation. arXiv 1904.04475.
Raphael Bost, Raluca Ada Popa, Stephen Tu & Shafi Goldwasser (2015). Machine Learning Classification over Encrypted Data. NDSS 2015. Full version: number 2014/331 at Cryptology ePrint Archive.

Background on unsupervised machine learning, clustering:

Mamta Mittal, Lalit M. Goyal, Duraisamy Jude Hemanth & Jasleen K. Sethi (2018). Clustering approaches for high‐dimensional databases: A review. DOI 10.1002/widm.1300.
Chapter 7 in:
Avrim Blum, John Hopcroft & Ravindran Kannan (2018+). Foundations of Data Science. Present draft is on Hopcroft's page (look for "Cambridge").
Rui Xu & Donald Wunsch II (2005). Survey of Clustering Algorithms. DOI 10.1109/TNN.2005.845141. @Researchgate.

Background on supervised machine learning with encryption:

Qian Lou & Lei Jiang (2019). SHE: A Fast and Accurate Privacy-Preserving Deep Neural Network Via Leveled TFHE and Logarithmic Data Representation. arXiv:1906.00148.
Le Trieu Phong, Yoshinori Aono, Takuya Hayashi, Lihua Wang & Shiho Moriai (2017). Privacy-Preserving Deep Learning via Additively Homomorphic Encryption. DOI 10.1109/TIFS.2017.2787987. Full version: number 2017/715 at Cryptology ePrint Archive.
Ran Gilad-Bachrach, Nathan Dowlin, Kim Laine, Kristin Lauter, Michael Naehrig & John Wernsing (2016). CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy. Proceedings of The 33rd International Conference on Machine Learning (PMLR) 48, 201-210. PDF. See also arXiv:1412.6181. Talk @ DataScienceSummit 2017. ICML 2016: slides.
Edward Chou, Josh Beal, Daniel Levy, Serena Yeung, Albert Haque & Li Fei-Fei (2018). Faster CryptoNets: Leveraging Sparsity for Real-World Encrypted Inference. arXiv:1811.09953.
Yunlu Cai & Chunming Tang (2019). Privacy of outsourced two‐party k‐means clustering. DOI 10.1002/cpe.5473.

Background on fully homomorphic encryption:

Frederik Armknecht, Colin Boyd, Christopher Carr, Kristian Gjøsteen, Angela Jäschke, Christian A. Reuter & Martin Strand (2015). A Guide to Fully Homomorphic Encryption. Number 2015/1192 in Cryptology ePrint Archive.
Paulo Martins, Leonel Sousa & Artur Mariano (2017). A Survey on Fully Homomorphic Encryption: An Engineering Perspective. ACM Computing Surveys Vol. 50 Issue 6, Article 83. DOI 10.1145/3124441.
SPQlios team (2018). TFHE: Fast Fully Homomorphic Encryption over the Torus. Webpage.
Patrick Tu (2018). Conway’s Game of Life meets the Simple Encrypted Arithmetic Library (SEAL). Blog.
Benjamin M. Case, Shuhong Gao, Gengran Hu & Qiuxia Xu (2019). Fully Homomorphic Encryption with k-bit Arithmetic Operations. Number 2019/521 at Cryptology ePrint Archive.

Background on ethics and perspectives:

Munn, L., Hristova, T., Magee, L., Bourdignon, D., Horan, T., Levin, T., … Park, L. (2019). The New Privacy: Emerging Standards for Cloud-Based Security. DOI 10.26183/5c85e73df25ff.

Other related literature:

Jianyu Yang (Beijing University of Posts and Telecommunications), Xiang Cheng (Beijing University of Posts and Telecommunications), Sen Su (Beijing University of Posts and Telecommunications), Rui Chen (Samsung Research America, Mountain View, USA), Qiyu Ren (Beijing University of Posts and Telecommunications), Yuhan Liu (Beijing University of Posts and Telecommunications): Collecting Preference Rankings Under Local Differential Privacy. ICDE2019, PDF.

Prerequisites

Basic knowledge of data science and privacy is helpful.
The ability to quickly grasp topics in mathematics and computer science is required.

Allocation

Lab.

Master in Media Informatics: PR, 5 SWS.
Students have to register for this course in RWTHonline.
Master in Computer Science at University of Bonn: Lab, 4 SWS.
Students have to register for this course in BASIS.