# ECE-5583: Information Theory and Probabilistic Programming

There has been a strong resurgence of AI in recent years. An important core technology of AI is statistical learning, which aims to automatically “program” machines with data. While the idea dates back to the 1950s, the abundance of data and inexpensive computational power have allowed these techniques to thrive and reach nearly every aspect of our daily lives: customer behavior prediction, financial market prediction, fully automatic surveillance, self-driving vehicles, autonomous robots, and beyond.

Information theory was introduced and developed by the great communications engineer Claude Shannon in the late 1940s. The theory was originally an attempt to explain the principles behind point-to-point communication and data storage. However, its techniques have since been incorporated into statistical learning and have inspired many of its underlying principles. In this graduate course, we will explore the exciting area of statistical learning from the perspective of information theorists. This should help students gain a deeper understanding of the omnipresent field of statistical learning and appreciate the widespread significance of information theory. Moreover, we will look into recent advances in probabilistic programming technology that enable users to tackle inference problems through computer programs.

The course will start with an overview of information theory and statistical learning. We will then help students establish a solid foundation in core information theory principles, including information measures, the AEP, and source and channel coding theory. We will then introduce common yet powerful statistical techniques such as Bayesian learning, decision forests, and belief propagation algorithms, and discuss how these modern statistical learning techniques are connected to information theory. To conclude, we will skim through some probabilistic programming tools. The main reference text is Professor MacKay's book, Information Theory, Inference, and Learning Algorithms, but we will also borrow heavily from materials available online. Other important reference texts are

### Course Syllabus (Tentative)

• Probability review

• Maximum likelihood estimator, MAP, Bayesian estimator

• Graphical models and message passing algorithms

• Lossless source coding theory, Huffman coding, and introduction to arithmetic coding

• Asymptotic equipartition property (AEP), typicality and joint typicality

• Entropy, conditional entropy, mutual information, and their properties

• Channel coding theory, capacity, and Fano’s inequality

• Continuous random variables, differential entropy, Gaussian source, and Gaussian channel

• Error-correcting codes, linear codes, and introduction to low-density parity-check codes

• Method of types, large deviation theory, maximum entropy principle

N.B. Expect to be exposed to some Python and MATLAB. You won't become an expert in either after this class, but it is good to get your hands dirty and play with them early.
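As a taste of the kind of Python used in the course, here is a minimal sketch of three information measures from the syllabus (entropy, KL divergence, and mutual information) for discrete distributions. The function names and the toy distributions are illustrative assumptions, not taken from the course materials, and the code uses only the standard library.

```python
import math

# Illustrative helpers (names are hypothetical, not from the course materials).

def entropy(p):
    """Shannon entropy H(p) in bits; p is a list of probabilities summing to 1."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """KL divergence D(p || q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_information(joint):
    """Mutual information I(X;Y) in bits; joint[x][y] is the joint pmf."""
    px = [sum(row) for row in joint]          # marginal of X (row sums)
    py = [sum(col) for col in zip(*joint)]    # marginal of Y (column sums)
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )

fair = [0.5, 0.5]
biased = [0.9, 0.1]

print(entropy(fair))                 # a fair coin carries 1.0 bit
print(entropy(biased))               # a biased coin carries less than 1 bit
print(kl_divergence(biased, fair))   # equals 1 - H(biased) when q is the fair coin

# Two perfectly correlated fair bits share 1 bit; independent bits share none.
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))
```

Terms with zero probability are skipped, following the usual convention that 0 log 0 = 0.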

Quizzes (in-class participation): 10% (extra credit).

Presentations: 20%.

Homework: 20%.

"Mid-term": 20%. Take-home, but you will have only half a day to complete it.

Final Project: 40%.

• A: $$\sim$$ 90 and above

• B: $$\sim$$ between 80 and 90

• C: $$\sim$$ between 70 and 80

• D: $$\sim$$ between 60 and 70

• F: Below 60

### Calendar

| Date | Topics | Materials |
|---|---|---|
| 8/24 | Overview of IT, probability overview, Monty Hall problem, discrete and continuous random variables, expectation (video), (video last year) | probability review, slides2022a |
| 8/31 | Joint and conditional probabilities, independence and conditional independence (video) | |
| 9/07 | ML, MAP, Bayesian inference (video) | |
| 9/14 | Conjugate prior, Beta distribution, Python introduction (video) | |
| 9/21 | Law of large numbers (LLN), proof of weak LLN, sampling discrete distributions, asymptotic equipartition, Kelly's criterion (video) | slides2022b |
| 9/28 | Typical sequences, Source Coding Theorem, Jensen's inequality (video) | Quantifying information |
| 10/05 | Conditional entropy, Huffman coding, KL-divergence, entropy of Gaussian source, Gaussian source maximizes entropy, constructive proof of source coding theorem (video) | information measure |
| 10/12 | Mutual information, Theil index, cross-entropy, data processing inequality (video) | (slides from Berkeley CS188) |
| 10/19 | Chain rule of mutual information, Shannon's perfect secrecy, Decision tree, TF-IDF (video) | |
| 10/26 | Fano's inequality, channel capacity, binary symmetric channel, Gaussian channel, packing lemma, covering lemma (video) | Read Lecture 7 |
| 11/02 | Mid-term | |
| 11/09 | Presentations (video) | |
| 11/16 | More Lea, proof of channel coding theorem, Lagrange multiplier and KKT conditions, parallel Gaussian channel (video) | |
| 11/23 | Thanksgiving (video) | |
| 11/26 | Graphical models, Bayesian networks, undirected graphs, factor graphs (video) | slides2022c |

