Representation Learning with Contrastive Predictive Coding
by Aaron van den Oord, Yazhe Li and Oriol Vinyals
tl;dr The paper presents an auto-regressive framework for representation learning using a contrastive loss.
The paper can be found on arXiv (arXiv:1807.03748).
My sample GitHub implementation is under construction.
Introduction
In supervised learning, we are usually given data of the form $\{(x_i, y_i)\}_{i=1}^{N}$, where the task is to find a mapping $f(x)$ such that $y_i \approx f(x_i)$. When we first start with ML, we usually see $x_i$ and $y_i$ belonging to different sets. For example, $x_i \in \mathbb{R}^d$ and $y_i \in \{0, 1\}$ is the binary classification setting.
In an unsupervised setup, we have no $y_i$. The basic intuition that changes here is that we instead treat $x$ and $y$ as two parts of the same whole (the data sample). For example, in an image, one patch can act as $x$ while an adjacent patch acts as $y$ (see the sketch below). Loosely speaking, the distribution $P(\tilde{x})$ of the entire data sample that we are interested in becomes the joint distribution $P(x, y)$.
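To make the patch example concrete, here is a minimal sketch (my own illustration, not code from the paper; the patch size and the "pair with the patch directly below" rule are assumptions) of how an image could be split into a grid of patches and turned into $(x, y)$ pairs:

```python
import numpy as np

def adjacent_patch_pairs(image, patch_size):
    """Split an image into a grid of patches and pair each patch ("x")
    with the patch directly below it ("y")."""
    h, w, c = image.shape
    rows, cols = h // patch_size, w // patch_size
    grid = image[:rows * patch_size, :cols * patch_size]
    grid = grid.reshape(rows, patch_size, cols, patch_size, c).transpose(0, 2, 1, 3, 4)
    pairs = []
    for r in range(rows - 1):
        for col in range(cols):
            pairs.append((grid[r, col], grid[r + 1, col]))  # (x, y) = (patch, patch below it)
    return pairs

# Example: a random 64x64 RGB "image" split into 16x16 patches.
pairs = adjacent_patch_pairs(np.random.rand(64, 64, 3), 16)
print(len(pairs))  # 12 pairs: 3 vertical neighbours per column x 4 columns
```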
This reframing converts an unsupervised problem into a self-supervised problem. How? Well, if we create learning tasks between these $x$ and $y$ (like predicting $y$ from $x$), we can learn a representation (which is just an embedding in a high-dimensional space) for $x$ that captures some understanding of its relationship with $y$. In my “patch of an image” example, this means that our representation has learnt something about the local structure of an image - something that could be useful in an actual task, like image classification. So a self-supervised task helps us with representation learning.
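And here is an equally rough sketch of the prediction task itself, in PyTorch (again my own toy setup, not the paper's architecture): each $x$ patch's embedding is asked to pick out the embedding of its true adjacent patch $y$ from among the other patches in the batch, which act as negatives - a contrastive loss in the spirit of the one the paper builds on.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A toy encoder mapping a flattened 16x16x3 patch to a 128-d embedding.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(16 * 16 * 3, 128))

def contrastive_loss(x_patches, y_patches):
    """Score every x against every y with a dot product; the true adjacent
    patch (on the diagonal) should beat the in-batch negatives."""
    z_x = encoder(x_patches)                # (B, 128) embeddings of the "x" patches
    z_y = encoder(y_patches)                # (B, 128) embeddings of the "y" patches
    scores = z_x @ z_y.t()                  # (B, B) similarity matrix
    targets = torch.arange(scores.size(0))  # the matching y sits on the diagonal
    return F.cross_entropy(scores, targets)

# Example: a batch of 8 (x, y) patch pairs, channels-first as PyTorch expects.
x = torch.randn(8, 3, 16, 16)
y = torch.randn(8, 3, 16, 16)
loss = contrastive_loss(x, y)
loss.backward()  # gradients flow into the encoder, which learns the representation
```

Minimizing this loss pushes the embedding of a patch towards the embedding of its neighbour and away from unrelated patches, which is exactly the "learn something about local structure" intuition above.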
…