UMAP: An intuitive overview

The big idea here is to turn the data into a graph, then find a low-dimensional representation of said graph, before turning it back into a format we can use.
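If you want the punchline before the theory, here is roughly what using it looks like with the reference implementation, umap-learn (the toy data and parameter values below are purely illustrative):

```python
# Rough usage sketch with the reference implementation (umap-learn);
# the toy data and parameter values are just for illustration.
import numpy as np
import umap

X = np.random.rand(500, 20)           # 500 points living in 20 dimensions

reducer = umap.UMAP(
    n_neighbors=15,   # how many neighbours define each point's local picture
    min_dist=0.1,     # how tightly points may pack in the output
    n_components=2,   # target dimension
)
embedding = reducer.fit_transform(X)  # shape (500, 2)
```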

Before we describe the construction, we will first define what N-simplexes and N-simplex complexes are.

N-simplex: Pick n+1 points and draw lines between them. Make sure they’re in general position, i.e. not all crammed into a lower-dimensional hyperplane. This forms some kind of shape, and whatever is inside this shape is in the N-simplex.

A more formal construction would be:

Pick a constant c. The set of points with non-negative coordinates that add up to this constant is the simplex.
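Written out (with c = 1, the usual convention), the standard N-simplex is

$$\Delta^N = \Big\{ (t_0, t_1, \dots, t_N) \in \mathbb{R}^{N+1} \;:\; \sum_{i=0}^{N} t_i = 1,\ t_i \ge 0 \Big\}.$$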

A face of an N-simplex is simply the simplex spanned by a subset of its vertices.

N-simplex complexes: a bunch of N-simplexes glued together along shared faces.

We want to construct a Čech complex. It’s jargon, but the author assures us that in this particular case it’s easy to do: for each datum, drop an open ball around it and create a 0-simplex. Next, create a 1-simplex (an edge) between two 0-simplexes whenever their respective balls intersect.
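To make that concrete, here is a toy sketch (not UMAP’s actual code): two balls of radius r intersect exactly when their centres are less than 2r apart, so the 1-skeleton is just a proximity graph.

```python
# Toy sketch (not UMAP's actual code): build the 1-simplexes of a
# Cech-style complex by connecting points whose radius-r balls overlap.
import numpy as np

def one_skeleton(X, r):
    """Return edges (i, j) whenever the open balls of radius r around
    X[i] and X[j] overlap, i.e. dist(X[i], X[j]) < 2 * r."""
    n = len(X)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(X[i] - X[j]) < 2 * r:
                edges.append((i, j))
    return edges
```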

Since the topic is machine learning, and our data does not perfectly capture the real world, we make our open cover “fuzzy”: a ball no longer denotes certain inclusion, but a degree of membership.
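One way to picture “fuzzy”: instead of a point being flatly inside a ball (membership 1) or outside it (membership 0), membership decays smoothly with distance, something like

$$w(x_i, x_j) = \exp\!\big(-\,d(x_i, x_j)/\sigma_i\big),$$

with the exact form UMAP uses refined a bit further down.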

There is a problem with this: if you leave the data “as is”, some points might end up lumped into a monstrously complicated simplex while others are left completely separated.

Ideally we would want the data to be uniformly distributed. But what kind of black magic would allow such a tour de force?

Let’s start with the basic concept of a distance metric. In math as in real life, you can measure distances in several ways.

The Super Saiyan Vegeta can likely use a bird’s definition of distance.

We, however, cannot jump over 13-story skyscrapers, so in practice we typically think of distance as the shortest path we can take while dodging pedestrians, streets blocked by construction, and so on.

So what the author does is define a separate distance for each datum, chosen so that the ball around it always contains the same amount of data; in other words, the “volume” of each ball is the same.
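A rough sketch of how that can be done in practice (simplified from the paper): for each point, binary-search for a local scale σ so that its k nearest neighbours always carry the same total “mass”; the paper targets log₂(k).

```python
# Rough sketch (simplified from the paper): pick a per-point scale sigma
# so that the point's k nearest neighbours carry a fixed total "mass".
# This is what makes every ball contain "the same amount of data".
import numpy as np

def local_scale(knn_dists, target, n_iter=64):
    """knn_dists: distances from one point to its k nearest neighbours.
    target: the desired total mass, e.g. np.log2(k)."""
    lo, hi = 1e-12, 1e6
    for _ in range(n_iter):
        sigma = (lo + hi) / 2.0
        mass = np.sum(np.exp(-knn_dists / sigma))
        if mass > target:
            hi = sigma      # ball too "big": shrink it
        else:
            lo = sigma      # ball too "small": grow it
    return sigma
```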

But then the solution creates a problem:

If you’re trying to meet up with Vegeta for coffee, this doesn’t jibe very well. In the best of worlds, you would both agree on a distance metric, but well, Vegeta is a dick, so he likely won’t. At the same time, you can’t fly faster than the speed of sound and don’t want to meet him in India.

So how do you communicate? Enter category theory. It’s a topic for another post, but in a nutshell, you don’t really use N-simplexes but rather simplicial sets that capture all the relevant information.

Their union then creates a greater whole. Unless I’m misunderstanding something, in practice the paper is saying that you can get away with just looking at things locally and then gluing the local views together.
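In the implementation this “union” is very concrete: each point’s own metric gives a directed weight for the edge to each neighbour, and the two incompatible opinions about an edge are merged with a fuzzy-set union, roughly a + b - a*b (the probabilistic “or”). A sketch:

```python
# Sketch of the fuzzy union that reconciles the incompatible local metrics:
# if a = w(i -> j) under i's metric and b = w(j -> i) under j's metric,
# the merged, symmetric weight is a + b - a*b.
import numpy as np

def fuzzy_union(W):
    """W: (n, n) matrix of directed fuzzy membership weights in [0, 1]."""
    return W + W.T - W * W.T
```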

There is an additional QoL improvement:

in order for the graph to be locally connected (i.e. so that we don’t end up with lots of disconnected simplexes), we set the distance between any datum and its nearest neighbour to zero and start the odometer only past that nearest neighbour.
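Putting the local scale and this nearest-neighbour offset together, the membership weight from x_i to x_j ends up being (up to notation, this is the form used in the paper)

$$w(x_i, x_j) = \exp\!\left(-\,\frac{\max\big(0,\; d(x_i, x_j) - \rho_i\big)}{\sigma_i}\right),$$

where ρ_i is the distance from x_i to its nearest neighbour and σ_i is the local scale chosen above.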

Now that we’ve organized our data into something that’s easy to deal with, it remains to find a low-dimensional representation of it.

For this we need some idea of how close the high-dimensional graph and its low-dimensional counterpart are. This is given by the cross-entropy function.
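For the record, the fuzzy-set cross entropy from the paper is, up to notation, a sum over every possible edge e:

$$C(\mu, \nu) = \sum_{e} \left[ \mu(e)\,\log\frac{\mu(e)}{\nu(e)} + \big(1 - \mu(e)\big)\,\log\frac{1 - \mu(e)}{1 - \nu(e)} \right],$$

where μ(e) is the edge’s weight in the high-dimensional fuzzy graph and ν(e) its weight in the low-dimensional one.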

We can then set up stochastic gradient descent to find an optimal solution.

Regardless of the manifold the data lies on, we want the target representation to be something in good old R^n.

Grosso modo, that’s all there is to it. What we’re actually optimizing are the “edges”: we nudge the low-dimensional graph until its edge weights match those of the high-dimensional one.
