DeepWalk

DeepWalk is a machine learning method that learns embeddings (social representations) of a graph's vertices. These embeddings capture neighborhood similarity and community membership by encoding social relations in a continuous vector space with a relatively small number of dimensions.

The Goal of DeepWalk

The main goal of DeepWalk is to learn a latent representation of each vertex, not only a probability distribution of node co-occurrences. This is achieved by introducing a mapping function $\Phi \colon V \to \mathbb{R}^{d}$ that assigns each vertex $v \in V$ its $d$-dimensional latent social representation $\Phi(v)$. In practice, $\Phi$ is represented by a $|V| \times d$ matrix of free parameters, with one row per vertex.
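As a minimal sketch of what "a matrix of free parameters" means, the snippet below stores $\Phi$ as a table with one row per vertex. The helper name `init_phi`, the sizes, and the small uniform initialization range are illustrative assumptions, not something the text prescribes.

```python
import random

def init_phi(num_vertices, d, seed=0):
    """Build Phi as a |V| x d table of free parameters, one row per vertex."""
    rng = random.Random(seed)
    # Small random values; the range is an assumption made for illustration.
    return [[rng.uniform(-0.5 / d, 0.5 / d) for _ in range(d)]
            for _ in range(num_vertices)]

phi = init_phi(num_vertices=5, d=3)   # hypothetical sizes
vector_for_vertex_2 = phi[2]          # evaluating Phi(v) is a row lookup
```

Because every entry is a free parameter, the number of parameters grows linearly with the number of vertices.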

How DeepWalk Works

DeepWalk learns embeddings of a graph's vertices by modeling a stream of short random walks. It generalizes neural language models by treating the randomly generated walks as sentences of a special language whose words are vertices. This allows DeepWalk to learn a latent representation of the graph's vertices that captures neighborhood similarity and community membership.
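One way to make the language-model analogy concrete is a skip-gram style objective. The window size $w$ and the walk $v_1, \dots, v_t$ are symbols introduced here for illustration and are not defined elsewhere in this text. For each position $i$ in a walk, the parameters are chosen to make the vertices within $w$ steps of $v_i$ likely given its representation:

$$\min_{\Phi} \; -\log \Pr\big(\{v_{i-w}, \dots, v_{i+w}\} \setminus v_i \mid \Phi(v_i)\big)$$

Assuming the context vertices are conditionally independent given $\Phi(v_i)$, this becomes a sum of per-vertex terms:

$$-\sum_{\substack{j = i-w \\ j \neq i}}^{i+w} \log \Pr\big(v_j \mid \Phi(v_i)\big)$$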

The process of learning the embeddings involves the following steps:

Initialize the free parameters of the mapping function $\Phi$.
Generate a set of short random walks on the graph. A fixed number of walks, each of a fixed length, is started from every vertex.
Use the generated walks to update the parameters of $\Phi$. This is done by maximizing the likelihood of the vertices observed near each vertex in a walk, given the current parameters of $\Phi$.
Repeat the walk-generation and update steps until the parameters of $\Phi$ converge.
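The steps above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not a reference implementation: the function names, hyperparameter defaults, and adjacency-list format are hypothetical, and the update uses skip-gram with negative sampling for brevity, whereas the original DeepWalk paper describes hierarchical softmax. The loop also runs a fixed number of passes instead of testing for convergence.

```python
import math
import random

def random_walk(adj, start, walk_length, rng):
    """Uniform random walk of at most walk_length vertices from start."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = adj[walk[-1]]
        if not neighbors:              # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk

def sigmoid(x):
    x = max(-30.0, min(30.0, x))       # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))

def deepwalk(adj, d=8, walks_per_vertex=10, walk_length=10,
             window=2, negatives=3, lr=0.025, seed=0):
    rng = random.Random(seed)
    vertices = sorted(adj)
    # Initialize the free parameters of Phi (plus auxiliary context vectors).
    phi = {v: [rng.uniform(-0.5 / d, 0.5 / d) for _ in range(d)]
           for v in vertices}
    ctx = {v: [0.0] * d for v in vertices}
    for _ in range(walks_per_vertex):  # fixed number of passes (assumption)
        order = vertices[:]
        rng.shuffle(order)
        for start in order:
            # Generate one short walk from each vertex.
            walk = random_walk(adj, start, walk_length, rng)
            # Update Phi from vertex pairs that co-occur within the window.
            for i, v in enumerate(walk):
                lo, hi = max(0, i - window), min(len(walk), i + window + 1)
                for j in range(lo, hi):
                    if j == i:
                        continue
                    u = walk[j]
                    # One observed pair (label 1) and random negatives (label 0).
                    targets = [(u, 1.0)]
                    targets += [(rng.choice(vertices), 0.0)
                                for _ in range(negatives)]
                    grad_v = [0.0] * d
                    for t, label in targets:
                        if label == 0.0 and t == u:
                            continue   # skip a negative equal to the positive
                        score = sum(a * b for a, b in zip(phi[v], ctx[t]))
                        g = lr * (label - sigmoid(score))
                        for k in range(d):
                            grad_v[k] += g * ctx[t][k]
                            ctx[t][k] += g * phi[v][k]
                    for k in range(d):
                        phi[v][k] += grad_v[k]
    return phi

# Hypothetical toy graph: two triangles joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
embeddings = deepwalk(adj, d=4, walks_per_vertex=5, walk_length=6)
```

The result maps each vertex to a list of `d` floats, that is, the rows of $\Phi$.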

Applications of DeepWalk

DeepWalk has been used in several applications, including:

Link prediction: Predicting the likelihood of the existence of an edge between two vertices in a graph.
Node classification: Classifying nodes in a graph into different categories based on their embeddings.
Recommendation systems: Recommending items to users based on their embeddings.
Community detection: Identifying communities of nodes in a graph based on their embeddings.
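As an illustration of how embeddings feed such tasks, the sketch below ranks candidate edges for link prediction by the cosine similarity of their endpoint embeddings. Cosine similarity is one common heuristic chosen here as an assumption; the embedding values, vertex names, and function names are made up for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_candidate_edges(emb, candidates):
    """Sort candidate (u, v) pairs from most to least similar endpoints."""
    return sorted(candidates,
                  key=lambda edge: cosine(emb[edge[0]], emb[edge[1]]),
                  reverse=True)

# Hypothetical 2-dimensional embeddings for three vertices.
emb = {"a": [1.0, 0.1], "b": [0.9, 0.2], "c": [-0.2, 1.0]}
ranked = rank_candidate_edges(emb, [("a", "c"), ("a", "b")])
```

Here the pair `("a", "b")` ranks first because its two vectors point in nearly the same direction. The same embeddings could instead be passed as features to a classifier for node classification or to a clustering algorithm for community detection.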

Advantages of DeepWalk

DeepWalk has several advantages over traditional graph-based machine learning methods. Some of these advantages include:

Efficiency: DeepWalk can scale to large graphs with millions of vertices and edges.
Flexibility: DeepWalk can be applied to a wide range of graph-based tasks, including link prediction, node classification, and community detection.
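Interpretability: The learned embeddings are represented in a continuous vector space, so they can be inspected with standard tools such as nearest-neighbor search or low-dimensional visualization, although individual dimensions have no predefined meaning.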

Limitations of DeepWalk

DeepWalk also has some limitations that should be considered when using the method. Some of these limitations include:

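Not suitable for all types of graphs: DeepWalk relies only on the graph's connectivity. It works well on graphs with a relatively simple structure, but may not be suitable for graphs with more complex structures, such as graphs with vertex attributes or several kinds of edges.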
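Limited context information: Each update only uses vertices that co-occur within a short stretch of a random walk, so DeepWalk mainly sees the local neighborhood of a vertex and may not capture global structure information.
Data sparsity: DeepWalk may not perform well on sparse graphs, where there are few edges between vertices and the random walks therefore carry little co-occurrence information.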

In summary, DeepWalk learns low-dimensional embeddings of a graph's vertices from short random walks, and these embeddings have been used in applications such as link prediction, node classification, and community detection. Its advantages should be weighed against the limitations listed above when deciding whether to use the method.