Self-Adversarial Negative Sampling

Self-Adversarial Negative Sampling is a technique used in natural language processing to improve the efficiency of negative sampling in methods such as word embeddings and knowledge graph embeddings. Negative sampling generates negative (false) triplets so that the model receives informative contrasting examples during training. Traditional negative sampling, however, draws these negatives uniformly at random, which is inefficient: many of the sampled triplets are so obviously false that they provide little useful signal. This is the problem self-adversarial negative sampling addresses.

The Problem with Traditional Negative Sampling

The main issue with traditional negative sampling is inefficiency. When negative triplets are sampled uniformly, many of them are obviously false, especially as training progresses, so they contribute little meaningful information. Training also slows down, because the algorithm spends effort on large numbers of samples that carry no useful signal. A minimal sketch of this uniform corruption scheme is shown below.
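To make the contrast concrete, here is a minimal sketch of uniform negative sampling as commonly used for knowledge graph triplets: a negative is generated by replacing the head or tail of a true triplet with an entity drawn uniformly at random. The function name `uniform_corrupt` and the integer entity IDs are illustrative assumptions, not part of any particular library.

```python
import numpy as np

def uniform_corrupt(triple, num_entities, rng):
    """Produce one negative by replacing the head or tail of a true
    (head, relation, tail) triple with a uniformly random entity."""
    h, r, t = triple
    if rng.random() < 0.5:
        h = int(rng.integers(num_entities))  # corrupt the head
    else:
        t = int(rng.integers(num_entities))  # corrupt the tail
    return h, r, t

rng = np.random.default_rng(0)
# Four uniformly drawn negatives for the (hypothetical) true triple (3, 1, 7)
print([uniform_corrupt((3, 1, 7), num_entities=1000, rng=rng) for _ in range(4)])
```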

What is Self-Adversarial Negative Sampling?

Self-adversarial negative sampling is a technique in which negative triplets are sampled according to the current embedding model. In other words, the choice of negative triplets is directly influenced by the model's current state, so the selected negatives are more likely to be challenging and therefore to provide meaningful information during training.

How Self-Adversarial Negative Sampling Works

In self-adversarial negative sampling, the negative triplets are sampled from a distribution that takes into account the current state of the model. This distribution can be calculated as follows:

$$ p\left(h'_{j}, r, t'_{j} \mid \left\{\left(h_{i}, r_{i}, t_{i}\right)\right\}\right) = \frac{\exp \alpha f_{r}\left(\mathbf{h}'_{j}, \mathbf{t}'_{j}\right)}{\sum_{i} \exp \alpha f_{r}\left(\mathbf{h}'_{i}, \mathbf{t}'_{i}\right)} $$

Here, $\alpha$ is the sampling temperature and $f_{r}$ is the model's scoring function for relation $r$. The higher the value of $\alpha$, the more sharply the distribution concentrates on negative triplets that the current model considers plausible, i.e., the harder negatives. Because actually drawing samples from this distribution can be costly, the probability above can instead be treated as the weight of each negative sample in the loss.
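As an illustration, the sketch below computes these self-adversarial weights as a temperature-scaled softmax over the scores the current model assigns to a batch of candidate negatives. The function name and the NumPy implementation are assumptions for the example; in practice the weights are usually treated as constants, with no gradient flowing through them.

```python
import numpy as np

def self_adversarial_weights(neg_scores, alpha):
    """Temperature-scaled softmax over the scores f_r(h'_i, t'_i) that the
    current model assigns to the candidate negatives."""
    logits = alpha * np.asarray(neg_scores, dtype=float)
    logits -= logits.max()                 # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Four candidate negatives scored by the current model; the highest-scoring
# (most plausible-looking) negative receives the largest weight.
print(self_adversarial_weights([2.0, 0.5, -1.0, 1.5], alpha=1.0))
```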

The final negative sampling loss with self-adversarial training can be represented as follows:

$$ L = -\log\sigma\left(\gamma - d_{r}\left(\mathbf{h}, \mathbf{t}\right)\right) - \sum_{i=1}^{n} p\left(h'_{i}, r, t'_{i}\right)\log\sigma\left(d_{r}\left(\mathbf{h}'_{i}, \mathbf{t}'_{i}\right) - \gamma\right) $$

Here, $\gamma$ is a fixed margin, $\sigma$ is the sigmoid function, $d_{r}$ is the model's distance function (smaller distances indicate more plausible triplets), and $\left(h'_{i}, r, t'_{i}\right)$ is the $i$-th negative triplet. The loss weights each negative term by the probability obtained from the distribution above, so harder negatives contribute more to training.
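Putting the two pieces together, here is a minimal sketch of the loss for one positive triplet and its $n$ negatives. It assumes a distance-based model in which the score used for the weighting is the negative distance, $f_{r} = -d_{r}$ (as in models such as TransE or RotatE); the function names and the NumPy implementation are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def self_adversarial_loss(pos_dist, neg_dists, gamma, alpha):
    """Negative sampling loss with self-adversarial weighting.

    pos_dist  -- d_r(h, t) for the positive triplet (scalar)
    neg_dists -- d_r(h'_i, t'_i) for the n negative triplets (array-like)
    gamma     -- margin
    alpha     -- sampling temperature
    """
    neg_dists = np.asarray(neg_dists, dtype=float)

    # Self-adversarial weights p(h'_i, r, t'_i): softmax of alpha * score,
    # taking the score of a distance-based model to be -d_r.
    logits = alpha * (-neg_dists)
    logits -= logits.max()                          # numerical stability
    weights = np.exp(logits) / np.exp(logits).sum()

    pos_term = -np.log(sigmoid(gamma - pos_dist))
    neg_term = -np.sum(weights * np.log(sigmoid(neg_dists - gamma)))
    return pos_term + neg_term

# One positive triplet and three negatives, with an illustrative margin
print(self_adversarial_loss(pos_dist=1.2, neg_dists=[3.0, 2.5, 4.1],
                            gamma=6.0, alpha=1.0))
```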

Final Thoughts

Self-adversarial negative sampling is a useful technique that can greatly improve the efficiency of negative sampling in natural language processing models. The key takeaway is that sampling negative triplets according to the current state of the model keeps the negatives informative even in later stages of training, when most uniformly sampled negatives are obviously false. With the probability distribution and temperature described above, the sampling (or weighting) process prioritizes harder, more informative negative triplets.
