Hubbry Logo
search
logo
2592513

Knowledge graph embedding

logo
Community Hub0 Subscribers
Write something...
Be the first to start a discussion here.
Be the first to start a discussion here.
See all
Knowledge graph embedding

In representation learning, knowledge graph embedding (KGE), also called knowledge representation learning (KRL), or multi-relation learning, is a machine learning task of learning a low-dimensional representation of a knowledge graph's entities and relations while preserving their semantic meaning. Leveraging their embedded representation, knowledge graphs (KGs) can be used for various applications such as link prediction, triple classification, entity recognition, clustering, and relation extraction.

A knowledge graph is a collection of entities , relations , and facts . A fact is a triple that denotes a link between the head and the tail of the triple. Another notation that is often used in the literature to represent a triple (or fact) is . This notation is called the Resource Description Framework (RDF). A knowledge graph represents the knowledge related to a specific domain; leveraging this structured representation, it is possible to infer a piece of new knowledge from it after some refinement steps. However, nowadays, people have to deal with the sparsity of data and the computational inefficiency to use them in a real-world application.

The embedding of a knowledge graph is a function that translates each entity and each relation into a vector of a given dimension , called embedding dimension. It is even possible to embed the entities and relations with different dimensions. The embedding vectors can then be used for other tasks.

A knowledge graph embedding is characterized by four aspects:

All algorithms for creating a knowledge graph embedding follow the same approach. First, the embedding vectors are initialized to random values. Then, they are iteratively optimized using a training set of triples. In each iteration, a batch of size triples is sampled from the training set, and a triple from it is sampled and corrupted—i.e., a triple that does not represent a true fact in the knowledge graph. The corruption of a triple involves substituting the head or the tail (or both) of the triple with another entity that makes the fact false. The original triple and the corrupted triple are added in the training batch, and then the embeddings are updated, optimizing a scoring function. Iteration stops when a stop condition is reached. Usually, the stop condition depends on the overfitting of the training set. At the end, the learned embeddings should have extracted semantic meaning from the training triples and should correctly predict unseen true facts in the knowledge graph.

The following is the pseudocode for the general embedding procedure.

These indexes are often used to measure the embedding quality of a model. The simplicity of the indexes makes them very suitable for evaluating the performance of an embedding algorithm even on a large scale. Given as the set of all ranked predictions of a model, it is possible to define three different performance indexes: Hits@K, MR, and MRR.

Hits@K or in short, H@K, is a performance index that measures the probability to find the correct prediction in the first top K model predictions. Usually, it is used . Hits@K reflects the accuracy of an embedding model to predict the relation between two given triples correctly.

See all
User Avatar
No comments yet.