The dense vector representations obtained by knowledge graph embedding techniques provide three fundamental advantages. First, they enable the integration of information from different modalities, such as images and multilingual text, with symbolic knowledge into one common representation. Such cross-modal embeddings provide measurable benefits on semantic similarity benchmarks and entity-type prediction tasks. Second, they enable the transfer of knowledge across modalities, even for concepts that are not represented in the other modalities.
Third, they are key to solving complex AI tasks beyond link prediction, such as image
captioning or multi-step decision making. Here, too, the transfer of information from other modalities can be beneficial. For example, cross-modal knowledge transfer assists in captioning images that contain visual objects unseen in the image-caption parallel training data. Ultimately, this makes it possible to tackle several real-world application areas where knowledge-guided representation learning can provide considerable benefits, such as media analytics, manufacturing, or medical engineering.
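The cross-modal transfer idea behind the second and third advantages can be illustrated with a minimal sketch: once knowledge graph entities and images are embedded into one shared vector space, a visual object that never appeared in the caption training data can still be linked to a known entity by nearest-neighbor search. All entity names and vectors below are hypothetical toy values, not taken from any real embedding model.

```python
import numpy as np

# Hypothetical toy shared space: knowledge graph entities embedded
# alongside images (vectors are illustrative, not learned).
entity_embeddings = {
    "kg:Zebra": np.array([0.9, 0.1, 0.0]),
    "kg:Horse": np.array([0.8, 0.2, 0.1]),
    "kg:Piano": np.array([0.0, 0.1, 0.9]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Embedding of an image region showing an object unseen in the
# image-caption training data; cross-modal transfer resolves it
# to the closest knowledge graph entity in the shared space.
image_vec = np.array([0.85, 0.15, 0.05])
nearest = max(entity_embeddings,
              key=lambda e: cosine(entity_embeddings[e], image_vec))
print(nearest)
```

In this sketch the image vector is closest to the embedding of `kg:Zebra`, so the symbolic knowledge attached to that entity (its types, relations, and labels) becomes available for downstream tasks such as captioning, even though no caption for this object was ever observed.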