TDV is a high-dimensional, sparse vector representation of lexical items (terms), ranging from morphemes to phrases, based on the definitions of their meanings as presented in Wiktionary. It contrasts with distributional representation methods, such as word2vec and GloVe, which derive term meanings from their usage patterns (context windows). TDV performs better at semantic similarity computation, whereas distributional methods perform better at semantic relatedness. It also provides useful features for Natural Language Processing, such as sense polarity between terms, multi-language representation, and the ability to disambiguate senses using part-of-speech (POS) information.
Project repo: https://github.com/dscarvalho/tdv
Related publications:
EasyESA is an implementation of Explicit Semantic Analysis (ESA) based on the Wikiprep-ESA code from Çağatay Çallı (https://github.com/faraday/wikiprep-esa). It runs as a JSON web service which can be queried for the semantic relatedness measure, concept vectors, or context windows.
Explicit Semantic Analysis (ESA) is a text representation technique that uses Wikipedia as a commonsense knowledge base, treating each article as a concept and relying on the co-occurrence of words in the article text. The words of each article are associated with its concept using TF-IDF scoring, so a word can be represented as a vector of its associations to each concept, and the “semantic relatedness” between any two words can be measured by the cosine similarity of their vectors. A document containing a string of words is represented as the centroid of the vectors representing its words.
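The ESA pipeline described above can be illustrated with a minimal sketch. This is not the EasyESA code or its API: the three toy "concepts" stand in for Wikipedia articles, and the function names (build_word_vectors, cosine, text_vector) and the smoothed TF-IDF formula are assumptions chosen for illustration.

```python
import math
from collections import Counter

# Toy "concepts": hypothetical stand-ins for Wikipedia articles.
CONCEPTS = {
    "Dog": "dog bark pet animal dog leash",
    "Cat": "cat meow pet animal whiskers",
    "Car": "car engine wheel road drive",
}


def build_word_vectors(concepts):
    """Map each word to a sparse {concept: TF-IDF weight} vector."""
    n_concepts = len(concepts)
    tokenized = {name: text.split() for name, text in concepts.items()}

    # Document frequency: in how many concepts does each word occur?
    doc_freq = Counter()
    for words in tokenized.values():
        doc_freq.update(set(words))

    vectors = {}
    for name, words in tokenized.items():
        for word, count in Counter(words).items():
            tf = count / len(words)
            # Assumed smoothing (+1) so words shared by all concepts keep weight.
            idf = math.log(n_concepts / doc_freq[word]) + 1.0
            vectors.setdefault(word, {})[name] = tf * idf
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def text_vector(text, vectors):
    """Represent a text as the centroid of the vectors of its known words."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    centroid = {}
    for vec in known:
        for concept, weight in vec.items():
            centroid[concept] = centroid.get(concept, 0.0) + weight / len(known)
    return centroid


if __name__ == "__main__":
    vecs = build_word_vectors(CONCEPTS)
    print(cosine(vecs["pet"], vecs["dog"]))  # related through the "Dog" concept
    print(cosine(vecs["dog"], vecs["car"]))  # no shared concept
    print(cosine(text_vector("pet animal", vecs), vecs["cat"]))
```

Sparse dictionaries are used here because each word is associated with only a small fraction of all concepts; with a real Wikipedia index the concept space has millions of dimensions, and dense vectors would be impractical.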
Project repo: https://github.com/dscarvalho/easyesa
Publication: Danilo Carvalho, Çağatay Çallı, André Freitas, Edward Curry, EasyESA: A Low-effort Infrastructure for Explicit Semantic Analysis, In Proceedings of the 13th International Semantic Web Conference (ISWC), Riva del Garda, 2014. (Demonstration Paper in Proceedings).
Graphia is a framework which extracts structured data graphs from factual unstructured texts. Instead of extracting simple relations, or committing to a specific conceptual model, Graphia aims at the extraction of graphs which can represent the complexity of contexts present in texts.
The graph representation adopted by the framework (SDGs – Structured Discourse Graphs) can be naturally serialized as an entity-centric RDF graph, which facilitates the integration and the use of the graph with other resources and applications. Additionally, the graph representation supports a pay-as-you-go / semantic best-effort extraction, where a comprehensive extraction is prioritized over accuracy and where the quality of the extracted graph evolves over time.
Features included in the framework:
Project page: graphia.dcc.ufrj.br
Online demo: graphia.dcc.ufrj.br/OnlineDemo
Related publications:
A Semantic Best-Effort Approach for Extracting Structured Discourse Graphs from Wikipedia
In Proceedings of the 1st Workshop on the Web of Linked Entities (WoLE 2012) at the 11th International Semantic Web Conference (ISWC), 2012 (Workshop Full Paper)
Graphia: Extracting Contextual Relation Graphs from Text
In Proceedings of the 10th Extended Semantic Web Conference (ESWC), Montpellier, France, 2013. (Demonstration Paper in Proceedings).
Representing Texts as Contextualized Entity-Centric Linked Data Graphs
In Proceedings of the 12th International Workshop on Web Semantics and Web Intelligence (WebS 2013), 24th International Conference on Database and Expert Systems Applications (DEXA), Prague, 2013. (Workshop Full Paper)