Concept Identification from Single-Documents

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

© Springer Nature Switzerland AG 2018. This article presents a method for automatically extracting relevant concepts, each consisting of one or more words. Its main contribution is that it works from a single document of any domain, regardless of its length; the experiments, however, use short documents, which are the most common kind found on the web. The research was conducted on documents written in Spanish and was tested on multiple randomly chosen domains so that the results could be compared. For this, an algorithm automatically identifies syntactic patterns in the document, building on the previous work of [1]. The algorithm relies on statistical approximations and on the length of the identifiable patterns in the document, and it applies heuristics that favor or penalize the choice of patterns according to which of the five available methods (M1 to M5) is selected. The chosen patterns yield the candidate concepts, which then pass through a further evaluation step that produces the final concepts. The proposal offers at least four advantages: (1) it is multi-domain, (2) it is independent of text length, (3) it can work with one or more documents, and (4) it allows garbage or undesirable patterns to be discarded from the start. The method was applied to 11 different domains, with precision ranging from 58% to 70% and recall from 25% to 46%.
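The precision and recall figures quoted in the abstract can be illustrated with a minimal sketch. It assumes the usual set-based definitions (precision = correct extracted concepts / all extracted concepts; recall = correct extracted concepts / all reference concepts). The function name `precision_recall`, the case-insensitive matching, and the sample concept lists are hypothetical and are not taken from the paper, whose evaluation procedure may differ.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) for extracted concepts against a reference list.

    Assumption: concepts match when they are equal after trimming and lowercasing.
    """
    extracted_set = {c.strip().lower() for c in extracted}
    gold_set = {c.strip().lower() for c in gold}
    true_positives = len(extracted_set & gold_set)
    # Guard against empty inputs to avoid division by zero.
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical concepts for one short Spanish document (illustrative data only).
extracted = ["sistema operativo", "memoria virtual", "gestiona"]
gold = ["sistema operativo", "memoria virtual", "sistema de archivos", "proceso"]

precision, recall = precision_recall(extracted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.67 recall=0.50
```

In this toy case two of the three extracted concepts are correct (precision 0.67) and two of the four reference concepts are found (recall 0.50), the same pattern of higher precision than recall that the abstract reports.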
Original language: American English
Title of host publication: Concept identification from single-documents
Pages: 158-173
Number of pages: 16
State: Published - 1 Nov 2018
Event: International Conference on Technologies and Innovation: 4th International Conference, CITI 2018, Guayaquil, Ecuador - Guayaquil, Ecuador
Duration: 6 Nov 2018 → 9 Nov 2018
Conference number: 4
https://link.springer.com/book/10.1007/978-3-030-00940-3

Conference

Conference: International Conference on Technologies and Innovation
Abbreviated title: Technologies and Innovation
Country: Ecuador
City: Guayaquil
Period: 6/11/18 → 9/11/18

Keywords

  • Concept extraction
  • Syntactic patterns
  • Text analysis
  • Single-documents
