Concept Identification from Single-Documents

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed


© Springer Nature Switzerland AG 2018. This article presents a method that automatically extracts relevant concepts consisting of one or more words. Its main contribution is that it works on a single document from any domain, regardless of the document's length. The experiments, however, use short documents, since these are the most common on the web. The research targets documents written in Spanish and was tested on multiple randomly selected domains so that the results could be compared. To this end, an algorithm automatically identifies syntactic patterns in the document, and the work builds on the earlier work in [1] to obtain its results. This algorithm relies on statistical approximations and on the length of the identifiable patterns contained in the document. It applies heuristics that promote or demote the selection of a pattern depending on which of the five methods (M1 to M5) is used. These patterns yield the candidate concepts, which then go through a further evaluation process that produces the final concepts. The proposal offers at least four advantages: (1) it is multi-domain, (2) it is independent of text length, (3) it can work with one or more documents, and (4) it discards garbage or undesirable patterns from the start. The method was applied to 11 different domains, with precision ranging from 58% to 70% and recall from 25% to 46%.
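The abstract describes the pipeline only at a high level: identify syntactic patterns, discard undesirable ones, derive candidate concepts, and evaluate them by precision and recall. The Python sketch below illustrates that general flow under explicit assumptions. It is not the authors' algorithm and does not implement methods M1 to M5. Several pieces are assumptions: the input is already part-of-speech tagged with Universal Dependencies tags, the function names are hypothetical, and both the "garbage" filter (noun-initial patterns ending in a noun or adjective) and the frequency-times-length score are illustrative heuristics.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# A POS-tagged document: (word, tag) pairs.
# ASSUMPTION: tagging was done beforehand by some Spanish tagger using UD tags.
Tagged = List[Tuple[str, str]]
Pattern = Tuple[str, ...]


def frequent_patterns(doc: Tagged, max_len: int = 3, min_freq: int = 2) -> Set[Pattern]:
    """Hypothetical: tag sequences of length 1..max_len seen at least min_freq times."""
    tags = [tag for _, tag in doc]
    counts = Counter(
        tuple(tags[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tags) - n + 1)
    )
    return {p for p, c in counts.items() if c >= min_freq}


def discard_garbage(patterns: Iterable[Pattern]) -> Set[Pattern]:
    """Hypothetical filter: keep patterns that start with a noun and end with a noun or adjective."""
    return {p for p in patterns if p[0] == "NOUN" and p[-1] in ("NOUN", "ADJ")}


def candidate_concepts(doc: Tagged, patterns: Set[Pattern], top_k: int = 5) -> List[str]:
    """Rank word sequences that match a kept pattern by frequency x length (hypothetical score)."""
    tags = [tag for _, tag in doc]
    words = [w.lower() for w, _ in doc]
    counts: Counter = Counter()
    for p in patterns:
        n = len(p)
        for i in range(len(tags) - n + 1):
            if tuple(tags[i:i + n]) == p:
                counts[tuple(words[i:i + n])] += 1
    # Higher score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1] * len(item[0]), item[0]))
    return [" ".join(phrase) for phrase, _ in ranked[:top_k]]


def precision_recall(predicted: List[str], gold: Set[str]) -> Tuple[float, float]:
    """Standard precision and recall of predicted concepts against a gold set."""
    hits = len(set(predicted) & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    doc = [
        ("La", "DET"), ("minería", "NOUN"), ("de", "ADP"), ("textos", "NOUN"), ("usa", "VERB"),
        ("la", "DET"), ("minería", "NOUN"), ("de", "ADP"), ("textos", "NOUN"), ("y", "CCONJ"),
        ("el", "DET"), ("aprendizaje", "NOUN"), ("automático", "ADJ"), (".", "PUNCT"),
        ("El", "DET"), ("aprendizaje", "NOUN"), ("automático", "ADJ"), ("ayuda", "VERB"),
    ]
    patterns = discard_garbage(frequent_patterns(doc))
    concepts = candidate_concepts(doc, patterns, top_k=3)
    print(concepts)  # ['minería de textos', 'aprendizaje automático', 'aprendizaje']
    print(precision_recall(concepts, {"minería de textos", "aprendizaje automático", "clasificación"}))
```

On this toy input, the filter removes frequent but uninformative patterns such as `DET NOUN` before any candidates are formed. This loosely mirrors the abstract's point about discarding undesirable patterns early. Two of the three returned candidates are in the example gold set, so precision and recall are both 2/3.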
Original language: English (US)
Host publication title: Concept identification from single-documents
Number of pages: 16
Status: Published - 1 Nov 2018
Event: International Conference on Technologies and Innovation: 4th International Conference, CITI 2018, Guayaquil, Ecuador - Guayaquil, Ecuador
Duration: 6 Nov 2018 to 9 Nov 2018
Conference number: 4


Conference: International Conference on Technologies and Innovation
Abbreviated title: Technologies and Innovation

