CITI ceased operations in 2014 to co-launch NOVA LINCS. This site has not been updated since 2013.
Towards improving WEBSOM with Multi-Word Expressions
MSc Post-Graduation

Large collections of free-text documents are usually rich in information and cover several topics. However, because such collections are very large, searching and filtering them is a laborious task. A large text collection covers a set of topics, each associated with a group of documents. This thesis presents a method for building a document map of the core contents covered in a collection. WEBSOM is an approach that combines document encoding methods with Self-Organizing Maps (SOM) to generate a document map. However, its document encoding method has a weakness: it uses single words to characterize documents. Single words tend to be ambiguous and semantically vague, so some documents can be incorrectly related. This thesis proposes a new document encoding method that improves the WEBSOM approach by using multi-word expressions (MWEs) to describe documents. Previous research and ongoing experiments encourage the use of MWEs to characterize documents because they are semantically more accurate and more descriptive than single words.

Keywords: Self-Organizing Maps (SOM), Text Mining, WEBSOM, Relevant Expressions
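
The general idea described in the abstract, encoding each document as a vector of expression counts and arranging those vectors on a Self-Organizing Map, can be illustrated with a minimal sketch. This is not the thesis's implementation: the vocabulary of MWEs is hand-picked here (the thesis's extraction method is not described on this page), counting is done by plain substring matching, and the function names, grid size, and learning parameters are all assumptions chosen for illustration.

```python
import math
import random

def encode(doc, vocabulary):
    """Count how often each expression (hypothetical, hand-picked MWEs) occurs in doc."""
    text = doc.lower()
    # Simplification: plain substring counting, no tokenization or stemming.
    return [float(text.count(term)) for term in vocabulary]

def normalize(vec):
    """Scale a vector to unit length; all-zero vectors are returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def best_matching_unit(weights, vec):
    """Return the (row, col) of the map unit whose weights are closest to vec."""
    best, best_dist = None, float("inf")
    for pos, w in weights.items():
        dist = sum((a - b) ** 2 for a, b in zip(w, vec))
        if dist < best_dist:
            best, best_dist = pos, dist
    return best

def train_som(vectors, rows=3, cols=3, epochs=50, lr0=0.5, radius0=1.5, seed=0):
    """Train a small rectangular SOM; all parameter values are illustrative."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighbourhood shrinks over time
        for vec in vectors:
            br, bc = best_matching_unit(weights, vec)
            for (r, c), w in weights.items():
                grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_dist2 / (2 * radius ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (vec[i] - w[i])
    return weights

# Usage: place three toy documents on the map.
vocabulary = ["self-organizing maps", "text mining", "neural network"]
docs = [
    "Self-Organizing Maps are a kind of neural network.",
    "Text mining extracts information from documents; text mining scales poorly.",
    "Self-Organizing Maps are a kind of neural network.",
]
vectors = [normalize(encode(d, vocabulary)) for d in docs]
som = train_som(vectors)
for doc, vec in zip(docs, vectors):
    print(best_matching_unit(som, vec), doc[:40])
```

Documents with identical encodings always land on the same map unit. How well different topics separate depends on the vocabulary and training parameters, which is where the choice between single words and MWEs would matter.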

Start Date: 2012-04-01

End Date: 2013-03-20

Post-Graduation Student / Researcher / Professor:
  • Stefan Alves ( DI-FCT-UNL )

Post-Graduation Supervisor(s):

Post-Graduation Jury: