CITI ceased operations in 2014 to co-launch NOVA LINCS. This site has not been updated since 2013.
Measuring the Structural Similarity of Semistructured Documents Using Entropy
{ Fri, 2 Mar 2007, 16h00 }

By: Sven Helmer

We propose a technique for measuring the structural similarity of semistructured documents based on entropy. After extracting the structural information from two documents, we use either Ziv-Lempel encoding or Ziv-Merhav cross-parsing to estimate the entropy and, from it, the similarity between the documents. To the best of our knowledge, this is the first linear-time approach for evaluating structural similarity. In an experimental evaluation, we demonstrate that the clustering quality achieved by our algorithm is on a par with, or even better than, that of existing approaches.
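To make the general idea concrete, here is a minimal sketch in Python. It is not the speaker's algorithm: it assumes XML input, takes the sequence of opening and closing tag names as the "structural information", and uses LZ78 phrase counts as a rough stand-in for an entropy estimate, combined in the style of a normalized compression distance. The function names `tag_sequence`, `lz78_phrases`, and `structural_distance` are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def tag_sequence(xml_text):
    """Flatten an XML document into its sequence of open/close tag names."""
    seq = []

    def walk(elem):
        seq.append(elem.tag)          # opening tag
        for child in elem:
            walk(child)
        seq.append("/" + elem.tag)    # closing tag

    walk(ET.fromstring(xml_text))
    return seq


def lz78_phrases(seq):
    """Count the phrases of an LZ78 parse of seq in one left-to-right pass.

    The dictionary is kept as a trie keyed by (node_id, symbol), so each
    input symbol costs a single dictionary lookup.
    """
    trie = {}
    node, next_id, count = 0, 1, 0
    for sym in seq:
        key = (node, sym)
        if key in trie:
            node = trie[key]          # extend the current phrase
        else:
            trie[key] = next_id       # new phrase: record it and restart
            next_id += 1
            count += 1
            node = 0
    if node != 0:                     # unfinished phrase at the end of input
        count += 1
    return count


def structural_distance(xml_a, xml_b):
    """Assumed distance: few new phrases when parsing a+b means similar structure."""
    a, b = tag_sequence(xml_a), tag_sequence(xml_b)
    ca, cb = lz78_phrases(a), lz78_phrases(b)
    cab = lz78_phrases(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)
```

Under these assumptions, a document compared with a structurally identical one yields a smaller value than when compared with a document that shares no tags, because the second half of the concatenated sequence can reuse phrases already present in the dictionary.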

Hosted by: Software Systems

Location: Sala de Seminários do DI
