The benefit of LSI is that it helps remove noise

Rina7RS
Posts: 473
Joined: Mon Dec 23, 2024 3:39 am

Post by Rina7RS »

Once the term-document matrix is built, it is cleaned of stop words (pronouns, articles, function words), and some word forms are truncated (so-called stemming), although this step may not be necessary for every language. The terms are now represented in a bag-of-words model.
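
The cleaning step above can be sketched in a few lines of Python. This is only an illustration: the stop-word list, the `crude_stem` suffix stripper, and the sample documents are all made up for the example, and a real pipeline would use a proper stemmer and a much longer stop-word list.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption); real lists are much longer.
STOP_WORDS = {"the", "a", "an", "it", "is", "are", "of", "and", "on", "in", "to"}

def crude_stem(word):
    """Very rough suffix stripping, a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Lowercase, split into words, drop stop words, stem what is left."""
    words = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]

def term_document_matrix(docs):
    """Return (vocab, matrix): rows are terms, columns are documents."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix

docs = ["The cat sat on the mat", "Cats and dogs are playing", "The dog played"]
vocab, tdm = term_document_matrix(docs)
for term, row in zip(vocab, tdm):
    print(term, row)
```

Word order is discarded here, which is exactly what the bag-of-words model means: each document is reduced to a column of term counts.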

The entries in the term-document matrix are typically converted into weights according to their estimated importance (e.g., by the TF-IDF method, which will be described later).
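
As a rough sketch of that weighting, here is one common TF-IDF variant: raw term count multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the term. Other variants (smoothed, normalized) exist, and the function name `tfidf` and the sample counts are assumptions for the example.

```python
import numpy as np

def tfidf(tdm):
    """Weight a term-document count matrix (terms x documents) by TF-IDF."""
    tdm = np.asarray(tdm, dtype=float)
    n_docs = tdm.shape[1]
    # Document frequency: in how many documents each term appears.
    df = np.count_nonzero(tdm, axis=1)
    # Guard against division by zero for terms that appear nowhere.
    idf = np.log(n_docs / np.maximum(df, 1))
    return tdm * idf[:, np.newaxis]

# Made-up counts: 4 terms (rows) in 3 documents (columns).
counts = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 0],
    [1, 1, 1],
]
weights = tfidf(counts)
print(weights)
```

With this variant, a term that occurs in every document (the last row) gets weight zero, while a term found in only one document gets the highest weight per occurrence.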

Then, SVD is performed on the matrix to decompose it into three other matrices. Each term and each document gets a vector representation in one of the two orthogonal matrices, and the diagonal matrix holds the singular values in descending order. Only the k largest values are retained, and the remaining values are set to zero. The choice of k for the matrix reduction is empirical and related to the size of the collection. In this way, SVD reduces the matrix size while preserving the main semantic structure.
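
A minimal sketch of the truncation step using NumPy's `np.linalg.svd`. The helper name `truncated_svd`, the small matrix, and k = 2 are assumptions chosen for illustration; for large sparse collections a dedicated sparse solver would normally be used instead of a full dense SVD.

```python
import numpy as np

def truncated_svd(a, k):
    """Keep only the k largest singular values and their vectors."""
    u, s, vt = np.linalg.svd(np.asarray(a, dtype=float), full_matrices=False)
    # np.linalg.svd returns singular values already sorted in descending order.
    return u[:, :k], s[:k], vt[:k, :]

# Made-up term-document matrix: 5 terms (rows), 3 documents (columns).
a = np.array([
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 0],
    [0, 1, 1],
    [1, 0, 0],
], dtype=float)

k = 2
u_k, s_k, vt_k = truncated_svd(a, k)

# Rank-k approximation of the original matrix.
a_k = u_k @ np.diag(s_k) @ vt_k

# k-dimensional vectors for each term (rows) and each document (rows).
term_vectors = u_k * s_k
doc_vectors = vt_k.T * s_k

print(np.round(a_k, 2))
```

The approximation `a_k` has the same shape as the original matrix but rank at most k; the compact representation actually used for comparison is `term_vectors` and `doc_vectors`.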

The data are then compared by taking the cosine of the angle between two vectors formed by any two columns (there are other ways to compare, e.g., by Euclidean distance).
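
Both comparison measures mentioned above are short to write down. The function names and sample vectors below are assumptions for the example; returning 0.0 for a zero vector is a convention chosen here, not something the method itself prescribes.

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine of the angle between two vectors (1 = same direction)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0:
        return 0.0  # convention for zero vectors (an assumption)
    return float(np.dot(x, y) / denom)

def euclidean_distance(x, y):
    """Straight-line distance between two vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.linalg.norm(x - y))

doc_a = [1.0, 2.0]
doc_b = [2.0, 4.0]  # same direction as doc_a, twice as long
print(cosine_similarity(doc_a, doc_b))
print(euclidean_distance(doc_a, doc_b))
```

The example shows why cosine is often preferred for text: `doc_a` and `doc_b` point in the same direction, so their cosine similarity is at its maximum even though their Euclidean distance is not zero. Document length affects the second measure but not the first.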

These calculations identify co-occurring terms in the body of text and help reveal common concepts across multiple documents in a text collection. The benefit of LSI is that it helps remove noise and transforms a very sparse term-document matrix (TDM) into a low-rank approximation that reveals common structure. The disadvantage of LSI is that it is computationally complex.