20.02.2019 Views

CLC-Conference-Proceeding-2018


Figure 5: On the left, the matrix associated with the image; on the right, the image with missing pixels and the corresponding matrix with absent data (taken from xlix).

To compute the NMF, Garza de la Luna explains in l how a modification of the ALS method can be used to deal with the missing data. In section 1.2, the vector formulation of ALS is presented as a Least Squares problem with Inequality constraints (LSI). The idea is to eliminate from the sum the terms that depend on data values that are not known. To this end, weights pij are introduced so that the terms corresponding to unknown entries of the data do not contribute to the sum. This gives a weighted version of the cost function, which is minimized subject to x ≥ 0. The algorithm with the modified cost function is known as weighted ALS.
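
As an illustration, assuming binary weights and the Frobenius-norm cost usually associated with ALS (neither choice is confirmed by the text above), the weights and the weighted problem could be written as follows, using the symbols Y, A and X of this section:

```latex
p_{ij} =
\begin{cases}
1, & \text{if } y_{ij} \text{ is known},\\
0, & \text{if } y_{ij} \text{ is missing},
\end{cases}
\qquad
\min_{A \ge 0,\; X \ge 0} \; \sum_{i,j} p_{ij}\,\bigl(y_{ij} - (AX)_{ij}\bigr)^{2}.
```

In the vector formulation, each column x of X would be obtained from this sum with A held fixed, which is where the constraint x ≥ 0 appears.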

Once the matrices A and X have been computed, all of their coefficients are known even if the matrix Y has missing data, so computing the product AX gives an approximation of the complete data matrix Y.
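
The procedure can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the algorithm of l: the weights are taken to be binary (a boolean mask `P`), and each subproblem is solved by unconstrained least squares followed by clipping negative values to zero, a common projected simplification of ALS rather than an exact LSI solver. The function name `weighted_als_nmf`, its parameters, and the toy data are hypothetical.

```python
import numpy as np

def weighted_als_nmf(Y, P, rank, n_iter=50, seed=0):
    """Approximate Y by A @ X with A, X >= 0, using only entries where P is True.

    Assumption: P is a boolean mask (binary weights); entries of Y where
    P is False are never read.
    """
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    A = rng.random((m, rank))
    X = rng.random((rank, n))
    for _ in range(n_iter):
        # Update each column of X using only the known rows of that column.
        for j in range(n):
            rows = P[:, j]
            if rows.any():
                x, *_ = np.linalg.lstsq(A[rows], Y[rows, j], rcond=None)
                X[:, j] = np.maximum(x, 0.0)  # projection onto x >= 0
        # Update each row of A using only the known columns of that row.
        for i in range(m):
            cols = P[i, :]
            if cols.any():
                a, *_ = np.linalg.lstsq(X[:, cols].T, Y[i, cols], rcond=None)
                A[i, :] = np.maximum(a, 0.0)  # projection onto a >= 0
    return A, X

# Toy example: a nonnegative rank-2 matrix with roughly 20% of entries hidden.
rng = np.random.default_rng(1)
Y_true = rng.random((6, 2)) @ rng.random((2, 5))
P = rng.random(Y_true.shape) > 0.2       # True where the entry is known
Y = np.where(P, Y_true, 0.0)             # placeholder value in missing cells
A, X = weighted_als_nmf(Y, P, rank=2)
Y_hat = A @ X                            # approximation of every entry of Y
```

Because the masked entries are never read, the placeholder stored in the missing cells has no effect on A and X. The product `Y_hat` contains a value for every cell, including those that were missing in Y.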

1.2 Study of the Cuban public Spanish

NMF has been widely used in Text Mining. This example presents the main ideas of another application of NMF.

At the University of Havana, the Faculty of Mathematics and Computer Science and the Faculty of Arts and Letters work together on a project for the study of the Public Spanish of Cuba. To do this, a corpus of representative texts is created and studied. The corpus is called CORESPUC and has 4 large groups of texts. The detection of main topics is one of the studies currently under development. This application is the one most frequently reported in the literature.

Although NMF became well known through the work of Lee and Seung li, especially through its application to a database of face images, it has also been applied in Text Mining lii. In the same work, Lee and Seung reported an application to the semantic analysis of documents, using the same multiplicative rules presented as an algorithm for the images, and applied it to a database of 30,991 articles of the Grolier Encyclopedia. In this application, the occurrences of each of the 15,276 words in the vocabulary were counted to form the matrix Y of size 30,991 x 15,276.
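
As a small illustration of how such a count matrix can be formed, the sketch below builds a document-by-word matrix Y from a toy collection. The function name `term_document_matrix`, the whitespace tokenization, and the example sentences are assumptions made for illustration, not the procedure used for the Grolier data.

```python
from collections import Counter

def term_document_matrix(documents):
    """Return (vocabulary, Y), where Y[i][j] counts word j in document i."""
    # Assumption: lowercase whitespace tokenization, no stop-word removal.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    index = {word: j for j, word in enumerate(vocabulary)}
    Y = [[0] * len(vocabulary) for _ in tokenized]
    for i, tokens in enumerate(tokenized):
        for word, count in Counter(tokens).items():
            Y[i][index[word]] = count
    return vocabulary, Y

docs = ["the cat sat", "the cat saw the dog"]
vocabulary, Y = term_document_matrix(docs)
# vocabulary is ['cat', 'dog', 'sat', 'saw', 'the']
# Y is [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

Each row of Y corresponds to one document and each column to one vocabulary word, which matches the documents-by-words orientation of the 30,991 x 15,276 matrix described above.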

The first objective of the experiment under development is to obtain semantically related documents. For this, several sets of texts are used, formed by letters to the Juventud Rebelde newspaper, among others.

As a second objective, the Biber Methodology liii is applied to study register variation in this set of texts. A register is characterized by a set of linguistic features. Once the linguists define the features to be taken into account, the aforementioned methodology is applied. In the literature, a factor analysis of the covariance matrix is applied. In this research, an NMF of this matrix is sought instead, and its effectiveness compared with the approach proposed in the literature is studied.
