10.01.2015

Package 'fpc' - open source solution for an Internet free, intelligent ...

clusterboot

Arguments

data, B, distances, bootmethod, bscompare, multipleboot, jittertuning, noisetuning

data: something that can be coerced into a matrix. The data matrix: either an n*p data matrix (or data frame) or an n*n dissimilarity matrix (or dist object).

B: integer. Number of resampling runs for each scheme, see bootmethod.

distances: logical. If TRUE, the data is interpreted as a dissimilarity matrix. If data is a dist object, distances=TRUE automatically; otherwise distances=FALSE by default. This means that you have to set it to TRUE manually if data is a dissimilarity matrix that is not a dist object.
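As a rough illustration of the rule above, the following sketch passes the same dissimilarities once as a dist object and once as a plain matrix. The toy data, B = 10 and the choice of disthclustCBI with average linkage are assumptions made for this example, not recommendations from the manual.

```r
library(fpc)

set.seed(42)
# Toy data (made up for illustration): two well-separated groups in 2 dimensions.
x <- rbind(matrix(rnorm(60, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(60, mean = 3, sd = 0.3), ncol = 2))

d_obj <- dist(x)           # dist object: distances=TRUE is detected automatically
d_mat <- as.matrix(d_obj)  # plain n*n matrix: distances=TRUE has to be set by hand

# bootmethod is set to "boot" explicitly, one of the two methods that work
# with dissimilarity data.
cb_obj <- clusterboot(d_obj, B = 10, bootmethod = "boot",
                      clustermethod = disthclustCBI, method = "average", k = 2,
                      count = FALSE)
cb_mat <- clusterboot(d_mat, B = 10, distances = TRUE, bootmethod = "boot",
                      clustermethod = disthclustCBI, method = "average", k = 2,
                      count = FALSE)

# Mean Jaccard similarity per original cluster.
print(cb_obj$bootmean)
print(cb_mat$bootmean)
```

Without distances = TRUE in the second call, the n*n matrix would be treated as an ordinary data matrix with n variables.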

bootmethod: vector of strings defining the methods used for resampling. Possible methods:

"boot": nonparametric bootstrap (the precise behaviour is controlled by the parameters bscompare and multipleboot).

"subset": selecting random subsets from the dataset. The size is determined by subtuning.

"noise": replacing a certain percentage of the points by random noise, see noisetuning.

"jitter": adding random noise to all points, see jittertuning. (This didn't perform well in Hennig (2007), but you may want to gain your own experience with it.)

"bojit": nonparametric bootstrap first, then adding noise to the points, see jittertuning.

Important: only the methods "boot" and "subset" work with dissimilarity data!

The results in Hennig (2007) indicate that "boot" is generally informative and often quite similar to "subset" and "bojit", while "noise" sometimes provides different information. Therefore the default (for distances=FALSE) is to use "boot" and "noise". However, some clustering methods may have problems with multiple points (points that occur more than once in a bootstrap sample), which can be solved by using "bojit" or "subset" instead of "boot", or by setting multipleboot=FALSE below.
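A minimal sketch of requesting more than one resampling scheme in a single call. The toy data, B = 20 and the use of kmeansCBI with krange = 2 are assumptions chosen to keep the example small.

```r
library(fpc)

set.seed(1)
# Toy data (made up for illustration): two well-separated groups in 2 dimensions.
x <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2))

# Run the nonparametric bootstrap and the subset scheme, B = 20 runs each.
cb <- clusterboot(x, B = 20, bootmethod = c("boot", "subset"),
                  clustermethod = kmeansCBI, krange = 2, count = FALSE)

# Each scheme reports its own mean Jaccard similarity per original cluster.
print(cb$bootmean)
print(cb$subsetmean)
```

Comparing the two vectors gives a quick impression of whether the schemes agree about the stability of each cluster.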

bscompare: logical. If TRUE, multiple points in the bootstrap sample are taken into account when computing the Jaccard similarity to the original clusters (which are represented by their "bootstrap versions", i.e., the points of the original cluster that also occur in the bootstrap sample). With bscompare=TRUE, a point that was drawn more than once is also in the "bootstrap version" of the original cluster more than once. Otherwise (default) multiple points are ignored in the computation of the Jaccard similarities. If multipleboot=FALSE, it makes no difference.
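The difference between the two settings can be illustrated with a small base R sketch. This is not the package's internal code: the helper jaccard, the six-point example and the bootstrap draw are hypothetical, and the sketch assumes that repeated copies of a point receive the same cluster label.

```r
# Jaccard similarity of two clusters given as logical membership vectors.
jaccard <- function(a, b) sum(a & b) / sum(a | b)

# Hypothetical original clustering of 6 points: points 1-3 form the cluster.
orig_cluster <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

# Hypothetical bootstrap draw: points 1 and 5 are drawn twice.
bsamp <- c(1, 1, 2, 4, 5, 5)

# Hypothetical cluster found in the bootstrap sample, one entry per drawn point.
boot_cluster <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE)

# "Bootstrap version" of the original cluster, with repeated points kept.
orig_version <- orig_cluster[bsamp]

# Repeated points counted as often as they were drawn.
j_multiple <- jaccard(orig_version, boot_cluster)

# Repeated points counted only once.
keep <- !duplicated(bsamp)
j_unique <- jaccard(orig_version[keep], boot_cluster[keep])

print(c(multiple = j_multiple, unique = j_unique))  # 0.75 and 0.6666667
```

Here the duplicated point 1 lies in both clusters, so counting it twice raises the similarity from 2/3 to 3/4.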

multipleboot: logical. If FALSE, all points drawn more than once in the bootstrap draw are only used once in the bootstrap samples.

jittertuning: positive numeric. Tuning for the "jitter" method. The noise distribution for jittering is a normal distribution with zero mean. The covariance matrix has the same eigenvectors as that of the original data set, but the standard deviation along the principal directions is determined by the jittertuning-quantile of the distances between neighboring points projected along these directions.
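One way to read this description is sketched below in base R. It is an interpretation under stated assumptions rather than the package's own code: the toy data are made up, "neighboring points" is taken to mean consecutive values after sorting the projections, and zero gaps are dropped so the standard deviations stay positive.

```r
set.seed(7)
# Toy data (made up for illustration): 50 points in 2 dimensions.
x <- cbind(rnorm(50), rnorm(50, sd = 2))
jittertuning <- 0.05
n <- nrow(x)
p <- ncol(x)

# Principal directions: eigenvectors of the covariance matrix of the data.
eig <- eigen(cov(x), symmetric = TRUE)
proj <- x %*% eig$vectors  # coordinates of the points along those directions

# One standard deviation per direction: the jittertuning-quantile of the gaps
# between neighboring projected points (assumption: zero gaps are dropped).
jsd <- numeric(p)
for (j in seq_len(p)) {
  gaps <- diff(sort(proj[, j]))
  gaps <- gaps[gaps > 0]
  jsd[j] <- quantile(gaps, jittertuning)
}

# Zero-mean normal noise along each principal direction, rotated back to the
# original coordinate system and added to every point.
noise_pc <- sapply(jsd, function(s) rnorm(n, mean = 0, sd = s))
x_jittered <- x + noise_pc %*% t(eig$vectors)

print(jsd)
```

A small quantile such as 0.05 keeps the noise well below the typical spacing between neighboring points, so the jittered data stay close to the original.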

noisetuning: a vector of two positive numerics. Tuning for the "noise" method. The first component determines the probability that a point is replaced by noise. Noise is generated by a uniform distribution on a hyperrectangle along the principal directions of the original data set, ranging from -noisetuning[2] to noisetuning[2]
