2.3 Forming factors from variates or texts

We can use the Change Sheet or Column menu (Figure 2.7) to convert variates or texts into factors. Alternatively, you can form a new factor with the Form Groups menu, which can be opened by clicking on the Form Groups (Factors) option of the Data menu on the menu bar. This menu uses the GROUPS directive.

In the simplest form of GROUPS, you specify the identifier of the variate or text using the first parameter, VECTOR, and the identifier for the new factor using the FACTOR parameter. GROUPS then forms a factor with a level for every distinct value of the variate or text. So, we could form a new factor Fumfac from the text Fumigant, with the command

    GROUPS Fumigant; FACTOR=Fumfac

You can set option REDEFINE=yes if you want to change the variate or text itself to become a factor (any setting of the FACTOR parameter is then ignored). So we could convert Fumigant back into a factor with the command

    GROUPS [REDEFINE=yes] Fumigant

Alternatively, you can divide the values of the variate or text into groups to be represented by the factor. You can use the LIMITS option to specify the range of values for each group. The limits vector is a variate or a text, depending on whether the factor is being defined from a variate or a text; its values specify boundaries for the ranges. The BOUNDARIES option controls whether these are regarded as upper or lower boundaries; by default BOUNDARIES=lower. You can also ask GROUPS itself to set limits that will partition the units into groups of nearly equal size. You should then specify the NGROUPS option and leave the LIMITS option unset. (If you give both LIMITS and NGROUPS, then NGROUPS is ignored.)

If you are defining a factor from a variate VECTOR, the LMETHOD option controls how the levels vector is formed. The default LMETHOD=median forms the levels from the median of the units in each group. There are also settings to allow them to be formed from minima or maxima.
With any of these settings (median, minimum or maximum) you can specify a variate, using the LEVELS parameter, to store the levels that are produced; this can be done even if no factor is being formed, that is, if no identifier is supplied for the factor by the FACTOR parameter. Alternatively, if you put LMETHOD=given, you can use the LEVELS parameter to supply your own levels. Finally, for LMETHOD=*, no levels are formed and any existing levels of the factor will be retained if they are still appropriate; otherwise the levels will be the integers 1 upwards. With any of these settings, you can use the LABELS parameter to specify labels for the factor.

Similar rules apply if you have a text VECTOR, except that LMETHOD then governs how the labels are defined for the factor, and LEVELS can be used to specify its levels. The CASE option controls whether the case of the letters in the text strings is important. So, for example, if you set CASE=ignored, the strings 'April' and 'april' will be put into the same group. With the default, CASE=significant, they would form different groups.

The LDIRECTION option controls the ordering of the levels (for a variate VECTOR) or the labels (for a text VECTOR) when LMETHOD is set to median, minimum or maximum. By default, they are sorted into ascending order, but you can set LDIRECTION=given to take them in the order in which they occur in the VECTOR. This may be useful, for example, if a text vector contains the names of days or of months in calendar order.

You can set the DECIMALS option to request that the values of a variate VECTOR be rounded to a particular number of decimal places before the groups are formed: for example, DECIMALS=0 would round each value to the nearest integer.
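
The options described above can be combined as in the following minimal sketch. The data values and all of the identifiers (Height, Hgroup, Hlevels, Hlimits, Hband, Month and Mfac) are hypothetical, invented here purely for illustration; only the GROUPS options and parameters themselves come from the description in this section.

```genstat
" Hypothetical data: eight plant heights "
VARIATE [VALUES=12.1,15.4,18.9,22.3,25.0,30.2,33.7,41.5] Height

" Ask GROUPS to form four groups of nearly equal size; the levels are
  the group medians (the default LMETHOD) and are saved in Hlevels "
GROUPS [NGROUPS=4; LMETHOD=median] Height; FACTOR=Hgroup; LEVELS=Hlevels

" Supply explicit limits instead; by default these are treated as
  lower boundaries (BOUNDARIES=lower) "
VARIATE [VALUES=20,30] Hlimits
GROUPS [LIMITS=Hlimits] Height; FACTOR=Hband

" Hypothetical text: ignore case, and keep the labels in the order
  in which they first occur rather than sorting them "
TEXT [VALUES='Jan','Feb','jan','Mar','feb'] Month
GROUPS [CASE=ignored; LDIRECTION=given] Month; FACTOR=Mfac

PRINT Height,Hgroup,Hband
```

Printing the original variate alongside the new factors is a simple way to check which group each unit has been allocated to.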

2.4 Other data structures

There are many other data structures available within GenStat, each with appropriate attributes. A single numerical value is stored within a scalar. A two-dimensional array of data is contained in a matrix, and two specialized forms of matrix (symmetric and diagonal) can also be used. Numerical results of cross tabulations or analyses are stored in tables that are indexed by a number of classifying factors. Pointers store references to (i.e. "point" to) sets of other data structures. A dummy stores a reference to a single data structure. Expressions define numerical calculations, and formulae define statistical models.

Full details about all of GenStat's data structures are given in Chapter 2 of the Guide to the GenStat Command Language, Part 1 Syntax and Data Management. To open this within GenStat, click on the Syntax and Data Management sub-option of the GenStat Guides option of the Help menu on the menu bar, as shown in Figure 2.9.

Figure 2.9

The Guide explains the syntax of the directives that "declare" (i.e. define) the various types of data structure, and has example programs to show their use. These examples can be accessed using one of the examples menus, opened by clicking on the Syntax and Data Management sub-option of the Examples option of the Help menu on the menu bar (Figure 2.10).

Figure 2.10
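
As a rough illustration of what such declarations look like, the sketch below declares a few of the structures mentioned above. The identifiers (Rate, M, S, X, Y, P and Dm) and all of the values are hypothetical; the Guide should be consulted for the full syntax of each directive.

```genstat
" A scalar holding a single number "
SCALAR [VALUE=3.5] Rate

" A rectangular matrix with 2 rows and 3 columns, filled row by row "
MATRIX [ROWS=2; COLUMNS=3; VALUES=1,2,3,4,5,6] M

" A symmetric matrix: only the lower triangle is supplied, row by row "
SYMMETRICMATRIX [ROWS=3; VALUES=1,2,3,4,5,6] S

" Two variates, then a pointer that refers to both of them "
VARIATE [VALUES=1,2,3,4] X
VARIATE [VALUES=2,4,6,8] Y
POINTER [VALUES=X,Y] P

" A dummy that refers to the single structure X "
DUMMY [VALUE=X] Dm

PRINT Rate
PRINT M
```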