How to use FSBforecast Excel add-in for regression analysis

FSBforecast is an Excel add‐in for data analysis and regression that was developed here at the Fuqua School of 

Business over the last 3 years by faculty members who teach statistics, in collaboration with Professor John Butler 

at the University of Texas. See the separate handout on “How to install and uninstall FSBforecast” for details on 

how to install or update it. After it has been installed, you should see FSBforecast appear on the main menu bar 

in Excel whenever you use it. If you click on the FSBforecast tab, a toolbar will appear with the following options: 

FSBforecast is very simple to use: this handout contains about all you need to know. The examples shown here 

were created from the accompanying file called FSBforecast_car_data.xlsx that contains data on makes and 

models of cars sold in the U.S. in 1993. To obtain this file, go to the Decision 411 course software web page, click 

on the “FSBforecast_car_data_file” link, then click the Extract button on the WinZip toolbar to extract the Excel file 

to a directory of your choice. Then open it from there using Excel after FSBforecast has been installed. (A second 

file containing the completed analysis, called FSBforecast_car_data_with_analysis.xlsx, is also available there.) 

Data definitions: FSBforecast expects your variables to reside in named ranges in Excel. Variables which are to be 

used in the same analysis should all be the same length, and the best approach is to organize them on a single 

data worksheet in consecutive columns with their names in the first row. For example, here is a picture of a 

portion of the sample file, which is arranged in this format. Note that text labels (to be used as variable names) 

appear in row 1 and the data appears in subsequent rows. Only a portion of this file is shown here. Overall it has 

15 columns and 93 rows of data. 

Variables are defined as named ranges in Excel. They 

can be located anywhere in a workbook, but it is usually 

best to organize them in a single table on a single data 

worksheet with variable names in the first row. 

To assign the text labels in row 1 as range names for the data in the rows below, proceed as follows: 

1. Select the entire data area (including the top row with the names) by positioning the cursor on cell A1 

and then holding down the Shift key while hitting the End key and then the Home key, i.e., “Shift‐End‐Home.” 

Caution: check to be sure that the lower right corner of the selected (blue) area is really the 

lower right corner of the data area. Sometimes this automatic method of selecting a range grabs an area 

with blank rows or columns or even the entire worksheet. If that happens, you will need to select the 

area “manually” by clicking and dragging the cursor to the bottom‐right data value. 

2. Hit the Create‐From‐Selection button on the FSBforecast menu and check (only) the “Top row” box in the 

dialog box. 

To define the variables for analysis, 

highlight the table of data (including the 

first row with the variable names) and hit 

the “Create From Selection” button. Check 

only the “Top row” box for creating names. 

You can have any number of named ranges in your workbook, although you cannot use more than 50 variables at 

one time in the Data Analysis or Regression procedures. You can have up to 32,000 rows of data, although the 

graphs will take a long time to draw if you have a huge number of rows, and the row limit is somewhat less for 

regressions with large numbers of variables. A 50‐variable regression is limited to about 18,000 rows. The 

regression procedure has a “brief‐output” mode that suppresses some of the chart output to speed up the 

analysis of large data sets and keep file sizes from getting too large when many models are fitted. In brief‐output 

mode, a regression with 50 variables and 18,000 rows of data will run in about 30 seconds on most PCs, which is 

as fast as or faster than most other regression software such as SPSS. 

Data analysis: The Data Analysis procedure provides descriptive statistics, correlations, series plots, and 

scatterplots for a selected group of variables. Simply click the Data Analysis button on the FSBforecast toolbar and 

check the boxes for the variables you wish to include. The variable list that you see will only include variables 

containing at least some rows of numeric data. In this example, the variables Make and Type do not appear on 

the list of variables available for analysis because they have only text values. Model does appear because a few of 

its values are numeric (e.g., for the Audi 90 and 100 models), but you would not choose it for analysis. 

In the Data Analysis procedure, select 

the variables you want to analyze and 

choose the plot options.

If you check the Show Series Plots box, you will also get a plot of each variable versus row number. We 

recommend that you always ask for series plots in at least one of your data analysis runs, no matter how large the 

data set. These plots give you a visual impression of each variable by itself and are vitally important if the 

variables are time series (although in this example they are not). If your variables are time series (i.e., 

measurements of the same quantities performed at different periods in time and arranged in chronological order), 

then you should check the Time Series Data box. This will provide an additional table of statistics, namely the 

autocorrelations of the variables, i.e., their correlations with their own prior values, going back as far as 12 periods 

into the past depending on the amount of history available. Also, the series plots are drawn with connecting 

lines when the Time Series box is checked. 
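
To make the idea of an autocorrelation concrete, here is a minimal Python sketch, written outside Excel purely for illustration. It assumes the standard sample autocorrelation formula; the handout does not state the exact formula FSBforecast uses, and the function names are hypothetical.

```python
from statistics import fmean

def autocorrelation(series, lag):
    """Lag-k sample autocorrelation: correlation of a series with its own prior values."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = fmean(series)
    # Sum of squared deviations over the whole series (the denominator).
    total = sum((x - mean) ** 2 for x in series)
    if total == 0:
        return float("nan")  # a constant series has no defined autocorrelation
    # Cross-products of each value with the value `lag` periods earlier.
    cross = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cross / total

def autocorrelation_table(series, max_lag=12):
    """Autocorrelations for lags 1..max_lag, limited by the history available."""
    usable = min(max_lag, len(series) - 1)
    return {k: autocorrelation(series, k) for k in range(1, usable + 1)}

# Illustrative data: a steadily rising series is strongly correlated with its own past.
trend = [float(t) for t in range(1, 21)]
table = autocorrelation_table(trend)
```

A series with a strong trend, like the one above, gives a lag-1 autocorrelation close to 1, and the values shrink as the lag grows.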

Here is a picture of the top portion of the Data Analysis report for the variables selected above, showing the 

descriptive statistics and series plots. (Only two of the 7 series plots in this analysis are shown.) Notice that the 

Cylinders variable has only a small number of possible values and they are all integers (4, 5, 6, 8), and there are 

only two cars with 5 cylinders and only seven cars with 8 cylinders in the sample. This is an example of the 

properties of your data that you can clearly see when you look at the series plots. 

The results of running the procedure are stored 

on a new worksheet. Descriptive stats and 

optional series plots appear at the top. If the 

“Time series data” box is checked, you also get a 

table of autocorrelations and the series plots 

have connecting lines. 

Sample sizes may vary if any values are missing: Be aware that on any given run the data analysis procedure 

ignores rows where any of the selected variables have missing values or text values, so that the sample size is the 

same for all the variables. (In some data files, missing values may be coded as text labels such as “NA”, meaning 

“not available.”) This means that the sample sizes and the values of the sample statistics may vary from one data 

analysis run to another if you add or drop variables that have missing or text values in different positions. If the 

sample size (“Count”) is less than you expected or if it varies from one run to another, you should look carefully at 

the data matrix to see if there are unsuspected missing or text values scattered around among the variables. In 

this data set, if you choose Model as one of the variables to be analyzed, you will only get a sample size of 7, 

because there are only 7 cars whose model names consist of numbers (like the Audi 90 and 100). 

The reason for following this convention is that it keeps the data analysis sheet in sync with a regression model 

sheet that uses the same set of variables (e.g., the correlation matrix on both sheets will be the same). When 

fitting a regression model, only rows of data in which all the chosen dependent and independent variables have 

numeric values can be used to estimate the model. 
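
The row-dropping convention described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this handout, not FSBforecast's own code; the helper names and the tiny data set are hypothetical.

```python
def is_numeric(value):
    """True for real numbers; blanks (None) and text such as "NA" are not numeric."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def complete_rows(columns, selected):
    """Return the row indices in which every selected variable has a numeric value."""
    n_rows = len(columns[selected[0]])
    return [i for i in range(n_rows)
            if all(is_numeric(columns[name][i]) for name in selected)]

# Hypothetical mini data set; the values are illustrative, not copied from the car file.
columns = {
    "Model":    ["Integra", 90, 100, "Legend", "Century"],
    "MPG_City": [25, 20, 19, 18, "NA"],
    "Weight":   [2705, 3375, 3405, None, 2880],
}

# The sample size ("Count") depends on which variables are selected together.
count_two = len(complete_rows(columns, ["MPG_City", "Weight"]))
count_three = len(complete_rows(columns, ["Model", "MPG_City", "Weight"]))
```

Adding Model to the selection shrinks the usable sample, because only the rows with numeric model names survive. This is the same effect described for the car data.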

Correlation and scatterplot matrices: The Data Analysis procedure always shows you the correlation matrix of 

the selected variables (i.e., all correlations between one variable and another), because correlations are the key 

statistics that are used to measure linear relationships among variables. If you check the Show Scatter Plots box 

when running the Data Analysis procedure you will also get a matrix of all 2‐way scatterplots, which is the visual 

counterpart of the correlation matrix. The scatter plots may take some time to draw if you choose to analyze a 

large number of variables at once (e.g., 15 or more) and there are also many rows of data (e.g., 1000 or more). If 

you run the procedure and select n variables, you will get n² plots, and they are drawn at the rate of several per 

second (faster or slower depending on the number of rows of data). If you try this with 50 variables, you will get 

2500 scatterplots on a single worksheet. The result is impressive to look at, but you may wait a while for it! Here 

is a picture of what the output looks like when only 3 variables are chosen: 

The correlation matrix is displayed farther down 

on the Data Analysis worksheet, and there is an 

option to generate a full matrix of all 2‐way 

scatterplots. 

Any of the individual scatterplots can be enlarged by pulling on its corners, and it can be copied and pasted to 

another worksheet or to a Word or PowerPoint document and re‐formatted there as well. The same is true of all 

chart output in FSBforecast. 

Note that in these plots, the relationship between MPG_City and the two other variables appears to be somewhat 

nonlinear, i.e., the points appear to be distributed around a curved line rather than a straight line. Other patterns 

you might (or might not) observe in a scatterplot are extreme values of some variables (“outliers”), which may or 

may not line up with extreme values of other variables, or clusters of points along the edges or in the corners of 

some plots. These sorts of patterns can present challenges for fitting models that assume linear relationships and 

normally distributed errors. Sometimes transformations of variables are needed to “straighten things out.” 

Regression: The Regression procedure fits multiple regression models and allows them to be easily compared 

side‐by‐side. Just hit the Regression button and select the dependent variable you want to use and check the 

boxes for the independent variables from which you wish to predict it, then hit the “Run” button. Consecutive 

models are named “Model 1”, “Model 2”, etc., by default, but you can also enter a name of your choice in the 

Model Name box before hitting “Run”, and the custom name will be used to label all of the output. 

To run a regression, select the dependent variable and 

then check the boxes for the independent variables 

you wish to include, and hit the “Run” button. 

A model can have up to 50 independent variables and 

over 18,000 rows of data. 

If you also check the Brief Output box, then some of the usual regression output (the normal probability plot, the 

descriptive statistics and plots of the individual variables, the residuals‐vs‐independent‐variable plots, and the 

residual table) will not be included on the model worksheet. These take a large amount of time and space to 

produce compared to the rest of the standard output. If you have relatively large numbers of independent 

variables (say, a dozen or more) and/or relatively large numbers of rows (say, 500 or more), you may wish to ask 

for brief output when first running a model. Brief output will give you more compact model sheets, and it will 

also cut down on the time needed to re‐draw plots with large numbers of points when you scroll up and down the 

sheet. Once you have identified a promising‐looking model for a large data set, you can re‐run it with full output 

for a more complete picture. Brief‐output mode will also keep the file size more manageable if you fit a large 

number of models in one workbook. It is easy to end up with file sizes of 10 MB or 20 MB or more if you run a lot of 

full‐output regressions with many variables and many rows of data. 

If all your variables consist of time series (i.e., variables whose values are ordered in time, such as daily or weekly 

or monthly or annual observations of some quantities), then you should also check the Time Series Data box. This 

will provide additional model statistics that are relevant only for time series, such as autocorrelations of the 

residuals, which reveal whether there are unexplained time patterns. 

There is also a Set Intercept to 0 option, which forces the intercept to be zero in the equation. In the special case 

of a simple (1‐variable) regression model, this means that the regression line is a straight line that passes through 

the origin, i.e., the point (0, 0) in the X‐Y plane. If you use this option, values for R‐squared and adjusted 

R‐squared are not computed, because they do not have the same meaning for a model that does not include an 

intercept and there is no universally accepted way of defining them in this situation. 
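
For the simple (1‐variable) case, the effect of forcing the intercept to zero can be seen in a short Python sketch. It uses the textbook least-squares formulas with made-up data; the function names are hypothetical, and nothing here is taken from FSBforecast's own implementation.

```python
def slope_through_origin(x, y):
    """Least-squares slope for the model y = b*x (intercept forced to zero)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def slope_and_intercept(x, y):
    """Ordinary least-squares fit of y = a + b*x, shown for comparison."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    return b, y_bar - b * x_bar

# Made-up data that clearly does not pass through the origin.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]

b0 = slope_through_origin(x, y)       # line constrained to pass through (0, 0)
b1, a1 = slope_and_intercept(x, y)    # unconstrained line
```

When the data really has a nonzero intercept, the constrained slope differs noticeably from the unconstrained one, so the option is best reserved for cases where a zero intercept makes sense a priori.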

The model sheet: The regression results for each model are stored on a new worksheet whose name is whatever 

model name was entered in the name box on the regression input panel when the model was run (either a default 

name such as “Model n” or a custom name of your choice). Here is a picture of a portion of the regression output 

which appears at the top of the model sheet. More tables and charts will appear below it. 

The results of running each model are 

stored on a new worksheet. At the top 

of the sheet the variables are listed 

and the model equation is printed out 

as a text string, suitable for copying 

and pasting into a report. 

The usual tables of regression 

model statistics, coefficient 

estimates, and significance tests 

appear below… 

…followed by a table of residual distribution statistics that includes the Anderson‐Darling 

test for a non‐normal error distribution and the size and location of the largest‐magnitude 

residual. If the “Time series data” box was checked, a table of residual 

autocorrelations and tests of their significance is also shown. 

It is easy to refine an existing model by adding or removing variables. If you hit the Regression button while 

positioned on an existing model worksheet, the variable specifications for that model are the starting point for 

specifying the next model. You can add or remove a variable relative to that model by checking or unchecking a 

single box. 

Charts appear farther down on the model sheet. The output always includes a chart of actual and predicted 

values vs. observation number, residuals vs. observation number, residual histogram plot, residuals vs. predicted 

values, and a line fit plot in the case of a simple (1‐variable) regression model. Forecasts, if any were produced, 

are shown in a table and also plotted. “Full” output, which is the default, also includes a normal probability plot 

and plots of residuals vs. each of the independent variables and dependent variable vs. each of the independent 

variables. On the worksheet the charts are all arranged one above the other, not side‐by‐side as shown here, and 

the charts and tables are sized to be printable at 100% scaling on 8.5” wide paper. The default print area is preset 

to include all pages of output, so the entire output is printable on standard‐width paper with a few keystrokes, 

leaving a complete audit trail on paper. However, for presentation purposes, it is usually best to copy and paste 

individual charts and tables to other documents, as discussed later. 

All table and chart titles include the model name 

and the name of the dependent variable to leave an 

audit trail if they are copied and pasted to reports. 

At the very bottom of the model sheet is a table 

that shows actual and predicted values, residuals, 

and standardized residuals for all rows in the data 

file. The table is sorted in descending order of 

absolute values of the residuals, so that “outliers” 

appear at the top. 
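
The ordering of the residual table can be illustrated with a small Python sketch. It covers only the sorting rule described above. Standardized residuals are left out because the handout does not say how they are scaled, and the function name and numbers are hypothetical.

```python
def residual_table(actual, predicted):
    """Rows of actual, predicted, and residual, largest absolute residual first."""
    rows = [{"row": i + 1, "actual": a, "predicted": p, "residual": a - p}
            for i, (a, p) in enumerate(zip(actual, predicted))]
    # Sort in descending order of |residual| so potential outliers come first.
    return sorted(rows, key=lambda r: abs(r["residual"]), reverse=True)

# Illustrative values only.
actual = [25.0, 18.0, 46.0, 20.0]
predicted = [24.0, 19.5, 38.0, 20.2]
table = residual_table(actual, predicted)
```

Here the third observation, with a residual of 8.0, lands at the top of the table, which is where an analyst would start looking for outliers.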

Forecasting: If you wish to generate forecasts from your fitted regression models, there are two ways to do it in 

FSBforecast: “manually” and “automatically.” In the manual approach, define your variables so that they contain 

only the sample data to be used for estimating the model, not the data to be used for forecasting. Then, after 

fitting a regression model, scroll down to the line on the worksheet that says “Forecasts: Dep. Var. = etc.”, and 

click the + in the left sidebar of the sheet to maximize (i.e., open up) the forecast table. Then type (or copy‐and‐paste) 

values for the independent variables into the cells at the right end of the forecast row, as in the shaded cells 

in the table below, and then click the Forecasting button. The forecast and its confidence limits will then be 

computed and displayed in the cells to the left. Two plots of the forecasts are also produced. The first one shows 

only the forecast(s), together with 95% confidence limits for both means and forecasts. (A 95% confidence 

interval for the mean is a confidence interval for the true height of the regression line for given values of the 

independent variables. A 95% confidence interval for the forecast is a confidence interval for a prediction that is 

based on the regression line. The latter confidence interval also takes into account the unexplained variations of 

the data around the regression line, so it is wider.) The second plot shows the actual and predicted values from 

the sample to which the model was fitted, together with the forecasts and 95% confidence intervals for forecasts. 

(The latter plot is always produced, even if there are no forecasts.) 
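
The difference between the two kinds of confidence interval can be made concrete for a simple regression. The Python sketch below uses the textbook standard-error formulas with illustrative data; it is an assumption that FSBforecast's limits follow these formulas, and the function name is hypothetical. Each 95% limit would be the forecast plus or minus a t critical value (n − 2 degrees of freedom) times the corresponding standard error; the critical value itself is not computed here.

```python
from math import sqrt

def forecast_standard_errors(x, y, x0):
    """Forecast at x0 plus standard errors of the mean and of the forecast (y = a + b*x)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    # Standard error of the regression: unexplained variation around the line.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(sse / (n - 2))
    # Uncertainty about the true height of the regression line at x0.
    se_mean = s * sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    # The forecast error adds the unexplained variation to the line's uncertainty.
    se_forecast = sqrt(s ** 2 + se_mean ** 2)
    return a + b * x0, se_mean, se_forecast

# Illustrative values only (weight in pounds, city MPG), not taken from the sample file.
weight = [2000.0, 2500.0, 3000.0, 3500.0, 4000.0]
mpg = [33.0, 27.0, 23.0, 20.0, 17.0]
forecast, se_mean, se_forecast = forecast_standard_errors(weight, mpg, 3200.0)
```

Because the forecast standard error always contains the extra term for unexplained variation, it is larger than the standard error of the mean, which is why the forecast interval is the wider of the two.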

How to generate forecasts “manually”: 

enter values for the independent 

variables in one or more rows at the 

right end of the forecast table, below 

the variable names, then hit the 

Forecasting button on the toolbar. 

The forecasts and confidence limits will 

be displayed at the left end of the same 

row(s), and they will also be plotted. 

In the automatic forecasting approach, which is more systematic and more suitable for generating many forecasts 

at once, define your variables up front so that they include rows for out‐of‐sample data from which forecasts are 

to be computed later. FSBforecast will automatically generate forecasts for any rows where all of the independent 

variables have values but the dependent variable is missing (i.e., has a blank cell). The variables must all be 

ranges with the same length, but the dependent variable will have some empty cells at the bottom or elsewhere. 

The advantage of this approach is that you only need to enter the forecast data once, at the time the data file is 

first created, and it will automatically be transformed if you apply any data transformations to the same variables 

later. Also, when using this method it is possible for forecasts to be generated in the middle of the data set if 

missing values of the dependent variable happen to occur there. The file used in the example above contains an 

extra row of data at the bottom for a “hypothetical car” whose mileage is to be predicted. It has values for all the 

numeric variables other than MPG_City, so any model fitted to MPG_City will generate a forecast for this row 

automatically, without the need for you to type values for the independent variables in the forecast table. Only 

one forecast is shown in this example, but you can generate any number of forecasts in this way by including 

additional rows with out‐of‐sample data for the independent variables. You can also use this feature to do 

out‐of‐sample testing of a model by removing the values of the dependent variable from a large block of rows and then 

comparing the forecasts to the actual values afterward. 

A forecast is also generated automatically for any 

row of data where the dependent variable is 

missing and all independent variables are present. 
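
The rule for automatic forecasting can be expressed as a short Python sketch. It only illustrates the rule as described in this handout; the helper names and data are hypothetical, and treating a blank cell as None is an assumption made for the example.

```python
def is_number(value):
    """True for real numbers; blanks (None) and text are not numbers."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def split_rows(columns, dependent, independents):
    """Split row indices into estimation rows and rows that get a forecast."""
    estimation, forecast = [], []
    for i in range(len(columns[dependent])):
        if not all(is_number(columns[name][i]) for name in independents):
            continue  # an independent variable is missing: the row cannot be used
        if is_number(columns[dependent][i]):
            estimation.append(i)   # complete row: used to fit the model
        elif columns[dependent][i] is None:
            forecast.append(i)     # blank dependent value: forecast this row
    return estimation, forecast

# Illustrative values only; None stands for a blank cell.
columns = {
    "MPG_City": [25, 18, None, 20, None],
    "Weight":   [2705, 3560, 3000, None, 3200],
}
estimation_rows, forecast_rows = split_rows(columns, "MPG_City", ["Weight"])
```

Note that a forecast row can sit in the middle of the data (index 2 here), matching the remark above that forecasts need not be confined to the bottom of the file.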

Viewing tables and charts in your regression output: Each model worksheet provides a number of standard 

tables and charts, and they can be maximized or minimized by clicking the +’s or –’s on the left sidebar of the 

worksheet. At the time you run the model you have the option for “full” regression output (which is the default) 

or “brief” output (which you get by checking the box). If you allow full output to be produced, much of it will be 

minimized to start with, and you will need to go down the left sidebar of the sheet clicking the +’s to see the 

complete results. As noted earlier, full output includes scatterplots of the dependent variable versus each of the 

independent variables and plots of the residuals versus each of the independent variables. These are all 

minimized by default because they take up a lot of room when there are many variables. Full output also includes 

a normal probability plot (a diagnostic test for normally distributed errors) as well as the usual histogram plot of 

the residuals. In the special case of a simple regression model, you also get a line fit plot (the regression line and 

confidence bands around it) in both brief‐output and full‐output mode. See the last page of this handout for an 

example. 

Choosing the output to display: click the “‐” 

symbol to minimize (hide) a table or chart and click 

“+” to maximize (unhide) it. 

Model summary worksheet: An innovative feature of FSBforecast is that it maintains a separate “Model 

Summary” worksheet that shows side‐by‐side summary statistics and model coefficients for all regression models 

that have been fitted in the same workbook. This allows easy comparison of models, and it also provides an 

“audit trail” for all of the models you have fitted so far. Here’s an example of the model summary worksheet that 

was obtained after fitting two more models in which less‐significant variables were successively removed: 

Model statistics and coefficients 

are compared side‐by‐side on the 

Model Summary worksheet. 

This sheet also provides an audit 

trail of your work. Each model is 

time‐and‐date‐stamped. 

Variable Transformations: At any stage in your analysis you can create new variables in additional columns by 

entering and copying your own Excel formulas and assigning range names to the results. However, there is also a 

Variable Transformations option on the Regression panel that allows you to easily create new variables by 

applying standard transformations to your existing variables such as the natural log transformation or exponential 

or power transformations. The transformed variables are automatically assigned descriptive names, such as X_LN 

(natural log of X). 

The “Variable Transformation” tool 

can be used to create additional 

variables from transformations of 

the existing ones. 

In the data set shown here, the relationship between miles‐per‐gallon and some of the other variables looks 

somewhat nonlinear on the scatterplots, as pointed out earlier. Perhaps it would be better to predict 

gallons‐per‐mile as the dependent variable? The MPG_City variable can be transformed into units of gallons per mile by 

raising it to the power of negative‐1, as shown in the dialog box below. 

Basic variable transformation options: 

natural log, exponential, power, 

plus/minus/times/divided‐by (“f(x)”), and 

creation of dummy variables for integer 

or categorical data. 

The transformed variable will be automatically assigned the name MPG_City_POWneg1, and it will show up next 

to the original variable in the alphabetical list of variable names in the dialog boxes: 

You could also assign a less‐geeky name to the variable (e.g., GallonsPerMile) by using the Name Manager to 

change its name. To change the name of a variable, click the Formulas tab on the Excel main menu, then click the 

Name Manager button, then click on the variable whose name you want to change, then click the Edit button, and 

enter a new name for it in the Name box and hit OK. 

The “Make Dummy Variable” transformation can be used to create dummy (0‐1) variables from variables that 

consist either of numbers or text labels, including variables such as DriveTrain (front/rear/all) in this file. A 

separate dummy variable (with a name such as “DriveTrain_EQ_front”) will automatically be created for each 

distinct value of the input variable. 
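
What the dummy-variable transformation produces can be sketched in Python as follows. The naming pattern follows the “DriveTrain_EQ_front” example above; the function itself is a hypothetical illustration, not FSBforecast's code, and the data values are made up.

```python
def make_dummies(name, values):
    """One 0-1 column per distinct value, named like "DriveTrain_EQ_front"."""
    dummies = {}
    # Sort by string form so numeric and text levels can be handled alike.
    for level in sorted(set(values), key=str):
        dummies[f"{name}_EQ_{level}"] = [1 if v == level else 0 for v in values]
    return dummies

# Illustrative values only.
drive_train = ["front", "rear", "front", "all", "front"]
dummies = make_dummies("DriveTrain", drive_train)
```

Each row has exactly one dummy equal to 1. For that reason, in a regression that includes an intercept, at least one of the dummies for a given variable should be left out of the model; otherwise the dummies add up to a column of 1's and duplicate the intercept.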

If the Time Series Data box is checked on the regression input panel, then many additional transformations are 

available which are specific to time series, such as computing lagged values, or changes from one period to 

another, or percentage changes from one period to another, or adjusting for inflation using a fixed rate of 

deflation: 

Additional transformations that are 

specific to time series data: lags, 

differences, and deflation. These are only 

available when the “Time Series Data” box 

is checked on the regression input panel. 

Scaling of variables: The coefficients in the regression equation and regression summary table are displayed in 

fixed format with 3 decimal places. Normally this is fine for a wide range of units of measurement, but if your 

dependent and independent variables are measured in units that are “poorly scaled” relative to each other (e.g. 

one measured in dollars and another measured in millions or billions of dollars), the coefficients may end up 

displaying as zeros in 3‐decimal‐place format because their estimated values are less than 0.0005, even though 

they are statistically significant. Keep in mind that the value of a regression coefficient is measured in “units of Y 

per unit of X”, whatever those units may be. If you are puzzled to find zeros or very small numbers in the model 

equation or table of regression coefficients, when the model otherwise seems reasonable, you should consider 

rescaling some of the variables. For example, if an independent variable has a coefficient that is displayed as zero 

despite being statistically significant (as indicated by a large t‐stat and a small P‐value), consider rescaling it in 

thousands of its original units, so that its values are smaller by a factor of 1000, which will increase its estimated 

coefficient by the same factor while leaving the t‐stat and P‐value unaffected. Alternatively, you might rescale the 

dependent variable so that its values are larger rather than smaller. In the car data example above, the 

coefficients of RevsPerMile and Weight were on the order of 0.002 and ‐0.008 respectively, so they were 

displayed with only one significant digit of precision. Some re‐scaling of variables might be helpful there. For 

example, you could create a new dependent variable called GallonsPer100Miles by multiplying GallonsPerMile by 

100. This would increase the values of all the estimated coefficients by a factor of 100, other things being equal. 
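
The effect of rescaling can be checked with a small Python sketch of a simple regression. The data is illustrative (not taken from the sample file), the function name is hypothetical, and the formulas are the textbook ones for the slope and its t‐stat.

```python
from math import sqrt

def simple_regression(x, y):
    """Slope and its t-statistic for the model y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = sqrt(sse / (n - 2)) / sqrt(sxx)   # standard error of the slope
    return b, b / se_b

weight_lb = [2000.0, 2500.0, 3000.0, 3500.0, 4000.0]
mpg = [33.0, 27.0, 23.0, 20.0, 17.0]
weight_klb = [w / 1000 for w in weight_lb]   # same variable in thousands of pounds

b_lb, t_lb = simple_regression(weight_lb, mpg)
b_klb, t_klb = simple_regression(weight_klb, mpg)

# Three-decimal display, as in the regression tables.
shown_lb = f"{b_lb:.3f}"
shown_klb = f"{b_klb:.3f}"
```

With weight in pounds the slope displays with a single significant digit at three decimal places, while in thousands of pounds it is 1000 times larger and displays with full precision. The t‐stat is the same in both cases.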

Displaying gridlines and column headings on the spreadsheet: By default the data analysis sheets and model 

sheets do not show gridlines and column headings, in order to make the data stand out more clearly. However, if 

you wish to turn them back on, you can do so by going to the “View” toolbar and clicking the boxes for “Gridlines” 

and/or “Headings.” This allows you to do things like changing column widths if necessary. 

Copying output to Word and PowerPoint files: The various tables and charts produced by FSBforecast have been 

designed in such a way that they can be easily copied to document files, and the table and chart titles all include 

the name of the dependent variable and the model name so that they can be traced back to their source. When 

copying and pasting a chart or table, there are several alternatives. On the Home tab, the pull‐down Paste menu 

has a row of icons for different formats as well as a “paste special” option. The icons give you a number of 

complicated options, e.g., tables can be pasted in a form that allows their contents to be edited, and they can be 

given the same format as either their source or destination, and their contents can be merged into other tables. 

We suggest that you use the “picture” option, which is on the right end of the list of icons, or else choose “paste 

special” and then choose one of the picture formats (e.g., PNG or enhanced metafile). This will paste the table or 

chart as an image whose contents cannot be edited. It can be scaled up and down in a way that will keep 

everything in proportion, and it will be secure against having its numbers changed (accidentally by you or 

deliberately by others) later on. Often charts can be made smaller without loss of readability or impact, and you 

should always consider doing this when preparing reports. 

For example, here is the line fit plot for a simple regression model pasted as a picture and scaled way down: 

[Figure: Line Fit Plot. Dep. Var. = MPG_City, Model = Model 3. MPG_City (vertical axis) plotted against Weight (horizontal axis), showing Actual and Predicted values with Upper 95%F and Lower 95%F limits.]
