Dupree Fuels Company sells heating oil to residential customers. The company wants to guarantee to its customers that they will not run out of heating oil at any time during the winter months. The company has pledged to its customers that if they run out of heating oil during the winter months, they will receive 50 gallons of heating oil at no cost. The problem facing Dupree is how to predict customer heating oil usage so that refills can be delivered before customers actually run out of oil to heat their homes.
In order for Dupree to be able to predict customer heating oil usage, it is important to understand some of the factors that affect heating usage during winter months. Not surprisingly, one of the main factors that affect the level of heating oil usage during the winter months is temperature. During periods of the winter in which temperatures become colder than normal, people are more likely to use larger amounts of heating oil (Eydeland and Wolyniec 3). In this regard, Dupree can determine the average temperature for the winter months in the area that it serves. Then, as temperatures fall below the average temperature for the region of the country in which its customers are located, predictions can be made about changes in heating oil consumption.
However, it is not temperature alone that impacts the amount of heating oil that residential customers use during winter months. Another important factor that impacts heating oil consumption is the energy efficiency of homes. Older homes and larger homes are less likely to be energy efficient, meaning that they do not hold heat, which forces customers to use more heating oil to maintain a constant temperature (Husher 265). Furthermore, the type of insulation that is used in homes and even the type of furnace system that is in the home can work together to impact the efficiency of not only heating a home, but also the efficiency of maintaining the heat in the home once it is produced (Kruger and Seville 15). Homes that are poorly insulated lose more of the heat in the home to the outside environment. At the same time, older furnace units are not able to produce heat for the home with the same level of efficiency as newer units. The result is that more heating oil is needed by older furnace units to produce the same level of heat for the home as newer furnace units.
Based on these factors, Dupree Fuels Company needs to obtain information from customers about their heating oil usage in relation to temperature and in relation to the energy efficiency of their homes and their home heating systems. The problem for the company is the ability to predict the amount of heating oil that will be used. These data should allow the company to adequately predict the amount of heating oil that its customers will use so that heating oil tanks will not be depleted before a delivery of additional heating oil occurs.
Data Description
The data that have been made available by the company are from a sample of 40 residential customers. From the sample of 40 residential customers, data regarding four variables have been collected. The first variable is the number of gallons of heating oil that were used over a given period of time. The second variable is the number of degree days since oil tanks for the customers were refilled. The number of degree days is the difference between the average daily temperature and 68 degrees Fahrenheit. This variable provides a means to determine the number of days that the average daily temperature was below what is considered to be normal for the use of heating systems in relation to the amount of time in which a tank of heating oil has been used.
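The degree-day definition above can be illustrated with a minimal sketch. It assumes the usual heating degree-day convention, in which days warmer than the 68-degree base contribute zero; the report does not state this explicitly. The function name and the sample temperatures are hypothetical and are not taken from the Dupree data.

```python
BASE_TEMP_F = 68.0  # base temperature stated in the report


def degree_days(daily_avg_temps_f, base=BASE_TEMP_F):
    """Accumulate (base - average temperature) over the days since the last refill.

    Assumption: a day whose average temperature is at or above the base
    contributes zero rather than a negative amount.
    """
    return sum(max(0.0, base - temp) for temp in daily_avg_temps_f)


# Made-up average daily temperatures for four days since a refill.
example_temps = [60.0, 70.0, 50.0, 68.0]
total = degree_days(example_temps)  # 8 + 0 + 18 + 0
print(total)
```

Under this convention, one very cold day can add as many degree days as several mild days, which is why the measure tracks heating demand more closely than a simple count of days.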
A third variable included in the data collected from the sample was the number of people living in the home. The reason for collecting data on the number of people living in a customer's home is that it is assumed that as more people live in a home, the amount of hot water that will be used will increase, which, in turn, will increase the amount of heating oil that is used. Finally, the fourth variable in the data collected from the customers in the sample is a home factor, which is a determination by company staff members of a composite index that takes into account home size, age of the home, exposure to wind, level of insulation, and furnace type. The home factor index ranges from 1 to 5, with 1 indicating a lower level of energy consumption and 5 indicating a higher level of energy consumption based on the characteristics of the home.
Data Characteristics
Table 1 shows the descriptive statistics for the data collected from the sample of residential customers. The mean oil usage was 218.05 gallons, while the mean number of degree days between tank refills was 633.38. In addition, the mean home index for the customers in the sample was 2.75, slightly below the midpoint of the 1-to-5 scale, which indicates that the homes of the customers had a moderate level of energy efficiency on average, and the mean number of people in each of the homes was 4.35.
Table 1: Descriptive Statistics
| Mean| Standard Deviation| Minimum| Maximum|
Oil Usage| 218.05| 176.70| 7| 679|
Degree Days| 633.38| 381.38| 54| 1464|
Home Index| 2.75| 1.43| 1| 5|
Number People| 4.35| 1.31| 1| 7|
One of the important assumptions in using data to predict some type of outcome, in this case, the prediction of the amount of heating oil used by customers of Dupree Fuels Company, is a lack of multicollinearity. When the variables that are used to predict an outcome have a high level of multicollinearity, which means that they are highly correlated with each other, it is difficult to make predictions with a high level of accuracy because as one variable changes, another variable with a high level of correlation will change in the same way. Table 2 shows the correlation coefficients for the three predictor variables included in the dataset. The correlation coefficients between the three predictor variables are very low, so multicollinearity does not appear to exist between the predictor variables included in the dataset.
Table 2: Correlation Coefficients of Predictor Variables
| Degree Days| Home Index| Number People|
Degree Days| 1| | |
Home Index| -0.06| 1| |
Number People| -0.14| -0.06| 1|
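As a rough illustration of the check described above, the pairwise correlations from Table 2 can be screened against a cutoff. The 0.8 cutoff is an assumed rule of thumb, not a value from the report, and the function name is hypothetical.

```python
# Pairwise correlation coefficients copied from Table 2.
correlations = {
    ("Degree Days", "Home Index"): -0.06,
    ("Degree Days", "Number People"): -0.14,
    ("Home Index", "Number People"): -0.06,
}

# Assumed cutoff: |r| above 0.8 is a common rule of thumb for problematic
# collinearity; the report itself does not name a threshold.
THRESHOLD = 0.8


def flag_collinear_pairs(corr, threshold=THRESHOLD):
    """Return the predictor pairs whose absolute correlation exceeds the cutoff."""
    return [pair for pair, r in corr.items() if abs(r) > threshold]


flagged = flag_collinear_pairs(correlations)
strongest = max(correlations.values(), key=abs)
print(flagged, strongest)
```

With the Table 2 values, no pair comes close to the assumed cutoff; the strongest relationship is the weak negative correlation between degree days and number of people.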
In order to make reliable predictions using a linear regression model, several assumptions must be met regarding the distribution of dependent variables in relation to independent variables, the value of error in the model, and the variance of the dependent variable in relation to the values of the independent variables. The first assumption that must be met is that for any of the values of the independent variables, the dependent variable is normally distributed, which is known as normality. One of the ways in which to determine normality for the linear regression model is to create a plot of the residuals of the variables included in the model. If the distribution of the residuals is fairly close to zero, then normality can be assumed.
Figure 1 shows the normality plot of the residuals for the linear regression model to be examined to predict customer heating oil usage based on number of degree days and the home index value of their homes. The black line on the graph indicates the zero error of the residuals. Based on the distribution of the residuals, it appears that normality can be assumed for the model because the residuals are evenly distributed around the zero point.
Figure 1: Normal Probability Plot
The second assumption that must be met for the linear regression model is that the error value for the model must be zero or at least very close to zero. The mean error values for the values of the independent variables in the model are close to zero, so the assumption is determined to be met. The third assumption that must be met for the linear regression model is that the variance of the dependent variable must be constant for any value of the independent variables. In the model that is being proposed, the two independent variables are degree days and home index. Figure 2 shows the residual plot for degree days. It should be noted that the variance of the residuals of the variable is fairly linear in nature, which is an indication that the assumption of constant variance is met. There are some outliers that appear in the graph, particularly at the higher end of the range of degree days. However, it does not seem that there are so many outliers that they need to be removed. Figure 3 shows the residual plot for the home index variable. The residuals of the home index variable appear to be constant. Overall, the assumption of constant variance for the independent variables has been met.
Figure 2: Degree Days Residual Plot
Figure 3: Home Index Residual Plot
One other issue that must be discussed in relation to the data is the limits that exist with regard to the predictions that can be made. If further analysis of the data allows for the determination that Dupree Fuels Company can predict customer heating oil usage, there is a limit to the predictions that can be made. Predictions can only be made within the range of data that are present in the sample. For example, the range of oil usage within the sample of residential customers is between 7 gallons and 679 gallons, so any predictions about oil usage in relation to degree days or home index would have to be in this range. Attempting to make predictions about smaller amounts of oil usage or larger amounts of oil usage would not be appropriate because those limits have not been taken into account within the model. The same thing is true regarding the independent or predictor variables included in the model. Understanding and recognizing the limits of the data is important for any predictions that are made.
Statistical Methodology
Two tools were used in performing the statistical analysis and creating the tables that are part of this report. The statistical tests included in this report have been performed using the SPSS statistical package. SPSS is a widely used statistical software package that provides the ability to run a variety of statistical tests, including a wide range of regression models. The graphs and charts in the report were created using Microsoft Excel. While charts and graphs can be created directly within the SPSS software, they are often difficult to read, particularly for those who are not familiar with the software. By using Microsoft Excel, it is possible to create the charts and graphs that are not only easy to read and understand, but also to create charts and graphs that can provide additional meaning to the explanations that are provided to those who may not be completely familiar with statistical analysis or specific statistical tests.
Models
In order to ensure that the best model is created from which Dupree Fuels Company can predict home heating oil usage, two linear regression analyses are performed. The first model that is explained is the model in which degree days and home index are used as independent variables to predict the dependent variable of oil usage. This model is based on the background information that has been discussed regarding the factors that generally impact heating oil usage during the winter months. A second model analysis is performed in which SPSS calculates all of the variables included in the data collected from the residential sample. The software has the ability to determine which of the variables best predict heating oil usage while removing those variables that are not significant predictors. Any differences in the variables included in the second model that represents the best fit of the data will be discussed.
Results and Conclusions
The results of the linear regression model in which the dependent variable of oil usage was regressed on the independent variables of degree days and home index are shown in Table 3. Before discussing the actual significance of the independent variables included in the model, it is necessary to determine if the actual model is significant with regard to how well the regression model fits the data in the sample. The null hypothesis for the model fit is that the independent variables do not fit the model. The alternative hypothesis is that the independent variables in the model do fit the model. The F-statistic and the corresponding significance value of the F-statistic located in the ANOVA portion of the table indicate that the null hypothesis can be rejected as the significance value is less than 0.05. The alternative hypothesis is accepted that the regression model does fit the data in the sample.
Table 3: Linear Regression Results
Regression Statistics| | | | | | | |
Multiple R| 0.88| | | | | | | |
R Square| 0.78| | | | | | | |
Adjusted R Square| 0.77| | | | | | | |
Standard Error| 84.60| | | | | | | |
Observations| 40.00| | | | | | | |
| | | | | | | | |
ANOVA| | | | | | | | |
| df| SS| MS| F| Significance F| | | |
Regression| 2.00| 952922.19| 476461.09| 66.58| 5.50809E-13| | | |
Residual| 37.00| 264785.71| 7156.37| | | | | |
Total| 39.00| 1217707.90| | | | | | |
| | | | | | | | |
| Coefficients| Standard Error| t Stat| P-value| Lower 95%| Upper 95%| Lower 95.0%| Upper 95.0%|
Intercept| -192.82| 38.04| -5.07| 0.00| -269.90| -115.73| -269.90| -115.73|
Degree Days| 0.27| 0.04| 7.66| 0.00| 0.20| 0.34| 0.20| 0.34|
Home Index| 86.65| 9.51| 9.11| 0.00| 67.38| 105.91| 67.38| 105.91|
Next, the significance of the independent variables can be examined. The null hypothesis for each of the independent variables is that they are not significant predictors of the dependent variable. The alternative hypothesis is that the independent variables are significant predictors of the dependent variable. The t-statistic and corresponding p-value for degree days allow for the rejection of the null hypothesis and the acceptance of the alternative hypothesis because the p-value is less than 0.05. The same is true for the home index variable as the p-value of the t-statistic for the variable is less than 0.05.
The regression coefficients for the two independent variables are used in the prediction process. The coefficients for degree days and home index are both positive, which means that heating oil usage increases as the number of degree days below 68 degrees Fahrenheit increases and as the home index increases, the latter being an indication of reduced energy efficiency of the home.
One other part of the results of the linear regression model that is important in interpreting and using the model is the r-squared value, which is the amount of variance in the dependent variable that is explained by the independent variables. The model that has been created has an adjusted r-squared value of 0.77, as reported in the regression statistics. This means that about 77% of the variance in the dependent variable of heating oil usage is explained by the independent variables of degree days and home index. The high level of variance in the dependent variable that is explained by the independent variables in the model provides a strong indication that the model can be used for prediction purposes.
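The fit statistics in the regression output can be re-derived from the ANOVA sums of squares, which is a useful cross-check on the reported values. The sketch below uses only the figures printed in the regression table; the variable names are illustrative.

```python
import math

# Values copied from the ANOVA portion of the regression table.
n_obs = 40
k_predictors = 2
ss_regression = 952922.19
ss_residual = 264785.71
ss_total = ss_regression + ss_residual

# Degrees of freedom for the regression and the residual.
df_regression = k_predictors
df_residual = n_obs - k_predictors - 1

# Mean squares and the F-statistic for overall model fit.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

# Share of variance explained, with and without the penalty for predictors.
r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual

# Standard error of the regression (root mean squared residual).
std_error = math.sqrt(ms_residual)

print(f"F = {f_stat:.2f}, R^2 = {r_squared:.2f}, "
      f"adjusted R^2 = {adj_r_squared:.2f}, SE = {std_error:.2f}")
```

The recomputed values agree with the table: an F-statistic of about 66.58, an r-squared of about 0.78, an adjusted r-squared of about 0.77, and a standard error of about 84.60.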
However, in order to determine if the model that has been proposed is indeed the best model that can be created from the data collected from the sample of residential customers, another linear regression analysis is performed in which all of the independent variables are examined and only those that are significant predictors of the dependent variable are included. The results of this linear regression analysis indicate that the model that was originally proposed was indeed the best model from the data that were available. The variables of degree days and home index are significant predictors of heating oil usage, while the number of people in the home was not a significant predictor of heating oil usage in any of the models that were considered.
Forecasts and Usage
With the determination that the linear regression model that was created can be used for prediction purposes, it is necessary to forecast home heating oil usage for residential customers based on the number of degree days below 68 degrees Fahrenheit and the energy efficiency of the home as measured by the home index. The coefficients of the independent variables found from the results of the regression analysis allow for a prediction equation of oil usage = -192.82 + 0.27(degree days) + 86.65(home index). In addition, with the use of computer software, it is possible to create forecasts for average conditions that might occur in the region that is served by Dupree Fuels Company.
One of the ways in which heating oil usage forecasts might be created would be to consider a range of degree days below 68 degrees Fahrenheit, as well as the average energy efficiency of the customers' homes served by the company. Unless the company is willing to have staff members assess each home that is served and determine a home index value, an average home index value can be used. In the sample of 40 residential customers that was examined, the mean home index value was 2.75. This value can be used to forecast heating oil usage based on a range of degree days.
Table 4 shows the forecasted heating oil usage for 50 degree days to 600 degree days in increments of 50 degree days based on a home index value of 2.75. From the linear regression model that was created, it is possible to predict that at 50 degree days below 68 degrees Fahrenheit, a customer is forecasted to use 58.97 gallons of heating oil. At 250 degree days, a customer is forecasted to use 112.97 gallons of heating oil. In the event of 600 degree days, it is forecasted that a customer will use 207.47 gallons of heating oil.
Table 4: Forecasted Oil Usage
Degree Days| Home Index| Oil Usage|
50| 2.75| 58.97|
100| 2.75| 72.47|
150| 2.75| 85.97|
200| 2.75| 99.47|
250| 2.75| 112.97|
300| 2.75| 126.47|
350| 2.75| 139.97|
400| 2.75| 153.47|
450| 2.75| 166.97|
500| 2.75| 180.47|
550| 2.75| 193.97|
600| 2.75| 207.47|
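The forecast table above follows directly from the prediction equation, so it can be regenerated with a few lines of code. This is a minimal sketch using the rounded coefficients from the regression results and the sample mean home index of 2.75; the function and constant names are illustrative.

```python
# Rounded coefficients from the regression results.
INTERCEPT = -192.82
COEF_DEGREE_DAYS = 0.27
COEF_HOME_INDEX = 86.65

# Sample mean home index, used when a home has not been assessed.
MEAN_HOME_INDEX = 2.75


def predict_oil_usage(degree_days, home_index=MEAN_HOME_INDEX):
    """Forecast gallons of heating oil used since the last refill."""
    return (INTERCEPT
            + COEF_DEGREE_DAYS * degree_days
            + COEF_HOME_INDEX * home_index)


# Rebuild the forecast for 50 to 600 degree days in steps of 50.
forecast = {dd: round(predict_oil_usage(dd), 2) for dd in range(50, 601, 50)}

for dd, gallons in forecast.items():
    print(f"{dd:>4}| {MEAN_HOME_INDEX}| {gallons:.2f}|")
```

Passing a customer-specific home index instead of the default would give an individual forecast, provided the inputs stay within the ranges observed in the sample.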
With the forecast about heating oil usage based on degree days and home index, the company can know how much heating oil customers will use in relation to the size of the tanks they have. The company can then plan when it needs to refill the tanks of customers so that the customers do not experience any point of not having heating oil for their furnaces. At the same time, by knowing the size of the tanks of customers, the company can also refill heating oil tanks when they are projected to fall to a certain level. For example, a customer may have a heating oil tank that holds 100 gallons. If the number of degree days since the last fill of the tank has been 50, then the company can forecast that the customer has about 41 gallons of heating oil remaining. Furthermore, based on the forecasted oil usage table, the company can know that after 150 degree days since the last fill up, the customer with the 100 gallon tank likely has about 14 gallons remaining, which might be an indication that a fill up is necessary.
Improvements and Suggestions to the Model
It is useful to consider improvements that could be made to the model. One of the improvements that could be made to the accuracy of the predictions made by the model would be to have a larger sample size. It is likely that Dupree Fuels Company has a large number of customers. A sample of only 40 customers is probably a very small percentage of its total customers. By collecting the same data that have been collected but from a larger sample, perhaps 300 or even 500 homes, the results of the linear regression analysis could be more precise. This would mean that the specific forecasts made from the model would be more precise.
Another improvement that could be made to the model would be to break down the variables included in the home index. The home index variable was a subjective assessment by company employees. Rather than subjectively examining the conditions of the homes of the customers in the sample, information such as the age of the furnace, the size of the home, and even the type of insulation in the home could be collected. By examining the variables individually, the regression analysis might show that specific variables related to the energy efficiency of the home are better at making predictions about heating oil usage than others. Moreover, by using actual data rather than a subjective index, the results of the linear regression model might be more precise, which would make the forecasts generated from those results more precise.
One other improvement that could be made to the model would be to consider if the location of the customers might be a significant predictor of heating oil usage. It might be that customers that live in townhouses or row houses are impacted differently by cold temperatures and home characteristics than customers that live in single family homes. In addition, whether the homes are located in metropolitan areas as compared to suburban or rural areas might also impact heating oil usage, because suburban and rural homes may be more exposed to wind and other conditions while townhouses and city residences might be more protected from wind and other natural elements. While these variables might not be significant predictors of heating oil usage, the ability to examine them within a model would be useful for ensuring that the model created for forecasting heating oil usage is as strong as possible.
Beyond ways in which additional variables might be considered to improve the model, it is suggested that certain variables be avoided. In the data that were provided, the variable of the number of people living in the homes was not a significant predictor of heating oil usage in any of the models that were examined. This suggests that factors related to the people living in a home are not as important as the factors related to the conditions of a home with regard to heating oil usage. As the company seeks additional information from which to improve the model for predicting heating oil usage, it should focus on conditions of homes and external conditions as opposed to the living conditions or habits of the people in the home.
Finally, it is suggested that the company actually examine how well the forecast that has been created works in real life. If the company finds that some customers are still emptying their heating oil tanks before a refill occurs, then this would allow for an examination of the particular customer data so that a reason for not identifying that the tank was low could be determined. The model that was created explained about 77% of the variance in heating oil usage, so it is likely that some customers will still run out of heating oil. However, for those customers that do run out of heating oil, data should be examined to determine if they are essentially outliers in the sample for whom accurate forecasting will be difficult, or if a particular factor might be relevant for them that could be included in the model so that the forecasts become even more accurate.
Eydeland, Alexander and Krzysztof Wolyniec. Energy and Power Risk Management. New York: John Wiley and Sons.
Husher, Josh Durbin. Crises of the 21st Century. New York: iUniverse.
Kruger, Abe and Carl Seville. Green Building. Belmont, CA: Cengage Learning.