A streak is a short period of good or bad luck. A team is said to have a winning streak when it wins many games consecutively, and a losing streak when it loses many matches in a row. It is quite easy to say that a team has good players, and therefore has a high chance of winning. Upon closer consideration, though, it becomes apparent that the skill and style of play of the opposing teams also have an important part to play, as do other factors like coaching and the spirit of the players.

In this work, we have considered some variables that appear likely to influence a team's chance of winning. Specifically, we chose opponent 3-point shots per game, team 3-point shots per game, team free throws per game, team turnovers per game, opponent turnovers per game, team rebounds per game and opponent rebounds per game as the key variables determining the winning chance of a basketball team. We also had to deal with unusually large or small values in the data, since they affect the final outcome.

We therefore formed a multiple regression model for prediction, and modified it until we arrived at a model with six variables. Our model explains about 80% of the variation in winning percentage, and the winning percentage can be predicted to within about 0.1479 (on the 0-to-1 scale of the data, that is, roughly 15 percentage points) about 95% of the time. The model shows that the more turnovers a team has and the more rebounds its opponent makes, the lower its chance of winning. However, the more 3-point shots, free throws and rebounds a team makes, and the more turnovers its opponent makes, the greater the team's chance of winning.
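The headline figures in this summary can be re-derived from the analysis-of-variance table of the final model in Appendix XII (a). The following is a minimal sketch of that arithmetic, not the procedure Minitab itself runs; the function name `summary_figures` is hypothetical, and reading the error margin as twice the standard error S is an assumption about how the 0.1479 figure was obtained.

```python
import math

# Sums of squares and degrees of freedom copied from the
# analysis-of-variance table in Appendix XII (a).
SS_REGRESSION = 1.37853
SS_RESIDUAL = 0.32833
SS_TOTAL = 1.70686
DF_RESIDUAL = 60
DF_TOTAL = 66


def summary_figures(ss_reg, ss_res, ss_tot, df_res, df_tot):
    """Return R-Sq, adjusted R-Sq, S and a rough 95% margin (2 * S)."""
    r_sq = ss_reg / ss_tot
    # Adjusted R-Sq compares the mean squares rather than the raw sums.
    r_sq_adj = 1.0 - (ss_res / df_res) / (ss_tot / df_tot)
    # S is the square root of the residual mean square.
    s = math.sqrt(ss_res / df_res)
    return {"r_sq": r_sq, "r_sq_adj": r_sq_adj, "s": s, "margin": 2.0 * s}


figures = summary_figures(
    SS_REGRESSION, SS_RESIDUAL, SS_TOTAL, DF_RESIDUAL, DF_TOTAL
)
for name, value in figures.items():
    print(f"{name:9s} {value:.4f}")
```

Under these assumptions the sketch gives an R-Sq near 80.8%, an adjusted R-Sq near 78.8% and a margin close to 0.148, in line with the figures quoted above.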

3 TABLE OF CONTENTS

Executive summary
Objective of the study
Data description
Technical report
Conclusion and managerial implications
Appendices
Appendix I: Descriptive statistics for the variables
Appendix II: Box plots for the variables
Appendix III: Scatter plots, winning chance vs. each variable
Appendix IV: Multiple regression details for 8-variable model
Appendix V: Residual plots for the 8 variables
Appendix VI: Best subsets regression details
Appendix VII: Regression details for 5-variable model
Appendix VIII: Residual plots for 5 variables
Appendix IX: Regression excluding residual outliers for 5-variable model
Appendix X: Regression for 6-variable model
Appendix XI: Residual plots for 6-variable model
Appendix XII: (a) The final regression model
Appendix XII: (b) Residual plots for the final regression model

4 OBJECTIVE OF THE STUDY

The objective of this study is to create a regression model for predicting the winning percentage of a basketball team among many basketball teams in a particular basketball season.

Regression analysis is a method that aids us in predicting the outcome of a variable, given the values of one or more other (independent) variables. The model thus obtained is examined to ascertain the reliability of its prediction. In our analysis, therefore, we are out to examine a multiple regression model that we shall build, and improve on it until we find the best model for the job. We are motivated by the fact that fans of teams every now and then go into arguments (and even betting) about what chance there is for a particular team to win.

Winning a game, we believe, is not entirely a chance occurrence. We therefore want to investigate what factors can be expected to determine the winning chance of a team. We do not expect to get a magical model; rather, we expect to have to modify our model until its predictive ability has greatly improved. The importance of this work lies in the fact that, without accurate knowledge of the most influential factors affecting a phenomenon, one may end up spending a lot of resources (time, energy and money) on a factor that might not be so important, at the expense of the really important factors.

This results in a lot of input with no corresponding output, thereby leading to frustration. This can be especially true in sports and related activities. This work is our little contribution to more efficient planning and sport outing for a basketball team.

5 DATA DESCRIPTION

The data that we have used is taken from … It presents the statistics for sixty-eight (68) teams in a single sporting season. We therefore shall not go into issues of time series or other techniques that come into play when dealing with data that has been collected over an extended period.

The data presents a list of 68 basketball teams. Each team has played a number of games in a particular basketball season. The spreadsheet contains a lot of information on these 68 teams, such as their winning percentage and vital statistics of the games played in this particular season. In this work, we designate a dependent variable (Y) and seven independent variables (X1, X2, X3, X4, X5, X6 and X7). The variables are defined as follows:

Y = Winning percentage
X1 = Opponent's 3-point shots per game
X2 = Team's 3-point shots per game
X3 = Team's free throws per game
X4 = Team's turnovers per game
X5 = Opponent's turnovers per game
X6 = Team's rebounds per game
X7 = Opponent's rebounds per game

With the above variables, we shall formulate a regression model for the winning percentage of a team in this data.
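Written out, a model of this kind takes the usual multiple linear regression form. The coefficient symbols \beta_0, ..., \beta_7 and the error term \varepsilon are notation assumed here for illustration; the report itself names only Y and X1 to X7.

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4
    + \beta_5 X_5 + \beta_6 X_6 + \beta_7 X_7 + \varepsilon
```

In this form each \beta_j is the expected change in winning percentage for a one-unit increase in X_j with the other variables held fixed, and \varepsilon collects whatever the seven variables do not explain.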

6 TECHNICAL REPORT

6.1 Preliminaries

Our first task, having obtained the data, is to examine the descriptive statistics for each of our variables. The Minitab result is presented in Appendix I. The data appear to be roughly symmetric, and so plausibly close to normally distributed, since the mean and median of each variable are close. To further verify this, we will look at the box plots for each of the variables.
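The closeness of mean and median can be put on a common scale with Pearson's second skewness coefficient, 3 × (mean − median) / StDev. The sketch below applies it to the figures in Appendix I; the choice of this particular coefficient is an assumption made for illustration, not something the report's analysis used, and a value near zero only indicates symmetry, not normality as such.

```python
# (mean, median, standard deviation) copied from Appendix I.
DESCRIPTIVES = {
    "Winning percentage": (0.5946, 0.5938, 0.1625),
    "Opp 3-point per game": (6.318, 6.323, 0.880),
    "3-point per game": (6.478, 6.433, 1.326),
    "Free throws per game": (14.203, 14.322, 2.307),
    "Turn-over, pg": (14.086, 14.000, 1.355),
    "Opponent Turn-over,pg": (14.755, 14.769, 1.583),
    "Home rebound per game": (35.380, 35.383, 3.209),
    "Oppnt rebound per game": (33.841, 33.754, 2.128),
}


def pearson_second_skewness(mean, median, stdev):
    """Pearson's second skewness coefficient: 3 * (mean - median) / stdev."""
    return 3.0 * (mean - median) / stdev


skewness = {
    name: pearson_second_skewness(*stats)
    for name, stats in DESCRIPTIVES.items()
}
for name, value in skewness.items():
    print(f"{name:24s} {value:+.3f}")
```

On these figures every coefficient is below 0.2 in absolute value, with the team's turnovers per game the furthest from zero.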

The box plots reveal that the data is approximately normally distributed, except for turnover per game and opponent turnover per game with one outlier each, and home rebound per game with three outliers. The box plots are presented in Appendix II. To further understand our data, we shall look at the scatter plots of each variable against the winning percentage. These show us the extent to which each of them influences the winning percentage. Although this is not the final regression model, it presents us with the marginal regression relationship between each variable and the winning percentage.
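A box plot usually marks as outliers the points lying more than 1.5 × IQR beyond the quartiles. The sketch below applies that rule to the quartiles in Appendix I for two of the variables; the 1.5 multiplier is the common convention and is assumed here, and the function names are hypothetical. Because only summary statistics are available, the check can show whether the minimum or maximum lies beyond a fence, not how many points do.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences lying k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def extremes_beyond_fences(minimum, q1, q3, maximum):
    """Report whether the smallest and largest values fall outside the fences."""
    lower, upper = tukey_fences(q1, q3)
    return {"low": minimum < lower, "high": maximum > upper}


# (Minimum, Q1, Q3, Maximum) copied from Appendix I.
SUMMARY = {
    "Turn-over, pg": (10.974, 13.116, 14.875, 17.656),
    "Home rebound per game": (27.323, 33.304, 37.063, 45.548),
}

flags = {
    name: extremes_beyond_fences(*values) for name, values in SUMMARY.items()
}
for name, flag in flags.items():
    print(name, flag)
```

For the team's turnovers only the maximum falls outside the fences, while for home rebounds both the minimum and the maximum do.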

The details of the results are presented in Appendix III. The marginal regressions reveal that some of the variables are more influential on the winning percentage than others, but we note that this is not yet the final regression model. On close examination, we observe that the opponent's 3-point shots per game account for very little of the variation in winning percentage (R-Sq = 1.3%), and in fact are negatively correlated with the percentage wins of a team. A similar case arises concerning the team's turnovers per game, although the relationship is stronger there (R-Sq = 10.4%). The same goes for the opponent's rebounds per game (R-Sq = 10.7%).
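For a regression on a single variable, the correlation coefficient is the square root of R-Sq carrying the sign of the slope, so the marginal fits in Appendix III can be turned into correlations directly. The sketch below does this. The slopes for Opp 3-point per game, Turn-over, pg and Oppnt rebound per game are taken as negative, which is an assumption based on the signs those predictors carry in the coefficient table of Appendix IV, since the printed marginal equations do not show a sign.

```python
import math

# (slope, R-Sq in percent) for each marginal regression in Appendix III.
# Negative slopes are assumed as described in the lead-in.
MARGINAL_FITS = {
    "Opp 3-point per game": (-0.0212, 1.3),
    "3-point per game": (0.0304, 6.2),
    "Free throws per game": (0.0378, 28.8),
    "Turn-over, pg": (-0.0387, 10.4),
    "Opponent Turn-over,pg": (0.0204, 4.0),
    "Home rebound per game": (0.0237, 21.9),
    "Oppnt rebound per game": (-0.0249, 10.7),
}


def correlation(slope, r_sq_percent):
    """Correlation implied by a one-variable fit: sign(slope) * sqrt(R-Sq)."""
    return math.copysign(math.sqrt(r_sq_percent / 100.0), slope)


correlations = {
    name: correlation(*fit) for name, fit in MARGINAL_FITS.items()
}
# Print from the most negative to the most positive correlation.
for name, r in sorted(correlations.items(), key=lambda item: item[1]):
    print(f"{name:24s} r = {r:+.3f}")
```

Read this way, free throws per game has the largest correlation with winning percentage (about 0.54) and opponent 3-point shots per game the smallest in magnitude (about 0.11).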

The rest exhibit a positive correlation. The strongest correlation observable from the scatter plots is that of the team's free throws per game, and the weakest positive correlation is that of the opponent's turnovers per game.

[Sections 6.2 to 6.4.1 are missing from this copy.]

7 CONCLUSION AND MANAGERIAL IMPLICATIONS

Regression analysis is a very useful analysis tool. Moreover, with the aid of modern computers, data analysis is even easier (and sometimes fun) to carry out. The final model we have been able to come up with will help in predicting the winning chance of a basketball team. We would like to state here that our model does not have magical powers of prediction.

The predictive accuracy of the model has been stated in the body of this work, and shows us that the model does not incorporate every variable that affects the winning chance of a team. It is common knowledge that factors like the co-operation between team management and players, the relationship among players, the individual skills of the players and the support of a team's fans play a very important role in a team's ability to win a game, and so do many other factors. Yet these factors cannot easily be described quantitatively so as to be included in the model.

Nevertheless, we believe that the variables we have analyzed have very important roles to play, and therefore should not be ignored. We therefore recommend, based on our findings, that a team should plan its game so as to minimize its turnovers, since in our model they have the strongest negative effect on its winning chance. Similarly, the opponent's rebounds reduce the winning chance. On the other hand, a basketball team should, as much as possible, maximize its 3-point shots, free throws and rebounds and the opponent's turnovers, since according to our model these have a positive influence on its winning chance.
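Because the predictors are measured on different scales, one way to compare their influence is to multiply each coefficient by the standard deviation of its variable, giving the change in winning percentage associated with a typical (one standard deviation) change in that variable. The sketch below does this with the final-model coefficients from Appendix XII (a) and the standard deviations from Appendix I. Pairing the two is an assumption: the standard deviations describe all 68 teams, while the final model was fitted to 67.

```python
# Coefficients of the final model, copied from Appendix XII (a).
COEFFICIENTS = {
    "3-point per game": 0.022564,
    "Free throws per game": 0.016706,
    "Turn-over, pg": -0.066016,
    "Opponent Turn-over,pg": 0.041969,
    "Home rebound per game": 0.025649,
    "Oppnt rebound per game": -0.029173,
}

# Standard deviations copied from Appendix I (all 68 teams).
STDEV = {
    "3-point per game": 1.326,
    "Free throws per game": 2.307,
    "Turn-over, pg": 1.355,
    "Opponent Turn-over,pg": 1.583,
    "Home rebound per game": 3.209,
    "Oppnt rebound per game": 2.128,
}

# Change in winning percentage per one-standard-deviation change.
effects = {name: COEFFICIENTS[name] * STDEV[name] for name in COEFFICIENTS}

# Rank predictors by the size of that change, largest first.
ranked = sorted(effects.items(), key=lambda item: abs(item[1]), reverse=True)
for name, effect in ranked:
    print(f"{name:24s} {effect:+.4f}")
```

On this scale the team's own turnovers remain the most influential predictor, at roughly 0.09 of winning percentage lost per standard deviation, with home rebounds the largest positive influence.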

Finally, to the sports fan: you can know what to expect from a team if you can observe the above-mentioned variables. So, instead of raising your heart rate in blind anticipation, you can assess for yourself the chance that your favorite team will not let you down. In the meantime, we wish you the best of luck!

8 APPENDIXES

8.1 APPENDIX I: Descriptive Statistics for the variables

1. Descriptive Statistics

Variable                 N   N*  Mean     SE Mean  StDev    Variance  Minimum
Winning percentage       68  0   0.5946   0.0197   0.1625   0.0264    0.2333
Opp 3-point per game     68  0   6.318    0.107    0.880    0.774     3.788
3-point per game         68  0   6.478    0.161    1.326    1.757     3.645
Free throws per game     68  0   14.203   0.280    2.307    5.323     8.536
Turn-over, pg            68  0   14.086   0.164    1.355    1.835     10.974
Opponent Turn-over,pg    68  0   14.755   0.192    1.583    2.506     11.438
Home rebound per game    68  0   35.380   0.389    3.209    10.297    27.323
Oppnt rebound per game   68  0   33.841   0.258    2.128    4.528     28.970

Variable                 Q1       Median   Q3       Maximum  Range    IQR
Winning percentage       0.4707   0.5938   0.7403   0.9487   0.7154   0.2696
Opp 3-point per game     5.688    6.323    6.956    8.138    4.350    1.268
3-point per game         5.782    6.433    7.413    9.471    5.825    1.631
Free throws per game     12.619   14.322   15.883   19.568   11.032   3.264
Turn-over, pg            13.116   14.000   14.875   17.656   6.682    1.759
Opponent Turn-over,pg    13.574   14.769   15.514   18.406   6.969    1.939
Home rebound per game    33.304   35.383   37.063   45.548   18.226   3.758
Oppnt rebound per game   32.611   33.754   35.047   39.938   10.968   2.436

2. Descriptive Statistics: Winning percentage

Variable             N   N*  Mean    SE Mean  StDev   Minimum  Q1      Median
Winning percentage   68  0   0.5946  0.0197   0.1625  0.2333   0.4707  0.5938

Variable             Q3      Maximum  IQR     Variance  Range
Winning percentage   0.7403  0.9487   0.2696  0.026     0.7154

8.2 APPENDIX II: Box Plots for the variables

8.3 APPENDIX III: Scatter Plots (With Corresponding Regression Equations)

Regression Analysis: Winning percentage versus Opp 3-point per game
The regression equation is
Winning percentage = 0.729 - 0.0212 Opp 3-point per game
S = 0.162686   R-Sq = 1.3%   R-Sq(adj) = 0.0%

Regression Analysis: Winning percentage versus 3-point per game
The regression equation is
Winning percentage = 0.397 + 0.0304 3-point per game
S = 0.158646   R-Sq = 6.2%   R-Sq(adj) = 4.7%

Regression Analysis: Winning percentage versus Free throws per game
The regression equation is
Winning percentage = 0.058 + 0.0378 Free throws per game
S = 0.138185   R-Sq = 28.8%   R-Sq(adj) = 27.7%

Regression Analysis: Winning percentage versus Turn-over, pg
The regression equation is
Winning percentage = 1.14 - 0.0387 Turn-over, pg
S = 0.155019   R-Sq = 10.4%   R-Sq(adj) = 9.0%

Regression Analysis: Winning percentage versus Opponent Turn-over,pg
The regression equation is
Winning percentage = 0.293 + 0.0204 Opponent Turn-over,pg
S = 0.160503   R-Sq = 4.0%   R-Sq(adj) = 2.5%

Regression Analysis: Winning percentage versus Home rebound per game
The regression equation is
Winning percentage = - 0.243 + 0.0237 Home rebound per game
S = 0.144773   R-Sq = 21.9%   R-Sq(adj) = 20.7%

Regression Analysis: Winning percentage versus Oppnt rebound per game
The regression equation is
Winning percentage = 1.44 - 0.0249 Oppnt rebound per game
S = 0.154803   R-Sq = 10.7%   R-Sq(adj) = 9.3%

8.4 APPENDIX IV: Multiple Regression Details

Regression Analysis: Winning perc versus 3-point per, Free throws, ...

The regression equation is
Winning percentage = 0.633 + 0.0224 3-point per game + 0.0176 Free throws per game
                     - 0.0622 Turn-over, pg + 0.0414 Opponent Turn-over,pg
                     + 0.0267 Home rebound per game - 0.0296 Oppnt rebound per game
                     - 0.0172 Opp 3-point per game

Predictor                Coef       SE Coef   T      P
Constant                 0.6327     0.2123    2.98   0.004
3-point per game         0.022369   0.007221  3.10   0.003
Free throws per game     0.017604   0.005720  3.08   0.003
Turn-over, pg            -0.062214  0.007380  -8.43  0.000
Opponent Turn-over,pg    0.041398   0.006398  6.47   0.000
Home rebound per game    0.026699   0.004175  6.39   0.000
Oppnt rebound per game   -0.029645  0.004594  -6.45  0.000
Opp 3-point per game     -0.01724   0.01130   -1.53  0.132

S = 0.0747588   R-Sq = 81.1%   R-Sq(adj) = 78.8%

Analysis of Variance
Source          DF  SS       MS       F      P
Regression      7   1.43486  0.20498  36.68  0.000
Residual Error  60  0.33533  0.00559
Total           67  1.77019

Source                   DF  Seq SS
3-point per game         1   0.10906
Free throws per game     1   0.53614
Turn-over, pg            1   0.24618
Opponent Turn-over,pg    1   0.13117
Home rebound per game    1   0.13403
Oppnt rebound per game   1   0.26527
Opp 3-point per game     1   0.01302

Unusual Observations
     3-point   Winning
Obs  per game  percentage  Fit      SE Fit   Residual  St Resid
2    4.59      0.79412     0.63575  0.02114  0.15837   2.21R
27   6.60      0.76667     0.60456  0.01272  0.16211   2.20R
30   6.21      0.50000     0.65441  0.01571  -0.15441  -2.11R
45   4.75      0.25000     0.39253  0.02404  -0.14253  -2.01R

R denotes an observation with a large standardized residual.

8.5 APPENDIX V: Residuals plots for the 8 variables

8.6 APPENDIX VI: Best Subsets Regression

Best Subsets Regression: Winning perc versus Opp 3-point, 3-point per, ...
Response is Winning percentage

Predictor columns, left to right: Opp 3-point per game; 3-point per game; Free throws per game; Turn-over, pg; Opponent Turn-over,pg; Home rebound per game; Oppnt rebound per game. (The column positions of the X marks are not preserved.)

Vars  R-Sq  R-Sq(adj)  Mallows Cp  S         Predictors marked
1     28.8  27.7       161.5       0.13818   X
1     21.9  20.7       183.5       0.14477   X
2     46.9  45.3       106.1       0.12021   X X
2     41.2  39.4       124.4       0.12658   X X
3     55.2  53.1       81.7        0.11126   X X X
3     54.9  52.8       82.9        0.11172   X X X
4     73.8  72.2       24.9        0.085772  X X X X
4     65.1  62.9       52.4        0.098958  X X X X
5     77.7  75.9       14.6        0.079790  X X X X X
5     76.8  74.9       17.6        0.081431  X X X X X
6     80.3  78.4       8.3         0.075569  X X X X X X
6     78.1  75.9       15.5        0.079781  X X X X X X
7     81.1  78.8       8.0         0.074759  X X X X X X X

8.7 APPENDIX VII: Regression Analysis with Five Variables

Regression Analysis

The regression equation is
Winning percentage = 0.528 + 0.0250 3-point per game - 0.0631 Turn-over, pg
                     + 0.0471 Opponent Turn-over,pg + 0.0349 Home rebound per game
                     - 0.0336 Oppnt rebound per game

Predictor                Coef       SE Coef   T      P
Constant                 0.5280     0.2213    2.39   0.020
3-point per game         0.025031   0.007617  3.29   0.002
Turn-over, pg            -0.063103  0.007859  -8.03  0.000
Opponent Turn-over,pg    0.047061   0.006531  7.21   0.000
Home rebound per game    0.034908   0.003176  10.99  0.000
Oppnt rebound per game   -0.033572  0.004713  -7.12  0.000

S = 0.0797903   R-Sq = 77.7%   R-Sq(adj) = 75.9%

Analysis of Variance
Source          DF  SS       MS       F      P
Regression      5   1.37547  0.27509  43.21  0.000
Residual Error  62  0.39472  0.00637
Total           67  1.77019

Source                   DF  Seq SS
3-point per game         1   0.10906
Turn-over, pg            1   0.13137
Opponent Turn-over,pg    1   0.15696
Home rebound per game    1   0.65508
Oppnt rebound per game   1   0.32300

Unusual Observations
     3-point   Winning
Obs  per game  percentage  Fit      SE Fit   Residual  St Resid
8    4.13      0.83333     0.66281  0.02375  0.17053   2.24R
13   6.79      0.55172     0.72095  0.02073  -0.16923  -2.20R
27   6.60      0.76667     0.60253  0.01331  0.16414   2.09R
30   6.21      0.50000     0.66321  0.01474  -0.16321  -2.08R
45   4.75      0.25000     0.41575  0.02187  -0.16575  -2.16R

R denotes an observation with a large standardized residual.

APPENDIX VII (Continued): Descriptive Statistics for five Variables

Descriptive Statistics

Variable                 N   N*  Mean     SE Mean  StDev    Variance  Minimum
Winning percentage       68  0   0.5946   0.0197   0.1625   0.0264    0.2333
3-point per game         68  0   6.478    0.161    1.326    1.757     3.645
Turn-over, pg            68  0   14.086   0.164    1.355    1.835     10.974
Opponent Turn-over,pg    68  0   14.755   0.192    1.583    2.506     11.438
Home rebound per game    68  0   35.380   0.389    3.209    10.297    27.323
Oppnt rebound per game   68  0   33.841   0.258    2.128    4.528     28.970

Variable                 Q1       Median   Q3       Maximum  Range    IQR
Winning percentage       0.4707   0.5938   0.7403   0.9487   0.7154   0.2696
3-point per game         5.782    6.433    7.413    9.471    5.825    1.631
Turn-over, pg            13.116   14.000   14.875   17.656   6.682    1.759
Opponent Turn-over,pg    13.574   14.769   15.514   18.406   6.969    1.939
Home rebound per game    33.304   35.383   37.063   45.548   18.226   3.758
Oppnt rebound per game   32.611   33.754   35.047   39.938   10.968   2.436

8.8 APPENDIX VIII: Residual Plots for 5 variables

8.9 APPENDIX IX: Regression Excluding Residual Outliers

Regression Analysis:

The regression equation is
Winning percentage = 0.487 + 0.0184 Free throws per game + 0.0240 Opponent Turn-over,pg
                     + 0.0188 Home rebound per game - 0.0303 Oppnt rebound per game
                     - 0.0243 Opp 3-point per game

Predictor                Coef       SE Coef   T      P
Constant                 0.4873     0.2956    1.65   0.105
Free throws per game     0.018444   0.009412  1.96   0.055
Opponent Turn-over,pg    0.024021   0.009784  2.46   0.017
Home rebound per game    0.018835   0.006555  2.87   0.006
Oppnt rebound per game   -0.030258  0.007625  -3.97  0.000
Opp 3-point per game     -0.02428   0.02129   -1.14  0.259

S = 0.118905   R-Sq = 49.8%   R-Sq(adj) = 45.7%

Analysis of Variance
Source          DF  SS       MS       F      P
Regression      5   0.84309  0.16862  11.93  0.000
Residual Error  60  0.84831  0.01414
Total           65  1.69140

Source                   DF  Seq SS
Free throws per game     1   0.47458
Opponent Turn-over,pg    1   0.03295
Home rebound per game    1   0.04175
Oppnt rebound per game   1   0.27543
Opp 3-point per game     1   0.01839

Unusual Observations
     Free throws  Winning
Obs  per game     percentage  Fit     SE Fit  Residual  St Resid
12   12.2         0.3333      0.5854  0.0270  -0.2521   -2.18R
34   12.2         0.9487      0.6218  0.0297  0.3269    2.84R
42   14.5         0.2333      0.5227  0.0400  -0.2893   -2.58R
43   12.5         0.2500      0.4925  0.0367  -0.2425   -2.14R

R denotes an observation with a large standardized residual.

8.10 APPENDIX X: Regression with 6 Variables

Regression Analysis: Winning perc versus 3-point per, Free throws, ...

The regression equation is
Winning percentage = 0.565 + 0.0239 3-point per game + 0.0163 Free throws per game
                     - 0.0630 Turn-over, pg + 0.0436 Opponent Turn-over,pg
                     + 0.0265 Home rebound per game - 0.0310 Oppnt rebound per game

Predictor                Coef       SE Coef   T      P
Constant                 0.5654     0.2100    2.69   0.009
3-point per game         0.023949   0.007224  3.32   0.002
Free throws per game     0.016290   0.005717  2.85   0.006
Turn-over, pg            -0.062984  0.007443  -8.46  0.000
Opponent Turn-over,pg    0.043571   0.006305  6.91   0.000
Home rebound per game    0.026482   0.004218  6.28   0.000
Oppnt rebound per game   -0.031028  0.004552  -6.82  0.000

S = 0.0755690   R-Sq = 80.3%   R-Sq(adj) = 78.4%

Analysis of Variance
Source          DF  SS       MS       F      P
Regression      6   1.42184  0.23697  41.50  0.000
Residual Error  61  0.34835  0.00571
Total           67  1.77019

Source                   DF  Seq SS
3-point per game         1   0.10906
Free throws per game     1   0.53614
Turn-over, pg            1   0.24618
Opponent Turn-over,pg    1   0.13117
Home rebound per game    1   0.13403
Oppnt rebound per game   1   0.26527

Unusual Observations
     3-point   Winning
Obs  per game  percentage  Fit      SE Fit   Residual  St Resid
27   6.60      0.76667     0.60084  0.01262  0.16582   2.23R
44   6.03      0.23333     0.38536  0.02559  -0.15202  -2.14R
45   4.75      0.25000     0.41158  0.02076  -0.16158  -2.22R

R denotes an observation with a large standardized residual.

8.11 APPENDIX XI: Residual Plots for the 6-variable Model

8.12 APPENDIX XII (a): The Final Regression Model

Regression Analysis: Winning perc versus 3-point per, Free throws, ...

The regression equation is
Winning percentage = 0.604 + 0.0226 3-point per game + 0.0167 Free throws per game
                     - 0.0660 Turn-over, pg + 0.0420 Opponent Turn-over,pg
                     + 0.0256 Home rebound per game - 0.0292 Oppnt rebound per game

Predictor                Coef       SE Coef   T      P
Constant                 0.6038     0.2065    2.92   0.005
3-point per game         0.022564   0.007108  3.17   0.002
Free throws per game     0.016706   0.005600  2.98   0.004
Turn-over, pg            -0.066016  0.007456  -8.85  0.000
Opponent Turn-over,pg    0.041969   0.006229  6.74   0.000
Home rebound per game    0.025649   0.004152  6.18   0.000
Oppnt rebound per game   -0.029173  0.004561  -6.40  0.000

S = 0.0739739   R-Sq = 80.8%   R-Sq(adj) = 78.8%

Analysis of Variance
Source          DF  SS       MS       F      P
Regression      6   1.37853  0.22976  41.99  0.000
Residual Error  60  0.32833  0.00547
Total           66  1.70686

Source                   DF  Seq SS
3-point per game         1   0.10202
Free throws per game     1   0.50620
Turn-over, pg            1   0.30758
Opponent Turn-over,pg    1   0.11512
Home rebound per game    1   0.12372
Oppnt rebound per game   1   0.22390

Unusual Observations
     3-point   Winning
Obs  per game  percentage  Fit      SE Fit   Residual  St Resid
26   6.60      0.76667     0.60237  0.01238  0.16429   2.25R
29   6.21      0.50000     0.64694  0.01477  -0.14694  -2.03R
43   6.03      0.23333     0.38546  0.02505  -0.15213  -2.19R
44   4.75      0.25000     0.41580  0.02045  -0.16580  -2.33R

R denotes an observation with a large standardized residual.
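The final regression equation can be applied to a team's per-game statistics as in the sketch below. The function names and the `AVERAGE_TEAM` example are hypothetical: the example simply plugs in the per-game means from Appendix I, which describe all 68 teams rather than the 67 used to fit the final model. The interval of plus or minus 2 × S is a rough approximation that ignores the uncertainty in the fitted coefficients.

```python
# Constant, coefficients and S copied from the final model above.
INTERCEPT = 0.6038
COEFFICIENTS = {
    "3-point per game": 0.022564,
    "Free throws per game": 0.016706,
    "Turn-over, pg": -0.066016,
    "Opponent Turn-over,pg": 0.041969,
    "Home rebound per game": 0.025649,
    "Oppnt rebound per game": -0.029173,
}
S = 0.0739739


def predict_winning_percentage(team_stats):
    """Evaluate the final regression equation for one team's statistics."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * team_stats[name] for name in COEFFICIENTS
    )


def rough_95_interval(team_stats):
    """Fitted value plus or minus 2 * S (an approximation, see lead-in)."""
    fit = predict_winning_percentage(team_stats)
    return fit - 2.0 * S, fit + 2.0 * S


# Hypothetical example: a team sitting at the Appendix I means.
AVERAGE_TEAM = {
    "3-point per game": 6.478,
    "Free throws per game": 14.203,
    "Turn-over, pg": 14.086,
    "Opponent Turn-over,pg": 14.755,
    "Home rebound per game": 35.380,
    "Oppnt rebound per game": 33.841,
}

fit = predict_winning_percentage(AVERAGE_TEAM)
low, high = rough_95_interval(AVERAGE_TEAM)
print(f"fit = {fit:.3f}, rough 95% range = ({low:.3f}, {high:.3f})")
```

A team at the averages is predicted to win a little under 60% of its games, close to the mean winning percentage of 0.5946 in Appendix I, which is what a fitted regression should give near the centre of the data.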

APPENDIX XII (b): Residual Plots for the final regression model.

REFERENCES

Please state the source of data here.

In this work, we have considered some variables that appear likely to influence the teams chance of winning. Specifically, we chose opponent 3-points per game, team 3-points per game, team free throws per game, team turnovers per game, opponent turnovers per game, team rebounds per game and opponent rebounds per game as key determining variables in determining the winning chance of a basketball team. We had to deal with the occurrence unusually large or small values in the data, since they affect the final outcome.

Therefore we formed a multiple regression model for prediction, and modified it until we came up with a model with six variables. Our model can be trusted to predict the chance of a team winning by up to 80%, and the percentage win can be predicted with an error margin 0. 1479 percentage points about 95% of the time. Our model showed us that the more turnovers a team has and the more rebounds from an opponent, the less the chance of winning. However, the more 3-point shots, free throws and rebounds made, and the more turnovers an opponent makes, the greater a teams chance of winning.

3 TABLE OF CONTENTS Executive summary 2 Objective of the study 4 Data description 5 Technical report 6 12 Conclusion and managerial implications 14 Appendices Appendix I: Descriptive statistics for the variables 15 Appendix II: Box plots for the variables 16 Appendix III: Scatter plots, winning chance vs. each variable 17 Appendix IV: Multiple regression details for 8-variable model 20 Appendix V: Residual plots for the 8 variables 21 Appendix VI: Best subsets regression details 23 Appendix VII: Regression details for 5-variable model 24.

Appendix VIII: Residual Plots for 5 variables 26 Appendix IX: Regression excluding residual outliers for 5-variable model 28 Appendix X: Regression for 6-variable model 29 Appendix XI: Residual plots for 6-variable model 30 Appendix XII: (a) The final regression model 32 Appendix XII: (b) Residual plots for the final regression model 33 4 OBJECTIVE OF THE STUDY The objective of his study is to create a regression model for predicting the percentage wining of a basketball team among many basketball teams in a particular basketball season.

Regression analysis is a method that aids us in predicting the outcome of a variable, given the values of one or more other (independent) variables. The model thus obtained is examined to ascertain the reliability of its prediction. In our analysis, therefore, we are out to examine a multiple regression model that we shall build, and improve on it until we find the best model for the job. We are motivated by the fact that fans of teams every now and then go into arguments (and even betting) about what chance there is for a particular team to win.

Winning a game, we believe, is not entirely a chance occurrence. We therefore want to investigate what factors can be expected to determine the winning chance of a team. We do not expect to get a magical model, but that we will have to modify our model until its predictive ability has been greatly improved. The importance of this work lies in the fact that, without accurate knowledge of the most influential factors affecting a phenomenon, one may end up spending a lot of resources (time, energy and money) on a factor that might not be so important, at the expense of the really important factors.

This results in a lot of input with no corresponding output, thereby leading to frustration. This can be especially true in sports and related activities. This work is our little contribution to more efficient planning and sport outing for a basketball team.

5 DATA DESCRIPTION The data that we have used is taken from ¦¦¦ It presents the statistics for sixty-eight (68) teams in a sporting season. Therefore we shall not be going into issues of time series or other techniques that come into play when dealing with data that has been collected over an extended period.

The data presents a list of 68 basketball teams. Each team has played a number of games in a particular basketball sporting season. The spreadsheet contains a lot of information on these 68 teams, such as their winning percentage and vital statistics of the games played in this particular season. In this work, we are going to designate a dependent variable (Y) and seven independent variables (X1, X2, X3, X4, X5, X6 and X7). The variables are defined as follows: Y = Winning Percentage X1 = Opponents 3-point per game X2 = Teams 3-point per game X3 = Teams free throws pr game

X4 = Teams turnover per game X5 = Opponents turnover per game X6 = Teams rebound per game X7 = Opponents rebound per game With the above variables, we shall formulate a regression model for the winning percentage of a team in this data.

6 TECHNICAL REPORT 6. 1 Preliminaries Our first task, having obtained the data, is to examine the descriptive statistics for each of our independent variables. The Minitab result is presented in Appendix I. The data appears to be normally distributed, since the mean and median are close. To further verify this, we will look at the box plots for each of the variables.

The box plots reveal that the data is normally distributed, except for turnover per game and opponent turnover per game with one outlier each, and home rebound per game with three outliers. The Box plots are presented in Appendix II. To further understand our data, we still look at the scatter plots of each variable against the winning percentage. This will show us the extent to which each of then influence the winning percentage. Although this is not the final regression model, it presents us with marginal regression relationships between each variable and the winning percentage.

The details of the results are presented in Appendix III. The marginal regressions reveal that some of the variables are more influential to the winning percentage than others, but we note that this is not the final regression model yet. On close examination, we observe that Opponents 3-point per game accounts for very little of the chances of winning a game, and in fact is negatively correlated with percentage wins of a team. A similar case arises concerning Teams turnover per game, only that the relationship is even weaker here. The same goes for Teams rebound per game.

The rest exhibit a positive correlation. The strongest correlation observable from the scatter plots is that of Teams free throws per game, and the weakest positive correlation is that of Opponents turnover per game. 6. 2 6. 4. 1 7 Regression analysis is a very useful analysis tool. Moreover, with the aid of modern computers, data analysis is even easier (and sometimes fun) to carry out. The final model we have been able to come up with will help in predicting the winning chance of a basketball team. We would like to state here that our model does not have magical powers of prediction.

The predictive accuracy of the model has been stated in the body of this work, and shows us that it does not incorporate EVERY variable that affects the winning chance of a team. It is common knowledge that factors like the co-operation between team management and players, relationship among players, the individual skills of the players and the support of a teams fans play a very important role in a teams ability to win a game, and so do many other factors. Yet these factors cannot be quantitatively described so as to be included in the model.

Nevertheless, we believe that the variables we have analyzed have very important roles to play, and therefore should not be ignored. We therefore recommend, based on our findings, that a team should strategize its game so as to minimize their turnovers, since from our model they have the strongest negative effect on their winning chance. Similarly, the opponents rebound will do damage. On the other hand, a basketball team should, as much as possible, maximize their 3-point shots, free throws, rebounds and the opponents turnovers, since according to our model, these have a positive influence on their winning chance.

Finally to the sports fan, you can know what to expect from a team if you can observe the above-mentioned variables. So, instead of raising your heart rate in blind anticipation, you can assess for yourself the chance that your favorite team will not let you down. In the meantime, we wish you the best of luck!

8 APPENDIXES 8. 1 APPENDIX I: Descriptive Statistics for the variables 1. Descriptive Statistics Variable N N* Mean SE Mean StDev Variance Minimum Winning percentage 68 0 0. 5946 0. 0197 0. 1625 0. 0264 0. 2333 Opp 3-point per game 68 0 6. 318 0.

107 0.880 0.774 3.788
3-point per game          68   0    6.478    0.161    1.326    1.757    3.645
Free throws per game      68   0   14.203    0.280    2.307    5.323    8.536
Turn-over, pg             68   0   14.086    0.164    1.355    1.835   10.974
Opponent Turn-over,pg     68   0   14.755    0.192    1.583    2.506   11.438
Home rebound per game     68   0   35.380    0.389    3.209   10.297   27.323
Oppnt rebound per game    68   0   33.841    0.258    2.128    4.528   28.970

Variable                       Q1   Median       Q3  Maximum    Range      IQR
Winning percentage         0.4707   0.5938   0.7403   0.9487   0.7154   0.2696
Opp 3-point per game        5.688    6.323    6.956    8.138    4.350    1.268
3-point per game            5.782    6.433    7.413    9.471    5.825    1.631
Free throws per game       12.619   14.322   15.883   19.568   11.032    3.264
Turn-over, pg              13.116   14.000   14.875   17.656    6.682    1.759
Opponent Turn-over,pg      13.574   14.769   15.514   18.406    6.969    1.939
Home rebound per game      33.304   35.383   37.063   45.548   18.226    3.758
Oppnt rebound per game     32.611   33.754   35.047   39.938   10.968    2.436

2. Descriptive Statistics: Winning percentage

Variable                   N  N*     Mean  SE Mean    StDev  Minimum       Q1   Median
Winning percentage        68   0   0.5946   0.0197   0.1625   0.2333   0.4707   0.5938

Variable                       Q3  Maximum      IQR Variance    Range
Winning percentage         0.7403   0.9487   0.2696    0.026   0.7154

8.2 APPENDIX II: Box Plots for the variables

8.3 APPENDIX III: Scatter Plots (With Corresponding Regression Equations)

Regression Analysis: Winning percentage versus Opp 3-point per game
The regression equation is
Winning percentage = 0.729 - 0.0212 Opp 3-point per game
S = 0.162686   R-Sq = 1.3%   R-Sq(adj) = 0.0%

Regression Analysis: Winning percentage versus 3-point per game
The regression equation is
Winning percentage = 0.397 + 0.0304 3-point per game
S = 0.158646   R-Sq = 6.2%   R-Sq(adj) = 4.7%

Regression Analysis: Winning percentage versus Free throws per game
The regression equation is
Winning percentage = 0.058 + 0.0378 Free throws per game
S = 0.138185   R-Sq = 28.8%   R-Sq(adj) = 27.7%

Regression Analysis: Winning percentage versus Turn-over, pg
The regression equation is
Winning percentage = 1.14 - 0.0387 Turn-over, pg
S = 0.155019   R-Sq = 10.4%   R-Sq(adj) = 9.0%

Regression Analysis: Winning percentage versus Opponent Turn-over,pg
The regression equation is
Winning percentage = 0.293 + 0.0204 Opponent Turn-over,pg
S = 0.160503   R-Sq = 4.0%   R-Sq(adj) = 2.5%

Regression Analysis: Winning percentage versus Home rebound per game
The regression equation is
Winning percentage = 0.243 + 0.0237 Home rebound per game
S = 0.144773   R-Sq = 21.9%   R-Sq(adj) = 20.7%

Regression Analysis: Winning percentage versus Oppnt rebound per game
The regression equation is
Winning percentage = 1.44 - 0.0249 Oppnt rebound per game
S = 0.154803   R-Sq = 10.7%   R-Sq(adj) = 9.3%

8.4 APPENDIX IV: Multiple Regression Details

Regression Analysis: Winning perc versus 3-point per , Free throws , ...

The regression equation is
Winning percentage = 0.633 + 0.0224 3-point per game + 0.0176 Free throws per game
                     - 0.0622 Turn-over, pg + 0.0414 Opponent Turn-over,pg
                     + 0.0267 Home rebound per game - 0.0296 Oppnt rebound per game
                     - 0.0172 Opp 3-point per game

Predictor                     Coef   SE Coef       T       P
Constant                    0.6327    0.2123    2.98   0.004
3-point per game          0.022369  0.007221    3.10   0.003
Free throws per game      0.017604  0.005720    3.08   0.003
Turn-over, pg            -0.062214  0.007380   -8.43   0.000
Opponent Turn-over,pg     0.041398  0.006398    6.47   0.000
Home rebound per game     0.026699  0.004175    6.39   0.000
Oppnt rebound per game   -0.029645  0.004594   -6.45   0.000
Opp 3-point per game      -0.01724   0.01130   -1.53   0.132

S = 0.0747588   R-Sq = 81.1%   R-Sq(adj) = 78.8%

Analysis of Variance
Source            DF        SS        MS       F       P
Regression         7   1.43486   0.20498   36.68   0.000
Residual Error    60   0.33533   0.00559
Total             67   1.77019

Source                    DF    Seq SS
3-point per game           1   0.10906
Free throws per game       1   0.53614
Turn-over, pg              1   0.24618
Opponent Turn-over,pg      1   0.13117
Home rebound per game      1   0.13403
Oppnt rebound per game     1   0.26527
Opp 3-point per game       1   0.01302

Unusual Observations
        3-point     Winning
Obs    per game  percentage       Fit    SE Fit   Residual   St Resid
  2        4.59     0.79412   0.63575   0.02114    0.15837      2.21R
 27        6.60     0.76667   0.60456   0.01272    0.16211      2.20R
 30        6.21     0.50000   0.65441   0.01571   -0.15441     -2.11R
 45        4.75     0.25000   0.39253   0.02404   -0.14253     -2.01R

R denotes an observation with a large standardized residual.

8.5 APPENDIX V: Residual Plots for the 8 variables

8.6 APPENDIX VI: Best Subsets Regression

Best Subsets Regression: Winning perc versus Opp 3-point , 3-point per , ...

Response is Winning percentage

Candidate predictors: Opp 3-point per game; 3-point per game; Free throws per game; Turn-over, pg; Opponent Turn-over,pg; Home rebound per game; Oppnt rebound per game

Vars   R-Sq   R-Sq(adj)   Mallows Cp          S   Predictors included (one X per predictor)
   1   28.8        27.7        161.5    0.13818   X
   1   21.9        20.7        183.5    0.14477   X
   2   46.9        45.3        106.1    0.12021   X X
   2   41.2        39.4        124.4    0.12658   X X
   3   55.2        53.1         81.7    0.11126   X X X
   3   54.9        52.8         82.9    0.11172   X X X
   4   73.8        72.2         24.9   0.085772   X X X X
   4   65.1        62.9         52.4   0.098958   X X X X
   5   77.7        75.9         14.6   0.079790   X X X X X
   5   76.8        74.9         17.6   0.081431   X X X X X
   6   80.3        78.4          8.3   0.075569   X X X X X X
   6   78.1        75.9         15.5   0.079781   X X X X X X
   7   81.1        78.8          8.0   0.074759   X X X X X X X

8.7 APPENDIX VII: Regression Analysis with Five Variables

Regression Analysis

The regression equation is
Winning percentage = 0.528 + 0.0250 3-point per game - 0.0631 Turn-over, pg
                     + 0.0471 Opponent Turn-over,pg + 0.0349 Home rebound per game
                     - 0.0336 Oppnt rebound per game

Predictor                     Coef   SE Coef       T       P
Constant                    0.5280    0.2213    2.39   0.020
3-point per game          0.025031  0.007617    3.29   0.002
Turn-over, pg            -0.063103  0.007859   -8.03   0.000
Opponent Turn-over,pg     0.047061  0.006531    7.21   0.000
Home rebound per game     0.034908  0.003176   10.99   0.000
Oppnt rebound per game   -0.033572  0.004713   -7.12   0.000

S = 0.0797903   R-Sq = 77.7%   R-Sq(adj) = 75.9%

Analysis of Variance
Source            DF        SS        MS       F       P
Regression         5   1.37547   0.27509   43.21   0.000
Residual Error    62   0.39472   0.00637
Total             67   1.77019

Source                    DF    Seq SS
3-point per game           1   0.10906
Turn-over, pg              1   0.13137
Opponent Turn-over,pg      1   0.15696
Home rebound per game      1   0.65508
Oppnt rebound per game     1   0.32300

Unusual Observations
        3-point     Winning
Obs    per game  percentage       Fit    SE Fit   Residual   St Resid
  8        4.13     0.83333   0.66281   0.02375    0.17053      2.24R
 13        6.79     0.55172   0.72095   0.02073   -0.16923     -2.20R
 27        6.60     0.76667   0.60253   0.01331    0.16414      2.09R
 30        6.21     0.50000   0.66321   0.01474   -0.16321     -2.08R
 45        4.75     0.25000   0.41575   0.02187   -0.16575     -2.16R

R denotes an observation with a large standardized residual.

APPENDIX VII (Continued): Descriptive Statistics for five Variables

Descriptive Statistics

Variable                   N  N*     Mean  SE Mean    StDev Variance  Minimum
Winning percentage        68   0   0.5946   0.0197   0.1625   0.0264   0.2333
3-point per game          68   0    6.478    0.161    1.326    1.757    3.645
Turn-over, pg             68   0   14.086    0.164    1.355    1.835   10.974
Opponent Turn-over,pg     68   0   14.755    0.192    1.583    2.506   11.438
Home rebound per game     68   0   35.380    0.389    3.209   10.297   27.323
Oppnt rebound per game    68   0   33.841    0.258    2.128    4.528   28.970

Variable                       Q1   Median       Q3  Maximum    Range      IQR
Winning percentage         0.4707   0.5938   0.7403   0.9487   0.7154   0.2696
3-point per game            5.782    6.433    7.413    9.471    5.825    1.631
Turn-over, pg              13.116   14.000   14.875   17.656    6.682    1.759
Opponent Turn-over,pg      13.574   14.769   15.514   18.406    6.969    1.939
Home rebound per game      33.304   35.383   37.063   45.548   18.226    3.758
Oppnt rebound per game     32.611   33.754   35.047   39.938   10.968    2.436

8.8 APPENDIX VIII: Residual Plots for 5 variables

8.9 APPENDIX IX: Regression Excluding Residual Outliers

Regression Analysis

The regression equation is
Winning percentage = 0.487 + 0.0184 Free throws per game + 0.0240 Opponent Turn-over,pg
                     + 0.0188 Home rebound per game - 0.0303 Oppnt rebound per game
                     - 0.0243 Opp 3-point per game

Predictor                     Coef   SE Coef       T       P
Constant                    0.4873    0.2956    1.65   0.105
Free throws per game      0.018444  0.009412    1.96   0.055
Opponent Turn-over,pg     0.024021  0.009784    2.46   0.017
Home rebound per game     0.018835  0.006555    2.87   0.006
Oppnt rebound per game   -0.030258  0.007625   -3.97   0.000
Opp 3-point per game      -0.02428   0.02129   -1.14   0.259

S = 0.118905   R-Sq = 49.8%   R-Sq(adj) = 45.7%

Analysis of Variance
Source            DF        SS        MS       F       P
Regression         5   0.84309   0.16862   11.93   0.000
Residual Error    60   0.84831   0.01414
Total             65   1.69140

Source                    DF    Seq SS
Free throws per game       1   0.47458
Opponent Turn-over,pg      1   0.03295
Home rebound per game      1   0.04175
Oppnt rebound per game     1   0.27543
Opp 3-point per game       1   0.01839

Unusual Observations
    Free throws     Winning
Obs    per game  percentage       Fit    SE Fit   Residual   St Resid
 12        12.2      0.3333    0.5854    0.0270    -0.2521     -2.18R
 34        12.2      0.9487    0.6218    0.0297     0.3269      2.84R
 42        14.5      0.2333    0.5227    0.0400    -0.2893     -2.58R
 43        12.5      0.2500    0.4925    0.0367    -0.2425     -2.14R

R denotes an observation with a large standardized residual.

8.10 APPENDIX X: Regression with 6 Variables

Regression Analysis: Winning perc versus 3-point per , Free throws , ...

The regression equation is
Winning percentage = 0.565 + 0.0239 3-point per game + 0.0163 Free throws per game
                     - 0.0630 Turn-over, pg + 0.0436 Opponent Turn-over,pg
                     + 0.0265 Home rebound per game - 0.0310 Oppnt rebound per game

Predictor                     Coef   SE Coef       T       P
Constant                    0.5654    0.2100    2.69   0.009
3-point per game          0.023949  0.007224    3.32   0.002
Free throws per game      0.016290  0.005717    2.85   0.006
Turn-over, pg            -0.062984  0.007443   -8.46   0.000
Opponent Turn-over,pg     0.043571  0.006305    6.91   0.000
Home rebound per game     0.026482  0.004218    6.28   0.000
Oppnt rebound per game   -0.031028  0.004552   -6.82   0.000

S = 0.0755690   R-Sq = 80.3%   R-Sq(adj) = 78.4%

Analysis of Variance
Source            DF        SS        MS       F       P
Regression         6   1.42184   0.23697   41.50   0.000
Residual Error    61   0.34835   0.00571
Total             67   1.77019

Source                    DF    Seq SS
3-point per game           1   0.10906
Free throws per game       1   0.53614
Turn-over, pg              1   0.24618
Opponent Turn-over,pg      1   0.13117
Home rebound per game      1   0.13403
Oppnt rebound per game     1   0.26527

Unusual Observations
        3-point     Winning
Obs    per game  percentage       Fit    SE Fit   Residual   St Resid
 27        6.60     0.76667   0.60084   0.01262    0.16582      2.23R
 44        6.03     0.23333   0.38536   0.02559   -0.15202     -2.14R
 45        4.75     0.25000   0.41158   0.02076   -0.16158     -2.22R

R denotes an observation with a large standardized residual.

8.11 APPENDIX XI: Residual Plots for the 6-variable Model

8.12 APPENDIX XII (a): The Final Regression Model

Regression Analysis: Winning perc versus 3-point per , Free throws , ...

The regression equation is
Winning percentage = 0.604 + 0.0226 3-point per game + 0.0167 Free throws per game
                     - 0.0660 Turn-over, pg + 0.0420 Opponent Turn-over,pg
                     + 0.0256 Home rebound per game - 0.0292 Oppnt rebound per game

Predictor                     Coef   SE Coef       T       P
Constant                    0.6038    0.2065    2.92   0.005
3-point per game          0.022564  0.007108    3.17   0.002
Free throws per game      0.016706  0.005600    2.98   0.004
Turn-over, pg            -0.066016  0.007456   -8.85   0.000
Opponent Turn-over,pg     0.041969  0.006229    6.74   0.000
Home rebound per game     0.025649  0.004152    6.18   0.000
Oppnt rebound per game   -0.029173  0.004561   -6.40   0.000

S = 0.0739739   R-Sq = 80.8%   R-Sq(adj) = 78.8%

Analysis of Variance
Source            DF        SS        MS       F       P
Regression         6   1.37853   0.22976   41.99   0.000
Residual Error    60   0.32833   0.00547
Total             66   1.70686

Source                    DF    Seq SS
3-point per game           1   0.10202
Free throws per game       1   0.50620
Turn-over, pg              1   0.30758
Opponent Turn-over,pg      1   0.11512
Home rebound per game      1   0.12372
Oppnt rebound per game     1   0.22390

Unusual Observations
        3-point     Winning
Obs    per game  percentage       Fit    SE Fit   Residual   St Resid
 26        6.60     0.76667   0.60237   0.01238    0.16429      2.25R
 29        6.21     0.50000   0.64694   0.01477   -0.14694     -2.03R
 43        6.03     0.23333   0.38546   0.02505   -0.15213     -2.19R
 44        4.75     0.25000   0.41580   0.02045   -0.16580     -2.33R

R denotes an observation with a large standardized residual.
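The final regression equation in Appendix XII (a) can be applied directly to a team's per-game statistics. The sketch below is only an illustration: the coefficients are copied from the Predictor table of the final model, while the function name, the dictionary keys and the sample input are assumptions introduced here. The sample input uses the per-game means from the descriptive statistics, which were computed on all 68 teams, whereas the final model was fitted on 67 observations.

```python
# Coefficients copied from the final model in Appendix XII (a).
# The key names and the function name are illustrative, not from the report.
FINAL_MODEL = {
    "constant": 0.6038,
    "three_point_pg": 0.022564,    # 3-point per game
    "free_throws_pg": 0.016706,    # Free throws per game
    "turnovers_pg": -0.066016,     # Turn-over, pg
    "opp_turnovers_pg": 0.041969,  # Opponent Turn-over,pg
    "rebounds_pg": 0.025649,       # Home rebound per game
    "opp_rebounds_pg": -0.029173,  # Oppnt rebound per game
}


def predict_win_pct(stats):
    """Return the fitted winning percentage (as a proportion) for one team."""
    total = FINAL_MODEL["constant"]
    for name, coef in FINAL_MODEL.items():
        if name != "constant":
            total += coef * stats[name]
    return total


# Sample input: per-game means taken from the descriptive statistics.
average_team = {
    "three_point_pg": 6.478,
    "free_throws_pg": 14.203,
    "turnovers_pg": 14.086,
    "opp_turnovers_pg": 14.755,
    "rebounds_pg": 35.380,
    "opp_rebounds_pg": 33.841,
}

predicted = predict_win_pct(average_team)
print(f"Fitted winning percentage: {predicted:.3f}")
```

For this average team the fitted value comes out close to the mean winning percentage of 0.5946 reported in the descriptive statistics, which is what a least-squares fit would be expected to give near the centre of the data.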

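The summary statistics printed with the final model can be cross-checked from its Analysis of Variance table using the standard definitions of R-Sq, adjusted R-Sq, the F statistic and S. The following is a minimal sketch; the sums of squares and degrees of freedom are copied from Appendix XII (a), and the variable names are chosen here for illustration.

```python
import math

# Values copied from the Analysis of Variance table of the final model.
ss_regression, df_regression = 1.37853, 6
ss_error, df_error = 0.32833, 60
ss_total, df_total = 1.70686, 66

# Share of total variation explained by the regression.
r_sq = ss_regression / ss_total
# Adjusted R-Sq compares the error and total mean squares.
r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)
# F is the regression mean square over the error mean square.
f_stat = (ss_regression / df_regression) / (ss_error / df_error)
# S is the square root of the error mean square.
s = math.sqrt(ss_error / df_error)

print(f"R-Sq = {100 * r_sq:.1f}%   R-Sq(adj) = {100 * r_sq_adj:.1f}%")
print(f"F = {f_stat:.2f}   S = {s:.5f}")
```

The recomputed values agree with the printed R-Sq = 80.8%, R-Sq(adj) = 78.8%, F = 41.99 and S = 0.0739739 up to rounding of the tabulated sums of squares.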
APPENDIX XII (b): Residual Plots for the final regression model.
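The St Resid column in the Unusual Observations table of the final model appears consistent with the usual standardized residual, that is, the residual divided by its estimated standard deviation sqrt(S^2 - (SE Fit)^2). The sketch below checks this under that assumption; the function name is illustrative, and the threshold of 2 used for the R flag is inferred from the flagged rows rather than stated in the report.

```python
import math

S = 0.0739739  # residual standard error S of the final model


def standardized_residual(residual, se_fit, s=S):
    """Residual divided by its estimated standard deviation sqrt(s^2 - se_fit^2)."""
    return residual / math.sqrt(s ** 2 - se_fit ** 2)


# (Obs, SE Fit, Residual) rows copied from the Unusual Observations table
# of the final model in Appendix XII (a).
rows = [
    (26, 0.01238, 0.16429),
    (29, 0.01477, -0.14694),
    (43, 0.02505, -0.15213),
    (44, 0.02045, -0.16580),
]

results = {obs: standardized_residual(res, se) for obs, se, res in rows}
for obs, value in results.items():
    flag = "R" if abs(value) > 2 else ""  # assumed cut-off for a "large" residual
    print(f"{obs:>3} {value:6.2f}{flag}")
```

Under this assumption the four recomputed values match the tabulated 2.25, -2.03, -2.19 and -2.33 to two decimals.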

APPENDIX XII (b): Continued

REFERENCES

Please state the source of data here.