Predicting National Basketball Association Game Attendance Using Random Forests
Barry E. King

This paper presents research on predicting National Basketball Association game attendance with a random forest approach, using attendance and other data for the 2009 through 2013 basketball seasons. Predictor variables include home team popularity, popularity of the opponent, match type (regular season or playoff), day of the week on which the match occurs, home team winning percentage, the home city’s total personal income, capacity of the home venue, conference of the home team, lagged variables on attendance and on winning percentage, and others. A random forest approach, implemented in the R statistical modeling language, was selected so that numerous predictor variables could be used without first having to deselect variables and without over-fitting the data. The random forest prediction compares favorably with that of a multiple linear regression. Additional results indicate that some variables suggested by sports writers contribute little to the prediction and that a better measure of a team’s popularity is needed.

DOI: 10.15640/jcsit.v5n1a1