How do you group all the golf courses in the United States into meaningful categories that represent similar design and play style?
In my recent work on the CourseIQ application (mentioned previously), I needed to solve what seemed on the surface like a simple problem: how do I show a golf course architect related courses when they are researching a club for renovation?
Course design is both an art and a science. Each hole must be carefully crafted to appeal to the intended players in terms of difficulty and duration of play. This requires understanding which factors make a course popular and how it relates to other courses (its competition!).
I already had a database with thousands of data points about each course, and I knew I needed to somehow cluster courses based on the similarity of these attributes.
In this case, Azure Machine Learning solved the problem by applying a k-means clustering algorithm to my data. K-means clustering partitions data points (here, courses) into k clusters, assigning each point to the cluster whose mean, or centroid, is nearest in attribute space. Finding the optimal partition exactly is extremely computationally expensive, but Azure Machine Learning uses an iterative heuristic algorithm that runs in the cloud and finds a good clustering very quickly.
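Azure ML's internal implementation isn't shown here, but the core idea can be illustrated with a minimal pure-Python sketch of Lloyd's algorithm, the classic k-means heuristic: repeatedly assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. The function names, the random initialization from input points, and the toy data are assumptions for illustration only, not CourseIQ code or an Azure ML API.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two attribute vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iterations: int = 100,
           seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Cluster points with Lloyd's algorithm; return (centroids, labels)."""
    rng = random.Random(seed)
    # Initialization: use k randomly chosen input points as starting centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(max_iterations):
        # Assignment step: put each point in the cell of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


# Toy data (hypothetical): two clearly separated groups of courses,
# each described by two already-scaled attributes.
courses = [[0.1, 0.2], [0.2, 0.1], [5.0, 5.1], [5.1, 4.9]]
centroids, labels = kmeans(courses, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary; what matters is that the first two courses end up sharing one label and the last two share the other. Because each pass only finds a local optimum, practical implementations typically use smarter initialization or several restarts.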
I first identified the attributes I wanted to cluster on and fed them into Azure ML from my Azure SQL data source. I then initialized a model with 100 randomly placed cluster centroids (each defining a Voronoi cell) and asked Azure to perform 50,000 iterations over my data to fit it to these clusters. Once the model was trained, I exported the cluster assignments to my SQL instance, where they could be referenced in queries.
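Once each course carries a cluster ID in the database, finding related courses reduces to a self-join on that ID. The sketch below uses Python's built-in sqlite3 module as a stand-in for Azure SQL; the `course_cluster` table, its columns, the `related_courses` helper, and the course names are all hypothetical.

```python
import sqlite3

# Hypothetical cluster assignments exported from the trained model:
# (course_id, cluster_id). Course names are made up for illustration.
assignments = [
    ("Pine Ridge", 0), ("Oak Hollow", 0), ("Dune Links", 1),
    ("Sea Cliffs", 1), ("Cedar Park", 0),
]

# An in-memory SQLite database stands in for the Azure SQL instance.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE course_cluster ("
    "course_id TEXT PRIMARY KEY, cluster_id INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO course_cluster VALUES (?, ?)", assignments)
conn.commit()


def related_courses(conn: sqlite3.Connection, course_id: str) -> list:
    """Return the other courses that share a cluster with course_id."""
    rows = conn.execute(
        """
        SELECT other.course_id
        FROM course_cluster AS me
        JOIN course_cluster AS other
          ON other.cluster_id = me.cluster_id
        WHERE me.course_id = ? AND other.course_id <> me.course_id
        ORDER BY other.course_id
        """,
        (course_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(related_courses(conn, "Pine Ridge"))  # ['Cedar Park', 'Oak Hollow']
```

Storing only the cluster ID keeps the lookup a cheap indexed query; ranking courses within a cluster (for example by distance to the selected course) would require storing the attribute vectors or distances as well.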
In addition to data points about course terrain, hole length and player scorecards, I included weather and geographical data. Rather than simply correlating a handful of data points, as traditional approaches do, this technique draws on hundreds of relevant attributes at once. The results are fantastic: the related courses it surfaces are genuinely similar from a player's perspective.
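Because k-means measures similarity with Euclidean distance, mixing attributes on very different scales (yardage in the thousands, rainfall in millimetres, elevation in metres) lets the largest-valued attributes dominate unless the data is normalized first. The post doesn't say how CourseIQ handled this; z-score standardization, sketched below with a hypothetical `standardize` helper and made-up attribute values, is one common option.

```python
from statistics import mean, pstdev
from typing import List


def standardize(rows: List[List[float]]) -> List[List[float]]:
    """Z-score each column so every attribute has mean 0 and unit variance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Population standard deviation; a constant column gets 1.0 so we never
    # divide by zero (its values all become 0.0).
    stdevs = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stdevs)]
        for row in rows
    ]


# Hypothetical attributes per course: total yardage, annual rainfall (mm),
# elevation (m).
raw = [
    [6200.0, 800.0, 10.0],
    [7400.0, 300.0, 1500.0],
    [6800.0, 550.0, 755.0],
]
scaled = standardize(raw)
print(scaled)
```

After scaling, each attribute contributes on a comparable footing to the distance calculation. When new courses are added later, they should be scaled with the same means and standard deviations used for the training data.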