In the Workroom: A Naive Bayes Classifier
Last month Data for Progress launched a prediction competition to determine who's got what it takes to predict America's next Drag Superstar. My team name: "Bayes the House Down." My approach: a Naive Bayes Classifier. Here, I describe a little bit about how my classifier works, and how the algorithm has performed so far.
The team over at Data for Progress was generous enough to provide some excellent datasets along with an example algorithm that includes code for scraping the data into R. Datasets include demographic data for every Queen who has ever competed on the show, social media statistics, and every Queen's performance on every single episode from the 10 previous regular seasons.
I decided to use a Naive Bayes Classifier (NBC), implemented in R with the package e1071. I chose the NBC because it is super easy to implement, even easier to understand, and it runs incredibly fast. I don't have time to go into the math right now, but if you'd like to learn more about what the algorithm is doing, this blog post offers a great explanation.
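For a feel of what the algorithm is doing, here is a minimal from-scratch sketch of a categorical Naive Bayes classifier in Python. It is not the e1071 code used for the competition: the function names, the add-one smoothing, and the toy training rows are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def train_nb(rows, labels, alpha=1.0):
    """Count class and per-feature value frequencies for categorical Naive Bayes."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, label) -> value counts
    seen_values = defaultdict(set)        # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            seen_values[i].add(value)
    return class_counts, value_counts, seen_values, alpha


def predict_proba(model, row):
    """Return P(label | row), assuming features are independent given the label."""
    class_counts, value_counts, seen_values, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # prior P(label)
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # Add-alpha smoothing keeps unseen combinations from zeroing the product.
            score *= (count + alpha) / (n_label + alpha * len(seen_values[i]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: score / norm for label, score in scores.items()}


# Made-up training rows: (home state, past wins bucket) -> outcome.
rows = [("NY", "high"), ("NY", "high"), ("CA", "low"),
        ("CA", "low"), ("TX", "mid"), ("TX", "mid")]
labels = ["WIN", "WIN", "LOSS", "LOSS", "SAFE", "SAFE"]

model = train_nb(rows, labels)
probs = predict_proba(model, ("NY", "high"))
print(max(probs, key=probs.get))
```

The "naive" part is the multiplication inside the loop: each feature contributes its own conditional probability, as if the features had nothing to do with each other once the outcome is known.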
For predicting the weekly winner and loser of RPDR, I take into consideration the following features:
- Age
- Home State
- Past Wins
- Past Losses
- Past Performance
To quantify past performance, I gave each queen 1 point if they performed high or won the maxi challenge; -1 point if they performed low, had to lip sync, or were sent home; and 0 points if they were safe. These scores were then averaged. To transform Age and Past Performance from continuous variables into discrete categories, I normalized the values, calculated a percentile, and rounded the percentiles to the nearest tenth to create ten discrete groups.
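The scoring and binning steps can be sketched roughly as follows. This is one possible reading, not the original R code: it assumes "normalized" means a z-score and that the percentile comes from the standard normal CDF. The outcome labels, function names, and sample ages are illustrative.

```python
from statistics import NormalDist, mean, stdev

# Points per episode outcome, following the scoring rule described above.
# The outcome labels themselves are assumed for this sketch.
POINTS = {"WIN": 1, "HIGH": 1, "SAFE": 0, "LOW": -1, "BTM2": -1, "ELIMINATED": -1}


def past_performance(outcomes):
    """Average the per-episode points for one queen."""
    if not outcomes:
        return 0.0  # assumption: no history counts as neutral
    return sum(POINTS[o] for o in outcomes) / len(outcomes)


def to_tenths(values):
    """Z-score each value, convert to a normal percentile, round to one decimal."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return [0.5 for _ in values]  # identical values all land in the middle bin
    std_normal = NormalDist()
    return [round(std_normal.cdf((v - mu) / sigma), 1) for v in values]


ages = [21, 25, 30, 35, 40]  # made-up ages
print(past_performance(["HIGH", "WIN", "SAFE", "SAFE"]))
print(to_tenths(ages))
```

Note that rounding to the nearest tenth can in principle produce eleven labels (0.0 through 1.0), although the two extremes only show up for values far out in the tails.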
Because challenges tend to vary depending on how far the season has progressed (e.g. Snatch Game always falls towards the middle of the season), I decided to train the algorithm only on data from the beginning, middle, or end of a season, as appropriate. Currently, we are still at the beginning of the season, so I'm only training on the first few episodes of each season.
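As a rough illustration of that filtering, the sketch below splits each season into equal thirds and keeps only the training rows from the same phase as the episode being predicted. The equal-thirds cutoff, the field names, and the sample rows are assumptions; the exact boundaries used for the competition are not specified here.

```python
def season_phase(episode, season_length):
    """Label an episode as beginning, middle, or end using equal thirds (an assumption)."""
    if episode * 3 <= season_length:
        return "beginning"
    if episode * 3 <= 2 * season_length:
        return "middle"
    return "end"


def training_subset(history, current_episode, current_length):
    """Keep only past rows that fall in the same phase as the current episode."""
    target = season_phase(current_episode, current_length)
    return [row for row in history
            if season_phase(row["episode"], row["season_length"]) == target]


# Made-up history rows; season lengths are placeholders, not real episode counts.
history = [
    {"season": 9, "episode": 2, "season_length": 12, "outcome": "SAFE"},
    {"season": 9, "episode": 6, "season_length": 12, "outcome": "WIN"},
    {"season": 10, "episode": 3, "season_length": 12, "outcome": "LOW"},
    {"season": 10, "episode": 11, "season_length": 12, "outcome": "SAFE"},
]

subset = training_subset(history, current_episode=4, current_length=12)
print([(row["season"], row["episode"]) for row in subset])
```

Integer comparisons are used for the cutoffs so that an episode sitting exactly on a boundary is classified the same way every time.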
On the Main Stage: Some Initial Success
Below, I've listed the algorithm's predictions for each Queen for each episode so far. When I run the NBC, I get three probabilities: P(Win), P(Safe), and P(Loss) for each Queen. For my prediction, I choose the Queen with the highest P(Win) as the predicted winner for the week, and the highest P(Loss) as the predicted loser. There are advantages and disadvantages to this, and I plan to write up a blog post later with a more in-depth look at the model's performance, strengths, and weaknesses. Quick humble brag: The algorithm successfully predicted that Brooke Lynn Hytes would win the first episode! Haven't had much luck since, but we'll see...
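The selection rule described above amounts to two separate "pick the maximum" steps across the cast. Here is a small sketch with placeholder queens and made-up probabilities; the dictionary layout and function name are assumptions for illustration.

```python
def pick_winner_and_loser(predictions):
    """Pick the queen with the highest P(Win) and the queen with the highest P(Loss)."""
    winner = max(predictions, key=lambda queen: predictions[queen]["win"])
    loser = max(predictions, key=lambda queen: predictions[queen]["loss"])
    return winner, loser


# Placeholder names and made-up probabilities, one row per queen.
predictions = {
    "Queen A": {"win": 0.41, "safe": 0.43, "loss": 0.16},
    "Queen B": {"win": 0.25, "safe": 0.32, "loss": 0.43},
    "Queen C": {"win": 0.29, "safe": 0.54, "loss": 0.17},
}

winner, loser = pick_winner_and_loser(predictions)
print(winner, loser)
```

One consequence of comparing across queens rather than within a queen: Queen A is picked as the winner even though her own most likely outcome is "safe".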
This Week's Predictions: Episode 6
Predicted to Win: Yvie Oddly Actual Winner:
Predicted to Lose: Ra'jah D. O'Hara Sent Home:
| Queen | P(Win) | P(Safe) | P(Loss) | Result |
|---|---|---|---|---|
| A'keria Chanel Davenport | 0.208 | 0.458 | 0.334 | |
| Ra'jah D. O'Hara | 0.245 | 0.326 | 0.429 | SAFE |
| Silky Nutmeg Ganache | 0.293 | 0.536 | 0.171 | |
| Vanessa Vanjie Mateo | 0.402 | 0.390 | 0.209 | |
| Brooke Lynn Hytes | 0.410 | 0.434 | 0.156 | |
This Week's Predictions: Episode 5: Monster Ball
Predicted to Win: Yvie Oddly Actual Winner: Brooke Lynn Hytes
Predicted to Lose: Shuga Cain Sent Home: Ariel Versace
This week's predictions are stunning, darling. Yvie Oddly has been performing consistently well all season and has become a fan favorite. Would love to see her snatch the crown this week. Shuga Cain was previously one of the classifier's top picks, but this week the predictions have her neck and neck with Ra'jah D. O'Hara for who will be going home. Personally, I would pick Ra'jah, who has had two lip syncs in a row, to go home over Shuga, but I've got to let the algorithm speak for itself!
| Queen | P(Win) | P(Safe) | P(Loss) | Result |
|---|---|---|---|---|
| Ra'jah D. O'Hara | 0.133 | 0.562 | 0.341 | SAFE |
| A'keria Chanel Davenport | 0.168 | 0.518 | 0.314 | SAFE |
| Silky Nutmeg Ganache | 0.217 | 0.606 | 0.177 | LOW |
| Vanessa Vanjie Mateo | 0.383 | 0.518 | 0.099 | SAFE |
| Brooke Lynn Hytes | 0.408 | 0.535 | 0.0567 | WIN |
Episode 4: Trump: The Rusical
Predicted to Win: Miss Vaaaaaaanjie (Vanessa Vanjie Mateo) Actual Winner: Silky Nutmeg Ganache
Predicted to Lose: Nina West Sent Home: Mercedes Iman Diamond
Again, Nina West is predicted to lose, which I think is unlikely. Unfortunately, her win last week wasn't enough to make the algorithm nicer to her. However, her P(Loss) did decrease by about 10 percentage points. I think the prediction of a win for Vanjie is a good one, and I'd like to see her win a challenge!
| Queen | P(Win) | P(Safe) | P(Loss) | Result |
|---|---|---|---|---|
| Ra'jah D. O'Hara | 0.116 | 0.605 | 0.279 | BTM2 |
| A'keria Chanel Davenport | 0.231 | 0.390 | 0.379 | SAFE |
| Silky Nutmeg Ganache | 0.255 | 0.466 | 0.279 | WIN |
| Mercedes Iman Diamond | 0.298 | 0.334 | 0.368 | ELIMINATED |
| Brooke Lynn Hytes | 0.315 | 0.398 | 0.287 | HIGH |
| Vanessa Vanjie Mateo | 0.424 | 0.296 | 0.280 | LOW |
Episode 3: Diva Worship
Predicted to Win: Shuga Cain Actual Winner: Nina West
Predicted to Lose: Nina West Sent Home: Honey Davenport
This week's team challenge produced an unprecedented 6-way Lip Sync! The algorithm struggled again this week. I think it's stuck in a rut and overweighting age. Maybe now that Nina West has been successful, age will be less of a factor.
| Queen | P(Win) | P(Safe) | P(Loss) | Result |
|---|---|---|---|---|
| Silky Nutmeg Ganache | 0.189 | 0.632 | 0.179 | SAFE |
| Ra'jah D. O'Hara | 0.227 | 0.606 | 0.167 | BTM6 |
| A'keria Chanel Davenport | 0.281 | 0.536 | 0.183 | BTM6 |
| Mercedes Iman Diamond | 0.311 | 0.489 | 0.201 | SAFE |
| Brooke Lynn Hytes | 0.370 | 0.575 | 0.0546 | SAFE |
| Vanessa Vanjie Mateo | 0.435 | 0.458 | 0.107 | HIGH |
Episode 2: Good God Girl, Get Out
Predicted to Win: Shuga Cain Actual Winner: Scarlet Envy & Yvie Oddly
Predicted to Lose: Nina West Sent Home: Kahanna Montrese
| Queen | P(Win) | P(Safe) | P(Loss) | Result |
|---|---|---|---|---|
| Silky Nutmeg Ganache | 0.206 | 0.602 | 0.192 | SAFE |
| Ra'jah D. O'Hara | 0.267 | 0.625 | 0.109 | SAFE |
| A'keria Chanel Davenport | 0.268 | 0.583 | 0.149 | SAFE |
| Vanessa Vanjie Mateo | 0.367 | 0.509 | 0.124 | SAFE |
| Brooke Lynn Hytes | 0.395 | 0.549 | 0.0561 | LOW |
| Mercedes Iman Diamond | 0.4 | 0.459 | 0.141 | BTM2 |