Prismata's performance rating formula

Hi guys,

Just a quick update regarding another change that will be going in soon.

Since deploying the first version of Arena mode, the “temporary arena results screen” has had a line in it called “Performance” that’s supposed to provide an indication of how well you played during your arena run (a “performance rating”). It currently looks like this:

The temporary arena results screen. Performance rating has been highlighted.

We’ve never explained how performance ratings are calculated, and there are actually many different kinds of formulas that can be used. We currently use the so-called “algorithm of 400” that’s commonly used in chess tourneys. After adjusting for the different weights of the games (so that blitz games are worth a third as much as long games), the formula we use looks like this:

Prismata’s performance rating calculation code. Here “gameRecord” is a list of games, each having 3 pieces of information: whether the game was a win or a loss, how many medals it was worth, and what the opponent’s rating was.
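For readers who can't see the code above, here is a minimal sketch of a medal-weighted algorithm of 400, written in Python for illustration. It is not the actual Prismata code: the `Game` class, its field names, and the function name are assumptions, and the 3-to-1 medal weights in the example simply mirror the "blitz games are worth a third as much" rule.

```python
from dataclasses import dataclass


@dataclass
class Game:
    won: bool               # True for a win, False for a loss
    medals: float           # weight of the game (assumed: 3 for long, 1 for blitz)
    opponent_rating: float  # opponent's rating as a plain number


def performance_rating_400(game_record):
    """Weighted algorithm of 400: a win counts as the opponent's rating
    plus 400, a loss as the opponent's rating minus 400, and the results
    are averaged using the medal values as weights."""
    total_weight = sum(g.medals for g in game_record)
    if total_weight == 0:
        raise ValueError("game_record contains no weighted games")
    total = sum(
        g.medals * (g.opponent_rating + (400 if g.won else -400))
        for g in game_record
    )
    return total / total_weight


# A long-game win over a 1600 player and a blitz loss to a 1500 player.
example = [Game(True, 3, 1600), Game(False, 1, 1500)]
print(performance_rating_400(example))  # 1775.0
```

With equal weights this reduces to the familiar chess version: the average opponent rating plus 400 times (wins minus losses) divided by the number of games.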

Note that all underlying ratings, including the performance rating, are simply numbers (the same as Master Tier ratings). These are later translated to Tier/% values for players that haven’t hit Tier X yet.

Problems with the current method

One phenomenon that exists with the algorithm of 400, as well as a number of performance rating systems, is the following: it’s possible to play an arena run in which you end with a higher rating than you started at, but your performance rating is lower than your starting rating.

How can this happen? Well, to give an idea, consider the following example:

You start your run at a rating of 1500.
First, you beat 3 players rated 1500.
Then, you lose against 3 players rated 1500, ending your run.

Assuming all games are equally weighted, you’d receive a performance rating of 1500. However, your final rating would be LESS than 1500, because you’d lose more points for your losses than you would gain for your wins. Your rating climbs above 1500 during the winning streak, so by the time you start losing, you’re losing to players rated below you, and those losses cost more than the earlier wins earned.
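The example can be checked numerically. The sketch below assumes a standard Elo update with a K-factor of 32, which is only a stand-in and may differ from the real Prismata rating update code; the function names are made up for illustration.

```python
def expected_score(rating, opponent_rating):
    """Standard Elo expected score (assumed model, not Prismata's own)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def play_run(start_rating, results, k=32.0):
    """Apply an Elo update after every game, in order.
    results is a list of (won, opponent_rating) pairs; k=32 is an assumption."""
    rating = start_rating
    for won, opponent_rating in results:
        score = 1.0 if won else 0.0
        rating += k * (score - expected_score(rating, opponent_rating))
    return rating


# Three wins, then three losses, all against players rated 1500.
run = [(True, 1500)] * 3 + [(False, 1500)] * 3

final_rating = play_run(1500, run)
performance = sum(opp + (400 if won else -400) for won, opp in run) / len(run)

print(round(final_rating, 1))  # ends a few points under 1500
print(performance)             # 1500.0
```

Under these assumptions the run finishes slightly below 1500 while the algorithm of 400 reports exactly 1500, so the two numbers point in different directions.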

Another issue is that if you’re playing players whose ratings differ from yours by a wide margin, the algorithm of 400 can disagree with the true rating update code. This can lead to inaccurate results during arena runs where many of your opponents are 100 or more rating points away from you in skill.

Alternative performance rating functions

There are a few other formulas we’re considering for performance ratings. They take a bit more work to calculate than the algorithm of 400, but give more accurate results. Here are a few:

Maximum likelihood rating: Given your streak of wins and losses, what is the rating of a player that has the highest probability of having that same streak of wins and losses?

Netzero rating: Calculate the rating of a player that, upon undergoing the streak of wins and losses that you underwent, in the same order, ends up at the same rating that he or she started at (a “net change of zero”).

Unordered netzero rating: Calculate the rating X such that if a player of rating X played the same opponents you did and had the same results, but their rating did not vary from X between different games, the sum of the points gained or lost after each game would be zero. (This is similar to netzero, except the order of games doesn’t matter).
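To make the first definition concrete, here is a rough sketch of a maximum likelihood rating. It assumes win probabilities follow the standard Elo logistic curve, which may not match Prismata's actual model, and it ignores medal weights. Under that assumption the log-likelihood has a single peak, so a ternary search over a rating range finds it. All names and the 0 to 4000 search range are illustrative.

```python
import math


def expected_score(rating, opponent_rating):
    """Assumed Elo win probability for `rating` against `opponent_rating`."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def log_likelihood(rating, games):
    """Log-probability that a player of this rating produces these results.
    games is a list of (won, opponent_rating) pairs."""
    total = 0.0
    for won, opp in games:
        # The probability of a loss equals the opponent's win probability.
        p = expected_score(rating, opp) if won else expected_score(opp, rating)
        total += math.log(p)
    return total


def max_likelihood_rating(games, lo=0.0, hi=4000.0, tol=0.01):
    """Ternary search for the peak of the log-likelihood.
    If every game is a win (or a loss) there is no finite peak and the
    search simply runs into the hi (or lo) bound."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_likelihood(m1, games) < log_likelihood(m2, games):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


run = [(True, 1500)] * 3 + [(False, 1500)] * 3
print(round(max_likelihood_rating(run)))  # 1500
```

Because the likelihood is a product over games, reordering the games does not change the result.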

Which performance rating function should we use?

All of the above functions are fairly easy to compute (it’s even possible to do so via binary search), so the question is mostly one of taste—which seems most reasonable?
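Here is a rough sketch of the binary search idea for the two netzero variants. It again assumes a standard Elo update with K = 32 in place of the real rating update code, and ignores medal weights. The search relies on the net change shrinking as the candidate rating grows, which holds for this Elo model. The function names and the 0 to 4000 range are made up for illustration.

```python
def expected_score(rating, opponent_rating):
    """Assumed Elo expected score."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def net_change_ordered(start, games, k=32.0):
    """Netzero: replay the games in order, updating the rating each time."""
    rating = start
    for won, opp in games:
        rating += k * ((1.0 if won else 0.0) - expected_score(rating, opp))
    return rating - start


def net_change_unordered(x, games, k=32.0):
    """Unordered netzero: the rating stays fixed at x for every game."""
    return sum(
        k * ((1.0 if won else 0.0) - expected_score(x, opp))
        for won, opp in games
    )


def solve_net_zero(net_change, games, lo=0.0, hi=4000.0, tol=0.01):
    """Binary search for the rating at which net_change is zero.
    A positive net change means the candidate rating is too low.
    With only wins (or only losses) there is no zero and the search
    ends at the hi (or lo) bound."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_change(mid, games) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


run = [(True, 1500)] * 3 + [(False, 1500)] * 3
netzero = solve_net_zero(net_change_ordered, run)
unordered = solve_net_zero(net_change_unordered, run)
print(round(unordered))  # 1500
print(netzero < 1500)    # True
```

For the three-wins-then-three-losses run, the ordered netzero rating lands a little under 1500, matching the fact that a 1500 player would lose points on that run, while the unordered version sits at 1500.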

One benefit of the “netzero” rating is that your performance rating will always be higher than your initial rating if your rating increased, and lower than your initial rating if your rating decreased. But unordered netzero and maximum likelihood have the benefit of not depending on the order in which you played the games.

I’m interested to hear your comments. Let me know on Reddit!

PS…

Yes, “Magic Find” is going to be changed. Probably to “Rarity Bonus” or something of that nature. 😉
