Longshanks stats - for entertainment purposes only

  


Let's be skeptical of stats.

I studied causal inference at the postgraduate level, which is a fancy way of saying we looked at how you can use data to draw conclusions. On the first day of the course, my lecturer said, “By the end of this course, you will believe nothing. You will never trust data again.” He was exaggerating, of course, but he did inject us with a healthy dose of skepticism when it comes to stats.

In this post I’ll talk about what we are trying to analyze, cover two ways that rigorous statistical analysis could be done, look at how Longshanks does neither, and finish with what I believe is the best way to analyze model power in Malifaux given the tools we have.

What makes an overpowered crew?

If we’re using stats to analyze model power, we probably want to think about what an overpowered crew is.

My view is that a crew is overpowered and needs a nerf if it is still winning when people are doing their best to beat it.  That is, if people analyze the crew, tech against it, learn the skills they need to beat it, and are still losing to it frequently, then that is a problem.  For instance, when I was a new player losing to M3E Kirai, it wasn’t that she was overpowered. I needed to skill up against summoning crews in order to beat her. But if players at the top tables of tournaments are consistently losing to a crew, they are probably doing the work and still losing.

On the other hand, some folks just prefer to set things to the level of the average player.  If the average player cannot beat a thing, they would rather nerf it than skill up.  After all, people lead busy lives and don’t have infinite time to invest in skilling up or researching crews.

Everything below can apply to either interpretation, but when analyzing, it is good to remember which standard you are using to decide whether a crew is overpowered. Now on to methods of statistical analysis.

The gold standard: the experimental method.

Conducting an experiment is often considered the best possible method of analysis.

Volunteers could be randomly assigned masters to play in random matchups, and then we could observe their winrates with those masters. Random assignment controls for a LOT of things, such as player skill and how often each player plays a given master.

It doesn’t account for everything.  For instance, if a crew is especially hard to learn, this would drag down its winrate even with random assignment.

Nevertheless, with a large enough sample size and a long enough timeline, this would likely give robust results that we could consider useful.  Logistically, though, an experiment like this is very unlikely to happen.
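Even though the real experiment is out of reach, a toy simulation can show why random assignment matters. The sketch below is purely illustrative: the two masters "A" and "B", the skill scale, and the win-chance formula are all made-up assumptions, and both masters are deliberately given identical power.

```python
import random

def simulate(self_select, n_players=2000, games_each=20, seed=0):
    """Return the observed winrate per master in a toy league.

    Both hypothetical masters are equally strong by construction:
    a player's chance to win depends only on their own skill.
    """
    rng = random.Random(seed)
    wins = {"A": 0, "B": 0}
    games = {"A": 0, "B": 0}
    for _ in range(n_players):
        skill = rng.random()  # made-up skill score between 0 and 1
        if self_select:
            # Assumed bias: stronger players gravitate to master A.
            master = "A" if skill > 0.5 else "B"
        else:
            # Random assignment: skill has no say in who plays what.
            master = rng.choice(["A", "B"])
        p_win = 0.25 + 0.5 * skill  # assumed win chance, 25% to 75%
        for _ in range(games_each):
            games[master] += 1
            if rng.random() < p_win:
                wins[master] += 1
    return {m: wins[m] / games[m] for m in wins}

self_selected = simulate(self_select=True)
randomized = simulate(self_select=False)
print("self-selected:", self_selected)
print("randomized:   ", randomized)
```

With self-selection, master A should look far stronger than B even though the two are identical. With random assignment, the gap should shrink to noise.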

Actually doable: regressions

Something that could be done, but has not been done, is regression analysis.

Regressions are a fancy tool that lets you tell a computer to account for all sorts of factors when analyzing data. For instance, you might tell a computer to account for:

  • Player skill
  • Opponent’s skill
  • Opponent’s crew
  • Deployment type
  • Strategy

If you put all this in and used regressions to account for it, the output could be considered robust and useful.  However, the result it would spit out would likely be ‘insufficient sample size.’ That level of detail would likely require significantly larger sample sizes.  Over the course of the edition the approach could become more powerful, especially as baselines for some of the crews get established.

If someone wanted to take a serious crack at building a rigorous analysis of Malifaux stats, they would probably use regressions.
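To make "accounting for factors" concrete, here is a minimal sketch of a logistic regression fitted with plain gradient descent on made-up data. Everything in it is an assumption for illustration: "master X" is hypothetical, the skill gap is a number I invented, and real data would need many more factors (opponent's crew, deployment, strategy) plus a proper stats library. In this fake data, master X has no real effect, but better players tend to bring it.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, outcomes, steps=300, lr=2.0):
    """Fit logistic regression weights by plain gradient descent."""
    weights = [0.0] * len(rows[0])
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, outcomes):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad[j] += err * xi
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

# Made-up data: hypothetical master X has NO real effect on winning,
# but players with a skill edge are more likely to bring it.
rng = random.Random(1)
rows, outcomes = [], []
for _ in range(2000):
    skill_gap = rng.uniform(-1, 1)  # player skill minus opponent skill
    plays_x = 1.0 if rng.random() < 0.5 + 0.4 * skill_gap else 0.0
    won = 1.0 if rng.random() < sigmoid(2.0 * skill_gap) else 0.0
    rows.append([1.0, skill_gap, plays_x])  # intercept, skill gap, crew
    outcomes.append(won)

def winrate(flag):
    """Raw winrate for games where plays_x equals flag."""
    results = [y for x, y in zip(rows, outcomes) if x[2] == flag]
    return sum(results) / len(results)

raw_x, raw_other = winrate(1.0), winrate(0.0)
intercept, skill_weight, x_weight = fit_logistic(rows, outcomes)
print(f"raw winrate with X: {raw_x:.2f}, without X: {raw_other:.2f}")
print(f"skill weight: {skill_weight:.2f}, master X weight: {x_weight:.2f}")
```

The raw winrates should make master X look strong, while the fitted weight for master X should sit near zero once the skill gap is accounted for. That is the whole point of the tool, and also why it is hungry for data: every extra factor you add needs more games to pin down.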

Longshanks stats 


It is right there on the Longshanks page - their stats are for entertainment purposes only.  Why? Because they don’t do the robust analysis of something like regressions or randomized experiments.

Many things can bias Longshanks data.  It doesn’t control for how often a given player plays a particular crew.  So, for instance, if there is a player who is THE Marcus player, they put in a lot more reps than anyone else, and the winrate for that master is skewed toward that one player’s skill.
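A little arithmetic shows how much one dedicated player can move the number. The records below are invented for illustration and are not taken from Longshanks.

```python
# Hypothetical records for one master: (player, games, wins).
records = [
    ("specialist", 40, 30),  # the one dedicated player
    ("player_b", 5, 2),
    ("player_c", 5, 2),
    ("player_d", 5, 2),
    ("player_e", 5, 2),
]

total_games = sum(games for _, games, _ in records)
total_wins = sum(wins for _, _, wins in records)
pooled = total_wins / total_games  # every game counts equally

# Average of each player's own winrate: every player counts equally.
per_player = sum(wins / games for _, games, wins in records) / len(records)

print(f"pooled winrate:     {pooled:.1%}")
print(f"per-player average: {per_player:.1%}")
```

The pooled figure comes out at about 63%, while the per-player average is 47%. Same master, same games, and the headline number swings from "strong" to "below average" depending on whether the specialist's 40 games are allowed to dominate.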

Malifaux players also tend to self-police.  Players will often adjust what they play to match their local meta, since nobody likes absolute stomp-fests. So, for example, when Damian was overpowered, many top players stopped playing him, which likely dragged his recorded winrate below what it would otherwise have been.

Crew popularity also varies from meta to meta.  So if the Silken King had particular weaknesses but was most popular in a meta where no one owned the crews that beat Silken King, that could inflate the winrate.

In statistical analysis we are taught that unless you work very hard, these issues creep into the data. And with Malifaux players we see so many of these statistical issues present that the data quality is very poor.

Well, isn’t some data better than nothing? I would say no. The illusion of knowledge is worse than an absence of knowledge.  Longshanks stats are as they say on the tin - they are for entertainment purposes only.

So what do we do? How do we call for nerfs??

There is a way that I analyze a crew before I call for nerfs, and I think it is the most robust method we can use given the tools and numbers available.

I consider a crew overpowered if it wins even against people who have prepared to face that crew. Where does that happen?  Tournaments, particularly top tables.

Thus we can perform a pseudo-experiment: identify a master that we believe is overpowered.  Have top players play that crew (as some always do). Now look at multiple metas.

If a master overperforms consistently across multiple metas, even after it has developed a reputation and players know to prepare for it, we can safely conclude that the crew is overpowered and needs a nerf.
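As a rough sketch of that check, the function below counts how many metas show a master winning above some threshold at top tables. The threshold, the minimum number of games, the minimum number of metas, and the example records are all arbitrary placeholders I picked for illustration, not recommended values.

```python
def overperforms_consistently(results_by_meta, threshold=0.6,
                              min_games=20, min_metas=3):
    """Return True if the crew beats `threshold` in enough separate metas.

    results_by_meta maps a meta name to (wins, games) at top tables.
    All defaults are arbitrary placeholders, not recommended values.
    """
    strong_metas = 0
    for wins, games in results_by_meta.values():
        if games < min_games:
            continue  # too few games to say anything about this meta
        if wins / games > threshold:
            strong_metas += 1
    return strong_metas >= min_metas

# Hypothetical top-table records for one master after it became well known.
example = {
    "meta_1": (18, 25),
    "meta_2": (20, 30),
    "meta_3": (15, 22),
    "meta_4": (4, 6),  # ignored: below min_games
}
print(overperforms_consistently(example))
```

A single hot meta is not enough to trip this check. The crew has to keep winning in several places where people have had time to prepare for it.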

So how soon can we make robust calls for nerfs in M4E? It will likely be quite some time before we know what adequate preparation looks like and what a meta looks like.

How soon can we make salty calls for nerfs in M4E?  Well, some discords have a salt channel. Never too early to misuse the stats and let the salt flow!  Just make sure you know when you’re using stats for salty entertainment and when you’re engaging in robust analysis.
