Player Reliability

As a rule of thumb, the list team considers the opinions of every victor when initially constructing an opinion list. However, opinions may be weighted differently based on several factors that inherently affect player reliability. A team member will ask players to self-assess their reliability for each list opinion on a scale from 0 to 10. Reliability factors that players should consider include, but are not limited to, the following:

  • Verifications: A level's difficulty may be adjusted substantially during the verification process, which can make the verifier's opinion differ markedly from those of other players. Verifiers who playtest the layout prior to completing the full level are also subject to inherent biases. As such, unless a level has few or no other victors, the opinion of its verifier will likely be omitted from the opinion list.

  • Outliers: A player may simply have an opinion that drastically differs from other victors of the level. We typically omit these opinions as well, as a clear consensus may exist when excluding outlier opinions.

  • Previous Experience: A player's experience with other list demons significantly affects their reliability when we collect their opinions. In particular, if the level in question is the player's hardest demon, we will likely omit their opinion because they would not have other list demons to use for comparisons. We will also give the opinion less weight if the player has not beaten many list demons of similar difficulty to the level in question. To help identify this uncertainty, we ask players to provide the list team with an opinion range between two levels they have completed. Failure to do so decreases the reliability of the opinion and could lead to a less accurate placement.

  • Level Experience: A player may be generally reliable across multiple list demons but could have an experience with a particular level that makes their opinion of it less reliable. Such experiences include completing the level without dying at parts where most other victors died (commonly called "fluking"), or taking unusually long to complete the level due to getting "unlucky." The player's perspective on the level's difficulty is often skewed in either of these cases, so we will likely omit their opinion unless few other reliable opinions are available for it.

  • Time Between Completions: When comparing the difficulties of two levels, player opinions may be unreliable if there is a large time gap between completions. For instance, a player's skill level may improve greatly in periods as short as a month or two, so players should consider if these differences apply when assessing their reliability.
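
As a rough illustration of how self-assessed reliability could translate into weighting, the sketch below computes a reliability-weighted average. The formula, the function name, and the sample reliability scores are all assumptions made for this example. The list team's actual weighting is a judgment call and is not defined by a fixed formula here.

```python
def weighted_placement(opinions):
    """Reliability-weighted average of placement opinions.

    opinions: list of (placement, reliability) pairs, where reliability
    is the player's self-assessed score from 0 to 10.
    This weighting scheme is a hypothetical illustration only.
    """
    total_weight = sum(reliability for _, reliability in opinions)
    if total_weight == 0:
        raise ValueError("no opinion has a nonzero reliability")
    weighted_sum = sum(placement * reliability for placement, reliability in opinions)
    return weighted_sum / total_weight


# Hypothetical opinions: (placement, self-assessed reliability)
opinions = [(75, 9), (77, 8), (82, 3)]
estimate = weighted_placement(opinions)
print(round(estimate))
```

In this sketch, an opinion with a reliability of 0 is effectively omitted, while a low-reliability opinion still nudges the estimate slightly.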

Player Exclusion

Players who do not cooperate with the list team when providing opinions (e.g. by deliberately providing inaccurate information) will usually not be considered for future list placements or movements. Players may also choose to be excluded from providing list opinions by contacting a member of the list team.

Low Detail Modes

In addition to these factors, a level may contain a built-in Low Detail Mode (LDM) that makes it significantly easier. Since all built-in LDMs are considered acceptable for list records, if the LDM makes a particular level much easier, the list team will primarily consider opinions from players who used it. This emphasis allows the list placement to correspond to the easiest "official" version of the level.

Opinion List Analysis

Once the list team has an opinion list, we aim to analyze its statistics to determine the most likely accurate placement given what we know. The final analysis begins once reliability is considered; if the eventual placement does not seem to line up with some opinions you saw from other players, then it's likely that one or more of the factors above decreased their reliability.

Sometimes, there is a lot of variation in the opinions we receive! Some recent notable examples include Wasureta and Void Wave (something about purple levels, maybe). While we will try our best to figure out a consensus with what we have, please keep in mind that these placements are significantly more likely to change following the initial estimate based on the opinions from new victors.

Mean vs. Median

Although the pure average of the list of opinions is not always the best way to estimate the proper placement, it's usually a good starting point. We often use the average as the tentative placement for a new level, especially when the distribution of opinions is relatively symmetric and there are many victors.

However, because no distribution is perfectly balanced, we try to capture any "skew" in the data that stems from reliable opinions. The median ignores how far the other opinions sit from the middle value, so on its own it is usually not a good metric for determining list placements.

Example: Let's say a level has three opinions so far: #75, #77, and #82. Without any other opinions, we would consider #78, the average, to be a reasonable placement for it. The median of this list is #77, which does not account for the "skew" towards the higher number (#82 is farther away from the median than #75).
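
The arithmetic in this example can be checked with a short Python sketch using the standard library's statistics module. The variable names are illustrative only.

```python
from statistics import mean, median

# The three opinions from the example above
opinions = [75, 77, 82]

average = mean(opinions)    # (75 + 77 + 82) / 3
middle = median(opinions)   # the middle value of the sorted list

print(average, middle)  # 78 77
```

The average lands one place further down than the median because #82 sits five places from the median while #75 sits only two.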

Limitations of the Mean

Nonetheless, using the mean has its weaknesses as well. Most importantly, any outliers in the dataset have a magnified effect on the average, which could offset it away from an otherwise clear consensus.

Example: Let's say a level has five opinions: two at #67, two at #68, and one at #84. The average of this distribution is #71, but without the outlier opinion (#84), we see a clear consensus of either #67 or #68. We'd likely pick one of those two placements as a good fit for the level in question.
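
The sketch below reproduces this example and shows how much a single outlier shifts the average. The cutoff used to flag outliers (more than 10 places from the median) is a hypothetical value chosen for illustration. In practice, outliers are identified by the list team's judgment, and the median is used here only to spot the outlier, not to choose the placement.

```python
from statistics import mean, median

# The five opinions from the example above
opinions = [67, 67, 68, 68, 84]

# Hypothetical cutoff: treat opinions more than 10 places
# away from the median as outliers.
MAX_DISTANCE = 10

center = median(opinions)
consensus = [p for p in opinions if abs(p - center) <= MAX_DISTANCE]

full_average = mean(opinions)        # 70.8, rounds to #71
consensus_average = mean(consensus)  # 67.5, between #67 and #68

print(round(full_average), consensus_average)
```

With the outlier removed, the average falls between #67 and #68, which matches the consensus described above.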

Considering Existing List Demons

In some cases, such as the example provided above, there may be more than one possible placement that we think is reasonable for a list-worthy level. When we have multiple options, we often consider the levels currently occupying those positions and whether they may be moved up or down in the near future.

For instance, if two possible placements for a level are #103 and #104, and the level currently at #103 is generally thought to be underrated, #104 would be a better placement for the new level.