Examples of scoring dysfunction

I was checking which US states I’d had beers from and looked at Illinois. More beers from there than I’d have guessed, and they’re all Goose Island beers. OK, sure.

Order them by score. Gosh, they have a lot of >4.00 beers. (Amusing myself) Goose Island has 64 beers that score over 4; the whole of Germany has one (unless you count the 2010 130-bottle retired lambic).

Yeah, sure.

But hold on, I’ve had this one:

Redline IPA. How does that show up in the brewery list with a score of 4.03? There are 3 ratings visible: me (3.5), mkel07 (3.5) (the most ratings of any Australian) and macca147, who’s been around for ages (3.2). That’s a mean of 3.4, and the weighted average is 3.13. So it shows up with a score of 4.03 by dint of 4 non-review ratings which we can’t see and can’t tell who left.

This is an absolute joke.

[edit] Years into the process and it’s not possible to distinguish between failed development and vandalism.

1 Like

There are 7 ratings and 3 reviews, meaning 4 ‘ticks’. For the displayed 4.03 to work out, those ticks must average about 4.5.
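
A quick back-of-the-envelope check (this assumes the displayed 4.03 is a plain mean of all 7 ratings, which is how it appears to behave here):

```python
# Back-calculate what the 4 hidden ticks must average for the displayed
# score to come out at 4.03, assuming it is a plain mean of all 7 ratings.
visible_reviews = [3.5, 3.5, 3.2]        # me, mkel07, macca147
displayed_score = 4.03
total_ratings = 7

hidden_ticks = total_ratings - len(visible_reviews)                       # 4
implied_tick_sum = displayed_score * total_ratings - sum(visible_reviews)
implied_tick_avg = implied_tick_sum / hidden_ticks

print(f"Visible review mean: {sum(visible_reviews) / len(visible_reviews):.2f}")     # 3.40
print(f"Implied average of the {hidden_ticks} hidden ticks: {implied_tick_avg:.2f}") # ~4.50
```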

I don’t ‘tick’ but I imagine that it’s easier to give higher scores when working with a simpler system (I’d likely have quite a few 4 or 5 star ticks if this were my preferred method). Look at the high scores some mediocre beers get on Untappd.

So, it’s not a great system, and probably favours more readily available macros as these will attract the casual and less discerning drinker. Unlikely to be subterfuge although maybe I’m too trusting…

Yeah, now they show the REAL AVERAGE FOR ALL RATINGS (REVIEWS + TICKS) but with the OVERALL/STYLE SCORE BASED ON THE WEIGHTED AVERAGE OF REVIEWS ONLY.

This leaves us with stupidities like this:

It’s easy to compare two extremes against one another… Unfortunately, due to its high profile and accessibility, Kentucky Brunch is a very problematic beer that is plagued with:

  • Users who only have 1 review or rating (most of whom are likely inactive users)
  • Users who have accidentally rated the beer
  • Users who have used this beer to test out the review/rating feature

There are many other factors that play a part, but I’m confident those 3 are the biggest contributors. I’ve gone ahead and pulled some statistics for the other Top 10 - All-time - In production beers so we can have a more holistic view of what is happening.

It is also important to consider the performance of the feedback here.

  1. Dogfish Head Scratch-Made Hand Sanitizer (TP Aged)
    Straight average: 4.85
    Weighted average: 4.63
    Variance: +0.22
    Performance: 28 ratings, 100% reviewed
    With a 100% rating/review ratio, we see the weighting in full effect here.
  2. Toppling Goliath Kentucky Brunch
    Straight average: 3.54
    Weighted average: 4.52
    Variance: -0.98
    Performance: 1,026 ratings, 15% reviewed
    15% ratio… compare that to every other beer in the Top 10.
  3. Närke Kaggen Stormaktsporter
    Straight average: 4.27
    Weighted average: 4.48
    Variance: -0.21
    Performance: 812 ratings, 70% reviewed
  4. Westvleteren 12 XII
    Straight average: 4.43
    Weighted average: 4.42
    Variance: +0.01
    Performance: 7,283 ratings, 50% reviewed
    If you compare the variance to the performance of this beer, it is actually giving you insight RateBeer never highlighted before. The fact that it is able to maintain such a small variance with a high volume of ratings means that it is a very consistent performer for both reviewers and tickers.
  5. Schramm’s The Heart of Darkness
    Straight average: 3.91
    Weighted average: 4.42
    Variance: -0.51
    Performance: 287 ratings, 34% reviewed
    A low review percentage in this example doesn’t necessarily translate to a higher variance.
  6. Toppling Goliath Mornin’ Delight
    Straight average: 4.14
    Weighted average: 4.40
    Variance: -0.26
    Performance: 511 ratings, 55% reviewed
  7. Three Floyds Dark Lord - Bourbon Barrel Aged
    Straight average: 4.25
    Weighted average: 4.36
    Variance: -0.11
    Performance: 660 ratings, 63% reviewed
  8. Russian River Pliny the Younger
    Straight average: 4.19
    Weighted average: 4.35
    Variance: -0.16
    Performance: 1,289 ratings, 58% reviewed
  9. Three Floyds Dark Lord - Bourbon Vanilla Bean
    Straight average: 4.35
    Weighted average: 4.34
    Variance: +0.01
    Performance: 572 ratings, 76% reviewed
  10. Schramm’s A Smile of Fortune
    Straight average: 4.33
    Weighted average: 4.35
    Variance: -0.02
    Performance: 102 ratings, 66% reviewed
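
For reference, the variance above is just the straight average minus the weighted average. Here is a rough sketch of the same check (figures copied from the list; the 0.4 flag threshold is only illustrative):

```python
# Straight-vs-weighted comparison for the Top 10 list above.
# "variance" here is simply straight_avg - weighted_avg; names and figures
# are copied from the list, and the 0.4 outlier threshold is illustrative.
top10 = [
    ("Dogfish Head Scratch-Made Hand Sanitizer (TP Aged)", 4.85, 4.63, 28, 1.00),
    ("Toppling Goliath Kentucky Brunch",                   3.54, 4.52, 1026, 0.15),
    ("Närke Kaggen Stormaktsporter",                       4.27, 4.48, 812, 0.70),
    ("Westvleteren 12 XII",                                4.43, 4.42, 7283, 0.50),
    ("Schramm's The Heart of Darkness",                    3.91, 4.42, 287, 0.34),
    ("Toppling Goliath Mornin' Delight",                   4.14, 4.40, 511, 0.55),
    ("Three Floyds Dark Lord - Bourbon Barrel Aged",       4.25, 4.36, 660, 0.63),
    ("Russian River Pliny the Younger",                    4.19, 4.35, 1289, 0.58),
    ("Three Floyds Dark Lord - Bourbon Vanilla Bean",      4.35, 4.34, 572, 0.76),
    ("Schramm's A Smile of Fortune",                       4.33, 4.35, 102, 0.66),
]

for name, straight, weighted, ratings, reviewed in top10:
    variance = straight - weighted
    flag = "  <-- outlier" if abs(variance) > 0.4 else ""
    print(f"{name}: {variance:+.2f} ({ratings} ratings, {reviewed:.0%} reviewed){flag}")
```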
2 Likes

It is very important to understand the reason for this change. All the pros and cons were communicated in a previous thread before we went ahead with it.

I would hazard a guess and say that the RateBeer score is still the more important score to most of us here. Just like Overall scores, the Average ratings provide value for a different audience.

However… when you compare these scores side by side, you get a better picture of how certain products are perceived by a whole range of different customers.

An acceptable or negligible variance, for me, would be around 0.4 (just under half a star). With the small sample above, we see 8 of 10 beers coming in well below that number. I’ll have a talk with @joet to see if anything can be done to normalise Kentucky Brunch’s straight average rating, but the beer’s RateBeer score and weighted average should already speak volumes.

1 Like

You just noted 3 flaws of the website that are currently live and not being taken care of.

Users with only 1 rating shouldn’t normally count towards the score (real or weighted), even more so if it’s a tick rather than a full review, because they cannot be verified. (The minimum number of ticks before they count towards the score should be way higher than the minimum number of reviews, since it’s so quick and easy to make false ratings.)
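
Something along these lines is what I mean; a purely illustrative sketch, the thresholds and names are made up and not how the site actually works:

```python
# Illustrative rule only: ignore a user's ratings until they have enough
# history to be trusted. Thresholds are invented for the example; ticks
# need a much longer track record because they are so quick to fake.
MIN_REVIEWS_TO_COUNT = 2
MIN_TICKS_TO_COUNT = 20

def counts_towards_score(user_review_count: int, user_tick_count: int,
                         is_tick: bool) -> bool:
    """Should this user's rating be included in a beer's displayed score?"""
    if is_tick:
        return user_tick_count >= MIN_TICKS_TO_COUNT
    return user_review_count >= MIN_REVIEWS_TO_COUNT

# A brand-new account's single 5.0 tick would be excluded:
print(counts_towards_score(user_review_count=0, user_tick_count=1, is_tick=True))    # False
# An established reviewer's review still counts:
print(counts_towards_score(user_review_count=50, user_tick_count=5, is_tick=False))  # True
```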

We can still Quick Rate a beer without any form of confirmation, leading to many accidental ratings (but we need to confirm if we want to remove a beer from our Favorites).

Users using this beer to test the feature: since it’s been the first beer listed on the main beer page forever, that’s probably right (a mix of accidental ticks and test ticks), but what can admins do about it? Erase 700 ticks to put the score back where it belongs?

No matter the reason, the result right now is that the BEST BEER IN THE WORLD OF ALL TIME on RB is displayed with a score of 3.54, which makes the site look like a fucking joke.

And it’s been there for 2 weeks now as the NEW BEST BEER IN THE WORLD and no official dev has even noticed that this thing is just a farce, without even one serious rating… Just a reminder that this site is only barely alive, depending on whether the admins decide it still is or not…

It’s almost impossible right now to understand how scores work on this site, because the HOW OUR SCORE WORKS page and the STATISITCS pop-up (not my typo, that’s how it’s displayed… because, yeah, beta testing is forbidden here, live error is king) have definitions/explanations that aren’t even correctly worded. All of this has been reported… The page already exists; this is merely some text replacement. And you have to pick one kind of listing (real average or weighted average) and stick with it for all scores, because most people won’t go looking for how the maths behind the score works and won’t understand why a 3.54 beer is top of the world when the next one, with a higher score, is not…

1 Like

I agree it shouldn’t count towards the weighted average, and it doesn’t. I disagree on a straight/real average, however; putting special conditions around it would make the score inaccurate. And we don’t need another ‘REAL real average’ rating in the mix.

We have a RateBeer score/weighted average combination that already accurately ranks products in the correct order.

Whilst it may not be exactly the same, the logic in this adaptation is somewhat similar:
Metacritic
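
For what it’s worth, here is only a sketch of the general idea, not RateBeer’s actual formula: most sites damp a low-volume average towards a site-wide prior until enough ratings accumulate (the prior of 3.0 and weight of 10 below are placeholder assumptions):

```python
# Bayesian-style damped average: pull a beer's mean towards the site-wide
# mean until it has accumulated enough ratings. NOT RateBeer's actual
# formula; site_mean (the prior) and m (the damping weight) are assumptions.
def damped_average(ratings: list[float], site_mean: float = 3.0, m: int = 10) -> float:
    n = len(ratings)
    if n == 0:
        return site_mean
    beer_mean = sum(ratings) / n
    return (n / (n + m)) * beer_mean + (m / (n + m)) * site_mean

# Three perfect-score ticks barely move a new beer off the prior:
print(round(damped_average([5.0, 5.0, 5.0]), 2))   # 3.46
# A heavily rated beer stays close to its raw mean:
print(round(damped_average([4.4] * 500), 2))       # 4.37
```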

I’ve reached out to Joe to see if he has any thoughts on what we could do ethically. Regardless of what happens, anything that will cause a change in the RateBeer score/weighted average should be avoided.

And ignoring/hiding the value of our many ‘ticks’ because they’re causing problems, instead of trying to find a way to solve it, is just as bad. Or we can continue to let users contribute to the problem by using ticks to organise their cellars, instead of looking into a proper feature for that.

I think we’re also forgetting about the beers with a low number of ratings/reviews here, the beers where we could add up all the scores and calculate the real average ourselves, most of our new beers actually. To me it looked super shady that the average didn’t come out as expected, and I don’t think I’m alone there.

You’ll need to take this one up with Joe, as I understood… we’ve already made the changes that were applicable at the time. This one is out of our hands.

If you can, Viper, please help us find more beers which present the same problem that Kentucky Brunch does. We are still treating this as an exception and will look to correct it somehow, but for now… we can’t find any other product which has such a large variance and low rating/review ratio.

Well, right now for example, Best Beer in the World 2018 #3 (Westvleteren) has a better displayed score than #2 (Närke) and #1 (Kentucky)…

So an AB-InBev subsidiary is receiving inflated ratings on a beer rating website that is also an AB-InBev subsidiary? Shocking.

My God, this site has gone to shit.

2 Likes

Funnily enough, one of the three 5.0 ticks that boost it comes from an employee of Cascade, which is ABI-owned. How cute.

1 Like

You see an example of “super shady” dealings coming up on a site bought up by ABI right in the first post. An ABI beer, boosted by (among others) an ABI employee, gets a crazy high score.

You know what’s also super shady about that? That just about all ABI beers get a higher score because such ratings are allowed to count. Of course, that might also help other, similar producers, but people will latch onto it.

This makes RB look bad and may well be used to further bring the site down.

Right now, the site has been brought to such a level that the 20-year anniversary event that’s being flaunted around has 11 “interested” + 2 on the duplicate event:

The site has been brought to that by nothing but the developments of the last couple of years, and exclusively that.

Think about that. What a site that was at one point the most respected beer site in the world, and is still the best site for some regions thanks to the hard work of admins and users, counts for nowadays. And you think that letting abusers’ ratings count will help it… Jesus… way to drag down already terrible optics.

5 Likes

Wow. This thread really opened a can of worms…

You see an example of “super shady” dealings coming up on a site bought up by ABI right in the first post. An ABI beer, boosted by (among others) an ABI employee, gets a crazy high score.

Funnily enough, one of the three 5.0 ticks that boost it comes from an employee of Cascade, which is ABI-owned. How cute.

So an AB-InBev subsidiary is receiving inflated ratings on a beer rating website that is also an AB-InBev subsidiary? Shocking.

But most of all

And it’s been there for 2 weeks now as the NEW BEST BEER IN THE WORLD and no official dev has even noticed that this thing is just a farce, without even one serious rating… Just a reminder that this site is only barely alive, depending on whether the admins decide it still is or not…

Too bad I forgot my popcorn before reading…

2 Likes

To be fair, that’s one user in one case, no ABI conspiracy as such in that particular case.

But it’s sad really - you have some limits that were there because of 2 decades of experience with how people act - cheat, hype, bomb, etc. - on websites like these. And then they randomly get removed because… reasons. It allegedly looks far worse to have beers with no score at all than beers sitting at 5.0 off a single textless tick from a one-time visitor.

1 Like

Probably not, but who can tell us who all those BarrelDunkel87, RaterTap43, ResinousSampler23 are?

1 Like

What? No. It’s not.

What?

4.03, with 3 reviews under 3.6, 1 private rating at 3.0 and 3 private ratings with perfect scores from raters with no other ratings. So the three ratings that shouldn’t count have the biggest impact on the score.

2 Likes

Don’t get me wrong on ticks; I’m one of the few admins who think ticks are a good thing to complement reviews and give a better overall real score. But with the number of unverified/fake accounts (all the NAME1NAME2XX accounts right now) and the fact that the site doesn’t have many reviewers/tickers anymore, you simply can’t let abnormal ticks destroy all the beer scores. This is why we all say that you should implement user account verification, coupled with a minimum requirement before a user’s scores count.

As for users using ticks as a beer cellar (I’m guilty as charged myself), we lost this very valuable function 2 years ago and it was never ported to phoenix, so we have no working option right now. This means you guys need to get to work on this; we can’t. And as for those cellar ticks breaking beer scores, don’t forget that those beers normally don’t stay in the cellar indefinitely, so the scores should fix themselves over time… so this problem is way less important than fraudulent ratings.

3 Likes

Yes Joe, yes it is. Though CUB is about to be sold to Asahi, and someone got ahead of themselves and already edited the name, it most definitely was ABI at the time of the review and AFAIK still is (albeit not for long).

We are talking about the three 5.0 ticks, not the three legitimate ratings like hawthorne’s

1 Like

It would be really nice to be able to use our beer cellar again, and it seems it would prevent future problems with people using ticks to keep track of what they want or have.

@joet

1 Like