
Ratings and Sampling

When you have a user vote on which of two images is hotter, what you are really doing is sampling. You are trying to find out what the result would be if everybody voted on the matchup. If everybody voted, you would have the answer with 100% confidence. When fewer people vote, you are hoping the result will match what you would get if everybody voted.

The margin of error is how big a spread you need in the vote to be able to rely on the result. For a 95% confidence level, the margin of error is roughly 1 divided by the square root of the sample size. So a sample size of 1 gives a 100% margin of error: you would need a spread of more than 100 points, which is mathematically impossible, so you cannot form any hypothesis from that sample, such as "Pic A is hotter than Pic B". To make this more concrete, consider a sample size of 25. The margin of error would be 20%. If pic A wins by fewer than 20 percentage points, the result is within the margin of error and you cannot say with any certainty that it is correct. However, if A wins by more than 20 points you have a significant result: it is extremely likely (95%) that it is correct. Another way of saying it is that it is extremely unlikely (less than 5%) that you would get a different result if you took another sample.
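Here is a quick sketch of that rule in Python (the function names are just for illustration, assuming the 1/sqrt(n) approximation for 95% confidence described above):

```python
import math

def margin_of_error(sample_size: int) -> float:
    """Approximate 95% margin of error for a two-way vote: 1 / sqrt(n)."""
    return 1 / math.sqrt(sample_size)

def is_significant(votes_a: int, votes_b: int) -> bool:
    """True if the winner's lead exceeds the margin of error for this sample."""
    n = votes_a + votes_b
    lead = abs(votes_a - votes_b) / n
    return lead > margin_of_error(n)

print(margin_of_error(1))      # 1.0  -> a single vote can never be conclusive
print(margin_of_error(25))     # 0.2  -> need more than a 20-point spread
print(is_significant(16, 9))   # 64% vs 36%, a 28-point lead -> True
print(is_significant(14, 11))  # 56% vs 44%, a 12-point lead -> False
```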

The math is hairier for other confidence levels. You can see more of it here: https://www.dummies.com/education/math/statistics/how-sample-size-affects-the-margin-of-error/
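For reference, the worst-case formula behind that is margin of error = z / (2 * sqrt(n)), where z is the critical value for the chosen confidence level; at 95%, z is about 1.96, which is where the 1/sqrt(n) shortcut comes from. A small sketch using Python's standard library (illustrative only):

```python
import math
from statistics import NormalDist

def margin_of_error(sample_size: int, confidence: float = 0.95) -> float:
    """Worst-case margin of error for a proportion: z / (2 * sqrt(n))."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # ~1.645 / ~1.96 / ~2.576
    return z / (2 * math.sqrt(sample_size))

for conf in (0.90, 0.95, 0.99):
    print(f"{conf:.0%} confidence, 25 votes: {margin_of_error(25, conf):.1%}")
# 90%: 16.4%, 95%: 19.6%, 99%: 25.8%
```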
 

HotnessRater

Administrator
Staff member
Yes, which is why I don't rate pictures off of one vote. I look at their voting history. If a picture wins against ten pictures rated 9.5 and loses against ten pictures rated 9.5, I can say the picture is a 9.5. They don't have to be the same pictures.

If a picture wins against 15 pictures rated 9.5 and loses against 10, its rating rises above 9.5 until it settles at a level where its wins and losses are equal. Yes, some of those wins might be wrong, but some of the losses can be wrong too, and since I get a much larger sample size, these 'errors' cancel each other out.
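As a purely hypothetical illustration of that "drift until wins and losses balance" idea (this is not the site's actual algorithm, just a toy simulation with made-up parameters):

```python
import random

def simulate(true_rating: float, start: float = 9.5,
             matchups: int = 300, step: float = 0.05) -> float:
    """Toy model: nudge the rating up on a win and down on a loss against
    pictures near its current rating, until wins and losses roughly balance."""
    rating = start
    for _ in range(matchups):
        # chance of beating a picture rated about the same as the current rating
        p_win = min(0.95, max(0.05, 0.5 + 0.5 * (true_rating - rating)))
        rating += step if random.random() < p_win else -step
        rating = max(0.0, min(10.0, rating))
    return rating

random.seed(1)
print(round(simulate(9.8), 2))  # drifts up from 9.5 and settles near 9.8 (noisy)
print(round(simulate(9.2), 2))  # drifts down and settles near 9.2 (noisy)
```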

If only 60% of votes are accurate, that is enough. 40% of the wins against pictures in their own group will be wrong, as will 40% of the losses, which allows the remainder to push the picture in one direction or the other. If there is an error in a rating, it is because I use too small a sample size: I allow a picture to be rated with only 12 votes that fall within its range. I say within its range because a 9.5 winning against a bunch of 7s is meaningless; it really only matters how it performed against pictures close to its own range. That is roughly equivalent to rating a picture after 3 matchups with your method, which would be just as error prone. I have it set low just in the interest of getting pictures rated so there is something to show, but it could probably be made stricter.
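A small simulation of the "60% accurate votes are enough" argument (hypothetical numbers, just to show how wrong votes on each side cancel out as the sample grows):

```python
import random

def majority_is_right(votes: int, accuracy: float = 0.6) -> bool:
    """True if more than half of the simulated votes pick the truly better picture."""
    correct = sum(random.random() < accuracy for _ in range(votes))
    return correct > votes / 2

random.seed(0)
trials = 10_000
for n in (12, 50, 200):
    hit_rate = sum(majority_is_right(n) for _ in range(trials)) / trials
    print(f"{n:3d} votes -> majority points the right way in {hit_rate:.0%} of trials")
# With only 12 votes the majority still goes the wrong way fairly often;
# with a couple hundred votes it is almost always right.
```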

I understand the math you posted. I don't need a math lesson. I have an engineering degree and aced my math classes, thank you.
 