2.4 ML models with bias

Models might end up biased. Why is that?

[source: https://www.youtube.com/watch?time_continue=1&v=tlOIHko8ySg&feature=emb_logo]

With an unsuitable reward function, an undesired result can occur. Bias can enter at each of the following stages:

  • Framing the problem

  • Collecting data

    • Unrepresentative of reality
      • Collecting images of zebras only in sunny weather => the model might rely on shadows to classify a zebra (see the sketch after this list)
    • Reflects existing prejudices
      • Historical hiring data might lead recruiting tools to dismiss female candidates
  • Preparing the data

    • Selecting the attributes to be considered might introduce bias
      • Including the attribute gender, for example, might lead to biased predictions
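
The zebra example can be made concrete with a small experiment. The following is a minimal sketch on synthetic data (all feature names and probabilities are invented): a classifier trained on images where shadows almost always co-occur with zebras leans on the shadow feature, and its accuracy drops once that spurious correlation disappears at test time.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

def make_data(p_shadow_given_zebra):
    # 'stripes' is the real (weak) signal, 'shadow' a confounding feature.
    zebra = rng.integers(0, 2, n)
    stripes = rng.normal(np.where(zebra == 1, 0.5, -0.5), 1.0)
    shadow = rng.binomial(1, np.where(zebra == 1, p_shadow_given_zebra,
                                      1 - p_shadow_given_zebra))
    return np.column_stack([stripes, shadow]), zebra

# Training set collected only in sunshine: shadows almost always mean "zebra".
X_train, y_train = make_data(0.95)
# Test set from the real world: shadows are independent of the label.
X_test, y_test = make_data(0.5)

model = LogisticRegression().fit(X_train, y_train)
print("weights (stripes, shadow):", model.coef_[0])      # shadow weight dominates
print("train accuracy:", model.score(X_train, y_train))  # looks good
print("test accuracy:", model.score(X_test, y_test))     # drops noticeably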

2.4.1 What is the relation between bias in machine learning and the priors of Bayes' theorem?

Bayes' theorem gives the probability of an event based on prior knowledge of conditions that might be related to the event. Does that mean that stereotypes can be used in machine learning models?
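
A small numeric sketch of Bayes' theorem, P(H|E) = P(E|H) · P(H) / P(E), makes the connection explicit: the prior P(H) plays exactly the role a stereotype plays in human judgment, and identical evidence yields different conclusions under different priors. The probabilities below are invented for illustration.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    # Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)
    evidence = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / evidence

# Identical evidence, two different priors:
print(posterior(prior=0.50, p_e_given_h=0.8, p_e_given_not_h=0.2))  # 0.80
print(posterior(prior=0.05, p_e_given_h=0.8, p_e_given_not_h=0.2))  # ~0.17
```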

Daniel Kahneman, a Senior Scholar at the Woodrow Wilson School of Public and International Affairs, Professor of Psychology and Public Affairs Emeritus at the Woodrow Wilson School, Eugene Higgins Professor of Psychology Emeritus at Princeton University, and a fellow of the Center for Rationality at the Hebrew University of Jerusalem, writes in his book “Thinking, Fast and Slow”:

The social norm against stereotyping, including the opposition to profiling, has been highly beneficial in creating a more civilized and more equal society. It is useful to remember, however, that neglecting valid stereotypes inevitably results in suboptimal judgments. Resistance to stereotyping is a laudable moral position, but the simplistic idea that the resistance is costless is wrong. The costs are worth paying to achieve a better society, but denying that the costs exist, while satisfying to the soul and politically correct, is not scientifically defensible.

— Kahneman, Daniel. Thinking, Fast and Slow (p. 169). Penguin. Kindle Edition.

An example of Bayes' theorem can be found in chapter 2.2.1.

An introduction to Naive Bayes classifiers is given in chapter 9.9.
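
As a preview of that chapter, here is a minimal sketch using scikit-learn's GaussianNB (an assumption; chapter 9.9 may use a different implementation): the class prior of a Naive Bayes classifier can by itself flip the prediction for a borderline point.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
# Two classes drawn from Gaussians centred at -1 and +1.
X = np.vstack([rng.normal(-1.0, 1.0, (100, 1)),
               rng.normal(+1.0, 1.0, (100, 1))])
y = np.array([0] * 100 + [1] * 100)

borderline = [[0.1]]  # a point close to the decision boundary
for priors in ([0.5, 0.5], [0.9, 0.1]):
    model = GaussianNB(priors=priors).fit(X, y)
    print(priors, "->", model.predict(borderline)[0],
          model.predict_proba(borderline)[0].round(2))
```

With equal priors the point is assigned to class 1; with a 0.9 prior on class 0, the same point and the same likelihoods yield class 0.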

2.4.2 How to avoid bias

Avoiding bias is harder than you might think:

  • Unknown unknowns

    • Gender might be inferred by a recruiting tool from the use of language (a sketch follows this list)
  • Imperfect processes
    • The test data has the same bias as the training data
    • Bias is not easy to discover
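
The proxy problem from the first item above can be demonstrated directly. In this minimal sketch on invented synthetic data, a single feature correlated with gender, standing in for word-usage patterns, is enough to recover gender with high accuracy, so simply dropping the gender column does not remove the bias.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 1000
gender = rng.integers(0, 2, n)
# A single feature correlated with gender, standing in for word-usage patterns.
language_style = rng.normal(gender.astype(float), 0.5).reshape(-1, 1)

proxy_model = LogisticRegression().fit(language_style, gender)
print("gender recovered from the proxy alone:",
      proxy_model.score(language_style, gender))  # well above chance (0.5)
```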

2.4.3 Human bias

Machine learning models can be biased for several reasons, as shown above. How about humans?

  • Study in Germany

  • Judges read a description of a shoplifter
  • They then rolled a pair of loaded dice
  • Dice = 3 => average sentence of 5 months in prison
  • Dice = 9 => average sentence of 8 months in prison