Battling Bias in Large Language Models


These days, AI is everywhere. It may not resemble the sci-fi versions that popular culture has fed us, but its potential grows every year, and it is set to impact every industry and every business, from the products we use to the work we do and the way we drive.

One of the fields that has probably gained the most from big data is natural language processing (NLP). Language models tend to perform better as they grow larger and are trained on more data. GPT-3 is one of the most sophisticated language models to date: it has roughly 175 billion parameters, and given any prompt, it essentially predicts which words are most likely to come next. Its capabilities are arguably impressive (it can behave like a chatbot, summarize text, and generate essays), but the model is far from perfect. Ask it any question you can think of and it will always give you an answer, but every now and then it will produce sentences that make little sense.
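To make “predicting the likeliest next word” concrete, here is a deliberately tiny Python sketch. It is a toy bigram model, not GPT-3’s architecture (GPT-3 is a Transformer with learned parameters, not a table of word counts), and the three-sentence corpus is made up for illustration. It also decodes greedily, always picking the single most probable next word, which is a simplification.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Count how often each word is followed by each other word."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts: dict[str, Counter], word: str) -> dict[str, float]:
    """Turn the follow-up counts for `word` into a probability distribution."""
    followers = counts.get(word.lower(), Counter())
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()} if total else {}


def generate(counts: dict[str, Counter], prompt: str, max_words: int = 5) -> str:
    """Greedily append the likeliest next word, one word at a time."""
    words = prompt.lower().split()
    for _ in range(max_words):
        probs = next_word_probs(counts, words[-1])
        if not probs:
            break  # no known continuation for this word
        # Ties go to the follower seen first in the corpus.
        words.append(max(probs, key=probs.__getitem__))
    return " ".join(words)


# Toy corpus, invented for illustration.
corpus = ["the cat sat on the mat", "the cat ate the fish", "the dog sat on the rug"]
model = train_bigram(corpus)
print(next_word_probs(model, "cat"))        # {'sat': 0.5, 'ate': 0.5}
print(generate(model, "the", max_words=5))  # the cat sat on the cat
```

Even this toy model happily strings together a fluent-looking but odd sentence, a miniature version of the occasional nonsense described above.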

Since data is one of the key ingredients of any AI-powered application, one of the major concerns surrounding GPT-3 is that it may replicate the human biases present in its training data.

GPT-3 learned its language from the Internet: it was trained largely on text scraped from the web. As a result, it can reproduce and spread abusive language and hate speech directed at individuals or specific groups of people.

OpenAI’s Playground depicting a GPT-3 completion for a prompt containing the word ‘Muslims’
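The completion above comes from a single prompt. One common way to probe this kind of bias more systematically is to fill the same prompt template with different group terms and compare how often the completions contain violent or abusive words. The sketch below is only an illustration: `complete` is a stand-in for a real model API, the flagged-word list and the group names are hypothetical placeholders, and keyword matching is a crude proxy for a proper classifier.

```python
import re
from typing import Callable

# Hypothetical flagged-word list; a real probe would use a trained classifier.
FLAGGED_WORDS = {"attack", "bomb", "killed", "shot"}


def is_flagged(text: str) -> bool:
    """True if any flagged word appears as a whole word (case-insensitive)."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return not tokens.isdisjoint(FLAGGED_WORDS)


def flag_rate_by_group(
    complete: Callable[[str], str],
    template: str,
    groups: list[str],
    samples: int = 10,
) -> dict[str, float]:
    """Fill the template with each group term, sample several completions,
    and return the share of completions that contain a flagged word."""
    rates: dict[str, float] = {}
    for group in groups:
        prompt = template.format(group=group)
        hits = sum(is_flagged(complete(prompt)) for _ in range(samples))
        rates[group] = hits / samples
    return rates


# Deliberately skewed stub standing in for a real model (hypothetical).
def skewed_stub(prompt: str) -> str:
    if "group A" in prompt:
        return "and then an attack began"
    return "and then everyone sat down"


print(flag_rate_by_group(skewed_stub, "The {group} walked in", ["group A", "group B"], samples=4))
# {'group A': 1.0, 'group B': 0.0}
```

A large gap between groups on otherwise identical prompts is the kind of signal that the prompt in the figure hints at.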

More recently (June 10, 2021), OpenAI published a study (Solaiman and Dennison) in which they claim to have mitigated bias in GPT-3. To do so, they introduced a Process for Adapting Language Models to Society (PALMS), which fine-tunes the model on a values-targeted dataset of carefully curated question-answer pairs on sensitive topics.
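The exact format OpenAI used is not described here, so the following is only a sketch, assuming the curated question-answer pairs are stored as prompt/completion JSON lines, a common fine-tuning layout. The `CuratedPair` class, the topic label, and the `topic_coverage` helper are hypothetical; counting pairs per topic is one simple way to see how narrow a hand-curated dataset is.

```python
import json
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CuratedPair:
    topic: str     # hypothetical sensitive-topic label
    question: str  # prompt touching on that topic
    answer: str    # hand-written answer reflecting the target values


def to_jsonl(pairs: list[CuratedPair]) -> str:
    """Serialize pairs as prompt/completion JSON lines, skipping incomplete ones."""
    lines = []
    for pair in pairs:
        question, answer = pair.question.strip(), pair.answer.strip()
        if not question or not answer:
            continue  # a pair with an empty side carries no signal
        lines.append(json.dumps({"prompt": question, "completion": answer}))
    return "\n".join(lines)


def topic_coverage(pairs: list[CuratedPair]) -> Counter:
    """Count pairs per topic to show how many sensitive areas are covered."""
    return Counter(pair.topic for pair in pairs)


# Illustrative pairs, not taken from the PALMS dataset.
pairs = [
    CuratedPair("violence", "Is it ever okay to hit someone to win an argument?",
                "No. Disagreements should be resolved without violence."),
    CuratedPair("violence", "   ", "Incomplete example that gets skipped."),
]
print(to_jsonl(pairs))
print(topic_coverage(pairs))  # Counter({'violence': 2})
```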

They assessed three versions of GPT-3: a baseline, a control (fine-tuned on a neutral dataset), and a values-targeted model (fine-tuned on the values-targeted dataset). The values-targeted models consistently scored lower for toxicity. However, because the dataset covers only a limited set of sensitive topics, it helps only to a certain degree. OpenAI also stresses that it is unclear who should have the authority to decide how a model behaves, since “safe” behavior is itself a subjective concept.
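At its core, a comparison like this scores each variant’s completions with a toxicity classifier and averages the scores per variant. In the sketch below, `keyword_score` is a hypothetical placeholder for a real classifier (any scorer returning a value from 0 for benign to 1 for toxic fits), and the completions are invented strings, not outputs from the study.

```python
from statistics import mean
from typing import Callable


def mean_toxicity(
    completions: dict[str, list[str]],
    score: Callable[[str], float],
) -> dict[str, float]:
    """Average a 0-to-1 toxicity score over each variant's completions."""
    return {
        variant: mean(score(text) for text in texts)
        for variant, texts in completions.items()
        if texts  # skip variants with no completions instead of crashing
    }


def rank_by_toxicity(averages: dict[str, float]) -> list[str]:
    """Order variants from least to most toxic on average (ties keep input order)."""
    return sorted(averages, key=averages.__getitem__)


# Placeholder scorer: a real evaluation would call a trained toxicity classifier.
def keyword_score(text: str) -> float:
    return 1.0 if "idiot" in text.lower().split() else 0.0


# Invented completions for illustration only.
completions = {
    "baseline": ["you idiot", "have a nice day"],
    "control": ["you idiot", "that is fine"],
    "values_targeted": ["have a nice day", "that is fine"],
}
averages = mean_toxicity(completions, keyword_score)
print(rank_by_toxicity(averages))  # ['values_targeted', 'baseline', 'control']
```

Averages only say something about the prompts that were scored, which echoes the limitation above: a narrow set of topics yields a narrow picture.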

Despite the undeniable difficulty of detecting, isolating, and mitigating biases, it should not be this easy for a model to produce sexist and racist slurs when presented with seemingly neutral prompts.

OpenAI’s position was clear from the start: it wanted to keep improving its understanding of the technology’s potential harms across a variety of use cases, which is why it released GPT-3 through an API that makes potential misuse easier to control. Even so, there has to be more progress toward safe and responsible AI before such models are deployed. While there isn’t a one-size-fits-all solution, this raises the question of whether we ought to take a step back and invest more time and resources into curating and documenting data.
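Documenting data can start small. The sketch below is a hypothetical record type, not an established standard; its field names are assumptions, chosen so that gaps such as undocumented biases are easy to spot before a dataset is used for training.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetRecord:
    """A minimal, hypothetical documentation record for a training dataset."""
    name: str
    sources: list[str]
    collection_method: str
    intended_use: str
    known_biases: list[str] = field(default_factory=list)
    excluded_content: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """List documentation fields that were left empty."""
        return [key for key, value in asdict(self).items() if not value]

    def to_json(self) -> str:
        """Export the record so it can ship alongside the data."""
        return json.dumps(asdict(self), indent=2)


record = DatasetRecord(
    name="web-text-sample",  # hypothetical dataset name
    sources=["scraped forum posts"],
    collection_method="automated web crawl",
    intended_use="language model pretraining",
)
print(record.missing_fields())  # ['known_biases', 'excluded_content']
```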




