Now We’ve Got Proof that Wikipedia is Biased

30.06.2024 05:46

Let’s put a final nail in the coffin of Wikipedia as a politically unbiased source of information. In a recently released report, Manhattan Institute Senior Fellow and research scientist David Rozado demonstrates that Wikipedia routinely confers more positive treatment upon leftist politicians, institutions, and media sources than on their right-leaning counterparts. That bias, in turn, feeds into artificial intelligence language-generation models like ChatGPT, which use Wikipedia as a major source of training data.

Anecdotally, many of us have known for some time that Wikipedia does not exactly call it down the middle. Take, for example, its entry for “Trans woman,” which begins with the statement “A trans woman (short for transgender woman) is a woman who was assigned male at birth.” There is not even a hint to suggest that such a definition is politically contentious and represents a minority view; after all, 57 percent of Americans believe sex is determined at birth. A definition of “trans woman” supported by that 57 percent majority would be something more akin to “a biological male who insists on identifying as female.” But far from presenting any such definition, Wikipedia does not even acknowledge the existence of a fundamental disagreement. The largest number of words in the entry is devoted, instead, to discussing the discrimination trans women face in our mean and bad society.

Many of us also find Wikipedia to be a frustrating source of misinformation on politically fraught topics about which we happen to be knowledgeable. In 2018, for example, I wrote a widely read essay published in Tablet explaining the sinister ideology of “Cultural Marxism” that had infiltrated academia and, increasingly, society at large. I debunked, in the process, the oft-perpetuated canard that “Cultural Marxism,” when it is properly understood, is any sort of anti-Semitic conspiracy theory. My essay was informed by firsthand reading of the original source material I discussed, as well as reading of relevant academic commentary, and documented with links to various sources. Yet Wikipedia’s entry for “Cultural Marxism” is a mere redirect to an entry on the “Cultural Marxism conspiracy theory,” which continues to call it “a far-right antisemitic conspiracy theory that misrepresents the Frankfurt School as being responsible for modern progressive movements, identity politics, and political correctness.” The balance of the article is a shameless exercise in the cherry-picking of sources to make the case for the conspiracy theory version of the story while ignoring the actual and rather direct intellectual lineage between the mid-20th-century Frankfurt School’s “critical theory” and its contemporary offshoots in “critical legal studies,” “critical race theory” and “critical gender theory,” all major players in the coming-to-be of our toxic identity politics.

Don’t Worry. You Aren’t Just Imagining It.

But none of this proves Wikipedia is politically biased because, after all, we could be the ones with the biases or, perhaps, we could simply be drawn to egregious instances that don’t represent a fair sample of the larger corpus of Wikipedia articles. This is where David Rozado’s research comes in.

Starting with a list of politically salient terms (such as names of recent Presidents, state governors and members of Congress, as well as Supreme Court justices), Rozado traced every mention of such terms in any Wikipedia article. He then fed a random sampling of Wikipedia text snippets containing each such term to ChatGPT to have ChatGPT automatically annotate each snippet with the prevailing sentiment/emotion associated with it and, as a bottom line, whether that sentiment/emotion was primarily positive or negative. For example, in the Wikipedia entry on Donald Trump, a sentence (appearing in the second paragraph of the article) claiming that “[a] special counsel investigation established that Russia had interfered in the election to favor Trump” (talk about a debunked conspiracy theory!) would likely be seen as discussing Trump in a primarily negative context. To verify that the methodology was sound, Rozado tested it on terms (corruption, tyranny, or violence) and people (Osama bin Laden or Adolf Hitler) that pretty much everyone would agree are primarily seen by us as negative and other terms (joy, healing or compassion) and people (Frederick Douglass or Mother Teresa) that we all generally see in a positive light, and indeed, ChatGPT correctly coded the contexts in which those terms and people generally appeared in Wikipedia articles as negative and positive, respectively.
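To make the procedure just described concrete, here is a minimal Python sketch of the sample-annotate-average loop. It is an illustration under stated assumptions, not Rozado’s actual code: the function names, the three-way label set, the numeric scores, the keyword-based stand-in for the ChatGPT annotation step, and the placeholder terms and groups are all hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean

# Hypothetical mapping from an annotator's label to a numeric score.
LABEL_SCORES = {"negative": -1, "neutral": 0, "positive": 1}


def sample_snippets(snippets, k, seed=0):
    """Return at most k snippets, drawn at random with a fixed seed."""
    if len(snippets) <= k:
        return list(snippets)
    return random.Random(seed).sample(snippets, k)


def toy_annotator(snippet, term):
    """Stand-in for the language-model annotation step.

    A real pipeline would send the snippet and the term to a model and
    parse its answer; this keyword rule exists only so the sketch runs.
    """
    text = snippet.lower()
    if any(word in text for word in ("praised", "celebrated")):
        return "positive"
    if any(word in text for word in ("criticized", "scandal")):
        return "negative"
    return "neutral"


def mean_sentiment_by_group(snippets_by_term, group_of, annotate, k=100):
    """Average the annotated sentiment scores of sampled snippets per group."""
    scores = defaultdict(list)
    for term, snippets in snippets_by_term.items():
        for snippet in sample_snippets(snippets, k):
            label = annotate(snippet, term)
            scores[group_of[term]].append(LABEL_SCORES[label])
    return {group: mean(values) for group, values in scores.items()}


# Placeholder data: invented terms, groups, and sentences for illustration.
demo_snippets = {
    "term_a": ["term_a was praised for the reform.", "term_a attended the meeting."],
    "term_b": ["term_b was criticized over the scandal.", "term_b attended the meeting."],
}
demo_groups = {"term_a": "group_1", "term_b": "group_2"}

print(mean_sentiment_by_group(demo_snippets, demo_groups, toy_annotator))
```

The sanity check described above corresponds to running the same loop on terms whose valence is uncontroversial and confirming that the averages come out with the expected sign before trusting the results on contested terms.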

In addition to political figures, Rozado went through the same process for political leanings and ideologies, such as “far-right,” “conservative,” “liberal,” “far-left” and “progressive”; for popular journalists and media figures of all political persuasions, such as Ann Coulter, Tucker Carlson, Fareed Zakaria, Paul Krugman and Arianna Huffington; and for media institutions, such as Fox News, NPR, the Nation and Breitbart.

Rozado’s results demonstrated that Wikipedia has, at the very least, “a mild tendency” to place terms associated with the right side of the aisle in more negative contexts, but the results were starker in the case of presidents, members of Congress, and state governors. Wikipedia’s tendency to favor the left-leaning figures came through quite clearly and distinctly. Wikipedia’s bias, in other words, is not just in our heads; it is real and demonstrable.

Using the prominent psychologist Paul Ekman’s basic emotion terms — anger, disgust, fear, joy, sadness, and surprise — Rozado then also had ChatGPT code which of these emotions was most strongly associated with Wikipedia’s mentions of the various political figures, institutions, and ideologies. Lo and behold, while the right-leaning terms were most strongly associated with the emotional categories of anger and disgust, the left-leaning terms were presented in contexts most often suggestive of joy. The thought of leftists appears to send Wikipedia editors to their happy place.

Bias at Wikipedia Means Bias in AI

Lest we attempt to console ourselves with the notion that such bias is confined to Wikipedia, Rozado disabuses us of any such illusion. Wikipedia, he reminds us, is one of the main sources of raw data the creators of large language models such as ChatGPT and others use to teach these bots how to speak and what to say. Do Wikipedia’s biases, then, inform downstream biases in these chatbots that are becoming ever more ubiquitous among us?

Answer: yes. Rozado finds that the kinds of words among which the political terms he traces tend to be embedded in Wikipedia overlap with the kinds of words among which those same terms tend to be embedded in ChatGPT training data. Since Wikipedia is a major source of such training data, that is not at all surprising — but we should, then, likewise not be surprised to find that the output we wind up getting from the likes of ChatGPT is also politically biased, as I have discussed elsewhere.
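The passage above does not spell out how that overlap is measured. One common way to quantify agreement between two sets of per-term scores is the Pearson correlation, sketched below in Python. This is an assumption chosen for illustration rather than a description of the report’s method, and the score lists are invented placeholders, not data from the report.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-term scores from two sources, listed in the same term order.
source_one_scores = [0.4, -0.2, 0.1, -0.5]
source_two_scores = [0.3, -0.1, 0.2, -0.4]

print(round(pearson(source_one_scores, source_two_scores), 3))
```

A value near 1 would mean that terms scoring high in one source also tend to score high in the other; on its own, a correlation of this kind shows association and does not establish which source influenced which.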

As I have also previously explained, most large language models like ChatGPT have at least three possible stages at which bias may be introduced. There is what I would call “hard-coded bias,” which is what happens, for example, when human programmers make an explicit decision to force their chatbots to promote a leftist vision of “social justice,” even while declining to answer certain questions they deem racist, sexist or otherwise “problematic.” Most of us have probably heard, at this point, of the rather glaring instances of such hard-coded biases that led Google, earlier this year, to shut down the image-generation component of its Gemini bot due to such lowlights as representing even America’s Founding Fathers and German Nazis as racially diverse.

Second, there is what I would call “reinforcement learning bias,” which may be introduced at the stage where the output of a chatbot is fed to human subjects, who are then asked to evaluate how satisfied they are with varying responses to given queries; their feedback then drives further, iterative rounds of product development. How those human subjects are chosen and what they are instructed to do matters greatly to what biases may come through in the end, of course.

The third type of bias that might be introduced is what might be called “training data bias.” This is the stage at which large language models are fed the raw data they use — from sources like Wikipedia — to develop their language-generation capacities and simultaneously arm them with the very content they draw on in responding to our prompts much later down the line. In many ways, this level of bias is the most insidious, for once introduced, it becomes sort of like the inconspicuous wallpaper by which we are surrounded.

If the vaguely off-white virtual rooms we keep finding ourselves in are ever-so-subtly blue-tinted rather than red-tinted, before long, that shading is normalized, opening the door for more and more pronounced shades of blue in subsequent go-rounds. As the output of AI chatbots, in other words, comes to occupy more and more of the World Wide Web in which we are caught, the training data for future versions of such bots will inevitably be drawn in larger and larger part not from human-generated text but from prior chatbot outputs. In this manner, even a comparatively small political bias that is smuggled in at the initial stages of development may be amplified many times over before all is said and done. In the end, what we get is an utterly distorted, AI-generated picture of reality in which there are fewer and fewer sources we may chance to come upon that show us the way out of the labyrinth.

This is part of why Rozado’s work exposing Wikipedia’s political biases and their impact on ChatGPT’s biases is so important. We are still at an early stage when it remains possible to do such eye-opening research exposing the biases that are being imperceptibly sewn into the fabric of our society — and, if we proceed wisely and strategically, to do something about it, shouting about it from the rooftops being the first critical step in that process. We will not be at that early stage for much longer.

The post Now We’ve Got Proof that Wikipedia is Biased appeared first on The American Spectator | USA News and Politics.