Introduction
We have entered a new era of digital composition. Gone are the days when “writing assistance” simply meant a red squiggly line under a misspelled word. With the advent of Large Language Models (LLMs) like GPT-4, writing has evolved into a co-creative process. Humans prompt, AI drafts, humans refine, and AI polishes. This paradigm shift raises profound questions about authorship, creativity, and quality.
However, a critical psychological question remains unanswered: How do readers react when they know a piece of text was co-written by an AI?
If you read a brilliant essay, does your opinion change if you find out ChatGPT wrote the first draft? Do we evaluate the quality of the text based solely on the words on the page, or are we biased by the knowledge of the process behind it?
In the paper “How Does the Disclosure of AI Assistance Affect the Perceptions of Writing?”, researchers from Purdue University and the University of Connecticut devised a clever two-phase experiment to answer these questions. They sought to understand not just if disclosure matters, but how it changes perceptions regarding quality, originality, and even hiring potential.
Background: The Human-AI Co-Creation Paradigm
Before diving into the experiment, it is essential to understand the context. Generative AI has moved beyond simple grammar checking to become a tool for “ideation” and “content generation.” Previous research has shown that distinguishing between human and AI text is increasingly difficult, often bordering on impossible for the average reader.
However, as AI integration deepens, calls for transparency have increased. Ethical guidelines and academic policies often demand the disclosure of AI assistance. Yet, empirical evidence on the consequences of this transparency is scarce. Does transparency come at a cost to the writer?
The researchers hypothesize that disclosure might trigger a negative bias. If readers perceive AI-generated content as requiring less human effort, they might undervalue the final product regardless of its objective quality. To test this, they needed a controlled environment where they could isolate the effect of the “disclosure” variable.
Phase 1: Creating the Artifacts
To study how people perceive AI-assisted writing, the researchers first needed a corpus of writing samples produced under different conditions. They conducted a “Phase 1” study involving 407 participants.
The Writing Tasks
The participants were assigned one of two distinct types of writing tasks:
- Argumentative Essay: Writing a persuasive piece on a TOEFL-style topic (e.g., whether governments should tax junk food). This requires logic, structure, and argumentation.
- Creative Story: Writing a short fiction piece based on a prompt (e.g., a story opening with someone saying “Let’s go for a walk”). This requires imagination, narrative flow, and emotional resonance.
The Writing Modes
Crucially, the writers did not just write freely. They were assigned to one of three “Writing Modes” that reflect real-world usage of LLMs:
- Independent: The participant wrote the article entirely on their own without AI help.
- AI Editing: The participant wrote the draft but could use ChatGPT to polish, edit, or fix grammar. The AI was restricted from generating new content.
- AI Generation: ChatGPT drafted the initial version of the article. The participant then directed revisions and provided feedback to the AI to shape the final output.
This design ensured the researchers had a diverse library of essays and stories ranging from purely human-made to heavily AI-influenced.

As shown in Table 1 above, the researchers collected a robust dataset across all three modes for both task types, providing a solid foundation for the evaluation phase.
Phase 2: The Disclosure Experiment
The core of the research happened in Phase 2. A new set of 786 participants was recruited to act as “raters.” They were asked to review the articles collected in Phase 1.
Here is the twist: The raters were randomly assigned to one of two treatments:
- Non-Disclose Treatment: Raters evaluated the articles blindly. They had no idea if the author used AI or not.
- Disclose Treatment: Before reading, raters were explicitly told how the article was written (e.g., “The draft of this article was generated by ChatGPT…”).
Raters evaluated the texts on overall quality, willingness to hire/shortlist the writer, and specific attributes like creativity and originality. By comparing the scores of the same articles between the “Disclose” and “Non-Disclose” groups, the researchers could isolate the pure effect of knowing about the AI.
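To make the logic of this comparison concrete, here is a minimal sketch in Python with pandas (not the authors’ actual code) of how one could compute the per-article disclosure effect. The column names (`article_id`, `treatment`, `quality`) are assumptions for illustration, not the paper’s schema.

```python
# Sketch of the core comparison: each Phase-1 article is rated under
# both treatments, so the disclosure effect per article is simply the
# gap between its mean ratings across the two rater groups.
import pandas as pd

def disclosure_effect(ratings: pd.DataFrame) -> pd.Series:
    """Mean quality per article under each treatment, then their gap."""
    means = ratings.pivot_table(index="article_id",
                                columns="treatment",
                                values="quality",
                                aggfunc="mean")
    # Negative values = the article scored worse once AI use was disclosed.
    return means["disclose"] - means["non_disclose"]
```

Because treatment assignment is random, any systematic gap in these per-article means can be attributed to the disclosure itself rather than to the text.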
Core Results: The Penalty of Disclosure
The results revealed a significant “penalty” for using AI, but it depended heavily on the type of assistance.
1. Impact on Perceived Quality
When writers used AI merely for editing (polishing text), the disclosure had a minor or negligible negative effect on perceived quality for argumentative essays, though it did hurt creative stories slightly.
However, when writers used AI Generation (where AI drafted the content), the effect was stark.

Figure 1 illustrates this quality drop. Look at the “AI generation” columns on the far right of both charts:
- Teal Bar (Non-disclose): When raters didn’t know AI drafted the text, they gave it high ratings.
- Red Striped Bar (Disclose): When raters knew AI drafted the text, the ratings dropped significantly.
This suggests that the text itself was high quality (as evidenced by the teal bars), but the knowledge of AI involvement caused readers to downgrade their opinion. The bias is particularly strong for creative stories (Chart b), suggesting readers value human effort more in creative domains than in argumentative ones.
2. The Variance Problem: Uncertainty Increases
The disclosure of AI assistance didn’t just lower the average score; it made the scores more chaotic. The researchers looked at the variance in ratings—essentially, how much the raters disagreed with each other.

As shown in Figure 2, disclosing AI generation (the rightmost group in both charts) significantly increased the variance.
Why does this matter? It implies that when AI use is disclosed, evaluation becomes unpredictable. Some raters might not care and rate based on the text; others might be harsh critics of AI. This introduces a high degree of subjectivity and “noise” into the evaluation process. If you are a writer submitting AI-assisted work, your grade depends heavily on who is grading you, rather than just what you wrote.
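A hedged sketch of how such a variance comparison could be computed, reusing the same hypothetical ratings table as above (the `writing_mode` column is likewise an assumed field):

```python
# Group ratings by writing mode and treatment, then compare the spread
# of scores within each cell. Higher variance = raters disagree more.
import pandas as pd

def rating_variance(ratings: pd.DataFrame) -> pd.Series:
    """Variance of quality ratings within each mode x treatment cell."""
    return (ratings
            .groupby(["writing_mode", "treatment"])["quality"]
            .var())
```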
3. Impacts on Creativity and Originality
The researchers dug deeper into specific metrics. Did the disclosure affect perceptions of creativity and originality?


The data confirms that the penalty extends to these dimensions as well. In Figure C.3 (Originality) and Figure C.4 (Creativity), we see the same pattern: the “Disclose” bars (red) are consistently lower than the “Non-disclose” bars (teal) for AI-generated content. Readers seem to believe that if an AI drafted the text, the human author cannot claim the work is “original” or “creative,” even if the final output reads well.
Who Judges the Hardest?
Not all readers react to AI disclosure in the same way. The study identified two key characteristics of the raters that moderated their bias: Writing Confidence and ChatGPT Familiarity.
The Confident Writer Bias
Participants who identified as confident writers themselves were much harsher critics of AI assistance.

Figure 3 shows the “difference” in ratings (Disclose minus Non-Disclose). A negative value means disclosure hurt the score.
- Purple Bars (High Confidence): These raters punished AI generation significantly (large negative bars).
- Dark Blue Bars (Low Confidence): These raters barely changed their scores.
This suggests that people who take pride in their own writing skills may view AI assistance as “cheating” or a shortcut, leading to a harsher penalty.
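In analysis terms, this moderation effect amounts to computing the disclosure penalty separately within each rater subgroup. A sketch under the same assumed schema, with a hypothetical `writing_confidence` column:

```python
# Compute the disclose-minus-non-disclose gap per confidence subgroup.
# A large negative "penalty" marks the harsher judges.
import pandas as pd

def penalty_by_subgroup(ratings: pd.DataFrame) -> pd.DataFrame:
    """Disclosure penalty per writing-confidence group ("high"/"low")."""
    cell_means = ratings.pivot_table(index="writing_confidence",
                                     columns="treatment",
                                     values="quality",
                                     aggfunc="mean")
    cell_means["penalty"] = cell_means["disclose"] - cell_means["non_disclose"]
    return cell_means
```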
The Familiarity Paradox
Interestingly, familiarity with ChatGPT showed a different pattern, particularly for argumentative essays.

In Figure 4 (Chart a, Argumentative essay), the Low Familiarity group (dark blue) actually penalized the AI-generated essays more than the High Familiarity group. This might stem from fear or skepticism among those who don’t use the technology. However, for creative stories (Chart b), both groups applied a penalty, with high-familiarity users being slightly harsher regarding AI editing.
The Consequence: Hiring and Ranking
The most practical implication of this study concerns ranking. In the gig economy or academic admissions, we often care about the “Top 10%.” Does disclosing AI use knock you out of the top tier?

Figure 5 (Chart a) reveals a concerning trend for Argumentative Essays.
- The Green Striped segments represent AI-generated essays in the top tier (Top 10% to 50%).
- When AI use is Disclosed (the right bar in each pair), the green segment shrinks dramatically compared to the Non-Disclose (left bar) scenario.
This means that for argumentative writing, disclosing that you used AI to draft your content significantly reduces your chances of being ranked as a top performer. Interestingly, Chart (b) shows this effect is much weaker for creative stories, likely because AI-generated stories were already rated somewhat lower to begin with, so they weren’t populating the “Top 10%” as heavily even before disclosure.
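The ranking analysis can be sketched in the same spirit: rank articles by their mean rating within each treatment, then ask what share of the top decile is AI-generated. Again, the schema and the `ai_generation` label are illustrative assumptions, not the paper’s code:

```python
# Share of AI-generated articles surviving in the top 10%, per treatment.
import pandas as pd

def top_tier_share(ratings: pd.DataFrame, quantile: float = 0.90) -> pd.Series:
    """Fraction of the top decile that is AI-generated, by treatment."""
    means = (ratings
             .groupby(["treatment", "article_id", "writing_mode"])["quality"]
             .mean()
             .reset_index())
    shares = {}
    for treatment, grp in means.groupby("treatment"):
        cutoff = grp["quality"].quantile(quantile)
        top = grp[grp["quality"] >= cutoff]
        shares[treatment] = (top["writing_mode"] == "ai_generation").mean()
    return pd.Series(shares)
```

If the pattern in Figure 5 holds, this share would shrink sharply in the “disclose” condition for argumentative essays.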
Authorship Attribution
Finally, the researchers asked a fundamental question: When AI helps, do we still give the human credit?

Figure G.1 shows the psychological shift. When AI generation is disclosed (far right bar), the perceived authorship score drops below 3.0. This indicates that readers no longer view the human as the primary “creator” or “owner” of the text, even though the human guided the AI and finalized the draft.
Conclusion and Implications
This research highlights a significant tension in the modern writing workflow. On one hand, AI tools (especially for content generation) can help produce high-quality work efficiently—often rated highly when readers are unaware of the source. On the other hand, transparency carries a penalty.
When writers disclose that they utilized AI for drafting, they face:
- Lower Quality Ratings: Readers downgrade the work, likely due to a bias against “low effort.”
- Loss of Authorship: Readers hesitate to credit the human for the work.
- Unpredictability: The variance in evaluation increases, making success dependent on the specific biases of the reviewer.
- Competitive Disadvantage: AI-assisted works are less likely to appear in “top-ranked” lists when disclosure is mandatory.
What Does This Mean for the Future?
For platforms and policymakers, this suggests that mandatory disclosure labels (e.g., “Written with AI”) function as “warning labels” that fundamentally alter user perception. If we want to encourage ethical disclosure without punishing writers, we may need to rethink how we evaluate writing.
For students and professionals, the takeaway is a cautionary one. While AI can elevate the objective quality of a draft, the social perception of that draft is fragile. As we move forward, society will need to decide: do we care more about the final product, or the human sweat equity required to produce it? The data suggests that, for now, we still deeply value the human touch.