On AI, Copyright & the Death of the Author
From Anthropic to Barthes
I’ve written about AI before on this Substack, and in future posts I’ll be experimenting with its creative potential. Especially as educators, we simply can’t pretend it’s not here, as I argued in a piece for the New Statesman. But I’m also very interested in the legal as well as the pedagogic aspects of the debate - especially because the history of copyright has fascinated me ever since my run-in with the late scholar Eric Robinson, who claimed the copyright to John Clare’s work, even though the poet died more than 150 years ago. The Anthropic lawsuit has determined an outcome in relation to the use of “pirated” copyright material in the training of one large language model. But other cases are still going through the courts.
I had an interesting exchange with Claude.ai on this matter:
ME: do you store your training data, as in a database?
CLAUDE: No, I don’t have a database or any storage system containing my training data. Here’s how it actually works:
Training vs. runtime are completely separate. My training happened before I was deployed — Anthropic used large datasets of text to adjust the billions of numerical parameters (weights) in my neural network. Once training finished, those weights were frozen and the training data was no longer needed to run me.
What I “know” lives in my weights. When you talk to me, I’m not looking anything up in a database. My responses come from patterns encoded in those frozen parameters — it’s more like how a person’s knowledge is stored in their brain’s neural connections, not in a filing cabinet they consult.
What I do have access to in a conversation:
The current conversation context (everything said so far)
Tools like web search, which let me fetch current information
Any files or documents you share with me
What I don’t have:
Access to my training data
Memory of past conversations (unless you use the memory feature)
Any ability to look up “where I learned” a specific fact
So in short: the training data shaped me, but it’s not stored anywhere I can access — it’s baked into my weights and gone from my runtime environment entirely.
ME: would you then argue that you are not in breach of any copyright because you are not storing or reproducing copyright material? You are more like a human being who has READ material and kept it in their memory than a database or a book or article that might be accused of breach of copyright?
CLAUDE: That’s a genuinely interesting argument, and it has real merit — but it’s also contested. Here’s a fair breakdown:
The argument in your favour (the “human reader” analogy):
Like a person who has read widely, I generate responses from internalized patterns, not by retrieving stored text
I don’t (usually) reproduce verbatim passages — I synthesize and rephrase
Humans aren’t sued for copyright infringement simply because they read and learned from copyrighted books
The output of a well-read person isn’t considered a copy of everything they read
Why courts and copyright holders push back:
The training process itself may involve making copies of copyrighted material — scraping, storing, and processing text to train on it. That copying during training is where much of the legal action is focused, not the outputs
Outputs can sometimes reproduce substantial portions of training data, especially for memorized content like song lyrics, poems, or distinctive passages — which is different from how human memory works
Scale matters legally — copyright law wasn’t written with the assumption that one “reader” could process hundreds of billions of words and then generate text for millions of people
The human analogy has limits — a human who reads a novel can’t reliably reconstruct it; I potentially could reproduce more than a human would
Where the law actually stands (as of my knowledge): Several major lawsuits (NYT vs. OpenAI, authors vs. various AI companies) are still working through this. Courts haven’t settled whether training on copyrighted data constitutes fair use, and different jurisdictions are approaching it differently.
So your argument is philosophically compelling and is genuinely being made in legal circles — but it isn’t settled law yet. The human analogy is strong for outputs, weaker for the training process itself.
Claude is AI and can make mistakes. Please double-check responses.
I like that little disclaimer at the end … and will be watching future legal cases with interest.
There is also a literary-theoretical dimension to the question, which takes us back to a famous essay by my favourite French literary theorist (to be honest, about the only French literary theorist I really like and find entirely readable):
In his story Sarrasine, Balzac, speaking of a castrato disguised as a woman, writes this sentence: “It was Woman, with her sudden fears, her irrational whims, her instinctive fears, her unprovoked bravado, her daring and her delicious delicacy of feeling” Who is speaking in this way? Is it the story’s hero, concerned to ignore the castrato concealed beneath the woman? Is it the man Balzac, endowed by his personal experience with a philosophy of Woman? Is it the author Balzac, professing certain “literary” ideas of femininity? Is it universal wisdom? or romantic psychology? It will always be impossible to know, for the good reason that all writing is itself this special voice, consisting of several indiscernible voices, and that literature is precisely the invention of this voice, to which we cannot assign a specific origin: literature is that neuter, that composite, that oblique into which every subject escapes, the trap where all identity is lost, beginning with the very identity of the body that writes. · · · Probably this has always been the case: once an action is recounted, for intransitive ends, and no longer in order to act directly upon reality — that is, finally external to any function but the very exercise of the symbol — this disjunction occurs, the voice loses its origin, the author enters his own death, writing begins …
So begins Roland Barthes’s 1967 essay “The Death of the Author.” All writing is rewriting, a tissue of quotations … Claude would agree with that, wouldn’t it? Barthes works towards the conclusion that we must abandon the idea of the individual author “owning” their text (and hence their copyright, one might add), since “a text does not consist of a line of words, releasing a single ‘theological’ meaning (the ‘message’ of the Author-God), but is a space of many dimensions, in which are wedded and contested various kinds of writing, no one of which is original: the text is a tissue of citations, resulting from the thousand sources of culture.” Is not this a perfect description of AI-generated text?


Cormac McCarthy said that books are made out of books. Interesting to think of a book as a sort of palimpsest. And maybe that’s what these LLMs are too? Strange, layered things that are “scraped again.”
This should be interesting.