A Case for Watermarking Generative Models

Laurence Liang

This essay is a work in progress. New paragraphs may emerge, and existing ones may evolve.

It is September 2025.

Anyone with reliable Internet access can generate text, audio, images, video, and code quickly and very cheaply.

There are some direct implications, including but not limited to the two questions below.

A first question that arises is: will these foundation models eventually replace human jobs for these respective tasks?

We can draw an interesting analogy to speculative decoding: at a very high level, a very large, costly model $L(X)$ periodically verifies the outputs of a small, cheap and fast model $P(X)$. Speculative decoding, while it may sound paradoxical at first in terms of cost and speed improvements, works in practice. We can apply the same analogy to generative models and humans - perhaps a human plays the role of the verifier $L(X)$, because humans are more sensitive to the energy they expend, while a generative model plays the role of $P(X)$, running faster and longer within reasonable bounds, with the human checking in at regular intervals that the model is doing its job correctly. To answer the first question, it's possible that foundation models will not replace human jobs so much as shift humans into the role of verifiers.
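To make the analogy concrete, here is a toy sketch of the draft/verify loop in speculative decoding. Everything here is a stand-in: `draft_model` and `target_model` are hypothetical toy functions rather than real model APIs, and the actual speed win comes from the target model verifying the whole drafted run in a single batched pass, which this sketch does not simulate.

```python
import random

random.seed(0)

def draft_model(context):
    """Cheap, fast proposer P(X): guesses the next token (toy heuristic)."""
    return (context[-1] + 1) % 50 if context else 0

def target_model(context):
    """Costly verifier L(X): the token we treat as ground truth (toy heuristic)."""
    nxt = (context[-1] + 1) % 50 if context else 0
    # Occasionally disagrees with the draft, forcing a correction.
    return nxt if random.random() < 0.9 else (nxt + 7) % 50

def speculative_decode(prompt, num_tokens, draft_len=4):
    """Draft `draft_len` tokens cheaply, then have the target model verify them."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < num_tokens:
        # 1) The small model drafts a short run of tokens.
        drafted, ctx = [], list(tokens)
        for _ in range(draft_len):
            t = draft_model(ctx)
            drafted.append(t)
            ctx.append(t)
        # 2) The large model checks each drafted token; on the first
        #    disagreement it substitutes its own token and the rest
        #    of the draft is thrown away.
        for t in drafted:
            verified = target_model(tokens)
            if verified == t:
                tokens.append(t)        # accept the cheap guess
            else:
                tokens.append(verified) # reject and correct
                break
    return tokens[len(prompt):len(prompt) + num_tokens]

print(speculative_decode([0], num_tokens=12))
```

The shape of the loop is the point: the cheap model does most of the generating, and the expensive one only intervenes when the draft goes off the rails - which is roughly the role I am suggesting humans could play.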

A second question that arises is: if foundation models are so accessible and cheap, is there still a demand for human-generated work?

Let's take the case of writing. Anecdotally, people can still tell when a blog post or essay has the "characteristics" of being generated by a language model - and people still yearn for meticulously crafted written works by human authors.

This may not hold universally. A simple counter-example is a hackathon environment, where participants simply want code that works, not necessarily hand-crafted code (though there was a time when hand-crafted code was commonplace).

However, assuming that the demand for LLM-free content is still generalizable to a variety of other tasks, how can we verify that an output (text, image, audio or other) is devoid of any LLM-generated content?

A complication is that language models are already in the wild. They are not sandboxed: text on the web that came from a language model carries no metadata linking it back to that model - the text, by itself, is free of provenance.

Logging every model out there may also prove difficult. Open weights models can run on any local machine, and the recipes for training them are public. Sure, maybe there are moats around the most sophisticated models, which only the select companies that can afford the compute are able to run. Perhaps the most capable frontier models can be fully sandboxed and logged. But even an "average" open weights model today can generate convincingly consistent output in a variety of modalities.

It would also be difficult to watermark every LLM out there, unless we could magically recall all of the existing open weights models. Or maybe we could hope that all future LLM releases will carry identifiable, jailbreak-safe watermarks in their outputs, and that these models will eventually phase out pre-watermark models by sheer volume. Though this is speculation.
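To make "identifiable watermarks in the outputs" concrete: one published family of token-level schemes, often described as "green list" watermarking, pseudorandomly splits the vocabulary at each generation step (seeded by the previous token) and biases sampling toward the "green" half; a detector that knows the seeding rule counts green tokens and checks whether the count is improbably high. Below is a minimal detector-side sketch under that assumption - the function names and constants are hypothetical, and a real system would operate on tokenizer IDs and model logits.

```python
import hashlib
import math

GREEN_FRACTION = 0.5  # fraction of the vocabulary marked "green" at each step

def is_green(prev_token: int, token: int) -> bool:
    """Pseudorandomly assign `token` to the green list, seeded by the previous token."""
    digest = hashlib.sha256(f"{prev_token}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GREEN_FRACTION

def detect(tokens: list) -> float:
    """Return a z-score: how far the green-token count sits above chance.

    A generator that biased its sampling toward green tokens leaves a
    statistical trace; unwatermarked text should hover around z ~ 0."""
    n = len(tokens) - 1
    if n <= 0:
        return 0.0
    hits = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    expected = GREEN_FRACTION * n
    std = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (hits - expected) / std
```

The catch is that detection relies on the generator having cooperated in the first place: an open weights model that never applied the green-list bias scores like noise, which is exactly the recall problem above.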

Perhaps the following points are nothing more than speculation as well.

One approach is analogous to the tests in Blade Runner that assess whether a subject is human or android: a series of questions and answers.

Perhaps we would have to have a long conversation with a model to determine if it has a breaking point.

Or maybe we could just find one-shot cases that require a very long output. Suppose that for any generated output $o_k$ there exists a length $L_k$ such that, once $o_k$ grows longer than $L_k$, there emerges some content $s_k \in o_k$ that is identifiably from a language model.
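Written out (my formalization of the claim above, with $|o_k|$ denoting the length of the output):

$$
\forall\, o_k \;\; \exists\, L_k \;\text{ such that }\; |o_k| > L_k \;\Longrightarrow\; \exists\, s_k \in o_k \;\text{ that is identifiably model-generated.}
$$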

Proving the existence of such content $s_k$, and that a length $L_k$ exists for every single generated output $o_k$, may be challenging. Though if we assume that current open weights models cannot be put back in a sandbox, nor be superseded by future releases of watermarked foundation models, the $L_k$ approach may be a first remedy for identifying LLM-generated outputs.

Though I would love to be proven wrong.