AI-padded “filler” papers face a one-year ban and penalties for every coauthor! arXiv’s strictest new rules are here, with Terence Tao in support

· 量子位

Anyone who submits a paper padded with AI-generated content will, once the violation is confirmed, be banned from submitting for one year.

Even after the ban ends, any new submission must first pass peer review before it can appear on arXiv.

That is the new rule announced on X by Thomas Dietterich, head of arXiv’s computer science section. The wording is firm and leaves no room for debate.

If there is solid evidence that a paper contains LLM-generated content that the authors have not verified, all listed authors will be sanctioned together. There are no exceptions.

Following the announcement, Terence Tao also weighed in.

As one of the mathematicians best known for actively embracing AI, he posted on Mathstodon and checked the new policy against each of four proposals he had previously put forward. He then summed it up as follows:

In an era when generating a paper is far easier than reading and digesting one, this is a step in the right direction.

The post sparked a huge response online. Some applauded it, while others said this should have been done long ago.

On the other hand, some raised the question: “So does that mean research use of the internet should have been banned back in 2005?”

Others criticized the policy more broadly, asking: “Under this kind of collective responsibility, are all coauthors supposed to verify every single reference in a paper?”

arXiv’s strictest new policy: you are responsible the moment you sign

In physics, mathematics, and computer science, most papers are first posted on arXiv as preprints before entering peer review.

arXiv is one of the most important pieces of infrastructure supporting global scholarly communication. It is currently preparing to separate from Cornell University and transition into an independent nonprofit organization, which could give it more resources and agility to take a more active role in operational improvements.

The essence of arXiv’s new policy is just one thing: once you sign, you are responsible, regardless of how the content was produced.

Here is what Thomas Dietterich said:

If generative AI tools produce inappropriate language, plagiarism, biased content, errors, false citations, or misleading material that is incorporated into scholarly writing, all authors are responsible.
We have recently clarified the sanction standard for this kind of conduct.

It is worth noting that the new policy does not ban AI use itself. It is fine to use AI to polish writing or assist with literature review.

The line arXiv is drawing lies elsewhere: whether you have actually read the paper you are putting your name on.

So what counts as “solid evidence”? Dietterich gave several examples.

One is a “hallucinated citation” to a paper that does not exist.

Another is an LLM meta-comment left in the manuscript, such as, “This is a 200-word summary. Would you like me to revise it?”

A third is an unfilled placeholder left in a table, such as, “Insert experimental data here.”

If a formally submitted paper still contains such elementary mistakes, it suggests the authors handled it carelessly and did not seriously check their own work.

And once such evidence is found, the rest of the paper is also treated as untrustworthy.

If a violation is confirmed, the authors will be banned from submitting to arXiv for one year, and after the ban is lifted, any new submission must first go through formal peer review.

Moreover, the punishment is collective: every listed author is subject to it.

Dietterich says this is a “one-strike” rule, though authors do have the opportunity to appeal.

Internally, the process is said to work as follows: a moderator first records the issue, and then the section head reviews it before the sanction is carried out.

Even so, for a sanction that targets elementary mistakes, arXiv’s new penalty is undeniably strict.

Tao also supports it: digesting a paper is harder than producing one

The new policy quickly became a major talking point, and Tao offered his view in response.

He made several posts on Mathstodon, using the four proposals he had presented in a talk as the framework for evaluating arXiv’s new policy point by point.

Those four proposals were:

  1. Clearly define and strictly enforce the scope of acceptable AI assistance in traditional workflows;
  2. Avoid overemphasizing “being first” or “solving the problem,” and instead place more value on “digesting” the result;
  3. Create new kinds of challenge problems for submitters who rely heavily on AI, so that real scholarly value can still be produced;
  4. Clearly explain a project’s goals (both explicit and implicit) and why they matter, whether the workflow is traditional or nontraditional.

His overall view was that the new policy aligns very strongly with the first two proposals.

The scope of responsibility arXiv has now set out is a direct implementation of Proposal 1: it explicitly defines what counts as “acceptable AI assistance.”

The overall philosophy behind the new policy also directly reflects the core of Proposal 2.

Tao’s key judgment was this: in today’s world, producing a paper is much easier than understanding and digesting one.

For that reason, any move that nudges the balance in traditional academia back toward digestion should be welcomed.

As for Proposal 3, Tao believes there is already a place for it.

Platforms like viXra place almost no restrictions on AI-assisted submissions and can serve as independent archives for content that has not been digested.

But his assessment is clear: such venues are optimized for “production,” not for “digestion,” and should not be part of the legitimate scholarly citation circle. They should not be cited in arXiv papers or journal references.

In other words, there is a place for people who want to mass-produce papers with AI, but don’t assume that means entry into the legitimate academic system.

That said, some commenters pointed out that even this outlet is not as open as it seems.

viXra itself also prohibits AI-generated papers and reportedly directs such submissions to a separate site, ai.viXra.org.

Proposal 4 raises a more fundamental question: who are preprint platforms actually for?

In Tao’s view, authors’ needs and readers’ needs are interdependent, and both must be satisfied at the same time.

As he put it:

A platform that serves only authors will be flooded with low-quality submissions, which readers do not want to read.
A platform that serves only readers will deter authors with cumbersome submission procedures.
As a new mismatch emerges between paper generation and paper understanding, the quality bar will inevitably rise. But in the end, that is a desirable change for both sides.