Table of Contents
- Why context matters in sensitive conversations
- Improving safety across conversations
- Working with mental health experts
- Measuring improvement
- Looking ahead
With a new safety update, ChatGPT is now better able to respond safely even when risk emerges gradually over time.
Every day, people talk with ChatGPT about a wide range of important topics, from everyday questions to more personal and complex conversations. Among the hundreds of millions of interactions are conversations with people who may be distressed or struggling. We design our systems to respond carefully in these moments, including by directing people to crisis resources or connecting them with a trusted contact when appropriate.
Today, we’re sharing details about a new safety update that helps ChatGPT recognize subtle or gradually emerging signals, so it can more accurately detect when risk is building during a conversation and reflect that context in its responses. This helps ChatGPT better distinguish between the hundreds of millions of safe interactions people have every day and the rarer situations that require extra care, enabling more thoughtful responses such as de-escalating the situation, refusing to provide dangerous details, or guiding someone toward safer alternatives.
These improvements build on years of broader work, including model training, evaluation, monitoring systems, and more than two years of collaboration with mental health and safety experts.
Why context matters in sensitive conversations
In sensitive conversations, context matters just as much as the individual message. A request that seems ordinary or ambiguous on its own can take on a very different meaning when considered alongside prior signs of distress or potentially harmful intent. To respond appropriately, we train ChatGPT to recognize potentially harmful intent in surrounding context so it can refuse requests, de-escalate situations, or connect people to help.
These cases are rare, but they matter enormously and must be handled correctly. Our goal is for ChatGPT to connect the relevant signals when needed, while avoiding overreacting in normal conversations.
This work focuses on acute scenarios involving suicide, self-harm, and harm to others. Working with mental health experts, we updated our model policies and training so ChatGPT can better identify warning signs that emerge gradually throughout a conversation and use that context to respond more carefully.
In these rare, high-risk situations, ChatGPT is now better able to distinguish between a harmless request and one that may indicate a greater risk of harm. This builds on our safe completion approach, which aims to refuse the unsafe parts of a user’s request while still responding helpfully when it can do so safely. Our goal is to respond more appropriately to context: becoming more cautious when there are signs of harm in the conversation, while remaining helpful in benign situations.
Improving safety across conversations
Safety risks can emerge gradually across multiple conversations. One conversation may contain subtle signs of potentially harmful intent, and a related request in another conversation may only become concerning when combined with earlier context. Without that safety context, later conversations—and any important warning signs they may contain—can appear harmless.
Building on years of work helping ChatGPT better recognize signs of distress, we developed safety summaries. These are concise, factual records of prior safety-related context in a small number of high-risk cases. The summaries are generated by models trained specifically for safety reasoning tasks. They have a tightly limited scope and short retention periods, and they are used only when there is a significant safety concern. The goal is to preserve factual safety context, not to serve as general personalization or long-term memory. As noted above, we also train ChatGPT to use this context more carefully, so it can better identify when extra caution is needed and respond appropriately by de-escalating, refusing to provide details, or offering safer alternatives.
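To make the constraints on safety summaries concrete, here is a minimal sketch of how a narrowly scoped, short-lived record might be filtered before use. It is an illustration only, not a description of the actual system: the `SafetySummary` fields, the `usable_summaries` helper, the 7-day retention window, and the 0.8 risk threshold are all hypothetical, since this post does not specify any of them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical values: the post states neither a retention period nor a risk scale.
RETENTION = timedelta(days=7)
RISK_THRESHOLD = 0.8


@dataclass(frozen=True)
class SafetySummary:
    """Illustrative record of prior safety-related context (not a real schema)."""
    text: str             # concise, factual note about the earlier context
    risk_score: float     # hypothetical score from 0.0 (benign) to 1.0 (acute)
    created_at: datetime  # when the summary was written


def usable_summaries(summaries, now):
    """Keep only summaries that are both high-concern and still within retention."""
    return [
        s for s in summaries
        if s.risk_score >= RISK_THRESHOLD      # significant safety concern only
        and now - s.created_at <= RETENTION    # short retention: drop stale records
    ]


now = datetime(2025, 1, 10, tzinfo=timezone.utc)
recent = SafetySummary("example note", 0.9, now - timedelta(days=1))
stale = SafetySummary("example note", 0.9, now - timedelta(days=30))
minor = SafetySummary("example note", 0.2, now - timedelta(days=1))
kept = usable_summaries([recent, stale, minor], now)
```

In this sketch only `recent` survives the filter: `stale` falls outside the assumed retention window and `minor` is below the assumed threshold, which mirrors the idea that summaries are not general-purpose memory.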
Working with mental health experts
We co-developed these systems with mental health experts across our Global Physician Network, including psychiatrists and psychologists with expertise in forensic psychiatry, suicide prevention, and self-harm.
These experts helped us determine when safety summaries should be created, how much prior context may be relevant, and how long the model should consider that context when responding. Their input helps ground this work in real-world clinical expertise and supports more appropriate responses in sensitive situations.
Measuring improvement
With these updates, ChatGPT is better able to recognize patterns of potentially harmful intent both within a conversation and across conversations. Even when concerning signals emerge gradually, it can more effectively identify those patterns and respond more safely.
In internal evaluations designed to measure performance on difficult cases, we saw major improvements in safe responses when risk becomes clear over time. These tests measure how often the model can produce the intended safe response in simulated high-risk conversations.
In long single-conversation scenarios, safe response performance improved by 50% for suicide and self-harm cases and by 16% for harm-to-others cases. This means the model is more often able to understand how earlier parts of a conversation change the meaning of a later request and respond appropriately.
We also tested performance across multiple conversations and multiple models to ensure these improvements hold up as models evolve. In GPT‑5.5 Instant, the current default model in ChatGPT, safe response performance improved by 52% for harm-to-others cases and by 39% for suicide and self-harm cases.
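As a rough illustration of how a figure like "improved by 50%" could be computed, the sketch below derives a safe response rate from pass/fail outcomes and compares two runs. The function names and the sample outcomes are invented for this example, and the post does not say whether its percentages are relative changes or percentage-point changes, so both are shown.

```python
def safe_response_rate(outcomes):
    """Share of simulated conversations where the intended safe response was produced."""
    if not outcomes:
        raise ValueError("need at least one evaluated conversation")
    return sum(1 for ok in outcomes if ok) / len(outcomes)


def relative_improvement(before, after):
    """Change in rate expressed as a fraction of the starting rate."""
    if before <= 0:
        raise ValueError("baseline rate must be positive")
    return (after - before) / before


# Made-up outcomes for illustration only (True = intended safe response).
baseline = [True] * 40 + [False] * 60
updated = [True] * 60 + [False] * 40

before = safe_response_rate(baseline)
after = safe_response_rate(updated)
print(f"relative improvement: {relative_improvement(before, after):.0%}")
print(f"absolute change: {(after - before) * 100:.0f} points")
```

With these invented figures, moving from a 40% to a 60% safe response rate is a 50% relative improvement but a 20-point absolute change, which is why it helps to know which convention a reported number uses.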
We also evaluated the quality of the safety summaries themselves. Across more than 4,000 evaluations, the average score for safety relevance was 4.93/5, the average score for factuality was 4.34/5,