OpenAI Is Giving Free ChatGPT to U.S. Doctors — 99.6% Safe in Pre-Launch Testing, but Read the Fine Print

By easyAI Team · 11 min read · 2026-04-28

On April 22, 2026, OpenAI released ChatGPT for Clinicians, a free workspace built for U.S. medical professionals. It runs on GPT-5.4, searches clinical literature in real time, cites its sources, and automates administrative tasks like referral letters and prior authorizations.

In pre-launch testing across roughly 7,000 clinical conversations, physician evaluators rated 99.6% of responses as safe and accurate. The remaining 0.4%, about 4 out of every 1,000 responses, were flagged as unsafe or inaccurate. Whether those numbers hold in real clinical use is an open question.

Who Can Use It?

ChatGPT for Clinicians is limited to verified U.S. healthcare professionals:

  • Physicians (MDs and DOs)
  • Nurse practitioners (NPs)
  • Physician assistants (PAs)
  • Pharmacists

Registration is individual. You don't need your hospital or health system to sign a contract. If you hold one of those credentials and can verify it, you can sign up.

General users cannot access this product. The gating matters. Medical AI tools carry different risks than general-purpose chatbots. Restricting access to licensed professionals means the person interpreting the output has clinical training to catch errors. That's the theory, at least.

OpenAI says additional countries are planned but hasn't announced timing.

What's Under the Hood?

The model is GPT-5.4. On top of the base model, OpenAI added several healthcare-specific features.

Real-time clinical search. The model pulls from medical journals and peer-reviewed sources during conversations. Responses include citations so clinicians can verify the underlying evidence.

CME credit. CME (Continuing Medical Education) is the ongoing training that licensed medical professionals are required to complete. ChatGPT for Clinicians offers a research mode where interactions can count toward CME credits. For doctors who already spend time reading journals, getting CME credit from an AI-assisted research session is a practical incentive.

HIPAA compliance option. Through a BAA (Business Associate Agreement), clinicians can use the tool in a way that meets HIPAA requirements for handling protected health information. Without a BAA, patient data should not be entered into the system.

Workflow automation. The tool can draft referral letters, prior authorization requests, patient instructions, and clinical notes. These are the administrative tasks that consume a significant portion of a clinician's workday.

What Does 99.6% Actually Mean?

This number requires careful reading.

OpenAI ran approximately 7,000 real clinical conversations (6,924 to be precise) through the system before launch. Physician advisors evaluated each response across three categories: clinical care, documentation, and research.

99.6% were rated safe and accurate. That means roughly 28 out of 6,924 were not.

Two things to understand about this number.

First, this is a pre-launch evaluation, not a clinical trial. The conversations were structured tests reviewed by physician advisors working with OpenAI. They were not live patient encounters in an emergency room at 3 AM with incomplete information and time pressure. The gap between controlled evaluation and real-world clinical use is significant in medicine.

Second, 0.4% sounds small. In most software contexts, 99.6% accuracy is excellent. In medicine, the math works differently. If a hospital system processes 10,000 AI-assisted interactions per day, 0.4% means 40 potentially unsafe responses daily. Whether those responses reach patients, and whether the clinician catches the error before acting on it, determines whether the 0.4% matters or doesn't.
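
For a sense of scale, here's a back-of-envelope version of that math in Python. The daily volumes are hypothetical, and the flag rate is the pre-launch figure, which may not hold in real-world use:

```python
# Expected flagged responses per day at a given interaction volume,
# using the 0.4% pre-launch flag rate. Volumes are hypothetical;
# real-world rates may differ from a controlled evaluation.
FLAG_RATE = 1 - 0.996  # 0.4% of responses flagged unsafe or inaccurate

for daily_volume in (1_000, 10_000, 100_000):
    expected = daily_volume * FLAG_RATE
    print(f"{daily_volume:>7,} interactions/day -> ~{expected:.0f} flagged responses")
```

At 10,000 interactions a day, that's the 40 potentially unsafe responses above; at 100,000, it's 400.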

OpenAI says physician advisors continue to review outputs on an ongoing basis after launch.

Two Modes: Workflows and Research

The product has two primary use cases.

Workflow automation. A physician finishes a patient visit and needs to write a referral letter, submit a prior authorization to an insurance company, and prepare discharge instructions. Each of these tasks follows a predictable structure and requires specific clinical details from the encounter. ChatGPT for Clinicians can draft all three from the conversation context.

The appeal is time. U.S. physicians report spending nearly as much time on documentation and administrative work as they do on direct patient care. If AI handles the structured writing, the clinician reviews and edits rather than drafting from scratch. That's a meaningful workflow change.
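
ChatGPT for Clinicians is a web workspace rather than an API, but a minimal sketch using the standard OpenAI Python client illustrates the draft-then-review pattern. The model name, system prompt, and encounter summary below are illustrative assumptions, not the product's actual interface:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical encounter summary. Without a BAA in place, real patient
# data should never be sent to the API.
encounter = (
    "58-year-old with poorly controlled type 2 diabetes (A1c 9.2%), "
    "referred to endocrinology for insulin management."
)

response = client.chat.completions.create(
    model="gpt-5.4",  # illustrative name from the article, not a confirmed API identifier
    messages=[
        {
            "role": "system",
            "content": (
                "Draft a concise referral letter for clinician review. "
                "Flag missing clinical details instead of inventing them."
            ),
        },
        {"role": "user", "content": encounter},
    ],
)

# The output is a first draft; the clinician reviews and edits before sending.
print(response.choices[0].message.content)
```

The design point is in the last line: the output is a first draft, and the clinician remains the reviewer of record.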

Research mode. A clinician wants to review the latest evidence on a drug interaction or a treatment protocol. The research mode searches medical literature, summarizes findings, cites sources, and can apply the interaction toward CME credit requirements.

This is where the CME feature becomes strategically important. Doctors are required to complete CME hours to maintain their licenses. If those hours can be earned through an AI-assisted research workflow that also helps with patient care decisions, the adoption incentive aligns with an existing professional obligation.

Why Does HealthBench Professional Matter More Than the Product?

On the same day, OpenAI released HealthBench Professional, an open benchmark for evaluating AI performance in clinical contexts.

HealthBench Professional measures AI models across three domains:

  • Care consult: How well does the model handle clinical advisory scenarios?
  • Writing and documentation: How accurate and complete are generated clinical documents?
  • Medical research: How well does the model retrieve, synthesize, and cite medical evidence?

Across all three domains, the benchmark also measures hallucination rates, safety violations, and guideline adherence.

Here's why this may be more important than the product itself. Right now, there is no standard way to compare medical AI tools. Every company runs its own internal evaluations with its own datasets and its own definitions of "safe." HealthBench Professional is open. Any AI lab, health system, or researcher can use it to evaluate any model.

If HealthBench Professional becomes the accepted standard, it creates a common ruler. Anthropic's Claude, Google's Gemini, and every healthcare AI startup would be measured against the same criteria. That transparency benefits everyone except companies whose products don't perform well under scrutiny.
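
To make the "common ruler" idea concrete, here's a minimal sketch of what a shared evaluation harness looks like: the same cases and rubric applied to any model. Every name here (Case, criterion_met, evaluate) is hypothetical, not the actual HealthBench Professional harness:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Case:
    domain: str         # "care_consult", "documentation", or "research"
    prompt: str
    rubric: list[str]   # criteria a grader checks in the response

def criterion_met(answer: str, criterion: str) -> bool:
    # Placeholder grader: real benchmarks use physician or model-based
    # graders, not keyword matching.
    return criterion.lower() in answer.lower()

def evaluate(model_fn: Callable[[str], str], cases: list[Case]) -> dict[str, float]:
    """Apply the same cases and rubric to any model; return per-domain scores."""
    per_domain: dict[str, list[float]] = {}
    for case in cases:
        answer = model_fn(case.prompt)
        score = sum(criterion_met(answer, c) for c in case.rubric) / len(case.rubric)
        per_domain.setdefault(case.domain, []).append(score)
    return {d: sum(s) / len(s) for d, s in per_domain.items()}
```

The point is the interface, not the grading logic: because evaluate takes any model callable, the same cases produce comparable per-domain scores for any lab's model.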

What Do Clinicians Get Out of This?

The value proposition is direct.

Less time on paperwork. Referral letters, prior authorizations, and clinical notes are necessary but repetitive. Automating the first draft saves time on every patient encounter.

Free CME credits. For a profession that requires ongoing education to maintain licensure, earning credits through a tool you're already using for clinical work is efficient.

Evidence access. Real-time literature search with citations is faster than manually searching PubMed or UpToDate, especially for questions that come up during patient encounters.

The limits are also direct. AI performs differently across specialties. A prompt that works well for primary care may produce less useful results for a subspecialty like pediatric cardiology. The tool's utility will vary by clinician, by specialty, and by the complexity of the clinical question.

What Are the Concerns?

The 4-in-1,000 problem. If 0.4% of responses are unsafe, the question is what happens when one of those responses involves a medication dosage, a drug interaction, or a diagnostic recommendation. The clinician is expected to catch the error. In practice, alert fatigue, time pressure, and over-trust in AI outputs can reduce that safety net.

HIPAA coverage is not automatic. The BAA option exists, but clinicians must actively set it up. Without a BAA in place, entering patient information into ChatGPT creates a HIPAA compliance risk. The distinction between "HIPAA-capable" and "HIPAA-compliant by default" matters.

FDA classification is unresolved. The FDA has not classified ChatGPT for Clinicians as a medical device. If it were classified as one, it would face a different regulatory pathway with different requirements for clinical validation. The current ambiguity means the product operates in a space where the rules haven't been written yet.

Liability is unclear. If a clinician acts on an AI-generated recommendation that harms a patient, the liability question is: does the physician bear full responsibility (as with any clinical decision), or does OpenAI share liability for the tool's output? This question has no settled legal answer in the U.S. as of this writing.

What Does This Mean for Other AI Labs?

If HealthBench Professional gains traction as a standard, every AI company building healthcare tools will need to publish results against it. That includes Anthropic, Google, and the growing number of healthcare-specific AI startups.

The companies that welcome this are the ones confident in their safety and accuracy numbers. The companies that resist it are the ones who prefer to define their own evaluation criteria.

OpenAI releasing the benchmark as open is a competitive move disguised as a public good. It sets the evaluation framework on OpenAI's terms while appearing generous. Whether the benchmark is genuinely comprehensive and fair is something the medical AI community will determine as it gets used.

ChatGPT for Clinicians is free, available now, and limited to verified U.S. medical professionals. HealthBench Professional is open and available to anyone. The product serves doctors. The benchmark may reshape how all medical AI gets evaluated.

Follow @easyai.ai for more breakdowns like this.

---

Want more?

Browse our prompt packs, guides, and automation tools.

Browse products →
