What Are My Copays?
How a three-layer AI architecture answers the question a generic assistant can't.
Ask a generic AI assistant what your Medicare copays are and it will tell you that copays vary by plan, typically ranging from a few dollars for primary care to a hundred or more for specialist visits, and that you should check your Evidence of Coverage for specifics.
That answer is not wrong. It is also not useful.
In April, I built a proof-of-concept Medicare Navigator. A user had completed onboarding — Medicare Advantage plan selected, Humana H7617-111 on record — and uploaded their plan documents: Summary of Benefits and Evidence of Coverage. They opened a Q&A session and asked: “What are my copays?”
The Navigator returned in-network figures: $0 PCP, $45 specialist, $15 urgent care, $130 ER, $400/day inpatient (days 1–7), $500 deductible, $6,750 out-of-pocket maximum. Attributed to the Humana H7617-111 Summary of Benefits.
A follow-up: “Do I need pre-approval for anything?”
The Navigator returned 15-plus service categories requiring prior authorization, cited the Evidence of Coverage as the source, and correctly noted, based on the plan documents and PPO plan type, that no referral was required.
That is a plan-specific answer drawn from the user’s actual documents. It is not a range. It is not a redirect. The question had document-specific answers, and the Navigator found the relevant ones.
The Friction
Medicare is an unusually punishing environment for generic AI. The plan landscape is vast — thousands of Medicare Advantage plans, each with different cost structures, formularies, network designs, and prior authorization requirements. A specialist copay that’s $45 on one plan is $0 on another. Prior auth requirements that apply to every specialist visit on one plan don’t apply at all on another. The correct answer to almost any specific cost question is: it depends on your plan.
Generic AI knows this. So it hedges. It gives ranges. It says to check your documents. These answers are technically accurate and practically inert — they confirm what the user already suspected (that copays exist and vary) without answering what the user actually needs (what their copay is).
The consequences are not trivial. A Medicare beneficiary who underestimates their annual out-of-pocket exposure can end up materially under-resourced for care costs. The gap between a generic answer and the correct plan-specific answer is not merely a quality difference; in this domain, it can change a meaningful financial decision.
The correct answer requires knowing three things: how Medicare works as a system, what this user’s situation is, and what this user’s plan actually says. A generic assistant working from training data has the first and at best a partial version of the second, but not the third. That’s not a prompting failure. The plan document isn’t in the model. No amount of prompt engineering puts it there.
The Build
The Navigator stack has three layers. Each is load-bearing for a different part of the answer. Each does a different kind of work.
Layer 1: The knowledge file. A structured, governed representation of Medicare as a system — how Parts A, B, C, and D work; what prior authorization means and how it differs from a referral; what an Evidence of Coverage document is; what coinsurance is and how it differs from a copay; how coordination of benefits works between Medicare and a secondary payer. A governed Medicare knowledge file was included in the Q&A context on every call, with plan documents given precedence for plan-specific answers. Without it, the Navigator can retrieve plan-specific figures but cannot interpret them correctly in context.
Layer 2: The user profile. Built during onboarding — plan selection, coverage type, enrollment status, insurer. This is what scopes every answer to the correct frame. When the demo user asked about copays, the profile record showing Humana H7617-111 / Medicare Advantage told the Navigator to surface the MA cost-sharing schedule — not Original Medicare rates, not generic MA averages. The profile also constrained the prior-auth answer: because the plan type was PPO, the Navigator correctly reported no referral required, even though prior authorization for specific services was required. Those are different requirements, and the profile provided the plan-type context needed to distinguish them.
Layer 3: The extracted documents. The user’s uploaded Summary of Benefits and Evidence of Coverage — each PDF extracted via Gemini, stored as plain text in the database, and injected into the Q&A context on every call. This is the layer that makes plan-specific answers possible. The copay figures, the prior authorization list, the out-of-pocket maximum — all of it came from the extracted document text, not from the model’s training data. The system prompt policy was explicit: plan documents take precedence over general knowledge for plan-specific questions; cite which document.
The pipeline: user uploads PDF → extraction edge function sends the document to Gemini and stores plain text in the database → at inference time, the Q&A function retrieves all processed documents for the user and injects them into context → the answer is generated with plan documents, user profile, and Medicare knowledge file all present. For the POC, this was context injection rather than production-grade selective retrieval: all processed documents were included in full. That worked at demo scale, but it would not scale to many long documents without chunking, reranking, or document routing.
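The context-injection step can be sketched roughly as follows. This is a minimal Python illustration, not the Navigator's actual code: the names UserProfile, PlanDocument, and build_context, the section labels, and the exact policy wording are all hypothetical, and the real POC ran in edge functions against a database rather than in-memory objects.

```python
from dataclasses import dataclass


@dataclass
class UserProfile:
    insurer: str
    plan_id: str
    coverage_type: str  # e.g. "Medicare Advantage"
    plan_type: str      # e.g. "PPO"


@dataclass
class PlanDocument:
    title: str       # e.g. "Summary of Benefits"
    text: str        # plain text produced by the extraction step
    processed: bool  # False while extraction is still pending


# Hypothetical wording. The essay only states the policy's intent:
# documents take precedence for plan-specific questions, and must be cited.
POLICY = (
    "Plan documents take precedence over general knowledge for "
    "plan-specific questions. Cite which document each figure comes from."
)


def build_context(knowledge_file: str, profile: UserProfile,
                  documents: list[PlanDocument]) -> str:
    """Assemble all three layers into one prompt context (no retrieval)."""
    sections = [
        f"POLICY\n{POLICY}",
        f"MEDICARE KNOWLEDGE\n{knowledge_file}",
        "USER PROFILE\n"
        f"{profile.insurer} {profile.plan_id}, "
        f"{profile.coverage_type}, {profile.plan_type}",
    ]
    # POC behaviour: every processed document is injected in full.
    for doc in documents:
        if doc.processed:
            sections.append(f"DOCUMENT: {doc.title}\n{doc.text}")
    return "\n\n".join(sections)
```

Because every processed document goes in whole, the context grows linearly with uploads, which is the scaling limit noted above. A production version would presumably replace the loop with some form of routing or chunk selection.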
What the demo showed, layer by layer. When the user asked “What are my copays?”, Layer 3 supplied the specific figures from the Summary of Benefits. Layer 2 scoped the answer to the MA cost-sharing schedule and plan type. Layer 1 interpreted what the numbers mean — explaining the difference between the $45 specialist copay (fixed cost per visit) and the $400/day inpatient rate (daily cost-sharing, not per-admission), and flagging the $500 deductible as applicable to some services. When the user asked about prior authorization, Layer 3 returned the actual list from the Evidence of Coverage. Layer 1 explained the difference between prior auth and referral. Layer 2 supplied the PPO plan type that made the “no referral required” answer correct for this user.
If the documents hadn’t been uploaded — or hadn’t processed yet — the system prompt instructed the Navigator not to fabricate plan-specific figures. It would answer from general Medicare knowledge only and tell the user their plan document was needed for a specific answer. The citation requirement made that boundary auditable: if there was nothing to cite, there should be no plan-specific figure.
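In the POC that boundary lived in the system prompt. One hypothetical way to audit it after generation is a check like the sketch below, which flags any answer that states a dollar figure without naming a document that was actually in context. The function name and the regular expression are illustrative assumptions, and the check is deliberately crude: it would also flag legitimate general-knowledge dollar amounts, so it is a review signal rather than a gate.

```python
import re

# Matches amounts such as "$45" or "$6,750".
DOLLAR_FIGURE = re.compile(r"\$\d[\d,]*")


def violates_citation_boundary(answer: str, document_titles: list[str]) -> bool:
    """Return True if the answer states a dollar figure without citing
    any of the documents that were present in the Q&A context."""
    has_figure = DOLLAR_FIGURE.search(answer) is not None
    lowered = answer.lower()
    cites_document = any(title.lower() in lowered for title in document_titles)
    return has_figure and not cites_document
```

With no documents in context, document_titles is empty, so any dollar figure in the answer is flagged.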
The Insight
The removal test shows why each layer is load-bearing in a different way.
Remove Layer 3 — the extracted documents — and every copay answer goes generic. The Navigator knows Medicare and has the user’s profile, but without the plan document, there are no plan-specific figures to return. It can tell you what copays typically look like for a Humana MA plan. It cannot tell you what yours are.
Remove Layer 2 — the user profile — and the system loses user-plan binding: it no longer knows which plan context, plan type, and document set govern the answer. The Navigator can retrieve cost-sharing figures from the uploaded document, but without knowing the plan type, it can’t correctly scope the referral question. More practically: without knowing which plan the user has, the document injection can’t be scoped to the right EOC. The profile is what ties the document to the user.
Remove Layer 1 — the Medicare knowledge file — and the Navigator can retrieve and quote correctly but interprets poorly. An Evidence of Coverage is a specific, technical document. “Prior authorization required” means something precise in Medicare — it’s not the same as a referral, it doesn’t apply to all providers equally, and it has an appeals pathway. Without structured Medicare knowledge backing the interpretation, the system can return the prior auth list accurately and explain it incorrectly — for example, conflating prior authorization with referral requirements.
The distinction between a tool and a Navigator is not primarily about which model is running or how the prompt is written. It’s about what data is in the room when the model answers. A generic assistant may answer from training data and whatever context the user manually supplies. A Navigator is designed so the relevant governed context is already in the room: a knowledge file, a persistent user profile, and the user’s actual documents — all active on every answer.
That framing has to answer one real counterargument: many general-purpose assistants now accept file uploads, support memory, and allow custom instructions. A well-configured ChatGPT or Gemini session might have some of these ingredients. The distinction isn’t that generic tools have none of these capabilities. It’s that the Navigator architecture governs their combination (persistence, domain-specific constraints, citation requirements, and scope enforcement) under a single design intent. An ad-hoc configuration with uploaded files and remembered preferences is not the same architecture, even if the output looks similar on a simple question.
The Honest Part
This was a proof-of-concept. The demo was real — Humana H7617-111 documents uploaded, actual plan figures returned, citation behavior verified in the tested demo path. But the gap between a working demo and a system appropriate for Medicare beneficiaries making real coverage decisions is not small, and it’s worth being specific about why.
The hardest extraction risk isn’t missing text — it’s table structure. Medicare cost-sharing schedules are dense multi-column tables: service category, in-network copay, out-of-network copay, deductible applicability, per-visit vs. per-admission vs. per-day, limits. Naive PDF extraction flattens tables into sequences of text that lose the column relationships. If the extraction assigns a specialist copay to the wrong service category, the answer is wrong and it cites a real source, which is worse than an answer that admits uncertainty.
The demo EOC processed correctly. A production system would need explicit table-extraction handling — structured parsing that preserves column relationships — and test coverage against the specific table formats used by major Medicare Advantage carriers.
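To make the table problem concrete, here is a small sketch using the in-network figures from the demo answer. The row layout, the class name CostShareRow, and the helper functions are assumptions for illustration. The units for primary care, urgent care, and emergency room are assumed to be per visit, since only the specialist and inpatient units are stated in the demo output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostShareRow:
    service: str
    in_network_copay: int  # dollars
    unit: str              # "visit" or "day"


# Figures from the demo answer; units other than specialist (per visit)
# and inpatient (per day) are assumed.
ROWS = [
    CostShareRow("primary care", 0, "visit"),
    CostShareRow("specialist", 45, "visit"),
    CostShareRow("urgent care", 15, "visit"),
    CostShareRow("emergency room", 130, "visit"),
    CostShareRow("inpatient (days 1-7)", 400, "day"),
]


def flatten_column_major(rows):
    """Mimic a naive extraction that reads the table column by column,
    so every service name comes before every amount."""
    services = [r.service for r in rows]
    amounts = [f"${r.in_network_copay}" for r in rows]
    return " ".join(services + amounts)


def lookup(rows, service):
    """Row-aware lookup: the amount and its unit stay tied to the service."""
    for row in rows:
        if row.service == service:
            return f"${row.in_network_copay} per {row.unit}"
    return None
```

In the flattened string, "specialist" is followed by "urgent care" and "$45" by "$15". The pairing only survives if the reader counts positions correctly, and the per-visit versus per-day distinction is gone entirely. The structured rows keep both.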
There are other failure modes. Once a production system moves from full-context injection to selective retrieval, it can select the wrong section for a broad question: “What are my copays?” could retrieve the medical cost-sharing table, the drug tier table, the out-of-network table, or the exceptions section, depending on chunking and retrieval scoring. A cited answer can still be wrong if it cites the wrong benefit category. The prior-auth answer in the demo returned 15-plus service categories, but whether it surfaced the right ones for this user’s specific likely care needs, given their conditions, is a harder question that the demo didn’t test.
Documents also go stale. Mid-year prior auth requirement changes, formulary updates, and benefit corrections don’t automatically update the extracted text in the database. A production system needs document versioning and a mechanism to prompt re-upload when plan documents change.
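A minimal sketch of what that versioning could look like, assuming each stored document carries a plan year, a content hash, and an extraction date. None of these fields are confirmed to exist in the POC schema, and the 180-day threshold is an arbitrary placeholder.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class StoredDocument:
    title: str
    plan_year: int
    sha256: str          # hash of the uploaded PDF bytes
    extracted_on: date


def fingerprint(pdf_bytes: bytes) -> str:
    """Content hash used to tell one upload from another."""
    return hashlib.sha256(pdf_bytes).hexdigest()


def needs_reupload(doc: StoredDocument, today: date,
                   max_age_days: int = 180) -> bool:
    """Prompt for a fresh upload when the plan year has rolled over or
    the extraction is older than the (assumed) age threshold."""
    if doc.plan_year != today.year:
        return True
    return (today - doc.extracted_on).days > max_age_days


def is_new_version(doc: StoredDocument, pdf_bytes: bytes) -> bool:
    """True when a re-uploaded file differs from the stored version."""
    return fingerprint(pdf_bytes) != doc.sha256
```

An age threshold cannot detect a mid-year change by itself. It only bounds how long a silently outdated document can stay in use before the user is asked to confirm it.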
What the POC demonstrates is narrower but still useful: under controlled conditions, the three-layer architecture produces governed, plan-specific answers from user-uploaded documents in a way a generic session is not designed to sustain. In the tested demo path, citation behavior worked, and the no-document boundary held — when document context was absent, the system correctly declined to fabricate figures. The architecture is buildable. What production requires is the discipline layer: table-aware extraction, retrieval validation, document versioning, and a test set of known questions with known answers to catch regressions. For real beneficiary use, high-impact answers would also need escalation language: verify with the plan or provider before acting, especially for network status, prior authorization, and deductible questions.
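The regression test set could start as small as the two demo questions. The sketch below is an assumption about shape, not the POC's test harness: GoldenCase, check_answer, and run_golden_set are hypothetical names, and answer_fn stands in for whatever function calls the Navigator. Matching is by plain substring, so "$45" would also match "$450"; a real harness would want stricter parsing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldenCase:
    question: str
    must_contain: tuple[str, ...]  # fragments the answer has to include
    must_cite: str                 # document the answer has to name


# Expected fragments come from the demo answers described earlier.
GOLDEN_SET = [
    GoldenCase(
        "What are my copays?",
        ("$0", "$45", "$15", "$130", "$400", "$6,750"),
        "Summary of Benefits",
    ),
    GoldenCase(
        "Do I need pre-approval for anything?",
        ("prior authorization", "no referral"),
        "Evidence of Coverage",
    ),
]


def check_answer(case: GoldenCase, answer: str) -> list[str]:
    """Return a list of failures; an empty list means the case passed."""
    failures = []
    lowered = answer.lower()
    for fragment in case.must_contain:
        if fragment.lower() not in lowered:
            failures.append(f"missing: {fragment}")
    if case.must_cite.lower() not in lowered:
        failures.append(f"missing citation: {case.must_cite}")
    return failures


def run_golden_set(answer_fn, cases=GOLDEN_SET) -> dict[str, list[str]]:
    """Run every case through answer_fn and collect failures per question."""
    return {c.question: check_answer(c, answer_fn(c.question)) for c in cases}
```

Running this after every change to extraction, prompts, or retrieval would catch the case where a figure silently shifts or a citation disappears.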
What This Is Actually About
The case for persistent, document-aware AI is easiest to see in domains where the generic answer is measurably inadequate. Medicare is a good test case because the inadequacy is concrete: “specialist copay varies by plan, typically $20–$50” is not just vague; it’s a range someone might use to estimate their annual care costs and end up meaningfully off. The plan-specific answer is $45 for this user, which is in that range, but for a different plan on a different network structure it could be $0 or $150. The range answer doesn’t help anyone plan.
The pattern here — knowledge file + user profile + extracted documents — applies wherever the question “what does this mean for me?” requires knowing the domain, knowing the person, and knowing their actual documents. Medicare cost-sharing is one instance. Insurance coverage determination is another. Pension benefit calculation is another. Legal document review is another. In each case, the generic answer is available everywhere and actionable nowhere in particular. The specific answer requires all three layers.
The Navigator also gets more useful as context accumulates. As the user uploads additional documents — formulary, supplemental coverage, coordination-of-benefits letter — the Q&A context expands and drug-cost answers and secondary-coverage questions become answerable with the same precision as the original copay question. At production scale, more documents cannot simply mean more context; the system needs document routing, source prioritization, and conflict handling. The profile updates if the user’s plan changes. Each validated addition can make the next answer more specific. A generic session often has to be reassembled. A Navigator is designed around persistent, governed context from the start.
That compounding is the architectural argument — not that the underlying LLM is more capable, but that the system gets more useful with every piece of context added. The Medicare copay question is the proof of concept. The pattern should extend to questions like “what does my formulary say about my arthritis medication?” — but that would need its own extraction and validation path, because formularies have different structure and failure modes than an Evidence of Coverage.
The generic answer is: it depends on your plan.
The Navigator’s answer is the relevant figures from the Summary of Benefits, cited by source, scoped to what the plan type means for referrals and prior auth.
Those are different answers. The architecture is why.
Case Study Insight: A generic session answers “what are typical Medicare copays?” A Navigator (knowledge file + user profile + extracted plan documents) answers “what are your copays, per Section 4 of your Humana H7617-111 Summary of Benefits.” The architectural gap between those two answers is why domain-specific AI systems need persistent, governed context, not just better prompts.
Robert Ford builds products, writes stories and essays, and publishes The Intelligence Engine — a Substack about building AI practices that compound. His other writing lives at Brittle Views.


