AI Can Make Mistakes
Large language models generate responses by predicting likely continuations of text based on patterns learned during training. This process is powerful but imperfect. Even the most capable models make errors.

Common types of errors:

- Hallucinations — The model confidently states something factually incorrect. Most common for obscure topics, precise numerical data (statistics, dates, dollar amounts), specific quotes, and niche subjects.
- Outdated information — The model’s training data has a cutoff date. Facts that changed after that cutoff will not be reflected unless you enable Web Search.
- Invented citations — Models sometimes generate plausible-looking but non-existent citation titles, paper names, author names, or URLs. Always verify citations before using them in serious work.
- Logical errors — Even reasoning-capable models can make mistakes in multi-step logic, especially in long chains of reasoning or unusual problem structures.
- Ambiguity misinterpretation — If your question is ambiguous, the model may interpret it differently than you intended and give a technically accurate answer to the wrong question.
Knowledge Cutoffs
Each AI model has a training cutoff date — the point at which its training data ends. The model has no knowledge of events, publications, software releases, or developments that occurred after that date.

| Model / Provider | Approximate Knowledge Cutoff |
|---|---|
| OpenAI GPT-5 | Early 2025 |
| Anthropic Claude Sonnet 4.6 | Early 2025 |
| Google Gemini 2.5 Pro | 2025 |
| DeepSeek Chat / Reasoner | 2025 |
| xAI Grok-4 | Near real-time (Grok has live X/Twitter data access) |
| Most other models | Within 6–18 months of current date |
What to Do When a Response Seems Wrong
Verify with web search
Enable Web Search in the prompt bar and ask the same question. ZeroTwo pulls live sources and cites them inline so you can check the original references directly.
Try a different model
Different models have different strengths, training data, and weaknesses. If one model gives a suspect answer, try the same question with another. Reasoning models (o3, o4-mini, DeepSeek Reasoner, Claude with extended thinking) are more reliable for complex factual or logical questions.
Rephrase your question
Ambiguity is a frequent cause of poor answers. Try asking your question more specifically, breaking it into smaller sub-questions, or providing more context about what you already know and what you specifically need.
Ask for step-by-step reasoning
Add “Think through this step by step” or “Explain your reasoning before giving your final answer” to your prompt. This often surfaces errors in the model’s logic and makes it easier to identify where things went wrong.
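As a minimal sketch, this kind of instruction can be added with a small prompt wrapper. The function name and exact phrasing here are illustrative, not a ZeroTwo API:

```python
def with_reasoning(prompt: str) -> str:
    """Append an instruction asking the model to show its work.

    The wording below is one common phrasing; adjust to taste.
    """
    return (
        f"{prompt}\n\n"
        "Think through this step by step and explain your reasoning "
        "before giving your final answer."
    )

wrapped = with_reasoning("Is 2027 a prime number?")
print(wrapped)
```

Keeping the instruction in a helper like this makes it easy to apply consistently across prompts instead of retyping it each time.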
Code Quality
ZeroTwo’s models generate high-quality code in dozens of languages, but generated code should always be reviewed and tested before production use.

Common issues with AI-generated code:

- Security vulnerabilities — Models may generate code with subtle security issues (SQL injection risks, insecure credential handling, improper input validation). Always review for security implications, especially for user-facing or data-handling code.
- Logic errors — Code that is syntactically correct and looks reasonable may have incorrect logic in edge cases the model did not anticipate.
- Outdated APIs — If a library or framework changed its API after the model’s training cutoff, generated code may reference deprecated or removed functions.
- Hallucinated package names — Models occasionally suggest npm packages, PyPI libraries, or other dependencies that do not exist. Always verify package names before installing.
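To make the SQL-injection class of issue concrete, here is a sketch (using Python's standard-library sqlite3 and an in-memory database) of the risky pattern generated code sometimes contains, alongside the parameterized fix to look for in review:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

user_input = "alice' OR '1'='1"  # a classic injection payload

# Risky pattern AI-generated code sometimes produces:
#   conn.execute(f"SELECT role FROM users WHERE name = '{user_input}'")
# String formatting lets the payload rewrite the WHERE clause
# into a tautology that matches every row.

# Safer: let the driver bind the value as a parameter, so the
# whole input is treated as a literal name.
row = conn.execute(
    "SELECT role FROM users WHERE name = ?", (user_input,)
).fetchone()
print(row)  # None — the payload matches no user
```

The same review habit applies in any language: flag any query built by string concatenation or interpolation and check whether the driver's parameter binding could be used instead.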
Consistency and Variation
The same prompt can produce different outputs on different runs. AI responses are probabilistic — there is inherent variation in every generation. This means:

- Running the same prompt twice may produce different outputs
- A prompt that worked well yesterday may produce a different result today if the model was updated
- High-stakes tasks benefit from multiple runs and comparison
To reduce unwanted variation:

- Write structured prompts with explicit format requirements (“always output as JSON with these fields…”)
- Provide examples of the output you want (few-shot prompting)
- Specify length, format, tone, and style explicitly
- Use reasoning models for tasks requiring logical consistency across steps
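The "multiple runs and comparison" idea can be sketched as a simple majority vote. The model call here is a deterministic stand-in (canned answers), not a real API, so the mechanics are visible without network access:

```python
from collections import Counter

CANNED = ["42", "41", "42", "42", "41"]  # pretend model outputs

def fake_model(prompt: str, run: int) -> str:
    # Stand-in for a real model call; replays canned answers so
    # the example is deterministic and self-contained.
    return CANNED[run % len(CANNED)]

def majority_answer(prompt: str, runs: int = 5) -> str:
    """Sample several times and keep the most common answer."""
    answers = [fake_model(prompt, i) for i in range(runs)]
    answer, count = Counter(answers).most_common(1)[0]
    if count <= runs // 2:
        raise ValueError(f"no clear majority: {answers}")
    return answer

print(majority_answer("What is 6 * 7?"))  # "42": 3 of 5 runs agree
```

This only helps when answers can be compared for equality (short factual answers, classifications, extracted values); for long free-form text, comparing runs by hand is more practical.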
Limitations by Task Type
Factual questions about the real world
Models can answer many factual questions accurately, but hallucination risk increases for obscure topics, precise statistics, specific names, and recent events. For factual questions that matter, always use Web Search or verify with authoritative sources. Do not cite AI-generated facts in formal work without independent verification.
Mathematical calculations
Models can handle arithmetic and many math problems, but they can make calculation errors, especially with large numbers, multi-step computations, or unusual problem structures. For reliable arithmetic, use a calculator or code interpreter. For conceptual math, proofs, and logic, reasoning models (o3, o4-mini, DeepSeek Reasoner) are significantly more reliable.
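The "use code instead of mental arithmetic" advice looks like this in practice: ask the model to emit a short script and run it, so the arithmetic is exact. A compound-interest example using Python's standard decimal module (the figures are illustrative):

```python
from decimal import Decimal, ROUND_HALF_UP

# Compound interest: exact decimal arithmetic avoids the rounding
# drift that mental math (or binary floats) can introduce.
principal = Decimal("10000.00")
rate = Decimal("0.05")  # 5% annual
years = 10

amount = principal * (Decimal(1) + rate) ** years
print(amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))  # 16288.95
```

A model asked to compute this in its head may be off by a few cents or a few dollars; the script is right every time, which is why code-interpreter tools exist.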
Legal, medical, and financial advice
Models can provide general information about legal, medical, and financial topics, but they are not licensed professionals and cannot provide advice specific to your situation. Information may be incomplete, outdated, or inapplicable to your jurisdiction or circumstances. Always consult a qualified human professional for decisions in these domains.
Real-time and live data
Without Web Search enabled, the model has no access to current prices, live sports scores, breaking news, API status, stock prices, or any data that changes in real time. Enable Web Search or use Perplexity Sonar for any query requiring live or current data.
Personal or private information
Models have no access to your personal information, private documents, or data from external services unless you explicitly provide it — either by pasting it into the conversation or by connecting an integration via Connectors. The model cannot access your email, calendar, files, or accounts without a configured and authorized connector.
Long documents and large codebases
Models with larger context windows (Claude Sonnet/Opus 4.6 at 200k tokens, Gemini 2.5 Pro at 1M tokens) handle long documents much better than models with smaller windows. Even large-context models may lose track of details buried deep in very long inputs. For extremely large documents, use specific section references (“analyze only Section 3”) rather than “analyze everything.” For large codebases, consider file-by-file analysis rather than pasting everything at once.
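The section-by-section approach can be automated with a simple chunker. This sketch splits on paragraph boundaries and uses a character budget as a rough stand-in for a token budget (roughly 4 characters per token for English text); the numbers are illustrative:

```python
def chunk_text(text: str, max_chars: int = 4000) -> list[str]:
    """Split text on paragraph boundaries so each chunk fits a budget.

    Paragraphs longer than max_chars become their own oversized
    chunk; a real pipeline might split those further.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

# Ten ~132-character paragraphs with a 300-character budget
# pack two paragraphs per chunk.
doc = "\n\n".join(f"Paragraph {i} " + "x" * 120 for i in range(10))
pieces = chunk_text(doc, max_chars=300)
print(len(pieces), max(len(p) for p in pieces))  # 5 266
```

Each chunk can then be sent as its own request ("summarize this section"), with the per-section results combined in a final pass.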
Related Pages
- Models Overview — choosing the right model for each task type
- Shared Context and Continuity — how context window size and conversation length affect response quality
- Prompts: Overview — strategies for writing better prompts
- Web Search — grounding responses in live data to address knowledge cutoffs
- Model Picker — how to find and select the best model for a given task

