It's a Reflection
LLMs cannot be trained on data that cannot be accessed.
What is NOT in the training set (not exhaustive):
- Internal, authoritative documents
- Non-public / closed source, high-impact software source code
(And many other things, like physical books...)
Good Habits
illustrative
- Pre-training: a lot of data ("pretraining corpora") like Common Crawl to "wire up" the LLM to "understand"
- Post-training: get the LLM to respond well, follow instructions, etc.
→ General availability
- Fine-tuning: refine LLM with non-public data (documents, code etc.)
→ Available to the organisation
LLMs Change — For Better or Worse!
example
2025
- Aug, GPT-5: huge jump from 4.x to a very effective platform
- Nov, GPT-5.1: "personalities" (?!) and coding quality drops → need to force 5.0
- Dec, GPT-5.2: back to high coding quality
Not picking on OpenAI; it's just how I saw it.
System Prompt
illustrative
[SYSTEM PROMPT]
[SYSTEM PROMPT OF SUB-PROVIDER] → optional! Could be inserted by your IDE or your own system
[User said: ...]
[AI said: ...]
[User said: ...]
[Attachment]
[AI thought: ...]
[AI said: ...]
... and so on
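The transcript above is, in effect, a flat list of messages that gets concatenated into one prompt before each model call. A minimal sketch of that assembly (the role names and dict shape follow the common chat-API convention; the `render` helper and its output format are illustrative, not any provider's actual wire format):

```python
# Sketch: a chat transcript as a flat message list, flattened into one prompt.
# Role names ("system", "user", "assistant") follow the common chat-API
# convention; real providers may insert their own system prompt on top.

messages = [
    {"role": "system", "content": "You are a helpful assistant."},  # [SYSTEM PROMPT]
    {"role": "user", "content": "Summarise this file."},            # [User said: ...]
    {"role": "assistant", "content": "Here is a summary: ..."},     # [AI said: ...]
    {"role": "user", "content": "Shorten it."},                     # [User said: ...]
]

def render(messages):
    """Flatten the message list into a single prompt string (illustrative only)."""
    return "\n".join(f"[{m['role']}] {m['content']}" for m in messages)

print(render(messages))
```

Each new turn (including the model's own output) is appended to this list, which is why everything earlier in the transcript keeps influencing later answers.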
Play to the Strengths
Claude or Codex better at coding?
→ Use them for coding
Gemini great at code review?
→ Use it for code review
Simple? Yes!
And still, one can easily get tangled up trying to get one model to do well where another is better.
Predictable?!?!
Computers used to be predictable.
That was their strength: same request → same result, every time
AI is not predictable.
We now have to handle different results for the same request.
And the variance itself varies!
Context Window
A little bit of memory.
illustrative
Context window "The Brain" (fixed)
-------------- -------------------
[Prompt ] [ Pretrained ]
[Response ] [ Transformers ]
[Prompt ] --> [ (and other things) ] --+
[Response ] [ ] |
[Prompt ] [ fixed in ] |
[Response ] [ model versions ] |
^ |
| |
+-- LLM output appended to response ---+
The amount of "memory" in the context window is minuscule
compared to the data encoded in the LLM itself.
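The diagram's loop can be sketched in a few lines: the "brain" is fixed, the context window is a small rolling buffer, and every output is appended back into it. (The class and its parameters are purely illustrative; real context windows are measured in tokens, not turns.)

```python
from collections import deque

class ContextWindow:
    """Toy model: a fixed 'brain' plus a tiny rolling context buffer."""

    def __init__(self, max_items=4):
        # Minuscule compared to the data encoded in the model itself.
        self.window = deque(maxlen=max_items)

    def ask(self, prompt):
        self.window.append(("prompt", prompt))
        response = f"response to: {prompt}"          # stand-in for the fixed model
        self.window.append(("response", response))   # LLM output appended back in
        return response

ctx = ContextWindow(max_items=4)
for p in ["p1", "p2", "p3"]:
    ctx.ask(p)
print(list(ctx.window))  # only the most recent turns survive
```

Once the buffer is full, the oldest turns simply fall out; the model never "remembers" them unless they are re-sent.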
Context Window Poisoning
illustrative
Context window "The Brain" (fixed)
--------------- ----------------------
[Old stuff ] [ Pretrained ]
[Old stuff ] [ Transformers ]
[New objective] --> [ (and other things) ]
[New objective] [ ]
[Prompt ] [ fixed in ]
[Response ] [ model versions ]
What to pay attention to? The old or the new??
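One practical answer is to not let old and new compete at all: when the objective changes, start a fresh context (or summarise the old one) instead of appending on top of stale instructions. A minimal sketch, with illustrative function and variable names:

```python
def build_context(history, new_objective, fresh_start=False):
    """Sketch: avoid 'poisoned' context by dropping stale turns when the
    objective changes, instead of letting old and new instructions compete."""
    if fresh_start:
        return [new_objective]           # clean slate: only the new objective
    return history + [new_objective]     # old and new now compete for attention

history = ["Old objective: write tests in JUnit 4", "..."]
new = "New objective: migrate the tests to JUnit 5"

print(build_context(history, new))                     # old + new mixed
print(build_context(history, new, fresh_start=True))   # clean slate
```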