Prompt Physics
Prompts behave like physical systems under constraint. Temperature controls entropy. Context windows create artificial time horizons. Token limits enforce hard boundaries on complexity.
These constraints interact non-linearly. A tight token budget amplifies temperature effects. A wide context window can mask poor prompt structure.
Understanding these dynamics transforms prompting from guesswork into engineering. You can predict where outputs will diverge, compress, or collapse.
This note maps the observable physics of prompt behavior across constraint dimensions.
What I Observed
Temperature isn't just "randomness." At temperature 0, the model becomes deterministic but brittle—it amplifies any ambiguity in your prompt into consistent wrong answers. At temperature 1, outputs scatter wildly, but patterns emerge across multiple runs that reveal what the model "believes" about your query.
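One cheap way to see this is to sample the same prompt several times and look at how the outputs cluster. A minimal sketch, assuming a `generate(prompt, temperature)` call that stands in for whatever model client you actually use:

```python
# Sample the same prompt N times at a fixed temperature and bucket the
# outputs so recurring patterns stand out. `generate` is a placeholder
# for a real model call, not a specific library API.
from collections import Counter

def generate(prompt: str, temperature: float) -> str:
    raise NotImplementedError("wire this to your model client")

def sample_runs(prompt: str, temperature: float, n: int = 10) -> Counter:
    outputs = [generate(prompt, temperature) for _ in range(n)]
    # Group by first line; crude, but enough to spot clustering across runs.
    return Counter((out.strip().splitlines() or [""])[0] for out in outputs)

# At temperature 0 the Counter collapses to a single key; at 1.0 the keys
# scatter, and the most common ones hint at what the model "believes".
```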
Context windows create an artificial Now. When you feed the model its own outputs iteratively, it drifts. The longer the conversation, the more it forgets the original intent, like a game of telephone played with weighted probabilities. This isn't a flaw—it's the model treating recency as truth.
Token limits force compression. Ask for a 300-word summary versus a 50-word summary, and you don't just get trimming—you get structural reorganization. The model prioritizes differently. Short outputs emphasize conclusions. Long outputs emphasize context. The shape of the answer changes, not just the length.
These constraints stack. A high-temperature, short-context, low-token-budget prompt produces outputs that look random but cluster around specific failure modes. A low-temperature, long-context, high-token-budget prompt produces outputs that look precise but can lock into verbose dead-ends. The interaction between constraints creates phase transitions in output quality.
Why It Happens
Temperature controls the softmax distribution at each token prediction step. Low temperature sharpens the distribution—the top token gets even more weight. High temperature flattens it—lower-probability tokens become viable. This compounds across tokens. After 100 tokens at temperature 0, you're walking one very narrow path. At temperature 1, you're exploring an exponentially expanding tree of possibilities.
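The mechanics are easy to reproduce with a few made-up logits. A minimal sketch of temperature-scaled softmax (the logit values are invented for illustration):

```python
import numpy as np

def softmax_with_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    if temperature == 0:
        # Greedy limit: all probability mass on the argmax token.
        probs = np.zeros_like(logits)
        probs[np.argmax(logits)] = 1.0
        return probs
    scaled = logits / temperature
    scaled -= scaled.max()          # numerical stability
    exps = np.exp(scaled)
    return exps / exps.sum()

logits = np.array([4.0, 3.5, 2.0, 0.5])   # hypothetical next-token scores
for t in (0.2, 1.0, 2.0):
    print(t, np.round(softmax_with_temperature(logits, t), 3))
# At t=0.2 roughly 92% of the mass sits on the top token; at t=2.0 it
# spreads out and lower-ranked tokens become genuinely viable.
```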
Context windows aren't memory—they're re-evaluation. Every token the model generates gets fed back as input, but it doesn't "remember" generating it. It just sees it as fact. Self-reinforcement loops emerge. If the model hallucinates a detail early, it treats that detail as ground truth in later tokens. The longer the sequence, the more compounded the drift.
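The feedback loop is literally just string concatenation. A sketch of the iterative case, again with `generate` standing in for a real model call:

```python
def generate(context: str) -> str:
    raise NotImplementedError("wire this to your model client")

def iterate(seed_prompt: str, turns: int) -> str:
    context = seed_prompt
    for _ in range(turns):
        reply = generate(context)
        # Nothing marks `reply` as model-produced rather than user-given, so
        # any hallucinated detail becomes "ground truth" for later turns.
        context = context + "\n" + reply
    return context
```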
Token limits force the model to allocate its budget. Internally, it's not "counting words"—it's selecting tokens that maximize likelihood under the constraint. Short outputs favor high-information-density tokens. Long outputs allow for conjunctions, hedging, elaboration. The constraint shapes which reasoning pathways the model explores, not just how it formats the result.
Constraints interact through the generation loop. Temperature decides which token gets sampled at each step; that token then joins the context that attention re-reads on every subsequent step. Context length determines how far back attention can reach. Token budgets determine how much future runway exists for the current token choice. These aren't independent dials—they're coupled variables in a dynamical system. Tweak one, and the equilibrium shifts across all three.
What I Do Now
I treat temperature as a control for exploration versus exploitation. Temperature 0 for tasks where I've already validated the prompt and need consistent output (data extraction, formatting). Temperature 0.7–1.0 for tasks where I need the model to explore alternatives (brainstorming, creative reframing). I don't use temperature to "add creativity"—I use it to widen the search space when my prompt might be under-specified.
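One way to keep that discipline is to choose temperature by task type instead of tweaking it per prompt. The task names and values below are illustrative defaults, not a recommendation table:

```python
# Illustrative mapping from task type to sampling temperature.
TEMPERATURE_BY_TASK = {
    "data_extraction": 0.0,     # validated prompt, need repeatable output
    "formatting": 0.0,
    "brainstorming": 0.9,       # widen the search space
    "creative_reframing": 0.8,
}

def temperature_for(task: str) -> float:
    # Middle-ground default for task types that haven't been profiled yet.
    return TEMPERATURE_BY_TASK.get(task, 0.5)
```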
I structure prompts to resist context drift. Front-load critical constraints in the first 200 tokens. Repeat key instructions before generating each output segment. Use explicit delimiters to create boundaries the model can anchor on. When iterating, I inject fresh context rather than relying on continuity. Treat each turn as a cold start with anchored state, not a continuous conversation.
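A minimal sketch of that anchoring pattern; the constraint text and tag names are illustrative:

```python
# Front-load the rules, fence the variable material in explicit delimiters,
# and rebuild the prompt fresh each turn instead of growing a transcript.
CONSTRAINTS = (
    "You are extracting action items.\n"
    "Rules: output bullet points only; never invent names; max 5 items."
)

def build_turn_prompt(anchored_state: str, new_material: str) -> str:
    return (
        f"{CONSTRAINTS}\n\n"                      # critical rules first
        f"<state>\n{anchored_state}\n</state>\n\n"
        f"<input>\n{new_material}\n</input>\n\n"
        f"{CONSTRAINTS}"                          # repeated before generation
    )

# Each call is a cold start: only the distilled anchored_state carries over,
# never the full prior conversation.
```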
I specify token budgets explicitly in the prompt, not just as an API parameter. "Write exactly 5 bullet points" works better than setting max_tokens to 150 and hoping. Explicit constraints become part of the model's reasoning context. It allocates differently when it knows the limit versus when it hits it unexpectedly.
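In practice that means the length lives in the prompt text, with the API limit kept only as a backstop. A small sketch (the wording and numbers are illustrative):

```python
def summary_prompt(text: str, bullets: int = 5) -> str:
    # The length constraint is stated where the model can reason about it.
    return (
        f"Summarize the text below in exactly {bullets} bullet points, "
        "each under 15 words.\n\n"
        f"<text>\n{text}\n</text>"
    )

# Send this with a generous max_tokens (say, 300) so the API ceiling is a
# safety net rather than the thing that silently truncates the answer.
```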
I test constraint combinations empirically. Run the same prompt at temperature 0 and 0.8, with 100-token and 500-token budgets. The output distribution reveals which parts of the prompt are under-constrained. If changing the temperature changes the answer, the prompt is ambiguous. If the short budget drops critical information, the task structure is misaligned. Constraint variation is a diagnostic tool, not just a tuning knob.
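The sweep itself is a dozen lines. A sketch of the grid, with `generate` again standing in for whatever client you use:

```python
from itertools import product

def generate(prompt: str, temperature: float, max_tokens: int) -> str:
    raise NotImplementedError("wire this to your model client")

def constraint_sweep(prompt: str) -> dict:
    # Run the same prompt across a small temperature x token-budget grid.
    results = {}
    for temp, budget in product((0.0, 0.8), (100, 500)):
        results[(temp, budget)] = generate(prompt, temperature=temp, max_tokens=budget)
    return results

# Divergence across temperatures flags ambiguity in the prompt;
# content lost at the 100-token budget flags a misaligned task structure.
```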
Practical Checklist
- Run critical prompts at temperature 0 and temperature 0.8—if outputs diverge significantly, add constraints to reduce ambiguity
- Front-load instructions in the first 200 tokens to minimize context drift in long conversations
- Specify output length explicitly in the prompt ("5 bullets", "100 words") rather than relying solely on token limits
- Use delimiters (e.g., triple backticks, XML tags) to create structural anchors that resist context bleeding
- When iterating, inject fresh context each turn instead of relying on conversational continuity
Glossary
- Temperature
- Softmax scaling parameter controlling output randomness. 0 = deterministic (greedy selection); 1 = the model's unscaled distribution; higher values flatten it further toward uniform. Affects the next-token probability distribution.
- Context Window
- Maximum number of tokens the model can "see" at once (input + output combined). Creates an artificial time horizon for attention.
- Token Budget
- Maximum output length in tokens. Constrains how much the model can generate, forcing compression and prioritization.
- Context Drift
- Phenomenon where model outputs diverge from original intent over long conversations due to self-reinforcement of hallucinated details.
- Softmax Distribution
- Probability distribution over next-token predictions. Temperature parameter reshapes this distribution to control randomness.