
Prompt Engineering Tips for Tech Leaders

October 30, 2025
Software Engineering, Developer Tools, Prompt engineering, LLM prompts, Few-shot prompting

Prompt Engineering That Scales: A Pragmatic Guide for Tech Leaders

Prompt engineering is just interface design for systems that don’t always do what you want. You’re trying to turn “make this work” into something that actually works the same way twice.

I’ve wasted hours on prompts that seemed great until they didn’t.

Here’s what actually stuck across different models and vendors: unambiguous language, explicit schemas, and testing your assumptions.

There’s a full before/after example later that shows all of this in one shot.

There’s one GitHub link I’ve found helpful for thinking about this, especially if you’re using a CLI or similar tool such as Claude to format your requests into a PRD and then create tasks/subtasks to break down the requirements.

15 Things That Keep Working

  1. Use a workbench/playground like platform.openai.com/playground… way easier to iterate
  2. Shorter prompts work better (250-500 tokens sweet spot) but don’t skip examples
  3. Understand the different message roles (system: who the model is; user: what you want it to do; assistant: the model’s replies, which you can also seed as a template for future outputs) - see the sketch after this list
  4. Use one- or few-shot prompting - this just refers to the number of examples you provide to the LLM in your prompt
  5. Conversational engine vs knowledge engine (grounded only in the context you provide) - pick one
  6. Say exactly what you mean
  7. Define tone of voice. For example: “Use a spartan tone of voice”
  8. Test your prompts with real data, not made-up scenarios
  9. Define the output format explicitly
  10. Remove conflicting instructions (“detailed summary” makes no sense)
  11. Learn JSON, XML, CSV - you’ll need them
  12. Structure the prompt as: context, instructions, output format, rules, examples. In that order.
  13. Use AI to generate examples for AI
  14. Tokens are cheap. Use the smarter model unless you’re running millions of requests.
  15. Use ‘ask’ mode a few times before ‘agent’ mode in your CLI or Copilot
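
Here’s a minimal sketch of points 3 and 4 together. It assumes the OpenAI Python SDK and an API key in your environment; the model name is just a placeholder, and the same role structure applies to other vendors’ chat APIs.

# Message roles (point 3) plus a one-shot example (point 4).
from openai import OpenAI

client = OpenAI()

messages = [
    # system: who the model is
    {"role": "system", "content": "You are a senior backend architect. Tone: spartan."},
    # user + assistant pair: one worked example (one-shot)
    {"role": "user", "content": "Summarise: login service, Redis, 3 requests/min/IP."},
    {"role": "assistant", "content": '{"components":["API","RateLimiter"],"rate_limit":"3/min/ip"}'},
    # user: the real request, phrased the same way as the example
    {"role": "user", "content": "Summarise: signup service, Postgres, 5 requests/min/IP."},
]

response = client.chat.completions.create(model="gpt-4o", messages=messages)
print(response.choices[0].message.content)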

Bonus

  1. Give it a role (who), give it a goal (what), give it all the relevant context, and be clear on the output format. And let it ask questions first if it needs to.

Before and After Example

Scenario: You want a technical design plan for a Signup service with rate limiting, returned as JSON.

I’ve done this wrong so many times. Here’s what failure looks like:

BEFORE Prompt

Hey! Write a super detailed but also short doc about building a signup thing with rate limits. Explain all the best practices, include tables and code, and make it fun but professional. You can add anything you think is cool. Maybe talk about databases. We might be in AWS or GCP, not sure. Output however you want. Thanks!!

This is terrible:

  • No role
  • “Super detailed but also short” - pick one
  • No format
  • No context
  • Ambiguous everything
  • You’ll get a different answer every time

AFTER Prompt

Put this in your model playground. Start with a smart model while you’re designing. You can use cheaper ones later when you scale.

SYSTEM
You are a senior backend architect. You design with crisp trade-offs and minimal prose.
Tone: spartan. No marketing language.
USER
Context
- Product: Signup service for a consumer app, single region to start.
- Constraints: Postgres primary DB; Redis available; 100 rps peak; 99.9% target.
- Requirements: Rate limit 5 requests/min/IP; idempotent POST /signup; audit log of attempts; email verification webhook; PII handled via data minimization; no external calls during signup path.
- Non-goals: UI, analytics.
Instructions
- Mode: knowledge engine. Ground all outputs only in provided context. If info is missing, ask up to 2 clarifying questions. If no answers, proceed with conservative assumptions and list them.
- Task: Produce a design sketch that a senior engineer can implement.
- Keep instructions + examples concise; avoid repetition to reduce prompt length drift.
Output format (JSON only; no markdown)
{
"components": ["string"],
"data_model": [{"entity":"string","fields":[{"name":"string","type":"string","notes":"string"}]}],
"api": [{"method":"string","path":"string","request":"object","responses":[{"code":200,"body":"object"}]}],
"rate_limiting": {"key":"string","algo":"string","limits":{"unit":"string","value":number},"storage":"string","notes":"string"},
"risks": [{"risk":"string","mitigation":"string"}],
"test_cases": [{"id":"string","description":"string"}],
"assumptions": ["string"]
}
Rules
- Output valid, minified JSON that matches the schema.
- Do not invent external services. Do not include explanations outside JSON.
- If asking questions, ask them first as a JSON array: {"questions":["...","..."]}. After answers, return final JSON only.
Examples (few-shot; compact)
Example context -> output fragment:
- Context: "Passwordless magic-link login service; Redis; 50 rps; 3/min/IP; no PII."
- Output fragment:
{"components":["API","RateLimiter","TokenStore"],
"rate_limiting":{"key":"ip","algo":"fixed-window","limits":{"unit":"minute","value":3},"storage":"redis","notes":"expire per window"}}
Assistant template (style anchor)
{"components":["API"],"data_model":[{"entity":"Example","fields":[{"name":"id","type":"uuid","notes":"pk"}]}],"api":[{"method":"GET","path":"/health","request":{},"responses":[{"code":200,"body":{"status":"ok"}}]}],"rate_limiting":{"key":"ip","algo":"token-bucket","limits":{"unit":"minute","value":60},"storage":"redis","notes":"simplified"},"risks":[{"risk":"none","mitigation":"n/a"}],"test_cases":[{"id":"T0","description":"health"}],"assumptions":[]}
ASSISTANT
(If needed) {"questions":["List user attributes stored at signup?","Should email verification be synchronous or async?"]}

What changed:

  • Built for the playground
  • Compact examples
  • Clear roles
  • Specific output format (JSON schema)
  • No contradictions
  • Spartan tone
  • It can ask questions before answering
  • Everything is explicit

What Actually Works in Production

Treat prompts like code. Version control, code review, the works. I skip inline comments unless something is genuinely weird.
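
A rough sketch of what that can look like, assuming prompts live as plain files in the repo; the prompts/ layout and the signup_design name are made up for illustration.

# Prompts as versioned files: changes go through the same review as code.
from pathlib import Path

PROMPT_DIR = Path("prompts")

def load_prompt(name: str, version: str) -> str:
    # e.g. load_prompt("signup_design", "v2") reads prompts/signup_design_v2.txt
    return (PROMPT_DIR / f"{name}_{version}.txt").read_text(encoding="utf-8")

system_prompt = load_prompt("signup_design", "v2")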

Test everything. Fixed test cases. Parse the outputs. Check schema validity. Track metrics, not your gut feeling about whether it’s “better.”
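
A minimal test-harness sketch, assuming the jsonschema package; run_prompt stands in for whatever client call you already use, and the schema is trimmed to a fragment of the one in the After prompt.

# Fixed cases in, parsed and schema-checked JSON out, a pass rate instead of a gut feel.
import json
from jsonschema import validate, ValidationError

SCHEMA = {
    "type": "object",
    "required": ["components", "rate_limiting", "assumptions"],
    "properties": {"components": {"type": "array", "items": {"type": "string"}}},
}

TEST_CASES = [
    "Signup service; Postgres; 5 requests/min/IP.",
    "Magic-link login; Redis; 3 requests/min/IP.",
]

def evaluate(run_prompt) -> float:
    passed = 0
    for case in TEST_CASES:
        try:
            output = json.loads(run_prompt(case))
            validate(instance=output, schema=SCHEMA)
            passed += 1
        except (json.JSONDecodeError, ValidationError):
            pass  # count as a failure; record the details in real use
    return passed / len(TEST_CASES)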

Log everything. Prompts, responses, token counts, latencies, errors. When something breaks, you want to know why.
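
A sketch of a logging wrapper, assuming an OpenAI-style response object with a usage field; adjust the field names for your SDK.

# Log prompt, output, token counts, latency, and errors as structured JSON lines.
import json, logging, time

logger = logging.getLogger("llm")

def logged_call(call_model, prompt: str, **kwargs):
    start = time.monotonic()
    try:
        response = call_model(prompt, **kwargs)
        logger.info(json.dumps({
            "prompt": prompt,
            "output": response.choices[0].message.content,
            "prompt_tokens": response.usage.prompt_tokens,
            "completion_tokens": response.usage.completion_tokens,
            "latency_ms": round((time.monotonic() - start) * 1000),
        }))
        return response
    except Exception:
        logger.exception("model call failed after %.0f ms", (time.monotonic() - start) * 1000)
        raise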

Control your context. Don’t dump everything. Curate what you feed in. Precise snippets beat wall-of-text context.

Have a fallback. For high-volume stuff, use cheaper models with tighter prompts. Back it up with deterministic code when the model fails.
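
A sketch of that fallback idea; the model name and the default values are illustrative, not recommendations. Validate the cheap model’s output and drop to deterministic code when it doesn’t parse.

# Cheap model first, hand-written conservative default when its output isn't valid JSON.
import json

def get_design(run_prompt, context: str) -> dict:
    raw = run_prompt(context, model="cheap-model")
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {
            "components": ["API", "RateLimiter"],
            "rate_limiting": {"key": "ip", "algo": "fixed-window",
                              "limits": {"unit": "minute", "value": 5},
                              "storage": "redis", "notes": "static fallback"},
            "assumptions": ["model output was not valid JSON; used static fallback"],
        }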

Watch your token budget. Shorten instructions if you need to, but keep the context examples. Those matter.

What to Do Next

Take your three most important prompts. Rewrite them using the pattern above. Test them with real cases. Measure what changes.

Then scale.

Copy the After prompt, swap in your own context, and run it in a playground. Version your changes. Track what works.

Wrestling with a technical challenge?

I help companies automate complex workflows, integrate AI into their stacks, and build scalable cloud architectures.