
There's a lot written about agent skills, but not much about actually implementing them.
This post shows you how they work and how to integrate them into your existing agent.
View the complete implementation on GitHub
Agent skills solve a simple problem: your system prompt gets bloated when you try to make your agent good at everything.
Instead of this:
You're an expert at code review, git, file organization, API testing...
[2000 lines of instructions]
You do this:
You have access to these skills:
- code-review: Reviews code for bugs and security
- git-helper: Git workflows and troubleshooting
- file-organizer: Organizes files intelligently
- api-tester: Tests REST APIs
Load them when needed.
Skills are markdown files that live in a directory. Each skill has two parts: YAML frontmatter with metadata (name, description) and a markdown body with the full instructions.
When the agent needs expertise, it loads the relevant skill on the fly.
User: "Review this code for SQL injection"
↓
Agent: "I need the code-review skill"
↓
System: [Loads SKILL.md with security guidelines]
↓
Agent: [Follows those guidelines]
The key insight is that skills are just structured prompts. But they're modular, discoverable, and loaded on demand.
Scan a directory for SKILL.md files and parse their frontmatter. You only load the metadata initially, not the full content. This keeps memory usage low. (See the discovery implementation)
skills/
├── code-review/
│ └── SKILL.md # name: code-review, description: ...
├── git-helper/
│ └── SKILL.md
During discovery, you extract just the YAML frontmatter (name, description). The full markdown content stays on disk until needed.
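Here's a minimal sketch of that discovery step, assuming PyYAML for the frontmatter; the names (`discover_skills`, `SkillMetadata`) are illustrative, not the repo's actual API:

```python
from dataclasses import dataclass
from pathlib import Path

import yaml  # PyYAML, used here to parse the frontmatter


@dataclass
class SkillMetadata:
    name: str
    description: str
    path: Path  # where the full SKILL.md lives on disk


def discover_skills(skills_dir: str = "skills") -> dict[str, SkillMetadata]:
    """Scan for SKILL.md files and parse only their YAML frontmatter."""
    skills = {}
    for skill_file in Path(skills_dir).glob("*/SKILL.md"):
        text = skill_file.read_text()
        # Frontmatter sits between the first two '---' markers.
        _, frontmatter, _body = text.split("---", 2)
        meta = yaml.safe_load(frontmatter)
        skills[meta["name"]] = SkillMetadata(
            name=meta["name"],
            description=meta["description"],
            path=skill_file,  # the body stays on disk until activation
        )
    return skills
```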
Convert each skill into an OpenAI function tool. The LLM sees these as callable functions:
activate_skill_code_review: "Reviews code for bugs, security, best practices"
activate_skill_git_helper: "Git workflows and troubleshooting"
The description is critical because it's what the LLM uses to decide which skill to activate. Be specific and clear.
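A sketch of that conversion using the OpenAI chat-completions tool schema (`SkillMetadata` is the illustrative type from the discovery sketch above):

```python
def build_skill_tools(skills: dict[str, SkillMetadata]) -> list[dict]:
    """Expose each discovered skill as an OpenAI function tool."""
    tools = []
    for skill in skills.values():
        tools.append({
            "type": "function",
            "function": {
                # code-review becomes activate_skill_code_review,
                # following the prefix convention discussed below.
                "name": f"activate_skill_{skill.name.replace('-', '_')}",
                "description": skill.description,
                "parameters": {"type": "object", "properties": {}},
            },
        })
    return tools
```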
When the LLM calls a skill function, the agent reads that skill's SKILL.md content from disk and returns it as the tool result. This is lazy loading: you only fetch content when it's actually needed. If you have 20 skills but only use 2, you've only loaded 2.
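A sketch of that activation handler, reusing the illustrative names from above:

```python
def activate_skill(skills: dict[str, SkillMetadata], tool_name: str) -> str:
    """Return the full SKILL.md content when the LLM calls a skill tool."""
    skill_name = tool_name.removeprefix("activate_skill_").replace("_", "-")
    skill = skills.get(skill_name)
    if skill is None:
        # Mirrors the error handling discussed later in this post.
        return f"Skill '{skill_name}' not found. Available: {', '.join(skills)}"
    # Lazy load: the markdown body is read from disk only at this point.
    return skill.path.read_text()
```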
The LLM reads the skill instructions and follows them. The skill acts like a temporary system prompt for that specific task. Once the task is done, the skill instructions fade from context (unless you keep them for multi-turn conversations).
---
name: code-review
description: Reviews code for bugs, security, and best practices
version: 1.0.0
---
# Code Review Skill
You are an expert code reviewer.
## Check For
1. **Security**
- SQL injection in queries
- XSS in user inputs
- Auth bypasses
2. **Quality**
- Readability
- Maintainability
- DRY violations
3. **Performance**
- N+1 queries
- Memory leaks
- Inefficient algorithms
## Response Format
**Summary**: Brief assessment
**Critical Issues**: Security problems (if any)
**Improvements**: Suggestions for better code
**Positives**: What works well
Notice the structure: clear sections, bullet points, and expected output format. The LLM follows structured instructions much better than prose. (For more on crafting effective prompts, see this guide.)
1. Context Efficiency
Instead of loading 10KB of instructions upfront, you load 100 bytes of metadata. Full instructions only come in when needed. This matters when you're paying per token.
2. Modularity
Each skill is independent. Add a new one by dropping in a SKILL.md file. No code changes needed. Want to remove a skill? Delete the directory.
3. Clarity
When debugging, you can see exactly which skill was activated and what instructions it provided. This makes troubleshooting much easier than a monolithic prompt.
4. Reusability
Share skills across projects. Someone else's api-tester skill works in your agent with zero modification. Skills become a shared library of expertise.
Don't load all skills into memory at startup. This defeats the purpose because you're back to loading everything upfront.
Do load on demand. Parse frontmatter during discovery, but keep full content on disk until the LLM actually requests it.
Prefix skill functions clearly: activate_skill_code_review. This makes it obvious in logs what's happening. When you see activate_skill_* in your logs, you know a skill was activated.
The exact sequence matters. Here's what happens:
1. You call the LLM with the conversation plus the skill tools.
2. The LLM responds with a tool call for a skill.
3. You append that assistant message, with its tool_calls intact, to the conversation.
4. You append a tool message containing the SKILL.md content, referencing the tool_call_id.
5. You call the LLM again so it can follow the skill's instructions.
If you skip step 3, OpenAI will reject your request. The tool_calls must be properly formatted with a type field and nested function object. This is a common gotcha. (See OpenAI's tools documentation for details.)
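Here's roughly what a well-formed exchange looks like in the messages list (the ID and content are placeholders):

```python
# messages already holds the system and user turns at this point.
messages += [
    {
        # Step 3: keep the assistant turn that requested the skill,
        # with its tool_calls intact.
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_abc123",   # placeholder ID
            "type": "function",    # required; omitting it triggers the rejection
            "function": {
                "name": "activate_skill_code_review",
                "arguments": "{}",
            },
        }],
    },
    {
        # Step 4: the tool result, tied back via tool_call_id.
        "role": "tool",
        "tool_call_id": "call_abc123",
        "content": "<contents of skills/code-review/SKILL.md>",
    },
]
```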
Skills can chain. A skill might activate code execution, which might need another skill. Your agent should loop until there are no more tool calls:
```python
while True:
    response = llm.chat(messages=messages, tools=tools)
    if not response["tool_calls"]:
        break
    handle_tool_calls(response)
```
Always pass tools in every call, even after skill activation. Otherwise, skills can't use other tools like code execution. (See full implementation for the complete loop logic.)
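A sketch of `handle_tool_calls` consistent with the loop above (the signature is expanded with explicit parameters for clarity; `activate_skill` is the illustrative helper from earlier):

```python
def handle_tool_calls(response, messages, skills):
    """Append the assistant turn plus one tool result per tool call."""
    # Keep the assistant message (with its tool_calls) in the history;
    # without it, the follow-up request is rejected.
    messages.append(response)
    for call in response["tool_calls"]:
        name = call["function"]["name"]
        if name.startswith("activate_skill_"):
            result = activate_skill(skills, name)  # lazy-loads SKILL.md
        else:
            result = f"Unknown tool: {name}"  # dispatch code execution etc. here
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": result,
        })
```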
Skill Scope
One skill equals one domain. Keep them focused.
Good examples: code-review, git-helper, api-tester
Bad example: developer-tools (too broad)
Skill Structure
Use clear sections, bullet points, and concrete examples. A wall of text doesn't work; structure helps the LLM follow instructions.
Error Handling
What if a skill doesn't exist? Return a helpful error:
"Skill 'xyz' not found. Available: code-review, git-helper"
Some implementations load all skills at startup. This defeats the purpose because you're back to loading everything upfront, wasting memory and context tokens.
Fix: Load metadata only during discovery. Activate skills when needed.
The LLM uses skill descriptions to decide which to activate. Be specific.
❌ "Helps with code" ✅ "Reviews Python/JavaScript code for security vulnerabilities, PEP 8 compliance, and performance issues"
Include what the skill does, what types of tasks it handles, and key capabilities.
Error: Missing required parameter: messages[1].tool_calls[0].type
Cause: OpenAI requires a specific nested structure. The tool_calls must have a type field and nest the function details under a function key.
Fix: Use the correct format with type: "function" and nested function object. Don't flatten it. See OpenAI's tools documentation for the exact message format.
Problem: After activating a skill, the LLM can't use other tools like code execution.
Fix: Always pass tools in every LLM call. Don't remove tools after skill activation because skills might need them.
A wall of text doesn't work. Use clear headings, bullet points, code examples, and expected output formats. The LLM follows structured instructions much better than prose.
Good fit: agents that span multiple domains, system prompts that have grown long and unwieldy, expertise you want to reuse across projects.
Not needed: a single-purpose agent with a short, manageable system prompt.
Don't over-engineer. If your system prompt is small and manageable, you probably don't need skills.
AgentSkills.io defines the open format: one directory per skill, a SKILL.md file inside it, and YAML frontmatter with at least a name and description.
Following the standard means your skills work with other implementations. Skills become portable across projects and teams.
1. Create the directory: mkdir -p skills/my-first-skill
2. Create SKILL.md with YAML frontmatter and markdown instructions
3. Integrate SkillsManager into your agent (see GitHub repo for full code)
4. Test it by asking your agent to use the skill and verifying it activates
That's it. No code changes needed to add new skills. Just drop in a SKILL.md file.
Agent skills are structured prompts with a loading mechanism.
The pattern works because skills load on demand (context efficiency), stay independent (modularity), are easy to debug (clarity), and can be shared across projects (reusability).
You can build a working implementation in an afternoon. The core SkillsManager is about 130 lines of Python. (View the implementation)
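Wiring the sketches from this post together looks roughly like this (`llm.chat` is the same stand-in client used in the loop above; none of these names are the repo's actual API):

```python
skills = discover_skills("skills")   # metadata only, full content stays on disk
tools = build_skill_tools(skills)    # one activate_skill_* tool per skill

messages = [
    {"role": "system", "content": "You have access to skills. Load them when needed."},
    {"role": "user", "content": "Review this code for SQL injection"},
]

while True:
    response = llm.chat(messages=messages, tools=tools)  # tools in every call
    if not response["tool_calls"]:
        break
    handle_tool_calls(response, messages, skills)

print(response["content"])
```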
Start with one skill. See if it helps. Expand from there.
The complete working implementation is available on GitHub. Use it as a reference or starting point for your own agent.