AGENTS.md vs Skills vs Plain Scripts: What Goes Where, and Why It Matters
This week I gave an AI agent a small, wrong instruction. Not on purpose. I had a file in my project telling it how to behave, and one line in that file pointed at something that had since moved. The agent read the file, trusted it completely, and confidently did the wrong thing. No error. No warning. Just a quietly worse outcome that took me a minute to even notice.
That is the whole reason this post exists. We now have a few different ways to hand an AI agent instructions about our projects, and they look similar enough that people throw them around interchangeably. They are not interchangeable. Each one has a job, a cost, and a failure mode. Get them mixed up and you do not get an error message, you get an agent that is subtly worse than the one you started with.
There is a bigger shift underneath this. For years, the way we automated work was a folder full of little scripts. A deploy.sh here, a seed-db.py there, a dozen helpers that each did one thing. They worked beautifully right up until an API changed, a token expired, or the task needed one more condition the script did not anticipate. In 2026 the industry is steadily moving past that brittleness toward agents that read context, decide, and adapt. But “give it to an agent” is not one decision. It splits into several, and that is what trips people up.
Let me walk through the pieces you actually run into: a project’s AGENTS.md file, Skills, the plain scripts you already have, and where MCP fits alongside all of it. By the end you will know exactly what belongs where, and why putting the wrong thing in the wrong place hurts.
First, the thing nobody tells you
Every coding agent, before it touches your project, does the same thing a new hire does on their first day. It looks around. It reads the file tree, the package manifest, the README. The problem is that a README was written for a human. It explains what the project is. It does not explain how an agent should work in it: which build command with which flags, which files to never touch, what the house style is when it differs from the default.
That gap is what AGENTS.md fills. It is a plain Markdown file you drop at the root of your repo, and it has become a genuine open standard. As of 2026 it is read by more than thirty different agents, across tens of thousands of repositories. OpenAI’s Codex, Cursor, GitHub Copilot, Gemini CLI, and many more all look for it.
build: npm run build
test: npm test -- --silent
# Rules
- never edit files in /vendor
- commit messages: present tense
- the API client lives in src/api, not src/lib
I actually made one of these for this very website. It is not committed publicly, it sits on my machine, and it carries the rules I care about: never publish my resume, do not invent fake citations, keep a specific writing voice. When an agent works on the site, it reads that file first and stays inside the lines. That is AGENTS.md doing its job: persistent project context, loaded every time, shaping everything that follows.
And here is the trap
If AGENTS.md is loaded every time and shapes everything, then a wrong line in it poisons everything. This is not hypothetical hand-wringing. There is real 2026 research on exactly this, looking at over a hundred real-world repositories.
The finding is uncomfortable and worth sitting with. Context files that were generated automatically by an LLM actually reduced how often agents succeeded at their tasks, while increasing cost by more than twenty percent. Human-written files did better, but only marginally, and only when they were short and precise. A bloated or slightly stale instruction file is not neutral. It is a tax you pay on every single request, and sometimes it actively steers the agent wrong.
This is exactly what happened to me. The lesson is not “do not use AGENTS.md.” It is to do the opposite of what most people do with it. Keep it short. Keep it true. Treat it like code that can rot, because it can. Every line in there runs on every task, so every stale line is a bug that fires every time. When mine bit me, the fix was not to add more instructions. It was to delete the wrong one.
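One way to catch that kind of rot early is to check the file the way you would check code. The sketch below is an assumption-heavy illustration, not part of any AGENTS.md tooling: it pulls anything that looks like a path out of the file and reports the ones that no longer exist on disk. The names `referenced_paths` and `stale_paths` and the regular expression are hypothetical, and the pattern is deliberately crude, so it will also flag phrases like "and/or" and paths you mention only to rule them out.

```python
import re
from pathlib import Path

# Hypothetical heuristic: any run of word characters, dots, dashes and
# slashes that contains at least one slash is treated as a path.
PATH_TOKEN = re.compile(r"[\w.\-/]*/[\w.\-/]*")


def referenced_paths(text: str) -> list[str]:
    """Return path-like tokens from the text, in order, without duplicates."""
    seen: dict[str, None] = {}
    for token in PATH_TOKEN.findall(text):
        if token.startswith("//"):
            continue  # most likely the tail of a URL, not a repo path
        # Drop a sentence-ending dot and any leading or trailing slashes,
        # so "/vendor" is resolved relative to the repo root.
        cleaned = token.rstrip(".").strip("/")
        if cleaned:
            seen.setdefault(cleaned, None)
    return list(seen)


def stale_paths(agents_file: Path, repo_root: Path) -> list[str]:
    """List paths mentioned in the instruction file that do not exist."""
    text = agents_file.read_text(encoding="utf-8")
    return [p for p in referenced_paths(text) if not (repo_root / p).exists()]
```

Calling `stale_paths(Path("AGENTS.md"), Path("."))` from the repo root, for example in a pre-commit hook or a CI step, turns a silent wrong instruction into a visible failure. It only covers paths, so stale commands or style rules still need a human read.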
Skills: instructions that show up only when needed
Now, the second tool, and the one people find most confusing because it sounds like the first.
AGENTS.md is always on. But most expertise is not needed most of the time. The detailed steps for filling out a PDF form are useless on a task about database migrations. If you stuffed every specialised workflow into your always-on context, you would drown the agent in irrelevant detail and pay for it on every request. That is precisely the bloat the research warned about.
Skills solve this with a beautifully simple idea: load the detail only when it is relevant. A Skill, in Anthropic’s design, is just a folder with a file called SKILL.md inside it. That file starts with a tiny bit of structured metadata, called frontmatter, and only two fields are required: a name and a description.
---
name: pdf-forms
description: Fill and extract fields from PDF forms.
---
# How to fill a PDF form
(the full instructions live down here,
read only when the task actually needs them)
The clever part is how it loads, in three tiers. This is the idea called progressive disclosure, and once it clicks, you see why Skills scale where a giant instruction file does not.
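Tier 1: the name and description of every skill, a small fixed cost per skill, always loaded.
Tier 2: the body of SKILL.md, roughly 5k tokens, loaded when the skill matches the task.
Tier 3: bundled files and scripts, read or run on demand.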
That economy is the whole point. With an always-on file, everything you add is loaded forever. With Skills, the agent reads a one-line summary of each, and only opens the full thing for the one that matches the task in front of it. You get a big library of expertise without a big bill on every request.
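To make the loading order concrete, here is a minimal sketch of how a loader could separate the cheap summary from the expensive body. It is an illustration under assumptions, not Anthropic's implementation: `SkillSummary`, `split_frontmatter`, `index_skills` and `load_body` are hypothetical names, the folder layout `skills/<name>/SKILL.md` is assumed, and the frontmatter parser handles only flat `key: value` lines where a real loader would use a YAML parser.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SkillSummary:
    name: str
    description: str
    path: Path  # location of SKILL.md, so the body can be read later


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (frontmatter fields, instruction body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' line")
    closing = [i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"]
    if not closing:
        raise ValueError("frontmatter is missing its closing '---' line")
    end = closing[0]
    fields: dict[str, str] = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return fields, body


def index_skills(skills_dir: Path) -> list[SkillSummary]:
    """First tier: keep only name and description for every skill folder."""
    summaries = []
    for skill_file in sorted(skills_dir.glob("*/SKILL.md")):
        fields, _body = split_frontmatter(skill_file.read_text(encoding="utf-8"))
        summaries.append(SkillSummary(fields["name"], fields["description"], skill_file))
    return summaries


def load_body(summary: SkillSummary) -> str:
    """Second tier: read the full instructions for the one skill that matched."""
    _fields, body = split_frontmatter(summary.path.read_text(encoding="utf-8"))
    return body
```

The sketch leaves out the matching step on purpose. Deciding which description fits the task is the model's judgement, not a lookup, so only the summaries from `index_skills` would go into the prompt and `load_body` would be called once the agent has picked a skill.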
So how is a Skill different from just keeping a script?
This is the question I think is genuinely worth asking, because teams already have scripts. A deploy.sh, a seed-db.py, a folder of little helpers. If the agent can run a script, why wrap it in a Skill at all?
The difference is discovery and judgement, not execution.
A plain script sits in your repo doing nothing until a human decides to run it, and that human has to know which script to pick and what arguments to pass. The agent will not reach for it unless you explicitly tell it to, every time. A script is a tool waiting for someone who already knows it exists.
A Skill is that same capability, but it announces itself. Its description sits in the agent’s awareness from the start, so when a relevant task comes up, the agent recognises “this is a job for that” on its own, reads the how-to, and proceeds. And critically, a Skill can bundle a script and run it without ever loading the script’s code into the conversation. The agent runs the tool, gets the result, and spends none of its limited attention reading the implementation. The script stays a black box that just works.
| | A plain script | A Skill |
|---|---|---|
| Discovery | Human must know it exists and invoke it | Agent notices it fits the task on its own |
| Guidance | None, it is just a file | Carries instructions on when and how to use it |
| Context cost | Zero until run, but invisible to the agent | ~one line until needed, then loads in tiers |
| Can bundle code | It is the code | Yes, and runs it without reading it into context |
So you do not throw away your scripts. The good pattern is often a Skill around a script: the script does the mechanical work, and the Skill is the thin layer that tells the agent this tool exists, when it applies, and how to call it.
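The "runs it without reading it" part can be sketched in a few lines. This is a stand-in for whatever a real agent runtime does, which varies by tool, and `run_bundled_script` is a hypothetical helper that assumes the bundled script is a Python file. The point it illustrates is that only the script's output crosses back to the caller, never its source.

```python
import subprocess
import sys
from pathlib import Path


def run_bundled_script(script: Path, *args: str, timeout: float = 60.0) -> str:
    """Run a script that ships inside a skill folder and return its stdout.

    The script's source is never opened here, so none of it would end up
    in the conversation. Only the printed result comes back.
    """
    result = subprocess.run(
        [sys.executable, str(script), *args],
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode bytes to str
        timeout=timeout,
        check=False,          # inspect the exit code ourselves below
    )
    if result.returncode != 0:
        raise RuntimeError(f"{script.name} failed: {result.stderr.strip()}")
    return result.stdout.strip()
```

In this picture the SKILL.md carries the sentence that says when to call the script and with which arguments, and the script itself stays a black box. A failing script surfaces as an error with its stderr attached, which gives the agent something to react to.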
Where does MCP fit in all this?
If you have spent any time around AI agents lately, one more term keeps coming up, and it gets tangled with Skills constantly: MCP, the Model Context Protocol. People ask “should I build a Skill or an MCP server?” as if they are two answers to the same question. They are not. They answer different questions, and once you see the split it stops being confusing.
Here is the cleanest way I have heard it put, and it comes straight from Anthropic: MCP connects the agent to your data. Skills teach the agent what to do with that data.
Think about querying your company database. Before the agent can do anything, it needs to be able to reach the database at all: open a connection, run a query, get rows back. That reaching-out, that plumbing to an external system, is MCP’s job. It is about connectivity. Separately, there is the question of how your team wants queries done: always filter by date range first, never run an unbounded scan, format results a certain way. That know-how is a Skill. One gets the agent to the data, the other tells it how to behave once it is there.
So the honest answer to “Skill or MCP?” is usually both. MCP gives the agent its hands, the ability to touch external systems. Skills give it the training, the knowledge of how your team does the thing. I have built MCP servers myself, including one for OpenStack, and the mental model that finally stuck was exactly this: the MCP server is the wiring to the infrastructure, and everything about how to use it well is a separate, teachable layer on top.
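The database example can be turned into a plain-Python analogy of the two layers. To be clear about the assumptions: this does not use the MCP SDK and is not how a Skill is written (a Skill holds this know-how as prose in SKILL.md). The `orders` table, its columns, and the functions `run_query` and `orders_between` are all hypothetical. The sketch only shows that connectivity and team rules are separable concerns.

```python
import sqlite3
from datetime import date


def run_query(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list[tuple]:
    """Connectivity layer (the MCP-shaped part): reach the data, return rows."""
    return conn.execute(sql, params).fetchall()


def orders_between(
    conn: sqlite3.Connection, start: date, end: date, limit: int = 100
) -> list[tuple]:
    """Know-how layer (the Skill-shaped part): how the team wants queries done."""
    if start > end:
        raise ValueError("start date must not be after end date")
    sql = (
        "SELECT id, placed_on, total FROM orders "
        "WHERE placed_on BETWEEN ? AND ? "  # rule: always filter by date range
        "ORDER BY placed_on "               # rule: results in a fixed order
        "LIMIT ?"                           # rule: never an unbounded scan
    )
    return run_query(conn, sql, (start.isoformat(), end.isoformat(), limit))
```

Swapping SQLite for another database would change only `run_query`, and changing the team's rules would change only `orders_between`. That is the same reason the two questions get different answers.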
Putting it together: what goes where
Here is the mental model I have landed on. Three questions, three homes.
And one rule that sits above all three, because it is the one that actually bit me: whatever you write down, keep it true and keep it small. The research is blunt about it. More context is not better. A short, accurate AGENTS.md beats a long one. A handful of well-described Skills beats a sprawling pile. The cost of a wrong or bloated instruction is not paid once, when you write it. It is paid on every request the agent makes, until you notice and fix it.
I learned that this week, from one stale line, in a file I wrote myself. The agent did exactly what I told it. That was the problem. These tools are powerful precisely because the agent trusts them completely, which means the responsibility for keeping them honest is entirely yours.
Write less. Keep it true. Let the agent reach for the rest only when it needs it.