AGENTS.md vs Skills vs Plain Scripts: What Goes Where, and Why It Matters

This week I gave an AI agent a small, wrong instruction. Not on purpose. I had a file in my project telling it how to behave, and one line in that file pointed at something that had since moved. The agent read the file, trusted it completely, and confidently did the wrong thing. No error. No warning. Just a quietly worse outcome that took me a minute to even notice.

That is the whole reason this post exists. We now have a few different ways to hand an AI agent instructions about our projects, and they look similar enough that people throw them around interchangeably. They are not interchangeable. Each one has a job, a cost, and a failure mode. Get them mixed up and you do not get an error message. You get an agent that is subtly worse than the one you started with.

There is a bigger shift underneath this. For years, the way we automated work was a folder full of little scripts. A deploy.sh here, a seed-db.py there, a dozen helpers that each did one thing. They worked beautifully right up until an API changed, a token expired, or the task needed one more condition the script did not anticipate. In 2026 the industry is steadily moving past that brittleness toward agents that read context, decide, and adapt. But “give it to an agent” is not one decision. It splits into several, and that is what trips people up.

Let me walk through the pieces you actually run into: a project’s AGENTS.md file, Skills, the plain scripts you already have, and where MCP fits alongside all of it. By the end you will know exactly what belongs where, and why putting the wrong thing in the wrong place hurts.

First, the thing nobody tells you

Every coding agent, before it touches your project, does the same thing a new hire does on their first day. It looks around. It reads the file tree, the package manifest, the README. The problem is that a README was written for a human. It explains what the project is. It does not explain how an agent should work in it: which build command with which flags, which files to never touch, what the house style is when it differs from the default.

That gap is what AGENTS.md fills. It is a plain Markdown file you drop at the root of your repo, and it has become a genuine open standard. As of 2026 it is read by more than thirty different agents, across tens of thousands of repositories. OpenAI’s Codex, Cursor, GitHub Copilot, Gemini’s CLI, and many more all look for it.

AGENTS.md

# Build
build: npm run build
test: npm test -- --silent

# Rules
- never edit files in /vendor
- commit messages: present tense
- the API client lives in src/api, not src/lib

An AGENTS.md is just Markdown. No required fields, no schema. The point is to write down the things a README leaves out: exact commands, hard boundaries, and the small rules that are obvious to you and invisible to a newcomer.

I actually made one of these for this very website. It is not committed publicly; it sits on my machine and carries the rules I care about: never publish my resume, do not invent fake citations, keep a specific writing voice. When an agent works on the site, it reads that file first and stays inside the lines. That is AGENTS.md doing its job: persistent project context, loaded every time, shaping everything that follows.
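To make "loaded every time" concrete, here is a minimal sketch of what an always-on context file amounts to. The `build_prompt` function and the separator format are my own illustration, not how any particular agent does it; real agents decide for themselves how the file enters their context.

```python
from pathlib import Path


def build_prompt(repo_root: Path, task: str) -> str:
    """Prepend AGENTS.md (if present) to the task text.

    Hypothetical harness logic, for illustration only.
    """
    agents_file = repo_root / "AGENTS.md"
    if not agents_file.is_file():
        return task
    rules = agents_file.read_text(encoding="utf-8").strip()
    # The rules ride along on every call, whether or not they are relevant.
    return f"{rules}\n\n---\nTask: {task}"
```

Nothing in that function asks whether a rule applies to the task at hand. That is what makes the file powerful, and what makes every line in it expensive.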

And here is the trap

If AGENTS.md is loaded every time and shapes everything, then a wrong line in it poisons everything. This is not hypothetical hand-wringing. There is real 2026 research on exactly this, looking at over a hundred real-world repositories.

The finding is uncomfortable and worth sitting with. Context files that were generated automatically by an LLM actually reduced how often agents succeeded at their tasks, while increasing cost by more than twenty percent. Human-written files did better, but only marginally, and only when they were short and precise. A bloated or slightly-stale instruction file is not neutral. It is a tax you pay on every single request, and sometimes it actively steers the agent wrong.

"the API client lives in src/api" → correct, agent edits the right file

"the API client lives in src/api" → but you moved it to src/core last month

Same line, two different worlds. The agent has no way to know your file is stale. It trusts the instruction over the actual code, edits the wrong place, and reports success. The file did not raise an error; it got obeyed.

This is exactly what happened to me. The lesson is not “do not use AGENTS.md.” It is to treat the file the opposite way most people do. Keep it short. Keep it true. Treat it like code that can rot, because it can. Every line in there runs on every task, so every stale line is a bug that fires every time. When mine bit me, the fix was not to add more instructions. It was to delete the wrong one.
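If the file can rot like code, it can also be checked like code. The sketch below scans an AGENTS.md for anything that looks like a repo path and reports the ones that no longer exist. The function name and the regular expression are assumptions of mine, and the pattern is deliberately crude: it only treats slash-separated tokens such as `src/api` or `/vendor` as paths.

```python
import re
from pathlib import Path

# Assumption: a token with a slash in it ("src/api", "/vendor") is a repo path.
PATH_PATTERN = re.compile(
    r"(?<![\w/.-])(?:/[\w.-]+(?:/[\w.-]+)*|[\w.-]+(?:/[\w.-]+)+)"
)


def find_stale_paths(agents_md: str, repo_root: Path) -> list[str]:
    """Return the paths mentioned in the text that are missing from the repo."""
    stale = []
    for match in PATH_PATTERN.finditer(agents_md):
        mentioned = match.group().rstrip(".")  # drop a sentence-ending period
        if not (repo_root / mentioned.lstrip("/")).exists():
            stale.append(mentioned)
    return stale
```

Run against a repo where the client has moved to src/core, the line about src/api would be flagged. A check like this could run in CI alongside the tests, so a stale line fails loudly instead of being quietly obeyed.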

Skills: instructions that show up only when needed

Now, the second tool, and the one people find most confusing because it sounds like the first.

AGENTS.md is always on. But most expertise is not needed most of the time. The detailed steps for filling out a PDF form are useless on a task about database migrations. If you stuffed every specialised workflow into your always-on context, you would drown the agent in irrelevant detail and pay for it on every request. That is precisely the bloat the research warned about.

Skills solve this with a beautifully simple idea: load the detail only when it is relevant. A Skill, in Anthropic’s design, is just a folder with a file called SKILL.md inside it. That file starts with a tiny bit of structured metadata, called frontmatter, and only two fields are required: a name and a description.

pdf-forms/SKILL.md

---
name: pdf-forms
description: Fill and extract fields from PDF forms.
---

# How to fill a PDF form
(the full instructions live down here,
read only when the task actually needs them)

A Skill is a folder with a SKILL.md. The frontmatter up top (name and description, both required) is the only part the agent reads by default. The body below is loaded on demand.
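To show how little the agent needs up front, here is a rough sketch that splits a SKILL.md into its frontmatter and body and checks the two required fields. `parse_frontmatter` is a name I made up, and it only understands flat `key: value` lines; real frontmatter is YAML, so treat this as an illustration and not a full parser.

```python
def parse_frontmatter(skill_md: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md into (metadata, body).

    Minimal sketch: handles flat `key: value` lines only, not full YAML.
    """
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' line")
    # Find the line that closes the frontmatter block.
    end = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
        None,
    )
    if end is None:
        raise ValueError("frontmatter is never closed")
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    missing = {"name", "description"} - meta.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body
```

The metadata dictionary is the part that would be visible from the start. The body string is the part that waits until a task calls for it.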

The clever part is how it loads, in three tiers. This is the idea called progressive disclosure, and once it clicks, you see why Skills scale where a giant instruction file does not.

At startup: just the name and description. Every installed Skill contributes only its one-line summary, so the agent knows what exists and when each might apply. Cost: ~100 tokens per skill.

On a match: the full SKILL.md. When your task matches a description, the agent reads that skill's full instructions. Nothing else loads. Cost: under ~5k tokens.

Only if needed: bundled files and scripts. Extra reference files or code the skill ships with are opened or run only at the exact moment they are required. Cost: loaded on demand.

Three tiers, like a manual with a table of contents, then chapters, then an appendix. Because tier one is so cheap, you can have fifty Skills installed and pay almost nothing for the forty-nine that are not relevant to what you are doing right now.

That economy is the whole point. With an always-on file, everything you add is loaded forever. With Skills, the agent reads a one-line summary of each, and only opens the full thing for the one that matches the task in front of it. You get a big library of expertise without a big bill on every request.
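Here is a small sketch of the first two tiers, plus the arithmetic behind the savings. Everything in it is illustrative: the `Skill` class, the function names, and especially the matching rule. Matching on words from the skill's name is a crude stand-in; a real agent uses the model's own judgement about which description fits the task. The token figures are the rough ones quoted above.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str  # tier 1: always visible to the agent
    body: str         # tier 2: read only on a match


def startup_context(skills: list[Skill]) -> str:
    """Tier 1: one summary line per installed skill."""
    return "\n".join(f"{s.name}: {s.description}" for s in skills)


def load_matching(skills: list[Skill], task: str) -> list[str]:
    """Tier 2: full bodies only for skills whose name shares a word with the task."""
    task_words = set(task.lower().split())
    return [s.body for s in skills if task_words & set(s.name.split("-"))]


def context_tokens(installed: int, matched: int,
                   summary_tokens: int = 100, body_tokens: int = 5000) -> int:
    """Rough upper bound: every summary, plus a full body per matched skill."""
    return installed * summary_tokens + matched * body_tokens
```

With those figures, fifty installed Skills and one match come to at most 50 × 100 + 5,000 = 10,000 tokens. Keeping all fifty bodies always loaded would be up to 50 × 5,000 = 250,000.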

So how is a Skill different from just keeping a script?

This is the question I think is genuinely worth asking, because teams already have scripts. A deploy.sh, a seed-db.py, a folder of little helpers. If the agent can run a script, why wrap it in a Skill at all?

The difference is discovery and judgement, not execution.

A plain script sits in your repo doing nothing until a human decides to run it and knows which one and with what arguments. The agent will not reach for it unless you explicitly tell it to, every time. A script is a tool waiting for someone who already knows it exists.

A Skill is that same capability, but it announces itself. Its description sits in the agent’s awareness from the start, so when a relevant task comes up, the agent recognises “this is a job for that” on its own, reads the how-to, and proceeds. And critically, a Skill can bundle a script and run it without ever loading the script’s code into the conversation. The agent runs the tool, gets the result, and spends none of its limited attention reading the implementation. The script stays a black box that just works.

| | A plain script | A Skill |
|---|---|---|
| Discovery | Human must know it exists and invoke it | Agent notices it fits the task on its own |
| Guidance | None, it is just a file | Carries instructions on when and how to use it |
| Context cost | Zero until run, but invisible to the agent | ~one line until needed, then loads in tiers |
| Can bundle code | It is the code | Yes, and runs it without reading it into context |

A script is a capability. A Skill is a capability that knows when to volunteer itself, explains how it should be used, and can still run real code underneath. Skills do not replace scripts. They wrap them in discovery and judgement.

So you do not throw away your scripts. The good pattern is often a Skill around a script: the script does the mechanical work, and the Skill is the thin layer that tells the agent this tool exists, when it applies, and how to call it.
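The "black box" part is easy to sketch. Below, a helper runs a script that ships with a skill and hands back only what it printed. The name `run_bundled_script` is mine, and I am assuming the bundled script is a Python file; the same idea applies to a shell script with a different command line.

```python
import subprocess
import sys
from pathlib import Path


def run_bundled_script(script: Path, *args: str) -> str:
    """Run a script shipped with a skill and return only its stdout.

    The script's source is never read here, so none of it would reach the
    agent's context. Only the result does.
    """
    completed = subprocess.run(
        [sys.executable, str(script), *args],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits non-zero
    )
    return completed.stdout.strip()
```

The script can be five lines or five hundred. Either way, the agent pays only for the output it gets back.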

Where does MCP fit in all this?

If you have spent any time around AI agents lately, one more term keeps coming up, and it gets tangled with Skills constantly: MCP, the Model Context Protocol. People ask “should I build a Skill or an MCP server?” as if they are two answers to the same question. They are not. They answer different questions, and once you see the split it stops being confusing.

Here is the cleanest way I have heard it put, and it comes straight from Anthropic: MCP connects the agent to your data. Skills teach the agent what to do with that data.

Think about querying your company database. Before the agent can do anything, it needs to be able to reach the database at all: open a connection, run a query, get rows back. That reaching-out, that plumbing to an external system, is MCP’s job. It is about connectivity. Now, separately, there is the question of how your team wants queries done: always filter by date range first, never run an unbounded scan, format results a certain way. That know-how is a Skill. One gets the agent to the data, the other tells it how to behave once it is there.

MCP is about connection (connect to): "Let the agent reach this database / repo / tool."

Skills are about procedure (how to use): "Here is how we want that tool used."

Scripts are the mechanical action underneath either one (the doing).

Three different axes, not three competing choices. MCP opens the door to an external system. A Skill is the instruction manual for working inside it. A script is the actual mechanical step. The strongest setups use all three together: MCP for access, a Skill for judgement, a script for the work.

So the honest answer to “Skill or MCP?” is usually both. MCP gives the agent its hands, the ability to touch external systems. Skills give it the training, the knowledge of how your team does the thing. I have built MCP servers myself, including one for OpenStack, and the mental model that finally stuck was exactly this: the MCP server is the wiring to the infrastructure, and everything about how to use it well is a separate, teachable layer on top.
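The database example can be sketched as three layers in one file. To be clear about what is assumed: this does not use any MCP SDK. An in-memory SQLite database stands in for whatever an MCP server would expose, and the `orders` table and the `MAX_ROWS` limit are invented for illustration. A Skill's rules would normally be written as prose; they appear as checks here only to make them concrete.

```python
import sqlite3

MAX_ROWS = 100  # hypothetical team limit on result size


def connect() -> sqlite3.Connection:
    """Connection layer: SQLite stands in for an MCP-provided connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, placed_on TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, "2026-01-05"), (2, "2026-02-10"), (3, "2026-03-15")],
    )
    return conn


def query_orders(conn: sqlite3.Connection, start: str, end: str,
                 limit: int = MAX_ROWS) -> list[tuple[int, str]]:
    """Procedure layer: the team's rules, applied before any SQL runs."""
    if not start or not end:
        raise ValueError("always filter by date range first")
    if limit <= 0 or limit > MAX_ROWS:
        raise ValueError("never run an unbounded scan")
    # Mechanical step: the query itself, with bound parameters.
    cursor = conn.execute(
        "SELECT id, placed_on FROM orders "
        "WHERE placed_on BETWEEN ? AND ? ORDER BY placed_on LIMIT ?",
        (start, end, limit),
    )
    return cursor.fetchall()
```

The split is the point. `connect` could be swapped for a different backend without touching the rules, and the rules could change without touching the wiring.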

Putting it together: what goes where

Here is the mental model I have landed on. Three questions, three homes.

Is this always true about the project? (build commands, hard rules, boundaries) → AGENTS.md

Is this expertise for a specific kind of task the agent should reach for only sometimes? → a Skill

Is this a mechanical action with no judgement, that a human or a Skill triggers? → a script

The same litmus test, every time. Always-on truth goes in the always-on file. Sometimes-needed know-how becomes a Skill. Pure mechanical work stays a script, ideally wrapped by a Skill so the agent can find it.
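The litmus test is simple enough to write down as a function. This is only my own encoding of the three questions, with made-up parameter names. Its real use is to show that the questions are asked in order.

```python
def where_does_it_go(always_true: bool, needs_judgement: bool) -> str:
    """Apply the three questions in order and name the right home."""
    if always_true:
        # Build commands, hard rules, boundaries.
        return "AGENTS.md"
    if needs_judgement:
        # Know-how for one kind of task, needed only sometimes.
        return "a Skill"
    # Pure mechanical work, triggered by a human or a Skill.
    return "a script"
```

A rule that is always true goes in AGENTS.md even if it also involves judgement, which is why that question comes first.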

And one rule that sits above all three, because it is the one that actually bit me: whatever you write down, keep it true and keep it small. The research is blunt about it. More context is not better. A short, accurate AGENTS.md beats a long one. A handful of well-described Skills beats a sprawling pile. The cost of a wrong or bloated instruction is not paid once when you write it. It is paid on every request the agent ever makes, forever, until you notice and fix it.

I learned that this week, from one stale line, in a file I wrote myself. The agent did exactly what I told it. That was the problem. These tools are powerful precisely because the agent trusts them completely, which means the responsibility for keeping them honest is entirely yours.

Write less. Keep it true. Let the agent reach for the rest only when it needs it.