A2A Protocol
A Lingua Franca for AI Agents, a look into the A2A specification
What and why A2A?
Specificaions like MCP or frameworks like the Vercel SDK have seen strong adoption because they offered ways to standardize LLM interactions. Switching your applications models or MCP Servers/tools is now a breeze, lowering the barrier to entry and encouraging more development in areas such as LLM tooling as builders can be sure integrating their tools will be less hassle for end users if all parties follow the spec.
That's where the A2A Protocol comes into play. It's google's apporoach to standardized agent to agent communication.
As it provides a common language for agents built using diverse frameworks and by different vendors enabling interoperability.
So A2A aims to be the common langguage for agents allowing them to communicate / orchestrate one another.
How does it work?
A2A focuses only on the inputs and outputs of agents omitting agent internal details such as memory, tools, or resource access to align with
standard client-server security paradigms, treating remote agents as standard HTTP-based enterprise applications.
Similar to MCP its build on top of HTTP and JSON-RPC (find more JSON-RPC details in my MCP Compendium).
In short you have a A2A Client (Client Agent) sending requests to a A2A Server (Server Agent) answering with a response, the servers can also send notifications
for long runnning async operations and the responses can also be streamed in via SSE (Server-Sent Events) to receive real-time, incremental results or status updates.
Authentication
Like MCP A2A uses standard authorization protocols such as OAuth 2.0 to authenticate clients and servers. In practice OAuth tokens, API keys are typically passed through HTTP headers, separate from the A2A protocol messages.
Discovery / Agent Card
The most common ways for A2A Clients to discover A2A Servers is via a well-known URI like https://{agent-server-domain}/.well-known/agent-card.json
which serves as a pointer to the Agent Card which is a json file defining the
agent's identity, capabilities, endpoint, skills, and authentication requirements you can think of it like an Agents ID Card.
1// Example Agent Card
2{
3 "name": "Meal Planner Agent",
4 "description": "Plans a meals for the week based on the dietary preferences",
5 "url": "https://meal-planner-agent.com/a2a",
6 "version": "1.0.0",
7 "capabilities": {
8 "streaming": true,
9 "pushNotifications": true,
10 "stateTransitionHistory": false
11 },
12 "defaultInputModes": ["text", "text/plain"],
13 "defaultOutputModes": ["text", "text/plain"],
14 "skills": [
15 {
16 "id": "plan-meal",
17 "name": "Plan Meal",
18 "description": "Plans a meal for the week based on the dietary preferences"
19 },
20 {
21 "id": "generate-recipe",
22 "name": "Generate Recipe",
23 "description": "Generates a recipe for a meal based on available ingredients"
24 }
25 ],
26 "securitySchemes": {
27 "apiKeySecurityScheme": {
28 "in": "header",
29 "type": "apiKey",
30 "name": "X-API-Key",
31 "description": "API key authentication required"
32 }
33 },
34 "security": [
35 {
36 "apiKeySecurityScheme": []
37 }
38 ]
39}
40Communication Types
When an agent receives a message from a client, it can respond in one of two fundamental ways
- Respond with a Stateless
Message:
- response used for immediate, self-contained interactions that conclude without requiring further state management (e.g. a simple question and answer).
- Initiate a Stateful
Task:
- response used for longer running interactions that require state management (e.g. a complex task that requires multiple steps and inputs).
Response Components
An Agent can respond with a Message or an Artifact where an Artifact is a tangible output generated by an agent during a task,
composed of Parts which are the smallest unit of content within a Message or Artifact (for example, TextPart, FilePart, DataPart).
Terminology Overview
| Element | Description | Key Purpose |
|---|---|---|
| A2A Client | agent that initiates requests to an A2A Server on behalf of a user or another system | Initiates requests |
| A2A Server | agent that exposes an A2A-compliant endpoint, processing tasks and providing responses | Processes tasks and provides responses |
| Agent Card | JSON file contianing all relevant information about agents capabilities, skills, endpoint, and authentication requirements | Enables clients to understand how to interact with them securely and effectively. |
| Task | state of agent 2 agent conversation initiated by an agent, with a unique ID and defined lifecycle. | Facilitates stateful multi-turn interactions and collaboration. |
| Message | single turn of communication between a client and an agent | Prompt / Task, Answer / Result |
| Artifact | output generated by an agent during a task (e.g. document, image, or structured data). | Medium for concrete results of agent work, ensuring structured and referenceable outputs. |
| Part | smallest unit of returned result (e.g. TextPart, FilePart, DataPart) | flexible medium for agent to express its results |
| Context | server-generated identifier to group related tasks. | Allows logical grouping of related tasks. |
| Extension | extending the A2A protocol with new data, requirements, RPC methods, and state machines defined in the agent card | Declares additional functionality beyond the specification |