In a recent consultation with a financial company, we were reminded of a simple truth: AI features are only as good as the content they rely on.
The company added AI capabilities to its developer portal to improve search, support, and API understanding. Yet despite using reliable and well-tested AI tools, developers struggled to find answers, and search results often returned irrelevant documents. As it turned out, the content those AI systems depended on was neither structured nor consistent enough to support them effectively.
This issue is not unique. As more organizations enhance their developer portals with AI, it is easy to overlook a critical point: successful AI augmentations need clear, structured, and consistent content.
In this article, we explain why content strategy is essential for AI success and what investment different structured content formats require.
Table of Contents:
- The impact of content strategy on AI success
- Investment requirements of different structured content formats
- Benefits of high-quality content
- Content strategy and AI success: closing thoughts
- 5 Practical Actions Toward AI-readiness (Part 2)
The impact of content strategy on AI success
AI holds great potential, but according to a Harvard Business Review article from late 2023, around 80% of AI projects fail, often due to:
- Poor data quality
- Lack of relevant data
- A limited understanding of AI processing
The solution starts with the foundation: turning raw data into high-quality, structured content that aligns with how AI systems search and retrieve information. This approach not only makes your content AI-ready but also enhances the developer experience by speeding up onboarding and increasing self-service success rates. Quality content is a valuable investment, offering measurable returns, whether or not AI is used.
So, what exactly makes content structured and AI-ready? And how does unstructured content fall short in this context?
The problem with unstructured content
Unstructured in this context does not mean messy or disorganized. It means the content lacks a machine-readable structure. Formats like PDFs, Word documents, or Google Docs are created for humans to read and print, not for machines to process.
These formats rely on visual cues (like font size, styles, or spacing) to indicate structure. Technically speaking, unstructured content has only a presentation layer; it lacks a semantic layer.
Semantics refers to the meaning behind the structure, language, and flow of the documentation. Can semantics be designed? Absolutely. API documentation writers and designers often design the semantics intentionally, just as UX designers do for apps. A thoughtful structure helps users understand not just how to use an API, but why it works the way it does. A semantic layer helps all users, but it is mission-critical for machine readability, especially if the goal is meaningful interpretation, automation, or transformation.
The image has been generated with the help of ChatGPT and Bing Create (DALL-E).
While modern AI tools can often guess which text is a heading or a list, they are still working with limited context. This guesswork makes it harder and more resource-intensive to extract consistent meaning across documents, especially when styles vary.
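To make the guesswork concrete, here is a minimal sketch of a hypothetical heuristic that infers headings from visual styling alone, which is the only signal a document without semantic markup offers. The `Run` class, the 1.3 size ratio, and the "short and bold" rule are illustrative assumptions, not how any particular AI tool works.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """A run of text as a word processor stores it: content plus visual styling only."""
    text: str
    font_size: float
    bold: bool = False


def guess_headings(runs, size_ratio=1.3, max_words=6):
    """Guess which runs are headings using visual cues (hypothetical heuristic)."""
    # Assume the most frequent font size is the body text size.
    sizes = [run.font_size for run in runs]
    body_size = max(set(sizes), key=sizes.count)
    guesses = []
    for run in runs:
        noticeably_larger = run.font_size >= body_size * size_ratio
        short_and_bold = run.bold and len(run.text.split()) <= max_words
        guesses.append(noticeably_larger or short_and_bold)
    return guesses


# Author A marks headings with a larger font size.
doc_a = [
    Run("Authentication", 18),
    Run("Send your API key in the request header.", 11),
    Run("Keys expire after 90 days.", 11),
    Run("Rate limits", 18),
    Run("Requests are limited per minute.", 11),
]

# Author B marks headings with bold text at nearly body size,
# and also bolds an important sentence for emphasis.
doc_b = [
    Run("Authentication", 12, bold=True),
    Run("Send your API key in the request header.", 11),
    Run("Never share your key.", 11, bold=True),
    Run("Keys expire after 90 days.", 11),
]

print(guess_headings(doc_a))
print(guess_headings(doc_b))
```

In `doc_b`, the emphasized warning is misread as a heading. With semantic markup (for example, `## Authentication` in Markdown), the writer declares the heading explicitly and nothing needs to be guessed.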
Unstructured content remains dominant in (internal) knowledge bases because it aligns with familiar habits. Most subject matter experts are used to writing with tools like Word or Google Docs, and these editors have become standard across organizations. Companies often have large volumes of content already created in these formats. These tools make it easy to “decorate” text, such as changing font size and bolding headings, without assigning any real semantic meaning. As a result, content may look organized to the eye, but lacks the markup that helps AI understand the purpose and relationships among content elements.
When the goal is to support AI-driven features like smart search or automated assistance, these formatting tools and writing habits often become a limitation.
Since companies have already invested in at least one publishing system, it is understandable that they look for solutions that do not require a complete overhaul or the adoption of complex tools. To make a future-proof decision, it helps to recognize that moving to a more structured approach to content involves practical adjustments, not necessarily major changes.
But before we dive into the practicalities, let's first explore structured content and how choosing the right format can benefit both startups and enterprise corporations on their AI journey.
The importance of structured content
As mentioned earlier, unstructured does not mean messy or disorganized. Similarly, structured content is not inherently more user-friendly or better organized.
What makes content truly structured is its use of semantic markup: markers and tags that define the meaning and relationships within the content, not (just) its visual appearance. Semantic markup ensures that both humans and AI can understand the context and intent of the content.
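When headings are expressed with semantic markup, extracting a document's structure becomes a deterministic parsing task rather than a guess. The sketch below uses Python's standard `html.parser` module to collect an outline from HTML `<h1>` to `<h6>` elements; the sample page content is made up for illustration.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect (level, text) pairs for <h1> to <h6> elements."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None  # level of the heading currently open, if any
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._parts = []

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.outline.append((self._level, "".join(self._parts).strip()))
            self._level = None

    def handle_data(self, data):
        # Text inside a heading (including nested inline tags) belongs to it.
        if self._level is not None:
            self._parts.append(data)


page = """
<article>
  <h1>Recurring payments</h1>
  <p>Charge customers on a schedule.</p>
  <h2>Create a <code>subscription</code></h2>
  <p>Choose a billing cycle.</p>
  <h2>Handle the webhook</h2>
</article>
"""

parser = OutlineParser()
parser.feed(page)
for level, text in parser.outline:
    print("  " * (level - 1) + text)
```

The parser never looks at font sizes or styles: the `<h1>` and `<h2>` tags alone state what each element is and how the sections relate.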
Where semantics matter in narrative API documentation
To support automation, search, and content reuse, the narrative layers of API documentation must be structured with semantic clarity. This is especially important in tutorials, onboarding guides, and use case documentation, which often contain key implementation knowledge not captured in reference materials. Without structure, much of this content remains opaque to machines and difficult to process at scale.
Consider step-by-step tutorials. These are typically written in prose and may vary significantly in format. When each step is clearly defined as an action or phase (such as “install,” “configure,” “send request,” or “verify response”), documentation tools and processing systems can more easily extract meaningful sequences or convert them into alternative formats, such as onboarding flows or test scripts.
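As a minimal sketch of that idea, assuming the tutorial marks its steps as a top-level Markdown ordered list, the following Python turns each step into a placeholder test name. The `test_NN_` naming scheme is a hypothetical convention for illustration; a real pipeline would also need to attach the commands or requests each step describes.

```python
import re

# A top-level ordered-list item, e.g. "1. Install the SDK"
STEP = re.compile(r"^(\d+)\.\s+(.+)$")


def extract_steps(markdown):
    """Return (number, text) for each top-level ordered-list item."""
    steps = []
    for line in markdown.splitlines():
        match = STEP.match(line)
        if match:
            steps.append((int(match.group(1)), match.group(2).strip()))
    return steps


def to_test_names(steps):
    """Map each step to a hypothetical test-function name, preserving order."""
    names = []
    for number, text in steps:
        slug = re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")
        names.append(f"test_{number:02d}_{slug}")
    return names


tutorial = """## Quickstart
1. Install the SDK
2. Configure your API key
3. Send a request to `/user`
4. Verify the response status is 200
"""

for name in to_test_names(extract_steps(tutorial)):
    print(name)
```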
Elements like prerequisites, configuration steps, and environment variables are far easier to validate and automate when presented in a structured way. Tutorials that include .env files, shell setup commands, or YAML examples can be tested or verified automatically, provided the context and intent are clearly indicated.
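For example, if a tutorial's prerequisites list the environment variables it expects, a short script can check that the `.env` sample in the same tutorial actually defines them. This is a minimal sketch: the variable names are hypothetical, and the parser handles only simple `KEY=value` lines with optional comments and quotes, not the full syntax that dotenv libraries accept.

```python
def parse_env(text):
    """Parse simple KEY=value lines; blank lines and # comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"Not a KEY=value line: {raw!r}")
        values[key.strip()] = value.strip().strip('"')
    return values


def missing_variables(required, env_text):
    """Return the required variables that the .env sample does not define, sorted."""
    return sorted(set(required) - set(parse_env(env_text)))


# Hypothetical prerequisites declared in the tutorial.
required = ["API_BASE_URL", "API_KEY", "WEBHOOK_SECRET"]

env_sample = """# Copy this file to .env
API_BASE_URL=https://api.example.com
API_KEY="replace-me"
"""

print(missing_variables(required, env_sample))
```

Run as part of a documentation build, a check like this catches a tutorial whose prerequisites and examples have drifted apart.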
In use case documentation, semantics help convey the relationships between steps and concepts. A guide on recurring payments, for instance, is more than a list of API calls: it describes a workflow involving domain-specific elements like “subscription,” “billing cycle,” or “webhook.” When these are structured and named consistently, machines can help maintain consistency, link related documentation, and improve navigation across topics.
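One simple way to put consistent naming to work is a terminology check that flags variants of glossary terms. The sketch below assumes a hypothetical house glossary in which "webhook" and "billing cycle" are the preferred terms; the variant patterns are illustrative, not exhaustive.

```python
import re

# Hypothetical glossary: preferred term -> regex patterns for variants to flag.
GLOSSARY = {
    "webhook": [r"\bweb[\s-]hook\b"],
    "billing cycle": [r"\bbilling[\s-]period\b"],
}


def find_term_variants(text, glossary=GLOSSARY):
    """Return (variant found, preferred term) pairs in order of appearance."""
    hits = []
    for preferred, patterns in glossary.items():
        for pattern in patterns:
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits.append((match.start(), match.group(0), preferred))
    return [(found, preferred) for _, found, preferred in sorted(hits)]


guide = (
    "Create a subscription, then register a web-hook. "
    "Each billing period ends with a Web Hook call."
)

for found, preferred in find_term_variants(guide):
    print(f"Use '{preferred}' instead of '{found}'")
```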
Even the inline elements of a guide carry weight. Marking callouts as “Note,” “Warning,” or “Best Practice” allows downstream tools such as intelligent linters, onboarding bots, or automated documentation parsers to surface them contextually.
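As an illustration, assume a team writes callouts as Markdown blockquotes that begin with a bold label, such as `> **Warning:** ...` (a hypothetical house convention; some platforms define their own admonition syntax). A downstream tool can then find every callout of a given type with a single pattern:

```python
import re

# Hypothetical convention: "> **Label:** text" on a single line.
CALLOUT = re.compile(
    r"^>\s*\*\*(Note|Warning|Best Practice):\*\*\s*(.+)$", re.MULTILINE
)


def extract_callouts(markdown, kinds=None):
    """Return (kind, text) pairs, optionally limited to the given kinds."""
    callouts = CALLOUT.findall(markdown)
    if kinds is not None:
        callouts = [(kind, text) for kind, text in callouts if kind in kinds]
    return callouts


guide = """Poll the status endpoint once per minute.

> **Warning:** Requests above the rate limit return HTTP 429.

Store the token securely.

> **Best Practice:** Cache tokens until they expire.
"""

# For example, an onboarding bot might surface only warnings.
print(extract_callouts(guide, kinds={"Warning"}))
```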
Investment requirements of different structured content formats
Common structured content formats that use semantic markup include HTML, Markdown, DITA, reStructuredText, YAML, and JSON.
While all these formats provide structure, they differ in complexity and AI-processing efficiency. Below, we compare them by computational cost, learning curve, ease of writing and maintenance, and implementation costs.
Computational costs
AI systems process content by breaking it into tokens (small units of text, often words or word fragments). The more markup a format adds, the more tokens are needed to represent the same information, which increases computational costs.
Here is a breakdown of how different formats compare in terms of tokens per 100 words:
| Format | Token estimate per 100 words | What the estimate includes |
| --- | --- | --- |
| Markdown | 120-140 | Typical structure: headers (# Heading), emphasis (**bold**), links ([text](url)), and lists. |
| DITA XML | 300-350 | Heavy use of tags: <topic>, <title>, <section>, <p>, <codeblock>, etc.; markup often outweighs content. |
| reStructuredText (RST) | 150-200 | Headers, field lists (e.g., :fieldname: Field content), inline roles, directives, and links. |
| YAML | 160-180 | Typical structure: nested key-value pairs and simple schemas (e.g., API specs, config files). |
| JSON | 180-200 | Quoted keys and values, brackets, and braces; more verbose than YAML. |
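The gap between formats comes from markup overhead. A rough way to see it without a real model tokenizer is to count word runs and punctuation characters separately. The sketch below uses that crude proxy (an assumption, not how LLM tokenizers such as BPE actually split text), so its counts will not match the estimates in the table; it only illustrates that the same sentence costs more units as markup grows.

```python
import re


def rough_token_count(text):
    """Crude proxy: each run of word characters or each punctuation mark is one unit."""
    return len(re.findall(r"\w+|[^\w\s]", text))


# The same heading and sentence in three forms.
samples = {
    "Plain text": "Title\n\nThis is a paragraph with a link.",
    "Markdown": "# Title\n\nThis is a paragraph with [a link](http://link.com).",
    "DITA XML": (
        '<topic id="title"><title>Title</title><body>'
        '<p>This is a paragraph with <xref href="http://link.com" '
        'scope="external">a link</xref>.</p></body></topic>'
    ),
}

for name, text in samples.items():
    print(f"{name:<10} {rough_token_count(text):>3}")
```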
Skills, tools, and time
Markdown
- Learning curve: Markdown’s simple syntax makes it accessible even to non-technical users.
- Writing and maintenance: Easy to write and maintain due to its minimalism. Content is easier to edit and read compared to more complex formats.
- Implementation costs: Very low. Markdown is widely supported and does not require special tools or subscriptions.
- Best for: Most developer portal content, including API docs, tutorials, and guides.
Example Syntax:
```markdown
# Title

This is a paragraph with [a link](http://link.com).
```
DITA XML
- Learning curve: Steep. DITA is XML-based, requiring an understanding of tags, attributes, and content models.
- Writing and maintenance: Writing is slow due to its detailed structure, and maintenance can be difficult with large-scale content.
- Implementation costs: High. DITA requires specialized tools, often adding extra costs for frameworks or subscriptions.
- Best for: Suitable for large, complex technical documentation, but overkill for smaller developer portals. DITA’s detailed structure is ideal for organizations with specific, enterprise-level needs.
Example Syntax:
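```xml
<!-- In DITA, a topic needs an id, and inline links use <xref> -->
<topic id="title">
  <title>Title</title>
  <body>
    <p>This is a paragraph with <xref href="http://link.com" scope="external" format="html">a link</xref>.</p>
  </body>
</topic>
```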
reStructuredText (RST)
- Learning curve: Moderate. Syntax is more complex than Markdown but easier than DITA.
- Writing and maintenance: Easier than DITA but still requires more effort to format documents. Works well for technical documentation with code blocks and tables.
- Implementation costs: Low to moderate. Tools like Sphinx are often used, with minimal setup.
- Best for: Common in Python documentation and other software-based portals. A solid middle ground for technical content with moderate complexity.
Example Syntax:
```rst
Title
=====

This is a paragraph with `a link <http://link.com>`_.
```
YAML
- Learning curve: YAML is easy to read but requires attention to indentation and structure.
- Writing and maintenance: Simple for developers and easy to maintain, especially for structured data like configuration files and API documentation.
- Implementation costs: Low. YAML is widely supported in the tech ecosystem, and many tools can parse and generate YAML content.
- Best for: Suitable for API reference documentation where structured data is necessary. Frequently used in REST API docs and configuration management.
Example syntax:
```yaml
title: "API Documentation"
description: "This is the API reference."
endpoints:
  - name: "Get User"
    url: "/user"
    method: "GET"
```
JSON
- Learning curve: Moderate to steep. While JSON is simple to read, it requires precision in formatting, especially for complex data structures.
- Writing and maintenance: Easy to read but can become cumbersome with large or deeply nested data. Requires attention to detail to avoid errors in the structure.
- Implementation costs: Moderate. Many tools support JSON, but managing large datasets might require more advanced tooling or custom scripts.
- Best for: Widely used in API reference documentation due to its structured format and easy integration with systems. Works well for machine-readable content and integration with various platforms.
Example Syntax:
```json
{
  "title": "API Documentation",
  "description": "This is the API reference.",
  "endpoints": [
    {
      "name": "Get User",
      "url": "/user",
      "method": "GET"
    }
  ]
}
```
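Because YAML and JSON are plain data, tools can transform them into other formats without guessing. As a minimal sketch, this Python snippet parses the JSON example above with the standard `json` module and renders its endpoints as a Markdown table; the output layout is an assumption for illustration.

```python
import json

spec = json.loads("""
{
  "title": "API Documentation",
  "description": "This is the API reference.",
  "endpoints": [
    {"name": "Get User", "url": "/user", "method": "GET"}
  ]
}
""")


def endpoints_to_markdown(spec):
    """Render the endpoint list as a Markdown table."""
    rows = ["| Name | Method | URL |", "| --- | --- | --- |"]
    for endpoint in spec["endpoints"]:
        rows.append(
            f"| {endpoint['name']} | {endpoint['method']} | `{endpoint['url']}` |"
        )
    return "\n".join(rows)


print(f"## {spec['title']}\n\n{spec['description']}\n")
print(endpoints_to_markdown(spec))
```

The same approach works for YAML once a YAML parser loads the file into the equivalent dictionary.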
Which format is right for you?
For most developer portals, Markdown is the best choice due to its simplicity, low computational cost, and broad support. YAML and JSON are excellent choices for API reference documentation because they provide structured data that integrates easily with systems. RST is a good middle ground for more technical content, while DITA is better suited for large, complex documentation systems in enterprise environments.
A plain text editor that displays structured content with semantic markup on the left and the rendered version of the content on the right. Source: markdownguide.org.
Benefits of high-quality content
As discussed earlier, structure alone does not guarantee good content. Just as unstructured content can be clear and useful, structured content can still be inconsistent, incomplete, or irrelevant. For AI to generate meaningful results and for users to have a smooth experience, quality is non-negotiable.
Structured content only becomes truly effective when it is also high-quality: well-written, complete, up-to-date, and tailored to its audience. This combination does not just help AI systems; it improves every aspect of content management and user engagement.
Here are some examples of how high-quality structured content benefits both users and AI systems:
| Quality ingredient | User benefits | AI benefits |
| --- | --- | --- |
| Clear structure and navigation | Improves readability and makes it easier to scan and navigate content. | Enables better AI-powered search, summarization, and contextual assistance. |
| Semantic markup and metadata | Enhances internal search and user experience with clearer organization. | Boosts discoverability in AI systems and improves recommendation accuracy. |
| Readable, accessible format | Supports screen readers and improves accessibility for all users. | Helps AI interpret and process content it cannot visually perceive (e.g., alt text, headings). |
| Complete, example-rich content | Provides clarity and real-world context through code samples and guides. | Improves AI responses, auto-generation quality, and contextual relevance. |
| Enforced style and governance | Ensures consistency across teams and content types through linting and rules. | Guides AI to produce content aligned with your voice, tone, and documentation standards. |
| Versioning and updates | Keeps users informed with the latest and most relevant information. | Prevents outdated or conflicting AI outputs by offering reliable, up-to-date source material. |
| User feedback and analytics | Enables continuous improvement based on real user behavior and needs. | Helps AI personalize results and adapt recommendations based on user interaction patterns. |
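Several of these ingredients, such as metadata, enforced rules, and up-to-date content, can be checked automatically. The following minimal sketch assumes pages carry simple `key: value` front matter between `---` lines; the required fields and the 180-day review window are hypothetical house rules, not a standard.

```python
from datetime import date

# Hypothetical house rules for every documentation page.
REQUIRED = ("title", "description", "last_reviewed")


def read_front_matter(markdown):
    """Read simple 'key: value' pairs between the opening and closing '---' lines."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def lint_page(markdown, today, max_age_days=180):
    """Return a list of problems: missing metadata and overdue reviews."""
    meta = read_front_matter(markdown)
    problems = [f"missing '{key}'" for key in REQUIRED if key not in meta]
    if "last_reviewed" in meta:
        reviewed = date.fromisoformat(meta["last_reviewed"])
        age = (today - reviewed).days
        if age > max_age_days:
            problems.append(f"last reviewed {age} days ago")
    return problems


page = """---
title: Recurring payments
last_reviewed: 2024-01-15
---
# Recurring payments
"""

print(lint_page(page, today=date(2024, 9, 1)))
```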
AI does not just "consume" content: it learns from it, mimicking its structure, language, and tone. If the source is poor, the output will be poor too.
Content strategy and AI success: closing thoughts
Quality, structured content is the fundamental building block for effective AI integration in developer portals, leading to better search, automation, accessibility, and personalization. As we mentioned at the beginning of the article, AI features are only as effective as the content they rely on.
While humans easily interpret structure based on visual presentation, like layout, font size, or grouping, AI systems must infer meaning from limited or inconsistent cues. This process is both resource-intensive and prone to errors, especially when content is styled for aesthetics rather than structured for clarity.
As we have seen, there is a growing need for structured formats, even lightweight ones such as Markdown, that use semantic markup to convey meaning and relationships within the content. If you make your content AI-ready, it will also support your users, your team, and your long-term strategy.
Our Technical Writer team can assist in enhancing the clarity and usability of your content through in-depth reviews, actionable recommendations, and structured templates.
Part 2 of the article examines concrete actions organizations can take to prepare their content for effective AI implementation.
Resources
- Preparing for a World Mediated by AI
- Keep Your AI Projects on Track by Iavor Bojinov (Harvard Business Review)
- 3 ideas on AI readiness, the role of APIs and developer portals in generative AI systems
- Fabrizio Ferri-Benedetti’s Should you write documentation differently for LLMs?
- Meet your future co-worker: Understanding the rise of AI Agents
- Chris Despopoulos’:
- Dries Buytaert’s How AI could reshape CMS platforms
- AI-Ready Content Accelerator
- Unlocking the Value of AI-Ready Content: Navigating Regulatory Compliance
- Before you invest in AI, assess your AI-readiness + Reduce image hide-and-seek with artificial intelligence
- How to Prepare Content for AI
- Technical Writing Guidelines to Create AI Friendly Content
- How “AI-Ready” Is Your Content?
- AI in CMS: What Can You Really Do with Your Website?
- Preparing Product Content for AI: Unified Knowledge and Governance in Technical Documentation
- 5 Key Benefits of Integrating AI into Your Business
- AI-ready data: Roadblocks, best practices, benefits, applications, tools and technologies, and future trends
All Pronovix publications are the fruit of a team effort, enabled by the research and collective knowledge of the entire Pronovix team. Our ideas and experiences are greatly shaped by our clients and the communities we participate in.