Experiment #3.

Challenge:
Can AI meaningfully assist in evaluating and improving accessibility when paired with real assistive technology testing?

The challenge was to move past generalized WCAG checklists and validate a dynamic, traveler-specific module using actual screen reader navigation, then determine where AI guidance aligns with lived interaction.


Theory:

If I combine AI-generated structural recommendations with hands-on testing in Safari using VoiceOver, I should be able to identify gaps between ideal semantic markup and real user experience.

AI can surface patterns and best practices quickly. Assistive technology testing reveals how those patterns actually behave.

Together, they may create a more rigorous accessibility workflow.


Assumptions:

  • The dynamic Family Traveler module represents critical information and warrants structural hierarchy (H2 + H3).

  • AI accessibility reviews will be broad but not definitive.

  • Screen reader navigation patterns (Rotor, Tab, VO arrow navigation) will surface issues not visible in visual review.

  • Some perceived accessibility issues may be environmental or configuration-based rather than structural.

AI Tools:

• Google Notebook
• ChatGPT
• v0 by Vercel

Additional Tools:

• Safari Browser
• VoiceOver
• Silktide

Persona-driven scenarios:

Context: Booked lodging accessibility audit

Persona type: UX Designer

Goal: Evaluate where AI meaningfully accelerates accessibility review — and where it introduces risk or false confidence.

Clean the Environment

I began with the Family Traveler screen from Experiment #2 and removed the comparison color coding and variant controls left over from earlier experiments. The goal was to evaluate the module in a realistic, production-like state rather than a testing environment.

Broad WCAG Review via Agent

I prompted the agent to conduct a broad WCAG review of the page.

The output identified potential issues and best practices, but it remained generalized. It provided guidance — not confirmation. That distinction became important.

Prompts and agent responses

Assistive Technology Review
(Safari + VoiceOver)

I reviewed the page in Safari with VoiceOver enabled, testing multiple navigation models:

  • Rotor menus (Headings, Links, Landmarks)

  • Tab key navigation

  • Control + Option (the VO keys) + Left/Right Arrow navigation

Each interaction model surfaced different insights about structure, hierarchy, and focus behavior. This moved the exercise from theoretical compliance to lived navigation.

Screen with web Rotor menus overlaid

Introduce Structural Hierarchy
(Pool Hours)

Because the Family Traveler module contains critical, contextual information, I promoted it to an H2 accessibility heading and added corresponding H3 accessibility headings for each item listed in the Pool Hours tab.
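
In markup terms, the structure I was aiming for looks roughly like the sketch below. It assumes native HTML headings, and the element names, labels, and hours are placeholders rather than the production code.

    <!-- Illustrative heading hierarchy; names and hours are placeholders -->
    <section aria-labelledby="family-traveler-heading">
      <h2 id="family-traveler-heading">Family Traveler</h2>

      <!-- Pool Hours tab content -->
      <h3>Main pool</h3>
      <p>9:00 AM – 9:00 PM</p>

      <h3>Kids' splash zone</h3>
      <p>10:00 AM – 6:00 PM</p>
    </section>

Native heading elements are what populate VoiceOver's Headings Rotor.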

I reviewed again using VoiceOver and Rotor navigation. The heading structure improved discoverability and supported section-level skimming.

Prompts to insert heading levels, followed by review in the Rotor menu

Extend Hierarchy to Kid-Friendly Tab

I repeated the same structural approach for the Kid-Friendly tab, adding an H2 accessibility heading and nested H3 accessibility headings for each restaurant listed.

During the accessibility review, I noticed the restaurant links were not receiving focus during Tab navigation. I prompted the agent to evaluate the issue.
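
For reference, the pattern under evaluation looked roughly like the sketch below; restaurant names and paths are placeholders, not the module's actual code. Because a standard link (an anchor element with an href) is keyboard-focusable by default, the markup itself gave little reason for Tab to skip the links.

    <!-- Kid-Friendly tab content; names and URLs are placeholders -->
    <h2>Kid-Friendly</h2>

    <h3><a href="/dining/restaurant-a">Restaurant A</a></h3>
    <p>Casual menu, open for lunch and dinner.</p>

    <h3><a href="/dining/restaurant-b">Restaurant B</a></h3>
    <p>Poolside snacks and smoothies.</p>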

Reviewing heading levels and then focus order with the Silktide plugin controls

Configuration vs. Structural Issue

Further testing revealed the issue was environmental, not semantic. Full Keyboard Access was disabled. Once enabled, the links received focus correctly. They had always appeared in the Rotor’s Links menu.

This reinforced an important distinction: not all accessibility issues are structural. Some are configuration-based.

Semantic HTML + Simulation

I prompted the agent to generate semantic HTML and simulate screen reader output. When the agent started by narrating the visual experience, I redirected it to focus strictly on screen reader behavior.
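
For context, semantic markup for a tabbed module like this one generally follows the WAI-ARIA tabs pattern. The sketch below is my own simplified illustration with placeholder ids and labels, not the agent's actual output, and it omits the scripting that would handle arrow-key switching and tabindex management.

    <!-- Simplified WAI-ARIA tabs pattern; ids and labels are placeholders -->
    <div role="tablist" aria-label="Family Traveler">
      <button role="tab" id="tab-pool" aria-controls="panel-pool" aria-selected="true">
        Pool Hours
      </button>
      <button role="tab" id="tab-kids" aria-controls="panel-kids" aria-selected="false" tabindex="-1">
        Kid-Friendly
      </button>
    </div>

    <div role="tabpanel" id="panel-pool" aria-labelledby="tab-pool" tabindex="0">
      <!-- Pool Hours headings from the earlier step live here -->
    </div>

    <div role="tabpanel" id="panel-kids" aria-labelledby="tab-kids" tabindex="0" hidden>
      <!-- Kid-Friendly headings and restaurant links live here -->
    </div>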

I then prompted the agent to simulate keyboard navigation using the Tab key and evaluate the resulting experience.

Structurally, the output was correct. However, during real testing with VoiceOver, the system-level instructional prompts (e.g., “to interact with this button…”) added extra verbosity. While technically helpful, this layered narration raises a broader question: when does assistive guidance begin to create cognitive drag?

Insight: Simulated vs. Verified Accessibility

The simulated narration was structurally sound — perhaps even idealized. That raised a deeper question: is AI modeling best-case semantics, or reflecting real-world implementation constraints?

The tension between simulated accessibility and verified experience introduced broader considerations around cognitive load, redundancy, and practical DOM behavior.

Accessibility is not just structural correctness. It is experiential clarity.

Was this a successful experiment?

Yes. This experiment moved beyond theoretical accessibility and into real assistive technology testing. AI helped surface and remediate potential WCAG issues early, which streamlined the review process.

But pairing those updates with live VoiceOver testing revealed a deeper layer — structural correctness does not automatically translate to experiential clarity.

What surprised me?

How relatively easy it was to apply accessibility annotations through the agent and see them reflected immediately in the live environment.

Adding structural updates — like heading hierarchy and semantic adjustments — translated quickly into real VoiceOver behavior. The feedback loop between prompting, implementation, and testing was faster than expected.

What did I not expect?

The level of narrative detail in the simulated walkthrough.

When prompted to simulate screen reader behavior, the agent didn’t just describe structure — it produced a guided narration of the experience. While informative, it blurred the line between realistic assistive output and idealized explanation.

What would I have done differently?

Given that this was my first deliberate accessibility review using assistive technology, it’s difficult to say.

The process itself revealed the learning curve — understanding VoiceOver behavior, configuration settings, verbosity layers, and how AI simulation differs from real interaction. In hindsight, I might have defined clearer evaluation criteria at the outset, but much of the value came from discovering what I didn’t yet know.