AI systems fail in boring ways first. The button does nothing. The auth flow breaks. The agent uses the wrong account. The report looks plausible and is missing the one field the operator needed.

This is a contract role for someone who likes finding those problems before a client finds them. You should be comfortable testing web apps, internal tools, automations, AI workflows, and handoffs between systems. Some of the work will look like normal QA. Some of it will require asking what the AI system was supposed to do, then trying to make it do the wrong thing.

The right person writes clean bug reports, builds useful test checklists, notices edge cases, and understands that quality is partly technical and partly operational. If a workflow passes the happy path and still confuses the user, you should catch that too.

What you will do

Test client-facing and internal systems before release.
Run structured QA passes across web apps, agents, automations, integrations, and reporting workflows.
Write clear bug reports with reproduction steps, expected and actual behavior, screenshots, screen recordings, and severity notes.
Create lightweight regression checklists for systems that will keep changing after launch.
Test AI-specific failure modes: bad instructions, missing context, hallucinated fields, permission mistakes, brittle prompts, and confusing handoffs.

What we need

You have done QA, product testing, support engineering, implementation QA, or similarly detail-heavy work.
You can test like a user and report like an engineer.
You are comfortable with modern software tools: browsers, consoles, logs, issue trackers, screenshot and screen-recording tools, and basic API inspection.
You can understand a workflow well enough to spot when the software technically works and still fails the job.
You communicate precisely. A vague bug report is just a small crime.

Strong signals

If any of these describe you, the conversation will move quickly.

You naturally try the weird path after the happy path works.
You can turn a messy workflow into a practical test plan without a giant QA ceremony.
You have tested AI tools, automations, data pipelines, internal tools, or products with lots of edge cases.
You care about the user's actual job, not just whether the acceptance criteria were technically satisfied.