Job Description

**This is a Hybrid role requiring 2 days a week in a Wolters Kluwer office**

We are seeking a Lead AI Quality Engineer to ensure the quality, reliability, and trustworthiness of AI-powered product experiences in Wolters Kluwer Tax and Accounting. This role goes beyond validating that buttons click—you will design tests that confirm the system behaves correctly, measuring retrieval accuracy, citation correctness, and overall alignment of responses with user intent. You will be a key contributor in helping us deliver a system customers can trust.

Key Responsibilities:

· Design and implement evaluation harnesses to measure retrieval accuracy, citation correctness, response quality, and overall system behavior

· Develop automated tests for APIs, ingestion pipelines, and chat workflows

· Collaborate with developers and product managers to define quality metrics (accuracy, latency, cost, hallucination rate)

· Analyze logs, traces, and feedback signals to identify root causes of failures in AI-driven responses

· Create regression suites to ensure changes to prompts, chunking, or embeddings don’t break existing behavior

· Validate REST APIs and service integrations for resilience, correctness, and security

· Contribute to observability by instrumenting metrics and dashboards for system performance

· Participate in sprint planning and retrospectives, ensuring testability is built into features from day one

Key Requirements:

· Bachelor's degree in Computer Science or equivalent

· 5+ years of experience in software testing, quality engineering, or equivalent engineering roles with a focus on validation and reliability

· Experience with AI evaluation frameworks (e.g. LlamaIndex evals, OpenAI Evals, Ragas, TruLens, or custom harnesses)

· Strong skills in Python testing frameworks (Pytest, unittest, or equivalent)

· Experience testing web applications and APIs

· Familiarity with AI/ML or non-deterministic system testing

· Knowledge of CI/CD pipelines, Git, and automated regression testing

· Strong analytical skills: able to define metrics and success criteria where outputs aren’t deterministic

· Comfortable working in a fast-paced Agile environment with weekly sprints, pairing, and close collaboration with PM/UX/Dev

Desired Qualifications:

· Knowledge of retrieval-augmented generation (RAG) pipelines

· Experience with metrics/observability tooling (Grafana, Prometheus, Datadog)

· Familiarity with containerized environments (Docker, Kubernetes)

· Exposure to performance/load testing tools (Locust, k6, JMeter)

This role is critical in ensuring our AI solutions meet the high standards of accuracy and reliability expected in professional tax and accounting software.

Our Culture:
At Wolters Kluwer, our core values—Focus on Customer Success, Make it Better, Aim High and Deliver, and Win as a Team—guide everything we do. We are committed to driving success for our customers by delivering innovative solutions that exceed expectations. We continually strive to improve our processes and products, aiming for excellence in all our efforts. Collaboration and teamwork are central to our culture, enabling us to achieve great results together.

Compensation:

$89,600.00 - $157,000.00 USD

This role is eligible for Bonus.

Additional Information:

Wolters Kluwer offers a wide variety of competitive benefits and programs to help meet your needs and balance your work and personal life, including but not limited to: Medical, Dental, & Vision Plans, 401(k), FSA/HSA, Commuter Benefits, Tuition Assistance Plan, Vacation and Sick Time, and Paid Parental Leave. Full details of our benefits are available upon request.
