Machine Learning Engineer (w/m/d)

AI Evaluation, Research Methods, Python, LLM Observability
Salary plus equity, depending on experience (up to 100,000 for candidates with exceptional relevant experience).

To apply, email us at …com and tell us a little bit about yourself and your interest in the future of writing, along with your CV or a link to your CV site.

Marker is an AI-native word processor: a reimagining of Google Docs and Microsoft Word. Join us in building the next generation of agentic AI assistants supporting serious writers in their work. We are a small, ambitious company using cutting-edge technology to give everybody writing superpowers.

We are looking for someone with a couple of years' experience in academia or industry who can help us bring rigour and insight to our AI systems through evaluation, research, and observability. You'll work directly with Ryan Bowman (CPO) to help us understand and improve how our AI assists writers.

What you'll do:
- Design and implement evaluation frameworks for complex, subjective AI outputs (like writing feedback that's meant to inspire rather than just correct)
- Build flexible evaluation pipelines that can assess quality across multiple dimensions, from human preference to actual writing improvement
- Research and prototype new evaluation methodologies for creative and subjective AI tasks
- Help define what quality means for different AI outputs and create metrics that actually matter for our users

A typical question you'll tackle: how do we automatically evaluate whether an AI comment successfully encourages thoughtful revision? (One illustrative sketch follows below.)
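To give a flavour of the kind of tooling this role involves, here is a minimal sketch of an LLM-as-judge scorer for that question, assuming a rubric-based approach. The rubric dimensions, judge model, prompt, and the `score_comment` helper are hypothetical illustrations, not Marker's actual pipeline.

```python
# Illustrative sketch only: a rubric-based "LLM as judge" scorer for AI writing
# comments. The rubric dimensions, judge model, and prompt are assumptions,
# not Marker's actual system.
import json

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC = ["encourages_revision", "specificity", "tone"]  # hypothetical dimensions

JUDGE_PROMPT = """You are evaluating an AI comment left on a piece of writing.
Score each dimension from 1 (poor) to 5 (excellent) and reply with JSON only,
e.g. {{"encourages_revision": 3, "specificity": 4, "tone": 5}}.

Draft excerpt:
{draft}

AI comment:
{comment}
"""


def score_comment(draft: str, comment: str) -> dict[str, int]:
    """Ask a judge model to rate one AI comment against the rubric."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder judge model
        response_format={"type": "json_object"},  # request parseable JSON
        messages=[
            {"role": "user", "content": JUDGE_PROMPT.format(draft=draft, comment=comment)}
        ],
    )
    scores = json.loads(response.choices[0].message.content)
    # Keep only the dimensions we asked for, guarding against stray keys.
    return {dim: int(scores[dim]) for dim in RUBRIC}


if __name__ == "__main__":
    print(score_comment(
        draft="The ending wraps up three plotlines in a single rushed page.",
        comment="What if you gave each plotline its own closing beat?",
    ))
```

In practice, judge scores like these would need to be calibrated against human preference data before being trusted as a quality metric.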
What we offer:
- Fun, creative, novel, and interesting technical work at the intersection of AI research and product development
- An opportunity to work with and learn about the latest advancements in AI evaluation and language models
- Direct collaboration with leadership to shape how we understand and improve our AI systems
- As much responsibility and as many growth opportunities as you want to take on

Who you are:
- You have experience with AI/ML evaluation methodologies and can speak the language of AI research
- You've worked hands-on with language models and understand the challenges of evaluating subjective, creative outputs
- You are familiar with and have worked on related technical systems (evaluation pipelines, data collection tools), but you don't need to be a full-stack engineer
- You have some programming experience (Python preferred) and can work independently on technical projects
- You're interested in the intersection of AI capabilities and human creativity

Nice to have:
- Experience building evaluation systems for generative AI in production environments
- Knowledge of TypeScript and the ability to integrate with our existing systems
- Background in human-computer interaction, computational creativity, or writing research
- Experience with A/B testing, statistical analysis, and experimental design
- Familiarity with modern AI observability and monitoring tools
- Published research or a deep interest in AI evaluation methodologies
- Interest in writing (fiction, non-fiction, essays)

What you won't do:
- Be a senior software engineer: we're looking for someone who can build evaluation systems, not architect our entire backend

Our tech stack:
- Our AI engine uses a range of models, including self-hosted and fine-tuned open-source models as well as the latest reasoning models from Anthropic and OpenAI
- Evaluation and research tools are built primarily in Python, with integration into our TypeScript infrastructure
- Our agentic AI execution platform is written in TypeScript and hosted on Cloudflare Workers
- Standard ML tooling: various evaluation frameworks, data analysis tools, and monitoring systems
- Our text editor frontend is a web application built with React, TypeScript, and ProseMirror

Please note that this role is currently only available at our London hub, and at this time we are not able to sponsor work visas in the UK.