Coding Challenge #122 - AI-Powered Contract Review Agent
This challenge is to build your own AI-powered agent to review documents.
Hi, this is John with this week’s Coding Challenge.
🙏 Thank you for being a subscriber, I’m honoured to have you as a reader. 🎉
If there is a Coding Challenge you’d like to see, please let me know by replying to this email. 📧
This challenge is to build your own AI-powered contract review agent using Trigger.dev - an application that takes a PDF contract, breaks it into clauses, analyses each one for risk in parallel using LLMs, pauses for human review, and streams back a final summary. Whilst the example is contract review, the workflow applies to many other domains.
This challenge was created in collaboration with Trigger.dev, whose platform provides durable background tasks with no timeouts, built-in retries, concurrency controls, human-in-the-loop pause points, real-time streaming, and full observability - all in TypeScript.
Contract review is one of the most time-consuming parts of legal work. Lawyers spend hours poring over dense documents looking for risky clauses, ambiguous language, and missing terms. An AI agent that can do the first pass, flag issues, and then incorporate human feedback before producing a final report would save enormous amounts of time.
Building this from scratch means you’d need to solve several hard infrastructure problems: job queuing with retries, parallel execution with concurrency control, durable pause-and-resume for human review, real-time streaming to the frontend, and execution tracing. That’s exactly what Trigger.dev handles for you. You define your workflow as a set of tasks - functions that can run for as long as needed - and Trigger.dev takes care of the rest. Your focus stays on the application logic: extracting clauses, analysing risk, and generating summaries.
By the end you’ll have a deep understanding of durable workflow orchestration and how a platform like Trigger.dev makes building AI-powered applications dramatically simpler.
The Challenge - Building Your Own AI-Powered Contract Review Agent
You’re going to build a web application that lets users upload PDF contracts and then kicks off a series of durable Trigger.dev tasks. Those tasks will extract text, split it into clauses, analyse each clause in parallel with automatic retries, pause for human approval, and stream a final summary back to the frontend in real time. The system will handle contracts with 50+ clauses reliably, support multiple LLM providers, and give you complete visibility into every step through Trigger.dev’s built-in dashboard.
Step Zero
In this introductory step you’re going to set your environment up ready to begin developing and testing your solution.
You’ll need to make a few decisions:
Set up your Trigger.dev project. Trigger.dev is a TypeScript-first platform, so you’ll be working in that ecosystem. Scaffold a new project using npx create-trigger@latest and follow the quickstart to get a task running. Choose a framework for your web app - Next.js is a natural fit since it pairs well with Trigger.dev, but you can also use Express, Remix, or any Node.js framework. Get your Trigger.dev API key from the dashboard and configure your environment.
Choose your LLM providers. Sign up for API keys. You’ll build an abstraction layer so you can swap providers without changing the rest of your code.
Choose your database. You’ll need to persist users, contracts, clause analyses, reviewer decisions, and final summaries. PostgreSQL works well and integrates naturally with Prisma. Pick what you’re most comfortable with.
Understand the Trigger.dev project structure. Your tasks live in the trigger/ folder. Each file defines one or more tasks using the task() function. These are functions that can run indefinitely with no timeouts, automatic retries on failure, and built-in logging. You trigger tasks from your web app, and you can chain tasks together using triggerAndWait() or batchTriggerAndWait(). Take a few minutes to read through the tasks documentation and get comfortable with the concepts.
Testing: Run the example task that create-trigger generates (usually called hello-world). Trigger it from your web app and verify it appears in the Trigger.dev dashboard. Make a simple API call to each of your chosen LLM providers with a basic prompt and verify you get a coherent response. Set up your database, create a test table, and verify you can read and write data. Once all three are working independently, you’re ready to start building.
Step 1
In this step your goal is to build user authentication and a lightweight homepage.
Create a sign-up and login system using email and password. You’ll need a user model in your database, registration and login forms, and session management. Keep the auth simple - you don’t need OAuth or social login, just email/password with hashed passwords.
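If you haven’t stored hashed passwords before, here is a minimal sketch of the idea using Node’s built-in crypto module and scrypt. The function names and the salt:hash storage format are assumptions made for illustration; many projects use a library such as bcrypt or argon2 instead, and either approach is fine for this challenge.

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

const KEY_LENGTH = 64; // bytes of derived key to store

// Hash a password with a fresh random salt. Store the returned string on the user record.
function hashPassword(password: string): string {
  const salt = randomBytes(16);
  const hash = scryptSync(password, salt, KEY_LENGTH);
  return `${salt.toString("hex")}:${hash.toString("hex")}`;
}

// Recompute the hash with the stored salt and compare in constant time.
function verifyPassword(password: string, stored: string): boolean {
  const [saltHex, hashHex] = stored.split(":");
  if (!saltHex || !hashHex) return false;
  const expected = Buffer.from(hashHex, "hex");
  // Reject malformed records rather than comparing against a short or empty buffer.
  if (expected.length !== KEY_LENGTH) return false;
  const actual = scryptSync(password, Buffer.from(saltHex, "hex"), KEY_LENGTH);
  return timingSafeEqual(actual, expected);
}
```

Because every call to hashPassword uses a new salt, two users with the same password end up with different stored values.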
Build a lightweight homepage that explains what the product does. It doesn’t need to be a full marketing site - just a clear explanation that this is an AI-powered contract review tool, what it does, and a call-to-action to sign up or log in.
Testing:
Visit the homepage and verify it renders correctly with the product explanation.
Register a new account with an email and password. Verify you’re redirected and logged in.
Log out and log back in with the same credentials.
Try registering with an email that already exists. You should get an appropriate error.
Try logging in with the wrong password. You should get an appropriate error, not a crash.
Step 2
In this step your goal is to build the PDF upload and text extraction pipeline.
Create a web UI where logged-in users can drag and drop or select a PDF contract file for upload. Once uploaded, store the file and extract the raw text from the PDF. Use a PDF parsing library - pdf-parse for simpler PDFs or pdf.js for more complex documents.
This is a great place to introduce your first custom task. Create a processContractUpload task in your trigger/ folder. When a user uploads a PDF, your web app should store the file, create a contract record in your database, and then trigger the task with the contract ID. The task should:
Extract raw text from the PDF
Store the extracted text back to the database, linked to the contract
Update the contract status
You don’t need to worry about timeouts - Trigger.dev tasks can run as long as needed, which is important for large PDFs. You also don’t need to worry about what happens if the server restarts mid-processing. The task will resume where it left off.
Testing:
Upload a multi-page PDF document. Verify the text is extracted and stored in the database.
Upload a PDF with unusual formatting (headers, footers, columns). Check how well your extraction library handles it. Some garbled text is expected.
Try uploading a non-PDF file. Your application should reject it with a clear error message.
Try uploading without being logged in. The application should redirect to the login page.
Open the Trigger.dev dashboard. You should see your processContractUpload task run with a status, duration, and any logs you emitted.
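For the non-PDF rejection test, relying on the file extension alone is easy to fool. One option is to check the first bytes of the upload, since a well-formed PDF begins with %PDF-. The sketch below is one way to do that; the function names, error messages and the maxBytes limit are all hypothetical.

```typescript
// A well-formed PDF begins with the ASCII bytes "%PDF-" (for example "%PDF-1.7").
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

type UploadCheck = { ok: true } | { ok: false; error: string };

function looksLikePdf(bytes: Uint8Array): boolean {
  return bytes.length >= PDF_MAGIC.length && PDF_MAGIC.every((byte, i) => bytes[i] === byte);
}

// maxBytes is a hypothetical limit; pick one that suits your storage and parser.
function validateUpload(bytes: Uint8Array, maxBytes: number): UploadCheck {
  if (bytes.length === 0) return { ok: false, error: "The file is empty." };
  if (bytes.length > maxBytes) return { ok: false, error: "The file is too large." };
  if (!looksLikePdf(bytes)) return { ok: false, error: "Only PDF files are supported." };
  return { ok: true };
}
```

Running a check like this in the web app before triggering the task means an invalid file never becomes a contract record or a task run.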
Step 3
In this step your goal is to split the extracted text into individual clauses using an LLM.
Raw extracted PDF text is rarely clean. You’ll have page numbers, headers, footers, and sometimes text in the wrong order. Your job now is to take that extracted text and use an LLM to identify and split it into individual, well-formed clauses.
Extend your processContractUpload task or create a new child task that takes the extracted text, sends it to an LLM with a prompt instructing the LLM to return the text split into clauses, and stores the results. Each clause should be a distinct logical unit - a paragraph, a condition, a definition, a warranty.
Store the identified clauses in your database, linked to the contract. Each clause should have a reference number (1, 2, 3...) so you can refer to it later.
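How you parse the LLM’s reply depends on your prompt. As a sketch, assume you ask the model to return a JSON array of strings, one per clause. The helper below (a hypothetical name and shape) tolerates a little chatter or a code fence around the array, drops empty entries, and assigns the reference numbers.

```typescript
interface Clause {
  reference: number; // 1-based reference number
  text: string;
}

// Assumes the prompt asked for a JSON array of strings, one string per clause.
function parseClauses(llmOutput: string): Clause[] {
  // Models sometimes wrap the JSON in prose or a code fence, so take the outermost [...] span.
  const start = llmOutput.indexOf("[");
  const end = llmOutput.lastIndexOf("]");
  if (start === -1 || end <= start) {
    throw new Error("No JSON array found in the LLM output");
  }
  const parsed: unknown = JSON.parse(llmOutput.slice(start, end + 1));
  if (!Array.isArray(parsed)) {
    throw new Error("Expected a JSON array of clause strings");
  }
  return parsed
    .filter((item): item is string => typeof item === "string")
    .map((text) => text.trim())
    .filter((text) => text.length > 0)
    .map((text, index) => ({ reference: index + 1, text }));
}
```

Throwing on malformed output is deliberate: inside a task, an error like this can trigger a retry instead of storing half-parsed clauses.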
Testing:
Upload a contract and verify it gets split into multiple clauses. For a typical multi-page contract, you should get at least 10-15 clauses.
Inspect the clauses in your database. Each one should be a coherent, self-contained piece of text, not a fragment mid-sentence.
Upload a very short document (a single paragraph). It should still work, returning one clause.
Check the Trigger.dev dashboard and verify the clause-splitting step appears in the run timeline.
Step 4
In this step your goal is to analyse each clause in parallel using LLMs to flag risk levels and ambiguous language.
This is the core of the application - and where Trigger.dev’s features really shine. For each clause, you need to send it to an LLM for analysis. The analysis should identify:
Risk level - high, medium, or low
Risk explanation - a short explanation of why the clause is risky
Ambiguous language - any vague terms like “reasonable efforts”, “as soon as practical”, “material adverse change” that could be interpreted differently
Recommendations - suggested changes to reduce risk or clarify ambiguity
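It helps to pin those four fields down as a type and validate the model’s reply against it before storing anything. The field names below are assumptions for illustration, and the sketch assumes you prompt the LLM to reply with a JSON object in exactly this shape.

```typescript
type RiskLevel = "high" | "medium" | "low";

interface ClauseAnalysis {
  riskLevel: RiskLevel;
  riskExplanation: string;
  ambiguousLanguage: string[];
  recommendations: string[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((v) => typeof v === "string");
}

// Parse and validate the LLM's JSON reply; throw if anything is missing or mistyped.
function parseAnalysis(llmOutput: string): ClauseAnalysis {
  const data = JSON.parse(llmOutput) as Record<string, unknown>;
  const { riskExplanation, ambiguousLanguage, recommendations } = data;
  const riskLevel = String(data.riskLevel).toLowerCase();
  if (riskLevel !== "high" && riskLevel !== "medium" && riskLevel !== "low") {
    throw new Error(`Unexpected risk level: ${String(data.riskLevel)}`);
  }
  if (typeof riskExplanation !== "string") throw new Error("Missing riskExplanation");
  if (!isStringArray(ambiguousLanguage)) throw new Error("ambiguousLanguage must be a string array");
  if (!isStringArray(recommendations)) throw new Error("recommendations must be a string array");
  return { riskLevel, riskExplanation, ambiguousLanguage, recommendations };
}
```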
Create an analyseClause task that takes a clause ID and text, calls your LLM, and stores the analysis result in the database. Configure it with retry settings so transient LLM API failures are handled automatically - Trigger.dev will retry with exponential backoff by default.
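If exponential backoff is new to you, the arithmetic is simple: each retry waits longer than the last, up to a cap. The sketch below illustrates the idea in plain TypeScript. The option names and numbers are hypothetical and are not Trigger.dev’s defaults; check the retry documentation for the settings it actually exposes.

```typescript
interface BackoffOptions {
  minDelayMs: number; // delay before the first retry
  maxDelayMs: number; // upper bound on any single delay
  factor: number;     // multiplier applied for each further retry
}

// Delay before retry number `retry` (1 = first retry): min * factor^(retry - 1), capped at max.
function backoffDelayMs(retry: number, options: BackoffOptions): number {
  const raw = options.minDelayMs * Math.pow(options.factor, retry - 1);
  return Math.min(options.maxDelayMs, Math.round(raw));
}
```

With a 1 second minimum, a factor of 2 and a 30 second cap, the first three retries wait 1, 2 and 4 seconds, and the sixth is held at the cap.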
Now for the parallelism. From your parent task, use analyseClause.batchTriggerAndWait() to trigger all clause analyses in a single batch call. Trigger.dev will execute them in parallel (up to your environment’s concurrency limit), collect all the results, and return them to your parent task. A 50-clause contract is no problem - you get fan-out parallelism without writing any queue infrastructure.
Set a concurrencyLimit on the analyseClause task’s queue if you need to respect LLM API rate limits. For example, if your OpenAI tier allows 10 concurrent requests, set queue: { concurrencyLimit: 10 }.
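To appreciate what the batch call and queue limit are doing for you, here is roughly what bounded fan-out looks like if you write it by hand. This is a plain TypeScript sketch with a hypothetical helper name, not Trigger.dev’s implementation, and it has none of the durability or retry behaviour you get from the platform.

```typescript
// Run `worker` over every item with at most `limit` calls in flight at once.
// Results come back in the same order as the input items.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) throw new Error("limit must be at least 1");
  const results = new Array<R>(items.length);
  let next = 0;

  // Each lane claims the next unprocessed index until none remain.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}
```

Note what is missing: if the process dies halfway through, all progress is lost, and one rejected promise fails the whole batch. Those gaps are the infrastructure problems described at the start of this challenge.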
Testing:
Upload a contract and verify all clauses are analysed, each with a risk level, explanation, ambiguity flags, and recommendations.
Check the Trigger.dev dashboard run view. You should see the parent task with all the child analyseClause runs, their individual statuses, durations, and any retries.
Temporarily use a rate-limited API key and verify that failed analyses are automatically retried and eventually succeed. Watch the retries in the dashboard.
Upload a 50+ clause contract. Verify it completes reliably. All 50+ analyses should be in the database.
Inspect a few analyses. A clause that says “The Provider shall not be liable for any damages” should be flagged as high risk. A clause that mentions “reasonable efforts” should be flagged for ambiguous language.
Step 5
In this step your goal is to aggregate the clause analyses into a structured review report and pause for human review.
Once all clause analyses are complete, your parent task should aggregate them into a structured review report. The report should show each clause number, the clause text, the risk level, the analysis explanation, any ambiguous language found, and recommendations. Group clauses by risk level (high first) so the reviewer can tackle the most important issues first.
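The ordering is a small sort with an explicit rank for each risk level. A minimal sketch, assuming each report entry carries a clauseNumber and a riskLevel field (both names are illustrative):

```typescript
type RiskLevel = "high" | "medium" | "low";

const RISK_ORDER: Record<RiskLevel, number> = { high: 0, medium: 1, low: 2 };

// Return a new array ordered high risk first, then by clause number within each level.
function orderForReview<T extends { clauseNumber: number; riskLevel: RiskLevel }>(
  entries: readonly T[],
): T[] {
  return [...entries].sort(
    (a, b) =>
      RISK_ORDER[a.riskLevel] - RISK_ORDER[b.riskLevel] || a.clauseNumber - b.clauseNumber,
  );
}
```

Copying the array before sorting leaves the original analysis results untouched, which is handy if you also want to show the contract in its natural order.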
Now for the human-in-the-loop part - this is where Trigger.dev’s waitpoint system comes in. After aggregating the results, use wait.createToken() to create a pause point. Store the token ID alongside the contract in your database so your review dashboard can reference it later. Then call wait.forToken() - your task will suspend at this point. Trigger.dev checkpoints the task state and releases compute resources. You’re not paying for idle time, and there’s no timeout.
Send an email notification to the user with a link to the review dashboard. Trigger.dev has hooks for this - you can use the onSuccess hook of the analysis task, or send the email before the waitpoint. Use a transactional email service like Resend, SendGrid, or Mailgun.
Testing:
Complete an analysis on a contract. Verify the aggregated report is stored in the database, ordered by risk level.
Verify the email notification is sent containing the correct summary statistics and a working link to the dashboard.
Check the Trigger.dev dashboard. The run should show as WAITING - it’s suspended at the waitpoint, waiting for the review token to be completed.
Verify that the task does not proceed until someone completes the token.
Step 6
In this step your goal is to build the review dashboard where a reviewer can approve, reject, or annotate each flagged clause.
Build a web dashboard that displays the aggregated review report. For each clause, the reviewer should be able to:
Approve the clause as is (no changes needed)
Reject the clause (it needs revision)
Annotate it with a free-text note explaining their reasoning or providing instructions
The dashboard should show the original clause text alongside the AI’s analysis so the reviewer has full context to make a decision. Make it easy to navigate between clauses and see at a glance which ones have been reviewed and which still need attention.
When the reviewer is done and submits their review, your application should save all decisions and annotations to the database, then complete the waitpoint token to resume the suspended task. Use wait.completeToken() from your web app backend (or send a POST to the token’s URL from the frontend using the public access token). The task will resume exactly where it left off, with all the reviewer’s decisions available from the token’s output.
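Before your backend completes the token, it’s worth checking that every clause actually has a decision. The helper below is a sketch with hypothetical names; the same result can drive the “reviewed vs still to do” indicator in the dashboard.

```typescript
type Decision = "approved" | "rejected";

interface ClauseReview {
  clauseNumber: number;
  decision?: Decision; // undefined until the reviewer has decided
  note?: string;       // optional free-text annotation
}

interface ReviewProgress {
  total: number;
  reviewed: number;
  pending: number[]; // clause numbers still awaiting a decision
  complete: boolean;
}

function reviewProgress(reviews: readonly ClauseReview[]): ReviewProgress {
  const pending = reviews
    .filter((review) => review.decision === undefined)
    .map((review) => review.clauseNumber);
  return {
    total: reviews.length,
    reviewed: reviews.length - pending.length,
    pending,
    complete: pending.length === 0,
  };
}
```

If complete is false, you can return a validation error listing the pending clause numbers instead of resuming the task with a partial review.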
You can also use Trigger.dev’s Realtime hooks in your dashboard. useRealtimeRun() lets you subscribe to run status changes without polling - so your dashboard can show the live status of the contract review workflow.
Testing:
Navigate to the review dashboard for a contract. Verify all clauses are displayed with their AI analysis.
Approve a few clauses, reject a few, and add annotations to some. Verify the decisions are saved to the database.
Submit the review. Verify the token is completed and the Trigger.dev task resumes.
Watch the Trigger.dev dashboard during review submission. The run should transition from WAITING to running again.
Before submitting, check that you can see visually which clauses have been reviewed and which haven’t.
Step 7
In this step your goal is to make the system LLM provider-agnostic with a configurable abstraction layer.
Your clause analysis and summary generation tasks currently call one or two specific LLM providers. Build an abstraction layer so you can swap providers without changing your task code.
Define a common interface for LLM interactions: a function that takes a prompt (or messages), configuration (temperature, max tokens, etc.), and returns a standardised response with the generated text and metadata (tokens used, finish reason, etc.).
Configuration should be externalised. Provider selection, model choice, and API keys should come from environment variables. Your Trigger.dev tasks should interact with the abstraction layer, not with any specific provider’s SDK directly.
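One possible shape for the abstraction layer is an interface plus a small registry keyed by an environment variable. Everything named here (LlmProvider, LLM_PROVIDER, LLM_MODEL, the echo provider) is an assumption for illustration. A real provider would wrap that vendor’s SDK inside its complete() method.

```typescript
interface LlmRequest {
  prompt: string;
  temperature?: number;
  maxTokens?: number;
}

interface LlmResponse {
  text: string;
  provider: string;
  model: string;
  tokensUsed: number;
  finishReason: string;
}

interface LlmProvider {
  readonly name: string;
  complete(request: LlmRequest): Promise<LlmResponse>;
}

type Env = Record<string, string | undefined>;
type ProviderFactory = (env: Env) => LlmProvider;

const registry = new Map<string, ProviderFactory>();

function registerProvider(name: string, factory: ProviderFactory): void {
  registry.set(name, factory);
}

// Task code calls this and never imports a vendor SDK directly.
function providerFromEnv(env: Env): LlmProvider {
  const name = env.LLM_PROVIDER;
  if (!name) throw new Error("LLM_PROVIDER is not set");
  const factory = registry.get(name);
  if (!factory) throw new Error(`Unknown LLM provider: ${name}`);
  return factory(env);
}

// Stand-in provider so the sketch runs without network access or API keys.
registerProvider("echo", (env) => ({
  name: "echo",
  async complete(request: LlmRequest): Promise<LlmResponse> {
    return {
      text: request.prompt,
      provider: "echo",
      model: env.LLM_MODEL ?? "echo-1",
      tokensUsed: request.prompt.split(/\s+/).length,
      finishReason: "stop",
    };
  },
}));
```

In your tasks you would call providerFromEnv(process.env) once and use the returned provider. A stand-in like the echo provider is also useful in tests, where you don’t want to spend tokens.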
Testing:
Run a full contract analysis using one LLM provider. Verify it works end to end.
Switch the configuration to use another provider instead. Rerun the same contract. The analysis should complete with comparable results.
Swap providers without changing a single line of task code (only environment variables).
Verify your abstraction layer captures provider-agnostic metadata regardless of which provider is underneath.
Step 8
In this step your goal is to generate the final summary report using an LLM, incorporating the clause analyses and reviewer feedback.
Now that the human review is complete, the waitpoint token has been completed, and your task has resumed, it’s time to generate the final summary.
Create a generateSummary task that loads all clause analyses, reviewer decisions, and annotations from the database, sends everything to an LLM, and asks it to synthesise a final report. The final summary should include:
Executive summary - a high-level overview of the contract’s risk profile
Key findings - the most important issues identified, incorporating reviewer feedback
Risk breakdown - a summary of risk levels across the contract
Clause-by-clause detail - for each clause, the original risk assessment and the reviewer’s decision, combined into a final recommendation
This should be a well-written, professional document that could be shared with a client or colleague. Your parent task should trigger the summary generation using triggerAndWait() so it gets the result back.
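Before calling the LLM, it helps to reduce each reviewed clause to a single line that combines the AI’s risk level with the reviewer’s decision, and to build the prompt from those lines. The sketch below shows one way to do that. The types, wording and prompt text are all hypothetical, so adapt them to your own schema.

```typescript
type RiskLevel = "high" | "medium" | "low";
type Decision = "approved" | "rejected";

interface ReviewedClause {
  clauseNumber: number;
  riskLevel: RiskLevel;
  decision: Decision;
  note?: string; // reviewer annotation, if any
}

// Approved clauses are reported as accepted; rejected ones carry the reviewer's note.
function clauseOutcome(clause: ReviewedClause): string {
  const prefix = `Clause ${clause.clauseNumber} (${clause.riskLevel} risk)`;
  if (clause.decision === "approved") return `${prefix}: accepted`;
  const note = clause.note?.trim() || "no reviewer note";
  return `${prefix}: rejected - ${note}`;
}

function buildSummaryPrompt(clauses: readonly ReviewedClause[]): string {
  return [
    "Write a professional contract review report with these sections:",
    "Executive summary, Key findings, Risk breakdown, Clause-by-clause detail.",
    "Reviewed clauses:",
    ...clauses.map(clauseOutcome),
  ].join("\n");
}
```

In practice you would also include the clause text and the AI’s explanation for each clause. They are left out here to keep the sketch short.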
Testing:
Generate a final summary for a contract that has been fully reviewed. Verify it includes all sections and incorporates both the AI analysis and the reviewer feedback.
Check that clauses the reviewer approved show as “accepted” in the final report, while rejected clauses include the reviewer’s annotations and reasoning.
Verify the summary is stored in the database and linked to the contract.
Read the summary from start to finish. It should read as a coherent, professional document, not a jumble of disconnected analyses.
Step 9
In this step your goal is to stream the final summary to the frontend in real time as it is being generated.
Final summaries can be long, and waiting for a complete document before showing anything is a poor user experience. Trigger.dev has first-class support for streaming data from tasks to your frontend. Use Realtime streams to pipe LLM tokens directly to the browser as they’re generated.
Define a stream using streams.define() - give it a clear ID like "summary-output" and a type for the stream chunks. In your generateSummary task, configure your LLM call to stream tokens, and pipe the stream to your defined stream using .pipe().
In your frontend, use the useRealtimeStream() React hook to subscribe to the stream. As tokens arrive, your component renders them incrementally. No polling, no WebSocket management, no SSE wiring - the hook handles the connection automatically.
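To build intuition for what incremental rendering involves, here is a tiny in-memory model of a stream that replays earlier chunks to late subscribers, which is the behaviour you’d want after a page refresh. It is a conceptual sketch with a hypothetical class name. It is not Trigger.dev’s API or implementation, and in this challenge the hook does this work for you.

```typescript
type Listener = (chunk: string) => void;

class ReplayableStream {
  private readonly chunks: string[] = [];
  private readonly listeners = new Set<Listener>();

  // Record the chunk and push it to everyone currently subscribed.
  append(chunk: string): void {
    this.chunks.push(chunk);
    this.listeners.forEach((listener) => listener(chunk));
  }

  // New subscribers first receive every chunk emitted so far, then live chunks.
  // Returns a function that unsubscribes the listener.
  subscribe(listener: Listener): () => void {
    this.chunks.forEach((chunk) => listener(chunk));
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }
}
```

A component that appends each chunk to its displayed text shows the same final summary whether it subscribed before the first token or halfway through.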
The user should also be able to receive the final summary via email as an alternative. Once the stream is complete, send the full summary as an email.
Testing:
Generate a final summary and watch the frontend. Tokens should appear incrementally, not all at once at the end.
Verify that the stream works across page refreshes - existing chunks should be replayed.
Check that the streaming handles slow generation gracefully. Partial content should render without freezing.
Verify the email delivery option works. Trigger an email with the completed summary and check your inbox.
Generate summaries from different providers. Streaming should work regardless of which provider is configured.
Step 10
In this step your goal is to explore Trigger.dev’s built-in observability - run tracing, logging, and monitoring.
You’ve already been using Trigger.dev’s dashboard throughout this challenge to see your tasks run. Now let’s make the most of its observability features. Unlike building your own tracing system from scratch, Trigger.dev gives you this out of the box.
Add tags to your tasks and runs so you can filter them in the dashboard. For example, tag runs with the contract ID, the user ID, the LLM provider used, and the workflow stage.
Use runs.metadata to attach structured data to your runs that updates as the workflow progresses. For example, set metadata for the number of clauses found, the count of high/medium/low risk clauses, the review status, and any error counts. This metadata appears in the dashboard and is available via the SDK.
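It’s easier to keep that metadata consistent if one pure function computes it from your analysis results, and the task passes the result to Trigger.dev’s metadata API. The field names and status values below are assumptions for illustration.

```typescript
type RiskLevel = "high" | "medium" | "low";
type ReviewStatus = "pending" | "in_review" | "complete";

interface RunMetadata {
  clausesFound: number;
  highRisk: number;
  mediumRisk: number;
  lowRisk: number;
  reviewStatus: ReviewStatus;
  errorCount: number;
}

// Derive the run metadata from the risk level of each analysed clause.
function buildRunMetadata(
  levels: readonly RiskLevel[],
  reviewStatus: ReviewStatus,
  errorCount: number,
): RunMetadata {
  const count = (level: RiskLevel): number => levels.filter((l) => l === level).length;
  return {
    clausesFound: levels.length,
    highRisk: count("high"),
    mediumRisk: count("medium"),
    lowRisk: count("low"),
    reviewStatus,
    errorCount,
  };
}
```

Recomputing the whole object at each stage (after splitting, after analysis, after review) avoids counters drifting out of step with the database.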
Use Trigger.dev’s built-in logger throughout your tasks. It automatically captures log entries with timestamps and attaches them to the run - no log aggregation infrastructure needed.
Finally, explore the dashboard’s run view. You can see the full timeline of your contract review workflow: when each analyseClause child task started and completed, which clauses triggered which retries, how long each LLM call took, and any errors that occurred. The batch trigger view shows all parallel clause analyses at a glance, with individual run statuses and durations.
Testing:
Run a complete contract review. Open the Trigger.dev dashboard and find your run.
Verify you can see every step in the timeline: PDF extraction, clause splitting, batch clause analysis (with all child runs), waitpoint pause, waitpoint completion, and summary generation.
Click into individual analyseClause runs. You should see logs, duration, and whether any retries occurred.
Apply filters in the dashboard using your tags. Filter by status (failed runs only), by user, or by date range.
Add metadata to your runs and verify it appears in the dashboard.
Going Further
Want to take this further? Here are some ideas:
Use wait.for() to schedule follow-ups. Trigger a task that waits 7 days, then sends a reminder to review a contract that hasn’t been actioned.
Use input streams for cancellation. Add a cancel button to the frontend that uses Trigger.dev’s input streams to abort a running summary generation mid-stream.
Add support for more file formats. DOCX is even more common than PDF for contracts. Add support for Word documents and other formats.
Add role-based access control. Different users might need different permissions - uploaders, reviewers, and administrators.
Add comparison mode. Upload two versions of the same contract and have the LLM identify what changed and whether the changes alter the risk profile.
Add custom risk categories. Let users define their own risk categories and rules, then use Trigger.dev’s wait.forToken() to collect approval for each category.
Add a clause library. Build a library of standard, low-risk clause templates that the LLM can suggest as replacements for high-risk clauses.
Use concurrency keys for multi-tenancy. Leverage Trigger.dev’s concurrencyKey to give each organisation or user their own isolated queue.
Share Your Solutions!
If you think your solution is an example other developers can learn from, please share it: put it on GitHub, GitLab or elsewhere. Then let me know via Bluesky or LinkedIn, or just post about it there and tag me. Alternatively, please add a link to it in the Coding Challenges Shared Solutions GitHub repo.
Request for Feedback
I’m writing these challenges to help you develop your skills as a software engineer based on how I’ve approached my own personal learning and development. What works for me might not be the best way for you - so if you have suggestions for how I can make these challenges more useful to you and others, please get in touch and let me know. All feedback is greatly appreciated.
You can reach me on Bluesky, LinkedIn or through Substack.
Thanks and happy coding!
John

