Technical Deep-Dive

How We Built a Job Scraping Engine

A technical look at the decisions, architecture, and lessons learnt building an automation ecosystem that processes 5,000+ jobs daily.

  • 12+ workflows
  • 5K+ jobs/day
  • 35 hrs saved weekly
  • ~$50 API cost/month

Where this story begins

Understanding the context is half the battle. Here's what we were working with.

The Company

A medium-sized B2B professional services company in London with 30+ consultants, operating across multiple regions (UK, Europe, US) and industries (technology, financial services, and more).

The Process (Before)

Entirely manual. Each consultant had their own approach to finding leads. No consistency, no tracking, no visibility for management. Hours spent each week on repetitive searching that could be automated.

The Goal

Build a system that could find relevant opportunities automatically, enrich them with company data, check against existing clients in the CRM, and push qualified leads to outreach campaigns. Without adding headcount.

The Constraint

This isn't a greenfield project. There's an existing CRM with years of client history, existing relationships, and business rules. The system needs to integrate, not replace.

What we're building with

Every tool was chosen for a specific reason. Here's what we use and why.

  • n8n: Workflow orchestration
  • Supabase: PostgreSQL database
  • jSearch API: Job aggregation
  • GPT-4o-mini: AI processing
  • Clay: Company enrichment
  • Smartlead: Email campaigns

Data Source

Why jSearch (Google Jobs API)?

We wanted to scrape Google Jobs because it already aggregates from LinkedIn, Indeed, Glassdoor, and company career pages. jSearch had the best scraper in our tests.

Why: Building individual scrapers for each job board would take months and require constant maintenance as the boards push back against scraping. Using an aggregator that has already solved this problem cut our build time and complexity by at least a factor of three. We hit the ground running with reliable data and direct support via Discord.

Database

Why Supabase?

A managed PostgreSQL database with a generous free tier, built-in REST API, and edge functions for webhooks.

Why: Speed of setup. We could spin up tables, configure row-level security, and start building in hours. The Supabase dashboard makes debugging dead simple. Plus, edge functions handle Clay's webhook callbacks without needing another service.
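
To make that concrete, here's a minimal sketch of an edge function catching an enrichment webhook and parking the payload in a staging table. The table and field names (clay_webhook_queue, payload) are illustrative, not our production schema.

```typescript
// Minimal sketch of a Supabase Edge Function (Deno) that receives a Clay webhook
// and parks the raw payload in a staging table. Table and column names are
// illustrative, not a prescribed schema.
import { createClient } from "npm:@supabase/supabase-js@2";

const supabase = createClient(
  Deno.env.get("SUPABASE_URL")!,
  Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!,
);

Deno.serve(async (req) => {
  if (req.method !== "POST") {
    return new Response("Method not allowed", { status: 405 });
  }

  const payload = await req.json();

  // Store everything the webhook catches; a separate scheduled workflow drains
  // this table later ("fill and empty the bucket", covered further down).
  const { error } = await supabase
    .from("clay_webhook_queue")
    .insert({ payload, received_at: new Date().toISOString() });

  if (error) {
    return new Response(JSON.stringify({ ok: false, error: error.message }), { status: 500 });
  }
  return new Response(JSON.stringify({ ok: true }), { status: 200 });
});
```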

AI Model

Why GPT-4o-mini?

OpenAI's cost-optimised model for classification and extraction tasks. Not GPT-4, not Claude.

Why: Cost efficiency at scale. We process thousands of jobs daily. For our use cases (title normalisation, relevance filtering), mini performs just as well as larger models at a fraction of the cost.

Enrichment

Why Clay?

A data enrichment platform that aggregates 75+ data providers into a single interface.

Why: One API call gets us company size, industry, funding status, LinkedIn data, decision makers, and verified emails. Building this ourselves would take months and cost more in individual API subscriptions. Clay also handles the waterfall logic: try Provider A, fall back to Provider B.

More than just "we need leads"

The initial request was simple: automate lead generation. Find jobs, find companies, send emails. But digging deeper revealed something more interesting.

The "Too Successful" Problem

When we first switched on the automation, leads started pouring in. Hundreds per day. But this exposed a deeper issue: the team wasn't set up to handle volume. There was no process for qualification, no clear handoff to consultants, no way to track what happened to each lead. We'd moved the bottleneck, not removed it.

This led to something bigger: a full operational audit. We mapped the entire lead-to-placement journey, identified where things broke down, and built the automation around the actual process, not just the symptom.

What the Operations Audit Revealed

The audit identified 40+ new automations and supporting ecosystems needed to make the full lead-to-placement journey work properly. That build-out is ongoing and scheduled to complete in 2026 as part of a broader data and AI plan. For the job scraper specifically, the requirements were:

  • Find relevant jobs matching specific criteria (job titles, locations, industries)
  • Filter out noise with AI that understands "Senior Data Engineer" isn't the same as "Data Entry Clerk"
  • Enrich companies with size, industry, funding, and contact data
  • Check against existing clients in the CRM so we don't cold-email our own customers
  • Categorise relationship strength (A/B/C/D leads) based on CRM history
  • Route appropriately: existing clients go to account managers, new companies go to outreach
  • Push to email campaigns with personalised sequences
  • Track everything so we know what's working

The Lesson

Automation doesn't exist in isolation. Before you build, think about what happens downstream. Who handles the output? What's the process? How do you measure success? This is why we always start with mapping the full picture.

12 workflows, one ecosystem

Each workflow handles one stage, then passes the baton. Data flows through with statuses that let us track, retry, and recover from failures.

The Job Scraper Workflow in n8n

The Job Scraper workflow showing the full scraping and processing pipeline

Naming Convention

[stage].[substage] | [Client] [Description] | v[version]

Every workflow follows this pattern. It makes the ecosystem navigable at a glance.

Examples:
  • 1.0 | Job Scraper | v3
  • 5.1 | Existing Client Checker | v2

Why it matters: With 12+ workflows, you need to instantly know what runs in what order. The stage number tells you the sequence, the description tells you the purpose, and the version tracks iterations.

1
1.0

Job Scraper

Scrape 5,000+ jobs daily from Google Jobs via jSearch API

This is the entry point. The workflow runs once daily, pulling all configured searches which then trigger multiple scrapes downstream. It covers hundreds of targeted search queries for different job titles, locations, and industries, deduplicates against existing records, and stores raw job data in the database.
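
To make the entry point concrete, here's a hedged sketch of a single search being pulled through jSearch with basic pagination and pacing. The endpoint, parameters, and response fields are as we found them in jSearch's RapidAPI docs; verify them against the current docs before reusing this.

```typescript
// Hedged sketch: pull one configured search from jSearch (RapidAPI) with pagination.
// Endpoint, parameter, and response-field names should be checked against the
// current jSearch documentation before reuse.
type JSearchJob = { job_id: string; employer_name: string; job_title: string };

async function fetchSearch(query: string, maxPages = 3): Promise<JSearchJob[]> {
  const jobs: JSearchJob[] = [];
  for (let page = 1; page <= maxPages; page++) {
    const url = new URL("https://jsearch.p.rapidapi.com/search");
    url.searchParams.set("query", query); // e.g. "senior data engineer in London"
    url.searchParams.set("page", String(page));

    const res = await fetch(url, {
      headers: {
        "X-RapidAPI-Key": process.env.RAPIDAPI_KEY ?? "",
        "X-RapidAPI-Host": "jsearch.p.rapidapi.com",
      },
    });
    if (!res.ok) throw new Error(`jSearch request failed: ${res.status}`);

    const body = (await res.json()) as { data?: JSearchJob[] };
    if (!body.data?.length) break; // stop when a page comes back empty
    jobs.push(...body.data);

    // Crude pacing between pages to stay inside the plan's rate limit.
    await new Promise((resolve) => setTimeout(resolve, 1000));
  }
  return jobs;
}
```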

Key Features

  • Batch processing to stay within API limits
  • Pagination handling for complete coverage
  • Self-triggering for continuous operation
  • Dynamic throttling based on volume

Technical Notes

  • PostgreSQL row-locking prevents duplicate processing (see the sketch after this list)
  • Status-based state machine for tracking progress
  • Rate limiting to stay within API quotas
  • Automatic retry on transient failures
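
The row-locking note above is what makes parallel runs safe. Here's a minimal sketch of how a batch might be claimed, assuming a raw_jobs table with integer IDs and a status column; our real schema and status names differ.

```typescript
// Sketch of batch claiming with SELECT ... FOR UPDATE SKIP LOCKED, so two concurrent
// runs never pick up the same rows. Assumes an illustrative raw_jobs table with
// integer primary keys and a status column.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.SUPABASE_DB_URL });

async function claimBatch(batchSize = 200): Promise<number[]> {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");

    // Lock a batch of pending rows; rows already locked by another run are
    // skipped rather than waited on.
    const { rows } = await client.query(
      `SELECT id FROM raw_jobs
       WHERE status = 'Pending'
       ORDER BY created_at
       LIMIT $1
       FOR UPDATE SKIP LOCKED`,
      [batchSize],
    );
    const ids = rows.map((r) => r.id as number);

    if (ids.length > 0) {
      // Advance the claimed rows along the status state machine before the
      // transaction releases the locks.
      await client.query(
        `UPDATE raw_jobs SET status = 'Processing' WHERE id = ANY($1::int[])`,
        [ids],
      );
    }

    await client.query("COMMIT");
    return ids;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}
```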

2
1.2, 1.3

Job Processor

Clean, validate, and prepare raw jobs for enrichment

Raw jobs often arrive with messy data. This stage cleans company names, extracts domains, handles duplicates, and creates or links company records. The controller workflow (1.2) manages batching; the processor workflow (1.3) handles the actual work.

Processing Steps

  • Company name extraction and cleaning
  • Domain extraction from job URLs
  • Duplicate company detection
  • Company record creation/linking
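
As a flavour of what this cleaning looks like, here's an illustrative simplification; the real nodes use a longer list of suffixes and edge cases.

```typescript
// Illustrative simplification of the cleaning in this stage; the production n8n
// nodes handle many more suffixes and edge cases.

// Strip common legal suffixes and collapse whitespace; duplicate detection then
// compares the cleaned names case-insensitively.
function cleanCompanyName(raw: string): string {
  return raw
    .trim()
    .replace(/\b(ltd|limited|llc|inc|gmbh|plc)\.?$/i, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Pull a comparable domain out of a job or careers-page URL. Job-board hosts tell
// us nothing about the employer, so they are skipped.
function extractDomain(jobUrl: string): string | null {
  try {
    const host = new URL(jobUrl).hostname.replace(/^www\./, "");
    const boards = ["linkedin.com", "indeed.com", "glassdoor.com"];
    return boards.some((board) => host.endsWith(board)) ? null : host;
  } catch {
    return null;
  }
}

console.log(cleanCompanyName("Acme Ltd."));                     // "Acme"
console.log(extractDomain("https://www.acme.com/careers/123")); // "acme.com"
```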

Why Split?

Controller handles orchestration and error recovery. Processor focuses on data transformation. If the processor fails mid-batch, the controller can restart from where it left off.

3
2.0

AI Gatekeeper

AI-powered relevance filtering using GPT-4o-mini

Not every job is relevant. This workflow uses AI to normalise job titles, check relevance against the Ideal Customer Profile, categorise jobs, and filter out noise before expensive enrichment.

Three AI Tasks

  • Normalise Title: "Sr. SWE (Remote, $150k)" → "Senior Software Engineer"
  • Gatekeeper: Is this job relevant? true/false
  • Categorise: Which business category? (e.g., "Data", "DevOps")
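
Here's a hedged sketch of how these three tasks can be folded into one call with the OpenAI SDK. The prompt is a compressed stand-in for the much longer production prompt, and the JSON field names are illustrative.

```typescript
// Hedged sketch of the gatekeeper call with gpt-4o-mini. The system prompt is a
// compressed stand-in for the production prompt; the JSON contract is illustrative.
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

interface GateResult {
  normalised_title: string; // e.g. "Senior Software Engineer"
  relevant: boolean;        // does this job match the ICP?
  category: string;         // e.g. "Data", "DevOps"
  reason: string;           // kept for the discarded_jobs audit trail
}

async function gatekeep(rawTitle: string, description: string): Promise<GateResult> {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    temperature: 0,
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          "You normalise job titles, decide whether a job matches our ideal customer profile, " +
          "and assign a business category. Respond with JSON: " +
          '{"normalised_title": string, "relevant": boolean, "category": string, "reason": string}',
      },
      {
        role: "user",
        content: `Title: ${rawTitle}\n\nDescription: ${description.slice(0, 4000)}`,
      },
    ],
  });

  return JSON.parse(response.choices[0].message.content ?? "{}") as GateResult;
}
```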

Rejection Tracking

Rejected jobs aren't deleted. They go to a discarded_jobs table with the rejection reason. This lets us audit AI decisions and refine prompts over time.

4
3.0, 3.1

Company Enrichment

Enrich new companies with data from Clay's 75+ providers

Only companies not already in our system are sent to Clay for enrichment; we check by fuzzy-matching company name and domain against CRM records. Companies already in the CRM skip this stage entirely and go straight to lead tagging. For new companies, we use two different Clay tables: one for companies with a domain, one for those without.

Two Enrichment Paths

  • With Domain: Higher accuracy, more data points
  • Without Domain: Uses company name + location for matching
  • In CRM: Skip enrichment, go to lead tagging

Webhook Pattern

Clay returns data via webhooks. The webhook stores every payload it catches, then a separate automation injects them into the database in one batch. This "fill and empty the bucket" approach concentrates database writes into one controlled window each day.
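
Here's a minimal sketch of the "empty the bucket" half, reusing the illustrative staging table from the edge-function sketch earlier; in reality this runs as a scheduled n8n workflow rather than a standalone script.

```typescript
// "Empty the bucket": drain the webhook staging table in one controlled batch.
// Table names reuse the illustrative clay_webhook_queue from the earlier sketch;
// the real version is a scheduled n8n workflow.
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!,
);

async function drainBucket(batchSize = 500): Promise<number> {
  // Pull a batch of caught payloads.
  const { data: rows, error } = await supabase
    .from("clay_webhook_queue")
    .select("id, payload")
    .limit(batchSize);
  if (error) throw error;
  if (!rows?.length) return 0;

  // Upsert the enrichment results into the companies table in one go
  // (assumes the payload fields line up with the companies columns).
  const { error: upsertError } = await supabase
    .from("companies")
    .upsert(rows.map((r) => r.payload), { onConflict: "domain" });
  if (upsertError) throw upsertError;

  // Only delete what was successfully written.
  const { error: deleteError } = await supabase
    .from("clay_webhook_queue")
    .delete()
    .in("id", rows.map((r) => r.id));
  if (deleteError) throw deleteError;

  return rows.length;
}
```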

5
4.0

Jobs Normaliser

Deep parse job descriptions with AI to extract structured data

Once companies are enriched, we go back to the jobs and extract detailed information from the raw job description: skills, salary, benefits, work mode, experience requirements, and normalised location.

Extracted Fields

  • Main tasks and responsibilities
  • Required skills (comma-separated)
  • Salary range and currency
  • Benefits offered
  • Minimum years of experience
  • Work mode (Remote/Hybrid/On-site)
  • Employment type (Contract/Permanent)

Location Normalisation

Locations come in messy: "NYC", "New York, NY", "Remote (US)". A separate AI node normalises to city/country/country_code, then we look up the country ID for CRM integration.
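
To show the shape of what this stage produces, here's an illustrative TypeScript contract for the parsed job plus the country lookup that follows; field and table names are examples, not our exact production columns.

```typescript
// Illustrative contract for the normaliser's output plus the country lookup.
// Field and table names are examples, not the exact production schema.
import { createClient } from "@supabase/supabase-js";

type NormalisedJob = {
  main_tasks: string[];
  skills: string;                 // comma-separated, e.g. "Python, dbt, Airflow"
  salary_min?: number;
  salary_max?: number;
  salary_currency?: string;       // e.g. "GBP"
  benefits: string[];
  min_years_experience?: number;
  work_mode: "Remote" | "Hybrid" | "On-site";
  employment_type: "Contract" | "Permanent";
  location: { city?: string; country: string; country_code: string }; // e.g. "US"
};

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_ANON_KEY!);

// After the AI turns "NYC" into { city: "New York", country: "United States",
// country_code: "US" }, a plain lookup maps the code to the CRM's country ID.
async function countryIdFor(countryCode: string): Promise<number | null> {
  const { data } = await supabase
    .from("countries")              // illustrative lookup table
    .select("id")
    .eq("iso_code", countryCode)
    .maybeSingle();
  return data?.id ?? null;
}
```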

6
5.0, 5.1

Lead Tagging

Categorise existing client relationships by strength

This is a separate branch for companies already in the CRM. They skip Clay enrichment and come directly here. We query the CRM for relationship signals: recent placements, interviews, meetings, notes. This determines the A/B/C/D priority tag for routing.

Priority Tags

  • A-Lead: Strong recent relationship signals
  • B-Lead: Moderate engagement history
  • C-Lead: Some historical activity
  • D-Lead: Exists in CRM, minimal history

Job Grouping

Jobs are grouped by client + country + category. One package per group. This prevents creating duplicate leads for the same opportunity.
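
As a sketch, the grouping can be a single aggregate query: one row per client + country + category becomes one lead package downstream. Table and column names here are illustrative.

```typescript
// Hedged sketch of the grouping step: one lead package per client + country + category.
// Table and column names are illustrative, not the production schema.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.SUPABASE_DB_URL });

type LeadPackage = {
  client_id: number;
  country_id: number;
  category: string;
  job_ids: number[];
};

async function buildLeadPackages(): Promise<LeadPackage[]> {
  // Collapse all tagged jobs into one row per client/country/category so a single
  // lead is created per group rather than one per job advert.
  const { rows } = await pool.query<LeadPackage>(
    `SELECT client_id,
            country_id,
            category,
            array_agg(id ORDER BY id) AS job_ids
     FROM jobs
     WHERE status = 'Tagged'
     GROUP BY client_id, country_id, category`,
  );
  return rows;
}
```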

7
6.0, 6.1

Decision Maker Enrichment

Find and verify contacts for outreach

For new companies (not existing clients), we need decision makers to contact. This stage sends companies to Clay for contact enrichment, then processes the returned contacts with email and phone verification.

Contact Data

  • First/last name
  • Email (verified)
  • Job title
  • LinkedIn URL
  • LinkedIn summary (for personalisation)

Verification

Unverified emails destroy sender reputation. Clay's waterfall tries multiple verification services. Only contacts with verified emails proceed to outreach.

8
7.0

Email Campaign Push

Push verified contacts to email campaigns

The final stage. Contacts with verified emails are pushed to Smartlead (or similar) for automated email sequences. Each contact is matched to the appropriate campaign based on job category.

Campaign Matching

  • Campaign per job category (Data, DevOps, etc.)
  • Personalisation fields: name, company, job link, location
  • LinkedIn summary for icebreakers

Status Tracking

Every contact has an outreach_status: Pending → Processing → In Campaign → Replied/Bounced/Unsubscribed. Full visibility into what happened.
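
The statuses above map naturally onto a small explicit state machine. Here's a sketch of how the allowed transitions might be encoded; the status names come from the flow above, the guard itself is illustrative.

```typescript
// Sketch of the outreach_status state machine. Status names come from the flow
// above; the transition guard is an illustration, not production code.
type OutreachStatus =
  | "Pending"
  | "Processing"
  | "In Campaign"
  | "Replied"
  | "Bounced"
  | "Unsubscribed";

const allowedTransitions: Record<OutreachStatus, OutreachStatus[]> = {
  Pending: ["Processing"],
  Processing: ["In Campaign"],
  "In Campaign": ["Replied", "Bounced", "Unsubscribed"],
  Replied: [],        // terminal: handed to a consultant
  Bounced: [],        // terminal: never re-contacted
  Unsubscribed: [],   // terminal: suppressed permanently
};

function transition(current: OutreachStatus, next: OutreachStatus): OutreachStatus {
  if (!allowedTransitions[current].includes(next)) {
    // Refusing illegal jumps keeps the tracking data trustworthy for reporting.
    throw new Error(`Illegal outreach_status transition: ${current} -> ${next}`);
  }
  return next;
}
```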

Why split into 12+ workflows?

One giant workflow would be simpler to understand. So why did we choose complexity?

1

Runtime & Memory

n8n slows down with large data volumes. Processing 5,000 jobs in one workflow hits memory limits. Split workflows handle smaller batches, completing faster and more reliably.

2

API Rate Limits

Every API has rate limits. Separate workflows let us control pacing with Wait nodes between batches; in a single monolithic workflow it's far harder to pause at the right points and respect those limits.

3

Debugging & Recovery

When something breaks, you know exactly which stage failed. Fix it there, reprocess from that status, done. No hunting through a 100-node monster trying to find what went wrong.

4

Atomic Status Tracking

Each record has a status field. If workflow 3.0 fails mid-run, records stay at "Pending Enrichment". Restart and it picks up where it left off. No duplicate processing, no lost data.

The "Passing the Baton" Pattern

Think of it like a relay race. Each workflow runs its leg, updates the record status, and hands off. The next workflow picks up records in that status and runs its leg. If a runner trips, only that leg needs to be re-run. The race isn't over.

What we'd tell ourselves before starting

Hindsight is valuable. Here's what we learnt the hard way.

1

Start with the end in mind

We built the scraper first, then realised we had no process for handling the leads. Always map the full flow before building any part of it. What happens after the automation runs?

2

Prompts are never done

Our AI prompts went through dozens of iterations. The first version of the Gatekeeper rejected too much. The second rejected too little. Start minimal, test with real data, refine based on actual results.

3

Paid APIs are worth it

We could have scraped job boards ourselves. The maintenance cost would have eaten any savings. Pay specialists to handle the hard parts. Your time is better spent on business logic.

4

Status tracking is essential

Every record needs a status field. Every workflow should only process specific statuses. This makes debugging, recovery, and monitoring possible. Without it, you're flying blind.

5

AI isn't always the answer

We use AI for only three tasks: title normalisation, relevance gating, and description parsing. Everything else is rules, lookups, and SQL. AI is expensive and sometimes wrong. Use it surgically.

6

Build for recovery, not perfection

Things will break. APIs will be down. Data will be weird. Design assuming failure, and recovery becomes trivial instead of catastrophic. Every workflow should be re-runnable.

7

Log everything you discard

When the AI Gatekeeper rejects a job, we log why. When a company has no domain, we log it. This data is gold for refining the system and understanding edge cases.

8

Think about the humans

The best automation is invisible to users. Consultants don't see the 12 workflows. They see leads appearing in their CRM with all the context they need. Design for the experience, not the architecture.

9

Document obsessively

Every node gets a sticky note explaining what it does. Every workflow has a canvas note with the big picture. When you're running 12+ workflows across multiple projects, you won't remember why you made a decision three months ago. Your future self will thank you.

10

Low-code doesn't mean no code

n8n workflows export to JSON. That JSON can be version controlled, diffed, and even converted to actual code with AI tools. We treat our workflows like software: they live in Git, have version numbers, and follow coding standards. This makes collaboration and handoffs possible.

Want to build something similar?

Every business is different, but the principles are the same. Let's talk about what's possible for yours.