My first complex ecosystem. The project that taught me version control, atomic database operations, fuzzy matching, and the limits of low-code tools. Everything I know about building automation at scale started here.
This is the first complex ecosystem I ever built. Not because I planned it that way, but because the problem kept getting bigger.
I'd already built loose automations for property data. Small, disconnected pieces that did one thing each. When I finally saw the full picture of what was needed, I realised those fragments could become something bigger. The ecosystem grew from there.
Some days I'd end work stuck on a problem and literally dream about the solution. I'd wake up anxious to get to my desk and try it. That excitement is what I'm still chasing today with every new project.
Everything I know about building automation at scale, I learnt the hard way on this engine. Version control. Passing data between workflows. Atomic database operations. Status flags. Fuzzy matching. Rate limiting. Breaking workflows into manageable chunks. When to loop, when to batch, when to split. This project forced me to figure it all out.
Property professionals need to know when properties in their portfolio hit the market. The data exists, but getting it is harder than it should be.
Rightmove shows display addresses like "Victoria Road, London" but not the full registered address. To match a listing to a client's property, you need the actual address. That's locked behind EPC certificates.
The manual process: check Rightmove. Find properties in your postcodes. Click through to the EPC certificate to get the real address. Cross-reference against your portfolio spreadsheet. Repeat for every postcode, every day. Some companies pay virtual assistants over £500 a month just to do this matching work.
Different clients, different postcodes, but the same underlying problem. Each needs property intelligence. Building a separate system for each would be wasteful. We needed a platform approach.
Build once, serve many. A universal property intelligence database that scrapes, enriches, and stores listings. Each client gets their own matching layer on top, delivered in their preferred format.
Every component was chosen for reliability, cost, and the ability to run unattended.
Workflow orchestration: n8n.
Central database: PostgreSQL. Scraping is expensive. Enrichment is expensive. If two clients both care about postcodes in Manchester, scraping twice is waste. The universal database scrapes once, clients query against it.
Property data source: EPC certificates, which hold the full registered addresses.
EPC address matching: AI. EPC certificates contain the full address, but it's not always formatted consistently. "Flat 3" might be "Flat Three" or "Unit 3" or "Apartment 3". AI handles the fuzzy matching human-style.
Client dashboards: Google Sheets. Clients don't want another login. They don't want to learn a dashboard. They want data where they already work. Google Sheets is familiar, shareable, and requires zero training.
Listing source: Propsense. Rightmove actively blocks scrapers. Maintaining our own would be a full-time job. Propsense aggregates from multiple sources and handles the cat-and-mouse game for us.
The universal layer scrapes and enriches. The client layer matches and delivers. They run independently but share the same data.
Think of it like a utility company. We build the water treatment plant (scraping + enrichment). Each client gets their own tap (matching + delivery). Adding a new client means adding a tap, not building a new plant.
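To make the split concrete, here is a minimal schema sketch of the two layers, assuming one shared PostgreSQL instance. The table and column names are illustrative rather than the production design; the later snippets in this write-up reuse the same names.

```sql
-- Universal layer: scraped once, enriched once, shared by every client.
CREATE TABLE listings (
    id                bigserial PRIMARY KEY,
    source_listing_id text UNIQUE NOT NULL,   -- ID from the listing feed, used for dedup
    display_address   text,                   -- vague address shown on the portal
    full_address      text,                   -- filled in by EPC enrichment
    postcode          text,
    price             integer,
    is_active         boolean NOT NULL DEFAULT true,  -- flipped by the delisted checker
    last_seen         timestamptz NOT NULL DEFAULT now()
);

-- Client layer: one portfolio of managed addresses per client.
CREATE TABLE client_portfolio (
    id        bigserial PRIMARY KEY,
    client_id text NOT NULL,
    address   text NOT NULL,
    postcode  text NOT NULL
);

-- Client layer: links a portfolio entry to a live listing.
CREATE TABLE client_matches (
    id           bigserial PRIMARY KEY,
    client_id    text NOT NULL,
    portfolio_id bigint NOT NULL REFERENCES client_portfolio(id),
    listing_id   bigint NOT NULL REFERENCES listings(id),
    confidence   text NOT NULL,               -- 'accurate' or 'approx'
    status       text NOT NULL DEFAULT 'live' -- set to 'delisted' when the listing goes
);
```

Adding a client means adding rows to client_portfolio and wiring up their matching workflow. Nothing in the universal layer changes.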
Runs once, serves all clients. Scrapes listings, enriches with EPC data, maintains the master database.
The core scraping engine. A controller workflow manages which outcodes to scrape; a processor workflow handles pagination, deduplication, and database storage. Supports full sync and incremental updates.
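For the deduplication step, here is a sketch of the kind of upsert the processor could use, leaning on the unique source_listing_id from the schema above; the exact columns are assumptions.

```sql
-- Re-scraping a postcode never creates duplicates: conflicting rows just get refreshed.
INSERT INTO listings (source_listing_id, display_address, postcode, price)
VALUES ($1, $2, $3, $4)
ON CONFLICT (source_listing_id) DO UPDATE
SET price     = EXCLUDED.price,
    is_active = true,        -- a relisted property comes back to life
    last_seen = now();
```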
The secret sauce. EPC certificates have a slider showing the property's energy rating. The scraper retrieves all EPC records for a postcode, then AI corrects any address formatting issues and proposes matching options based on the rating band. It replaces the human brain work of identifying which EPC record belongs to which listing.
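As a sketch of the narrowing step before the AI gets involved (the epc_records table and its columns are hypothetical): only certificates in the listing's postcode and rating band are offered as candidates.

```sql
-- Candidate EPC records for one listing: same postcode, same rating band.
-- The AI then tidies the address formatting and picks (or rejects) a match.
SELECT certificate_id, full_address, rating_band
FROM   epc_records
WHERE  postcode    = $1   -- the listing's postcode
  AND  rating_band = $2;  -- the band read off the listing's EPC rating
```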
Every listing carries a status flag as it moves through enrichment: pending_enrichment → enrichment_in_progress → ready_for_match_accurate / ready_for_match_approx / enrichment_not_possible
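One way to keep those flags honest is to enforce them in the database itself. A minimal sketch, assuming the illustrative listings table from earlier:

```sql
-- Every listing carries exactly one enrichment status, and nothing else is allowed in.
ALTER TABLE listings
    ADD COLUMN enrichment_status text NOT NULL DEFAULT 'pending_enrichment'
    CHECK (enrichment_status IN (
        'pending_enrichment',
        'enrichment_in_progress',
        'ready_for_match_accurate',
        'ready_for_match_approx',
        'enrichment_not_possible'
    ));

-- Partial index so the enrichment workflow can pull its queue cheaply.
CREATE INDEX listings_pending_idx
    ON listings (id)
    WHERE enrichment_status = 'pending_enrichment';
```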
Properties get sold or removed. This workflow periodically checks if matched listings are still active. If not, it flags them so clients know the opportunity has passed.
Clients acting on stale data waste time. Knowing a property is delisted is as valuable as knowing it's listed. This keeps the intelligence current.
After updating the universal database, it triggers client-specific workflows to update their matched listings too.
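A sketch of that client-side update, using the illustrative tables from earlier: once the universal checker flips is_active, each client's matches can be flagged in a single pass.

```sql
-- Flag this client's matches whose underlying listing is no longer live.
UPDATE client_matches AS m
SET    status = 'delisted'
FROM   listings AS l
WHERE  m.client_id  = $1          -- run per client by their own workflow
  AND  l.id         = m.listing_id
  AND  l.is_active  = false
  AND  m.status    <> 'delisted'
RETURNING m.id, m.portfolio_id;   -- the rows that need flagging on the client's dashboard
```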
Each client gets their own matching and delivery workflows: they query the universal database, match against the client's portfolio, and deliver to their dashboard.
Takes the client's portfolio of addresses and finds matches in the scraped listings. Uses a high-precision key matching algorithm: numbers, standalone letters, and core text stripped of noise words.
Full property details with client contact info attached. Includes listing URL, price, property type, EPC rating, agent details, match confidence, and all relevant metadata for the client dashboard.
Triggered by the universal Delisted Checker. Reviews the client's current matches, flags any that are no longer live, and updates their dashboard accordingly.
The dashboard always reflects reality. No phantom listings. If something is sold or withdrawn, the client knows immediately.
Triggered as part of the daily maintenance cycle. Client doesn't need to do anything. Data stays fresh.
Everything I learnt, I learnt by getting it wrong first.
The sheer amount of data to be scraped was overwhelming. Every API call costs money. Every field stored costs storage. I had to get ruthless about what actually mattered.
Push too few records and you waste API calls. Push too many and you hit memory limits or timeouts. Finding the sweet spot took trial and error.
A single workflow that does everything sounds elegant. In practice, it's a nightmare to debug, test, and maintain. I learnt to break at natural boundaries.
Running the same enrichment twice? Records getting processed out of order? Duplicate entries? I hit every concurrency bug possible before discovering atomic PostgreSQL patterns.
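The pattern that finally fixed it for me was claiming work inside a single statement. A sketch, assuming the illustrative enrichment_status column from earlier; the batch size is arbitrary.

```sql
-- Claim a batch of pending listings atomically. Two workflow runs can execute this
-- at the same time and will never pick up the same rows.
UPDATE listings
SET    enrichment_status = 'enrichment_in_progress'
WHERE  id IN (
    SELECT id
    FROM   listings
    WHERE  enrichment_status = 'pending_enrichment'
    ORDER  BY id
    LIMIT  50                  -- small enough to stay inside memory and time limits
    FOR UPDATE SKIP LOCKED     -- rows locked by a parallel run are skipped, not waited on
)
RETURNING id, display_address, postcode;
```

Whatever the statement returns is what that run processes; nothing else touches those rows until the status flips again.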
n8n loops are convenient but slow. Batch operations are fast but less flexible. I wasted weeks applying the wrong approach to the wrong scenario before learning when each one fits.
n8n is powerful, but it has limits. Memory caps. Execution time limits. Data transfer between nodes. This project pushed me into every one of them.
The most challenging part of the entire engine. Getting this right took weeks of iteration.
Client data says "Flat 3, 42 Victoria Road, London, SW1A 1AA". The EPC enriched address says "Apartment 3, 42 Victoria Rd, SW1A 1AA". Are they the same property? A human can tell instantly. A computer struggles.
I tried exact matching. Failed constantly. Tried basic fuzzy matching. Too many false positives. Tried AI matching. Too slow and expensive at scale. I needed something smarter.
After a lot of trial and error, brainstorming with AI, and testing edge cases, I landed on a PostgreSQL query that breaks addresses into components:
Numbers key: Extract all numbers. "Flat 3, 42 Victoria Road" becomes "3 42". Compare numeric fingerprints.
Letters key: Extract standalone letters. "Block A, Flat B" becomes "A B". Catches apartment/unit designations.
Postcode chunks: Break the postcode into outcode (SW1A) and incode (1AA). Match on both separately. Handles formatting variations.
Core text: Strip noise words (road, street, lane, flat, apartment). Compare what's left. "Victoria" matches "Victoria" regardless of "Road" vs "Rd".
By breaking addresses into components and matching on each, I get high precision without requiring exact matches. The query scores each component and returns a confidence level. High scores go straight through. Approximate matches get flagged for review.
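Here is a sketch of how those component keys can be derived in PostgreSQL, using the illustrative tables from earlier. The squash() helper and the noise-word list are assumptions; the real query also computes the same keys for the client's portfolio addresses and scores the comparison.

```sql
-- Collapse runs of whitespace so keys compare cleanly.
CREATE OR REPLACE FUNCTION squash(text) RETURNS text AS
$$ SELECT trim(regexp_replace($1, '\s+', ' ', 'g')) $$
LANGUAGE sql IMMUTABLE;

SELECT
    id,
    -- Numbers key: "Flat 3, 42 Victoria Road" -> "3 42"
    squash(regexp_replace(lower(full_address), '[^0-9]+', ' ', 'g'))           AS numbers_key,
    -- Letters key: keep standalone letters only, "Block A, Flat B" -> "a b"
    squash(regexp_replace(lower(full_address), '(\m[a-z]\M)|[^ ]', '\1', 'g')) AS letters_key,
    -- Postcode chunks (assuming the stored postcode keeps its space):
    -- "SW1A 1AA" -> outcode "SW1A", incode "1AA"
    split_part(upper(postcode), ' ', 1)                                        AS outcode,
    split_part(upper(postcode), ' ', 2)                                        AS incode,
    -- Core text: drop noise words and punctuation, so "Victoria Rd" ~ "Victoria Road"
    squash(regexp_replace(lower(full_address),
           '\m(flat|apartment|apt|unit|block|road|rd|street|st|lane|ln)\M|[^a-z ]',
           ' ', 'g'))                                                          AS core_text
FROM listings;
```

Strong agreement across numbers, letters, postcode, and core text goes straight to ready_for_match_accurate; partial agreement lands in ready_for_match_approx for review.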
It's not perfect. Edge cases still slip through. But it gets 92% right automatically, which means manual review is focused on the 8% that actually need human judgement.
Building property intelligence revealed patterns I didn't expect. Every lesson here cost me time.
"Flat 3" vs "Apartment 3" vs "Unit 3" vs "3". Same property, four different representations. Fuzzy matching was the only reliable solution. Pure string matching fails constantly.
Display addresses are intentionally vague. Full registered addresses are on EPC certificates. Access to that data transforms what's possible. It's the difference between "somewhere on Victoria Road" and "123 Victoria Road, Flat 4".
With thousands of listings flowing through enrichment, tracking where each one is becomes critical. pending → in_progress → ready → matched → archived. Every listing has exactly one status at any time.
I only started versioning workflows properly because this project forced me to. Breaking changes, lost work, "which version was working?"... Never again. Every change gets a version bump now.
Everyone wants new listings. But knowing something is no longer available is equally valuable. Clients were wasting hours chasing properties that were already sold. The delisted checker fixed that.
Building client-specific scrapers would have been faster initially. But the platform approach pays off with every new client. One enrichment run serves everyone. Costs stay flat as client count grows.
We could have built a fancy dashboard. Instead, we push to Google Sheets. Clients already know how to use it. Zero training, zero friction, immediate adoption. Technology should disappear, not impress.
Some of my best solutions came after sleeping on a problem. Literally dreaming about code. If you're stuck, step away. Your brain keeps working. Wake up anxious to try the new idea. That's when you know you've found something good.
Whether you manage a portfolio or just want to know when properties in your postcodes hit the market, we can help.