
Broken Link Detection

Automatically crawl your clients' sites and find 404s, dead links, and broken references — before their visitors do.

How it works

SiteBrief starts a headless crawler at the site's root URL and follows links throughout the site. Every internal link is fetched, and the HTTP response code is recorded. External links are also checked. Any link that returns a 4xx or 5xx status code — or fails to load — is flagged as broken.

The results show the broken URL, the HTTP status code, and the page where the link was found — so you know exactly where to fix it.
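The crawl described above can be sketched as a breadth-first traversal. This is a minimal illustration, not SiteBrief's actual crawler: the `crawl` function, the `LinkExtractor` class, and the caller-supplied `fetch(url)` function (returning a status code and the page HTML) are all hypothetical names, and the sample site is made up so the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(root_url, fetch):
    """Breadth-first crawl starting at root_url (hypothetical sketch).

    fetch(url) must return (status, html); status is an int HTTP code,
    or None if the request failed to load.
    Returns a list of (broken_url, status, referring_page) tuples.
    """
    root_host = urlparse(root_url).netloc
    results = {}            # url -> (status, html); each URL is fetched once
    queued = {root_url}     # internal pages already scheduled for crawling
    queue = deque([root_url])
    broken = []

    def check(url):
        if url not in results:
            results[url] = fetch(url)
        return results[url]

    while queue:
        page = queue.popleft()
        _, html = check(page)
        parser = LinkExtractor()
        parser.feed(html or "")
        for href in parser.links:
            target = urldefrag(urljoin(page, href)).url  # resolve, drop #fragment
            if urlparse(target).scheme not in ("http", "https"):
                continue  # skip mailto:, tel:, and similar
            status, _ = check(target)
            if status is None or status >= 400:
                broken.append((target, status, page))
            elif urlparse(target).netloc == root_host and target not in queued:
                queued.add(target)       # internal: crawl it later
                queue.append(target)
            # external links are checked above but never queued
    return broken


# Made-up site used in place of real HTTP requests.
SITE = {
    "https://example.com/": (200, '<a href="/about">About</a> <a href="/old">Old</a> '
                                  '<a href="https://other.example/x">Partner</a>'),
    "https://example.com/about": (200, '<a href="/">Home</a> <a href="/old#top">Old</a>'),
    "https://other.example/x": (200, '<a href="/never-crawled">Deeper</a>'),
}
fetched = []


def fake_fetch(url):
    fetched.append(url)
    return SITE.get(url, (404, ""))


broken = crawl("https://example.com/", fake_fetch)
for url, status, found_on in broken:
    print(status, url, "found on", found_on)
```

The same broken URL appears once per referring page, which mirrors the way results pair each broken link with the page where it was found.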

What gets flagged

Status              Meaning
404 Not Found       Page or resource no longer exists at this URL
410 Gone            Resource intentionally removed, with no redirect
500 Server Error    Internal server error when fetching the linked URL
403 Forbidden       URL exists but access is denied (usually a permissions issue)
Timeout             Link didn't respond within 30 seconds
DNS failure         The linked domain doesn't resolve; the domain may have expired or been mistyped
Connection refused  Server is not accepting connections on that port
ℹ️
Note: 301 and 302 redirects are followed and not flagged as broken; only the final destination is checked. However, redirect chains longer than 5 hops are flagged.
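The flagging rules above can be expressed as a small decision function. This is a sketch under stated assumptions: the `classify` function, its parameters, and the error labels (`"timeout"`, `"dns"`, `"refused"`) are hypothetical and are not part of any SiteBrief API. Only the thresholds (4xx/5xx, more than 5 redirect hops) come from this page.

```python
MAX_REDIRECT_HOPS = 5  # chains longer than this are flagged

# Hypothetical error labels mapped to the reasons listed in the table.
ERROR_REASONS = {
    "timeout": "Timeout",
    "dns": "DNS failure",
    "refused": "Connection refused",
}


def classify(final_status, redirect_hops=0, error=None):
    """Return a reason string if the link should be flagged, else None.

    final_status: status code of the final destination after following
                  redirects, or None if no response was received.
    redirect_hops: number of redirects followed to reach the destination.
    error: one of the ERROR_REASONS keys, or None.
    """
    if error is not None:
        return ERROR_REASONS.get(error, "Failed to load")
    if final_status is None:
        return "Failed to load"
    if redirect_hops > MAX_REDIRECT_HOPS:
        return "Redirect chain too long"
    if 400 <= final_status <= 599:
        return f"HTTP {final_status}"
    return None  # 2xx, or redirects that ended at a working page


print(classify(404))                     # flagged
print(classify(200, redirect_hops=2))    # not flagged
print(classify(200, redirect_hops=6))    # flagged: too many hops
print(classify(None, error="dns"))       # flagged
```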

Crawler behavior

  • Start URL: the site's configured URL (root of the site)
  • Scope: crawls all pages reachable from the homepage, up to the limit of 1,000 pages per site
  • Internal links: followed and crawled recursively
  • External links: checked for HTTP status but not crawled further
  • Images, CSS, JS: checked if referenced in HTML (src, href attributes)
  • Crawl speed: rate-limited to avoid overloading the server
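To illustrate the difference between links that are crawled and references that are only checked, here is a minimal sketch that reads `href` and `src` attributes from one page and sorts them into two lists. The `ReferenceExtractor` class and `split_references` function are hypothetical names used for illustration, and the rule of treating only same-host `<a>` links as crawlable is an assumption about how such a crawler could work.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ReferenceExtractor(HTMLParser):
    """Collects (tag, absolute_url) for every href or src attribute."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.references = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.references.append((tag, urljoin(self.page_url, value)))


def split_references(page_url, html):
    """Return (to_crawl, to_check) for one page.

    to_crawl: internal <a> links, followed recursively.
    to_check: external links and assets (images, CSS, JS), status only.
    """
    site_host = urlparse(page_url).netloc
    parser = ReferenceExtractor(page_url)
    parser.feed(html)
    to_crawl, to_check = [], []
    for tag, url in parser.references:
        parts = urlparse(url)
        if parts.scheme not in ("http", "https"):
            continue  # ignore mailto:, tel:, javascript:, and similar
        if tag == "a" and parts.netloc == site_host:
            to_crawl.append(url)
        else:
            to_check.append(url)
    return to_crawl, to_check


html = (
    '<link href="/style.css" rel="stylesheet">'
    '<a href="/pricing">Pricing</a>'
    '<a href="https://partner.example/">Partner</a>'
    '<img src="img/logo.png">'
    '<script src="https://cdn.example/lib.js"></script>'
    '<a href="mailto:hi@example.com">Mail</a>'
)
to_crawl, to_check = split_references("https://example.com/blog/", html)
print("crawl:", to_crawl)
print("check:", to_check)
```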
⚠️
Warning: Crawl time depends on site size. A small site (20–50 pages) takes 1–3 minutes. Large sites (500+ pages) can take 15–30 minutes. The panel shows a progress indicator while the crawl is running.

How to run a broken links check

Broken link scanning must be enabled in the site's settings (toggle "Broken links check" on). Once it is enabled, open the site detail page, where the Broken Links panel now appears. Click "Scan now" to start a crawl.

After the scan completes, the panel shows:

  • Total number of broken links found
  • Date and time of the last scan
  • A list of each broken URL with status code and referring page

Common scenarios

Client migrated from HTTP to HTTPS and has internal links to http://

This is extremely common. Run the broken links scan: if HTTPS redirects are in place, internal http:// links are followed as redirects and are not flagged as broken. But if any links are hardcoded to http:// and redirects aren't set up for those specific paths, they'll return a 404 and be flagged.
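Once the flagged links are known, the hardcoded http:// ones can be picked out and paired with their HTTPS equivalents. The sketch below assumes the scan results have been copied by hand into a list of dictionaries; the field names `url`, `status`, and `found_on`, and the function `insecure_internal_links`, are hypothetical and do not reflect a documented SiteBrief export format.

```python
from urllib.parse import urlparse


def insecure_internal_links(results, site_host):
    """Return (found_on, http_url, https_url) for internal http:// links."""
    suggestions = []
    for row in results:
        parts = urlparse(row["url"])
        if parts.scheme == "http" and parts.netloc == site_host:
            https_url = parts._replace(scheme="https").geturl()
            suggestions.append((row["found_on"], row["url"], https_url))
    return suggestions


# Made-up results, keyed with the hypothetical field names described above.
results = [
    {"url": "http://example.com/pricing?plan=pro", "status": 404,
     "found_on": "https://example.com/"},
    {"url": "http://other.example/", "status": 404,
     "found_on": "https://example.com/"},
    {"url": "https://example.com/gone", "status": 404,
     "found_on": "https://example.com/blog"},
]
suggestions = insecure_internal_links(results, "example.com")
for found_on, old, new in suggestions:
    print(f"On {found_on}: replace {old} with {new}")
```

Before editing each link, confirm that the HTTPS version of the path actually exists.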

Client deleted a product or page without setting up a redirect

The broken links scan will find all pages that still link to the deleted URL. Share the list with the client and set up 301 redirects from old URLs to the closest current equivalent.
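When preparing the list for the client, it can help to group the results by broken URL so that each deleted page appears once with all of its referring pages. This is a minimal sketch that assumes the results were copied into a list of dictionaries; the field names `url`, `status`, and `found_on`, and the function `referrers_by_broken_url`, are hypothetical.

```python
from collections import defaultdict


def referrers_by_broken_url(results):
    """Map each broken URL to the list of pages that link to it."""
    grouped = defaultdict(list)
    for row in results:
        grouped[row["url"]].append(row["found_on"])
    return dict(grouped)


# Made-up results, keyed with the hypothetical field names described above.
results = [
    {"url": "https://example.com/products/blue-widget", "status": 404,
     "found_on": "https://example.com/"},
    {"url": "https://example.com/products/blue-widget", "status": 404,
     "found_on": "https://example.com/blog/widgets"},
    {"url": "https://example.com/sale", "status": 410,
     "found_on": "https://example.com/"},
]
grouped = referrers_by_broken_url(results)
for broken_url, pages in grouped.items():
    print(f"{broken_url} is linked from {len(pages)} page(s):")
    for page in pages:
        print(f"  {page}")
```

Each key in the grouped output is a candidate for a single 301 redirect, which fixes every referring page at once.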

Third-party resources (fonts, JS libraries) from a dead CDN

If a client is loading a library from a CDN that went offline or changed URLs, the scan will flag it. Update the site to use a current CDN URL or self-host the resource.

Using results in client communication

A broken links report is a concrete, tangible deliverable that demonstrates the value of your monitoring. Instead of saying "we monitor your site", you can say: "We found 7 broken links on your site last week — here's the list, and we've already fixed 4 of them."

💡
Tip: Include the broken links count in the monthly client report. Even "0 broken links found" is a positive message: it shows you're actively checking.

Frequently asked questions

Will the crawler log in to check pages behind authentication?
No. The crawler accesses your site as an anonymous visitor. Pages behind login are not crawled.

My site has thousands of pages. Will the crawler check all of them?
Only up to a limit. The crawler checks up to 1,000 pages per site to keep scan time reasonable. For very large sites, focus on the most important sections.

Some flagged links return 403 on the scanner but work fine in a browser. Why?
Some servers return 403 to bots and crawlers but allow real browsers (user-agent filtering). These links aren't truly broken for real users. Prioritize links returning 404 or 410; those are the ones that need fixing.
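One way to apply this advice is to sort the flagged links into buckets before working through them. The sketch below assumes the results were copied into a list of dictionaries; the field names `url` and `status`, the bucket names, and the `triage` function are hypothetical.

```python
def triage(results):
    """Split flagged links into fix-first, verify-manually, and other."""
    buckets = {"fix": [], "verify_in_browser": [], "other": []}
    for row in results:
        if row["status"] in (404, 410):
            buckets["fix"].append(row["url"])                 # genuinely missing
        elif row["status"] == 403:
            buckets["verify_in_browser"].append(row["url"])   # may be bot filtering
        else:
            buckets["other"].append(row["url"])               # 5xx, timeouts, DNS, ...
    return buckets


# Made-up results, keyed with the hypothetical field names described above.
results = [
    {"url": "https://example.com/old-page", "status": 404},
    {"url": "https://partner.example/profile", "status": 403},
    {"url": "https://example.com/retired", "status": 410},
    {"url": "https://slow.example/", "status": "Timeout"},
]
buckets = triage(results)
for name, urls in buckets.items():
    print(name, urls)
```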

How often should I run the broken links scan?
Monthly is a good baseline. After any major content update (new posts, deleted pages, menu changes), run it immediately. For e-commerce sites with frequently changing inventory, weekly scans are recommended.

Can I exclude certain URLs from the scan?
Not yet. Exclusions are planned for a future release. Currently, all reachable links on the site are checked.