Most OSINT tools age fast. The web changes underneath them, sites move endpoints, captchas evolve, and the tool dies in two years. Maigret is the exception. It has been running for years, supports 3,000+ sites, ships a Python package, a Telegram bot, and a web UI, and the engineering inside it is a small masterclass in how to build a scanner that does not break every time a site changes.
This guide is for engineers, not for casual lookups. It walks through what Maigret does, the legitimate research and security use cases that justify it, the architecture that lets it scale to thousands of sites, and how the same testing patterns Maigret uses (signature databases, drift detection, recursive verification) translate into the API testing work you do every day with Apidog.
If you have not read it yet, our API testing without Postman in 2026 post covers similar pattern-matching and drift-detection ideas in a friendlier domain.
TL;DR
- Maigret collects a public dossier on a person by username only, checking 3,000+ sites for accounts and extracting public profile information.
- The engineering is impressive: a versioned site-signature database, recursive search, automatic drift detection, captcha bypass, and an optional AI summary mode.
- Legitimate uses include OSINT investigations by journalists, account recovery, missing-persons searches, security audits, and corporate brand-abuse monitoring.
- Authorized red-team engagements rely on tools like Maigret to map an organization’s public attack surface; using it on people without their consent crosses into harassment and stalking.
- The architectural ideas (signature-driven detection, recursive verification, automated drift alerts) transfer directly to API testing; we show how to apply them with Apidog.
- Download Apidog to design and test signature-style assertions on your own APIs the same way Maigret tests sites.
What Maigret is and is not
Maigret is a Python tool, MIT-licensed, maintained by soxoj. The README pitch: “collect a dossier on a person by username from 3,000+ sites.” Run pip install maigret, give it a username, and it queries the sites in its database, scrapes whatever public profile information sits behind each found account, and produces a report.

Three things to be clear on.
It only uses public data. No login, no credential abuse, no API keys. If a site exposes a profile to anonymous visitors, Maigret reads it; if not, it returns “username not found” or a flagged page.
It is widely used in legitimate research contexts. Journalists at major investigative outlets, missing-person volunteers, fraud and brand-protection teams, and authorized red teams use it daily. The Maigret README itself lists references in academic OSINT curricula.
It can be misused. Like any OSINT tool, running it on a private individual without their consent crosses ethical and in many jurisdictions legal lines. Stalking laws in the EU, US, and most other regions apply. Read your local rules before pointing this at any person.
The rest of this article focuses on the engineering and on transferable testing patterns, not on the human-targeting workflow.
The site signature database
The single best engineering idea in Maigret is the site signature database. Each entry describes one site with enough information that the scanner can decide:
- Does the username exist on this site?
- What does a “found” page look like?
- What does a “not found” page look like?
- What information can we extract from a found page?
- Does this site rate-limit or captcha?
The database is JSON, versioned in the repo, and auto-updated from GitHub once per 24 hours when the tool runs. If the maintainers update a signature for a site that recently changed, every Maigret install picks it up the next day without a reinstall.
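To make the idea concrete, here is an illustrative signature entry as a Python dict. The field names follow the article's description of Maigret's format, but they are approximations, not the exact schema from the repo's data.json.

```python
# Illustrative site-signature entry, loosely modeled on the fields this
# article describes. Field names and values are approximations, not the
# exact Maigret schema.
GITHUB_SIGNATURE = {
    "urlMain": "https://github.com",
    "url": "https://github.com/{username}",          # probe URL template
    "presenseStrs": ["followers", "repositories"],   # ALL must appear on a found page
    "absenceStrs": ["Not Found"],                    # any one confirms "not found"
    "usernameRegex": r"^[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?$",
    "headers": {"User-Agent": "Mozilla/5.0"},        # some sites need a custom UA
    "tags": ["coding", "us"],                        # category and country
}

probe_url = GITHUB_SIGNATURE["url"].format(username="alice")
```

Because the entry is plain data, updating it when a site changes means editing one JSON record, not shipping new code.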
This pattern is exactly the same one you want for an API test suite. Your project has 50, 500, or 5,000 endpoints. Each endpoint has a signature: expected status codes, response shapes, error envelopes. When a vendor changes the shape, you want the test suite to fail fast with a useful diff. We covered the same idea in contract-first API development and in the MCP server testing playbook.
How Maigret detects “username found” vs “not found”
A naive scanner does an HTTP GET on https://example.com/user/<username> and checks the status code. That works for maybe 10 percent of real sites. The other 90 percent return 200 with a “no such user” page, or 200 with a cached homepage, or 200 with a captcha challenge.
Maigret’s database describes each site with a richer set of detection rules:
- A urlMain and urlTemplate
- A presenseStrs list (substrings that must appear when the user exists)
- An absenceStrs list (substrings that confirm the user does not exist)
- A regex for username extraction from the page
- Optional headers (some sites need a custom user agent)
- Tags for category and country
A “found” verdict requires all presenseStrs to be present in the response and none of the absenceStrs. A “not found” verdict is the inverse. Anything else is an “unknown” result the user can investigate manually.
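The three-way verdict can be sketched as a small pure function. This is a simplification; the real tool layers status codes, regexes, and per-site rules on top of these substring checks.

```python
def verdict(body: str, presense_strs: list[str], absence_strs: list[str]) -> str:
    """Maigret-style multi-signal verdict: 'found', 'not_found', or 'unknown'.

    Simplified sketch: 'found' needs every presence string and no absence
    string; 'not_found' is the inverse; anything conflicting is 'unknown'
    and left for a human to investigate.
    """
    all_present = all(s in body for s in presense_strs)
    absence_hit = any(s in body for s in absence_strs)
    if all_present and not absence_hit:
        return "found"
    if absence_hit and not all_present:
        return "not_found"
    return "unknown"
```

The "unknown" branch is the important design choice: a cached homepage or a captcha page trips neither verdict cleanly, so it is surfaced rather than silently miscounted.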
This is the same kind of multi-signal assertion you want when testing complex APIs. A 200 status is not enough; you need to assert on body content too. Apidog supports both status-code and body-content assertions in the same request, which is the API-testing equivalent of Maigret’s presenseStrs plus absenceStrs.
Recursive search and information extraction
Once Maigret finds an account, it does two more things.
It scrapes the public profile page for additional identifiers: linked email addresses, phone numbers, real names, other usernames. The extraction rules are also signature-driven, defined per site. A LinkedIn profile yields different fields than a GitHub profile.
Then it recurses. New identifiers feed back into the search loop, expanding the dossier across linked accounts. A username on one site might lead to a real name; that name might unlock a different account on another site; that account might link to an Instagram handle; and so on.
For OSINT, this is the difference between “I found one Twitter account” and “I traced this person across 12 services.” For an API test suite, the same pattern is valuable: when you discover an undocumented field in one endpoint’s response, follow it. It often points to a related endpoint, a downstream system, or a missing test case.
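The recursive expansion can be sketched as a worklist loop. The search_sites and extract_identifiers callables below are hypothetical stand-ins for Maigret's real scan and extraction steps; the depth cap keeps the loop bounded.

```python
def recursive_search(seed, search_sites, extract_identifiers, max_depth=3):
    """Worklist sketch of recursive search: identifiers extracted from one
    found profile feed back into the scan until nothing new turns up.

    search_sites(identifier) -> list of found-profile records (hypothetical)
    extract_identifiers(record) -> set of new identifiers (hypothetical)
    """
    seen = {seed}
    frontier = [(seed, 0)]
    dossier = []
    while frontier:
        identifier, depth = frontier.pop()
        for record in search_sites(identifier):
            dossier.append(record)
            if depth + 1 >= max_depth:
                continue  # depth cap: stop expanding, keep the record
            for new_id in extract_identifiers(record) - seen:
                seen.add(new_id)
                frontier.append((new_id, depth + 1))
    return dossier
```

The `seen` set is what keeps mutually-linked profiles from looping forever, which matters for APIs too: endpoint A's response referencing endpoint B, and B referencing A, is common.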
Captcha and rate-limit handling
Maigret partially bypasses captchas and detects rate limits by reading the response shape. Bypass strategies include:
- Rotating user agents
- Honoring per-site retry headers
- Falling back to the site’s mobile or simplified domain
- Routing through Tor or I2P when the site permits
The README is honest that this is partial. If a site has aggressive anti-automation, Maigret records “captcha detected” and lets the user investigate manually. The tool does not try to defeat hostile defenses; it works with sites that allow basic anonymous access.
The pattern transfers: when you build an API client or test runner, design it to detect rate-limit responses and back off gracefully, not to brute-force through them. The same defensive posture that keeps Maigret on the right side of vendor terms keeps your API tests from getting the team’s IP banned.
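A minimal sketch of that defensive posture, assuming a hypothetical send() callable that returns an object with .status_code and .headers:

```python
import random
import time

def request_with_backoff(send, max_retries=5):
    """Back off on HTTP 429 instead of hammering the endpoint.

    Honors the server's Retry-After header when present, otherwise falls
    back to jittered exponential backoff. `send()` is a hypothetical
    callable returning a response with .status_code and .headers.
    """
    for attempt in range(max_retries):
        resp = send()
        if resp.status_code != 429:
            return resp
        retry_after = resp.headers.get("Retry-After")
        delay = float(retry_after) if retry_after else (2 ** attempt) + random.random()
        time.sleep(delay)
    raise RuntimeError("rate limit persisted after retries")
```

The jitter matters when a whole test suite retries at once; synchronized retries look like an attack to the server.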
The signature drift problem
A 3,000-site database is only useful if it stays current. Sites redesign profile pages, change URL patterns, add captchas, or get acquired and rebranded. A stale signature returns false negatives (your search finds nothing) or false positives (it finds accounts that do not exist).
Maigret addresses this with four layers:
- Auto-update from the central GitHub repo every 24 hours
- Community pull requests that maintain individual site signatures
- An --update flag that forces a fresh fetch
- A built-in test harness that validates each signature against a known-existing username before shipping
The last item is the one most engineering teams overlook. Maigret keeps a known-existing username for each site (typically a developer or maintainer who consented). The harness queries that username and confirms the signature still works. Drift detected, signature flagged, contributors notified.
This is exactly the kind of regression suite you want for your own API contracts. Apidog supports the same pattern: save a known-good response per endpoint, replay against the live endpoint on a schedule, diff the result, and alert on drift. Our DeepSeek V4 API guide covers the manual side of this for one specific vendor.
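A bare-bones version of that diff can compare structural "shapes" rather than values, so ordinary data changes do not fire alerts but renamed or retyped fields do. This is a minimal sketch; real contract diffing also checks formats, enums, and optionality.

```python
def shape(value):
    """Reduce a JSON-like value to its structural shape: dict keys and
    element types, ignoring concrete values."""
    if isinstance(value, dict):
        return {k: shape(v) for k, v in sorted(value.items())}
    if isinstance(value, list):
        return [shape(value[0])] if value else []
    return type(value).__name__

def detect_drift(fixture, live):
    """True when the live response no longer matches the saved
    known-good fixture's shape."""
    return shape(fixture) != shape(live)
```

Run this on a schedule against each saved fixture and alert on the first True, and you have the core of a drift monitor.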
The optional AI summary mode
The --ai flag turns Maigret’s raw findings into a short investigation summary using an OpenAI-compatible LLM endpoint. You bring the API key; Maigret structures the prompt and the call.
This is a nice example of LLM-as-postprocessor done right. The model never decides whether a username matches; that is rule-based and deterministic. The model only summarizes, which it is good at, and it operates over a constrained input. Hallucinations are bounded.
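The boundary can be sketched in a few lines: the model receives only the already-decided findings, never raw pages. The prompt wording below is illustrative, not Maigret's actual --ai prompt.

```python
def build_summary_prompt(findings):
    """LLM-as-postprocessor sketch: the deterministic scan already decided
    what was found; the model only gets that fixed list to summarize.
    `findings` is a list of dicts with site/url/verdict keys (hypothetical
    shape, not Maigret's internal record format)."""
    lines = [f"- {f['site']}: {f['url']} ({f['verdict']})" for f in findings]
    return (
        "Summarize the following confirmed account findings in under "
        "200 words. Do not speculate beyond the list.\n" + "\n".join(lines)
    )
```

Because the input is a closed list and the instruction forbids speculation, the worst a hallucination can do is phrase a real finding badly.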
The same architecture works well for API monitoring: deterministic rule-based assertions in Apidog, with an LLM postprocessor that turns the run report into a Slack-friendly summary at the end. Our computer use vs structured APIs post explains why the structured layer should always come first.
Legitimate use cases worth knowing
Five contexts where running Maigret is unambiguously appropriate.
Account recovery for yourself. Find every old account tied to a username you used in 2014. Useful before privacy audits or when shutting down a digital footprint.
Brand-abuse monitoring. Companies run Maigret on their brand or product names to detect impersonation accounts. Most jurisdictions encourage this kind of monitoring; some require it.
Missing-person volunteer work. Search-and-rescue and missing-person organizations use Maigret with family consent to track digital footprints. Always coordinate with law enforcement; freelancing here often makes investigations harder.
Authorized red-team engagements. Pentest teams under signed contract scope use Maigret to map an organization’s public attack surface. The contract defines the scope; the tool is just the implementation.
Investigative journalism. Reporters investigating fraud, public-figure misconduct, or organized crime use OSINT tools under editorial and legal review.
What is not on this list: looking up a stranger out of curiosity, surveilling an ex-partner, or building a dataset on people who did not consent. Those uses cross legal lines in most jurisdictions and ethical lines everywhere.
Patterns from Maigret you can apply to API testing
Five engineering ideas that translate directly.
Signature databases over hand-coded checks. Define each endpoint’s expected behavior as data, not code. New vendors get added without recompiling.
Multi-signal assertions. Status code plus body content plus header check, all required. Reduces false positives from cached responses or generic error pages.
Auto-updating signatures. Pull the latest assertions from a central repo on a schedule. Apidog projects support cloud sync; use it. We covered the workflow in API testing without Postman.

Drift detection. Schedule a periodic replay against a known-good fixture and diff the result. Alert on shape changes before they break production.
LLM-as-postprocessor, not LLM-as-judge. Let deterministic rules decide pass/fail. Use the LLM only to turn the report into something readable.
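Combining the first two ideas, a signature-driven endpoint check might look like the following. The signature fields and the fetch(path) helper are hypothetical, not an Apidog or Maigret API.

```python
# Expected behavior as data, Maigret-style: adding an endpoint means
# adding a record, not writing a new test function.
ENDPOINT_SIGNATURES = [
    {
        "name": "get_user",
        "path": "/users/42",
        "expect_status": 200,
        "must_contain": ['"id"'],       # presence strings
        "must_not_contain": ["error"],  # absence strings
    },
]

def run_signature(sig, fetch):
    """Multi-signal check: status code AND body presence AND body absence.
    `fetch(path)` is a hypothetical callable returning (status, body)."""
    status, body = fetch(sig["path"])
    ok = (
        status == sig["expect_status"]
        and all(s in body for s in sig["must_contain"])
        and not any(s in body for s in sig["must_not_contain"])
    )
    return sig["name"], ok
```

Swap the list for a JSON file pulled from a central repo on a schedule and you have the auto-updating layer too.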
Apply these and your API test suite gains the same longevity Maigret has. Most test suites die because they were written once, hand-coded, and never updated. Maigret’s architecture is a model for one that survives.
Common pitfalls when running Maigret
For engineers experimenting with the tool itself.
Running without -a and assuming completeness. The default scans the top-500 sites by traffic. If your investigation needs the long tail, pass -a for the full 3,000+. Note the run takes longer.
Ignoring tags. The --tags flag narrows by category or country. A user in Russia or Japan will be missed by a US-centric default; tag-filtering catches them.
Skipping the auto-update. Old signature databases produce false positives and false negatives. Let the auto-update run, or use --update manually before serious investigations.
Misreading Tor blocks. Some sites block Tor exit nodes outright; Maigret detects this. Do not interpret a Tor block as a signal about the user.
Believing extracted fields without verifying. The tool extracts what the page exposes. Pages can be fabricated. Treat findings as leads, not as evidence.
Real-world use cases
A security consultancy uses Maigret as the first step in every red-team scoping engagement. The output goes into the kickoff report so the client sees their public attack surface before the engagement starts.
A freelance fraud investigator uses Maigret with the --ai flag to summarize a 3,000-site scan into a 200-word brief for non-technical clients. The deterministic search is the data; the LLM is the readable layer.
An engineering team uses the same architectural ideas (signature database, drift detection, periodic replay) to keep their internal API test suite current across 200 microservices. They built it in Apidog; the principles are Maigret’s.
Conclusion
Maigret is a working example of how to build a tool that scales to thousands of detection rules without breaking every time the underlying surfaces change. The engineering is worth studying even if you never run an OSINT investigation: signature databases, multi-signal assertions, auto-updating data, drift detection, and LLM postprocessing are all transferable to the API testing work you do daily.
Five takeaways:
- Maigret checks 3,000+ sites for a username using a versioned, auto-updating signature database.
- Multi-signal detection (presence strings plus absence strings) beats simple status-code checks for reliability.
- Drift is the enemy of any long-lived test suite; periodic replay against known fixtures catches it early.
- LLM-as-postprocessor (the --ai flag) is the right architecture: deterministic rules, summarized output.
- The same patterns work for API testing in Apidog; we have applied them across our customers’ contract suites.
Next step: read the Maigret site database format, then open Apidog and design one endpoint in your project the same way: signature-driven, multi-signal, with a saved fixture for drift detection. The discipline pays off the first time a vendor renames a field at 2 a.m. and your suite catches it before users do.
FAQ
Is Maigret legal to use?
It depends on the jurisdiction and the target. Running it on yourself, on accounts you own, on a company you have written authorization to test, or as part of authorized journalism is generally fine. Running it on an unsuspecting individual can cross stalking and harassment laws in the EU, US, UK, and most other regions. Read your local rules before any use that targets a third party.
Does Maigret work without Python?
The official package is Python 3.10+. The author maintains a Telegram bot for casual lookups and a Cloud Shell setup for users who do not want a local install.
How accurate is the 3,000-site claim?
The site database in the repo lists 3,000+ entries; not all are active at any given moment. The auto-update keeps a working subset current. Tag filtering helps you focus on sites likely to matter for your scope.
What does the AI mode add?
The --ai flag uses an OpenAI-compatible LLM to summarize the deterministic findings into a readable report. It does not change the search itself; it only postprocesses. Bring your own API key.
Can I use Maigret in CI?
For OSINT investigations, no; that is interactive work. The architectural patterns Maigret uses (signature databases, drift detection, scheduled replay) are exactly what belongs in your CI pipeline for API testing. Apidog implements them natively.
How is this different from Sherlock?
Sherlock is the older, simpler ancestor. Maigret extends it with information extraction, recursive search, captcha handling, the AI summary mode, and a richer site database. Both are MIT-licensed and worth knowing about.
Where do I report a stale signature?
The README points to GitHub issues and pull requests on the Maigret repo. Community contributions keep the database current; one PR per stale site is the norm.