
Web Fingerprint — Passive-First Python Script

Overview

Automated web/URL fingerprinting tool that prioritizes stealth. Executes Tier 0 (zero-touch passive — no packets to target) and Tier 1 (near-passive — looks like normal browsing).

Quick Start

# Install dependencies
pip install requests mmh3 python-whois dnspython shodan

# Basic usage (Tier 0 + Tier 1)
python3 web_fingerprint.py target.com

# Passive only (Tier 0 — no packets to target)
python3 web_fingerprint.py target.com --tier 0

# With Shodan IP lookup + JSON output
python3 web_fingerprint.py target.com --resolve-ip -o results.json

What It Does

Tier  Module              Method                       What It Finds
----  ------------------  ---------------------------  -----------------------------------------------
0     Whois               Public WHOIS                 Registrar, dates, org, nameservers
0     DNS Records         dig @8.8.8.8                 MX, TXT, NS, A, CNAME; fingerprints providers
0     crt.sh              CT log API                   Subdomains from certificate transparency
0     Shodan              Cached host lookup           Open ports, banners, OS, vulns (0 credits)
0     Passive Subdomains  subfinder/amass/assetfinder  Subdomain aggregation from APIs
0     Wayback/GAU         Historical URLs              Old endpoints, params, exposed files, API paths
1     HTTP Headers        Single HEAD request          Server, X-Powered-By, security headers, CDN
1     SSL Certificate     TLS handshake                SANs, issuer, org, expiry, TLS version
1     Cookie Analysis     From response headers        Backend tech from cookie names/flags
1     robots.txt          Single GET                   Disallowed paths, sitemaps, interesting dirs
1     Error Page          Single 404 request           Framework from default error pages
1     Tech Detection      whatweb -a1 / httpx          Full technology stack identification
1     Favicon Hash        Single GET                   Shodan-searchable hash for framework ID
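The provider fingerprinting in the DNS module boils down to substring-matching records against a signature map. A minimal sketch of that matching (the signature subset and function name here are illustrative, not the script's full table):

```python
from typing import Optional

# Illustrative subset of CNAME signatures; the real script carries a much
# larger map covering CDNs, PaaS hosts, and storage endpoints.
CNAME_SIGNATURES = {
    "cloudfront.net": "AWS CloudFront",
    "azurewebsites.net": "Azure Web Apps",
    "netlify.app": "Netlify",
}

def fingerprint_cname(cname: str) -> Optional[str]:
    """Return the hosting provider if the CNAME matches a known signature."""
    target = cname.lower().rstrip(".")  # DNS answers often end with a dot
    for signature, provider in CNAME_SIGNATURES.items():
        if signature in target:
            return provider
    return None
```

For example, `fingerprint_cname("d1234.cloudfront.net.")` maps the record to AWS CloudFront; anything unmatched returns None and is simply not reported.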

Optional CLI Tools

For best results, have these in your PATH:

# Subdomain enumeration
go install github.com/projectdiscovery/subfinder/v2/cmd/subfinder@latest
go install github.com/owasp-amass/amass/v4/...@master
go install github.com/tomnomnom/assetfinder@latest

# URL mining
go install github.com/tomnomnom/waybackurls@latest
go install github.com/lc/gau/v2/cmd/gau@latest

# Tech detection
apt install whatweb
go install github.com/projectdiscovery/httpx/cmd/httpx@latest

The script gracefully degrades — if tools aren't installed, it skips those modules or uses built-in fallbacks (e.g., Wayback CDX API).
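That degradation pattern is just a PATH check before each subprocess call. Sketched below with an illustrative helper (not a function from the script itself):

```python
import shutil
import subprocess
from typing import Callable, Optional

def run_if_available(tool: str, args: list,
                     fallback: Optional[Callable[[], str]] = None) -> Optional[str]:
    """Run an external tool only if it is on PATH; otherwise use the fallback."""
    if shutil.which(tool) is None:
        # Tool missing: degrade to the built-in fallback (e.g. a direct API call).
        return fallback() if fallback else None
    result = subprocess.run([tool, *args], capture_output=True, text=True, timeout=60)
    return result.stdout if result.returncode == 0 else None
```

This keeps every module optional: a missing binary costs you coverage, never a crash.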

Shodan Setup

# Option A: Environment variable
export SHODAN_API_KEY="your-key-here"

# Option B: Shodan CLI init
pip install shodan
shodan init your-key-here
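The script checks these locations in the same order. The lookup logic can be sketched as follows (parameters are injected here for testability; the real script reads os.environ and ~/.shodan/api_key directly):

```python
from typing import Optional

def resolve_shodan_key(env: dict, key_file_text: Optional[str]) -> Optional[str]:
    """Prefer the SHODAN_API_KEY environment variable, then the CLI's key file."""
    key = env.get("SHODAN_API_KEY")
    if key:
        return key
    if key_file_text:
        return key_file_text.strip()  # 'shodan init' writes the key with a trailing newline
    return None
```

If neither source yields a key, the Shodan module falls back to the `shodan` CLI when available, and otherwise skips itself with a warning.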

Full Script

#!/usr/bin/env python3
"""
Web & URL Passive Fingerprinting Tool
======================================
Stealthy enumeration from zero-touch passive to near-passive.
Executes Tier 0 (no packets to target) and Tier 1 (looks like browsing).

Usage:
    python3 web_fingerprint.py target.com
    python3 web_fingerprint.py target.com --tier 0          # Passive only
    python3 web_fingerprint.py target.com --tier 1          # Include near-passive
    python3 web_fingerprint.py target.com --all             # All modules
    python3 web_fingerprint.py target.com -o report.json    # JSON output
    python3 web_fingerprint.py target.com --resolve-ip      # Also fingerprint resolved IP via Shodan

Requirements:
    pip install requests mmh3 python-whois dnspython shodan

Optional (for enhanced results):
    - Shodan API key in SHODAN_API_KEY env var or ~/.shodan/api_key
    - subfinder, amass, waybackurls, gau, httpx, whatweb in PATH

Author: Stay Low / Red Team OPSEC Toolkit
"""

import argparse
import hashlib
import json
import os
import re
import shutil
import socket
import ssl
import subprocess
import sys
import textwrap
import time
import urllib.parse
import urllib.request
from collections import OrderedDict
from datetime import datetime, timezone
from typing import Any, Optional


# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------

BANNER = r"""
 __        __   _       _____ _                                  _       _
 \ \      / /__| |__   |  ___(_)_ __   __ _  ___ _ __ _ __  _ __(_)_ __ | |_
  \ \ /\ / / _ \ '_ \  | |_  | | '_ \ / _` |/ _ \ '__| '_ \| '__| | '_ \| __|
   \ V  V /  __/ |_) | |  _| | | | | | (_| |  __/ |  | |_) | |  | | | | | |_
    \_/\_/ \___|_.__/  |_|   |_|_| |_|\__, |\___|_|  | .__/|_|  |_|_| |_|\__|
                                       |___/          |_|
  Passive-First Web Fingerprinting — Stay Low
"""

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)

COOKIE_TECH_MAP = {
    "PHPSESSID": "PHP",
    "JSESSIONID": "Java (Tomcat/JBoss/Spring)",
    "ASP.NET_SessionId": "ASP.NET",
    "CFID": "ColdFusion",
    "CFTOKEN": "ColdFusion",
    "connect.sid": "Node.js (Express)",
    "laravel_session": "Laravel (PHP)",
    "_rails_session": "Ruby on Rails",
    "_session_id": "Ruby on Rails",
    "csrftoken": "Django (Python)",
    "sessionid": "Django (Python)",
    "ci_session": "CodeIgniter (PHP)",
    "wp-settings": "WordPress",
    "AWSALB": "AWS ALB",
    "AWSALBCORS": "AWS ALB (CORS)",
    "__cfduid": "Cloudflare",
    "cf_clearance": "Cloudflare",
    "ROUTEID": "HAProxy",
    "BIGipServer": "F5 BIG-IP",
    "SERVERID": "HAProxy",
    "incap_ses": "Imperva Incapsula",
    "visid_incap": "Imperva Incapsula",
}

HEADER_FINGERPRINTS = {
    "server": {
        "apache": "Apache",
        "nginx": "Nginx",
        "microsoft-iis": "IIS (Windows)",
        "cloudflare": "Cloudflare",
        "litespeed": "LiteSpeed",
        "openresty": "OpenResty (Nginx)",
        "gunicorn": "Gunicorn (Python)",
        "uvicorn": "Uvicorn (Python ASGI)",
        "cowboy": "Cowboy (Erlang/Elixir)",
        "caddy": "Caddy",
        "envoy": "Envoy Proxy",
        "kestrel": "Kestrel (.NET)",
    },
    "x-powered-by": {
        "php": "PHP",
        "asp.net": "ASP.NET",
        "express": "Express (Node.js)",
        "next.js": "Next.js",
        "nuxt": "Nuxt.js (Vue)",
        "servlet": "Java Servlet",
        "jsp": "Java JSP",
        "perl": "Perl",
        "plack": "Plack (Perl)",
        "ruby": "Ruby",
    },
}

DNS_FINGERPRINTS = {
    "mx": {
        "aspmx.l.google.com": "Google Workspace",
        "mail.protection.outlook.com": "Microsoft 365",
        "pphosted.com": "Proofpoint",
        "mimecast": "Mimecast",
        "barracuda": "Barracuda",
        "messagelabs": "Symantec Email Security",
    },
    "cname": {
        "cloudfront.net": "AWS CloudFront",
        "azurewebsites.net": "Azure Web Apps",
        "herokuapp.com": "Heroku",
        "netlify.app": "Netlify",
        "vercel-dns.com": "Vercel",
        "pages.dev": "Cloudflare Pages",
        "firebaseapp.com": "Firebase",
        "s3.amazonaws.com": "AWS S3",
        "storage.googleapis.com": "GCP Cloud Storage",
        "fastly.net": "Fastly CDN",
        "akamaiedge.net": "Akamai CDN",
        "edgekey.net": "Akamai CDN",
        "cdn.shopify.com": "Shopify",
        "squarespace.com": "Squarespace",
        "ghost.io": "Ghost CMS",
        "wpengine.com": "WP Engine",
        "pantheonsite.io": "Pantheon",
    },
    "ns": {
        "awsdns": "AWS Route 53",
        "cloudflare.com": "Cloudflare DNS",
        "azure-dns": "Azure DNS",
        "google.com": "Google Cloud DNS",
        "digitalocean.com": "DigitalOcean DNS",
        "domaincontrol.com": "GoDaddy DNS",
    },
}


# ---------------------------------------------------------------------------
# Utility helpers
# ---------------------------------------------------------------------------

class Colors:
    HEADER = "\033[95m"
    BLUE = "\033[94m"
    CYAN = "\033[96m"
    GREEN = "\033[92m"
    YELLOW = "\033[93m"
    RED = "\033[91m"
    BOLD = "\033[1m"
    DIM = "\033[2m"
    END = "\033[0m"

    @classmethod
    def disable(cls):
        for attr in dir(cls):
            if attr.isupper() and not attr.startswith("_"):
                setattr(cls, attr, "")


def _log(level: str, msg: str):
    ts = datetime.now().strftime("%H:%M:%S")
    icons = {
        "info": f"{Colors.CYAN}[*]{Colors.END}",
        "ok": f"{Colors.GREEN}[+]{Colors.END}",
        "warn": f"{Colors.YELLOW}[!]{Colors.END}",
        "err": f"{Colors.RED}[-]{Colors.END}",
        "tier": f"{Colors.BOLD}{Colors.HEADER}[▶]{Colors.END}",
    }
    icon = icons.get(level, "[?]")
    print(f"  {Colors.DIM}{ts}{Colors.END} {icon} {msg}")


def _header(text: str):
    width = 60
    print()
    print(f"  {Colors.BOLD}{'═' * width}{Colors.END}")
    print(f"  {Colors.BOLD}{text.center(width)}{Colors.END}")
    print(f"  {Colors.BOLD}{'═' * width}{Colors.END}")


def _subheader(text: str):
    print(f"\n  {Colors.BOLD}{Colors.BLUE}── {text} ──{Colors.END}")


def _tool_available(name: str) -> bool:
    return shutil.which(name) is not None


def _run_cmd(cmd: list[str], timeout: int = 60) -> Optional[str]:
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return result.stdout.strip() if result.returncode == 0 else None
    except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
        return None


def _http_get(url: str, timeout: int = 10, head_only: bool = False) -> Optional[dict]:
    """Perform a single HTTP(S) request with browser-like User-Agent."""
    try:
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        if head_only:
            req.method = "HEAD"
        ctx = ssl.create_default_context()
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
        with urllib.request.urlopen(req, timeout=timeout, context=ctx) as resp:
            headers = dict(resp.headers)
            body = "" if head_only else resp.read(500_000).decode("utf-8", errors="replace")
            return {
                "status": resp.status,
                "headers": headers,
                "body": body,
                "url": resp.url,
            }
    except Exception:
        return None


# ---------------------------------------------------------------------------
# Tier 0 modules — Zero-touch passive
# ---------------------------------------------------------------------------

def module_whois(domain: str) -> dict:
    """Whois lookup via system command."""
    _subheader("Whois Lookup")
    result: dict[str, Any] = {}

    raw = _run_cmd(["whois", domain], timeout=15)
    if not raw:
        _log("err", "whois command failed or not available")
        return result

    for line in raw.splitlines():
        line = line.strip()
        lower = line.lower()
        if "registrar:" in lower:
            result["registrar"] = line.split(":", 1)[1].strip()
        elif "creation date:" in lower or "created:" in lower:
            result["creation_date"] = line.split(":", 1)[1].strip()
        elif "expir" in lower and "date" in lower:
            result["expiry_date"] = line.split(":", 1)[1].strip()
        elif "name server:" in lower:
            ns = line.split(":", 1)[1].strip().lower()
            result.setdefault("name_servers", []).append(ns)
        elif "registrant org" in lower:
            result["registrant_org"] = line.split(":", 1)[1].strip()

    for k, v in result.items():
        if k == "name_servers":
            _log("ok", f"Name Servers: {', '.join(v)}")
        else:
            _log("ok", f"{k}: {v}")

    return result


def module_dns(domain: str) -> dict:
    """DNS record lookup via dig against public resolvers."""
    _subheader("DNS Records (Public Resolver)")
    result: dict[str, Any] = {}

    record_types = ["A", "AAAA", "MX", "TXT", "NS", "CNAME", "SOA", "CAA"]
    for rtype in record_types:
        raw = _run_cmd(["dig", "+short", domain, rtype, "@8.8.8.8"], timeout=10)
        if raw:
            records = [line.strip() for line in raw.splitlines() if line.strip()]
            if records:
                result[rtype] = records
                _log("ok", f"{rtype}: {', '.join(records[:5])}")

    # Fingerprint from DNS
    fingerprints = []
    for mx in result.get("MX", []):
        mx_lower = mx.lower()
        for sig, tech in DNS_FINGERPRINTS["mx"].items():
            if sig in mx_lower:
                fingerprints.append(f"Email: {tech}")
                break

    for ns in result.get("NS", []):
        ns_lower = ns.lower()
        for sig, tech in DNS_FINGERPRINTS["ns"].items():
            if sig in ns_lower:
                fingerprints.append(f"DNS: {tech}")
                break

    for cname in result.get("CNAME", []):
        cname_lower = cname.lower()
        for sig, tech in DNS_FINGERPRINTS["cname"].items():
            if sig in cname_lower:
                fingerprints.append(f"Hosting: {tech}")
                break

    # SPF / DMARC from TXT records
    for txt in result.get("TXT", []):
        txt_lower = txt.lower()
        if "v=spf1" in txt_lower:
            fingerprints.append("SPF record found")
            if "google.com" in txt_lower:
                fingerprints.append("Email infra: Google Workspace (SPF)")
            elif "protection.outlook.com" in txt_lower:
                fingerprints.append("Email infra: Microsoft 365 (SPF)")
        if "v=dmarc1" in txt_lower:
            fingerprints.append("DMARC record found")
        if "ms=" in txt_lower:
            fingerprints.append("Microsoft domain verification found")
        if "google-site-verification" in txt_lower:
            fingerprints.append("Google site verification found")
        if "docusign" in txt_lower:
            fingerprints.append("DocuSign integration found")
        if "facebook-domain-verification" in txt_lower:
            fingerprints.append("Facebook domain verification found")
        if "atlassian-domain-verification" in txt_lower:
            fingerprints.append("Atlassian domain verification found")

    if fingerprints:
        result["dns_fingerprints"] = sorted(set(fingerprints))
        for fp in result["dns_fingerprints"]:
            _log("ok", f"  → {fp}")

    # Reverse lookup
    try:
        ip = socket.gethostbyname(domain)
        result["resolved_ip"] = ip
        _log("ok", f"Resolved IP: {ip}")
        try:
            reverse = socket.gethostbyaddr(ip)
            result["reverse_dns"] = reverse[0]
            _log("ok", f"Reverse DNS: {reverse[0]}")
        except socket.herror:
            pass
    except socket.gaierror:
        _log("warn", "Could not resolve domain to IP")

    return result


def module_crtsh(domain: str) -> dict:
    """Certificate Transparency log search via crt.sh."""
    _subheader("Certificate Transparency (crt.sh)")
    result: dict[str, Any] = {}

    url = f"https://crt.sh/?q=%25.{domain}&output=json"
    try:
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        ctx = ssl.create_default_context()
        with urllib.request.urlopen(req, timeout=20, context=ctx) as resp:
            data = json.loads(resp.read().decode())
    except Exception as e:
        _log("err", f"crt.sh query failed: {e}")
        return result

    subdomains = set()
    issuers = set()
    for entry in data:
        name = entry.get("name_value", "")
        for line in name.splitlines():
            cleaned = line.strip().removeprefix("*.")
            # Keep only the domain itself and true subdomains, not lookalikes
            # such as "notexample.com".
            if cleaned == domain or cleaned.endswith("." + domain):
                subdomains.add(cleaned)
        issuer = entry.get("issuer_name", "")
        if issuer:
            issuers.add(issuer)

    result["subdomains"] = sorted(subdomains)
    result["subdomain_count"] = len(subdomains)
    result["issuers"] = sorted(issuers)[:10]

    _log("ok", f"Found {len(subdomains)} unique subdomains from CT logs")
    if len(subdomains) <= 30:
        for sub in sorted(subdomains):
            _log("info", f"  {sub}")
    else:
        for sub in sorted(subdomains)[:20]:
            _log("info", f"  {sub}")
        _log("info", f"  ... and {len(subdomains) - 20} more")

    return result


def module_shodan(domain: str, ip: Optional[str] = None) -> dict:
    """Shodan cached host lookup (0 credits)."""
    _subheader("Shodan Cached Data")
    result: dict[str, Any] = {}

    api_key = os.environ.get("SHODAN_API_KEY")
    if not api_key:
        key_file = os.path.expanduser("~/.shodan/api_key")
        if os.path.exists(key_file):
            with open(key_file) as f:
                api_key = f.read().strip()

    if not api_key:
        if _tool_available("shodan"):
            if ip:
                raw = _run_cmd(["shodan", "host", ip], timeout=15)
            else:
                try:
                    resolved_ip = socket.gethostbyname(domain)
                    raw = _run_cmd(["shodan", "host", resolved_ip], timeout=15)
                except socket.gaierror:
                    raw = None

            if raw:
                result["raw"] = raw
                _log("ok", "Shodan data (via CLI):")
                for line in raw.splitlines()[:25]:
                    _log("info", f"  {line}")
            else:
                _log("warn", "Shodan CLI returned no data")
        else:
            _log("warn", "No Shodan API key found. Set SHODAN_API_KEY or run 'shodan init <key>'")
        return result

    target_ip = ip
    if not target_ip:
        try:
            target_ip = socket.gethostbyname(domain)
        except socket.gaierror:
            _log("err", f"Cannot resolve {domain}")
            return result

    url = f"https://api.shodan.io/shodan/host/{target_ip}?key={api_key}"
    try:
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        ctx = ssl.create_default_context()
        with urllib.request.urlopen(req, timeout=15, context=ctx) as resp:
            data = json.loads(resp.read().decode())
    except Exception as e:
        _log("warn", f"Shodan API query failed: {e}")
        return result

    result["ip"] = data.get("ip_str")
    result["org"] = data.get("org")
    result["os"] = data.get("os")
    result["isp"] = data.get("isp")
    result["country"] = data.get("country_name")
    result["city"] = data.get("city")
    result["hostnames"] = data.get("hostnames", [])
    result["ports"] = data.get("ports", [])
    result["vulns"] = data.get("vulns", [])

    services = []
    for item in data.get("data", []):
        svc = {
            "port": item.get("port"),
            "transport": item.get("transport"),
            "product": item.get("product"),
            "version": item.get("version"),
            "module": item.get("_shodan", {}).get("module"),
        }
        services.append(svc)
    result["services"] = services

    # Use "or" fallbacks: .get() defaults don't apply when a key exists with value None.
    _log("ok", f"IP: {result['ip']}  Org: {result.get('org') or 'N/A'}  Country: {result.get('country') or 'N/A'}")
    _log("ok", f"OS: {result.get('os') or 'N/A'}  ISP: {result.get('isp') or 'N/A'}")
    if result["ports"]:
        _log("ok", f"Open Ports: {', '.join(map(str, result['ports']))}")
    for svc in services:
        product = svc.get("product") or svc.get("module") or "unknown"
        ver = svc.get("version") or ""
        # rstrip() drops the trailing space when no version is known, while
        # preserving the leading indentation (.strip() would eat it).
        _log("info", f"  {svc['port']}/{svc.get('transport', 'tcp')} — {product} {ver}".rstrip())
    if result["vulns"]:
        _log("warn", f"Known Vulnerabilities: {', '.join(result['vulns'][:10])}")

    return result


def module_passive_subdomains(domain: str) -> dict:
    """Passive subdomain enumeration via available CLI tools."""
    _subheader("Passive Subdomain Enumeration")
    result: dict[str, Any] = {"subdomains": set()}
    tools_tried = []

    tool_cmds = [
        ("subfinder", ["subfinder", "-d", domain, "-silent"], 120),
        ("amass", ["amass", "enum", "-passive", "-d", domain], 180),
        ("assetfinder", ["assetfinder", "--subs-only", domain], 60),
    ]
    for tool, cmd, timeout in tool_cmds:
        if not _tool_available(tool):
            continue
        tools_tried.append(tool)
        raw = _run_cmd(cmd, timeout=timeout)
        if not raw:
            continue
        count = 0
        for line in raw.splitlines():
            sub = line.strip().lower()
            # Keep only the target and its true subdomains, not lookalikes.
            if sub == domain or sub.endswith("." + domain):
                result["subdomains"].add(sub)
                count += 1
        _log("ok", f"{tool}: {count} subdomains")

    if not tools_tried:
        _log("warn", "No subdomain tools found (subfinder, amass, assetfinder). Skipping.")

    result["subdomains"] = sorted(result["subdomains"])
    result["tools_used"] = tools_tried
    result["total_unique"] = len(result["subdomains"])
    _log("ok", f"Total unique subdomains (all sources): {result['total_unique']}")
    return result


def module_wayback(domain: str) -> dict:
    """Historical URL mining from Wayback Machine and GAU."""
    _subheader("Historical URLs (Wayback / GAU)")
    result: dict[str, Any] = {"urls": [], "params": [], "interesting_files": [], "api_endpoints": []}
    all_urls: set[str] = set()

    if _tool_available("waybackurls"):
        raw = _run_cmd(["waybackurls", domain], timeout=120)
        if raw:
            for line in raw.splitlines():
                all_urls.add(line.strip())
            _log("ok", f"waybackurls: {len(raw.splitlines())} URLs")

    if _tool_available("gau"):
        raw = _run_cmd(["gau", domain], timeout=120)
        if raw:
            for line in raw.splitlines():
                all_urls.add(line.strip())
            _log("ok", f"gau: {len(raw.splitlines())} URLs")

    if not all_urls:
        if not _tool_available("waybackurls") and not _tool_available("gau"):
            _log("warn", "No URL mining tools found (waybackurls, gau); falling back to Wayback CDX API...")
            cdx_url = f"https://web.archive.org/cdx/search/cdx?url=*.{domain}/*&output=text&fl=original&collapse=urlkey&limit=500"
            resp = _http_get(cdx_url, timeout=30)
            if resp and resp["body"]:
                for line in resp["body"].splitlines():
                    all_urls.add(line.strip())
                _log("ok", f"Wayback CDX API: {len(all_urls)} URLs")
        else:
            _log("warn", "Tools available but returned no results")

    params = set()
    interesting = set()
    api_eps = set()
    interesting_exts = re.compile(
        r"\.(sql|bak|conf|env|log|xml|json|yml|yaml|ini|cfg|zip|tar|gz|rar|"
        r"old|orig|backup|swp|db|sqlite|dump|csv|xls|xlsx|doc|docx|pdf|key|pem)$",
        re.IGNORECASE,
    )
    api_pattern = re.compile(r"/(api|v[0-9]+|graphql|rest|ws)/", re.IGNORECASE)

    for url in all_urls:
        if "=" in url:
            params.add(url)
        if interesting_exts.search(url):
            interesting.add(url)
        if api_pattern.search(url):
            api_eps.add(url)

    result["total_urls"] = len(all_urls)
    result["params"] = sorted(params)[:100]
    result["interesting_files"] = sorted(interesting)[:50]
    result["api_endpoints"] = sorted(api_eps)[:50]

    _log("ok", f"Total unique URLs: {len(all_urls)}")
    _log("ok", f"URLs with parameters: {len(params)}")
    _log("ok", f"Interesting file URLs: {len(interesting)}")
    _log("ok", f"API endpoints: {len(api_eps)}")

    if interesting:
        _log("warn", "Interesting files found in history:")
        for f in sorted(interesting)[:15]:
            _log("info", f"  {f}")
    if api_eps:
        _log("info", "API endpoints found in history:")
        for ep in sorted(api_eps)[:15]:
            _log("info", f"  {ep}")

    return result


# ---------------------------------------------------------------------------
# Tier 1 modules — Near-passive (looks like browsing)
# ---------------------------------------------------------------------------

def module_http_headers(domain: str) -> dict:
    """HTTP response header analysis (single HEAD request)."""
    _subheader("HTTP Response Headers")
    result: dict[str, Any] = {"technologies": [], "security_headers": {}}

    for scheme in ["https", "http"]:
        url = f"{scheme}://{domain}"
        resp = _http_get(url, head_only=True)
        if resp:
            result["scheme"] = scheme
            result["status_code"] = resp["status"]
            result["final_url"] = resp.get("url", url)
            result["headers"] = resp["headers"]

            server = resp["headers"].get("Server", resp["headers"].get("server", ""))
            if server:
                result["server"] = server
                _log("ok", f"Server: {server}")
                for sig, tech in HEADER_FINGERPRINTS["server"].items():
                    if sig in server.lower():
                        result["technologies"].append(tech)

            xpb = resp["headers"].get("X-Powered-By", resp["headers"].get("x-powered-by", ""))
            if xpb:
                result["x_powered_by"] = xpb
                _log("ok", f"X-Powered-By: {xpb}")
                for sig, tech in HEADER_FINGERPRINTS["x-powered-by"].items():
                    if sig in xpb.lower():
                        result["technologies"].append(tech)

            sec_headers = {
                "Strict-Transport-Security": "HSTS",
                "Content-Security-Policy": "CSP",
                "X-Frame-Options": "X-Frame-Options",
                "X-Content-Type-Options": "X-Content-Type-Options",
                "X-XSS-Protection": "X-XSS-Protection",
                "Referrer-Policy": "Referrer-Policy",
                "Permissions-Policy": "Permissions-Policy",
            }
            for hdr, label in sec_headers.items():
                val = resp["headers"].get(hdr, resp["headers"].get(hdr.lower(), ""))
                if val:
                    result["security_headers"][label] = val
                    _log("ok", f"  {label}: {val[:80]}")
                else:
                    result["security_headers"][label] = "MISSING"

            for hdr_key, hdr_val in resp["headers"].items():
                hk = hdr_key.lower()
                if "cf-ray" in hk:
                    result["technologies"].append("Cloudflare CDN")
                    _log("ok", "  CDN: Cloudflare detected (CF-Ray)")
                elif "x-amz-cf" in hk:
                    result["technologies"].append("AWS CloudFront")
                    _log("ok", "  CDN: AWS CloudFront detected")
                elif "x-cache" in hk and "varnish" in hdr_val.lower():
                    result["technologies"].append("Varnish Cache")
                elif "x-drupal" in hk:
                    result["technologies"].append("Drupal CMS")
                elif "x-generator" in hk:
                    result["technologies"].append(f"Generator: {hdr_val}")
                    _log("ok", f"  X-Generator: {hdr_val}")

            result["technologies"] = sorted(set(result["technologies"]))
            break

    if not result.get("headers"):
        _log("err", "Could not connect to target via HTTP or HTTPS")
    return result


def module_ssl_cert(domain: str) -> dict:
    """SSL/TLS certificate inspection."""
    _subheader("SSL/TLS Certificate")
    result: dict[str, Any] = {}

    try:
        # NOTE: getpeercert() returns an empty dict when verification is
        # disabled, so keep the default verifying context here. Hosts whose
        # certificates fail validation raise ssl.SSLError (caught below) and
        # are covered by the openssl fallback.
        ctx = ssl.create_default_context()
        with socket.create_connection((domain, 443), timeout=10) as sock:
            with ctx.wrap_socket(sock, server_hostname=domain) as ssock:
                cert = ssock.getpeercert(binary_form=False)
                if cert:
                    subject_dict: dict[str, str] = {}
                    for item in cert.get("subject", ()):
                        if item and item[0]:
                            k, v = item[0]
                            subject_dict[k] = v
                    result["subject"] = subject_dict

                    issuer_dict: dict[str, str] = {}
                    for item in cert.get("issuer", ()):
                        if item and item[0]:
                            k, v = item[0]
                            issuer_dict[k] = v
                    result["issuer"] = issuer_dict

                    result["not_before"] = cert.get("notBefore")
                    result["not_after"] = cert.get("notAfter")
                    result["serial"] = cert.get("serialNumber")

                    sans: list[str] = []
                    for san_entry in cert.get("subjectAltName", []):
                        if isinstance(san_entry, tuple) and len(san_entry) >= 2:
                            sans.append(str(san_entry[1]))
                    result["sans"] = sans

                    _log("ok", f"Subject: {result['subject']}")
                    _log("ok", f"Issuer: {issuer_dict.get('organizationName', 'N/A')}")
                    _log("ok", f"Valid: {result['not_before']} → {result['not_after']}")
                    if sans:
                        _log("ok", f"SANs ({len(sans)}):")
                        for san in sans[:20]:
                            _log("info", f"  {san}")
                        if len(sans) > 20:
                            _log("info", f"  ... and {len(sans) - 20} more")

                result["tls_version"] = ssock.version()
                _log("ok", f"TLS Version: {result['tls_version']}")
                result["cipher"] = ssock.cipher()
                if result["cipher"]:
                    _log("ok", f"Cipher: {result['cipher'][0]}")

    except ssl.SSLError as e:
        _log("warn", f"SSL error: {e}")
    except (socket.timeout, ConnectionRefusedError, OSError) as e:
        _log("warn", f"Cannot connect to port 443: {e}")

    if not result.get("sans") and _tool_available("openssl"):
        # domain comes from argv; single-quote it to keep the shell pipeline safe.
        q = "'" + domain.replace("'", "'\\''") + "'"
        cert_text = _run_cmd(
            ["bash", "-c", f"echo | openssl s_client -connect {q}:443 -servername {q} 2>/dev/null | openssl x509 -noout -ext subjectAltName"],
            timeout=10,
        )
        if cert_text:
            sans = re.findall(r"DNS:([^\s,]+)", cert_text)
            result["sans"] = sans
            _log("ok", f"SANs from openssl ({len(sans)}):")
            for san in sans[:20]:
                _log("info", f"  {san}")
    return result


def module_cookies(domain: str) -> dict:
    """Cookie-based technology fingerprinting."""
    _subheader("Cookie Analysis")
    result: dict[str, Any] = {"cookies": [], "technologies": []}

    for scheme in ["https", "http"]:
        url = f"{scheme}://{domain}"
        resp = _http_get(url, head_only=True)
        if not resp:
            continue
        for hdr_key, hdr_val in resp["headers"].items():
            if hdr_key.lower() == "set-cookie":
                # Naive comma-splitting breaks on Expires dates ("Wed, 21 Oct ...");
                # split only on commas that begin a new name=value pair.
                cookies_raw = re.split(r",\s*(?=[^;,\s=]+=)", hdr_val)
                for cookie in cookies_raw:
                    cookie_name = cookie.split("=")[0].strip()
                    result["cookies"].append(cookie.strip()[:100])
                    for sig, tech in COOKIE_TECH_MAP.items():
                        if sig.lower() in cookie_name.lower():
                            result["technologies"].append(tech)
                            _log("ok", f"Cookie '{cookie_name}' → {tech}")
                    cookie_lower = cookie.lower()
                    if "httponly" not in cookie_lower:
                        _log("warn", f"Cookie '{cookie_name}' missing HttpOnly flag")
                    if "secure" not in cookie_lower and scheme == "https":
                        _log("warn", f"Cookie '{cookie_name}' missing Secure flag on HTTPS")
        break

    result["technologies"] = sorted(set(result["technologies"]))
    if not result["cookies"]:
        _log("info", "No cookies set on initial response")
    return result
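

# Caveat worth remembering for the parsing above: requests folds repeated
# Set-Cookie headers into one comma-joined string, and Expires attributes
# themselves contain commas, so a naive split(",") manufactures phantom
# cookies. A lookahead split that only breaks before "name=" tokens avoids
# this. Heuristic sketch, not a full RFC 6265 parser.
def _split_set_cookie(header_value: str) -> list[str]:
    """Split a folded Set-Cookie header on cookie boundaries only."""
    import re  # redundant with the module import, kept so the sketch is self-contained
    return [c.strip() for c in re.split(r",\s*(?=[^;,\s=]+=)", header_value)]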


def module_robots_sitemap(domain: str) -> dict:
    """Fetch robots.txt and sitemap.xml."""
    _subheader("robots.txt & sitemap.xml")
    result: dict[str, Any] = {"disallowed": [], "sitemaps": [], "interesting_paths": []}

    resp = _http_get(f"https://{domain}/robots.txt")
    if not resp or resp["status"] != 200:
        resp = _http_get(f"http://{domain}/robots.txt")

    if resp and resp["status"] == 200 and any(kw in resp["body"].lower() for kw in ("disallow", "sitemap:", "user-agent")):
        _log("ok", "robots.txt found:")
        for line in resp["body"].splitlines():
            line = line.strip()
            if line.lower().startswith("disallow:"):
                path = line.split(":", 1)[1].strip()
                if path and path != "/":
                    result["disallowed"].append(path)
                    _log("info", f"  Disallow: {path}")
            elif line.lower().startswith("sitemap:"):
                # split on the first ":" only so the URL's own "://" stays intact
                sm = line.split(":", 1)[1].strip()
                result["sitemaps"].append(sm)
                _log("info", f"  Sitemap: {sm}")

        interesting = [p for p in result["disallowed"] if any(
            kw in p.lower() for kw in ["admin", "api", "config", "backup", "debug", "test", "staging", "dev", "internal", "private", "secret", "upload"]
        )]
        if interesting:
            result["interesting_paths"] = interesting
            _log("warn", "Interesting disallowed paths:")
            for p in interesting:
                _log("warn", f"  → {p}")
    else:
        _log("info", "No robots.txt found or empty")

    if not result["sitemaps"]:
        for path in ["/sitemap.xml", "/sitemap_index.xml"]:
            resp = _http_get(f"https://{domain}{path}")
            if resp and resp["status"] == 200 and "<?xml" in resp["body"][:100]:
                urls_found = re.findall(r"<loc>(.*?)</loc>", resp["body"])
                result["sitemap_urls"] = len(urls_found)
                _log("ok", f"sitemap.xml: {len(urls_found)} URLs found")
                break

    resp = _http_get(f"https://{domain}/.well-known/security.txt")
    if resp and resp["status"] == 200 and ("contact:" in resp["body"].lower() or "policy:" in resp["body"].lower()):
        result["security_txt"] = True
        _log("ok", "security.txt found (security-conscious org)")
    else:
        result["security_txt"] = False
    return result


def module_error_page(domain: str) -> dict:
    """Error page fingerprinting via 404 response."""
    _subheader("Error Page Fingerprinting")
    result: dict[str, Any] = {}

    random_path = f"/nonexistent-page-{int(time.time())}"
    resp = _http_get(f"https://{domain}{random_path}")
    if not resp:
        resp = _http_get(f"http://{domain}{random_path}")
    if not resp:
        _log("err", "Could not fetch error page")
        return result

    body = resp["body"]
    result["status_code"] = resp["status"]

    signatures = [
        (r"Apache/[\d.]+ \([\w]+\) Server at", "Apache (default error page)"),
        (r"<title>404 Not Found</title>.*nginx", "Nginx (default error page)"),
        (r"Microsoft-IIS/[\d.]+", "IIS (default error page)"),
        (r"Whitelabel Error Page", "Spring Boot (Java)"),
        (r"Traceback \(most recent call last\)", "Python (debug mode ON — sensitive!)"),
        (r"Cannot GET /", "Express.js (Node)"),
        (r"Not Found.*The requested URL was not found", "Flask (Python)"),
        (r"<title>Error</title>.*Django", "Django"),
        (r"wp-content|wordpress", "WordPress"),
        (r"drupal|Drupal", "Drupal"),
        (r"joomla|Joomla", "Joomla"),
        (r"<title>404.*Shopify", "Shopify"),
        (r"<title>Page not found.*GitHub Pages", "GitHub Pages"),
        (r"The page you are looking for doesn.{0,8}t exist", "Generic CMS 404"),
        (r"laravel|Laravel", "Laravel (PHP)"),
        (r"<title>404 - File or directory not found", "IIS detailed error"),
        (r"Powered by CakePHP", "CakePHP"),
        (r"Ruby on Rails", "Ruby on Rails"),
    ]

    for pattern, tech in signatures:
        if re.search(pattern, body, re.IGNORECASE | re.DOTALL):
            result["detected_technology"] = tech
            _log("ok", f"Error page reveals: {tech}")
            if "debug mode" in tech.lower() or "sensitive" in tech.lower():
                _log("warn", "⚠ Debug mode detected — potential information disclosure!")
            break

    if not result.get("detected_technology"):
        _log("info", "Error page did not match known signatures (custom or generic)")

    if any(kw in body.lower() for kw in ["stack trace", "exception", "traceback", "syntax error", "fatal error"]):
        result["info_disclosure"] = True
        _log("warn", "Error page may contain stack traces or debug information!")
    return result
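

# The signature table above relies on re.DOTALL so ".*" can span newlines in
# real HTML bodies. Minimal self-check against a synthetic nginx-style 404
# page (illustrative markup, not captured from a live server):
def _dotall_demo() -> bool:
    import re
    body = "<html>\n<head><title>404 Not Found</title></head>\n<body>\n<center>nginx</center>\n</body>\n</html>"
    pattern = r"<title>404 Not Found</title>.*nginx"
    with_dotall = re.search(pattern, body, re.IGNORECASE | re.DOTALL)
    without_dotall = re.search(pattern, body, re.IGNORECASE)
    return with_dotall is not None and without_dotall is None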


def module_tech_detect(domain: str) -> dict:
    """Technology detection via whatweb or httpx CLI."""
    _subheader("Technology Detection (CLI Tools)")
    result: dict[str, Any] = {}

    if _tool_available("whatweb"):
        raw = _run_cmd(["whatweb", "-a", "1", f"https://{domain}"], timeout=30)
        if raw:
            result["whatweb"] = raw
            _log("ok", "whatweb (aggression 1):")
            for line in raw.splitlines():
                _log("info", f"  {line[:120]}")

    if _tool_available("httpx"):
        # invoke httpx directly (no shell) to avoid quoting issues; note the
        # binary on PATH must be ProjectDiscovery's httpx, not the Python client
        raw = _run_cmd(
            ["httpx", "-u", domain, "-silent", "-title", "-tech-detect", "-status-code", "-server", "-content-length"],
            timeout=30,
        )
        if raw:
            result["httpx"] = raw
            _log("ok", "httpx:")
            for line in raw.splitlines():
                _log("info", f"  {line[:120]}")

    if not result:
        _log("info", "No CLI detection tools available (whatweb, httpx)")
    return result


def module_favicon(domain: str) -> dict:
    """Favicon hash fingerprinting."""
    _subheader("Favicon Hash")
    result: dict[str, Any] = {}

    resp = _http_get(f"https://{domain}/favicon.ico")
    if not resp:
        resp = _http_get(f"http://{domain}/favicon.ico")
    if not resp or resp["status"] != 200:
        _log("info", "No favicon.ico found")
        return result

    try:
        import codecs
        import mmh3
        # Shodan hashes the newline-wrapped base64 of the raw favicon bytes;
        # this only matches if the body was decoded byte-preservingly (latin-1).
        favicon_b64 = codecs.encode(resp["body"].encode("latin-1", errors="replace"), "base64")
        fav_hash = mmh3.hash(favicon_b64)
        result["hash"] = fav_hash
        result["shodan_query"] = f"http.favicon.hash:{fav_hash}"
        _log("ok", f"Favicon hash: {fav_hash}")
        _log("ok", f"Shodan query: http.favicon.hash:{fav_hash}")
    except ImportError:
        _log("warn", "mmh3 not installed — pip install mmh3 for favicon hashing")

    md5 = hashlib.md5(resp["body"].encode("latin-1", errors="replace")).hexdigest()
    result["md5"] = md5
    _log("info", f"Favicon MD5: {md5}")
    return result
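

# Why codecs.encode(..., "base64") rather than base64.b64encode: Shodan's
# http.favicon.hash is murmur3 over the newline-wrapped (MIME-style) base64 of
# the raw favicon bytes. codecs inserts a newline every 76 characters plus a
# trailing one; b64encode emits a single line, and hashing that form gives a
# value Shodan will never match. Quick self-contained demonstration:
def _base64_wrapping_demo() -> bool:
    import base64
    import codecs
    data = b"\x00" * 100
    wrapped = codecs.encode(data, "base64")  # newline every 76 chars
    single_line = base64.b64encode(data)     # no newlines at all
    return b"\n" in wrapped and b"\n" not in single_line and wrapped.replace(b"\n", b"") == single_line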


# ---------------------------------------------------------------------------
# Report generation
# ---------------------------------------------------------------------------

def generate_summary(domain: str, results: dict) -> str:
    """Generate text summary of all findings."""
    lines = []
    lines.append(f"\n{'=' * 60}")
    lines.append(f" FINGERPRINT SUMMARY: {domain}")
    lines.append(f"{'=' * 60}\n")

    techs = set()
    for section in results.values():
        if isinstance(section, dict):
            for t in section.get("technologies", []):
                techs.add(t)
    if results.get("http_headers", {}).get("server"):
        techs.add(f"Server: {results['http_headers']['server']}")
    if results.get("http_headers", {}).get("x_powered_by"):
        techs.add(f"Backend: {results['http_headers']['x_powered_by']}")
    if results.get("error_page", {}).get("detected_technology"):
        techs.add(results["error_page"]["detected_technology"])
    for fp in results.get("dns", {}).get("dns_fingerprints", []):
        techs.add(fp)

    lines.append("TECHNOLOGIES DETECTED:")
    for t in sorted(techs):
        lines.append(f"  • {t}")

    sec = results.get("http_headers", {}).get("security_headers", {})
    if sec:
        lines.append("\nSECURITY HEADERS:")
        for hdr, val in sec.items():
            status = "✓" if val != "MISSING" else "✗"
            lines.append(f"  {status} {hdr}: {val[:60]}")

    ct_subs = results.get("crtsh", {}).get("subdomain_count", 0)
    passive_subs = results.get("passive_subdomains", {}).get("total_unique", 0)
    total = max(ct_subs, passive_subs)  # largest single source; union is not computed
    if total:
        lines.append(f"\nSUBDOMAINS: {total} discovered (largest single source)")

    ports = results.get("shodan", {}).get("ports", [])
    if ports:
        lines.append(f"\nOPEN PORTS (Shodan cache): {', '.join(map(str, ports))}")

    vulns = results.get("shodan", {}).get("vulns", [])
    if vulns:
        lines.append(f"\nKNOWN VULNERABILITIES: {', '.join(vulns[:10])}")

    hist = results.get("wayback", {})
    if hist.get("total_urls"):
        lines.append("\nHISTORICAL DATA:")
        lines.append(f"  Total URLs: {hist['total_urls']}")
        lines.append(f"  With parameters: {len(hist.get('params', []))}")
        lines.append(f"  Interesting files: {len(hist.get('interesting_files', []))}")
        lines.append(f"  API endpoints: {len(hist.get('api_endpoints', []))}")

    interesting = results.get("robots_sitemap", {}).get("interesting_paths", [])
    if interesting:
        lines.append("\nINTERESTING DISALLOWED PATHS:")
        for p in interesting:
            lines.append(f"  → {p}")

    lines.append(f"\n{'=' * 60}")
    lines.append(f" Scan completed: {datetime.now(timezone.utc).isoformat()}")
    lines.append(f"{'=' * 60}\n")
    return "\n".join(lines)


# ---------------------------------------------------------------------------
# Main orchestrator
# ---------------------------------------------------------------------------

def run_fingerprint(domain: str, tier: int = 1, resolve_ip: bool = False, output_file: Optional[str] = None):
    """Run fingerprinting modules up to specified tier."""
    print(BANNER)
    _header(f"Target: {domain}")
    _log("info", f"Max tier: {tier}  |  Started: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    print()

    results: dict[str, Any] = {"target": domain, "scan_time": datetime.now(timezone.utc).isoformat(), "tier": tier}

    _header("TIER 0 — Zero-Touch Passive")
    _log("tier", "No packets sent to target. Querying third-party sources only.")

    results["whois"] = module_whois(domain)
    results["dns"] = module_dns(domain)
    results["crtsh"] = module_crtsh(domain)

    ip = results["dns"].get("resolved_ip")
    if resolve_ip:
        results["shodan"] = module_shodan(domain, ip=ip)

    results["passive_subdomains"] = module_passive_subdomains(domain)
    results["wayback"] = module_wayback(domain)

    if tier < 1:
        summary = generate_summary(domain, results)
        print(summary)
        if output_file:
            _save_results(results, summary, output_file)
        return results

    _header("TIER 1 — Near-Passive (Looks Like Browsing)")
    _log("tier", "Sending minimal requests that look like normal web browsing.")

    results["http_headers"] = module_http_headers(domain)
    results["ssl_cert"] = module_ssl_cert(domain)
    results["cookies"] = module_cookies(domain)
    results["robots_sitemap"] = module_robots_sitemap(domain)
    results["error_page"] = module_error_page(domain)
    results["tech_detect"] = module_tech_detect(domain)
    results["favicon"] = module_favicon(domain)

    summary = generate_summary(domain, results)
    print(summary)

    if output_file:
        _save_results(results, summary, output_file)
    return results


def _save_results(results: dict, summary: str, output_file: str):
    """Save results to JSON and text summary."""
    json_file = output_file if output_file.endswith(".json") else output_file + ".json"

    def _serialize(obj):
        if isinstance(obj, set):
            return sorted(obj)
        if isinstance(obj, bytes):
            return obj.decode("utf-8", errors="replace")
        raise TypeError(f"Object of type {type(obj)} is not JSON serializable")

    with open(json_file, "w") as f:
        json.dump(results, f, indent=2, default=_serialize)
    _log("ok", f"JSON results saved to: {json_file}")

    txt_file = json_file.removesuffix(".json") + "_summary.txt"
    with open(txt_file, "w") as f:
        f.write(summary)
    _log("ok", f"Text summary saved to: {txt_file}")
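

# The default= hook in _save_results only fires for objects json cannot encode
# natively. Tiny illustration with a set (which json.dumps otherwise rejects),
# using a stripped-down copy of the serializer:
def _serialize_demo() -> str:
    import json

    def _ser(obj):
        if isinstance(obj, set):
            return sorted(obj)
        raise TypeError(f"Object of type {type(obj)} is not JSON serializable")

    return json.dumps({"ports": {443, 80, 22}}, default=_ser)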


def main():
    parser = argparse.ArgumentParser(
        description="Passive-first web fingerprinting tool",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=textwrap.dedent("""\
            Tiers:
              0  Zero-touch passive (third-party databases only)
              1  Near-passive (looks like normal browsing) [default]

            Examples:
              %(prog)s example.com
              %(prog)s example.com --tier 0
              %(prog)s example.com -o results.json
              %(prog)s example.com --resolve-ip --all
        """),
    )
    parser.add_argument("domain", help="Target domain (e.g., example.com)")
    parser.add_argument("--tier", type=int, default=1, choices=[0, 1], help="Maximum tier to execute (default: 1)")
    parser.add_argument("--all", action="store_true", help="Run all tiers (same as --tier 1)")
    parser.add_argument("-o", "--output", help="Output file path (JSON)")
    parser.add_argument("--resolve-ip", action="store_true", help="Resolve domain IP and query Shodan")
    parser.add_argument("--no-color", action="store_true", help="Disable colored output")

    args = parser.parse_args()
    if args.no_color:
        Colors.disable()

    domain = args.domain.strip().lower()
    domain = re.sub(r"^https?://", "", domain)
    domain = domain.split("/")[0]
    tier = 1 if args.all else args.tier

    try:
        run_fingerprint(domain=domain, tier=tier, resolve_ip=args.resolve_ip, output_file=args.output)
    except KeyboardInterrupt:
        print(f"\n{Colors.YELLOW}[!] Interrupted by user{Colors.END}")
        sys.exit(1)


if __name__ == "__main__":
    main()

Last Updated: 2026-02-11
Companion Guides: Web_URL_Fingerprinting.md (methodology), Stay_Low.md (EDR evasion), Stay_low_Command.md (command OPSEC)