# Web Fingerprint — Passive-First Python Script

## Overview

An automated web/URL fingerprinting tool that prioritizes stealth. It runs Tier 0 modules (zero-touch passive: no packets ever reach the target) and Tier 1 modules (near-passive: traffic indistinguishable from normal browsing).
## Quick Start

```bash
# Install dependencies
pip install requests mmh3 python-whois dnspython shodan

# Basic usage (Tier 0 + Tier 1)
python3 web_fingerprint.py target.com

# Passive only (Tier 0 — no packets to target)
python3 web_fingerprint.py target.com --tier 0

# With Shodan IP lookup + JSON output
python3 web_fingerprint.py target.com --resolve-ip -o results.json
```
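The report written by `-o results.json` can be post-processed with nothing but the standard library. A minimal sketch, assuming the report's top-level keys mirror what `generate_summary` reads (`crtsh`, `shodan`); treat the exact schema as an assumption, and `headline_counts` is an illustrative helper, not part of the script:

```python
import json

def headline_counts(path: str) -> str:
    """Summarize a saved report: CT-log subdomain count and cached open ports."""
    with open(path) as f:
        results = json.load(f)
    subs = results.get("crtsh", {}).get("subdomain_count", 0)
    ports = results.get("shodan", {}).get("ports", [])
    return f"{subs} subdomains, {len(ports)} open ports"
```

Usage: `print(headline_counts("results.json"))` after a scan completes.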
## What It Does
| Tier | Module | Method | What It Finds |
|---|---|---|---|
| 0 | Whois | Public WHOIS | Registrar, dates, org, nameservers |
| 0 | DNS Records | dig @8.8.8.8 | MX, TXT, NS, A, CNAME + fingerprints providers |
| 0 | crt.sh | CT log API | Subdomains from certificate transparency |
| 0 | Shodan | Cached host lookup | Open ports, banners, OS, vulns (0 credits) |
| 0 | Passive Subdomains | subfinder/amass/assetfinder | Subdomain aggregation from APIs |
| 0 | Wayback/GAU | Historical URLs | Old endpoints, params, exposed files, API paths |
| 1 | HTTP Headers | Single HEAD request | Server, X-Powered-By, security headers, CDN |
| 1 | SSL Certificate | TLS handshake | SANs, issuer, org, expiry, TLS version |
| 1 | Cookie Analysis | From response headers | Backend tech from cookie names/flags |
| 1 | robots.txt | Single GET | Disallowed paths, sitemaps, interesting dirs |
| 1 | Error Page | Single 404 request | Framework from default error pages |
| 1 | Tech Detection | whatweb -a1 / httpx | Full technology stack identification |
| 1 | Favicon Hash | Single GET | Shodan-searchable hash for framework ID |
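The Tier 0 DNS module in the table infers providers by substring-matching record values against a signature map. The matching step alone, sketched with signatures abridged from the script's `DNS_FINGERPRINTS` table (`fingerprint_mx` is an illustrative name, not the script's own function):

```python
# Abridged provider signatures: substring -> provider label
MX_SIGNATURES = {
    "aspmx.l.google.com": "Google Workspace",
    "mail.protection.outlook.com": "Microsoft 365",
}

def fingerprint_mx(mx_records: list[str]) -> list[str]:
    """Return provider labels whose signature appears in any MX record."""
    hits = []
    for record in mx_records:
        low = record.lower()
        for sig, provider in MX_SIGNATURES.items():
            if sig in low:
                hits.append(f"Email: {provider}")
                break  # one label per record is enough
    return hits
```

The same pattern repeats for NS and CNAME records, only with different signature tables.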
## Optional CLI Tools

For best results, have these in your PATH:

```bash
# Subdomain enumeration
go install github.com/projectdiscovery/subfinder/v2/cmd/subfinder@latest
go install github.com/owasp-amass/amass/v4/...@master
go install github.com/tomnomnom/assetfinder@latest

# URL mining
go install github.com/tomnomnom/waybackurls@latest
go install github.com/lc/gau/v2/cmd/gau@latest

# Tech detection
apt install whatweb
go install github.com/projectdiscovery/httpx/cmd/httpx@latest
```
The script degrades gracefully: if a tool is not installed, it skips that module or falls back to a built-in alternative (e.g., the Wayback CDX API).
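That degradation pattern is just a PATH check before each external call. A minimal sketch of the idea (the `run_if_available` helper name is illustrative, not from the script, which uses `_tool_available` plus `_run_cmd`):

```python
import shutil
import subprocess
from typing import Callable, Optional

def run_if_available(tool: str, args: list[str],
                     fallback: Optional[Callable[[], Optional[str]]] = None) -> Optional[str]:
    """Run `tool` if it is on PATH; otherwise call the fallback, if any."""
    if shutil.which(tool) is None:
        return fallback() if fallback else None
    proc = subprocess.run([tool, *args], capture_output=True, text=True, timeout=60)
    return proc.stdout.strip() if proc.returncode == 0 else None
```

When `waybackurls` is missing, for example, the fallback slot is where the built-in Wayback CDX query would plug in.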
## Shodan Setup

```bash
# Option A: environment variable
export SHODAN_API_KEY="your-key-here"

# Option B: Shodan CLI init
pip install shodan
shodan init your-key-here
```
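The script checks for the key in that order: the environment variable first, then the CLI's key file at `~/.shodan/api_key`. The lookup, sketched in isolation (the `env` parameter is added here purely for testability and is not in the script):

```python
import os
from typing import Optional

def resolve_shodan_key(env: Optional[dict] = None) -> Optional[str]:
    """Return the Shodan API key: SHODAN_API_KEY, else ~/.shodan/api_key."""
    env = os.environ if env is None else env
    key = env.get("SHODAN_API_KEY", "").strip()
    if key:
        return key
    key_file = os.path.expanduser("~/.shodan/api_key")
    if os.path.exists(key_file):
        with open(key_file) as f:
            return f.read().strip() or None
    return None
```

If both sources are empty, the script falls back to the `shodan` CLI when it is on PATH, and otherwise skips the module.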
## Full Script
#!/usr/bin/env python3
"""
Web & URL Passive Fingerprinting Tool
======================================
Stealthy enumeration from zero-touch passive to near-passive.
Executes Tier 0 (no packets to target) and Tier 1 (looks like browsing).
Usage:
python3 web_fingerprint.py target.com
python3 web_fingerprint.py target.com --tier 0 # Passive only
python3 web_fingerprint.py target.com --tier 1 # Include near-passive
python3 web_fingerprint.py target.com --all # All modules
python3 web_fingerprint.py target.com -o report.json # JSON output
python3 web_fingerprint.py target.com --resolve-ip # Also fingerprint resolved IP via Shodan
Requirements:
pip install requests mmh3 python-whois dnspython shodan
Optional (for enhanced results):
- Shodan API key in SHODAN_API_KEY env var or ~/.shodan/api_key
- subfinder, amass, waybackurls, gau, httpx, whatweb in PATH
Author: Stay Low / Red Team OPSEC Toolkit
"""
import argparse
import hashlib
import json
import os
import re
import shutil
import socket
import ssl
import subprocess
import sys
import textwrap
import time
import urllib.parse
import urllib.request
from collections import OrderedDict
from datetime import datetime, timezone
from typing import Any, Optional
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------
BANNER = r"""
__ __ _ _____ _ _ _
\ \ / /__| |__ | ___(_)_ __ __ _ ___ _ __ _ __ _ __(_)_ __ | |_
\ \ /\ / / _ \ '_ \ | |_ | | '_ \ / _` |/ _ \ '__| '_ \| '__| | '_ \| __|
\ V V / __/ |_) | | _| | | | | | (_| | __/ | | |_) | | | | | | | |_
\_/\_/ \___|_.__/ |_| |_|_| |_|\__, |\___|_| | .__/|_| |_|_| |_|\__|
|___/ |_|
Passive-First Web Fingerprinting — Stay Low
"""
USER_AGENT = (
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/120.0.0.0 Safari/537.36"
)
COOKIE_TECH_MAP = {
"PHPSESSID": "PHP",
"JSESSIONID": "Java (Tomcat/JBoss/Spring)",
"ASP.NET_SessionId": "ASP.NET",
"CFID": "ColdFusion",
"CFTOKEN": "ColdFusion",
"connect.sid": "Node.js (Express)",
"laravel_session": "Laravel (PHP)",
"_rails_session": "Ruby on Rails",
"_session_id": "Ruby on Rails",
"csrftoken": "Django (Python)",
"sessionid": "Django (Python)",
"ci_session": "CodeIgniter (PHP)",
"wp-settings": "WordPress",
"AWSALB": "AWS ALB",
"AWSALBCORS": "AWS ALB (CORS)",
"__cfduid": "Cloudflare",
"cf_clearance": "Cloudflare",
"ROUTEID": "HAProxy",
"BIGipServer": "F5 BIG-IP",
"SERVERID": "HAProxy",
"incap_ses": "Imperva Incapsula",
"visid_incap": "Imperva Incapsula",
}
HEADER_FINGERPRINTS = {
"server": {
"apache": "Apache",
"nginx": "Nginx",
"microsoft-iis": "IIS (Windows)",
"cloudflare": "Cloudflare",
"litespeed": "LiteSpeed",
"openresty": "OpenResty (Nginx)",
"gunicorn": "Gunicorn (Python)",
"uvicorn": "Uvicorn (Python ASGI)",
"cowboy": "Cowboy (Erlang/Elixir)",
"caddy": "Caddy",
"envoy": "Envoy Proxy",
"kestrel": "Kestrel (.NET)",
},
"x-powered-by": {
"php": "PHP",
"asp.net": "ASP.NET",
"express": "Express (Node.js)",
"next.js": "Next.js",
"nuxt": "Nuxt.js (Vue)",
"servlet": "Java Servlet",
"jsp": "Java JSP",
"perl": "Perl",
"plack": "Plack (Perl)",
"ruby": "Ruby",
},
}
DNS_FINGERPRINTS = {
"mx": {
"aspmx.l.google.com": "Google Workspace",
"mail.protection.outlook.com": "Microsoft 365",
"pphosted.com": "Proofpoint",
"mimecast": "Mimecast",
"barracuda": "Barracuda",
"messagelabs": "Symantec Email Security",
},
"cname": {
"cloudfront.net": "AWS CloudFront",
"azurewebsites.net": "Azure Web Apps",
"herokuapp.com": "Heroku",
"netlify.app": "Netlify",
"vercel-dns.com": "Vercel",
"pages.dev": "Cloudflare Pages",
"firebaseapp.com": "Firebase",
"s3.amazonaws.com": "AWS S3",
"storage.googleapis.com": "GCP Cloud Storage",
"fastly.net": "Fastly CDN",
"akamaiedge.net": "Akamai CDN",
"edgekey.net": "Akamai CDN",
"cdn.shopify.com": "Shopify",
"squarespace.com": "Squarespace",
"ghost.io": "Ghost CMS",
"wpengine.com": "WP Engine",
"pantheonsite.io": "Pantheon",
},
"ns": {
"awsdns": "AWS Route 53",
"cloudflare.com": "Cloudflare DNS",
"azure-dns": "Azure DNS",
"google.com": "Google Cloud DNS",
"digitalocean.com": "DigitalOcean DNS",
"domaincontrol.com": "GoDaddy DNS",
},
}
# ---------------------------------------------------------------------------
# Utility helpers
# ---------------------------------------------------------------------------
class Colors:
HEADER = "\033[95m"
BLUE = "\033[94m"
CYAN = "\033[96m"
GREEN = "\033[92m"
YELLOW = "\033[93m"
RED = "\033[91m"
BOLD = "\033[1m"
DIM = "\033[2m"
END = "\033[0m"
@classmethod
def disable(cls):
for attr in dir(cls):
if attr.isupper() and not attr.startswith("_"):
setattr(cls, attr, "")
def _log(level: str, msg: str):
ts = datetime.now().strftime("%H:%M:%S")
icons = {
"info": f"{Colors.CYAN}[*]{Colors.END}",
"ok": f"{Colors.GREEN}[+]{Colors.END}",
"warn": f"{Colors.YELLOW}[!]{Colors.END}",
"err": f"{Colors.RED}[-]{Colors.END}",
"tier": f"{Colors.BOLD}{Colors.HEADER}[▶]{Colors.END}",
}
icon = icons.get(level, "[?]")
print(f" {Colors.DIM}{ts}{Colors.END} {icon} {msg}")
def _header(text: str):
width = 60
print()
print(f" {Colors.BOLD}{'═' * width}{Colors.END}")
print(f" {Colors.BOLD}{text.center(width)}{Colors.END}")
print(f" {Colors.BOLD}{'═' * width}{Colors.END}")
def _subheader(text: str):
print(f"\n {Colors.BOLD}{Colors.BLUE}── {text} ──{Colors.END}")
def _tool_available(name: str) -> bool:
return shutil.which(name) is not None
def _run_cmd(cmd: list[str], timeout: int = 60) -> Optional[str]:
try:
result = subprocess.run(
cmd,
capture_output=True,
text=True,
timeout=timeout,
)
return result.stdout.strip() if result.returncode == 0 else None
except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
return None
def _http_get(url: str, timeout: int = 10, head_only: bool = False) -> Optional[dict]:
"""Perform a single HTTP(S) request with browser-like User-Agent."""
try:
req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
if head_only:
req.method = "HEAD"
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
with urllib.request.urlopen(req, timeout=timeout, context=ctx) as resp:
headers = dict(resp.headers)
body = "" if head_only else resp.read(500_000).decode("utf-8", errors="replace")
return {
"status": resp.status,
"headers": headers,
"body": body,
"url": resp.url,
}
except Exception:
return None
# ---------------------------------------------------------------------------
# Tier 0 modules — Zero-touch passive
# ---------------------------------------------------------------------------
def module_whois(domain: str) -> dict:
"""Whois lookup via system command."""
_subheader("Whois Lookup")
result: dict[str, Any] = {}
raw = _run_cmd(["whois", domain], timeout=15)
if not raw:
_log("err", "whois command failed or not available")
return result
for line in raw.splitlines():
line = line.strip()
lower = line.lower()
if "registrar:" in lower:
result["registrar"] = line.split(":", 1)[1].strip()
elif "creation date:" in lower or "created:" in lower:
result["creation_date"] = line.split(":", 1)[1].strip()
elif "expir" in lower and "date" in lower:
result["expiry_date"] = line.split(":", 1)[1].strip()
elif "name server:" in lower:
ns = line.split(":", 1)[1].strip().lower()
result.setdefault("name_servers", []).append(ns)
elif "registrant org" in lower:
result["registrant_org"] = line.split(":", 1)[1].strip()
for k, v in result.items():
if k == "name_servers":
_log("ok", f"Name Servers: {', '.join(v)}")
else:
_log("ok", f"{k}: {v}")
return result
def module_dns(domain: str) -> dict:
"""DNS record lookup via dig against public resolvers."""
_subheader("DNS Records (Public Resolver)")
result: dict[str, Any] = {}
record_types = ["A", "AAAA", "MX", "TXT", "NS", "CNAME", "SOA", "CAA"]
for rtype in record_types:
raw = _run_cmd(["dig", "+short", domain, rtype, "@8.8.8.8"], timeout=10)
if raw:
records = [line.strip() for line in raw.splitlines() if line.strip()]
if records:
result[rtype] = records
_log("ok", f"{rtype}: {', '.join(records[:5])}")
# Fingerprint from DNS
fingerprints = []
for mx in result.get("MX", []):
mx_lower = mx.lower()
for sig, tech in DNS_FINGERPRINTS["mx"].items():
if sig in mx_lower:
fingerprints.append(f"Email: {tech}")
break
for ns in result.get("NS", []):
ns_lower = ns.lower()
for sig, tech in DNS_FINGERPRINTS["ns"].items():
if sig in ns_lower:
fingerprints.append(f"DNS: {tech}")
break
for cname in result.get("CNAME", []):
cname_lower = cname.lower()
for sig, tech in DNS_FINGERPRINTS["cname"].items():
if sig in cname_lower:
fingerprints.append(f"Hosting: {tech}")
break
# SPF / DMARC from TXT records
for txt in result.get("TXT", []):
txt_lower = txt.lower()
if "v=spf1" in txt_lower:
fingerprints.append("SPF record found")
if "google.com" in txt_lower:
fingerprints.append("Email infra: Google Workspace (SPF)")
elif "protection.outlook.com" in txt_lower:
fingerprints.append("Email infra: Microsoft 365 (SPF)")
if "v=dmarc1" in txt_lower:
fingerprints.append("DMARC record found")
if "ms=" in txt_lower:
fingerprints.append("Microsoft domain verification found")
if "google-site-verification" in txt_lower:
fingerprints.append("Google site verification found")
if "docusign" in txt_lower:
fingerprints.append("DocuSign integration found")
if "facebook-domain-verification" in txt_lower:
fingerprints.append("Facebook domain verification found")
if "atlassian-domain-verification" in txt_lower:
fingerprints.append("Atlassian domain verification found")
if fingerprints:
result["dns_fingerprints"] = list(set(fingerprints))
for fp in result["dns_fingerprints"]:
_log("ok", f" → {fp}")
# Reverse lookup
try:
ip = socket.gethostbyname(domain)
result["resolved_ip"] = ip
_log("ok", f"Resolved IP: {ip}")
try:
reverse = socket.gethostbyaddr(ip)
result["reverse_dns"] = reverse[0]
_log("ok", f"Reverse DNS: {reverse[0]}")
        except (socket.herror, socket.gaierror):
            pass
except socket.gaierror:
_log("warn", "Could not resolve domain to IP")
return result
def module_crtsh(domain: str) -> dict:
"""Certificate Transparency log search via crt.sh."""
_subheader("Certificate Transparency (crt.sh)")
result: dict[str, Any] = {}
url = f"https://crt.sh/?q=%25.{domain}&output=json"
try:
req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
ctx = ssl.create_default_context()
with urllib.request.urlopen(req, timeout=20, context=ctx) as resp:
data = json.loads(resp.read().decode())
except Exception as e:
_log("err", f"crt.sh query failed: {e}")
return result
subdomains = set()
issuers = set()
for entry in data:
name = entry.get("name_value", "")
for line in name.splitlines():
cleaned = line.strip().lstrip("*.")
if cleaned and domain in cleaned:
subdomains.add(cleaned)
issuer = entry.get("issuer_name", "")
if issuer:
issuers.add(issuer)
result["subdomains"] = sorted(subdomains)
result["subdomain_count"] = len(subdomains)
result["issuers"] = sorted(issuers)[:10]
_log("ok", f"Found {len(subdomains)} unique subdomains from CT logs")
if len(subdomains) <= 30:
for sub in sorted(subdomains):
_log("info", f" {sub}")
else:
for sub in sorted(subdomains)[:20]:
_log("info", f" {sub}")
_log("info", f" ... and {len(subdomains) - 20} more")
return result
def module_shodan(domain: str, ip: Optional[str] = None) -> dict:
"""Shodan cached host lookup (0 credits)."""
_subheader("Shodan Cached Data")
result: dict[str, Any] = {}
api_key = os.environ.get("SHODAN_API_KEY")
if not api_key:
key_file = os.path.expanduser("~/.shodan/api_key")
if os.path.exists(key_file):
with open(key_file) as f:
api_key = f.read().strip()
if not api_key:
if _tool_available("shodan"):
if ip:
raw = _run_cmd(["shodan", "host", ip], timeout=15)
else:
try:
resolved_ip = socket.gethostbyname(domain)
raw = _run_cmd(["shodan", "host", resolved_ip], timeout=15)
except socket.gaierror:
raw = None
if raw:
result["raw"] = raw
_log("ok", "Shodan data (via CLI):")
for line in raw.splitlines()[:25]:
_log("info", f" {line}")
else:
_log("warn", "Shodan CLI returned no data")
else:
_log("warn", "No Shodan API key found. Set SHODAN_API_KEY or run 'shodan init <key>'")
return result
target_ip = ip
if not target_ip:
try:
target_ip = socket.gethostbyname(domain)
except socket.gaierror:
_log("err", f"Cannot resolve {domain}")
return result
url = f"https://api.shodan.io/shodan/host/{target_ip}?key={api_key}"
try:
req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
ctx = ssl.create_default_context()
with urllib.request.urlopen(req, timeout=15, context=ctx) as resp:
data = json.loads(resp.read().decode())
except Exception as e:
_log("warn", f"Shodan API query failed: {e}")
return result
result["ip"] = data.get("ip_str")
result["org"] = data.get("org")
result["os"] = data.get("os")
result["isp"] = data.get("isp")
result["country"] = data.get("country_name")
result["city"] = data.get("city")
result["hostnames"] = data.get("hostnames", [])
result["ports"] = data.get("ports", [])
result["vulns"] = data.get("vulns", [])
services = []
for item in data.get("data", []):
svc = {
"port": item.get("port"),
"transport": item.get("transport"),
"product": item.get("product"),
"version": item.get("version"),
"module": item.get("_shodan", {}).get("module"),
}
services.append(svc)
result["services"] = services
    _log("ok", f"IP: {result['ip']}  Org: {result.get('org') or 'N/A'}  Country: {result.get('country') or 'N/A'}")
    _log("ok", f"OS: {result.get('os') or 'N/A'}  ISP: {result.get('isp') or 'N/A'}")
if result["ports"]:
_log("ok", f"Open Ports: {', '.join(map(str, result['ports']))}")
for svc in services:
product = svc.get("product") or svc.get("module") or "unknown"
ver = svc.get("version") or ""
_log("info", f" {svc['port']}/{svc.get('transport', 'tcp')} — {product} {ver}".strip())
if result["vulns"]:
_log("warn", f"Known Vulnerabilities: {', '.join(result['vulns'][:10])}")
return result
def module_passive_subdomains(domain: str) -> dict:
"""Passive subdomain enumeration via available CLI tools."""
_subheader("Passive Subdomain Enumeration")
result: dict[str, Any] = {"subdomains": set()}
tools_tried = []
if _tool_available("subfinder"):
tools_tried.append("subfinder")
raw = _run_cmd(["subfinder", "-d", domain, "-silent"], timeout=120)
if raw:
for line in raw.splitlines():
sub = line.strip().lower()
if sub and domain in sub:
result["subdomains"].add(sub)
_log("ok", f"subfinder: {len(raw.splitlines())} subdomains")
if _tool_available("amass"):
tools_tried.append("amass")
raw = _run_cmd(["amass", "enum", "-passive", "-d", domain], timeout=180)
if raw:
for line in raw.splitlines():
sub = line.strip().lower()
if sub and domain in sub:
result["subdomains"].add(sub)
_log("ok", f"amass: {len(raw.splitlines())} subdomains")
if _tool_available("assetfinder"):
tools_tried.append("assetfinder")
raw = _run_cmd(["assetfinder", "--subs-only", domain], timeout=60)
if raw:
for line in raw.splitlines():
sub = line.strip().lower()
if sub and domain in sub:
result["subdomains"].add(sub)
_log("ok", f"assetfinder: {len(raw.splitlines())} subdomains")
if not tools_tried:
_log("warn", "No subdomain tools found (subfinder, amass, assetfinder). Skipping.")
result["subdomains"] = sorted(result["subdomains"])
result["tools_used"] = tools_tried
result["total_unique"] = len(result["subdomains"])
_log("ok", f"Total unique subdomains (all sources): {result['total_unique']}")
return result
def module_wayback(domain: str) -> dict:
"""Historical URL mining from Wayback Machine and GAU."""
_subheader("Historical URLs (Wayback / GAU)")
result: dict[str, Any] = {"urls": [], "params": [], "interesting_files": [], "api_endpoints": []}
all_urls: set[str] = set()
if _tool_available("waybackurls"):
raw = _run_cmd(["waybackurls", domain], timeout=120)
if raw:
for line in raw.splitlines():
all_urls.add(line.strip())
_log("ok", f"waybackurls: {len(raw.splitlines())} URLs")
if _tool_available("gau"):
raw = _run_cmd(["gau", domain], timeout=120)
if raw:
for line in raw.splitlines():
all_urls.add(line.strip())
_log("ok", f"gau: {len(raw.splitlines())} URLs")
if not all_urls:
if not _tool_available("waybackurls") and not _tool_available("gau"):
_log("warn", "No URL mining tools found (waybackurls, gau). Skipping.")
_log("info", "Trying Wayback CDX API directly...")
cdx_url = f"https://web.archive.org/cdx/search/cdx?url=*.{domain}/*&output=text&fl=original&collapse=urlkey&limit=500"
resp = _http_get(cdx_url, timeout=30)
if resp and resp["body"]:
for line in resp["body"].splitlines():
all_urls.add(line.strip())
_log("ok", f"Wayback CDX API: {len(all_urls)} URLs")
        else:
            _log("warn", "Wayback CDX API returned no results")
params = set()
interesting = set()
api_eps = set()
interesting_exts = re.compile(
r"\.(sql|bak|conf|env|log|xml|json|yml|yaml|ini|cfg|zip|tar|gz|rar|"
r"old|orig|backup|swp|db|sqlite|dump|csv|xls|xlsx|doc|docx|pdf|key|pem)$",
re.IGNORECASE,
)
api_pattern = re.compile(r"/(api|v[0-9]+|graphql|rest|ws)/", re.IGNORECASE)
for url in all_urls:
if "=" in url:
params.add(url)
if interesting_exts.search(url):
interesting.add(url)
if api_pattern.search(url):
api_eps.add(url)
result["total_urls"] = len(all_urls)
result["params"] = sorted(params)[:100]
result["interesting_files"] = sorted(interesting)[:50]
result["api_endpoints"] = sorted(api_eps)[:50]
_log("ok", f"Total unique URLs: {len(all_urls)}")
_log("ok", f"URLs with parameters: {len(params)}")
_log("ok", f"Interesting file URLs: {len(interesting)}")
_log("ok", f"API endpoints: {len(api_eps)}")
if interesting:
_log("warn", "Interesting files found in history:")
for f in sorted(interesting)[:15]:
_log("info", f" {f}")
if api_eps:
_log("info", "API endpoints found in history:")
for ep in sorted(api_eps)[:15]:
_log("info", f" {ep}")
return result
# ---------------------------------------------------------------------------
# Tier 1 modules — Near-passive (looks like browsing)
# ---------------------------------------------------------------------------
def module_http_headers(domain: str) -> dict:
"""HTTP response header analysis (single HEAD request)."""
_subheader("HTTP Response Headers")
result: dict[str, Any] = {"technologies": [], "security_headers": {}}
for scheme in ["https", "http"]:
url = f"{scheme}://{domain}"
resp = _http_get(url, head_only=True)
if resp:
result["scheme"] = scheme
result["status_code"] = resp["status"]
result["final_url"] = resp.get("url", url)
result["headers"] = resp["headers"]
server = resp["headers"].get("Server", resp["headers"].get("server", ""))
if server:
result["server"] = server
_log("ok", f"Server: {server}")
for sig, tech in HEADER_FINGERPRINTS["server"].items():
if sig in server.lower():
result["technologies"].append(tech)
xpb = resp["headers"].get("X-Powered-By", resp["headers"].get("x-powered-by", ""))
if xpb:
result["x_powered_by"] = xpb
_log("ok", f"X-Powered-By: {xpb}")
for sig, tech in HEADER_FINGERPRINTS["x-powered-by"].items():
if sig in xpb.lower():
result["technologies"].append(tech)
sec_headers = {
"Strict-Transport-Security": "HSTS",
"Content-Security-Policy": "CSP",
"X-Frame-Options": "X-Frame-Options",
"X-Content-Type-Options": "X-Content-Type-Options",
"X-XSS-Protection": "X-XSS-Protection",
"Referrer-Policy": "Referrer-Policy",
"Permissions-Policy": "Permissions-Policy",
}
for hdr, label in sec_headers.items():
val = resp["headers"].get(hdr, resp["headers"].get(hdr.lower(), ""))
if val:
result["security_headers"][label] = val
_log("ok", f" {label}: {val[:80]}")
else:
result["security_headers"][label] = "MISSING"
for hdr_key, hdr_val in resp["headers"].items():
hk = hdr_key.lower()
if "cf-ray" in hk:
result["technologies"].append("Cloudflare CDN")
_log("ok", " CDN: Cloudflare detected (CF-Ray)")
elif "x-amz-cf" in hk:
result["technologies"].append("AWS CloudFront")
_log("ok", " CDN: AWS CloudFront detected")
elif "x-cache" in hk and "varnish" in hdr_val.lower():
result["technologies"].append("Varnish Cache")
elif "x-drupal" in hk:
result["technologies"].append("Drupal CMS")
elif "x-generator" in hk:
result["technologies"].append(f"Generator: {hdr_val}")
_log("ok", f" X-Generator: {hdr_val}")
result["technologies"] = list(set(result["technologies"]))
break
if not result.get("headers"):
_log("err", "Could not connect to target via HTTP or HTTPS")
return result
def module_ssl_cert(domain: str) -> dict:
"""SSL/TLS certificate inspection."""
_subheader("SSL/TLS Certificate")
result: dict[str, Any] = {}
try:
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
with socket.create_connection((domain, 443), timeout=10) as sock:
with ctx.wrap_socket(sock, server_hostname=domain) as ssock:
cert = ssock.getpeercert(binary_form=False)
if cert:
subject_dict: dict[str, str] = {}
for item in cert.get("subject", ()):
if item and item[0]:
k, v = item[0]
subject_dict[k] = v
result["subject"] = subject_dict
issuer_dict: dict[str, str] = {}
for item in cert.get("issuer", ()):
if item and item[0]:
k, v = item[0]
issuer_dict[k] = v
result["issuer"] = issuer_dict
result["not_before"] = cert.get("notBefore")
result["not_after"] = cert.get("notAfter")
result["serial"] = cert.get("serialNumber")
sans: list[str] = []
for san_entry in cert.get("subjectAltName", []):
if isinstance(san_entry, tuple) and len(san_entry) >= 2:
sans.append(str(san_entry[1]))
result["sans"] = sans
_log("ok", f"Subject: {result['subject']}")
_log("ok", f"Issuer: {issuer_dict.get('organizationName', 'N/A')}")
_log("ok", f"Valid: {result['not_before']} → {result['not_after']}")
if sans:
_log("ok", f"SANs ({len(sans)}):")
for san in sans[:20]:
_log("info", f" {san}")
if len(sans) > 20:
_log("info", f" ... and {len(sans) - 20} more")
result["tls_version"] = ssock.version()
_log("ok", f"TLS Version: {result['tls_version']}")
result["cipher"] = ssock.cipher()
if result["cipher"]:
_log("ok", f"Cipher: {result['cipher'][0]}")
except ssl.SSLError as e:
_log("warn", f"SSL error: {e}")
except (socket.timeout, ConnectionRefusedError, OSError) as e:
_log("warn", f"Cannot connect to port 443: {e}")
    if not result.get("sans") and _tool_available("openssl"):
        import shlex
        q = shlex.quote(domain)  # quote before shell interpolation
        cert_text = _run_cmd(
            ["bash", "-c", f"echo | openssl s_client -connect {q}:443 -servername {q} 2>/dev/null | openssl x509 -noout -ext subjectAltName"],
            timeout=10,
        )
if cert_text:
sans = re.findall(r"DNS:([^\s,]+)", cert_text)
result["sans"] = sans
_log("ok", f"SANs from openssl ({len(sans)}):")
for san in sans[:20]:
_log("info", f" {san}")
return result
def module_cookies(domain: str) -> dict:
"""Cookie-based technology fingerprinting."""
_subheader("Cookie Analysis")
result: dict[str, Any] = {"cookies": [], "technologies": []}
for scheme in ["https", "http"]:
url = f"{scheme}://{domain}"
resp = _http_get(url, head_only=True)
if not resp:
continue
for hdr_key, hdr_val in resp["headers"].items():
if hdr_key.lower() == "set-cookie":
                # dict() collapses repeated Set-Cookie headers into one
                # comma-joined value; split only where a new name=value
                # pair begins so Expires dates stay intact.
                cookies_raw = re.split(r",(?=\s*\w+=)", hdr_val)
for cookie in cookies_raw:
cookie_name = cookie.split("=")[0].strip()
result["cookies"].append(cookie.strip()[:100])
for sig, tech in COOKIE_TECH_MAP.items():
if sig.lower() in cookie_name.lower():
result["technologies"].append(tech)
_log("ok", f"Cookie '{cookie_name}' → {tech}")
cookie_lower = cookie.lower()
if "httponly" not in cookie_lower:
_log("warn", f"Cookie '{cookie_name}' missing HttpOnly flag")
if "secure" not in cookie_lower and scheme == "https":
_log("warn", f"Cookie '{cookie_name}' missing Secure flag on HTTPS")
break
result["technologies"] = list(set(result["technologies"]))
if not result["cookies"]:
_log("info", "No cookies set on initial response")
return result
def module_robots_sitemap(domain: str) -> dict:
"""Fetch robots.txt and sitemap.xml."""
_subheader("robots.txt & sitemap.xml")
result: dict[str, Any] = {"disallowed": [], "sitemaps": [], "interesting_paths": []}
resp = _http_get(f"https://{domain}/robots.txt")
if not resp or resp["status"] != 200:
resp = _http_get(f"http://{domain}/robots.txt")
    if resp and resp["status"] == 200 and ("disallow" in resp["body"].lower() or "sitemap:" in resp["body"].lower()):
_log("ok", "robots.txt found:")
for line in resp["body"].splitlines():
line = line.strip()
if line.lower().startswith("disallow:"):
path = line.split(":", 1)[1].strip()
if path and path != "/":
result["disallowed"].append(path)
_log("info", f" Disallow: {path}")
            elif line.lower().startswith("sitemap:"):
                # Split only on the first colon so the URL scheme survives
                sm = line.split(":", 1)[1].strip()
                result["sitemaps"].append(sm)
_log("info", f" Sitemap: {sm}")
interesting = [p for p in result["disallowed"] if any(
kw in p.lower() for kw in ["admin", "api", "config", "backup", "debug", "test", "staging", "dev", "internal", "private", "secret", "upload"]
)]
if interesting:
result["interesting_paths"] = interesting
_log("warn", "Interesting disallowed paths:")
for p in interesting:
_log("warn", f" → {p}")
else:
_log("info", "No robots.txt found or empty")
if not result["sitemaps"]:
for path in ["/sitemap.xml", "/sitemap_index.xml"]:
resp = _http_get(f"https://{domain}{path}")
if resp and resp["status"] == 200 and "<?xml" in resp["body"][:100]:
urls_found = re.findall(r"<loc>(.*?)</loc>", resp["body"])
result["sitemap_urls"] = len(urls_found)
_log("ok", f"sitemap.xml: {len(urls_found)} URLs found")
break
resp = _http_get(f"https://{domain}/.well-known/security.txt")
if resp and resp["status"] == 200 and ("contact:" in resp["body"].lower() or "policy:" in resp["body"].lower()):
result["security_txt"] = True
_log("ok", "security.txt found (security-conscious org)")
else:
result["security_txt"] = False
return result
def module_error_page(domain: str) -> dict:
"""Error page fingerprinting via 404 response."""
_subheader("Error Page Fingerprinting")
result: dict[str, Any] = {}
random_path = f"/nonexistent-page-{int(time.time())}"
resp = _http_get(f"https://{domain}{random_path}")
if not resp:
resp = _http_get(f"http://{domain}{random_path}")
if not resp:
_log("err", "Could not fetch error page")
return result
body = resp["body"]
result["status_code"] = resp["status"]
signatures = [
(r"Apache/[\d.]+ \([\w]+\) Server at", "Apache (default error page)"),
(r"<title>404 Not Found</title>.*nginx", "Nginx (default error page)"),
(r"Microsoft-IIS/[\d.]+", "IIS (default error page)"),
(r"Whitelabel Error Page", "Spring Boot (Java)"),
(r"Traceback \(most recent call last\)", "Python (debug mode ON — sensitive!)"),
(r"Cannot GET /", "Express.js (Node)"),
(r"Not Found.*The requested URL was not found", "Flask (Python)"),
(r"<title>Error</title>.*Django", "Django"),
(r"wp-content|wordpress", "WordPress"),
(r"drupal|Drupal", "Drupal"),
(r"joomla|Joomla", "Joomla"),
(r"<title>404.*Shopify", "Shopify"),
(r"<title>Page not found.*GitHub Pages", "GitHub Pages"),
(r"The page you are looking for doesn.*t exist", "Generic CMS 404"),
(r"laravel|Laravel", "Laravel (PHP)"),
(r"<title>404 - File or directory not found", "IIS detailed error"),
(r"Powered by CakePHP", "CakePHP"),
(r"Ruby on Rails", "Ruby on Rails"),
]
for pattern, tech in signatures:
if re.search(pattern, body, re.IGNORECASE | re.DOTALL):
result["detected_technology"] = tech
_log("ok", f"Error page reveals: {tech}")
if "debug mode" in tech.lower() or "sensitive" in tech.lower():
_log("warn", "⚠ Debug mode detected — potential information disclosure!")
break
if not result.get("detected_technology"):
_log("info", "Error page did not match known signatures (custom or generic)")
if any(kw in body.lower() for kw in ["stack trace", "exception", "traceback", "syntax error", "fatal error"]):
result["info_disclosure"] = True
_log("warn", "Error page may contain stack traces or debug information!")
return result
def module_tech_detect(domain: str) -> dict:
"""Technology detection via whatweb or httpx CLI."""
_subheader("Technology Detection (CLI Tools)")
result: dict[str, Any] = {}
if _tool_available("whatweb"):
raw = _run_cmd(["whatweb", "-a", "1", f"https://{domain}"], timeout=30)
if raw:
result["whatweb"] = raw
_log("ok", "whatweb (aggression 1):")
for line in raw.splitlines():
_log("info", f" {line[:120]}")
if _tool_available("httpx"):
raw = _run_cmd(
["bash", "-c", f"echo '{domain}' | httpx -silent -title -tech-detect -status-code -server -content-length"],
timeout=30,
)
if raw:
result["httpx"] = raw
_log("ok", "httpx:")
for line in raw.splitlines():
_log("info", f" {line[:120]}")
if not result:
_log("info", "No CLI detection tools available (whatweb, httpx)")
return result
def module_favicon(domain: str) -> dict:
    """Favicon hash fingerprinting (Shodan-style mmh3 of base64 bytes)."""
    _subheader("Favicon Hash")
    result: dict[str, Any] = {}
    # Fetch the raw bytes directly: round-tripping the icon through a
    # text decode (as _http_get does) corrupts it and yields a wrong hash.
    data = None
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    for scheme in ("https", "http"):
        try:
            req = urllib.request.Request(
                f"{scheme}://{domain}/favicon.ico",
                headers={"User-Agent": USER_AGENT},
            )
            with urllib.request.urlopen(req, timeout=10, context=ctx) as resp:
                if resp.status == 200:
                    data = resp.read()
                    break
        except Exception:
            continue
    if not data:
        _log("info", "No favicon.ico found")
        return result
    try:
        import codecs
        import mmh3
        # Shodan hashes the newline-wrapped base64 encoding of the raw
        # favicon bytes, which is exactly what the base64 codec produces.
        favicon_b64 = codecs.encode(data, "base64")
        fav_hash = mmh3.hash(favicon_b64)
        result["hash"] = fav_hash
        result["shodan_query"] = f"http.favicon.hash:{fav_hash}"
        _log("ok", f"Favicon hash: {fav_hash}")
        _log("ok", f"Shodan query: http.favicon.hash:{fav_hash}")
    except ImportError:
        _log("warn", "mmh3 not installed: pip install mmh3 for favicon hashing")
        md5 = hashlib.md5(data).hexdigest()
        result["md5"] = md5
        _log("info", f"Favicon MD5: {md5}")
    return result
# ---------------------------------------------------------------------------
# Report generation
# ---------------------------------------------------------------------------
def generate_summary(domain: str, results: dict) -> str:
"""Generate text summary of all findings."""
lines = []
lines.append(f"\n{'=' * 60}")
lines.append(f" FINGERPRINT SUMMARY: {domain}")
lines.append(f"{'=' * 60}\n")
techs = set()
for section in results.values():
if isinstance(section, dict):
for t in section.get("technologies", []):
techs.add(t)
if results.get("http_headers", {}).get("server"):
techs.add(f"Server: {results['http_headers']['server']}")
if results.get("http_headers", {}).get("x_powered_by"):
techs.add(f"Backend: {results['http_headers']['x_powered_by']}")
if results.get("error_page", {}).get("detected_technology"):
techs.add(results["error_page"]["detected_technology"])
for fp in results.get("dns", {}).get("dns_fingerprints", []):
techs.add(fp)
lines.append("TECHNOLOGIES DETECTED:")
for t in sorted(techs):
lines.append(f" • {t}")
sec = results.get("http_headers", {}).get("security_headers", {})
if sec:
lines.append("\nSECURITY HEADERS:")
for hdr, val in sec.items():
status = "✓" if val != "MISSING" else "✗"
lines.append(f" {status} {hdr}: {val[:60]}")
ct_subs = results.get("crtsh", {}).get("subdomain_count", 0)
passive_subs = results.get("passive_subdomains", {}).get("total_unique", 0)
total = max(ct_subs, passive_subs)  # sources overlap, so this is a lower bound
if total:
    lines.append(f"\nSUBDOMAINS: at least {total} unique discovered")
ports = results.get("shodan", {}).get("ports", [])
if ports:
lines.append(f"\nOPEN PORTS (Shodan cache): {', '.join(map(str, ports))}")
vulns = results.get("shodan", {}).get("vulns", [])
if vulns:
lines.append(f"\nKNOWN VULNERABILITIES: {', '.join(vulns[:10])}")
hist = results.get("wayback", {})
if hist.get("total_urls"):
lines.append("\nHISTORICAL DATA:")
lines.append(f" Total URLs: {hist['total_urls']}")
lines.append(f" With parameters: {len(hist.get('params', []))}")
lines.append(f" Interesting files: {len(hist.get('interesting_files', []))}")
lines.append(f" API endpoints: {len(hist.get('api_endpoints', []))}")
interesting = results.get("robots_sitemap", {}).get("interesting_paths", [])
if interesting:
lines.append("\nINTERESTING DISALLOWED PATHS:")
for p in interesting:
lines.append(f" → {p}")
lines.append(f"\n{'=' * 60}")
lines.append(f" Scan completed: {datetime.now(timezone.utc).isoformat()}")
lines.append(f"{'=' * 60}\n")
return "\n".join(lines)
# ---------------------------------------------------------------------------
# Main orchestrator
# ---------------------------------------------------------------------------
def run_fingerprint(domain: str, tier: int = 1, resolve_ip: bool = False, output_file: Optional[str] = None):
"""Run fingerprinting modules up to specified tier."""
print(BANNER)
_header(f"Target: {domain}")
_log("info", f"Max tier: {tier} | Started: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print()
results: dict[str, Any] = {"target": domain, "scan_time": datetime.now(timezone.utc).isoformat(), "tier": tier}
_header("TIER 0 — Zero-Touch Passive")
_log("tier", "No packets sent to target. Querying third-party sources only.")
results["whois"] = module_whois(domain)
results["dns"] = module_dns(domain)
results["crtsh"] = module_crtsh(domain)
ip = results["dns"].get("resolved_ip")
if resolve_ip or ip:
results["shodan"] = module_shodan(domain, ip=ip)
results["passive_subdomains"] = module_passive_subdomains(domain)
results["wayback"] = module_wayback(domain)
if tier < 1:
summary = generate_summary(domain, results)
print(summary)
if output_file:
_save_results(results, summary, output_file)
return results
_header("TIER 1 — Near-Passive (Looks Like Browsing)")
_log("tier", "Sending minimal requests that look like normal web browsing.")
results["http_headers"] = module_http_headers(domain)
results["ssl_cert"] = module_ssl_cert(domain)
results["cookies"] = module_cookies(domain)
results["robots_sitemap"] = module_robots_sitemap(domain)
results["error_page"] = module_error_page(domain)
results["tech_detect"] = module_tech_detect(domain)
results["favicon"] = module_favicon(domain)
summary = generate_summary(domain, results)
print(summary)
if output_file:
_save_results(results, summary, output_file)
return results
def _save_results(results: dict, summary: str, output_file: str):
"""Save results to JSON and text summary."""
json_file = output_file if output_file.endswith(".json") else output_file + ".json"
def _serialize(obj):
if isinstance(obj, set):
return sorted(obj)
if isinstance(obj, bytes):
return obj.decode("utf-8", errors="replace")
raise TypeError(f"Object of type {type(obj)} is not JSON serializable")
with open(json_file, "w") as f:
json.dump(results, f, indent=2, default=_serialize)
_log("ok", f"JSON results saved to: {json_file}")
txt_file = json_file[: -len(".json")] + "_summary.txt"  # slice the known suffix; str.replace could match ".json" mid-path
with open(txt_file, "w") as f:
f.write(summary)
_log("ok", f"Text summary saved to: {txt_file}")
def main():
parser = argparse.ArgumentParser(
description="Passive-first web fingerprinting tool",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=textwrap.dedent("""\
Tiers:
0 Zero-touch passive (third-party databases only)
1 Near-passive (looks like normal browsing) [default]
Examples:
%(prog)s example.com
%(prog)s example.com --tier 0
%(prog)s example.com -o results.json
%(prog)s example.com --resolve-ip --all
"""),
)
parser.add_argument("domain", help="Target domain (e.g., example.com)")
parser.add_argument("--tier", type=int, default=1, choices=[0, 1], help="Maximum tier to execute (default: 1)")
parser.add_argument("--all", action="store_true", help="Run all tiers (same as --tier 1)")
parser.add_argument("-o", "--output", help="Output file path (JSON)")
parser.add_argument("--resolve-ip", action="store_true", help="Resolve domain IP and query Shodan")
parser.add_argument("--no-color", action="store_true", help="Disable colored output")
args = parser.parse_args()
if args.no_color:
Colors.disable()
domain = args.domain.strip().lower()
domain = re.sub(r"^https?://", "", domain)
domain = domain.split("/")[0]
tier = 1 if args.all else args.tier
try:
run_fingerprint(domain=domain, tier=tier, resolve_ip=args.resolve_ip, output_file=args.output)
except KeyboardInterrupt:
print(f"\n{Colors.YELLOW}[!] Interrupted by user{Colors.END}")
sys.exit(1)
if __name__ == "__main__":
main()
Last Updated: 2026-02-11
Companion Guides: Web_URL_Fingerprinting.md (methodology), Stay_Low.md (EDR evasion), Stay_low_Command.md (command OPSEC)