conducting-external-reconnaissance-with-osint
Conducting External Reconnaissance with OSINT
When to Use
- Performing the initial reconnaissance phase of a penetration test to gather intelligence before active scanning
- Mapping an organization's external attack surface to identify unknown or shadow IT assets
- Collecting employee information, email formats, and organizational structure for social engineering campaigns
- Identifying exposed credentials, leaked data, or sensitive documents published on the internet
- Scoping the breadth of an organization's digital footprint prior to a red team engagement
Do not use for stalking, harassment, or unauthorized surveillance of individuals. OSINT gathering must be conducted within the scope of an authorized engagement and comply with applicable privacy laws (GDPR, CCPA).
Prerequisites
- Written authorization to perform reconnaissance against the target organization
- Dedicated research workstation with a VPN or Tor for anonymized queries when required
- OSINT framework tools installed: Amass, theHarvester, Shodan CLI, Recon-ng, SpiderFoot
- API keys for Shodan, Censys, SecurityTrails, Hunter.io, VirusTotal, and GitHub for enhanced results
- Disposable email accounts for accessing services that require registration during research
Workflow
Step 1: Domain and DNS Enumeration
Enumerate all domains, subdomains, and DNS records associated with the target:
- Root domain identification: Start with the primary domain and identify all related domains through reverse WHOIS lookups on registrant name, email, and organization using `whoxy.com` or `domaintools.com`
- Subdomain enumeration: Run multiple tools for comprehensive coverage:
  - `amass enum -passive -d target.com -o amass_subs.txt` for passive subdomain discovery from 40+ data sources
  - `subfinder -d target.com -all -o subfinder_subs.txt` for fast passive enumeration
  - `crt.sh` certificate transparency log queries: `curl -s "https://crt.sh/?q=%25.target.com&output=json" | jq -r '.[].name_value' | sort -u`
- DNS record analysis: Query for all record types with `dig target.com ANY`, check for SPF, DKIM, and DMARC records that reveal email infrastructure, and enumerate MX records to identify email providers. Many authoritative servers now return only a minimal answer to `ANY` (RFC 8482), so querying each record type individually (`A`, `AAAA`, `MX`, `TXT`, `NS`) is usually more reliable.
- Zone transfer attempt: `dig axfr @ns1.target.com target.com` to check for misconfigured DNS servers
- Consolidate results: Merge, deduplicate, and resolve all discovered subdomains to IP addresses. Map IP addresses to ASN and hosting providers.
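The merge-and-deduplicate step can be sketched in Python as follows. This is a minimal illustration, not part of any tool listed here: the function names `names_from_crtsh` and `merge_subdomains` are hypothetical, and the sketch assumes the crt.sh JSON body has already been fetched and that the tool output files contain one hostname per line.

```python
import json


def names_from_crtsh(raw_json: str, root: str) -> set[str]:
    """Extract hostnames under `root` from a crt.sh JSON response body."""
    names = set()
    for entry in json.loads(raw_json):
        # One name_value field may hold several newline-separated names.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower().rstrip(".")
            if name.startswith("*."):
                name = name[2:]  # drop the wildcard label
            # Keep only the root itself and true subdomains of it.
            if name == root or name.endswith("." + root):
                names.add(name)
    return names


def merge_subdomains(*sources) -> list[str]:
    """Merge iterables of hostnames into one sorted, deduplicated list."""
    merged = set()
    for source in sources:
        for line in source:
            host = line.strip().lower().rstrip(".")
            if host:
                merged.add(host)
    return sorted(merged)
```

In practice each source could be an open file handle for `amass_subs.txt` or `subfinder_subs.txt`, since iterating a file yields one line at a time. Resolution to IP addresses and ASN mapping are left out of the sketch.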
Step 2: Infrastructure and Service Discovery
Identify internet-facing infrastructure without directly scanning target systems:
- Shodan: `shodan search "ssl.cert.subject.cn:target.com"` to find all internet-facing services with TLS certificates for the target domain. Also search by organization name and IP ranges.
- Censys: Search for the target's IP ranges and TLS certificates to identify services, technologies, and potential vulnerabilities indexed from internet-wide scanning
- Cloud asset discovery: Check for S3 buckets (`target-com`, `target-backup`, `target-dev`), Azure Blob storage (`target.blob.core.windows.net`), and GCP storage using tools like `cloud_enum`
- WAF and CDN identification: Use `wafw00f target.com` to identify web application firewalls and CDN providers that may mask the origin server IP
- Historical data: Use the Wayback Machine (`web.archive.org`) to find removed pages, old application versions, and forgotten endpoints
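The bucket-name guesses in the cloud asset discovery item follow a simple keyword-plus-suffix pattern. The sketch below generates such candidates only; it makes no network requests and is not how `cloud_enum` works internally. The function names and the default suffix list are assumptions chosen for illustration.

```python
def candidate_bucket_names(keyword: str,
                           suffixes=("backup", "dev", "staging", "prod")) -> list[str]:
    """Build candidate storage names from a keyword and common suffixes."""
    base = keyword.lower().replace(".", "-")
    names = [base]
    for suffix in suffixes:
        names.append(f"{base}-{suffix}")
        names.append(f"{suffix}-{base}")
    # dict.fromkeys preserves order while dropping duplicates.
    return list(dict.fromkeys(names))


def azure_blob_host(name: str) -> str:
    """Return the Blob endpoint hostname for a candidate account name.

    Azure storage account names allow only lowercase letters and digits,
    up to 24 characters, so other characters are stripped first.
    """
    account = "".join(ch for ch in name.lower() if ch.isascii() and ch.isalnum())
    return f"{account[:24]}.blob.core.windows.net"
```

The resulting names would then be fed to whichever enumeration tool the engagement allows, within the authorized scope.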
Step 3: Email and Personnel Intelligence
Gather employee information and email addresses for social engineering preparation:
- Email harvesting: `theHarvester -d target.com -b all -f harvest_results.html` to collect emails from search engines, LinkedIn, and data sources
- Email format identification: Use `hunter.io` to determine the email format (first.last, flast, firstl) and verify deliverability
- LinkedIn reconnaissance: Identify employees by department, particularly IT administrators, security team members, and executives. Note technologies mentioned in job postings and employee profiles.
- Organizational chart: Build an org chart from LinkedIn data to understand reporting structures, identify key personnel, and map departments
- Social media analysis: Review employee social media profiles for information about internal tools, technologies, office locations, badge photos, and security practices
- Job postings: Analyze current and historical job postings on the company career page and job boards for technology stack details, tools, and infrastructure information
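Email format identification can also be approximated offline from a few harvested addresses whose owners are known. The sketch below is a hypothetical helper, not the Hunter.io API: it covers only the three formats named above, assumes non-empty ASCII names, and does not verify deliverability.

```python
from collections import Counter

# The three formats named in this step; f is the first name, l the last name.
FORMATS = {
    "first.last": lambda f, l: f"{f}.{l}",
    "flast": lambda f, l: f"{f[0]}{l}",
    "firstl": lambda f, l: f"{f}{l[0]}",
}


def infer_format(known):
    """Return the most common format among (first, last, email) samples."""
    counts = Counter()
    for first, last, email in known:
        local = email.split("@", 1)[0].lower()
        for name, build in FORMATS.items():
            if build(first.lower(), last.lower()) == local:
                counts[name] += 1
    return counts.most_common(1)[0][0] if counts else None


def build_address(first: str, last: str, fmt: str, domain: str) -> str:
    """Build a candidate address for a name using the inferred format."""
    return f"{FORMATS[fmt](first.lower(), last.lower())}@{domain}"
```

Once a format is inferred, `build_address` can expand a list of employee names gathered from LinkedIn into candidate addresses for later verification.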
Step 4: Credential and Data Leak Analysis
Search for exposed credentials and sensitive data:
- Breach databases: Check the `haveibeenpwned.com` API for breached email addresses associated with the target domain
- Paste sites: Search Pastebin, GitHub Gists, and similar paste sites for leaked credentials, configuration files, or internal documents
- Code repositories: Search GitHub, GitLab, and Bitbucket for:
  - `org:target "password"`, `org:target "api_key"`, `org:target "secret"`
  - Use `trufflehog` or `gitleaks` for automated secret scanning across the target's public repositories
- Document metadata: Download publicly available documents (PDF, DOCX, XLSX) from the target website and extract metadata using `exiftool` to reveal internal usernames, software versions, printer names, and file paths
- Google dorking: Use targeted search operators:
  - `site:target.com filetype:pdf` for public documents
  - `site:target.com inurl:admin` for admin panels
  - `site:target.com "index of /"` for directory listings
  - `site:pastebin.com "target.com"` for paste site mentions
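To illustrate the kind of pattern matching that secret scanners automate, here is a deliberately small sketch. It is not how `trufflehog` or `gitleaks` are implemented; the two patterns and the `scan_text` helper are assumptions for illustration, and real scanners use far larger rule sets plus entropy checks and verification.

```python
import re

PATTERNS = {
    # AWS access key IDs begin with AKIA (long-term) or ASIA (temporary).
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # Simple key = value assignments for the keywords searched above.
    "generic_assignment": re.compile(
        r"(?i)\b(password|api_key|secret)\b\s*[=:]\s*['\"]?([^\s'\"]{6,})"
    ),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_label) for each line that matches."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings
```

A match only indicates a candidate. Whether a key is live should not be tested during passive reconnaissance unless the rules of engagement allow it.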
Step 5: Technology Stack Profiling
Identify the technologies, frameworks, and services used by the target:
- Web technology fingerprinting: Use `whatweb target.com` or the Wappalyzer browser extension to identify CMS, frameworks, JavaScript libraries, analytics, and server software
- SSL/TLS analysis: `sslyze target.com` or `testssl.sh target.com` to identify cipher suites, protocol versions, certificate details, and cryptographic weaknesses
- JavaScript analysis: Download and review JavaScript files for framework identifiers, API endpoints, internal hostnames, and version strings
- DNS-based service identification: Review TXT records for service providers (e.g., `v=spf1 include:_spf.google.com` indicates Google Workspace, `MS=msXXXXXX` indicates Microsoft 365)
- Mobile app analysis: Download the target's mobile applications from app stores and analyze with `apktool` (Android) or `frida` for hardcoded URLs, API endpoints, and embedded credentials
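The DNS-based identification item amounts to matching TXT record values against known markers. The sketch below assumes the TXT values have already been collected (for example with `dig target.com TXT +short`). The `providers_from_txt` helper is hypothetical, and the marker table covers only the two providers mentioned above, plus the `spf.protection.outlook.com` include commonly used by Microsoft 365.

```python
# SPF include targets mapped to the provider they suggest.
SPF_INCLUDES = {
    "_spf.google.com": "Google Workspace",
    "spf.protection.outlook.com": "Microsoft 365",
}


def providers_from_txt(records) -> list[str]:
    """Infer service providers from a list of TXT record values."""
    providers = set()
    for record in records:
        value = record.strip().strip('"').lower()
        if value.startswith("v=spf1"):
            # SPF terms are space-separated; look at include: mechanisms.
            for token in value.split():
                if token.startswith("include:"):
                    provider = SPF_INCLUDES.get(token[len("include:"):])
                    if provider:
                        providers.add(provider)
        elif value.startswith("ms=ms"):
            # Microsoft domain verification record.
            providers.add("Microsoft 365")
    return sorted(providers)
```

Extending `SPF_INCLUDES` with other vendors' include hosts is the obvious next step, provided each mapping is checked against the vendor's documentation.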
Key Concepts
| Term | Definition |
|---|---|
| OSINT | Open Source Intelligence; intelligence collected from publicly available sources including websites, social media, public records, and government data |
| Passive Reconnaissance | Information gathering without directly interacting with target systems, leaving no footprint in target logs |
| Active Reconnaissance | Information gathering that involves direct interaction with target systems (scanning, probing) and may be logged |
| Certificate Transparency | Public logs of TLS certificates issued by certificate authorities, queryable to discover subdomains and infrastructure |
| Attack Surface | The sum of all points where an unauthorized user can attempt to enter or extract data from an environment |
| Google Dorking | Using advanced Google search operators to find sensitive information indexed by search engines that was not intended to be public |
| Shadow IT | Technology systems and services deployed by employees or departments without the knowledge or approval of the IT department |
Tools & Systems
- Amass (OWASP): Comprehensive subdomain enumeration tool that combines passive sources, DNS brute-forcing, and certificate transparency log analysis
- Shodan: Internet-wide scanning database that indexes services, banners, and metadata for internet-connected devices, searchable by IP, domain, or organization
- theHarvester: OSINT tool for gathering emails, subdomains, hosts, employee names, and open ports from public sources
- SpiderFoot: Automated OSINT collection platform that queries 200+ data sources and correlates findings into a unified graph
- Recon-ng: Modular web reconnaissance framework with a database backend for organizing and cross-referencing discovered intelligence
Common Scenarios
Scenario: Pre-Engagement Reconnaissance for a Red Team Exercise
Context: A technology company has contracted a red team assessment. Before active testing begins, the team conducts passive OSINT to map the attack surface and identify potential entry points. The target is a SaaS company with 500 employees and a primary domain of techcorp.io.
Approach:
- Enumerate 147 subdomains via Amass and crt.sh, including staging.techcorp.io, jenkins.techcorp.io, and vpn.techcorp.io
- Shodan reveals a forgotten Elasticsearch instance on port 9200 with no authentication exposed to the internet
- theHarvester collects 89 employee email addresses, revealing the format first.last@techcorp.io
- GitHub search discovers a former developer's public repository containing a `.env` file with AWS access keys
- LinkedIn analysis reveals the company uses Okta for SSO, Jira for project management, and AWS for hosting
- Google dorking finds a directory listing on docs.techcorp.io exposing internal architecture diagrams
- Compile all intelligence into a reconnaissance report that feeds directly into the threat modeling and attack planning phases
Pitfalls:
- Relying on a single subdomain enumeration tool and missing assets found by other tools using different data sources
- Failing to check cloud storage services (S3, Azure Blob, GCP) for publicly accessible buckets
- Not searching for credentials in public code repositories, which frequently yield immediate access
- Conducting active scanning (port scans, vulnerability scans) during what should be a passive-only phase
Output Format
External Reconnaissance Report - TechCorp.io
Attack Surface Summary
- Domains discovered: 3 (techcorp.io, techcorp.com, techcorpapp.com)
- Subdomains enumerated: 147 unique subdomains across all domains
- Unique IP addresses: 34 IPs mapped across AWS us-east-1 and us-west-2
- Email addresses collected: 89 valid corporate email addresses
- Exposed services: 12 internet-facing services identified via Shodan/Censys
Critical Findings
1. Unauthenticated Elasticsearch Instance
- Host: 52.xx.xx.xx:9200 (elastic.techcorp.io)
- Indexed data: Application logs containing user session tokens and PII
- Source: Shodan search "ssl.cert.subject.cn:techcorp.io"
2. AWS Credentials in Public GitHub Repository
- Repository: github.com/former-dev/techcorp-scripts
- File: .env containing AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
- Status: Keys appear active (not tested - out of scope for passive recon)
3. Directory Listing Exposing Internal Documents
- URL: https://docs.techcorp.io/internal/
- Contents: Architecture diagrams, network topology, runbooks
- Source: Google dork "site:techcorp.io intitle:index.of"
Recommendations
- Immediately rotate the exposed AWS credentials and audit CloudTrail logs
- Restrict Elasticsearch access to internal networks or add authentication
- Disable directory listings on docs.techcorp.io and audit all web servers
- Implement GitHub secret scanning across all organization repositories
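If findings are tracked programmatically, the numbered layout of the Critical Findings section can be produced from structured data. The following is a rough sketch under that assumption: the `Finding` class and `render_findings` function are hypothetical names, not part of any tool mentioned in this document.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    """One critical finding: a title plus ordered key/value details."""
    title: str
    details: dict[str, str] = field(default_factory=dict)


def render_findings(findings) -> str:
    """Render findings as a numbered list with '- key: value' detail lines."""
    lines = []
    for number, finding in enumerate(findings, start=1):
        lines.append(f"{number}. {finding.title}")
        for key, value in finding.details.items():
            lines.append(f"- {key}: {value}")
    return "\n".join(lines)
```

Keeping the source of each finding as a detail field, as the report above does, makes every item reproducible by the client.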