conducting-external-reconnaissance-with-osint

Conducting External Reconnaissance with OSINT

When to Use

  • Performing the initial reconnaissance phase of a penetration test to gather intelligence before active scanning
  • Mapping an organization's external attack surface to identify unknown or shadow IT assets
  • Collecting employee information, email formats, and organizational structure for social engineering campaigns
  • Identifying exposed credentials, leaked data, or sensitive documents published on the internet
  • Scoping the breadth of an organization's digital footprint prior to a red team engagement
Do not use for stalking, harassment, or unauthorized surveillance of individuals. OSINT gathering must be conducted within the scope of an authorized engagement and comply with applicable privacy laws (GDPR, CCPA).

Prerequisites

  • Written authorization to perform reconnaissance against the target organization
  • Dedicated research workstation with a VPN or Tor for anonymized queries when required
  • OSINT framework tools installed: Amass, theHarvester, Shodan CLI, Recon-ng, SpiderFoot
  • API keys for Shodan, Censys, SecurityTrails, Hunter.io, VirusTotal, and GitHub for enhanced results
  • Disposable email accounts for accessing services that require registration during research

Workflow

Step 1: Domain and DNS Enumeration

Enumerate all domains, subdomains, and DNS records associated with the target:
  • Root domain identification: Start with the primary domain and identify all related domains through reverse WHOIS lookups on registrant name, email, and organization using whoxy.com or domaintools.com
  • Subdomain enumeration: Run multiple tools for comprehensive coverage:
    • amass enum -passive -d target.com -o amass_subs.txt
      for passive subdomain discovery from 40+ data sources
    • subfinder -d target.com -all -o subfinder_subs.txt
      for fast passive enumeration
    • crt.sh
      certificate transparency log queries:
      curl -s "https://crt.sh/?q=%25.target.com&output=json" | jq -r '.[].name_value' | sort -u
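  • DNS record analysis: Query each record type individually (A, AAAA, MX, NS, TXT, SOA) rather than relying on
    dig target.com ANY
    , which many name servers refuse or answer minimally (RFC 8482). Check SPF, DKIM, and DMARC records that reveal email infrastructure, and enumerate MX records to identify email providers
  • Zone transfer attempt:
    dig axfr @ns1.target.com target.com
    to check for misconfigured DNS servers. This query goes directly to the target's name servers and may be logged, so confirm it is in scope before running it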
  • Consolidate results: Merge, deduplicate, and resolve all discovered subdomains to IP addresses. Map IP addresses to ASN and hosting providers.
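The consolidation step can be sketched in Python. This is a minimal illustration, not the behavior of any listed tool: the function names (`names_from_crtsh`, `normalize`, `consolidate`) are hypothetical, and it assumes crt.sh JSON entries carry a `name_value` field that may hold several newline-separated names, including wildcards. It does not resolve names to IP addresses.

```python
import json


def names_from_crtsh(raw_json: str) -> set[str]:
    """Extract hostnames from crt.sh JSON output (the name_value field)."""
    names = set()
    for entry in json.loads(raw_json):
        # One certificate entry can list several names separated by newlines.
        for name in entry.get("name_value", "").splitlines():
            names.add(name)
    return names


def normalize(name: str) -> str:
    """Trim, lowercase, and drop a trailing dot or a leading wildcard label."""
    name = name.strip().lower().rstrip(".")
    if name.startswith("*."):
        name = name[2:]
    return name


def consolidate(domain: str, *sources) -> list[str]:
    """Merge several name collections, keeping only names under `domain`."""
    merged = set()
    for source in sources:
        for raw in source:
            name = normalize(raw)
            # Exact match or a true subdomain; "evil-target.com" is rejected.
            if name == domain or name.endswith("." + domain):
                merged.add(name)
    return sorted(merged)


if __name__ == "__main__":
    crtsh_sample = json.dumps([
        {"name_value": "*.target.com\nwww.target.com"},
        {"name_value": "VPN.target.com"},
    ])
    tool_output = ["www.target.com", "dev.target.com.", "cdn.othersite.net"]
    for host in consolidate("target.com", names_from_crtsh(crtsh_sample), tool_output):
        print(host)
```

Filtering on the domain suffix keeps out-of-scope names (for example, third-party hosts that share a certificate) out of the merged list before resolution.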

Step 2: Infrastructure and Service Discovery

Identify internet-facing infrastructure without directly scanning target systems:
  • Shodan:
    shodan search "ssl.cert.subject.cn:target.com"
    to find all internet-facing services with TLS certificates for the target domain. Also search by organization name and IP ranges.
  • Censys: Search for target's IP ranges and TLS certificates to identify services, technologies, and potential vulnerabilities indexed from internet-wide scanning
  • Cloud asset discovery: Check for S3 buckets (target-com, target-backup, target-dev), Azure Blob storage (target.blob.core.windows.net), and GCP storage using tools like cloud_enum
  • WAF and CDN identification: Use
    wafw00f target.com
    to identify web application firewalls and CDN providers that may mask the origin server IP. wafw00f sends HTTP requests to the target, so run it only once direct interaction is in scope
  • Historical data: Use Wayback Machine (
    web.archive.org
    ) to find removed pages, old application versions, and forgotten endpoints
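The cloud asset check starts from a list of candidate storage names. The sketch below only builds such names and the corresponding public URL forms; it sends no requests and is not how cloud_enum works internally. The function names and the suffix list are illustrative assumptions. Azure storage account names allow only 3 to 24 lowercase letters and digits, so hyphenated candidates are skipped for Azure.

```python
import re


def bucket_candidates(keyword: str, suffixes=("com", "backup", "dev")) -> list[str]:
    """Build candidate storage names from a keyword and common suffixes."""
    base = keyword.lower()
    names = [base]
    for suffix in suffixes:
        # Try both a hyphenated and a concatenated form of each suffix.
        for separator in ("-", ""):
            names.append(f"{base}{separator}{suffix}")
    return names


def storage_urls(name: str) -> dict[str, str]:
    """Map a candidate name to the public URL forms of each provider."""
    urls = {
        "s3": f"https://{name}.s3.amazonaws.com",
        "gcs": f"https://storage.googleapis.com/{name}",
    }
    # Azure storage account names: 3-24 characters, lowercase letters and digits only.
    if re.fullmatch(r"[a-z0-9]{3,24}", name):
        urls["azure_blob"] = f"https://{name}.blob.core.windows.net"
    return urls


if __name__ == "__main__":
    for candidate in bucket_candidates("target"):
        print(candidate, storage_urls(candidate))
```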

Step 3: Email and Personnel Intelligence

Gather employee information and email addresses for social engineering preparation:
  • Email harvesting:
    theHarvester -d target.com -b all -f harvest_results.html
    to collect emails from search engines, LinkedIn, and other public data sources
  • Email format identification: Use
    hunter.io
    to determine the email format (first.last, flast, firstl) and verify deliverability
  • LinkedIn reconnaissance: Identify employees by department, particularly IT administrators, security team members, and executives. Note technologies mentioned in job postings and employee profiles.
  • Organizational chart: Build an org chart from LinkedIn data to understand reporting structures, identify key personnel, and map departments
  • Social media analysis: Review employee social media profiles for information about internal tools, technologies, office locations, badge photos, and security practices
  • Job postings: Analyze current and historical job postings on the company career page and job boards for technology stack details, tools, and infrastructure information
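Once one address is confirmed, the email format can be inferred and applied to other names. The sketch below covers only the three formats named above (first.last, flast, firstl); the function names are hypothetical, and it assumes simple two-part Western-style names without handling middle names, hyphens, or transliteration.

```python
FORMATS = {
    "first.last": lambda first, last: f"{first}.{last}",
    "flast": lambda first, last: f"{first[0]}{last}",
    "firstl": lambda first, last: f"{first}{last[0]}",
}


def candidate_addresses(full_name: str, domain: str) -> dict[str, str]:
    """Return one candidate address per known format for a person's name."""
    parts = full_name.lower().split()
    first, last = parts[0], parts[-1]
    return {label: f"{build(first, last)}@{domain}" for label, build in FORMATS.items()}


def infer_format(known_address: str, full_name: str):
    """Guess which format produced a confirmed address, or None if no match."""
    known = known_address.lower()
    domain = known.partition("@")[2]
    for label, address in candidate_addresses(full_name, domain).items():
        if address == known:
            return label
    return None


if __name__ == "__main__":
    print(infer_format("jane.doe@target.com", "Jane Doe"))
    print(candidate_addresses("John Smith", "target.com"))
```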

Step 4: Credential and Data Leak Analysis

Search for exposed credentials and sensitive data:
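  • Breach databases: Check the
    haveibeenpwned.com
    API for breached email addresses associated with the target domain. Per-account lookups require an API key, and domain-wide search requires proof of control over the domain, so coordinate that search with the client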
  • Paste sites: Search Pastebin, GitHub Gists, and similar paste sites for leaked credentials, configuration files, or internal documents
  • Code repositories: Search GitHub, GitLab, and Bitbucket for:
    • org:target "password", org:target "api_key", org:target "secret"
    • Use trufflehog or gitleaks for automated secret scanning across the target's public repositories
  • Document metadata: Download publicly available documents (PDF, DOCX, XLSX) from the target website and extract metadata using
    exiftool
    to reveal internal usernames, software versions, printer names, and file paths
  • Google dorking: Use targeted search operators:
    • site:target.com filetype:pdf
      for public documents
    • site:target.com inurl:admin
      for admin panels
    • site:target.com "index of /"
      for directory listings
    • site:pastebin.com "target.com"
      for paste site mentions
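The idea behind automated secret scanning can be illustrated with a few regular expressions. This is a simplified sketch, not how trufflehog or gitleaks are implemented: the two patterns and the `scan_text` helper are assumptions chosen for illustration, and real scanners use far larger rule sets plus entropy checks and verification. The function reports only line numbers and rule labels, so the matched secret is never echoed.

```python
import re

PATTERNS = {
    # AWS access key IDs start with AKIA followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Assignments to names containing password, api_key, or secret.
    "credential_assignment": re.compile(r"(?i)(?:password|api_key|secret)\w*\s*[:=]\s*\S+"),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_label) pairs for lines that look like secrets."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((line_number, label))
    return hits


if __name__ == "__main__":
    sample = "AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\nDEBUG=true\nDB_PASSWORD=hunter2\n"
    for line_number, label in scan_text(sample):
        print(f"line {line_number}: {label}")
```

The sample key is the placeholder value AWS uses in its own documentation, not a real credential.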

Step 5: Technology Stack Profiling

Identify the technologies, frameworks, and services used by the target:
  • Web technology fingerprinting: Use
    whatweb target.com
    or Wappalyzer browser extension to identify CMS, frameworks, JavaScript libraries, analytics, and server software
  • SSL/TLS analysis:
    sslyze target.com
    or
    testssl.sh target.com
    to identify cipher suites, protocol versions, certificate details, and cryptographic weaknesses
  • JavaScript analysis: Download and review JavaScript files for framework identifiers, API endpoints, internal hostnames, and version strings
  • DNS-based service identification: Review TXT records for service providers (e.g.,
    v=spf1 include:_spf.google.com
    indicates Google Workspace,
    MS=msXXXXXX
    indicates Microsoft 365)
  • Mobile app analysis: Download the target's mobile applications from app stores and analyze them with
    apktool
    (static decoding of Android packages) or
    frida
    (dynamic instrumentation at runtime) for hardcoded URLs, API endpoints, and embedded credentials
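The DNS-based service identification above can be sketched as a lookup over TXT record values that have already been collected. The `classify_txt` function and the `SPF_INCLUDES` table are illustrative assumptions; the `spf.protection.outlook.com` entry is an addition beyond the two examples in the text, and a real mapping would cover many more providers.

```python
# Hypothetical mapping from SPF include targets to the service they suggest.
SPF_INCLUDES = {
    "_spf.google.com": "Google Workspace",
    "spf.protection.outlook.com": "Microsoft 365",
}


def classify_txt(records: list[str]) -> list[str]:
    """Return a sorted list of providers hinted at by TXT record values."""
    hints = set()
    for record in records:
        # dig prints TXT values wrapped in double quotes; strip them.
        value = record.strip().strip('"').lower()
        if value.startswith("v=spf1"):
            for token in value.split()[1:]:
                if token.startswith("include:"):
                    provider = SPF_INCLUDES.get(token[len("include:"):])
                    if provider:
                        hints.add(provider)
        elif value.startswith("ms=ms"):
            # Microsoft domain verification record.
            hints.add("Microsoft 365")
    return sorted(hints)


if __name__ == "__main__":
    print(classify_txt(['"v=spf1 include:_spf.google.com ~all"', '"MS=ms12345678"']))
```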

Key Concepts

| Term | Definition |
|---|---|
| OSINT | Open Source Intelligence; intelligence collected from publicly available sources including websites, social media, public records, and government data |
| Passive Reconnaissance | Information gathering without directly interacting with target systems, leaving no footprint in target logs |
| Active Reconnaissance | Information gathering that involves direct interaction with target systems (scanning, probing) and may be logged |
| Certificate Transparency | Public logs of TLS certificates issued by certificate authorities, queryable to discover subdomains and infrastructure |
| Attack Surface | The sum of all points where an unauthorized user can attempt to enter or extract data from an environment |
| Google Dorking | Using advanced Google search operators to find sensitive information indexed by search engines that was not intended to be public |
| Shadow IT | Technology systems and services deployed by employees or departments without the knowledge or approval of the IT department |

Tools & Systems

  • Amass (OWASP): Comprehensive subdomain enumeration tool that combines passive sources, DNS brute-forcing, and certificate transparency log analysis
  • Shodan: Internet-wide scanning database that indexes services, banners, and metadata for internet-connected devices, searchable by IP, domain, or organization
  • theHarvester: OSINT tool for gathering emails, subdomains, hosts, employee names, and open ports from public sources
  • SpiderFoot: Automated OSINT collection platform that queries 200+ data sources and correlates findings into a unified graph
  • Recon-ng: Modular web reconnaissance framework with a database backend for organizing and cross-referencing discovered intelligence

Common Scenarios

Scenario: Pre-Engagement Reconnaissance for a Red Team Exercise

Context: A technology company has contracted a red team assessment. Before active testing begins, the team conducts passive OSINT to map the attack surface and identify potential entry points. The target is a SaaS company with 500 employees and a primary domain of techcorp.io.
Approach:
  1. Enumerate 147 subdomains via Amass and crt.sh, including staging.techcorp.io, jenkins.techcorp.io, and vpn.techcorp.io
  2. Shodan reveals a forgotten Elasticsearch instance on port 9200 with no authentication exposed to the internet
  3. theHarvester collects 89 employee email addresses, revealing the format first.last@techcorp.io
  4. GitHub search discovers a former developer's public repository containing a .env file with AWS access keys
  5. LinkedIn analysis reveals the company uses Okta for SSO, Jira for project management, and AWS for hosting
  6. Google dorking finds a directory listing on docs.techcorp.io exposing internal architecture diagrams
  7. Compile all intelligence into a reconnaissance report that feeds directly into the threat modeling and attack planning phases
Pitfalls:
  • Relying on a single subdomain enumeration tool and missing assets found by other tools using different data sources
  • Failing to check cloud storage services (S3, Azure Blob, GCP) for publicly accessible buckets
  • Not searching for credentials in public code repositories, which frequently yield immediate access
  • Conducting active scanning (port scans, vulnerability scans) during what should be a passive-only phase

Output Format

External Reconnaissance Report - TechCorp.io

Attack Surface Summary

  • Domains discovered: 3 (techcorp.io, techcorp.com, techcorpapp.com)
  • Subdomains enumerated: 147 unique subdomains across all domains
  • Unique IP addresses: 34 IPs mapped across AWS us-east-1 and us-west-2
  • Email addresses collected: 89 valid corporate email addresses
  • Exposed services: 12 internet-facing services identified via Shodan/Censys

Critical Findings

1. Unauthenticated Elasticsearch Instance
  • Host: 52.xx.xx.xx:9200 (elastic.techcorp.io)
  • Indexed data: Application logs containing user session tokens and PII
  • Source: Shodan search "ssl.cert.subject.cn:techcorp.io"
2. AWS Credentials in Public GitHub Repository
  • Repository: github.com/former-dev/techcorp-scripts
  • File: .env containing AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
  • Status: Key validity unconfirmed (not tested; out of scope for passive recon)
3. Directory Listing Exposing Internal Documents

Recommendations

  1. Immediately rotate the exposed AWS credentials and audit CloudTrail logs
  2. Restrict Elasticsearch access to internal networks or add authentication
  3. Disable directory listings on docs.techcorp.io and audit all web servers
  4. Implement GitHub secret scanning across all organization repositories
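A report in this layout could be generated from structured findings rather than written by hand. The sketch below is one possible approach under stated assumptions: the `Finding` dataclass and `render_report` function are hypothetical names, and the section titles simply mirror the sample report.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    """One critical finding with labeled detail lines (host, source, status)."""
    title: str
    details: dict[str, str] = field(default_factory=dict)


def render_report(target: str, summary: dict[str, str],
                  findings: list[Finding], recommendations: list[str]) -> str:
    """Render the summary, findings, and recommendations as plain text."""
    lines = [f"External Reconnaissance Report - {target}", "", "Attack Surface Summary", ""]
    lines += [f"  • {label}: {value}" for label, value in summary.items()]
    lines += ["", "Critical Findings", ""]
    for number, finding in enumerate(findings, start=1):
        lines.append(f"{number}. {finding.title}")
        lines += [f"  • {key}: {value}" for key, value in finding.details.items()]
    lines += ["", "Recommendations", ""]
    lines += [f"  {number}. {text}" for number, text in enumerate(recommendations, start=1)]
    return "\n".join(lines)


if __name__ == "__main__":
    report = render_report(
        "TechCorp.io",
        {"Domains discovered": "3", "Subdomains enumerated": "147"},
        [Finding("Unauthenticated Elasticsearch Instance", {"Source": "Shodan"})],
        ["Restrict Elasticsearch access to internal networks or add authentication"],
    )
    print(report)
```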