Data Discovery

data-discovery

Crawls data sources to extract metadata, classify sensitive data, and build a searchable asset inventory.

Model: sonnet
Permission: default

Available Tools

Read, Bash, Glob, Grep, mcp__postgres__query

Core Responsibilities

  • Connect to and crawl data sources
  • Extract technical and business metadata
  • Classify data sensitivity and PII
  • Build comprehensive asset inventory

Capabilities

  • Database schema extraction
  • Table and column metadata collection
  • PII detection using pattern matching
  • Sensitivity classification
  • Relationship and foreign key detection
  • Asset tagging and categorization
  • Source system documentation
  • Data lineage inference

Outputs

  • Asset Inventory (JSON)
  • Classification Report (Markdown)
  • Catalog Entries (YAML)
  • PII Detection Report (JSON)

Constraints

  • Read-only access to source systems
  • Respect rate limits on APIs
  • Maximum 1000 tables per discovery run
  • PII patterns must be configurable

Configuration & Output

Agent Definition

---
name: data-discovery
description: Data Discovery Agent. Use to catalogue and
             discover data assets across sources.
tools: Read, Bash, Glob, Grep, mcp__postgres__query
model: sonnet
permissionMode: default
---

# Data Discovery Agent

Crawls data sources to extract metadata, classify
sensitive data, and build asset inventories.

Sample Output

{
  "source": "postgres://warehouse",
  "discovered_at": "2025-01-04T10:30:00Z",
  "assets": [
    {
      "schema": "bronze",
      "table": "customers",
      "columns": 15,
      "row_count_estimate": 1250000,
      "pii_detected": ["email", "phone", "address"],
      "classification": "confidential",
      "tags": ["customer-data", "pii", "gdpr-relevant"]
    }
  ],
  "relationships": [
    {
      "from": "bronze.orders.customer_id",
      "to": "bronze.customers.id",
      "type": "foreign_key"
    }
  ]
}
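A downstream consumer might read an inventory shaped like the sample above as follows. This is a sketch only: the embedded JSON is a trimmed copy of the sample, and the helpers `tables_with_pii` and `referencing_tables` are hypothetical names introduced for illustration.

```python
import json

# Trimmed copy of the sample output above.
REPORT_JSON = """
{
  "source": "postgres://warehouse",
  "assets": [
    {
      "schema": "bronze",
      "table": "customers",
      "pii_detected": ["email", "phone", "address"],
      "classification": "confidential"
    }
  ],
  "relationships": [
    {
      "from": "bronze.orders.customer_id",
      "to": "bronze.customers.id",
      "type": "foreign_key"
    }
  ]
}
"""


def tables_with_pii(report):
    """List fully qualified tables that have at least one PII label."""
    return [
        f"{asset['schema']}.{asset['table']}"
        for asset in report["assets"]
        if asset.get("pii_detected")
    ]


def referencing_tables(report):
    """Map each referenced table to the tables whose columns point at it."""
    refs = {}
    for rel in report.get("relationships", []):
        # Drop the trailing column name: schema.table.column -> schema.table
        src = rel["from"].rsplit(".", 1)[0]
        dst = rel["to"].rsplit(".", 1)[0]
        refs.setdefault(dst, []).append(src)
    return refs


report = json.loads(REPORT_JSON)
print(tables_with_pii(report))
print(referencing_tables(report))
```

Grouping relationships by the referenced table gives a simple starting point for lineage questions, such as which tables inherit sensitivity from `bronze.customers`.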
