
Collection Management

Manage vector database collections for AI-powered code search across projects and repositories.

Overview

Collections are isolated stores of code embeddings within the vector database, each searched independently:

  • Framework Collection: wdg_framework - Wikit core code
  • Project Collections: project_<name> - Project-specific code
  • Platform Collection: platform - Platform infrastructure code
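The project collection names on this page follow a visible pattern. Project my-site is stored as project_my_site, and client-website is stored as project_client_website.

Below is a minimal sketch of that mapping. It assumes the rule is simply "prefix project_, replace hyphens with underscores". The helper name is hypothetical, and wdg may apply further normalization that is not shown here, such as lowercasing.

```shell
# Hypothetical helper: derive a project's collection name, assuming the
# convention seen in this page's examples ("project_" prefix, hyphens in
# the project name replaced by underscores). Other normalization that wdg
# might perform (e.g. lowercasing) is not modeled.
collection_name_for_project() {
  printf 'project_%s\n' "$(printf '%s' "$1" | tr '-' '_')"
}

collection_name_for_project client-website   # prints project_client_website
```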

List Collections

View All Collections

bash
wdg collections list

Output:

Vector Database Collections:

wdg_framework
  Description: Wikit Framework
  Vectors: 15,234
  Size: 48.2 MB
  Dimensions: 768
  Distance: Cosine
  Last updated: 2024-10-14 09:15:00
  Indexed files: ~5,000

project_my_site
  Description: Project: my-site
  Vectors: 1,450
  Size: 4.7 MB
  Dimensions: 768
  Distance: Cosine
  Last updated: 2024-10-14 10:30:00
  Indexed files: ~500

project_client_website
  Description: Project: client-website
  Vectors: 3,892
  Size: 12.1 MB
  Dimensions: 768
  Distance: Cosine
  Last updated: 2024-10-13 16:45:00
  Indexed files: ~1,200

platform
  Description: Platform Infrastructure
  Vectors: 892
  Size: 2.8 MB
  Dimensions: 768
  Distance: Cosine
  Last updated: 2024-10-12 14:20:00
  Indexed files: ~300

Total: 4 collections, 21,468 vectors, 67.8 MB
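The Total line is the sum of the per-collection figures:

  • Vectors: 15,234 + 1,450 + 3,892 + 892 = 21,468
  • Size: 48.2 + 4.7 + 12.1 + 2.8 = 67.8 MB

To get the same totals in a script, a small awk filter can recompute them from the list output. This is a sketch, not a wdg command. It assumes the exact layout shown above, with sizes reported in MB, and sum_collection_list is a hypothetical helper.

```shell
# Hypothetical helper: total the "Vectors:" and "Size:" fields of
# `wdg collections list` output. Assumes the layout shown above, with
# thousands separators in vector counts and sizes reported in MB.
sum_collection_list() {
  awk '
    $1 == "Vectors:" { gsub(",", "", $2); vectors += $2; count++ }
    $1 == "Size:" && $3 == "MB" { mb += $2 }
    END { printf "%d collections, %d vectors, %.1f MB\n", count, vectors, mb }
  '
}

# Usage: wdg collections list | sum_collection_list
```

Piping the sample output above through it prints "4 collections, 21468 vectors, 67.8 MB".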

List with Filters

bash
# Only project collections
wdg collections list --type=project

# Only active projects
wdg collections list --active

# Sort by size
wdg collections list --sort=size

# Sort by last updated
wdg collections list --sort=updated

Collection Details

Get Collection Info

bash
wdg collections info <collection-name>

Example:

bash
wdg collections info project_my_site

Output:

Collection: project_my_site
Type: Project
Status: Active

Statistics:
  Vectors: 1,450
  Storage: 4.7 MB
  Dimensions: 768
  Distance metric: Cosine
  Optimization: Indexed

Metadata:
  Project: my-site
  Created: 2024-10-01 14:30:00
  Last indexed: 2024-10-14 10:30:00
  Last query: 2024-10-14 11:15:00

Indexed Content:
  PHP files: 245 (890 vectors)
  JavaScript files: 120 (380 vectors)
  CSS files: 45 (110 vectors)
  Other files: 90 (70 vectors)

Top Components:
  Functions: 520
  Classes: 85
  Hooks: 145
  Blocks: 12

Performance:
  Average query time: 45ms
  Cache hit rate: 38%
  Optimization: ✓ Optimal

Create Collections

Manual Collection Creation

bash
wdg collections create <name> [--description="..."] [--dimensions=<n>] [--distance=<metric>]

Example:

bash
wdg collections create project_demo \
    --description="Demo Project" \
    --dimensions=384 \
    --distance=Cosine
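The --dimensions value has to match the output size of the embedding model that produces the vectors. Every other collection on this page uses 768.

Dimensions also drive storage. Assuming vectors are stored as 32-bit floats (4 bytes per dimension), a rough per-vector estimate is dimensions × 4 bytes, before payload and index overhead. A minimal sketch (raw_vector_bytes is a hypothetical helper):

```shell
# Hypothetical helper: raw vector storage in bytes, assuming 32-bit floats
# (4 bytes per dimension). Payload and index overhead are not included.
raw_vector_bytes() {
  dims=$1
  count=${2:-1}   # number of vectors, default 1
  echo $(( dims * 4 * count ))
}

raw_vector_bytes 768         # prints 3072
raw_vector_bytes 384         # prints 1536
raw_vector_bytes 768 1450    # prints 4454400 (about 4.5 MB)
```

For project_my_site (1,450 vectors at 768 dimensions), the raw estimate is close to the 4.7 MB reported. The remainder is presumably payload and index data.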

💡 TIP

Collections are automatically created when indexing new projects. Manual creation is rarely needed.

Create from Project

bash
wdg collections create-from-project <project-name>

Analyzes the project and creates a collection configured for it.

Delete Collections

Delete Collection

bash
wdg collections delete <collection-name>

Example:

bash
wdg collections delete project_old_site

Confirmation prompt:

⚠️  WARNING: This will permanently delete the collection!

Collection: project_old_site
Vectors: 1,234
Size: 3.9 MB

This action cannot be undone.
Are you sure you want to delete? (y/N):

Force Delete (No Confirmation)

bash
wdg collections delete project_old_site --force

Delete Multiple Collections

bash
# Delete all inactive project collections
wdg collections delete --inactive

# Delete by pattern
wdg collections delete "project_old_*"

# Delete all project collections
wdg collections delete --type=project --force
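Pattern deletes are easy to get wrong, so it can help to preview which names a pattern selects before running the delete. Below is a minimal dry-run sketch. It assumes the pattern uses shell-style glob semantics, which this page does not state. preview_pattern_matches is a hypothetical helper, and you pass it the collection names yourself.

```shell
# Hypothetical dry run: print which collection names a glob-style pattern
# would match, assuming `wdg collections delete "<pattern>"` uses shell glob
# semantics. Pass the names you got from `wdg collections list`.
preview_pattern_matches() {
  pattern=$1
  shift
  for name in "$@"; do
    case $name in
      $pattern) printf '%s\n' "$name" ;;   # unquoted: treated as a glob
    esac
  done
}

preview_pattern_matches 'project_old_*' project_old_site project_my_site wdg_framework
# prints project_old_site
```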

Update Collections

Re-index Collection

bash
wdg collections reindex <collection-name>

Example:

bash
wdg collections reindex project_my_site

What it does:

  1. Clears existing vectors
  2. Re-parses all files
  3. Generates new embeddings
  4. Rebuilds collection
  5. Optimizes indexes
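Step 1 clears the existing vectors, so a failed or interrupted reindex may leave the collection empty or partial. One cautious pattern is to take a backup first, using the backup command described under Backup and Restore, and to reindex only if the backup succeeds. A sketch follows; the safe_reindex function name and the backups/ path layout are assumptions.

```shell
# Hypothetical wrapper: back up a collection, then reindex it only if the
# backup succeeded. The backups/ directory and file naming are assumptions.
safe_reindex() {
  collection=$1
  backup="backups/${collection}-$(date +%Y%m%d).qdrant"
  mkdir -p backups || return 1
  if ! wdg collections backup "$collection" --output="$backup"; then
    echo "backup failed; not reindexing $collection" >&2
    return 1
  fi
  wdg collections reindex "$collection"
}
```

Run it as safe_reindex project_my_site. If the reindexed collection turns out wrong, the dated backup can be restored with wdg collections restore <backup-file>.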

Optimize Collection

bash
wdg collections optimize <collection-name>

Optimizes the collection for faster search:

  • Rebuilds HNSW index
  • Removes deleted vectors
  • Compacts storage
  • Updates statistics

Update Metadata

bash
wdg collections update <collection-name> <key> <value>

Examples:

bash
# Update description
wdg collections update project_my_site description "Client Website"

# Add custom metadata
wdg collections update project_my_site client_name "Acme Corp"

Search Collections

Search Within Collection

bash
wdg collections search <collection-name> "<query>" [--limit=10]

Example:

bash
wdg collections search project_my_site "custom post type registration"

Output:

Search Results (3 found):

1. functions.php:45 (score: 0.89)
   function register_portfolio_cpt() {
       register_post_type('portfolio', [...]);
   }

2. inc/post-types.php:12 (score: 0.84)
   class CustomPostTypes {
       public function register_types() {
           // Register custom post types
       }
   }

3. lib/register.php:78 (score: 0.76)
   add_action('init', 'register_all_post_types');
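The score next to each result comes from the collection's distance metric. With Cosine, it is presumably the cosine similarity between the query embedding and the stored code embedding:

  • 1.0 means the vectors point in the same direction
  • 0 means they are unrelated (orthogonal)
  • Vector length does not affect the score

Below is a toy illustration, with 3-dimensional vectors standing in for real 768-dimensional embeddings. cosine_similarity is a hypothetical helper.

```shell
# Hypothetical helper: cosine similarity of two vectors given as
# comma-separated numbers. Toy 3-dimensional vectors stand in for real
# 768-dimensional embeddings; zero vectors are not handled.
cosine_similarity() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ",")
    if (split(b, y, ",") != n) { print "dimension mismatch" > "/dev/stderr"; exit 1 }
    for (i = 1; i <= n; i++) {
      dot += x[i] * y[i]   # dot product
      na += x[i] ^ 2       # squared length of a
      nb += y[i] ^ 2       # squared length of b
    }
    printf "%.2f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}

cosine_similarity "1,0,0" "1,0,0"   # prints 1.00 (same direction)
cosine_similarity "1,0,0" "0,1,0"   # prints 0.00 (orthogonal)
cosine_similarity "1,2,3" "2,4,6"   # prints 1.00 (length is ignored)
```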

Search Across Collections

bash
# Search all collections
wdg collections search-all "<query>"

# Search multiple collections
wdg collections search-multi "project_my_site,project_client_website" "<query>"

Collection Analytics

Usage Statistics

bash
wdg collections stats [collection-name]

For a single collection:

Statistics for project_my_site:

Storage:
  Vectors: 1,450
  Size on disk: 4.7 MB
  Average vector size: 3.2 KB
  Compression ratio: 85%

Performance:
  Average query time: 45ms
  Fastest query: 12ms
  Slowest query: 230ms
  Queries today: 156

Popular Searches:
  1. "custom post type" (23 queries)
  2. "email validation" (18 queries)
  3. "user authentication" (15 queries)

Content Breakdown:
  PHP: 890 vectors (61%)
  JavaScript: 380 vectors (26%)
  CSS: 110 vectors (8%)
  Other: 70 vectors (5%)

For all collections (run without a collection name):

Global Collection Statistics:

Total Collections: 4
Total Vectors: 21,468
Total Storage: 67.8 MB

Collection Sizes:
  wdg_framework: 48.2 MB (71%)
  project_client_website: 12.1 MB (18%)
  project_my_site: 4.7 MB (7%)
  platform: 2.8 MB (4%)

Query Performance:
  Average: 52ms
  P50: 45ms
  P95: 180ms
  P99: 350ms

Cache Performance:
  Hit rate: 42%
  Miss rate: 58%
  Cache size: 256 MB
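P50, P95, and P99 are percentiles of query latency. For example, 95% of queries finished at or below the P95 time. The average (52ms) sits above the median (45ms) because a tail of slow queries pulls the mean up.

Below is a sketch of the nearest-rank percentile method over a list of latencies. How wdg itself computes these figures is not documented here, and percentile is a hypothetical helper. The sample latencies are made up, apart from the 12ms and 230ms extremes borrowed from the single-collection stats above.

```shell
# Hypothetical helper: nearest-rank percentile of latencies (ms) read one
# per line from stdin. p must be an integer between 1 and 100.
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      rank = int((p * NR + 99) / 100)   # ceil(p/100 * N) for integer p
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

printf '%s\n' 12 45 38 47 52 230 41 | percentile 50   # prints 45
printf '%s\n' 12 45 38 47 52 230 41 | percentile 95   # prints 230
```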

Growth Tracking

bash
wdg collections growth [collection-name] [--period=week|month]

Shows collection growth over time:

Collection Growth: project_my_site

Week of 2024-10-07:
  Monday: +45 vectors
  Tuesday: +23 vectors
  Wednesday: +67 vectors
  Thursday: +12 vectors
  Friday: +89 vectors
  Weekend: +8 vectors

Total growth: +244 vectors (20% increase)
Average per day: 35 vectors

Projection: 1,694 vectors by end of next week
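The summary lines follow from the daily figures, assuming the current count is the 1,450 shown earlier:

  • Total: 45 + 23 + 67 + 12 + 89 + 8 = 244 new vectors
  • Starting count: 1,450 − 244 = 1,206, so the increase is 244 / 1,206 ≈ 20%
  • Per day: 244 / 7 days ≈ 35 (the weekend line covers two days)
  • Projection: 1,450 + 244 = 1,694, which is exactly one more week at the same rate

Below is a sketch that recomputes these values. growth_summary is a hypothetical helper, and it treats the deltas as one 7-day week.

```shell
# Hypothetical helper: recompute a weekly growth summary from the current
# vector count and the daily deltas. Treats the deltas as one 7-day week
# (the weekend figure covers two days) and projects linearly.
growth_summary() {
  current=$1
  shift
  printf '%s\n' "$@" | awk -v current="$current" '
    { total += $1 }
    END {
      start = current - total
      printf "Total growth: +%d vectors (%.0f%% increase)\n", total, total / start * 100
      printf "Average per day: %.0f vectors\n", total / 7
      printf "One more week at this rate: %d vectors\n", current + total
    }'
}

growth_summary 1450 45 23 67 12 89 8
# Total growth: +244 vectors (20% increase)
# Average per day: 35 vectors
# One more week at this rate: 1694 vectors
```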

Backup and Restore

Backup Collection

bash
wdg collections backup <collection-name> [--output=<path>]

Example:

bash
wdg collections backup project_my_site \
    --output=backups/my-site-$(date +%Y%m%d).qdrant

What's included:

  • All vectors
  • Metadata
  • Index structure
  • Configuration

Restore Collection

bash
wdg collections restore <backup-file>

Example:

bash
wdg collections restore backups/my-site-20241014.qdrant

Backup All Collections

bash
wdg collections backup-all [--output=<directory>]

Collection Maintenance

Cleanup Orphaned Vectors

bash
wdg collections cleanup <collection-name>

Removes vectors for files that no longer exist.
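Conceptually, an orphaned vector is one whose source path no longer exists on disk. The sketch below shows only that check, not how wdg implements cleanup. Given indexed paths relative to the project root, one per line, it prints the ones whose files are missing. How you obtain the indexed path list (for example, from an export) is up to you and is not specified here. find_orphaned_paths is a hypothetical helper.

```shell
# Hypothetical helper: read indexed source paths (relative to a project
# root) from stdin and print the ones whose files no longer exist.
find_orphaned_paths() {
  root=$1
  while IFS= read -r path; do
    [ -e "$root/$path" ] || printf '%s\n' "$path"
  done
}

# Usage: printf 'functions.php\nlib/register.php\n' | find_orphaned_paths /path/to/my-site
```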

Verify Collection Integrity

bash
wdg collections verify <collection-name>

Output:

Verifying collection: project_my_site

Checking vectors...
  ✓ All vectors valid
  ✓ No duplicate vectors
  ✓ Metadata consistent

Checking indexes...
  ✓ HNSW index intact
  ✓ Payload index valid
  ✓ Full-text index ok

Checking files...
  ✓ All source files exist
  ⚠ 3 vectors for deleted files
  → Run cleanup to remove orphaned vectors

Status: Healthy (with warnings)

Compact Collection

bash
wdg collections compact <collection-name>

Reduces storage by:

  • Removing deleted vectors
  • Optimizing index structures
  • Compressing metadata

Collection Migration

Export Collection

bash
wdg collections export <collection-name> --format=<json|csv|parquet> [--output=<path>]

Example:

bash
wdg collections export project_my_site --format=json \
    --output=exports/my-site-vectors.json

Import Collection

bash
wdg collections import <file> --name=<collection-name>

Example:

bash
wdg collections import exports/my-site-vectors.json \
    --name=project_my_site_restored

Merge Collections

bash
wdg collections merge <source1> <source2> --output=<new-collection>

Example:

bash
wdg collections merge project_old project_new \
    --output=project_combined
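A collection has a single vector size, so merging presumably requires both sources to have the same dimensions. For example, a 384-dimension collection like project_demo above cannot hold 768-dimension vectors.

Below is a pre-merge check sketch. It reads the Dimensions line from wdg collections info, assuming the output format shown under Get Collection Info. Both function names are hypothetical.

```shell
# Hypothetical helpers: read a collection's vector size from
# `wdg collections info` (format assumed from the sample output) and
# refuse to proceed when two collections differ.
collection_dimensions() {
  wdg collections info "$1" | awk '$1 == "Dimensions:" { print $2; exit }'
}

check_merge_compatible() {
  d1=$(collection_dimensions "$1")
  d2=$(collection_dimensions "$2")
  if [ -z "$d1" ] || [ "$d1" != "$d2" ]; then
    echo "cannot merge: $1 has ${d1:-?} dimensions, $2 has ${d2:-?}" >&2
    return 1
  fi
}

# Usage:
#   check_merge_compatible project_old project_new &&
#     wdg collections merge project_old project_new --output=project_combined
```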

Advanced Operations

Collection Snapshots

bash
# Create snapshot
wdg collections snapshot <collection-name> [--name=<snapshot-name>]

# List snapshots
wdg collections snapshots <collection-name>

# Restore from snapshot
wdg collections restore-snapshot <collection-name> <snapshot-name>

# Delete snapshot
wdg collections delete-snapshot <collection-name> <snapshot-name>

Vector Operations

bash
# Count vectors matching filter
wdg collections count <collection-name> --filter='type="function"'

# Find duplicate vectors
wdg collections duplicates <collection-name>

# Update vector metadata
wdg collections update-vectors <collection-name> \
    --filter='file_type="php"' \
    --set language="php8"

Configuration

Collection Settings

bash
# View collection config
wdg collections config <collection-name>

# Update settings
wdg collections config <collection-name> set <key> <value>

Common settings:

bash
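# Improve search accuracy (a larger ef_construct builds a better HNSW graph, at the cost of slower indexing)
wdg collections config project_my_site set \
    hnsw_ef_construct 200

# Reduce memory usage (a smaller m stores fewer links per node, at some cost to accuracy)
wdg collections config project_my_site set \
    hnsw_m 16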

# Enable compression
wdg collections config project_my_site set \
    compression true

Monitoring

Real-Time Collection Monitoring

bash
wdg collections monitor [collection-name]

Output:

Monitoring: project_my_site (Press Ctrl+C to exit)

14:23:15  Query: "custom post type" (47ms, 5 results)
14:23:42  Indexed: functions.php (+12 vectors)
14:24:08  Query: "email validation" (38ms, 3 results)
14:24:35  Optimized indexes (saved 450KB)
14:25:12  Query: "user auth" (52ms, 8 results)

Current stats:
  Vectors: 1,462 (+12 since start)
  Queries: 3 (avg 46ms)
  Size: 4.72 MB

Collection Alerts

bash
# Set up alerts
wdg collections alert <collection-name> \
    --threshold=size:100MB \
    --threshold=queries:1000 \
    --notify=email:admin@wdg.com
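You may want to check the same kind of size threshold on demand, for example from cron or CI. One way is to filter the list output directly. This sketch assumes the layout shown under List Collections: each collection name on its own unindented line, followed by indented fields with sizes in MB. collections_over_mb is a hypothetical helper.

```shell
# Hypothetical helper: print collections whose "Size:" exceeds a limit in
# MB, reading `wdg collections list` output (layout assumed from the sample).
collections_over_mb() {
  awk -v limit="$1" '
    /^[^ \t]/ && NF == 1 { name = $1 }   # bare collection-name line
    $1 == "Size:" && $3 == "MB" && $2 + 0 > limit + 0 { print name " (" $2 " MB)" }
  '
}

# Usage: wdg collections list | collections_over_mb 100
```

For the sample collections, a 100 MB limit prints nothing, since the largest (wdg_framework) is 48.2 MB.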

Troubleshooting

Collection Not Found

bash
# List available collections
wdg collections list

# Verify collection name
wdg collections info <collection-name>

# Recreate by re-indexing
wdg index <project-name>

Slow Queries

bash
# Optimize collection
wdg collections optimize <collection-name>

# Check stats
wdg collections stats <collection-name>

# Last resort: fully re-index (clears and regenerates all embeddings)
wdg collections reindex <collection-name>

Storage Issues

bash
# Check collection sizes
wdg collections list --sort=size

# Compact large collections
wdg collections compact <collection-name>

# Delete unused collections
wdg collections delete --inactive


Released under the MIT License.