Code Indexing
Automatic code indexing transforms your codebase into searchable knowledge, enabling AI assistants to understand and work with your WordPress projects using semantic search.
Overview
Code indexing in WDG:
- Automatic: Triggered by git commits and pulls
- Incremental: Only indexes changed files
- Semantic: Understands code meaning, not just keywords
- Project-Scoped: Each project has isolated collections
- Fast: Local embeddings require no external API calls
How It Works
Indexing Pipeline
%%{init: {'theme':'neutral'}}%%
sequenceDiagram
participant Git
participant Hook
participant Indexer
participant Parser
participant Embedder
participant Qdrant
Git->>Hook: git commit
Hook->>Indexer: Trigger with changed files
Indexer->>Parser: Parse code files
Parser->>Parser: Extract components
loop For each component
Parser->>Embedder: Generate embedding
Embedder->>Embedder: all-MiniLM-L6-v2
Embedder->>Qdrant: Store vector + metadata
end
Qdrant->>Git: Indexing complete
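The sequence above can be sketched in a few lines of Python. This is a minimal illustration, not WDG's actual indexer: `Component`, `run_pipeline`, `whole_file_parser`, and `toy_embed` are hypothetical names, the toy embedder stands in for all-MiniLM-L6-v2, and a plain dict stands in for Qdrant.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Component:
    """One indexable unit of code (hypothetical structure)."""
    file_path: str
    name: str
    content: str


def run_pipeline(
    files: dict[str, str],
    parse: Callable[[str, str], Iterable[Component]],
    embed: Callable[[str], list[float]],
    store: dict,
) -> int:
    """Parse each changed file, embed every component, store vector + metadata."""
    stored = 0
    for path, source in files.items():
        for comp in parse(path, source):
            store[(comp.file_path, comp.name)] = {
                "vector": embed(comp.content),
                "metadata": {
                    "file_path": comp.file_path,
                    "component_name": comp.name,
                },
            }
            stored += 1
    return stored


def whole_file_parser(path: str, source: str) -> Iterator[Component]:
    """Stand-in parser: treats the whole file as a single component."""
    yield Component(file_path=path, name=path, content=source)


def toy_embed(text: str) -> list[float]:
    """Stand-in embedder: NOT a real model, just two cheap text statistics."""
    return [float(len(text)), float(text.count("\n"))]


store: dict = {}
count = run_pipeline({"a.php": "<?php\n"}, whole_file_parser, toy_embed, store)
print(count, "component(s) stored")
```

Passing the parser and embedder in as callables keeps the loop identical whether the stand-ins or real implementations are plugged in.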
What Gets Indexed
PHP Files
Functions:
php
// Indexed as complete unit with context
function get_user_posts($user_id, $post_type = 'post') {
    global $wpdb;
    return $wpdb->get_results(
        $wpdb->prepare(
            "SELECT * FROM {$wpdb->posts}
             WHERE post_author = %d
             AND post_type = %s
             AND post_status = 'publish'",
            $user_id,
            $post_type
        )
    );
}
Classes:
php
// Indexed with all methods and properties
class CustomPostType {
    private $post_type;
    public function __construct($type) {
        $this->post_type = $type;
        $this->register();
    }
    public function register() {
        // Method implementation
    }
}
WordPress Hooks:
php
// Indexed with full context
add_action('init', function() {
    register_post_type('custom_type', [
        'public' => true,
        'supports' => ['title', 'editor']
    ]);
});
JavaScript Files
Functions:
javascript
// Regular functions
function validateEmail(email) {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);
}
// Arrow functions
const fetchUserData = async (userId) => {
  const response = await fetch(`/api/users/${userId}`);
  return response.json();
};
React Components:
jsx
import { useEffect, useState } from 'react';
// Indexed with props and hooks
function UserProfile({ userId, showEmail = false }) {
  const [user, setUser] = useState(null);
  useEffect(() => {
    fetchUserData(userId).then(setUser);
  }, [userId]);
  return (
    <div className="user-profile">
      <h2>{user?.name}</h2>
      {showEmail && <p>{user?.email}</p>}
    </div>
  );
}
CSS/SCSS Files
scss
// Chunked by logical sections
.user-profile {
  display: flex;
  padding: 2rem;
  &__header {
    font-size: 1.5rem;
    color: var(--primary-color);
  }
  @media (max-width: 768px) {
    flex-direction: column;
  }
}
Block Configurations
json
{
  "name": "wdg/custom-block",
  "title": "Custom Block",
  "category": "wdg-blocks",
  "attributes": {
    "content": {
      "type": "string",
      "default": ""
    }
  }
}
Indexing Strategies
Automatic Indexing (Recommended)
Git hooks automatically trigger indexing:
bash
# After committing changes
git add .
git commit -m "Add new feature"
# → Automatically indexes changed files
# After pulling updates
git pull origin main
# → Automatically indexes merged changes
Manual Indexing
bash
# Index Wikit framework
wdg index
# Index specific project
wdg index my-site
# Force re-index everything
wdg index my-site --force
# Update repositories and index
wdg index --update
Selective Indexing
bash
# Index only specific file types
wdg index my-site --types php,js
# Index specific directory
wdg index my-site --path wp-content/themes/custom-theme
# Exclude directories
wdg index my-site --exclude node_modules,vendor
Vector Collections
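Collection names appear to be derived from project names: in the listing under Collection Structure, the project `my-site` maps to `project_my_site`. The sketch below shows one way such a mapping could work. The function name `collection_name` and the exact sanitization rule (lowercase, non-alphanumeric runs replaced by underscores) are assumptions inferred from that listing, not WDG's documented behavior.

```python
import re


def collection_name(project: str) -> str:
    """Map a project name such as 'my-site' to 'project_my_site' (assumed rule)."""
    # Collapse every run of characters outside a-z / 0-9 into one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", project.lower()).strip("_")
    if not slug:
        raise ValueError("project name has no usable characters")
    return f"project_{slug}"


for project in ("my-site", "client-site", "demo"):
    print(project, "->", collection_name(project))
```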
Collection Structure
Each project gets its own collection in Qdrant:
Collections:
├── wdg_framework # Wikit core framework
├── project_my_site # Project: my-site
├── project_client_site # Project: client-site
└── project_demo # Project: demo
Vector Metadata
Each indexed code chunk includes rich metadata:
json
{
  "id": "abc123",
  "vector": [0.1, 0.2, 0.3, ...], // 384 dimensions
  "metadata": {
    "project": "my-site",
    "file_path": "wp-content/themes/custom/functions.php",
    "file_type": "php",
    "component_type": "function",
    "component_name": "get_user_posts",
    "line_start": 45,
    "line_end": 58,
    "language": "php",
    "content": "function get_user_posts($user_id...",
    "docblock": "Get all posts for a user...",
    "indexed_at": "2024-10-14T10:30:00Z",
    "commit_hash": "abc123def456",
    "branch": "main"
  }
}
Indexing Performance
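The disk figures in this section can be sanity-checked with simple arithmetic. The sketch below estimates the raw vector payload only, assuming 384-dimensional vectors (the dimension given in the metadata example) stored as 4-byte float32 values. The float32 storage format and the helper name `raw_vector_bytes` are assumptions.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as float32


def raw_vector_bytes(num_vectors: int, dimensions: int = 384) -> int:
    """Bytes needed for the vectors alone, excluding metadata and index overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


for label, count in [("framework", 15_000), ("typical project", 1_500)]:
    megabytes = raw_vector_bytes(count) / 1_000_000
    print(f"{label}: {count} vectors = {megabytes:.2f} MB of raw vector data")
```

Under these assumptions the raw vectors account for roughly half of the quoted disk space; the remainder would plausibly be stored metadata and index structures.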
Initial Indexing
bash
# Wikit Framework (~5,000 files)
Time: 5-7 minutes
Vectors created: ~15,000
Disk space: ~50MB
# Typical project (~500 files)
Time: 30-60 seconds
Vectors created: ~1,500
Disk space: ~5MB
Incremental Updates
bash
# Single file change
Time: <1 second
Vectors updated: 1-10
Overhead: Minimal
# Pull with 20 changed files
Time: 5-10 seconds
Vectors updated: 50-200
Overhead: Negligible
Performance Optimization
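The caching check in this section depends on having a stable hash per file. A minimal sketch of how that comparison could be done with a SHA-256 content hash follows. The names `content_hash` and `files_to_reindex`, and the in-memory dict used as the cache, are hypothetical; the real indexer may hash or persist differently.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(data).hexdigest()


def files_to_reindex(files: dict[str, bytes], cache: dict[str, str]) -> list[str]:
    """Return paths whose content differs from the cached hash, updating the cache."""
    changed = []
    for path, data in files.items():
        digest = content_hash(data)
        if cache.get(path) != digest:
            changed.append(path)
            cache[path] = digest
    return changed


cache: dict[str, str] = {}
print(files_to_reindex({"functions.php": b"<?php echo 1;"}, cache))  # first run: new file
print(files_to_reindex({"functions.php": b"<?php echo 1;"}, cache))  # unchanged: skipped
```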
Batch Processing:
python
# Indexer processes files in batches
batch_size = 32
embeddings = model.encode(
    code_chunks,
    batch_size=batch_size,
    show_progress_bar=True
)
Caching:
python
# Only re-index if file changed
if file_hash != cached_hash:
    index_file(file)
else:
    skip_file(file)
Search Capabilities
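Semantic search of the kind described in this section comes down to ranking stored vectors by their similarity to the query's embedding. The sketch below illustrates the idea with cosine similarity over hand-written three-dimensional toy vectors. The metric, the toy index, and the function names are assumptions for illustration; real vectors come from the embedding model and the ranking is done by the vector database.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vector: list[float], index: dict[str, list[float]], top_k: int = 3) -> list[str]:
    """Return the names of the top_k most similar entries, best match first."""
    scored = [(cosine_similarity(query_vector, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Toy vectors: the first axis loosely means "email validation", the last "HTTP request".
TOY_INDEX = {
    "is_valid_email": [0.9, 0.1, 0.0],
    "validateEmailAddress": [0.8, 0.2, 0.1],
    "wp_remote_get": [0.0, 0.1, 0.9],
}

print(search([1.0, 0.0, 0.0], TOY_INDEX, top_k=2))
```

Because similarity is computed on vectors rather than text, two differently named functions can rank close together when their embeddings point in a similar direction.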
Semantic Search
Find code by meaning, not just keywords:
bash
# Search query: "validate user email address"
# Finds:
- is_valid_email($email)
- validateEmailAddress(email)
- checkUserEmailFormat()
- /^[^\s@]+@[^\s@]+\.[^\s@]+$/
Cross-Language Search
bash
# Search: "fetch data from API"
# Finds across languages:
PHP: wp_remote_get($url)
JS: fetch(url).then(r => r.json())
JS: axios.get(url)
Pattern Recognition
bash
# Search: "custom post type registration"
# Finds all register_post_type() calls with context:
- Portfolio custom post type
- Testimonials CPT
- Events post type
- Product catalog
Code Chunking Strategy
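The examples in this section show one chunk per function or component. A naive way to produce function-level chunks from PHP source is to find each named `function` declaration and match braces until its body closes. The sketch below does that under loud assumptions: it ignores braces inside strings and comments, skips anonymous functions, and is not the parser WDG actually uses. `chunk_php_functions` is a hypothetical name.

```python
import re

# Matches the start of a named function declaration, e.g. "function foo(".
FUNCTION_START = re.compile(r"\bfunction\s+\w+\s*\(")


def chunk_php_functions(source: str) -> list[str]:
    """Return one chunk per outermost named function, using naive brace matching."""
    chunks = []
    consumed_until = 0  # end offset of the last emitted chunk
    for match in FUNCTION_START.finditer(source):
        if match.start() < consumed_until:
            continue  # declaration sits inside a chunk already emitted
        body_start = source.find("{", match.end())
        if body_start == -1:
            break
        depth = 0
        for pos in range(body_start, len(source)):
            char = source[pos]
            if char == "{":
                depth += 1
            elif char == "}":
                depth -= 1
                if depth == 0:
                    chunks.append(source[match.start():pos + 1])
                    consumed_until = pos + 1
                    break
    return chunks


SAMPLE = """function calculate_total($items) {
    $total = 0;
    foreach ($items as $item) {
        $total += $item->price;
    }
    return $total;
}

function apply_discount($total, $discount) {
    return $total * (1 - $discount);
}
"""

for number, chunk in enumerate(chunk_php_functions(SAMPLE), start=1):
    print(f"Chunk {number}: {chunk.splitlines()[0]}")
```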
PHP Chunking
php
// Chunk 1: Function with full body
function calculate_total($items) {
    $total = 0;
    foreach ($items as $item) {
        $total += $item->price;
    }
    return $total;
}
// Chunk 2: Separate function
function apply_discount($total, $discount) {
    return $total * (1 - $discount);
}
JavaScript Chunking
javascript
// Chunk 1: Component definition
function ProductCard({ product }) {
  return (
    <div className="product-card">
      <h3>{product.name}</h3>
      <p>{product.price}</p>
    </div>
  );
}
// Chunk 2: Helper function
const formatPrice = (price) => {
  return `$${price.toFixed(2)}`;
};
CSS Chunking
scss
// Chunk 1: Component styles
.product-card {
  display: flex;
  padding: 1rem;
  h3 {
    font-size: 1.2rem;
  }
}
// Chunk 2: Media queries
@media (max-width: 768px) {
  .product-card {
    flex-direction: column;
  }
}
Managing Collections
List Collections
bash
wdg collections list
Output:
Vector Database Collections:
wdg_framework
  Vectors: 15,234
  Size: 48.2 MB
  Last updated: 2024-10-14 09:15
project_my_site
  Vectors: 1,450
  Size: 4.7 MB
  Last updated: 2024-10-14 10:30
project_client_site
  Vectors: 3,892
  Size: 12.1 MB
  Last updated: 2024-10-13 16:45
Delete Collection
bash
# Delete project collection
wdg collections delete project_old_site
# Re-create by re-indexing
wdg index old-site
Git Hook Integration
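The hooks in this section pass only indexable files on to the indexer. The same extension filter is sketched here in Python for illustration; the hooks themselves use `grep`, and `indexable_files` is a hypothetical helper rather than part of WDG. Like the grep pattern, the check is case-sensitive.

```python
from pathlib import PurePosixPath

# Same extensions as the post-commit hook's grep pattern.
INDEXABLE_SUFFIXES = {".php", ".js", ".jsx", ".scss", ".css", ".json"}


def indexable_files(changed_files: list[str]) -> list[str]:
    """Keep only paths whose final extension is one the indexer handles."""
    return [
        path
        for path in changed_files
        if PurePosixPath(path).suffix in INDEXABLE_SUFFIXES
    ]


changed = ["functions.php", "README.md", "src/app.jsx", "style.min.css", "image.png"]
print(indexable_files(changed))
```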
Post-Commit Hook
bash
#!/bin/bash
# .git/hooks/post-commit
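# Get changed files in this commit; the first commit has no parent, so HEAD^ cannot be used there
if git rev-parse --verify --quiet HEAD^ >/dev/null; then
    CHANGED_FILES=$(git diff --name-only HEAD^ HEAD)
else
    CHANGED_FILES=$(git diff-tree --no-commit-id --name-only -r --root HEAD)
fi
# Filter for indexable files
INDEXABLE=$(echo "$CHANGED_FILES" | grep -E '\.(php|js|jsx|scss|css|json)$')
if [ -n "$INDEXABLE" ]; then
    echo "Indexing changed files..."
    # Quote the directory name so paths containing spaces survive word splitting
    wdg index "$(basename "$(pwd)")" --files "$INDEXABLE"
fi
Post-Merge Hook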
bash
#!/bin/bash
# .git/hooks/post-merge
# Get merged files
MERGED_FILES=$(git diff --name-only ORIG_HEAD HEAD)
# Index merged changes
if [ -n "$MERGED_FILES" ]; then
    echo "Indexing merged changes..."
    # Quote the directory name so paths containing spaces survive word splitting
    wdg index "$(basename "$(pwd)")"
fi
Installing Hooks
bash
# Hooks are automatically installed when:
# 1. Creating new project with --init-wikit
wdg create my-site --init-wikit
# 2. Adding repository to project
wdg my-site repo add https://github.com/client/repo
# 3. Manually install
cd projects/my-site/repositories/my-site
cp /path/to/wdg/hooks/* .git/hooks/
chmod +x .git/hooks/*
Indexing Best Practices
1. Commit Frequently
bash
# Each commit triggers incremental indexing
git commit -m "Add user authentication" # Indexes auth code
git commit -m "Add email validation" # Indexes validation
2. Use Descriptive Commits
bash
# Good: AI can understand context
git commit -m "Add custom post type for portfolio items"
# Bad: Less context for AI
git commit -m "Update code"
3. Structure Code Well
php
// Good: Clear function separation
function get_user() { }
function validate_user() { }
function save_user() { }
// Bad: Monolithic function (harder to search)
function handle_user() {
    // 200 lines of mixed logic
}
4. Include DocBlocks
php
/**
 * Calculate discounted price for user
 *
 * @param float $price Original price
 * @param int $user_id User ID for discount lookup
 * @return float Discounted price
 */
function calculate_discount($price, $user_id) {
    // Implementation
}
5. Regular Maintenance
bash
# Weekly: Update and re-index framework
wdg update
# Monthly: Clean up old collections
wdg collections list
wdg collections delete project_old_*
# Quarterly: Full re-index
wdg index --all --force
Troubleshooting
Indexing Not Triggering
bash
# Check if hooks are installed
ls -la .git/hooks/post-commit
# Verify hook is executable
chmod +x .git/hooks/post-commit
# Test hook manually
.git/hooks/post-commit
Slow Indexing
bash
# Check system resources
docker stats wdg-indexer
# Use faster model
# Edit .env: EMBEDDING_MODEL=all-MiniLM-L6-v2
# Restart indexer
docker-compose restart indexer
Missing Results
bash
# Verify collection exists
wdg collections list
# Check vector count
curl http://localhost:6333/collections/project_my_site
# Re-index if needed
wdg index my-site --force
Out of Disk Space
bash
# Check collection sizes
wdg collections list
# Delete old collections
wdg collections delete project_old_*
# Prune unused Docker data, including unused volumes
docker system prune --volumes
Advanced Configuration
Custom Embedding Model
bash
# Edit .env
EMBEDDING_MODEL=all-mpnet-base-v2 # Higher quality, slower
# or
EMBEDDING_MODEL=all-MiniLM-L6-v2 # Faster, default
# Restart indexer service
docker-compose restart indexer
# Re-index with new model
wdg index --all --force
Indexing Filters
Create .wdg/indexing.json in the project:
json
{
  "include": [
    "wp-content/themes/**/*.php",
    "wp-content/plugins/**/*.{php,js}"
  ],
  "exclude": [
    "**/node_modules/**",
    "**/vendor/**",
    "**/*.min.js",
    "**/dist/**"
  ],
  "chunk_size": 512,
  "overlap": 50
}
Custom Metadata
Add custom metadata to vectors:
python
# indexer/custom_metadata.py
def extract_metadata(file_path, content):
    metadata = {
        "project": get_project_name(),
        "file_path": file_path,
        "author": get_git_author(file_path),
        "last_modified": get_file_mtime(file_path),
        "custom_tags": extract_custom_tags(content)
    }
    return metadata
Integration with AI Assistants
The indexed code becomes instantly searchable by AI:
bash
# AI can now answer:
"Where do we register custom post types?"
"Show me how we handle user authentication"
"Find similar implementations of email validation"
"What Wikit blocks are used in this project?"
Monitoring Indexing
View Indexing Logs
bash
# Real-time logs
wdg logs indexer --follow
# Last 100 lines
wdg logs indexer --tail 100
Indexing Status
bash
# Overall status
wdg status
# Project-specific status
wdg status my-site
Next Steps:
- Learn about Local Embeddings
- Configure MCP Server
- Review Git Hooks