Configuration Best Practices

This guide presents best practices, patterns, and recommendations for creating maintainable, secure, and efficient Xec configurations.

Configuration Organization

File Structure

Organize your configuration files logically:

.xec/
├── config.yaml           # Main configuration
├── profiles/            # Environment profiles
│   ├── dev.yaml
│   ├── staging.yaml
│   └── production.yaml
├── tasks/              # Reusable task definitions
│   ├── deploy.yaml
│   ├── backup.yaml
│   └── monitoring.yaml
├── targets/            # Target configurations
│   ├── aws.yaml
│   └── on-premise.yaml
└── scripts/            # Custom scripts
    ├── health-check.js
    └── data-migration.js

Modular Configuration

Break large configurations into modules:

# .xec/config.yaml
version: "1.0"
name: myapp

# Import modular configurations
$import:
  - tasks/*.yaml
  - profiles/*.yaml
  - targets/*.yaml

# Keep main config focused
vars:
  appName: myapp
  version: "2.0.0"

Separation of Concerns

# Separate by functionality
tasks:
  $import:
    - tasks/deployment.yaml    # Deployment tasks
    - tasks/maintenance.yaml   # Maintenance tasks
    - tasks/monitoring.yaml    # Monitoring tasks

Naming Conventions

Consistent Naming

# Good - consistent naming pattern
tasks:
  deploy-web:
  deploy-api:
  deploy-worker:
  
  backup-database:
  backup-files:
  backup-configs:

# Bad - inconsistent naming
tasks:
  deployWeb:
  api_deploy:
  worker-dpl:

Descriptive Names

# Good - self-documenting
targets:
  hosts:
    production-web-server:
    staging-database-primary:
    development-cache:

# Bad - cryptic names
targets:
  hosts:
    pws1:
    sdb-p:
    d-cache:

Hierarchical Naming

tasks:
  # Group related tasks with prefixes
  db:backup:
  db:restore:
  db:migrate:
  
  app:deploy:
  app:rollback:
  app:health-check:

Variable Management

Variable Hierarchy

# 1. Global defaults
vars:
  timeout: 30000
  retries: 3

# 2. Environment-specific
profiles:
  production:
    vars:
      timeout: 60000  # Override for production
      
# 3. Task-specific
tasks:
  long-running:
    vars:
      timeout: 3600000  # Task-specific override

Variable Documentation

vars:
  # API Configuration
  # ==================
  
  # Base URL for API endpoints (must include protocol)
  apiUrl: https://api.example.com
  
  # API version to use (format: vX)
  apiVersion: v2
  
  # Maximum retry attempts for failed API calls
  apiMaxRetries: 3
  
  # Timeout for API calls in milliseconds
  apiTimeout: 10000

Default Values

vars:
  # Always provide defaults
  port: ${env.PORT:-8080}
  environment: ${env.NODE_ENV:-development}
  logLevel: ${env.LOG_LEVEL:-info}
  
  # Computed defaults
  workers: ${env.WORKERS:-${runtime.cpus}}

Security Best Practices

Secret Management

# Good - use secret management
vars:
  apiKey: ${secrets.api_key}
  dbPassword: ${secrets.database_password}
  sshKey: ${secrets.deploy_key}

# Bad - hardcoded secrets
vars:
  apiKey: "sk-1234567890abcdef"  # NEVER DO THIS
  dbPassword: "password123"       # SECURITY RISK

Secure Defaults

targets:
  defaults:
    ssh:
      strictHostKeyChecking: yes
      passwordAuthentication: no
      pubkeyAuthentication: yes
    
    docker:
      privileged: false
      readonlyRootfs: true
      noNewPrivileges: true

Principle of Least Privilege

targets:
  hosts:
    web-server:
      username: deploy  # Not root
      sudo:
        enabled: false  # Only if needed
    
  containers:
    app:
      user: nobody    # Non-root user
      capDrop: [ALL]  # Drop all capabilities

Audit Trail

tasks:
  sensitive-operation:
    hooks:
      before:
        - command: echo "$(date) - ${env.USER} executing sensitive operation" >> audit.log
      after:
        - command: echo "$(date) - Operation completed with exit code $?" >> audit.log
    command: sensitive-command

Error Handling

Graceful Failures

tasks:
  resilient-deploy:
    steps:
      - name: Health check
        command: check-health
        onFailure:
          retry: 3
          delay: 10s
      
      - name: Deploy
        command: deploy-app
        onFailure:
          task: rollback
      
      - name: Cleanup
        command: cleanup-temp
        onFailure: continue  # Non-critical
        alwaysRun: true

Validation Steps

tasks:
  safe-deploy:
    steps:
      - name: Validate configuration
        command: validate-config
        onFailure: abort
      
      - name: Check prerequisites
        command: check-deps
        onFailure: abort
      
      - name: Deploy
        command: deploy

Error Recovery

tasks:
  with-recovery:
    steps:
      - name: Create backup
        command: create-backup
        register: backup_path
      
      - name: Risky operation
        command: modify-data
        onFailure:
          command: restore-backup ${backup_path}

Performance Optimization

Connection Pooling

targets:
  hosts:
    api-server:
      connectionPool:
        min: 2
        max: 10
        idleTimeout: 300000
      keepAlive: true
      keepAliveInterval: 30000

Parallel Execution

tasks:
  parallel-deploy:
    parallel: true
    maxConcurrent: 5
    failFast: false  # Continue even if one fails
    targets:
      - hosts.web-1
      - hosts.web-2
      - hosts.web-3

Resource Limits

targets:
  containers:
    worker:
      memory: 512m
      cpus: "0.5"
      pidsLimit: 100

Caching

tasks:
  expensive-operation:
    cache:
      key: "result-${params.date}"
      ttl: 3600
      storage: redis
    command: generate-report

Environment Management

Environment Isolation

profiles:
  development:
    vars:
      database: dev_db
      apiUrl: http://localhost:3000
    targets:
      containers:
        db:
          image: postgres:15
          ports: ["5432:5432"]
  
  production:
    vars:
      database: prod_db
      apiUrl: https://api.example.com
    targets:
      hosts:
        db:
          host: db.example.com

Environment Validation

tasks:
  validate-environment:
    script: |
      const required = {
        development: ['DEBUG', 'LOCAL_DB'],
        production: ['API_KEY', 'DB_PASSWORD']
      };
      
      const profile = xec.profile;
      const missing = required[profile].filter(
        key => !process.env[key]
      );
      
      if (missing.length > 0) {
        throw new Error(`Missing: ${missing.join(', ')}`);
      }

Documentation

Task Documentation

tasks:
  complex-migration:
    description: |
      Database Migration Task
      =======================
      
      Performs a complete database migration with validation.
      
      Prerequisites:
      - Database backup completed
      - All services stopped
      - Migration scripts reviewed
      
      Duration: ~15 minutes
      Risk Level: HIGH
      
      Recovery: Run 'rollback-migration' task if failed
    steps:
      # Implementation...

Inline Comments

targets:
  hosts:
    legacy-server:
      host: old.example.com
      # DEPRECATED: Will be removed in v3.0
      # Migration deadline: 2024-12-31
      # Contact: ops-team@example.com

README Integration

# .xec/README.md
tasks:
  help:
    script: |
      const readme = await fs.readFile('.xec/README.md');
      console.log(readme);

Testing Configuration

Configuration Validation

# Validate before deployment
xec config validate
xec config validate --profile production

Dry Run Testing

tasks:
  test-deploy:
    description: Test deployment without making changes
    steps:
      - command: deploy --dry-run
      - command: validate-deployment --dry-run

Smoke Tests

tasks:
  smoke-test:
    steps:
      - command: curl -f http://localhost/health
      - command: check-database-connection
      - command: verify-critical-services

Version Control

Configuration as Code

# Version control your configuration
git add .xec/
git commit -m "feat: Add production deployment configuration"
git tag -a v1.0.0 -m "Initial configuration release"

Change Management

# Track configuration changes
vars:
  configVersion: "2.1.0"
  lastModified: "2024-01-15"
  approvedBy: "platform-team"

Migration Path

# Support multiple versions
tasks:
  migrate-config:
    script: |
      const version = vars.configVersion;
      if (version < "2.0.0") {
        await xec.run('migrate-v1-to-v2');
      }
      if (version < "3.0.0") {
        await xec.run('migrate-v2-to-v3');
      }

Common Patterns

Blue-Green Deployment

vars:
  activeColor: blue
  inactiveColor: green

tasks:
  blue-green-deploy:
    steps:
      - name: Deploy to inactive
        command: deploy --target ${inactiveColor}
      
      - name: Test inactive
        command: test --target ${inactiveColor}
      
      - name: Switch traffic
        command: switch-traffic --to ${inactiveColor}
      
      - name: Update active color
        script: |
          vars.activeColor = vars.inactiveColor;
          vars.inactiveColor = vars.activeColor;

Canary Deployment

tasks:
  canary-deploy:
    params:
      - name: percentage
        type: number
        min: 1
        max: 100
    steps:
      - command: deploy --canary
      - command: route-traffic --percentage ${params.percentage}
      - command: monitor --duration 5m
      - command: |
          if [ $? -eq 0 ]; then
            route-traffic --percentage 100
          else
            rollback
          fi

Circuit Breaker

tasks:
  with-circuit-breaker:
    vars:
      maxFailures: 3
      resetTimeout: 60000
    script: |
      const failures = cache.get('failures') || 0;
      
      if (failures >= vars.maxFailures) {
        const lastFailure = cache.get('lastFailure');
        if (Date.now() - lastFailure < vars.resetTimeout) {
          throw new Error('Circuit breaker open');
        }
        cache.set('failures', 0);
      }
      
      try {
        await execute();
        cache.set('failures', 0);
      } catch (error) {
        cache.set('failures', failures + 1);
        cache.set('lastFailure', Date.now());
        throw error;
      }

Anti-Patterns to Avoid

1. Configuration Sprawl

# Bad - everything in one file
# 1000+ line config.yaml

# Good - modular configuration
# Split into logical modules

2. Magic Values

# Bad - unexplained values
timeout: 47382

# Good - documented and clear
timeout: 45000  # 45 seconds (API gateway timeout - 5s)

3. Tight Coupling

# Bad - hardcoded dependencies
tasks:
  deploy:
    command: ssh user@192.168.1.100 deploy

# Good - use targets
tasks:
  deploy:
    command: deploy
    target: hosts.production

4. Overengineering

# Bad - unnecessary complexity
vars:
  value: ${fn.compute(fn.transform(fn.parse(env.VALUE)))}

# Good - simple and clear
vars:
  value: ${env.VALUE:-default}

Monitoring and Observability

Metrics Collection

tasks:
  monitored:
    metrics:
      enabled: true
      collector: prometheus
      labels:
        service: api
        environment: ${profile}

Logging

commands:
  logs:
    tail: 100
    follow: true
    timestamps: true
    prefix: true

Health Checks

tasks:
  health-check:
    schedule: "*/5 * * * *"  # Every 5 minutes
    command: |
      curl -f http://localhost/health || alert

Next Steps

Validation - Configuration validation
Troubleshooting - Common issues

Configuration Organization​

File Structure​

Modular Configuration​

Separation of Concerns​

Naming Conventions​

Consistent Naming​

Descriptive Names​

Hierarchical Naming​

Variable Management​

Variable Hierarchy​

Variable Documentation​

Default Values​

Security Best Practices​

Secret Management​

Secure Defaults​

Principle of Least Privilege​

Audit Trail​

Error Handling​

Graceful Failures​

Validation Steps​

Error Recovery​

Performance Optimization​

Connection Pooling​

Parallel Execution​

Resource Limits​

Caching​

Environment Management​

Environment Isolation​

Environment Validation​

Documentation​

Task Documentation​

Inline Comments​

README Integration​

Testing Configuration​

Configuration Validation​

Dry Run Testing​

Smoke Tests​

Version Control​

Configuration as Code​

Change Management​

Migration Path​

Common Patterns​

Blue-Green Deployment​

Canary Deployment​

Circuit Breaker​

Anti-Patterns to Avoid​

1. Configuration Sprawl​

2. Magic Values​

3. Tight Coupling​

4. Overengineering​

Monitoring and Observability​

Metrics Collection​

Logging​

Health Checks​

Next Steps​

See Also​