Skip to content

Federation Best Practices and Troubleshooting

This page covers operational best practices and troubleshooting for Directory federation. It applies to both https_web and https_spiffe profiles. For profile selection and configuration details, see Federation Bundle Profiles.

Operational Best Practices

Universal Best Practices

  • Consistent className configuration

    • Use standardized className value across all resources (recommended: "dir-spire")
    • Ensures SPIRE Controller Manager correctly processes ClusterSPIFFEID resources
    • Prevents controller filtering issues
  • Enable ClusterFederatedTrustDomain Controller

    • Required for automatic trust bundle fetching
    • Without this controller, federation configuration is ignored
    • Verify in controller manager logs
  • Implement Bundle Refresh Monitoring

    • Monitor SPIRE server logs for "Bundle refreshed" events
    • Alert on bundle staleness exceeding 24 hours
    • Implement automated monitoring:
    kubectl logs -n dir-spire -l app.kubernetes.io/name=server -c spire-server | \
      grep "Bundle refreshed" | \
      grep "trust_domain=partner.com"
    
  • Configure Rate Limiting

    • Protect federation endpoints from abuse
    • Recommended configuration: 100 requests/second with 5x burst multiplier
    • Adjust based on number of federation partners and refresh frequency
  • Set Appropriate CA TTL

    • Recommended: 720 hours (30 days)
    • Balances security (shorter TTL) with operational stability (fewer rotations)
    • Consider operational capacity for handling rotation issues

https_web Specific Practices

  • Use Production Certificate Authority

    • Deploy with production Let's Encrypt (letsencrypt ClusterIssuer)
    • Avoid staging certificates in production federation scenarios
    • Staging certificates are not trusted by remote SPIRE servers by default
  • Monitor Certificate Expiration

    • Although cert-manager auto-renews, implement monitoring
    • Alert on certificates expiring within 7 days
    • Verify renewal process during certificate lifecycle:
    kubectl get certificate -n dir-spire spire-server-federation-cert -o yaml | \
      grep -A5 "status:"
    
  • Validate External Network Accessibility

    • Periodically test federation endpoint from external network
    • Verify DNS resolution from federation partner networks
    • Validate certificate chain visibility:
    curl -v https://spire.example.com 2>&1 | grep -A10 "SSL certificate"
    

https_spiffe Specific Practices

  • Maintain Bootstrap Bundle Freshness

    • Update bootstrap bundles before CA rotation (every caTTL period)
    • Implement bundle distribution automation where feasible
    • Document manual update procedures for operational staff
  • Document Bundle Exchange Procedures

    • Establish clear onboarding process for new federation partners
    • Define secure channels for bundle transmission
    • Maintain contact registry for federation partner technical contacts
  • Validate SSL Passthrough Persistence

    • Verify ssl-passthrough configuration after cluster maintenance
    • Test after NGINX ingress controller upgrades
    • Automate validation:
    kubectl get deployment -n ingress-nginx ingress-nginx-controller -o yaml | \
      grep "enable-ssl-passthrough" || echo "WARNING: SSL passthrough not enabled"
    

Troubleshooting

Connection Issues

# Check SPIRE agent status
spire-agent api fetch x509-svid

# Verify network connectivity
curl -v https://prod.api.ads.outshift.io

# Verify env vars are set
echo $DIRECTORY_CLIENT_SERVER_ADDRESS $DIRECTORY_CLIENT_SPIFFE_SOCKET_PATH

Federation Issues

# Verify trust bundle exchange
spire-server federation show --trustDomain prod.ads.outshift.io

# Test bundle endpoint
curl https://prod.spire.ads.outshift.io/

Common Errors

Error Solution
connection refused Check SPIRE agent running and socket path
x509: certificate signed by unknown authority Verify trust bundle configuration
context deadline exceeded Check network and firewall
permission denied Ensure SPIFFE ID registration and policies

https_web Profile Issues

Issue: "certificate is valid for ingress.local, not spire.example.com"

Root Cause: NGINX serving default certificate instead of cert-manager issued certificate.

Resolution: Ensure tlsSecret is set under spire-server.ingress in your Helm values:

# Under spire-server.ingress in values
ingress:
  tlsSecret: "spire-server-federation-cert"

Verification:

kubectl get certificate -n dir-spire spire-server-federation-cert
# Status should be READY=True

Issue: "certificate signed by unknown authority"

Root Cause: cert-manager certificate not issued or ClusterIssuer misconfigured.

Resolution:

# Check Certificate status
kubectl describe certificate -n dir-spire spire-server-federation-cert

# Check ClusterIssuer status
kubectl get clusterissuer letsencrypt -o yaml

# Review cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager

Issue: "no endpoints available for service"

Root Cause: Ingress not routing traffic to SPIRE server service.

Resolution:

# Verify Ingress status
kubectl get ingress -n dir-spire

# Check service endpoints
kubectl get endpoints -n dir-spire spire-server

# Verify SPIRE server pods are running
kubectl get pods -n dir-spire -l app.kubernetes.io/name=server

https_spiffe Profile Issues

Issue: "certificate contains no URI SAN"

Root Cause: SPIRE server has not issued itself a federation X.509-SVID.

Resolution:

# Restart SPIRE server to trigger SVID issuance
kubectl rollout restart -n dir-spire deployment/spire-server

# Wait for rollout to complete
kubectl rollout status -n dir-spire deployment/spire-server

# Verify federation SVID issuance in logs
kubectl logs -n dir-spire -l app.kubernetes.io/name=server -c spire-server | \
  grep "federation"

Issue: "certificate signed by unknown authority"

Root Cause: Bootstrap bundle is missing, stale, or incorrectly formatted.

Resolution: Obtain a fresh bundle from your federation partner, validate it, and update trustDomainBundle in your federation resource. See Bootstrap Bundle Exchange Process for extraction and validation steps:

# Obtain fresh bundle from federation partner
curl -k https://spire.partner.com > fresh-partner-bundle.json

# Validate bundle structure
jq '.' fresh-partner-bundle.json

# Update trustDomainBundle in federation configuration
# Redeploy application

Issue: "ssl-passthrough annotation not taking effect"

Root Cause: NGINX ingress controller not configured with --enable-ssl-passthrough flag.

Resolution:

# Verify controller has ssl-passthrough enabled
kubectl get deployment -n ingress-nginx ingress-nginx-controller -o yaml | \
  grep "enable-ssl-passthrough"

# If not present, patch deployment
kubectl patch deployment -n ingress-nginx ingress-nginx-controller --type='json' \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value":"--enable-ssl-passthrough"}]'

# Verify configuration persisted after controller restart