Advanced Tips: Tuning Claspfolio Parameters for Faster Results

Claspfolio is a solver portfolio framework built around the ASP (Answer Set Programming) solver clasp. It combines multiple solver configurations and selector strategies to choose the most promising configuration for each instance, improving overall performance across heterogeneous problem sets. Fine-tuning claspfolio can yield significant speedups on benchmark suites and real-world workloads. This article covers principled approaches, practical tips, and configuration examples to squeeze better performance from claspfolio.
1. Understand the Components of Claspfolio
Before tuning, know the parts you can control:
- Feature extractor — computes instance features used by the selector.
- Feature selection/normalization — decides which features to use and how to scale them.
- Portfolio — set of candidate solver configurations.
- Selector model — machine learning model mapping features to choices (e.g., regression, classification, kNN).
- Pre-solving schedule — short runs of selected configurations before the full selector is consulted.
- Time management — cutoff times for pre-solving, per-config runs, and full runs.
Tuning any of these can change throughput and robustness.
2. Measure Before Tuning
- Build a baseline using your typical dataset and default claspfolio settings.
- Collect per-instance runtimes, solved/unsolved counts, PAR10 (penalized average runtime), and feature extraction time.
- Keep a reproducible environment (fixed seeds where applicable).
Concrete metrics to track: PAR10, solved instances, mean runtime, feature extraction overhead.
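For reference, here is a minimal sketch of how PAR10 and the other baseline metrics above can be computed from per-instance results. The (runtime, solved) tuple format is an assumption about how you log results, not a claspfolio output format.

```python
# Compute baseline metrics from per-instance results.
# Assumed input: list of (runtime_seconds, solved) pairs; "cutoff" is the
# per-instance time limit used in the experiments.

def summarize(results, cutoff):
    """Return PAR10, solved count, and mean runtime over solved instances."""
    par10_scores = []
    solved_runtimes = []
    for runtime, solved in results:
        if solved and runtime <= cutoff:
            par10_scores.append(runtime)
            solved_runtimes.append(runtime)
        else:
            # Unsolved or timed-out instances are penalized with 10x the cutoff.
            par10_scores.append(10 * cutoff)
    return {
        "PAR10": sum(par10_scores) / len(par10_scores),
        "solved": len(solved_runtimes),
        "mean_runtime_solved": (sum(solved_runtimes) / len(solved_runtimes)
                                if solved_runtimes else float("nan")),
    }

# Example: three instances with a 600 s cutoff.
print(summarize([(12.3, True), (598.0, True), (600.0, False)], cutoff=600))
```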
3. Optimize Feature Extraction
Feature computation can be costly and may negate portfolio benefits if too slow.
- Use a lightweight feature set when instance sizes are large or features are expensive.
- Profile feature extraction time per instance; remove features that are slow and low-informative.
- Cache features for repeated instances or batched runs.
- For time-critical systems, consider using only static syntactic features (counts of atoms, rules) instead of dynamic probing features.
Tip: If feature extraction averages more than 5–10% of your cutoff per instance, prioritize trimming it.
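To make the profiling and caching advice concrete, here is a small sketch. `cheap_static_features` is a toy stand-in for a real extractor (e.g. claspre or your own script), and the cache location is illustrative.

```python
import json
import re
import time
from pathlib import Path

CACHE_DIR = Path("feature_cache")  # illustrative cache location
CACHE_DIR.mkdir(exist_ok=True)

def cheap_static_features(instance_path):
    """Toy static syntactic features: crude rule and token counts.

    Stands in for a real feature extractor; adapt to your pipeline.
    """
    text = Path(instance_path).read_text()
    return {
        "num_rules": text.count("."),
        "num_lines": text.count("\n") + 1,
        "num_tokens": len(re.findall(r"[a-z]\w*", text)),
    }

def features_with_cache(instance_path, extractor=cheap_static_features):
    """Return (features, extraction_seconds); results are cached per instance."""
    cache_file = CACHE_DIR / (Path(instance_path).name + ".json")
    if cache_file.exists():
        return json.loads(cache_file.read_text()), 0.0
    start = time.perf_counter()
    features = extractor(instance_path)
    elapsed = time.perf_counter() - start
    cache_file.write_text(json.dumps(features))
    return features, elapsed
```

Logging the returned extraction time against your cutoff makes the 5–10% rule of thumb easy to check per instance.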
4. Feature Selection and Normalization
Irrelevant or redundant features reduce model quality.
- Use automatic methods: recursive feature elimination, LASSO, or tree-based importance scores to prune features.
- Normalize numeric features (z-score or min-max) so models like kNN or SVM behave properly.
- Group or discretize highly skewed features (e.g., log-transform clause counts).
Example pipeline:
- Remove features with near-zero variance.
- Apply log(x+1) to count-based features.
- Standardize to zero mean and unit variance.
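Assuming the features live in a NumPy matrix, the example pipeline maps directly onto scikit-learn. This is a sketch, not claspfolio's built-in preprocessing.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

preprocess = Pipeline([
    # 1. Drop features with (near-)zero variance.
    ("variance", VarianceThreshold(threshold=1e-6)),
    # 2. log(x + 1) for count-based, heavily skewed features.
    #    Safe here only if all features are non-negative; otherwise restrict
    #    the transform to count columns, e.g. via a ColumnTransformer.
    ("log", FunctionTransformer(np.log1p)),
    # 3. Standardize to zero mean and unit variance.
    ("scale", StandardScaler()),
])

# X_train: instances x features matrix of raw feature values.
# X_train_prepared = preprocess.fit_transform(X_train)
```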
5. Build an Effective Portfolio
The choice and diversity of configurations are central.
- Start with a mix: default clasp settings, aggressive search, cautious search, restarts-heavy, different preprocessing options.
- Use automated configuration tools (e.g., SMAC, irace) to generate high-performing configurations on your training set.
- Keep portfolio size moderate: larger portfolios increase selector complexity and risk overfitting; 8–20 diverse configurations is a common sweet spot.
- Remove dominated configurations — those never selected or always worse than another config (a concrete check is sketched after the list below).
Example candidate configurations:
- default
- long-restarter + aggressive heuristics
- nogood learning focused
- preprocessing-heavy
- assumption-based tactics
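A simple way to detect the "always worse than another config" part of that pruning rule is to compare per-instance runtimes directly. The dominance test below runs over a runtime matrix collected on your training set; it is an illustration, not a claspfolio feature.

```python
import numpy as np

def dominated_configs(runtimes, config_names):
    """Return names of configurations dominated by some other configuration.

    runtimes: array of shape (n_instances, n_configs).
    Config k dominates config j if k is at least as fast on every instance
    and strictly faster on at least one.
    """
    n_configs = runtimes.shape[1]
    dominated = []
    for j in range(n_configs):
        for k in range(n_configs):
            if j == k:
                continue
            if (np.all(runtimes[:, k] <= runtimes[:, j])
                    and np.any(runtimes[:, k] < runtimes[:, j])):
                dominated.append(config_names[j])
                break
    return dominated
```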
6. Choose and Tune the Selector
Selector choice depends on dataset size and feature quality.
- k-Nearest Neighbors (kNN) is simple, robust, and often strong with small datasets.
- Regression models predict the runtime of each configuration; the configuration with the best predicted runtime is chosen. Random Forests or Gradient Boosting work well.
- Classification (predict best config directly) can be effective when labels are clear.
- Pairwise or per-configuration models scale better with large portfolios.
Tuning tips:
- For kNN: tune k, distance metric, and feature weighting.
- For trees/forests: tune tree depth, number of trees, and minimum samples per leaf.
- Use cross-validation with time-based or instance-split folds to avoid overfitting.
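Here is a sketch of the regression approach with per-configuration Random Forests, including a small hyperparameter search. scikit-learn is assumed, as is a data layout with X holding preprocessed features and runtimes holding one column per portfolio configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

def train_selectors(X, runtimes, log_target=True):
    """Train one runtime-regression model per portfolio configuration."""
    models = []
    for j in range(runtimes.shape[1]):
        # Log-transforming runtimes often stabilizes the regression target.
        y = np.log1p(runtimes[:, j]) if log_target else runtimes[:, j]
        search = GridSearchCV(
            RandomForestRegressor(random_state=0),
            param_grid={"n_estimators": [100, 300],
                        "min_samples_leaf": [1, 5, 10]},
            cv=KFold(n_splits=5, shuffle=True, random_state=0),
            scoring="neg_mean_squared_error",
        )
        search.fit(X, y)
        models.append(search.best_estimator_)
    return models

def select_config(models, x):
    """Pick the configuration with the smallest predicted (log-)runtime."""
    preds = [m.predict(x.reshape(1, -1))[0] for m in models]
    return int(np.argmin(preds))
```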
7. Pre-solving Schedules
A short pre-solving schedule can catch easy instances quickly.
- Construct short runs of a few strong configurations (e.g., 1–3 seconds each) before running the selector.
- Keep pre-solving budget small (5–10% of total cutoff), especially when feature extraction is cheap.
- Use instance-agnostic heuristics for pre-solving (configurations that solve many easy instances).
Example schedule: config A for 2s, config B for 3s, then feature extraction + selector for remaining time.
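A wrapper like the following can drive such a schedule. It assumes clasp's --time-limit option and its conventional exit codes (10/20/30); adapt the flag and return-code handling to your installation.

```python
import subprocess

def run_config(config_args, instance, time_limit):
    """Run clasp with one configuration; return True if it decided the instance."""
    try:
        proc = subprocess.run(
            ["clasp", f"--time-limit={int(time_limit)}", *config_args, instance],
            capture_output=True, text=True, timeout=time_limit + 5,
        )
    except subprocess.TimeoutExpired:
        return False
    # Conventionally: 10 = SAT, 20 = UNSAT, 30 = SAT with search exhausted;
    # treat anything else (including timeouts) as "not decided".
    return proc.returncode in (10, 20, 30)

def presolve_then_select(instance, schedule, select_and_run):
    """schedule: list of (config_args, seconds) pairs, e.g. [(A, 2), (B, 3)]."""
    for config_args, seconds in schedule:
        if run_config(config_args, instance, seconds):
            return True  # solved during pre-solving
    # Feature extraction + selector take over for the remaining budget.
    return select_and_run(instance)
```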
8. Manage Timeouts and Cutoffs
- Set sensible per-configuration timeouts to avoid wasting time on hard instances.
- Implement adaptive cutoffs: if selector confidence is low, allocate slightly more pre-solving budget; if it is confident, launch the full run immediately.
- Consider incremental solving (run a short run, then extend it) to allow fast successes while keeping the ability to recover on harder instances.
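One way to express the adaptive-cutoff idea in code, assuming the selector exposes some confidence score (for instance, the normalized gap between the best and second-best predicted runtimes); the fractions and threshold here are illustrative.

```python
def allocate_budget(total_cutoff, confidence,
                    base_presolve_fraction=0.05,
                    low_confidence_fraction=0.15,
                    threshold=0.2):
    """Split the cutoff between pre-solving and the selected full run.

    confidence: e.g. normalized gap between the best and second-best
    predicted runtimes; higher means the selector is more certain.
    """
    fraction = (base_presolve_fraction if confidence >= threshold
                else low_confidence_fraction)
    presolve_budget = fraction * total_cutoff
    return presolve_budget, total_cutoff - presolve_budget

# Example: 600 s cutoff, uncertain selector -> longer pre-solving phase.
print(allocate_budget(600, confidence=0.1))
```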
9. Reduce Overfitting and Improve Generalization
- Use nested cross-validation or holdout sets from different instance distributions.
- Regularize models and limit portfolio size to reduce variance.
- Evaluate on time-shifted or domain-shifted instances to ensure robustness.
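If your instances carry a domain or family label, grouped cross-validation keeps whole families out of the training folds and approximates domain shift. A sketch using scikit-learn's GroupKFold, where `domains` is an assumed per-instance label from your metadata:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, cross_val_score

# X: feature matrix, y: runtimes of one configuration,
# domains: family/domain label per instance.
def domain_shifted_cv(X, y, domains, n_splits=5):
    """Score a runtime predictor with whole instance families held out."""
    cv = GroupKFold(n_splits=n_splits)
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    return cross_val_score(model, X, y, groups=domains, cv=cv,
                           scoring="neg_mean_absolute_error")
```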
10. Automation and Continuous Improvement
- Automate configuration generation and evaluation with tools like SMAC or irace, combined with periodic retraining of selectors.
- Maintain logs of instance features and solver outcomes to continually refine models.
- Periodically re-evaluate feature importance and prune or replace low-value configurations.
11. Practical Example: A Tuning Workflow
- Collect a representative set of instances; split into train/validation/test.
- Measure baseline with default claspfolio.
- Profile feature extraction; prune heavy features.
- Generate candidate configurations with SMAC (limit to ~20).
- Train a Random Forest regression per-config runtime predictor.
- Create a pre-solving schedule of two strong configs for 3s total.
- Evaluate on validation; prune dominated configs.
- Retrain selector on final portfolio and evaluate on test set.
12. Common Pitfalls
- Letting costly features dominate the budget.
- Overfitting the selector to a small training set.
- Using too many similar configurations in the portfolio.
- Ignoring feature drift when instance distributions change.
13. Summary Recommendations
- Profile first: know where time is spent.
- Favor informative but cheap features.
- Use automated configurators to build diverse portfolios.
- Keep portfolios moderate in size and prune dominated configs.
- Use robust selectors (kNN or ensemble methods) with careful cross-validation.
- Add a short pre-solving schedule to catch easy instances.