Advanced Tips: Tuning Claspfolio Parameters for Faster Results

Claspfolio is a solver portfolio framework built around the ASP (Answer Set Programming) solver clasp. It combines multiple solver configurations and selector strategies to choose the most promising configuration for each instance, improving overall performance across heterogeneous problem sets. Fine-tuning claspfolio can yield significant speedups on benchmark suites and real-world workloads. This article covers principled approaches, practical tips, and configuration examples to squeeze better performance from claspfolio.
1. Understand the Components of Claspfolio
Before tuning, know the parts you can control:
- Feature extractor — computes instance features used by the selector.
- Feature selection/normalization — decides which features to use and how to scale them.
- Portfolio — set of candidate solver configurations.
- Selector model — machine learning model mapping features to choices (e.g., regression, classification, kNN).
- Pre-solving schedule — short runs of selected configurations before the full selector is consulted.
- Time management — cutoff times for pre-solving, per-config runs, and full runs.
Tuning any of these can change throughput and robustness.
2. Measure Before Tuning
- Build a baseline using your typical dataset and default claspfolio settings.
- Collect per-instance runtimes, solved/unsolved counts, PAR10 (penalized average runtime), and feature extraction time.
- Keep a reproducible environment (fixed seeds where applicable).
Concrete metrics to track: PAR10, solved instances, mean runtime, feature extraction overhead.
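For reference, here is a minimal sketch of how PAR10 and the other baseline metrics above can be computed from per-instance results. The (runtime, solved) tuple format is an assumption about how you log results, not a claspfolio output format.

```python
# Compute baseline metrics from per-instance results.
# Assumed input: list of (runtime_seconds, solved) pairs; "cutoff" is the
# per-instance time limit used in the experiments.

def summarize(results, cutoff):
    """Return PAR10, solved count, and mean runtime over solved instances."""
    par10_scores = []
    solved_runtimes = []
    for runtime, solved in results:
        if solved and runtime <= cutoff:
            par10_scores.append(runtime)
            solved_runtimes.append(runtime)
        else:
            # Unsolved or timed-out instances are penalized with 10x the cutoff.
            par10_scores.append(10 * cutoff)
    return {
        "PAR10": sum(par10_scores) / len(par10_scores),
        "solved": len(solved_runtimes),
        "mean_runtime_solved": (sum(solved_runtimes) / len(solved_runtimes)
                                if solved_runtimes else float("nan")),
    }

# Example: three instances with a 600 s cutoff.
print(summarize([(12.3, True), (598.0, True), (600.0, False)], cutoff=600))
```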
3. Optimize Feature Extraction
Feature computation can be costly and may negate portfolio benefits if too slow.
- Use a lightweight feature set when instance sizes are large or features are expensive.
- Profile feature extraction time per instance; remove features that are slow and low-informative.
- Cache features for repeated instances or batched runs.
- For time-critical systems, consider using only static syntactic features (counts of atoms, rules) instead of dynamic probing features.
Tip: If feature extraction averages more than 5–10% of your cutoff per instance, prioritize trimming it.
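To make the profiling and caching advice concrete, here is a small sketch. `cheap_static_features` is a toy stand-in for a real extractor (e.g. claspre or your own script), and the cache location is illustrative.

```python
import json
import re
import time
from pathlib import Path

CACHE_DIR = Path("feature_cache")  # illustrative cache location
CACHE_DIR.mkdir(exist_ok=True)

def cheap_static_features(instance_path):
    """Toy static syntactic features: crude rule and token counts.

    Stands in for a real feature extractor; adapt to your pipeline.
    """
    text = Path(instance_path).read_text()
    return {
        "num_rules": text.count("."),
        "num_lines": text.count("\n") + 1,
        "num_tokens": len(re.findall(r"[a-z]\w*", text)),
    }

def features_with_cache(instance_path, extractor=cheap_static_features):
    """Return (features, extraction_seconds); results are cached per instance."""
    cache_file = CACHE_DIR / (Path(instance_path).name + ".json")
    if cache_file.exists():
        return json.loads(cache_file.read_text()), 0.0
    start = time.perf_counter()
    features = extractor(instance_path)
    elapsed = time.perf_counter() - start
    cache_file.write_text(json.dumps(features))
    return features, elapsed
```

Logging the returned extraction time against your cutoff makes the 5–10% rule of thumb easy to check per instance.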
4. Feature Selection and Normalization
Irrelevant or redundant features reduce model quality.
- Use automatic methods: recursive feature elimination, LASSO, or tree-based importance scores to prune features.
- Normalize numeric features (z-score or min-max) so models like kNN or SVM behave properly.
- Group or discretize highly skewed features (e.g., log-transform clause counts).
Example pipeline:
- Remove features with near-zero variance.
- Apply log(x+1) to count-based features.
- Standardize to zero mean and unit variance.
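Assuming the features live in a NumPy matrix, the example pipeline maps directly onto scikit-learn. This is a sketch, not claspfolio's built-in preprocessing.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

preprocess = Pipeline([
    # 1. Drop features with (near-)zero variance.
    ("variance", VarianceThreshold(threshold=1e-6)),
    # 2. log(x + 1) for count-based, heavily skewed features.
    #    Safe here only if all features are non-negative; otherwise restrict
    #    the transform to count columns, e.g. via a ColumnTransformer.
    ("log", FunctionTransformer(np.log1p)),
    # 3. Standardize to zero mean and unit variance.
    ("scale", StandardScaler()),
])

# X_train: instances x features matrix of raw feature values.
# X_train_prepared = preprocess.fit_transform(X_train)
```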
5. Build an Effective Portfolio
The choice and diversity of configurations are central.
- Start with a mix: default clasp settings, aggressive search, cautious search, restarts-heavy, different preprocessing options.
- Use automated configuration tools (e.g., SMAC, irace) to generate high-performing configurations on your training set.
- Keep portfolio size moderate: larger portfolios increase selector complexity and risk overfitting; 8–20 diverse configurations is a common sweet spot.
- Remove dominated configurations — those never selected or always worse than another config (a concrete check is sketched after the list below).
Example candidate configurations:
- default
- long-restarter + aggressive heuristics
- nogood learning focused
- preprocessing-heavy
- assumption-based tactics
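A simple way to detect the "always worse than another config" part of that pruning rule is to compare per-instance runtimes directly. The dominance test below runs over a runtime matrix collected on your training set; it is an illustration, not a claspfolio feature.

```python
import numpy as np

def dominated_configs(runtimes, config_names):
    """Return names of configurations dominated by some other configuration.

    runtimes: array of shape (n_instances, n_configs).
    Config k dominates config j if k is at least as fast on every instance
    and strictly faster on at least one.
    """
    n_configs = runtimes.shape[1]
    dominated = []
    for j in range(n_configs):
        for k in range(n_configs):
            if j == k:
                continue
            if (np.all(runtimes[:, k] <= runtimes[:, j])
                    and np.any(runtimes[:, k] < runtimes[:, j])):
                dominated.append(config_names[j])
                break
    return dominated
```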
6. Choose and Tune the Selector
Selector choice depends on dataset size and feature quality.
- k-Nearest Neighbors (kNN) is simple, robust, and often strong with small datasets.
- Regression models predict the runtime of each configuration; the configuration with the best predicted runtime is chosen. Random Forests or Gradient Boosting work well.
- Classification (predict best config directly) can be effective when labels are clear.
- Pairwise or per-configuration models scale better with large portfolios.
Tuning tips:
- For kNN: tune k, distance metric, and feature weighting.
- For trees/forests: tune tree depth, number of trees, and minimum samples per leaf.
- Use cross-validation with time-based or instance-split folds to avoid overfitting.
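Here is a sketch of the regression approach with per-configuration Random Forests, including a small hyperparameter search. scikit-learn is assumed, as is a data layout with X holding preprocessed features and runtimes holding one column per portfolio configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

def train_selectors(X, runtimes, log_target=True):
    """Train one runtime-regression model per portfolio configuration."""
    models = []
    for j in range(runtimes.shape[1]):
        # Log-transforming runtimes often stabilizes the regression target.
        y = np.log1p(runtimes[:, j]) if log_target else runtimes[:, j]
        search = GridSearchCV(
            RandomForestRegressor(random_state=0),
            param_grid={"n_estimators": [100, 300],
                        "min_samples_leaf": [1, 5, 10]},
            cv=KFold(n_splits=5, shuffle=True, random_state=0),
            scoring="neg_mean_squared_error",
        )
        search.fit(X, y)
        models.append(search.best_estimator_)
    return models

def select_config(models, x):
    """Pick the configuration with the smallest predicted (log-)runtime."""
    preds = [m.predict(x.reshape(1, -1))[0] for m in models]
    return int(np.argmin(preds))
```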
7. Pre-solving Schedules
A short pre-solving schedule can catch easy instances quickly.
- Construct short runs of a few strong configurations (e.g., 1–3 seconds each) before running the selector.
- Keep pre-solving budget small (5–10% of total cutoff), especially when feature extraction is cheap.
- Use instance-agnostic heuristics for pre-solving (configurations that solve many easy instances).
Example schedule: config A for 2s, config B for 3s, then feature extraction + selector for remaining time.
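A wrapper like the following can drive such a schedule. It assumes clasp's --time-limit option and its conventional exit codes (10/20/30); adapt the flag and return-code handling to your installation.

```python
import subprocess

def run_config(config_args, instance, time_limit):
    """Run clasp with one configuration; return True if it decided the instance."""
    try:
        proc = subprocess.run(
            ["clasp", f"--time-limit={int(time_limit)}", *config_args, instance],
            capture_output=True, text=True, timeout=time_limit + 5,
        )
    except subprocess.TimeoutExpired:
        return False
    # Conventionally: 10 = SAT, 20 = UNSAT, 30 = SAT with search exhausted;
    # treat anything else (including timeouts) as "not decided".
    return proc.returncode in (10, 20, 30)

def presolve_then_select(instance, schedule, select_and_run):
    """schedule: list of (config_args, seconds) pairs, e.g. [(A, 2), (B, 3)]."""
    for config_args, seconds in schedule:
        if run_config(config_args, instance, seconds):
            return True  # solved during pre-solving
    # Feature extraction + selector take over for the remaining budget.
    return select_and_run(instance)
```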
8. Manage Timeouts and Cutoffs
- Set sensible per-configuration timeouts to avoid wasting time on hard instances.
- Implement adaptive cutoffs: if selector confidence is low, allocate slightly more pre-solving budget; if it is confident, launch the full run immediately.
- Consider incremental solving (run a short run, then extend it) to allow fast successes while keeping the ability to recover on harder instances.
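One way to express the adaptive-cutoff idea in code, assuming the selector exposes some confidence score (for instance, the normalized gap between the best and second-best predicted runtimes); the fractions and threshold here are illustrative.

```python
def allocate_budget(total_cutoff, confidence,
                    base_presolve_fraction=0.05,
                    low_confidence_fraction=0.15,
                    threshold=0.2):
    """Split the cutoff between pre-solving and the selected full run.

    confidence: e.g. normalized gap between the best and second-best
    predicted runtimes; higher means the selector is more certain.
    """
    fraction = (base_presolve_fraction if confidence >= threshold
                else low_confidence_fraction)
    presolve_budget = fraction * total_cutoff
    return presolve_budget, total_cutoff - presolve_budget

# Example: 600 s cutoff, uncertain selector -> longer pre-solving phase.
print(allocate_budget(600, confidence=0.1))
```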
9. Reduce Overfitting and Improve Generalization
- Use nested cross-validation or holdout sets from different instance distributions.
- Regularize models and limit portfolio size to reduce variance.
- Evaluate on time-shifted or domain-shifted instances to ensure robustness.
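If your instances carry a domain or family label, grouped cross-validation keeps whole families out of the training folds and approximates domain shift. A sketch using scikit-learn's GroupKFold, where `domains` is an assumed per-instance label from your metadata:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, cross_val_score

# X: feature matrix, y: runtimes of one configuration,
# domains: family/domain label per instance.
def domain_shifted_cv(X, y, domains, n_splits=5):
    """Score a runtime predictor with whole instance families held out."""
    cv = GroupKFold(n_splits=n_splits)
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    return cross_val_score(model, X, y, groups=domains, cv=cv,
                           scoring="neg_mean_absolute_error")
```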
10. Automation and Continuous Improvement
- Automate configuration generation and evaluation with tools like SMAC or irace, combined with periodic retraining of selectors.
- Maintain logs of instance features and solver outcomes to continually refine models.
- Periodically re-evaluate feature importance and prune or replace low-value configurations.
11. Practical Example: A Tuning Workflow
- Collect a representative set of instances; split into train/validation/test.
- Measure baseline with default claspfolio.
- Profile feature extraction; prune heavy features.
- Generate candidate configurations with SMAC (limit to ~20).
- Train a Random Forest regression per-config runtime predictor.
- Create a pre-solving schedule of two strong configs for 3s total.
- Evaluate on validation; prune dominated configs.
- Retrain selector on final portfolio and evaluate on test set.
12. Common Pitfalls
- Letting costly features dominate the budget.
- Overfitting the selector to a small training set.
- Using too many similar configurations in the portfolio.
- Ignoring feature drift when instance distributions change.
13. Summary Recommendations
- Profile first: know where time is spent.
- Favor informative but cheap features.
- Use automated configurators to build diverse portfolios.
- Keep portfolios moderate in size and prune dominated configs.
- Use robust selectors (kNN or ensemble methods) with careful cross-validation.
- Add a short pre-solving schedule to catch easy instances.