I recently came across a Reddit post about a new gradient boosting implementation called PKBoost. The author had been working on this project to address two common issues they faced with XGBoost and LightGBM in production: performance collapse on extremely imbalanced data and silent degradation when data drifts.
The key results showed that PKBoost outperformed XGBoost and LightGBM on imbalanced data, with an impressive 87.8% PR-AUC on the Credit Card Fraud dataset. But what’s even more interesting is how PKBoost handled data drift. Under realistic drift scenarios, PKBoost experienced only a 2% degradation in performance, whereas XGBoost saw a whopping 32% degradation.
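To make the drift comparison concrete, here is a minimal sketch of how one might measure PR-AUC degradation under a simulated covariate shift. This is not the author's benchmark: the dataset, the drift transformation, and the scikit-learn model are all illustrative assumptions standing in for the real setup.

```python
# Minimal sketch: measuring PR-AUC degradation under simulated covariate drift.
# Dataset, drift magnitude, and model choice are illustrative assumptions,
# not the benchmark from the original post.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Heavily imbalanced synthetic data (~1% positives).
X, y = make_classification(
    n_samples=50_000, n_features=20, weights=[0.99], flip_y=0.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = HistGradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Baseline PR-AUC on the clean test set.
clean_pr_auc = average_precision_score(y_test, model.predict_proba(X_test)[:, 1])

# Simulate covariate drift: shift and rescale a subset of features.
rng = np.random.default_rng(0)
X_drift = X_test.copy()
X_drift[:, :5] = X_drift[:, :5] * 1.5 + rng.normal(0.5, 0.1, size=(len(X_drift), 5))

drift_pr_auc = average_precision_score(y_test, model.predict_proba(X_drift)[:, 1])

degradation = (clean_pr_auc - drift_pr_auc) / clean_pr_auc * 100
print(f"clean PR-AUC:   {clean_pr_auc:.3f}")
print(f"drifted PR-AUC: {drift_pr_auc:.3f}")
print(f"degradation:    {degradation:.1f}%")
```

The reported 2% vs. 32% figures come from the post itself; the point of the sketch is just what "degradation" means operationally: train once, score on drifted data, and compare PR-AUC against the clean baseline.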
So, what makes PKBoost different? The main innovation is using Shannon entropy alongside gradients in the split criterion. This explicitly optimizes for information gain on the minority class, which helps prevent overfitting to the majority class. Combined with quantile-based binning, conservative regularization, and PR-AUC early stopping, PKBoost is inherently more robust to drift without needing online adaptation.
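Here is a toy illustration of that blended split criterion: a standard second-order (XGBoost-style) gain plus a weighted Shannon information-gain term. This is a sketch of the idea, not PKBoost's actual implementation; the function names, the blend weight `mu`, and the regularization `lam` are my assumptions.

```python
# Toy illustration of blending gradient-based gain with Shannon information gain
# when scoring a candidate split. A sketch of the idea described in the post,
# not PKBoost's actual split criterion.
import numpy as np

def entropy(y: np.ndarray) -> float:
    """Shannon entropy of binary labels, in bits."""
    if len(y) == 0:
        return 0.0
    p = y.mean()
    if p in (0.0, 1.0):
        return 0.0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def hybrid_split_score(grad, hess, y, mask, lam=1.0, mu=0.5):
    """Score a split, where `mask` selects the left child.
    `mu` (assumed) weights the entropy term against the gradient gain."""
    def leaf_gain(g, h):
        return g.sum() ** 2 / (h.sum() + lam)

    # Standard second-order gain, as in XGBoost-style boosting.
    grad_gain = (
        leaf_gain(grad[mask], hess[mask])
        + leaf_gain(grad[~mask], hess[~mask])
        - leaf_gain(grad, hess)
    )

    # Shannon information gain of the split on the class labels,
    # which rewards splits that isolate minority-class examples.
    n = len(y)
    info_gain = entropy(y) - (
        mask.sum() / n * entropy(y[mask]) + (~mask).sum() / n * entropy(y[~mask])
    )

    return grad_gain + mu * info_gain
```

The intuition: with a pure gradient criterion, splits that carve out a tiny pocket of minority-class examples can look unremarkable because their gradient sums are small; the entropy term gives those splits extra credit.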
While PKBoost has its trade-offs, notably 2-4x slower training, its ability to auto-tune to your data and work out of the box on extreme imbalance makes it an attractive option. The author is looking for feedback on whether others have seen similar robustness from conservative regularization, and whether the approach is worth the slower training times in production systems.
