Overview of my topic
I’m trying to predict sovereign debt crises from various economic indicators. I have three dummy columns: crisis, in1y, and in2y. Crisis indicates whether a country experienced a sovereign debt crisis in a given year; in1y and in2y are the same variable shifted by one and two years, i.e. they flag a crisis starting one or two years ahead. After deriving the latter two, I filtered out all observed crisis years to eliminate the noise they would add. My goal is to find out how well the economic indicators predict the shifted dummies.
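To make the setup concrete, here is a minimal sketch of that derivation. My actual code is in R; this is a Python/pandas illustration with assumed column names (country, year, crisis) and toy data, not my real dataset.

```python
import pandas as pd

# Toy panel: one country, a crisis starting in 2003 (illustrative only)
df = pd.DataFrame({
    "country": ["A"] * 5,
    "year": [2000, 2001, 2002, 2003, 2004],
    "crisis": [0, 0, 0, 1, 0],
})

# in1y / in2y: does a crisis occur one / two years from this observation?
g = df.groupby("country")["crisis"]
df["in1y"] = g.shift(-1)
df["in2y"] = g.shift(-2)

# Drop the observed crisis years themselves, keeping only pre-crisis
# and tranquil observations for the model to learn from
df = df[df["crisis"] == 0]
```

After this, the 2002 row has in1y = 1 and the 2001 row has in2y = 1, while the 2003 crisis year itself is gone.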
Where I'm stuck
Machine learning: I predict in1y and in2y with a logit model and ‘vanilla’ CART to establish a baseline, and then implement more complex ML techniques — currently regularized random forest, XGBoost, AdaBoost, and C5.0. However, while the first two can predict some crises (40–60% of them for in2y), the supposedly ‘better’ algorithms perform horribly. Despite many rounds of tuning, cross-validation, and different levels of regularization, I always end up with less than 10% of crises correctly predicted. This result is hard to believe: it contradicts the findings in the past literature, as well as the ‘usual’ observation that advanced ML outperforms basic models.
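One thing I suspect is class imbalance: crisis onsets are rare, so an ensemble trained on raw accuracy can score well by predicting “no crisis” everywhere, which would explain the near-zero share of crises caught. Here is a hedged sketch of the two standard remedies — re-weighting the rare class and lowering the decision threshold below the default 0.5. Again, my code is in R; this Python/scikit-learn version uses synthetic data purely to illustrate the idea.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

# Synthetic stand-in with ~5% positives, mimicking rare crisis onsets
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" penalizes missed crises more heavily, instead of
# letting the forest maximize accuracy by always predicting the majority class
clf = RandomForestClassifier(n_estimators=300, class_weight="balanced",
                             random_state=0).fit(X_tr, y_tr)

# With few positives, predicted probabilities rarely clear 0.5 even for true
# crises, so compare recall at the default cutoff and at a lower one
proba = clf.predict_proba(X_te)[:, 1]
recall_default = recall_score(y_te, proba >= 0.5)
recall_low = recall_score(y_te, proba >= 0.2)
```

The R analogues would be, e.g., case weights in ranger/C5.0 or scale_pos_weight in xgboost, plus choosing the cutoff from the ROC curve rather than using 0.5.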
Has anyone come across similar results? Is there a logical explanation, or am I simply bad at coding it right? :D For those literate in R, my dataset and current code are stored here.