A while ago I blogged about SVM Model Tampering and Anchored Learning: A Case Study in Hebrew NP Chunking by Yoav Goldberg and Michael Elhadad when I was talking about my ACL to-read pile. I just want to say that I’m a satisfied customer: it works nicely on a task other than NP chunking, in a language other than Hebrew, and with a machine learning technique other than SVMs (maximum entropy, aka logistic regression, in my case) that wasn’t trained to convergence.
A very short summary: model tampering (where I get most of my gains) is like feature selection, but in model tampering you take your features out of the model after you’ve built it. This has two advantages: 1) you can know what the effect of removing a feature is without having to retrain your model, so you can pick which ones to remove with reasonable speed, and 2) many of the features you’re removing are likely to be there because of overfitting - these features were used by the model to memorise tricky cases (or annotation errors!) and won’t generalise to other cases. After ordinary feature selection the model is retrained, so the tough cases would try to find other features to overfit with; with model tampering there is no retraining, so the tricky cases have no opportunity for overfitting.
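To make the idea concrete, here is a minimal sketch of tampering with a logistic regression model. It is not the procedure from the paper: the synthetic data, the fixed-step gradient descent, and the greedy rule "zero a weight if held-out accuracy doesn't drop" are all assumptions I've picked for illustration, and the function names (`train_logreg`, `tamper`) are made up. The point it shows is only that each candidate removal is scored by zeroing a weight in the already-trained model, with no retraining.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, steps=200, lr=0.5):
    """Plain gradient descent, stopped after a fixed number of steps
    (so not necessarily trained to convergence)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

def accuracy(w, X, y):
    """Fraction of examples where the sign of the score matches the label."""
    return float(np.mean((X @ w > 0) == (y == 1)))

def tamper(w, X_dev, y_dev):
    """Hypothetical greedy tampering: zero a weight in the trained model and
    keep the change only if held-out accuracy does not get worse."""
    w = w.copy()
    best = accuracy(w, X_dev, y_dev)
    removed = []
    # Try the smallest-magnitude weights first (an arbitrary ordering choice).
    for j in np.argsort(np.abs(w)):
        saved = w[j]
        if saved == 0.0:
            continue
        w[j] = 0.0                      # remove the feature from the model
        acc = accuracy(w, X_dev, y_dev)
        if acc >= best:
            best = acc
            removed.append(int(j))
        else:
            w[j] = saved                # removal hurt, so put the weight back
    return w, removed

# Synthetic data (an assumption): 3 informative features, 20 pure-noise ones.
rng = np.random.default_rng(0)
n, d_signal, d_noise = 400, 3, 20
X = rng.normal(size=(n, d_signal + d_noise))
true_w = np.concatenate([np.array([2.0, -1.5, 1.0]), np.zeros(d_noise)])
y = (X @ true_w + rng.normal(scale=0.5, size=n) > 0).astype(float)

X_train, y_train = X[:200], y[:200]
X_dev, y_dev = X[200:], y[200:]

w = train_logreg(X_train, y_train)
w_tampered, removed = tamper(w, X_dev, y_dev)

print("dev accuracy before:", accuracy(w, X_dev, y_dev))
print("dev accuracy after: ", accuracy(w_tampered, X_dev, y_dev))
print("features removed:   ", sorted(removed))
```

Because the removals are chosen on the held-out set, any real evaluation would need a third, untouched test set; the sketch skips that to stay short.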
There are a lot of chemoinformaticists in my building who have used SVMs at various stages for predicting the activity of chemicals - perhaps I could entice some of them into trying it.

