I have been participating in the Kaggle Digit Recognizer competition, and after some initial success someone suggested that I look into feature selection. I've done a bit of research, but I am having a hard time wrapping my head around what exactly feature selection is.
Below is a partial description of the data I am working with:
Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels in total. Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255, inclusive.
So if we arrange the pixels 28x28 we get them ordered like below:
000 001 002 003 ... 026 027
028 029 030 031 ... 054 055
056 057 058 059 ... 082 083
| | | | ... | |
728 729 730 731 ... 754 755
756 757 758 759 ... 782 783
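To make the layout concrete, here is a minimal sketch of how a flat pixel index maps to a row and column, assuming the row-major ordering shown in the grid. The function names are my own and are not part of the competition data.

```python
import numpy as np

SIDE = 28  # image height and width in pixels


def pixel_to_row_col(index):
    """Return (row, col) for a flat pixel index in 0..783 (row-major order)."""
    return divmod(index, SIDE)


def row_col_to_pixel(row, col):
    """Return the flat pixel index for a given row and column."""
    return row * SIDE + col


# A flat vector of 784 values reshapes into the 28x28 grid shown above.
flat = np.arange(SIDE * SIDE)
grid = flat.reshape(SIDE, SIDE)
```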
So based on the above, it is probably safe to say that the pixel at position 000 (the pixel at the very top left of the image) will be unimportant when determining which digit an image shows, because it will probably have the same value for every digit. I make this assumption because the top-left pixel is so far from the center, where most of the digit would be drawn.
My question: would excluding pixel 000 from my calculations be considered a very basic form of feature selection? If not, could someone explain what feature selection is?
Sure, that's an example of feature selection. But generally the term refers to a more formal process of deciding which features are relevant and which are either irrelevant or redundant given the other features. When you want to keep only a few features, it's a good idea to make that choice, and to evaluate it, by examining the actual data rather than relying on intuition alone.
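As one illustration of letting the data decide, here is a minimal sketch of a variance filter: any pixel whose value never changes across the training images (as you suspect for pixel 000) carries no information and can be dropped. This is only one simple, unsupervised criterion that ignores the labels, not the definitive method. The function name `select_by_variance` is hypothetical, and the array `X` is synthetic stand-in data, not the real Kaggle set.

```python
import numpy as np


def select_by_variance(X, threshold=0.0):
    """Return the indices of columns whose variance exceeds `threshold`."""
    variances = X.var(axis=0)
    return np.flatnonzero(variances > threshold)


# Synthetic stand-in for the training matrix: 100 images x 784 pixels.
# Assumption for illustration: only the central 20x20 block ever contains ink.
rng = np.random.default_rng(0)
X = np.zeros((100, 784))
images = X.reshape(100, 28, 28)  # a view, so writes below also change X
images[:, 4:24, 4:24] = rng.integers(0, 256, size=(100, 20, 20))

kept = select_by_variance(X)  # pixel indices worth keeping
X_reduced = X[:, kept]        # training matrix with constant pixels removed
```

On this synthetic data the border pixels, including pixel 000, are constant and get removed. On the real data you would compute `kept` from the training set only and apply the same column indices to the test set.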
Incidentally, feature selection can tell you which features to throw out, but it can't tell you where to get your features from in the first place. Tossing a bunch of pixels into a black-box classifier will virtually always produce crappy results, so you'd probably want to add additional features, like edge detection at each pixel, or the average value of the top third of the image, or the location of the darkest area after blurring, and so on. The number of potential features is gigantic (and consists mostly of useless data), which is why feature selection is useful.
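To give a rough idea of what such derived features might look like, here is a sketch that computes three of the ones mentioned above using only NumPy. Everything in it is an assumption made for illustration: the helper names are hypothetical, the blur is a plain 3x3 box blur, and "edge detection" is approximated by the gradient magnitude. Since higher pixel values mean darker, the darkest area is located with `argmax`.

```python
import numpy as np


def top_third_mean(img):
    """Average pixel value over the top third of the rows."""
    return img[: img.shape[0] // 3].mean()


def box_blur(img, k=3):
    """Blur by averaging each pixel with its k x k neighbourhood."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dr in range(k):
        for dc in range(k):
            out += padded[dr:dr + img.shape[0], dc:dc + img.shape[1]]
    return out / (k * k)


def darkest_location(img, k=3):
    """(row, col) of the darkest spot after blurring (higher value = darker)."""
    blurred = box_blur(img, k)
    return np.unravel_index(np.argmax(blurred), blurred.shape)


def edge_strength(img):
    """Gradient magnitude at each pixel, a crude stand-in for edge detection."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)


def extract_features(img):
    """Collect the derived features for one 28x28 image into a vector."""
    row, col = darkest_location(img)
    return np.array([top_third_mean(img), row, col, edge_strength(img).mean()])


# Toy image: a single dark 3x3 blob on an otherwise blank background.
img = np.zeros((28, 28))
img[10:13, 15:18] = 255
features = extract_features(img)
```

Features like these would be appended to (or used instead of) the raw pixel columns, and a feature selection step would then decide which of them are actually worth keeping.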