The Machine Learning Data Dilemma

To be effective, machine learning (ML) has one significant requirement: data. Lots of data. A child can understand what a cat is, and recognize other cats, after just a few encounters or after being shown a few examples, but ML algorithms require many, many more examples. Unlike humans, these algorithms cannot easily draw inferences on their own. For example, a machine learning algorithm interprets a picture of a cat against a grassy background differently from a picture of a cat in front of a fireplace.

The algorithms need a lot of data to separate the relevant "features" of the cat from the background noise. The same is true of other sources of noise, such as lighting and weather. Unfortunately, this data hunger does not stop at separating signal from noise. The algorithms must also identify the meaningful features that distinguish the cat itself. Variations that humans can understand without extra data, such as a cat's color or size, are difficult for machine learning.

READ MORE ON: TDWI