MIT CSAIL’s Machine Learning Algorithm Helps Predict Patterns In Large Data Streams

Ever heard of the “Britney Spears problem”? Contrary to what it sounds like, it’s got nothing to do with the dalliances of the rich and famous. Rather, it’s a computing puzzle related to data tracking: precisely tailoring a data-rich service, like a search engine or fiber internet connection, to individual users would in principle require tracking every packet sent to and from the service provider, which needless to say isn’t practical. To get around this, most companies rely on algorithms that estimate how often each piece of data appears by hashing it (i.e., mapping each item to one of a limited number of shared counters, or buckets). But this necessarily sacrifices nuance: because unrelated items end up sharing buckets, telling patterns that emerge naturally in large data volumes can fly under the radar.
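The article does not name a specific structure, but a common example of hashing-based frequency estimation is the Count-Min sketch. The minimal Python sketch below is illustrative only, not what any particular company runs: each of `depth` rows hashes an item to one of `width` counters, and the estimate is the smallest counter the item maps to. Hashing with SHA-256 and a row-index prefix is an assumption chosen for simplicity, not a tuned design.

```python
import hashlib


class CountMinSketch:
    """Approximate item counts using `depth` rows of `width` shared counters."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, item: str, row: int) -> int:
        # One hash function per row, derived by prefixing the row index to the item.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Each row increments one counter; different items may collide in it.
        for row in range(self.depth):
            self.table[row][self._bucket(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions can only inflate a counter, so the smallest one is the
        # tightest estimate; it never undercounts the true frequency.
        return min(self.table[row][self._bucket(item, row)] for row in range(self.depth))


# Usage: one popular query hidden among many rare ones.
stream = ["britney spears"] * 50 + [f"query {i}" for i in range(200)]
cms = CountMinSketch(width=64, depth=4)
for q in stream:
    cms.add(q)
print(cms.estimate("britney spears"))  # at least 50; more if rare queries collided with it
```

The sketch stores 256 counters for 201 distinct queries, and that gap grows with real traffic. The price is the nuance the article mentions: a rare item that shares buckets with a popular one inherits part of its count.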

Luckily, researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) believe they’ve devised a viable alternative that relies on machine learning. In a newly published paper (“Learning-Based Frequency Estimation Algorithms”), they describe a system, dubbed LearnedSketch because of the way it “sketches” data in a data stream, that predicts whether specific data elements will appear more frequently than others and automatically gives the ones it flags their own buckets, separating them from the rest of the hashed data. Read more on VentureBeat.
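The selective bucketing described above can be sketched as a thin layer over a Count-Min sketch. This is a minimal illustration under stated assumptions, not the CSAIL implementation: the class name `LearnedSketch` is borrowed from the article for readability, and the predictor `is_heavy` is a hypothetical stand-in (a lookup of items that were frequent in an earlier window) for the trained model the researchers use. Items the predictor flags get their own exact counters, and everything else shares the sketch's hashed buckets.

```python
import hashlib
from collections import Counter
from typing import Callable


class CountMinSketch:
    """Approximate counts with `depth` rows of `width` shared counters."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, item: str, row: int) -> int:
        # One hash function per row, derived by prefixing the row index.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row, counters in enumerate(self.table):
            counters[self._bucket(item, row)] += count

    def estimate(self, item: str) -> int:
        # Minimum over rows: never below the true count.
        return min(counters[self._bucket(item, row)] for row, counters in enumerate(self.table))


class LearnedSketch:
    """Hypothetical illustration: exact counters for predicted heavy hitters,
    a shared Count-Min sketch for everything else."""

    def __init__(self, is_heavy: Callable[[str], bool], width: int, depth: int) -> None:
        self.is_heavy = is_heavy  # hypothetical stand-in for a trained model's prediction
        self.exact = Counter()    # one dedicated counter per predicted heavy hitter
        self.sketch = CountMinSketch(width, depth)

    def add(self, item: str, count: int = 1) -> None:
        if self.is_heavy(item):
            self.exact[item] += count
        else:
            self.sketch.add(item, count)

    def estimate(self, item: str) -> int:
        if self.is_heavy(item):
            return self.exact[item]
        return self.sketch.estimate(item)


# Hypothetical "training": items that were frequent in a past window are predicted heavy.
history = ["britney spears"] * 40 + ["weather"] * 30 + [f"query {i}" for i in range(100)]
heavy = {item for item, c in Counter(history).items() if c >= 10}

ls = LearnedSketch(lambda item: item in heavy, width=32, depth=3)
stream = ["britney spears"] * 50 + ["weather"] * 20 + [f"query {i}" for i in range(300)]
for item in stream:
    ls.add(item)
print(ls.estimate("britney spears"))  # 50: counted exactly, no collisions
```

Because predicted heavy hitters never enter the shared table, they stop inflating the counters of the rare items they would otherwise collide with. The trade-off is that a wrong prediction either spends a dedicated counter on a rare item or leaves a heavy item in the shared table.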