Artificial Intelligence Shines Light on the Dark Web

Beneath the surface web, the public forum of the internet you use daily to check email or read news articles, exists a concealed “dark web.” Host to anonymous, password-protected sites, the dark web is where criminal marketplaces thrive in the advertising and selling of weapons, drugs, and trafficked persons. Law enforcement agencies work continuously to stop these activities, but the challenges they face in investigating and prosecuting the real-world people behind the users who post on these sites are tremendous.

“The pop-up nature of dark-web marketplaces makes tracking their participants and their activities extremely difficult,” says Charlie Dagli, a researcher in MIT Lincoln Laboratory’s Artificial Intelligence Technology and Systems Group. Dagli is referring to the fast rate at which dark-web markets close down (because they are hacked, raided, abandoned, or set up as an “exit scam” in which the site shuts down intentionally after customers pay for unfulfilled orders) and new ones appear. These markets’ short lifetimes, from a few months to a couple of years, impede efforts to identify their users.

To overcome this challenge, Lincoln Laboratory is developing new software tools to analyze surface- and dark-web data.

These tools are leveraging the one benefit this whack-a-mole-like problem presents — the connections sellers and buyers maintain across multiple layers of the web, from surface to dark, and across dark-web forums. “This constant switching between sites is now an established part of how dark-web marketplaces operate,” Dagli says.

Users are making new profiles constantly. Although they may not be employing the same usernames from site to site, they are keeping their connections alive by signaling to each other through their content. These signals can be used to link personas belonging to the same user across dark-web forums and, more revealingly, to link personas on the dark web to the surface web to uncover a user’s true identity.

Linking users on the dark web is what law enforcement already tries to do. The problem is that the amount of data they need to sift through manually (500,000 phone numbers and 2 million sex ads posted a month) is too large and unstructured for them to find connections quickly. Thus, only a small percentage of cases can be pursued.

To automate the persona-linking process, Lincoln Laboratory is training machine learning algorithms to compute the similarity between users on different forums. The computations are based on three aspects of users’ communications online: “How they identify to others, what they write about, and with whom they write to,” Dagli explains.
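To make the idea of a three-part similarity score concrete, here is a minimal Python sketch. It is not Lincoln Laboratory's method, which the article does not detail: the `Persona` class, the function names, the choice of measures (string matching for usernames, bag-of-words cosine for content, Jaccard overlap for contacts), and the equal weights are all assumptions made for illustration. A trained system would learn both the features and how to combine them.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Persona:
    """Hypothetical record for one forum profile (illustrative only)."""
    username: str                       # how they identify to others
    posts: list[str]                    # what they write about
    contacts: set[str] = field(default_factory=set)  # whom they write to


def name_similarity(a: Persona, b: Persona) -> float:
    # Character-level match ratio between the two usernames, in [0, 1].
    return SequenceMatcher(None, a.username.lower(), b.username.lower()).ratio()


def content_similarity(a: Persona, b: Persona) -> float:
    # Cosine similarity between simple word-count vectors of all posts.
    wa = Counter(" ".join(a.posts).lower().split())
    wb = Counter(" ".join(b.posts).lower().split())
    dot = sum(wa[w] * wb[w] for w in wa.keys() & wb.keys())
    norm = (math.sqrt(sum(c * c for c in wa.values()))
            * math.sqrt(sum(c * c for c in wb.values())))
    return dot / norm if norm else 0.0


def network_similarity(a: Persona, b: Persona) -> float:
    # Jaccard overlap of the sets of accounts each persona interacts with.
    union = a.contacts | b.contacts
    return len(a.contacts & b.contacts) / len(union) if union else 0.0


def persona_similarity(a: Persona, b: Persona,
                       weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    # Weighted average of the three signals; equal weights are an assumption.
    scores = (name_similarity(a, b),
              content_similarity(a, b),
              network_similarity(a, b))
    return sum(w * s for w, s in zip(weights, scores))
```

Under these assumptions, two profiles with near-matching usernames, overlapping vocabulary, and shared contacts score higher than two unrelated profiles, and pairs above a chosen threshold could be flagged for an analyst to review.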