
USC Breakthrough Could Help Scientists Spot Disease Clusters, Fraud Networks and More – USC Viterbi | School of Engineering

Illustration: A group of people connected by nodes. Credit: Midjourney

Emilio Ferrara has spent nearly two decades thinking about a deceptively simple question: in a network of millions of people or cells or proteins, how do you find the clusters?

The problem, known as community detection, sounds intuitive. Draw a map of friendships, and clusters naturally emerge: tight knots of people who know each other, separated by thinner threads connecting them to the wider world. Do the same with gene expression data, or cryptocurrency transactions, or brain scans, and similar structures appear. The challenge is doing this at scale, with real-world data that is messy, massive, and rich with information beyond just who is connected to whom.

For years, the field was stuck. Algorithms that could handle millions of nodes tended to ignore everything except the raw structure of the network, discarding profile data, text, and behavioral signals. The ones that could incorporate that richer information choked on anything larger than a few thousand nodes. Researchers largely accepted this as a fundamental tradeoff.

“For whatever the reason, at some point, researchers kind of gave up on this idea that you could perhaps do both things at the same time,” Ferrara said. “There were no major breakthroughs for a few years.”

Then, over dinner, his daughter said something that stuck with him.

An Unlikely Spark

Ferrara had been explaining the problem at home, the kind of abstract, decades-old puzzle he had carried since his PhD. His daughter suggested something offhand: what if the nodes sent messages to each other the way dolphins use echolocation? Like, what if they pinged their neighbors and listened for what came back?

“I paused on that,” Ferrara said. “And then it kind of got stuck in me.”

The analogy clicked. Diffusion models, the same family of algorithms that power image generators like DALL-E and video tools like Sora, learn to generate data by removing noise step by step until a clean signal emerges. Ferrara realized he could adapt that framework to graphs. Instead of diffusing noise to reconstruct an image, nodes could diffuse their semantic signals outward through the network. Nodes that belonged to the same community would reinforce each other’s signals. Nodes across community boundaries would dampen them.

The result is ECHO, short for Encoding Communities via High-order Operators, a new algorithm Ferrara developed and published in the journal Machine Learning with Applications in June 2026. Ferrara is a Professor at the Thomas Lord Department of Computer Science of USC Viterbi and the USC Mark and Mary Stevens School of Computing and Artificial Intelligence, and Principal Scientist at the USC Information Sciences Institute.

Two Walls, One Solution

Community detection had been stuck behind what Ferrara describes as two distinct barriers.

The first is the Semantic Wall. When a neural network tries to understand a dense or noisy graph by averaging information across neighbors repeatedly, the distinct characteristics of nodes blur together. A user whose interests are niche gets averaged into the broader crowd around them. Communities stop looking like communities and start looking like noise. The technical term is “over-smoothing,” and it has made deep learning approaches unreliable on complex real-world graphs.
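The blurring effect is easy to reproduce on a toy graph. The sketch below is an illustration of over-smoothing in general, not of ECHO or any specific neural network: it assumes a hypothetical six-node graph (two triangles joined by one edge) with a single numeric feature per node, and applies plain neighbor averaging over and over.

```python
def mean_aggregate(features, neighbors):
    """One round of averaging: each node takes the mean of itself and its neighbors."""
    return [
        (features[i] + sum(features[j] for j in neighbors[i])) / (1 + len(neighbors[i]))
        for i in range(len(features))
    ]


def feature_spread(features):
    """Gap between the largest and smallest feature value in the graph."""
    return max(features) - min(features)


# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5}, bridged by edge 2-3.
neighbors = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
# The two communities start with clearly different features.
features = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]

spreads = [feature_spread(features)]
for _ in range(30):
    features = mean_aggregate(features, neighbors)
    spreads.append(feature_spread(features))

print(f"spread before averaging: {spreads[0]:.3f}")
print(f"spread after 30 rounds:  {spreads[-1]:.3f}")
```

The spread between the two groups starts at 2.0 and shrinks toward zero as the rounds accumulate, which is the point at which the communities are no longer distinguishable from their features alone.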

The second is the Systems Wall. Computing how similar every node is to every other node requires memory that scales with the square of the network size. On a graph with a million nodes, that means a trillion pairwise comparisons, more than even high-end hardware can hold in memory at once.
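A back-of-the-envelope calculation shows how quickly the quadratic cost bites. The sketch below assumes one 4-byte floating-point value per pair of nodes, which is an illustrative assumption and not a figure from the paper.

```python
def dense_similarity_bytes(num_nodes, bytes_per_value=4):
    """Memory needed to store one similarity score for every pair of nodes."""
    return num_nodes * num_nodes * bytes_per_value


for n in (10_000, 100_000, 1_000_000):
    gigabytes = dense_similarity_bytes(n) / 1e9
    print(f"{n:>9,} nodes -> {gigabytes:>9,.1f} GB")
```

Under that assumption, a million-node graph needs about 4,000 GB for the full similarity matrix, far more than the memory on any single GPU.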

ECHO breaks through both. Its “Topology-Aware Router” first analyzes the structure of the incoming network before doing anything else, assessing how dense it is, how noisy, how semantically coherent its connections are. Then it selects the appropriate encoding strategy automatically. The algorithm reads the room before it starts working.

“It looks at the network sort of holistically, looks at some of these properties at a very high level, and determines what’s the best representation to use for that particular network,” he said.
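To make the idea of reading the room concrete, here is a hypothetical sketch of a router. It is not ECHO's Topology-Aware Router: the two statistics, the cutoff values, and the strategy names are all placeholders chosen for illustration. It measures how dense a graph is and how similar connected nodes are on average, then picks a label from those two numbers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def graph_profile(num_nodes, edges, features):
    """Two coarse statistics: edge density and mean feature similarity across edges."""
    possible_edges = num_nodes * (num_nodes - 1) / 2
    density = len(edges) / possible_edges if possible_edges else 0.0
    coherence = (
        sum(cosine(features[a], features[b]) for a, b in edges) / len(edges)
        if edges else 0.0
    )
    return {"density": density, "coherence": coherence}


def choose_strategy(profile, density_cutoff=0.1, coherence_cutoff=0.5):
    """Placeholder decision rule; cutoffs and labels are assumptions, not ECHO's."""
    if profile["coherence"] >= coherence_cutoff:
        return "attribute-led"
    if profile["density"] >= density_cutoff:
        return "structure-led, dense"
    return "structure-led, sparse"


# Hypothetical toy graph: two triangles joined by one bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
features = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3

profile = graph_profile(6, edges, features)
strategy = choose_strategy(profile)
print(profile, strategy)
```

On this toy graph, six of the seven edges connect nodes with identical features, so the coherence score is high and the placeholder rule returns the attribute-led label.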

From there, ECHO spreads signals through the network the way a sound wave travels: strong between nearby, similar nodes, weaker across boundaries where communities differ. The algorithm learns to dampen that flow at the boundaries of groups, letting communities take shape rather than bleed into each other. On the memory side, instead of comparing every node to every other node at once, ECHO works through the network in smaller batches, keeping the computational load flat no matter how large the graph gets.
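A minimal sketch of that behavior, under stated assumptions, follows. ECHO learns how much to dampen each connection; this example swaps in a fixed, hand-written gate (cosine similarity between node features, clipped at zero) and a hypothetical toy graph. Nodes are updated in small batches, each of which only touches its own neighbors, so no all-pairs similarity matrix is ever built.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def diffusion_step(features, neighbors, gated=True, batch_size=2):
    """One diffusion round. With gated=True, dissimilar neighbors contribute less."""
    n = len(features)
    updated = [None] * n
    for start in range(0, n, batch_size):  # work through the nodes batch by batch
        for i in range(start, min(start + batch_size, n)):
            # Gate in [0, 1]; an ungated step treats every neighbor equally.
            gates = {
                j: max(0.0, cosine(features[i], features[j])) if gated else 1.0
                for j in neighbors[i]
            }
            total = 1.0 + sum(gates.values())  # self weight of 1 plus gated neighbors
            updated[i] = [
                (features[i][d] + sum(g * features[j][d] for j, g in gates.items())) / total
                for d in range(len(features[i]))
            ]
    return updated


# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5}, bridged by edge 2-3.
neighbors = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
start_features = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3

gated_features = start_features
plain_features = start_features
for _ in range(5):
    gated_features = diffusion_step(gated_features, neighbors, gated=True)
    plain_features = diffusion_step(plain_features, neighbors, gated=False)

print("gated similarity across communities:", cosine(gated_features[0], gated_features[5]))
print("plain similarity across communities:", cosine(plain_features[0], plain_features[5]))
```

With the gate on, the bridge between the two triangles carries no signal and the communities stay distinct. With the gate off, every round leaks a little of each community into the other.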

In tests on the Pokec social network, a real-world dataset with 1.6 million nodes and more than 30 million edges, ECHO completed the full analysis in under 10 minutes on a single commercial GPU, processing more than 2,800 nodes per second. Competing methods either ran out of memory or required hardware well beyond standard research setups.
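The two headline figures can be checked against each other with simple arithmetic. Both inputs below are the rounded numbers quoted above, so the result is approximate.

```python
# Rounded figures as quoted in the article.
nodes = 1_600_000
nodes_per_second = 2_800

seconds = nodes / nodes_per_second
minutes = seconds / 60
print(f"{seconds:.0f} seconds, about {minutes:.1f} minutes")
```

At that rate the full graph takes roughly nine and a half minutes, consistent with the reported time of under 10 minutes.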

“Instead of growing as the square of the number of nodes, it grows more linearly,” Ferrara said. “So it allows you to tackle much, much larger networks.”

From Brain Science to Financial Fraud

The applications are wide-ranging, and Ferrara is candid that he does not yet know all of them.

A student in his lab is already applying ECHO to social media data, identifying not just who is connected to whom, but who shares the same ideas, language, and behavioral patterns simultaneously, at a scale that was not previously feasible.

In biology, the potential is significant. Protein interaction networks, gene co-expression maps, and brain connectomes are all detailed maps of connections that share the same mathematical structure as social graphs. They are large, they are rich with attributes, and understanding their community structure could point toward new insights about disease, drug targets, or how the brain organizes itself.

“I would be thrilled if somebody picks this up and we learn something new about the brain, or about cancer,” Ferrara said.

Financial fraud detection is another target. Ferrara has previously worked on identifying cryptocurrency pump-and-dump schemes, and ECHO’s ability to spot tight clusters of coordinated behavior within massive transaction graphs could extend that work considerably. The same logic applies to human trafficking networks, where identifying clusters of activity and vulnerability requires exactly the kind of simultaneous structural and semantic analysis ECHO provides.

Ferrara is clear-eyed about the dual-use nature of a tool this powerful. A technology that can find hidden communities is, by definition, a technology that could be used to surveil them.

“One would want to be wary about the risks associated with identifying isolated groups that might not want to be identified,” he said. “There is always that ethical lens to keep in mind when applying it, especially with real people data.”

The distinction he draws is between domains. In biology or neuroscience, finding a hard-to-detect community almost always means finding something worth knowing: an anomaly, a structure, a signal. In social data, the same capability raises harder questions about who is looking and why.

ECHO is open source, available now on GitHub, and Ferrara is hoping researchers in fields he has never worked in will pick it up and find uses he has not imagined. The algorithm was designed to run on standard commercial hardware, not just the supercomputer clusters that most cutting-edge AI research requires. Looking further out, Ferrara sees it as part of a larger shift in what network science can do, including, potentially, turning its tools on AI itself.

“ECHO could be used to unpack artificial neural networks and understand more about their inner workings,” he said, “and ideally perhaps make them more efficient, more accurate, but also maybe more fair and more interpretable.”

ECHO is available at github.com/emilioferrara/ECHO-GNN. The research was supported in part by NSF Award Number 2331722.

Published on June 30th, 2026

Last updated on June 30th, 2026

This article may feature some AI-assisted content for clarity, consistency, and to help explore complex scientific concepts with greater depth and creative range.

colind88
