This research project developed a binary classification model to predict retweet likelihood using exclusively topological features from Twitter's social network structure.
Working with the Higgs Twitter dataset from Stanford's SNAP (Stanford Network Analysis Project) repository, the study constructed and analyzed both the social graph and the retweet graph to extract network-based features.
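As a minimal sketch of the graph-construction step, the snippet below reads whitespace-separated edge lines into directed adjacency sets. It assumes each line holds a source user ID, a target user ID and an optional weight, with the acting user in the first column; the exact SNAP file layout should be checked against the dataset page. The toy lines and the retweet-based label (1 if a user retweeted anyone) are hypothetical, since the description does not state how labels were defined.

```python
from collections import defaultdict


def load_edge_list(lines):
    """Build a directed adjacency map from whitespace-separated edge lines.

    Assumes each line is 'src dst' or 'src dst weight' (the weight is ignored
    here); blank lines and lines starting with '#' are skipped.
    """
    out_edges = defaultdict(set)
    nodes = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        src, dst = int(parts[0]), int(parts[1])
        out_edges[src].add(dst)
        nodes.update((src, dst))
    return nodes, out_edges


# Hypothetical toy inputs standing in for the SNAP edge-list files.
social_lines = ["1 2", "2 3", "3 1", "4 1"]
retweet_lines = ["2 1 3", "3 1 1"]

social_nodes, social_graph = load_edge_list(social_lines)
retweet_nodes, retweet_graph = load_edge_list(retweet_lines)

# Hypothetical label: 1 if the user appears as a retweeter at least once.
# Membership tests on a defaultdict do not create new keys.
labels = {u: int(u in retweet_graph) for u in social_nodes}
```

Keeping the social and retweet graphs separate lets features come from one graph while labels come from the other, which avoids leaking the target into the inputs.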
The project used standard network-science measures to engineer node-level topological features, including degree centrality, clustering coefficient and PageRank, among other network metrics.
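The three named features can be sketched in plain Python as below; this is a dependency-free stand-in, and libraries such as NetworkX provide tested equivalents. Assumptions: degree centrality is total (in + out) degree divided by n - 1, the clustering coefficient is computed on the undirected projection of the graph, and PageRank uses damping 0.85 with the rank of dangling nodes spread uniformly. The toy graph and the `[degree, clustering, pagerank]` feature layout are hypothetical.

```python
def degree_centrality(nodes, out_edges):
    """Total (in + out) degree normalized by n - 1."""
    deg = {v: 0 for v in nodes}
    for u, targets in out_edges.items():
        for v in targets:
            deg[u] += 1
            deg[v] += 1
    n = len(nodes)
    if n < 2:
        return {v: 0.0 for v in nodes}
    return {v: d / (n - 1) for v, d in deg.items()}


def clustering_coefficient(nodes, out_edges):
    """Local clustering coefficient on the undirected projection."""
    nbrs = {v: set() for v in nodes}
    for u, targets in out_edges.items():
        for v in targets:
            if u != v:  # ignore self-loops
                nbrs[u].add(v)
                nbrs[v].add(u)
    coef = {}
    for v in nodes:
        k = len(nbrs[v])
        if k < 2:
            coef[v] = 0.0
            continue
        # Count each linked neighbour pair once (requires orderable node IDs).
        links = sum(1 for a in nbrs[v] for b in nbrs[v] if a < b and b in nbrs[a])
        coef[v] = 2 * links / (k * (k - 1))
    return coef


def pagerank(nodes, out_edges, alpha=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank; dangling-node rank is redistributed uniformly."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not out_edges.get(v))
        new = {v: (1 - alpha) / n + alpha * dangling / n for v in nodes}
        for u, targets in out_edges.items():
            if targets:
                share = alpha * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
        diff = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if diff < tol:
            break
    return rank


def node_features(nodes, out_edges):
    """Assemble a hypothetical [degree, clustering, pagerank] row per node."""
    deg = degree_centrality(nodes, out_edges)
    clu = clustering_coefficient(nodes, out_edges)
    pr = pagerank(nodes, out_edges)
    return {v: [deg[v], clu[v], pr[v]] for v in nodes}


# Hypothetical toy graph: edges 1->2, 2->3, 3->1, 4->1.
nodes = {1, 2, 3, 4}
out_edges = {1: {2}, 2: {3}, 3: {1}, 4: {1}}
features = node_features(nodes, out_edges)
```

On a graph the size of Higgs, a pure-Python loop like this would be slow; the sketch only fixes the definitions so they can be compared with whatever library the features are actually computed with.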
Binary classification was performed with Generalized Linear Models (GLMs) on H2O.ai's machine learning platform.
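The project used H2O's GLM; in H2O's Python API a binary GLM is configured through `H2OGeneralizedLinearEstimator(family="binomial")`. To keep the example self-contained, the sketch below does not use H2O. It fits the same model family (binomial response with a logit link, i.e. logistic regression) by batch gradient descent in plain Python. The feature rows, labels, learning rate and epoch count are hypothetical, and H2O's own solver and regularization defaults differ from this sketch.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function (inverse of the logit link)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_glm(X, y, lr=1.0, epochs=3000, l2=0.0):
    """Fit a binomial GLM (logit link) by gradient descent on the mean
    negative log-likelihood; l2 is an optional ridge penalty (0 disables it)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Residual between predicted probability and the 0/1 label.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(p):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n  # the intercept is not penalized
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of the positive class for one feature row."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical feature rows (e.g. two topological features) and labels.
X = [[0.1, 0.0], [0.2, 0.1], [0.3, 0.0], [0.8, 0.9], [0.9, 0.7], [0.7, 1.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic_glm(X, y)
```

A GLM keeps one coefficient per feature, so the fitted weights can be read directly as the direction and strength of each topological feature's association with the label.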
The results showed that graph-based features, without any analysis of tweet content, could predict retweet behavior with high classification accuracy using only network structure. The approach also offers insight into how social network topology shapes information propagation on Twitter.
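The write-up reports high accuracy without giving figures. As a hedged sketch of how such a claim could be checked, the functions below compute thresholded accuracy and ROC AUC from predicted probabilities. The 0.5 threshold and the example values are illustrative only and are not the project's results.

```python
def accuracy(y_true, probs, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the 0/1 label."""
    correct = sum(int((p >= threshold) == bool(t)) for t, p in zip(y_true, probs))
    return correct / len(y_true)


def roc_auc(y_true, probs):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [p for t, p in zip(y_true, probs) if t == 1]
    neg = [p for t, p in zip(y_true, probs) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative values only.
y_true = [0, 0, 1, 1]
probs = [0.1, 0.6, 0.4, 0.9]
acc = accuracy(y_true, probs)
auc = roc_auc(y_true, probs)
```

Reporting AUC alongside accuracy matters when retweeters and non-retweeters are imbalanced, because accuracy alone can look high for a model that mostly predicts the majority class. The pairwise AUC here is O(n_pos × n_neg); a sort-based computation would be preferable at the scale of the Higgs dataset.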
30 Jun 2019