Developed a Retweet Likelihood Prediction Model

Summer Research Internship Project, IIT Patna, 2019.

Project Overview

This research project developed a binary classification model to predict retweet likelihood using exclusively topological features from Twitter's social network structure. 

Working with the Higgs Twitter dataset from Stanford's SNAP repository, which captures Twitter activity around the July 2012 announcement of the Higgs boson discovery, the study constructed and analyzed both the social graph and the retweet graph to extract network-based features.
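As a rough illustration of the graph-construction step, the sketch below parses two tiny, made-up edge lists in a whitespace-separated, SNAP-like style and turns them into a labeled dataset. The column meanings (follower/followee, retweeter/author/count) and the prediction target (did the follower ever retweet the followee?) are assumptions for illustration only, since the project description does not state how the label was defined. It uses only the standard library; the project's own pipeline relied on Pandas and NetworkX.

```python
from collections import defaultdict

# Toy excerpts in a whitespace-separated, SNAP-like edge-list style.
# ASSUMPTION: social lines are "follower followee" and retweet lines are
# "retweeter author count"; verify against the SNAP dataset description.
SOCIAL_EDGES = """\
1 2
1 3
2 3
3 1
4 1
"""

RETWEET_EDGES = """\
1 2 3
3 1 1
"""


def load_social(text):
    """Return a directed adjacency map: follower -> set of followees."""
    graph = defaultdict(set)
    for line in text.splitlines():
        if line.strip():
            follower, followee = map(int, line.split())
            graph[follower].add(followee)
    return graph


def load_retweets(text):
    """Return weighted directed edges: (retweeter, author) -> retweet count."""
    counts = {}
    for line in text.splitlines():
        if line.strip():
            retweeter, author, count = map(int, line.split())
            counts[(retweeter, author)] = count
    return counts


def labeled_follow_pairs(social, retweets):
    """HYPOTHETICAL target: label a follow edge 1 if the follower retweeted the followee."""
    return sorted(
        ((u, v), int((u, v) in retweets))
        for u, followees in social.items()
        for v in followees
    )


social = load_social(SOCIAL_EDGES)
retweets = load_retweets(RETWEET_EDGES)
pairs = labeled_follow_pairs(social, retweets)
print(pairs)
# [((1, 2), 1), ((1, 3), 0), ((2, 3), 0), ((3, 1), 1), ((4, 1), 0)]
```

Framing the task at the level of follow edges keeps every example tied to an existing network relationship, so each row can be described purely by topological features of its two endpoints.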

Technical Approach

Topological features were engineered from these graphs using network science methods, including degree centrality, clustering coefficient, PageRank, and other network metrics.
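The three named metrics can be sketched in plain Python on a toy directed graph. This is a minimal illustration under stated assumptions, not the project's code: in-degree centrality is normalized by n - 1, the clustering coefficient is computed on the undirected projection, and PageRank uses power iteration with damping 0.85 and uniform redistribution of rank held by dangling nodes. NetworkX exposes the same quantities as `nx.in_degree_centrality`, `nx.clustering` (applied to `G.to_undirected()`), and `nx.pagerank`.

```python
# Toy directed follower graph (hypothetical): an edge (u, v) means u -> v.
EDGES = [(1, 2), (1, 3), (2, 3), (3, 1), (4, 1)]
NODES = sorted({n for edge in EDGES for n in edge})

out_edges = {v: set() for v in NODES}
undirected = {v: set() for v in NODES}
for u, v in EDGES:
    out_edges[u].add(v)
    if u != v:  # ignore self-loops in the undirected projection
        undirected[u].add(v)
        undirected[v].add(u)


def in_degree_centrality(out_edges):
    """In-degree divided by n - 1, the maximum possible in-degree."""
    n = len(out_edges)
    counts = {v: 0 for v in out_edges}
    for targets in out_edges.values():
        for v in targets:
            counts[v] += 1
    return {v: c / (n - 1) for v, c in counts.items()}


def clustering(undirected):
    """Local clustering coefficient: fraction of neighbor pairs that are linked."""
    coeffs = {}
    for v, nbrs in undirected.items():
        k = len(nbrs)
        if k < 2:
            coeffs[v] = 0.0
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in undirected[a])
        coeffs[v] = 2.0 * links / (k * (k - 1))
    return coeffs


def pagerank(out_edges, alpha=0.85, tol=1e-6, max_iter=100):
    """Power-iteration PageRank; rank held by dangling nodes is spread uniformly."""
    nodes = list(out_edges)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not out_edges[v])
        new = {v: (1.0 - alpha) / n + alpha * dangling / n for v in nodes}
        for u in nodes:
            for v in out_edges[u]:
                new[v] += alpha * rank[u] / len(out_edges[u])
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < n * tol
        rank = new
        if converged:
            break
    return rank


deg = in_degree_centrality(out_edges)
clust = clustering(undirected)
pr = pagerank(out_edges)
for v in NODES:
    print(v, round(deg[v], 3), round(clust[v], 3), round(pr[v], 3))
```

Each node then carries a small feature vector (centrality, clustering, PageRank) that can be attached to the endpoints of an example before modeling.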

Binary classification was performed with Generalized Linear Models (GLMs) on H2O.ai's machine learning platform, using the topological features as the only model inputs.
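For a binary target, a GLM with the binomial family and its default logit link is logistic regression. The sketch below fits one by batch gradient descent on a tiny, hypothetical set of standardized topological features, purely to show what the model computes. The feature names, learning rate, epoch count, and L2 strength are illustrative assumptions; H2O fits the same model family with its own solvers and regularization options, and this code is not a substitute for it.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function (inverse of the logit link)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500, l2=0.01):
    """Fit intercept b and weights w by batch gradient descent on the mean
    log-loss plus an L2 penalty on w (the intercept is not penalized)."""
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
    return b, w


def predict_proba(model, x):
    """Predicted probability that the example is a retweet (label 1)."""
    b, w = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# HYPOTHETICAL standardized features per follow edge:
# [follower in-degree centrality, followee PageRank]; label 1 = retweeted.
X = [[0.5, 1.0], [1.0, 0.5], [1.5, 1.5], [-0.5, -1.0], [-1.0, -0.5], [-1.5, -1.5]]
y = [1, 1, 1, 0, 0, 0]

model = fit_logistic(X, y)
preds = [int(predict_proba(model, xi) >= 0.5) for xi in X]
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
# training accuracy: 1.00
```

With the `h2o` package, the corresponding model would be built along the lines of `H2OGeneralizedLinearEstimator(family="binomial")` followed by `.train(x=feature_columns, y="label", training_frame=frame)`, where `feature_columns`, `"label"`, and `frame` are placeholder names and the label column is first converted with `asfactor()`. Training accuracy on six hand-picked, separable points is only a sanity check.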

Key Findings

Demonstrated that graph-based features alone, without any analysis of tweet content, could effectively predict retweet behavior, reaching high classification accuracy from network structure only. The approach also offered insight into how social network topology shapes information propagation on Twitter.

Tools & Technologies

  • Python: NetworkX for graph analysis, Pandas for data manipulation
  • H2O.ai: Scalable machine learning platform used for the GLM implementation
  • Cytoscape: Network visualization and analysis
  • Dataset Processing: Higgs Twitter dataset from the Stanford SNAP repository

Skills Demonstrated

  • Network science and graph theory applications
  • Social media analytics and prediction modeling
  • Large-scale data processing and feature engineering
  • Statistical modeling with Generalized Linear Models
  • Research methodology and scientific documentation
  • Technical report writing and academic presentation

Deliverables

  • Comprehensive technical research report with literature review and methodology
  • Final presentation summarizing project goals, process, and key findings
  • Processed datasets and network analysis results
  • Network visualizations and statistical model outputs

30 Jun 2019

Keywords
Machine Learning
Social Network Analysis
Tech
Report
Coding
Research
