Towards Building a Recommender System to improve Diversity in Online News Feed
Date of Award
12-2024
Degree Name
Master of Science
Department
Computer Science
First Advisor
Shameek Bhattacharjee, Ph.D.
Second Advisor
Ajay Gupta, Ph.D.
Third Advisor
Li Yang, Ph.D.
Keywords
BERT, diversity index, information retrieval, information theory, LDA, topic modeling
Access Setting
Masters Thesis-Abstract Only
Restricted to Campus until
12-1-2026
Abstract
This research proposes a method to compute a diversity index for a news feed, so that the feed can offer a broad spectrum of the news articles available on the Internet and give fair representation to most perspectives and opinions. Current social media recommendation algorithms optimize purely for the user's existing interests in order to increase engagement on the platform. This creates a gap in the information presented to the user. The objective of this research is to fill this gap by offering a much more diverse set of news articles from the Internet.
First, the data set on a given subject is collected from the Internet using the SerpDev search API [20] and CrewAI's website scraping tool [5]. Once the articles have been extracted from different websites, pre-processing steps normalize them into a common format.
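The abstract does not itemize the pre-processing steps. As a minimal sketch, assuming typical text normalization (removing leftover HTML tags and URLs, lowercasing, keeping alphabetic tokens, and dropping stop words), the step might look like the following. The function names, the `min_len` parameter, and the tiny stop-word list are hypothetical, and the collection step itself (SerpDev search and CrewAI scraping) is not reproduced here.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller list.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to", "was"}

def preprocess(text: str, min_len: int = 3) -> list[str]:
    """Normalize one scraped article into a list of lowercase content tokens."""
    text = re.sub(r"<[^>]+>", " ", text)          # remove leftover HTML tags
    text = re.sub(r"https?://\S+", " ", text)      # remove URLs
    tokens = re.findall(r"[a-z]+", text.lower())   # keep alphabetic runs only
    return [t for t in tokens if t not in STOP_WORDS and len(t) >= min_len]

def build_corpus(articles: list[str]) -> list[list[str]]:
    """Pre-process every article and drop any that end up empty."""
    docs = [preprocess(a) for a in articles]
    return [d for d in docs if d]

print(preprocess("<p>The Senate passed the bill on Tuesday. See https://example.com</p>"))
# ['senate', 'passed', 'bill', 'tuesday', 'see']
```

The resulting token lists are the kind of bag-of-words input that LDA expects; lemmatization or n-gram detection could be added in the same function.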
Multiple themes in each document of the corpus are then explored using Latent Dirichlet Allocation (LDA) [20], which uses Bayesian inference to estimate the posterior distribution of topics in each document, as well as the distribution of words within each topic. A topic is essentially a collection of words that frequently appear together in documents and represent a certain theme or subject matter. Because LDA discovers topics without supervision and the number of topics must be fixed in advance, the ideal number of topics for the corpus is chosen using the coherence score [4] and the perplexity score [16].
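The two model-selection scores can be illustrated with a small standard-library sketch. It assumes an LDA model has already been fitted, giving per-document topic proportions `theta` and per-topic word probabilities `phi`. Perplexity is computed as the exponential of the negative average per-token log-likelihood, and coherence uses the UMass variant based on document co-occurrence counts. The abstract does not say which coherence measure is used, so UMass is an assumption, and the function names are hypothetical.

```python
import math

def perplexity(docs, theta, phi):
    """Per-token perplexity of token lists `docs` under a fitted LDA model (lower is better).

    theta[d][k]: P(topic k | document d); phi[k][w]: P(word w | topic k), one dict per topic.
    Assumes phi is smoothed so that every observed word has non-zero probability.
    """
    log_lik, n_tokens = 0.0, 0
    for d, tokens in enumerate(docs):
        for w in tokens:
            # Marginalize over topics: P(w | d) = sum_k theta[d][k] * phi[k][w]
            p_w = sum(theta[d][k] * phi[k].get(w, 0.0) for k in range(len(phi)))
            log_lik += math.log(p_w)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)

def umass_coherence(top_words, docs):
    """UMass coherence of one topic's top words, ordered by descending probability (higher is better).

    Sums log((D(w_m, w_l) + 1) / D(w_l)) over pairs where w_l ranks above w_m,
    with D(...) = number of documents containing all the given words.
    Assumes every top word occurs in at least one document.
    """
    doc_sets = [set(d) for d in docs]

    def doc_freq(*words):
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            score += math.log((doc_freq(top_words[m], top_words[l]) + 1) / doc_freq(top_words[l]))
    return score

docs = [["tax", "budget", "vote"], ["tax", "budget"], ["goal", "match"]]
print(umass_coherence(["tax", "budget"], docs))  # words that co-occur
print(umass_coherence(["tax", "goal"], docs))    # words that never co-occur score lower
```

In a sweep over candidate topic counts, one would fit LDA for each value, then prefer counts with high average coherence across topics and low held-out perplexity; the two criteria do not always agree, so they are usually inspected together.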
To allow topics to be specified manually, topic modeling based on Bidirectional Encoder Representations from Transformers (BERT) [6] is explored as well. A set of topics is prepared to best represent the spectrum of possibilities for the corpus. The posterior probability of topics in each document, combined with a diversity scoring technique [7], is used to generate a diversity index for the article set. Here, diversity means the variability in the information being processed, transmitted, or encoded.
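The abstract does not specify the diversity scoring technique of [7]. As a hedged illustration only, the sketch below assumes the per-article topic posteriors are averaged into one distribution for the article set and scored with Shannon entropy, normalized by log2 of the number of topics, so that 0 means every article covers the same single topic and 1 means the topics are evenly covered. Entropy is a swapped-in choice motivated by the information-theoretic definition of diversity given above, not necessarily the cited method; the function names and example feeds are hypothetical.

```python
import math

def aggregate_topics(doc_topic: list[list[float]]) -> list[float]:
    """Average per-article topic posteriors into one distribution for the whole article set."""
    n = len(doc_topic)
    k = len(doc_topic[0])
    return [sum(row[j] for row in doc_topic) / n for j in range(k)]

def shannon_entropy(p: list[float]) -> float:
    """Shannon entropy in bits; zero-probability topics contribute nothing."""
    return sum(-x * math.log2(x) for x in p if x > 0)

def diversity_index(doc_topic: list[list[float]]) -> float:
    """Normalized entropy in [0, 1] (Pielou-style evenness) of the aggregated topic distribution."""
    p = aggregate_topics(doc_topic)
    if len(p) < 2:
        return 0.0  # a single topic admits no diversity
    return shannon_entropy(p) / math.log2(len(p))

# Hypothetical feeds over K = 4 topics: one concentrated, one spread evenly across topics
narrow_feed = [[0.9, 0.1, 0.0, 0.0], [0.8, 0.2, 0.0, 0.0]]
broad_feed = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.7, 0.1],
              [0.1, 0.7, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]]
print(diversity_index(narrow_feed), diversity_index(broad_feed))
```

Averaging first measures how diverse the feed is as a whole rather than within each article; an index such as Simpson's could be substituted in `shannon_entropy` without changing the rest of the pipeline.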
Recommended Citation
George, Joseph, "Towards Building a Recommender System to improve Diversity in Online News Feed" (2024). Masters Theses. 5438.
https://scholarworks.wmich.edu/masters_theses/5438