Dissertations

Unsupervised Learning With Word Embeddings Captures Quiescent Knowledge From COVID-19 And Materials Science Literature

Tasnim H. Gharaibeh, Western Michigan University

Date of Award

4-2022

Degree Name

Doctor of Philosophy

Department

Computer Science

First Advisor

Dr. Elise de Doncker

Second Advisor

Dr. Alvis Fong

Third Advisor

Dr. Pnina Ari-Gur

Keywords

COVID-19 drugs, COVID-19 vaccines, giant magnetocaloric effect, laser powder bed fusion, unsupervised learning, word2vec

Abstract

Millions of scientific papers are published each year, and the scientific literature continues to grow at a head-spinning pace. A vast amount of scientific knowledge therefore exists in written text, and the sheer number of publications makes it difficult, if not impossible, for researchers to keep up to date with discoveries, even within a narrow scientific area. This volume of information also makes it hard to find implicit and hidden connections, relationships, and dependencies that might guide future research or lead to valuable new insights. There is thus a need for algorithms or models that can scan the text of millions of papers to uncover new scientific knowledge and search for hidden connections within it. For computer algorithms to use this resource, the text must first be converted into numbers, with each word represented in some mathematical form. This is where artificial intelligence and machine learning can help: advanced machine learning and natural language processing algorithms can make large databases more useful and easier to handle for both researchers and clinicians. We used Word2Vec for our implementation and trained many unsupervised word-embedding models on different data sets in materials science and in the medical field. These models extract hidden knowledge, relations, and interactions from words that appear in similar contexts in the text and therefore often have similar meanings. So far, we have adopted three main models. The first is trained on the additive manufacturing (AM) literature, targeting powder bed fusion (PBF) processes such as selective laser sintering (SLS), selective laser melting (SLM), and direct metal laser sintering (DMLS), with the goal of extracting new knowledge to improve AM processes and to address material properties that depend on the process used. Other properties inherent to the materials, such as the giant magnetocaloric effect, are addressed in a dedicated model.
The second model is trained on the COVID-19 drug literature to determine what insights can be obtained about candidate drugs to treat COVID-19. Finally, the third model is trained on the COVID-19 vaccine literature to predict promising candidate vaccines. We thus demonstrate how word embeddings can help extract hidden knowledge from the published literature in very different areas of research.
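The abstract's core idea, that words appearing in similar contexts tend to have similar meanings and can be compared once they are represented numerically, can be illustrated with a minimal sketch. This is not the dissertation's implementation: as a simplifying assumption, it builds sparse context-count vectors from a tiny hypothetical corpus instead of training Word2Vec, which learns dense vectors with a shallow neural network. The helper names `context_vectors`, `cosine`, and `most_similar`, the `window` parameter, and the corpus sentences are illustrative only.

```python
from collections import Counter
import math

# Count-based stand-in for Word2Vec (assumption): no neural training happens here.


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            counts = vectors.setdefault(word, Counter())
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the word itself
                    counts[tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(vectors, word, topn=3):
    """Rank other words by cosine similarity to `word` (ties broken alphabetically)."""
    if word not in vectors:
        raise KeyError(f"{word!r} is not in the vocabulary")
    target = vectors[word]
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:topn]


# Hypothetical toy corpus, not data from the dissertation.
corpus = [
    "slm fuses metal powder with a laser",
    "sls fuses polymer powder with a laser",
    "dmls fuses metal powder with a laser",
    "the vaccine triggers an immune response",
]

if __name__ == "__main__":
    vectors = context_vectors(corpus, window=1)
    print(most_similar(vectors, "slm", topn=2))
```

With `window=1`, the only context word of `slm` is `fuses`, which it shares with `sls` and `dmls`, so both rank as its nearest neighbors with a cosine similarity of 1.0, while words from the unrelated vaccine sentence score 0. A trained Word2Vec model replaces these sparse counts with dense learned vectors, but its nearest neighbors are typically found with the same kind of cosine-similarity ranking.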

Access Setting

Dissertation-Open Access

Recommended Citation

Gharaibeh, Tasnim H., "Unsupervised Learning With Word Embeddings Captures Quiescent Knowledge From COVID-19 And Materials Science Literature" (2022). Dissertations. 3823.
https://scholarworks.wmich.edu/dissertations/3823

Included in

Artificial Intelligence and Robotics Commons, Data Science Commons