
Topic Modeling Tool

 

The next tool I moved to on my corpus analysis journey was the Topic Modeling Tool.

 

The Topic Modeling Tool is an interesting innovation because it uses MALLET (Machine Learning for Language Toolkit) to perform LDA (Latent Dirichlet Allocation) topic modeling, but it also incorporates a user-friendly interface. That makes it accessible to individuals like myself, who can learn basic coding but just don’t understand how to troubleshoot when things go wrong.

 

The tool was created by David Newman, part of the Research Faculty of Computer Science at the University of California, Irvine, and Arun Balagopalan, and was further developed by Jonathan Scott Enderle, a Digital Humanities Specialist at the Penn Library at the University of Pennsylvania.[1] Unfortunately, Enderle has since passed away, and development of the tool has stalled until someone else decides to take up the cause.

 

Regardless, the tool was still incredibly useful for my purposes.

 

It allows users to upload a stopwords list so that those words are excluded from the analysis. The interface also allows users to select how many topics they would like the tool to generate, how many topic words to print, and whether or not the analysis should divide the corpus into n-word chunks.
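For anyone curious what those two options roughly mean in practice, here is a minimal Python sketch of stopword removal and n-word chunking. It is only an illustration under my own assumptions, not the tool's or MALLET's actual code: the function names, the sample sentence, the tiny stopword set, and the chunk size of 4 are all made up for the example, and I am assuming stopwords are dropped before the text is chunked.

```python
import re


def tokenize(text):
    """Lowercase the text and pull out runs of letters as word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stopwords(tokens, stopwords):
    """Drop every token that appears in the stopword collection."""
    return [token for token in tokens if token not in stopwords]


def chunk_tokens(tokens, chunk_size):
    """Split a token list into consecutive chunks of at most chunk_size words."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


# Hypothetical sample text and stopword list, not taken from my corpus.
sample = ("The bill was read a second time and the House debated "
          "the question of citizenship")
stopwords = {"the", "a", "and", "of", "was"}

tokens = remove_stopwords(tokenize(sample), stopwords)
chunks = chunk_tokens(tokens, chunk_size=4)

for number, chunk in enumerate(chunks, start=1):
    print(number, " ".join(chunk))
```

With chunking switched on, each chunk is treated as its own small document, so a long debate can contribute several documents to the model instead of one.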

 


 

 

For my analysis, I ran the tool both with and without n-word chunks, using 4 to 12 topics in increments of 2.
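To make the number of runs concrete, the short Python sketch below lists every combination of settings described above. The tool itself is point-and-click, so this is only a way of counting the runs, and the dictionary keys `num_topics` and `chunked` are names I made up for the illustration.

```python
from itertools import product

# Topic counts from 4 to 12 inclusive, going up by 2.
topic_counts = list(range(4, 13, 2))

# Each topic count is run once without chunking and once with it.
chunking_options = [False, True]

runs = [
    {"num_topics": num_topics, "chunked": chunked}
    for chunked, num_topics in product(chunking_options, topic_counts)
]

for run in runs:
    print(run)

print("Total runs:", len(runs))
```

Five topic counts times two chunking settings gives ten runs in total.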

 

And there were some interesting results in the data… which I will be writing about in my thesis.

 

If you would like to download the Topic Modeling Tool and try it out, you can find it here:

 

https://github.com/senderle/topic-modeling-tool

 

Also, I realize that I never really explained how I chose my corpus… or what my corpus is so… next time!



[1] Jonathan Scott Enderle. “GitHub - Senderle/Topic-Modeling-Tool: A Point-And-Click Tool for Creating and Analyzing Topic Models Produced by MALLET.” GitHub, 10 Apr. 2017, github.com/senderle/topic-modeling-tool. Accessed 6 Feb. 2025; “Department of English.” Upenn.edu, 2021, www.english.upenn.edu/people/jonathan-scott-enderle. Accessed 6 Feb. 2025; “David Newman.” Google.com, 2020, scholar.google.com/citations?user=3zmSpYAAAAJ&hl=en. Accessed 6 Feb. 2025.

 
