Posts

Showing posts from February, 2025

Topic Modeling Tool

  The next tool I moved to on my corpus analysis journey was the Topic Modeling Tool.   The Topic Modeling Tool is an interesting innovation because it uses MALLET (Machine Learning for Language Toolkit) to perform LDA (Latent Dirichlet Allocation) topic modeling but also incorporates a user-friendly interface, which helps individuals like myself who can learn basic coding but just don’t understand how to troubleshoot when things go wrong.   The tool was created by David Newman, part of the Research Faculty of Computer Science at the University of California, Irvine, and Arun Balagopalan, and further developed by Jonathan Scott Enderle, a Digital Humanities Specialist at the Penn Library at the University of Pennsylvania. [1] Unfortunately, Enderle has since passed away, and development of the tool has stalled until someone else decides to take up the cause.   Regardless, the tool was still incredibly useful for my purposes.   It ...

AntConc...

  What is AntConc, you may ask?   AntConc is a free multi-purpose corpus analysis toolkit that houses a comprehensive set of tools that includes “concordancer, word and keyword frequency generators, tools for cluster and lexical bundle analysis, and a word distribution plot.” The software was created by Laurence Anthony, a Professor of Applied Linguistics at Waseda University in Japan. You can find the software here:   https://www.laurenceanthony.net/software/antconc/   Some of AntConc’s tools overlap with Voyant’s. For example, the Trends tool in Voyant displays similar information to the Plot tool in AntConc; however, whereas Voyant displays the progression across the documents on a graph and in comparison to one another, the Plot tool does it with a series of horizontal bars that can be compared. Voyant’s tool is visually more appealing and displays the documents, so I will be using that tool instead of AntConc’s Plot.   The Ant...

And so the data analysis begins…

  I wanted to systematically go through different programs to analyze word frequency as well as topics. The programs I have selected are Voyant, AntConc, the Topic Modeling Tool, and NVivo. I began with Voyant as it is the simplest to use. What is Voyant, you ask? Voyant is a web-based textual analysis tool that lets users visualize and analyze textual data and can identify patterns within a corpus. The tool was created and developed by Stéfan Sinclair of McGill University and Geoffrey Rockwell of the University of Alberta. [1] You can find this free online tool at voyant-tools.org. Voyant has several tools that I found to be interesting and potentially of use to this project. For example, the Summary section provides the total number of words as well as the total number of unique words within the entire corpus: Total words, 335,645; Total unique words, 12,346.   This is useful to calculate manual percentages...
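
To show what I mean by a manual percentage, here is a minimal Python sketch using the two totals from Voyant's Summary section. The function name `percent_of_corpus` and the 1,000-occurrence example term are my own inventions for illustration; they do not come from Voyant or from my actual counts.

```python
# Totals reported by Voyant's Summary section for my corpus.
TOTAL_WORDS = 335_645
UNIQUE_WORDS = 12_346


def percent_of_corpus(count, total=TOTAL_WORDS):
    """Return `count` as a percentage of the total word count."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * count / total


# Unique words as a share of all words in the corpus.
unique_share = percent_of_corpus(UNIQUE_WORDS)
print(f"Unique words: {unique_share:.2f}% of all words")

# Hypothetical term that appears 1,000 times (a made-up count, not a real result).
example_share = percent_of_corpus(1_000)
print(f"Example term: {example_share:.2f}% of all words")
```

The same function can be reused for any word count that Voyant or AntConc reports, as long as it is divided by the same corpus total.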