I wanted to systematically go through different programs to analyze word frequency as well as topics. The programs I have selected are Voyant, AntConc, TopicModelling Tool, and NVivo.
I began with Voyant as it is the simplest to use.
What is Voyant, you ask?
Voyant is a web-based textual analysis tool that lets users visualize and analyze textual data and identify patterns within a corpus. The tool was created and developed by Stéfan Sinclair of McGill University and Geoffrey Rockwell of the University of Alberta.[1]
You can find this free online tool at voyant-tools.org.
Voyant has several tools that I found interesting and potentially useful for this project. For example, the Summary section reports the total number of words and the total number of unique words in the entire corpus: Total words, 335,645; Total unique words, 12,346. These figures are useful for manually calculating what percentage of the corpus is made up of specific words that indicate topics. That isn’t strictly necessary, as other programs like NVivo will do it as well, but it is an option. Voyant (like the other programs) can also create word clouds of varying sizes based on word frequency. I should also mention that you can refine the results by adding stopwords to the stopword list. I have done this by amalgamating the stopwords suggested by my digital humanities professor, stopwords from two separate online lists of common words, and my own stopword list to narrow context and topics. By the end of this project you will be able to view all of this online, as I will be uploading the data for anyone interested.
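For anyone who wants to check these kinds of numbers by hand, here is a minimal Python sketch of the same idea: count total words, count unique words, and work out what percentage of the corpus a given word makes up. This is not how Voyant itself does it. The tokenizer (lowercase, letters only), the function names, the sample sentence, and the tiny stopword set are all my own assumptions for illustration, so the counts will not match Voyant's exactly.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters a-z as word tokens.

    Assumption: a deliberately simple tokenizer; Voyant's rules may differ.
    """
    return re.findall(r"[a-z]+", text.lower())


def summarize(text, stopwords=frozenset()):
    """Return total words, unique words, and per-word counts after
    dropping any tokens found in the stopword set."""
    tokens = [t for t in tokenize(text) if t not in stopwords]
    counts = Counter(tokens)
    return {
        "total_words": len(tokens),
        "unique_words": len(counts),
        "counts": counts,
    }


def percentage(summary, word):
    """Percentage of the counted words accounted for by `word`."""
    total = summary["total_words"]
    return 100.0 * summary["counts"][word] / total if total else 0.0


# Hypothetical sample text and stopword list, for illustration only.
sample = "The citizen and the act: a citizen is a citizen."
stop = {"the", "and", "a", "is"}

summary = summarize(sample, stop)
print(summary["total_words"], summary["unique_words"])
print(percentage(summary, "citizen"))
```

Passing an empty stopword set counts every word, which makes it easy to compare the effect of the stopword list on the percentages.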
Moving forward, Voyant also provides a really cool tool called “Trends,” which “generates a graph that demonstrates how the frequency of a particular word changes over time.”[2] This is interesting because the corpus contains over 20 documents spanning the Citizenship Act Readings, from 1945 to 1946. This type of data can show which of the topics I am looking at in this thesis were more important than others. Because I specifically included stem words for the topics I am investigating (for example “Jap,” which covers Japan and Japanese), this tool provides a visualization of how the topics compare to one another as well as how each changes over time.
Although Voyant has other interesting tools, the ones mentioned above are the ones relevant to my project at this time.
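To make the stem-word idea concrete, here is a rough Python sketch of what a trend line boils down to: for each document, in chronological order, count the words that begin with the stem and divide by the document's total word count. This is only my approximation of the concept, not Voyant's actual implementation. The function names, the prefix-matching rule, and the three short sample "documents" are made up for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return runs of letters a-z as word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem_trend(documents, stem):
    """Relative frequency of words starting with `stem`, one value per
    document, in the order the documents are given.

    Assumption: a "stem word" is treated as a simple prefix match.
    """
    stem = stem.lower()
    trend = []
    for text in documents:
        tokens = tokenize(text)
        hits = sum(1 for t in tokens if t.startswith(stem))
        trend.append(hits / len(tokens) if tokens else 0.0)
    return trend


# Hypothetical placeholder documents, listed in chronological order.
docs = [
    "Japan was discussed once today",
    "Japanese citizens and Japan again",
    "No mention here at all",
]

trend = stem_trend(docs, "Jap")
print(trend)
```

Using relative rather than raw counts keeps long and short documents comparable, which matters when the readings vary a lot in length.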

