I apologize for the extended delay in posting. After my last post I attempted to create a workaround to convert my two-column text files into single-column ones. This proved insanely difficult. My thought process was that if I could create readable, OCRed PDF files with Foxit (which I thought I had done), I could export them to editable Word documents, convert them from two columns to one, and then export them as TXT files. Did it feel like there must be an easier way to do this? Yes. But I could not find one, at least not without hitting a paywall. I therefore surmised that I would have to open the files one by one in Foxit PDF Editor, go to the “Convert” tab, select “To MS Office” in the menu, and then select “To Word”. This would bring up a new “Save” window, where I would need to select “Settings” beside the file format. That would bring up another window, and here is where I ran into another roadblock. In this menu I should be able to select “Convert to editable documents”, but for whatever reason the option is greyed out on my MacBook Air.
I tried to find a fix on online forums, but nothing worked. I eventually gave in and called Foxit PDF support, but they could not figure out why the option was accessible on their computer and not on mine. They effectively gave up and said they had no solution. I would not recommend Foxit for their customer service. I then called Carleton’s IT services, whose customer service skills were significantly better than Foxit’s. However, they had no solution either; they also tried calling Foxit and likewise found them incredibly unhelpful. I then requested access to Adobe Acrobat Pro, since it has the capability I lacked in Foxit, as well as OCR technology. There were several back-and-forth emails because IT does not usually give students access to Adobe programs, but they finally relented, as I was basically out of options, or so I thought.
After more than a month of what I just described, I needed a break, so I pivoted to researching and writing sections of my thesis as a much-needed break from the screen work. However, I was back at it again last month. I sat down excited to finally make progress with Adobe, but none came. While I was able to export my OCRed PDFs to editable Word documents and convert the two columns into one, I very quickly realized that the OCR I had done, which I had thought was decent, was riddled with errors and had text jumbled together across the columns. I backtracked and began OCRing the files again with Adobe Acrobat, only to realize it wasn’t doing a better job. I am now convinced that these PDF programs have only been trained to handle simple, single-column OCR jobs and nothing more complicated.
At this point I just wanted to be done with OCRing. During this process, several people had recommended that I try Transkribus. I was hesitant, mainly because every new piece of software has a learning curve, and learning takes time, which I feel I am very quickly running out of, especially since I have been attempting to do this one thing on and off for over a year. But I was, again, out of options.
Next time: Transkribus…