Micki Kaufman, former class visitor and creator of Quantifying Kissinger, recently hosted a data visualization workshop on October 31st. She not only showed how to set up projects in Gephi and Tableau, but also demonstrated the power of smaller text-analysis tools like AntConc and MALLET. Her own project dealt with thousands of documents and the constraints of initially undefined metadata, which raises two questions: how do you create raw data from documents, and how do you visualize it in useful ways that spark humanities-based questions?
We started the workshop with a dataset. In this case, Micki creatively used the sign-up data for the session, turning it into a Google spreadsheet that we all populated with our own attributes. The metadata included GC program, years at the Grad Center, and departmental involvement, to name a few. After we populated these fields, she went over how text files and spreadsheets are equivalent in terms of content editing. We then imported the data into Gephi, which showed us initial clusters in a network visualization based on each student's program and department. From there we explored modularity analysis, whose results would change substantially if we modified any of the data.
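The clustering Gephi showed us comes from connecting attendees who share an attribute. A minimal sketch of that step, using a hypothetical sign-up list (the names and programs here are invented for illustration), produces the kind of edge list Gephi can import:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sign-up data: attendee -> GC program
attendees = {
    "Alice": "History",
    "Bob": "English",
    "Carol": "History",
    "Dave": "English",
    "Erin": "History",
}

# Group attendees by shared attribute, then connect everyone within a group.
groups = defaultdict(list)
for name, program in attendees.items():
    groups[program].append(name)

# Build (source, target, label) edges -- one row per connected pair.
edges = []
for program, members in groups.items():
    for a, b in combinations(sorted(members), 2):
        edges.append((a, b, program))

for source, target, label in edges:
    print(f"{source},{target},{label}")
```

Saved as a CSV, each row becomes an edge in Gephi, and attendees in the same program naturally fall into the same cluster.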
One of the more important questions raised by this visualization was the choice of color. Color choice matters because you need not only to present a gripping visualization, but also to consider questions of disability and culture. The color red, for example, can carry different meanings across cultures and might not be the right choice for a certain variable if you know your intended audience well enough. And how do you make the visualization accessible to someone who is colorblind?
After our initial visualizations in Gephi, we moved on to topic modeling. Topic modeling is a way of using programming to identify potential patterns in a corpus. For Micki, this meant going through the thousands of documents she had scraped and finding trends to use in a visualization. However, topic modeling is probabilistic: topics are not discrete and mutually exclusive. When you run a topic modeling script, you're bound to get overlap and slightly different results each time, which makes a good case for averaging the topics that appear across multiple modeling runs.
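One simple way to compare runs is to measure word overlap between topics. The sketch below (the topic word lists are invented, not from Micki's corpus) matches each topic from one run to its closest counterpart in another run using Jaccard similarity:

```python
def jaccard(a, b):
    """Overlap between two topics, each given as a collection of top words."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# Hypothetical top-word lists from two separate topic-modeling runs
run1 = [["war", "peace", "treaty"], ["trade", "oil", "economy"]]
run2 = [["economy", "trade", "inflation"], ["war", "treaty", "soviet"]]

# Match each topic in run1 to its most similar topic in run2.
matches = []
for t1 in run1:
    best = max(run2, key=lambda t2: jaccard(t1, t2))
    matches.append((t1, best, jaccard(t1, best)))

for t1, t2, score in matches:
    print(t1, "matches", t2, f"overlap={score:.2f}")
```

Topics that match stably across many runs are the ones worth trusting in a visualization; topics that never find a counterpart are likely noise.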
We then moved on to Tableau, another piece of data visualization software. In Tableau, we used State of the Union (SOTU) data that Micki provided to play around with visualizations, charting words and word weights over the history of the addresses. In one case, we looked at why the word "tonight" appears much more heavily in the 1980s than in earlier decades; Micki noted that this reflects the shift of SOTU addresses from print to television.
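The "word weight" behind that chart is just a word's relative frequency per address. A minimal sketch of the computation, using two invented stand-in snippets rather than the real SOTU texts:

```python
from collections import Counter

# Tiny stand-in corpus: year -> address text (hypothetical snippets, not real SOTU text)
sotu = {
    1934: "the congress convenes in a time of recovery and reform",
    1981: "tonight i report to you on the state of the union and tonight we begin",
}

def word_weight(text, word):
    """Relative frequency of a word in a text -- the 'weight' plotted in Tableau."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return counts[word] / total

for year, text in sorted(sotu.items()):
    print(year, round(word_weight(text, "tonight"), 3))
```

Computed across every address and plotted by year, this is enough to surface the televised-era spike in "tonight."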
To close, we went over a couple of pieces of text analysis software, Micki's 3D model of her project, and the potential future of visualizations in VR. Overall, it was an excellent seminar that taught everyone a great deal not only about setting up visualizations, but about figuring out what you want to get out of data visualization.