One Database for All Data???

At The Lexicon of DH workshop run by the CUNY Digital Fellows on September 29, our class’s very own Jojo Karlin led an informative and engaging discussion of what it means to do, and how to do, digital humanities work.  In the course of the workshop, a participant posed a question that was very interesting (to this librarian at least): Is there a database for all the data that is available?  Perhaps if there were, I’d be out of a job!

The same participant asked, for argument’s sake, where they could go for information on US theater productions in the last century.  I suggested the Internet Theater Database and the Internet Broadway Database, but both are limited in scope, and both were resources I was already familiar with.  The inquirer was asking for a source to go to if they did not already know of a data set.  So, I quickly tried to find such a database of data sets during the workshop.  Not surprisingly, I have not been able to find this comprehensive database of all data (at least not yet).  Of course, there are several reasons why this database does not exist, not least the price of information and the commercialization of data, which often limit how information is shared, in addition to the cost of compiling, hosting, and designing such a database.

Nonetheless, as with much of the DH community, arguments for sharing data are strongly rooted in the idea that openness will create an opportunity for growth and development.  So, I wanted to share some of the projects and resources that I have found that are working towards an amalgamation of open data sets out there…

Conveniently enough, last week I was doing some collection development for the library I work at and came across a recommendation from Choice publication for Data USA.  According to the “About” page, this project aims to place “public US Government data in your hands.  Instead of searching through multiple data sources that are often incomplete and difficult to access…”  As with much of the data currently available through this kind of portal, it is largely produced by government agencies.

During one of our initial class discussions, we went around and stated what projects or tools interested us or what we wanted to know more about.  I stated that I was interested in the DPLA (Digital Public Library of America) and they do a great job of aggregating data from a range of cultural institutions throughout the country (and they make it pretty easy to access the data):  https://dp.la/info/developers/

Where would the state of information be if Google didn’t have some handle on aggregating data???

Open Knowledge International has put together http://dataportals.org/

Their mission statement is very noble indeed: “We want to see enlightened societies around the world, where everyone has access to key information and the ability to use it to understand and shape their lives; where powerful institutions are comprehensible and accountable; and where vital research information that can help us tackle challenges such as poverty and climate change is available to all.”

And a few other sources I’ve found (and there are so many more!):

Forbes.com published a helpful list of 33 free big data sets earlier this year: http://www.forbes.com/sites/bernardmarr/2016/02/12/big-data-35-brilliant-and-free-data-sources-for-2016/#87796d267961

So much data is out there, but not in one place.  As for the workshop participant inquiring about theater data, there does not seem to be a single database for them.  For the record, though, they were only inquiring about theater data as a hypothetical and did not claim to actually be interested in that line of scholarship!  It is my hope that more projects develop and that the spirit of openness adopted by many scientific and government communities spreads across the spectrum of disciplines and industries, so that one day there will be a database of all data.

 

 

Posted in Student Post, Uncategorized | Comments closed

Lexicon of DH

Last Thursday’s GC Digital Fellows Workshop, “The Lexicon of DH,” as the title suggests, provided attendees with a great overview of tools and terms to better understand DH and possible DH projects. Jojo Karlin’s presentation began with a simple parsing of the term “Digital Humanities” as follows:

Digital Humanities projects use

digital methods of research

that engage humanities topics in their materials

and/or

interpret the results of digital tools from a humanities lens

Among the attendees, I saw some of my classmates from DH Praxis alongside PhD candidates in Art History, Sociology, and Education. I was surprised to find some faculty in attendance from Literary Studies and Middle Eastern Studies. All this is to say, it was a pretty diverse group who came to the workshop with a broad range of digital skill sets, questions, and concerns. 

Jojo explained that data for humanists is “constructed, interpretable, processable (digitally), and can have evidentiary value.” Without data, digital humanities projects would not be possible, and Jojo provided us with several sources, such as databases (NYC Open Data, Digital Public Library of America), institutional collections (Digital Scriptorium, NYPL Digital Collections), and other digital projects (Around DH in 80 Days). She also gave us tools and techniques we could use to collect and work with data: audio/visual (TANDEM, CUNYcast), web scraping (Scrapy), text analysis (Python, MALLET, R, AntConc), text encoding (Text Encoding Initiative), and geocoding (Mapbox, CartoDB, OpenStreetMap, QGIS). I appreciated that Jojo spent some time discussing APIs. Her definition of APIs as a way to communicate with websites made them approachable rather than intimidating. Jojo emphasized that “digital things” often have documentation, keys that allow them to be accessed. She also encouraged us to register for Zotero in order to easily save, access, and cite all the tools and information we come across during our research.
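
As a rough illustration of what “communicating with a website” through an API can look like, here is a minimal Python sketch that builds a search request for the Digital Public Library of America. It assumes DPLA’s items endpoint (`https://api.dp.la/v2/items`) and its `q`, `page_size`, and `api_key` query parameters; the function name `build_dpla_query` and the placeholder key are invented for this example, and a real key has to be requested as described in DPLA’s developer documentation. The sketch only assembles the URL and does not send anything over the network.

```python
from urllib.parse import urlencode

# Assumed endpoint for DPLA item searches; confirm against the DPLA developer docs.
DPLA_ITEMS_URL = "https://api.dp.la/v2/items"


def build_dpla_query(search_terms, api_key, page_size=10):
    """Return the URL for a keyword search against the DPLA items endpoint.

    The name of this helper is hypothetical; only the URL format is DPLA's.
    """
    params = {"q": search_terms, "page_size": page_size, "api_key": api_key}
    # urlencode escapes spaces and punctuation so the terms are safe in a URL.
    return DPLA_ITEMS_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # "YOUR_KEY_HERE" is a placeholder, not a working key.
    print(build_dpla_query("theater productions", "YOUR_KEY_HERE"))
```

Pasting the printed URL into a browser (with a real key in place of the placeholder) should return the matching records as JSON, which is the “documentation plus key” pattern Jojo described.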

A heated discussion arose regarding ethics in response to Lev Manovich’s “Selfie Project,” which involved gathering images from thousands of Instagram users’ selfies. In the age of big, rapidly accessible data, where do we draw lines based on privacy? How does a digital humanist deal with personal information when it can be acquired from social media? The IRB, with its scrutiny of human subject research, was invoked as a possible model. Some pointed out that Manovich’s project is a creative and artistic project and not subject to IRB guidelines. We did not come to a concrete resolution, but I do believe that “the human element” is an important thing to consider in designing DH projects.

Early on in the workshop, we did an exercise which involved talking to the person seated next to us about their background, digital skillsets, and research interests. This exercise helped cement the social and collaborative aspects of DH. One sentiment expressed by Micki Kaufman’s presentation and echoed by “The Lexicon of DH” workshop is to consider what information you might need for a project and to let that be a guide. Mastery of DH tools is certainly helpful but, for those who do not possess it, resources exist for gaining at least a working knowledge in order to execute one’s project. For example, the GC has upcoming working groups for mapping and Python. Jojo’s post, Helpful Hints, has more on this.

Attending the “Lexicon of DH” workshop, we did not leave with a single easy definition of DH. However, from the examples presented and the myriad research interests of the attendees, we left with a positive sense of DH’s possibilities and how we can tap into them.

Posted in Uncategorized | Comments closed

Zotero Workshop

Last week, I attended a workshop provided by the Mina Rees Library. Two librarians, from the excellent library staff, walked participants through the installation and basic use of the citation management program, Zotero. It was an incredibly helpful session, and I am excited about the ways this new tool might assist in my research.

There are different ways to use Zotero, and it works with several Internet browsers. I installed the standalone application, and synced it to Chrome and Word. Working with several open windows is more comfortable to me, so having the Zotero library separate from the browser window is ideal. After getting the basic idea of how it works, collecting metadata, files and snapshots from online sources, making folders for different projects, and adding tags and notes, I have been experimenting with the actual use of Zotero.

The benefits in terms of the actual writing of a paper, with proper citations and a flawless bibliography, are clear. These mechanical bits of producing a research paper are the least fun parts, and Zotero just takes care of it all. But it only works if the data it has gathered is good and complete. It seems to work well with articles from JSTOR and e-books from the library. Keeping track of books located online, from Hathi Trust for example, does not work as smoothly. Looking at digitized historical documents online is another issue. After more use of Zotero, I hope to have a better understanding of what it is capturing.
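
The point about “good and complete” data can be made concrete with a small Python sketch that checks a saved citation for gaps before it reaches a bibliography. Everything here is an assumption for illustration: the record is a plain dictionary rather than anything exported by Zotero, and the field names and the helper `missing_fields` are hypothetical choices, not Zotero’s own.

```python
# Illustrative list of fields a book citation usually needs (an assumption,
# not a list taken from Zotero).
REQUIRED_FIELDS = ("title", "creators", "date", "publisher")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in a citation record."""
    return [field for field in required if not record.get(field)]


if __name__ == "__main__":
    # A made-up record, like one captured from a page with sparse metadata.
    scanned_book = {"title": "An Example Scanned Book", "creators": [], "date": ""}
    print(missing_fields(scanned_book))
```

A record captured from a well-described source would return an empty list, while a sparsely described one would list the fields to fill in by hand.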

All of this prompted me to think about how I research, and how other students (and non-students) do research. Zotero encourages users to look at the metadata being collected from websites, perhaps meaning researchers will give more consideration to the source. And, not just the metadata, but the information in general. This seems like a good practice. Zotero might be used to teach students, from high school and up, to critically evaluate the information they are receiving from their sources.

In two weeks, I have a paper due for my archaeology class. I intend to use Zotero to gather citations and organize my sources for that paper. I believe it will complement my usual approach to a research paper, without disturbing my established, comfortable methods. In a future post, I will reflect on the experience.

 

Update 12/19

Having used Zotero now for several papers and projects this semester, I can say that it is an amazing tool. Taking away the tedious parts of making a bibliography, or proper citations for footnotes, is fantastic, and yet users are still required to understand the information included, so that the importance of the task is not diminished. Organizing the materials for a specific project into folders (called collections) within Zotero is helpful as well, and mirrors the way I usually keep track of resources on my computer. The introduction provided by the library staff at the workshop was critical to setting up the program, and getting started; further experience with the program, and a certain amount of just trying things out, revealed how useful Zotero can be.

The other two workshops I attended, a GIS introduction that included a Carto demonstration, and a beginning Python class, were helpful, and I imagine that more experience with these tools will allow me to use them effectively as well.

Posted in Student Post, Uncategorized | Comments closed

Links to Readings

Hi everyone,

I created a GitHub repository where we can add links to readings (and do more, if so inclined).

Feel free to contribute to this! It’s good practice if you’re interested in learning more about git, GitHub, and markdown.

Posted in Uncategorized | Comments closed

Avid Reader: DH Praxis Class post | Lower East Side Librarian

Here’s a blog post about our discussion on social reading in last night’s class, with references to Gold, Liu, and Moretti, as well as a Pew Internet study.

Avid Reader

One of last week’s class discussion topics was reading, and reading is one of my favorite things, mostly to do. I hadn’t thought about thinking about it. I’m a pleasure reader, so concerned about literary scholarly apparatus only in the sense that I am also a librarian and think publishers who release nonfiction books without a bibliography and index should be given a vicious wedgie.

Now that I am thinking about reading in a more academic way, my mind is spinning with it. I woke up this morning contemplating the concept of print books as “old media,” literature being read in the one-vs.-many paradigm, whether social reading is merely social or also pedagogical, and how brains process text vs. computers. Further, I got to a topic that my group touched on but didn’t explore in the report back, which is note taking.

Posted in PressForward, Student Post, Uncategorized | Comments closed

Helpful Hints

Dear #DHPraxis16,

By now you probably are used to my face, but I wanted to draw your attention to some others that might be useful as you move forward with your projects. I encourage you to comment on this post with a couple ideas of projects or categories of tools that appeal to you.

Office Hours: Tuesdays 2-4pm

Bring your questions to the Digital Fellows! Come to Room 7414 with your computer and your troubles.

Python Users’ Group (PUG): Wednesdays 12-2pm

If you are interested in running programs in Python — say automating a task on a large dataset — check out the Python Users’ Group. Digital Fellows are available to help you think through available ways

Working Groups

A new offering starting this week, the Working Groups are open to anyone for a collaborative working space. Good for sustained learning and skill sharing, think of these as a sort of CUNY-centered Meetup.

Digital Fellows’ Workshops

I mention these again. These are meant for you. Try to attend as many as you can. It will help you understand your preferences as you start to think about projects in the coming semester.

ITP Workshops

The workshops run by the Interactive Technology and Pedagogy program are also incredibly helpful. There are game design workshops and visualization workshops. Check out ITP on the Commons for more information.

Past Workshops

Going back through the Digital Fellows’ blog can help give you a sense of workshop offerings. Also, check out these materials from the GC Digital Research Institute this past June. Looking at what has been done can give you a good sense of what you can do.

Project ideas

There are several sources for projects that have gone on at the GC. The Provosts’ Digital Innovation Grants fund exciting work every year. Past projects demonstrate the diverse applications of DH.

New Media Lab

The New Media Lab has an archive of projects that have come through their doors.

Meetups

Sometimes it helps to drop in to hack nights and local meetups where people are working on their own projects. New York City is an incredible place to go to school because you have access to New York City. Whether you want to build sites using Django, contribute to Wikipedia, experiment with Python or blockchain, build a mobile app, or visualize some data, there’s likely a group of other people interested in the same thing. For some, local Meetups can be a low stakes way to build your skills and meet a broader community of coders.

Let me know if you have any questions!

-Jojo

Posted in Uncategorized | Comments closed

link: the library of babel

Hi all, since we are reading about literature, just wanted to throw in a quick link to this digital implementation of Borges’ Library of Babel, made by Jonathan Basile: https://libraryofbabel.info/

Posted in Uncategorized | Comments closed

Lexicon of DH 2016 | GC Digital Fellows

Across universities and conferences, even the LA Review of Books, the question seems to come up again and again: what is/are digital humanities? The understanding of what digital work in the humanities is remains in flux and with good reason — the tools and terms are in development, and development is part of the project of DH. It can be overwhelming to address the implementation of technical tools as a theoretical practice of scholarship. It can be destabilizing to critically assess the digital tools that undergird even the most seemingly traditional modes of scholarship. There are people here to help.

When I first began DH Praxis at the Graduate Center in 2014, I wrote a blog post about what I feared, though hoped, DH might mean. In the past two years, I have learned a dizzying amount about this community and conversation. Last year, Mary Catherine Kinniburgh with Patrick Sweeney introduced a workshop on the Lexicon of DH to much applause (read her recap here, and a review from a participant). I am looking forward to reprising the workshop (Thursday, September 29th at 6:30 in Room 6421).

Check out this blog post:

Lexicon of DH 2016 | GC Digital Fellows

for a few recommendations to people who are excited, annoyed, or overwhelmed by “digital humanities.” In the seemingly nebulous space of digital humanities, some basic terms can ground us tremendously.

 

Source: Lexicon of DH 2016 | GC Digital Fellows

Posted in PressForward | Comments closed

problem accessing The History Manifesto

The link to the History Manifesto for today’s class in our Sept. 2nd syllabus is dead. You can access it here: http://historymanifesto.cambridge.org/

Posted in Uncategorized | Comments closed

Reflections on MOMA online collections, with Blevins and Robertson in mind

In the past week, the Museum of Modern Art unveiled expanded offerings on its website, transforming online collections beyond a selection of items to include information and materials from all of its past exhibitions. The recent additions are part of the institutional archive of the museum, including some behind the scenes materials, like lists of the specific objects in an exhibition, used by curators and installers, or press releases and other promotional materials. Published exhibition catalogs, many out of print and/or expensive, are also included. The online archive is not static, and will continue to be added to, as new exhibitions are mounted and as additional, older material is digitized. The coverage in the New York Times quotes the museum’s chief of archives:

“The entire website is conceived of by the museum now as a living archive,” Ms. Elligott said, “and this is really just the beginning, the first phase of bringing its history out in all its detail.”

In looking at the newly released, exhibition-centric MOMA collections through the lens of observations and arguments from Cameron Blevins in “Digital History’s Perpetual Future Tense” and Stephen Robertson in “The Differences between Digital Humanities and Digital History,” several themes surfaced.[1]

The online archive upgrade is about access. Not that these materials were unavailable before, but now they are accessible to anyone online.

In an assessment of recent years, Cameron Blevins says, “In digital history, the predominant goal has been to make those sources available and accessible.” Blevins then proposes that digital historians should “reengage with argumentation” because “making arguments is a fundamentally valuable and necessary way to further our collective understanding of the past.” The materials in the MOMA collections represent or document history, specifically of the institution, but also of art, culture, and society. But they do not offer interpretation or analysis, only the raw materials for it. There is some fuzziness here, in that the exhibition catalogs in the archive do include analysis and arguments, but the archive itself does not, nor was that its intent.

The archive is comprehensive, in that every exhibition is represented, in some way, though the quantity and selection of materials varies.

Robertson notes that “much of the early digitization of historical sources was highly selective, chosen from larger collections to answer specific research questions, or to make available well-known or popular documents.” Robertson believes this tendency has limited the usefulness of history-related online archives to scholars. The MOMA efforts appear to be addressing this problem, to some degree. This is perhaps the result of advances in technology, which allowed the museum to process more materials at a lower cost, or the emergence of software or hardware that can handle the data in better ways, but it might also reflect an understanding that materials related to the museum objects themselves offer something to researchers, and that a long, complete view is valuable.

The search is keyword based and works, though it presents problems with taxonomy. The filters are not nuanced.

Robertson is also concerned with the way users access collections, specifically the limits of search functions. In this way the MOMA collections are typical of other large online collections, maybe more limited in access points than some. The landing page for the exhibitions part of the online collections sorts the exhibitions chronologically, so the earliest are on the first page. This is a reasonable and useful way to look at the materials, for some, but the keyword search will likely be the way most users find materials. It is unclear how deep the search goes; for example, it does not appear to search within the published exhibition catalogs. But for an initial survey or review of materials, the search seems to work, especially if a user knows what they are seeking.

For researchers interested in modern art, industrial design, architecture, and planning, the enhanced MOMA online collections are an obvious upgrade. But there are materials there for many others as well. Consider the late 1942 exhibition, “Useful Objects in Wartime Under $10,” as just one example.

[1] Both articles appeared in Debates in the Digital Humanities, 2016.

Posted in Student Post, Uncategorized | Comments closed
  • Welcome to Digital Praxis 2016-2017

    Encouraging students to think about the impact advancements in digital technology have on the future of scholarship from the moment they enter the Graduate Center, the Digital Praxis Seminar is a year-long sequence of two three-credit courses that familiarize students with a variety of digital tools and methods through lectures offered by high-profile scholars and technologists, hands-on workshops, and collaborative projects. Students enrolled in the two-course sequence will complete their first year at the GC having been introduced to a broad range of ways to critically evaluate and incorporate digital technologies in their academic research and teaching. In addition, they will have explored a particular area of digital scholarship and/or pedagogy of interest to them, produced a digital project in collaboration with fellow students, and established a digital portfolio that can be used to display their work. The two connected three-credit courses will be offered during the Fall and Spring semesters as MALS classes for master’s students and Interdisciplinary Studies courses for doctoral students.

    The syllabus for the course can be found at cuny.is/dps17.
