One Library’s Collection and How Can Call Number Analysis Be of Any Use?

[Image: word cloud]

As I’ve been working through the many complications of finding, downloading, cleaning, uploading, and analyzing my data set, I took a moment to create the above word cloud using Wordle. Even that was a little complicated, since Wordle requires a Java plugin that is no longer supported on any of the computers I’ve been using today. After trying three browsers on two computers before I could finally add the plugin (I cannot easily download or run updates on my work computers due to ITS protocol), I made the above from a list of Subject Headings provided by the Library of Congress. I actually used a spreadsheet found here. When working with data, it seems even the simplest of tasks come with many complications! Regardless, I am not deterred!

The word cloud represents one aspect of what I am trying to look at with my data: a set of call numbers from the McEntegart Hall Library’s Reference Collection at St. Joseph’s College, Brooklyn. Call numbers have a direct relationship to their subject headings, so I will use a data set of call numbers and their associated subject headings to determine whether the library’s collection holds a fair representation of monographs for each of the academic programs offered at the College.

Initially, I wanted to look at the entire print collection, but as I began to pull the data, I thought the set might be too big (68,794 lines in Excel), at least for the time being; I plan to use the larger set if I can get the subset to make sense. So, I’ve chosen to look at the reference collection (1,926 lines in Excel) as a subset, which may offer some insight, though I believe it isn’t the best representation of the collection. First and foremost, because of the nature (and cost) of reference collections, my library has stopped adding paper-bound titles and relies more heavily on online databases (whose titles are not represented here), so there will be many gaps. Also, this past summer, we moved several of the reference titles into the circulating collection to give them a better opportunity to be found while students are browsing. It is not an ideal representation, but it is a start!

Why would I want to know about this, aside from having to pick a data set to work with? Many academic libraries have to defend their need for the large spaces they take up on campus. As many libraries before mine have had to accept, we are losing a quarter of our space to new classroom labs, so I’ve been asked to condense the collection, which required hours of measuring and project planning. For now, at least, I do not have to remove any books from the collection (and there’s a little room for growth). Analyzing how items are represented through their call numbers and subject headings seems an interesting way of interpreting the collection, and it should also help me work out where the gaps are and where there may be too much in any particular area. While I was in the stacks measuring, it was clear (and I’ve known this for quite some time) that there’s a great need for weeding the collection (library speak for removing titles that are superfluous to the collection and do not support the users’ needs).

I know this project won’t change the way library collections are seen on the macro scale, but it will help define and defend my library’s collection. I have been asked by two administrators in the last few years if I actually think physical books belong in a modern library. This is not a question I love being asked, as I believe there is still a great need for physical materials, but I think with this project I can at least counter these inquiries with some impressive data and hopefully some nice visualizations. It is worth mentioning here that I originally wanted to start with an open data set available through NYC’s Open Data portal. This project was inspired by a data set listing Brooklyn Public Library’s collection that was uploaded to the portal, but I wanted to work with something more directly related to my day-to-day work (not that I don’t love public libraries and all that they do for this city!).

Pulling the Data:

I thought this would be the very easiest part, and I was right! With a simple query in an Access database, I was able to pull call numbers and other bibliographic data that I could export to Excel. As already mentioned, I planned to use the entire collection as my data set, but I have since decided to start with just the reference collection subset. I came to this decision after grappling with the time it takes to clean the data.

Cleaning the Data:

I thought this would be a difficult part of the project, and I am finding that to be true. First, LoC call numbers do not sort very nicely (read: quickly) in Excel because of their alphanumeric configurations. I did find some workarounds and am still experimenting with my options. Another tool I’d like to look into using is OpenRefine. Second, I have to remove the duplicates that result from multiple volumes of one title (or decide if that’s the right thing to do). Third, I need to pull and add data on the subject headings that are assigned to each call number. I am still working out how best to do this: a) do I rerun a query in Access to include assigned subject headings using the MARC records, or b) do I work out a way to assign a subject heading based on the call number of the item? The latter could potentially be achieved by writing a Python script to assign subject headings based on a call number range. I still need to experiment with both of these possibilities.
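
The sorting problem described above could be approached with a small Python sort key. This is only a minimal sketch, not the workaround I actually used: it assumes each call number starts with one to three class letters followed by a class number, and it compares whatever follows (cutters, years, volume markers) as plain text. The pattern `CALL_NUMBER`, the function `sort_key`, and the sample call numbers are all made up for illustration.

```python
import re

# Assumed shape: 1-3 class letters, an optional class number (with optional
# decimal), then everything else (cutters, year, volume) as one string.
CALL_NUMBER = re.compile(r"^\s*([A-Z]{1,3})\s*(\d+(?:\.\d+)?)?\s*(.*)$")


def sort_key(call_number):
    """Return a tuple that sorts call numbers by class letters, then class number."""
    match = CALL_NUMBER.match(call_number.upper())
    if match is None:
        # Values that do not look like call numbers sort after all class letters.
        return ("~", 0.0, call_number)
    letters, number, rest = match.groups()
    # The class number is compared numerically, so LB21 comes before LB1101.
    return (letters, float(number) if number else 0.0, rest.strip())


# Hypothetical call numbers, not taken from the real data set.
sample = ["LB1101 .C5", "LB21 .A3", "L7 .B2", "LB1139.5 .D4", "E185 .F7"]
ordered = sorted(sample, key=sort_key)
print(ordered)
# ['E185 .F7', 'L7 .B2', 'LB21 .A3', 'LB1101 .C5', 'LB1139.5 .D4']
```

A plain text sort would put LB1101 ahead of LB21, which is the Excel behavior I ran into. Treating the class number as a number avoids that, although a fuller version would also need to handle cutters and volume designations more carefully.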

Data Tools to Use:

I intend to first experiment with relational analysis tools listed on DIRT (DIgital Research Tools). I also still have to work out how to draw a correlation between the volume of subject headings represented in the collection and the academic programs. By taking a look at the programs offered, I will need to determine how these programs can be described in terms of subject headings. Do I develop a list of the programs and then determine which subject heading(s) would be of use to each one? For example, Child Study is a major at the college, and one such LoC heading and its corresponding call number range is “Education — Theory and practice of education — Child study / LB1101-1139”, but there are several other headings too.
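
One way to try the program-to-heading idea is to keep a list of call number ranges for each program and count how many titles fall inside them. The sketch below is an assumption about how that could work, not a finished method: only the Child Study range (LB1101-1139) comes from the example above, and the names `PROGRAM_RANGES`, `programs_for`, and `count_by_program`, along with the sample call numbers, are hypothetical.

```python
import re
from collections import Counter

# Hypothetical mapping of programs to (class letters, low, high) ranges.
# Only the Child Study range is taken from the post; a real list would
# hold several ranges per program.
PROGRAM_RANGES = {
    "Child Study": [("LB", 1101, 1139)],
}

# Assumed shape: 1-3 class letters followed by a class number.
CLASS_PART = re.compile(r"^\s*([A-Z]{1,3})\s*(\d+(?:\.\d+)?)")


def programs_for(call_number, program_ranges=PROGRAM_RANGES):
    """Return the programs whose ranges contain this call number."""
    match = CLASS_PART.match(call_number.upper())
    if match is None:
        return []
    letters, number = match.group(1), float(match.group(2))
    return [
        program
        for program, ranges in program_ranges.items()
        # high + 1 keeps decimal extensions such as LB1139.5 inside the range.
        if any(letters == cls and low <= number < high + 1
               for cls, low, high in ranges)
    ]


def count_by_program(call_numbers, program_ranges=PROGRAM_RANGES):
    """Count titles per program, plus titles that match no program."""
    counts = Counter({program: 0 for program in program_ranges})
    unmatched = 0
    for call_number in call_numbers:
        matches = programs_for(call_number, program_ranges)
        if not matches:
            unmatched += 1
        counts.update(matches)
    return counts, unmatched


# Hypothetical call numbers, not taken from the real data set.
counts, unmatched = count_by_program(
    ["LB1101 .C5", "LB1139.5 .D4", "LB1140 .A1", "E185 .F7"]
)
print(counts["Child Study"], unmatched)  # 2 2
```

Because a title can fall inside the ranges of more than one program, it is counted once for each program it matches. The unmatched count is a rough measure of how much of the collection the program list does not yet cover.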

Conclusions:

So, I still have a lot of work to do and many more questions to ask and answer. I think there are two complementary projects to this first one. The first would be to look at usage statistics for the titles in the collection. Once I know what is in the collection and how it represents the programs, I can compare that to what students are actually using. What are the students searching for in the catalog, and what books are they borrowing? The second, an important outgrowth of this initial analysis of paper materials, would be looking at data from the library’s various ebook collection subscriptions, such as title lists from each database and which titles have disappeared over the years of subscribing. With these additional points of data, an even more holistic picture could be drawn. Since Michelle’s presentation on maps a few weeks ago was so thought-provoking, I would also like to include some physical representation of the collection using a catalog addition such as LibraryThing’s or StackMap, since GPS coordinates wouldn’t work here.

Scope creep is becoming extremely apparent as I wrap up this post. I realize that I have talked about a variety of tools I want to work with and tangential data sets that could be looked at, and I am starting to feel like I’m overreaching, especially for a project that I do not feel will have some lofty impact on the library world. However, this data set will change my day-to-day work world a little, and that’s worth it.

Update as of 11/9/16: I came across this article in my work inbox, and I plan to use it as a model to further my own project.



  1. Posted November 12, 2016 at 6:55 pm | Permalink

    I love your project, Lauren! “Define and defend” indeed. I will be interested to see what you come up with and how your work could be replicated in my library.

  2. Posted November 16, 2016 at 5:36 pm | Permalink

    Lauren, the question of physical “measuring” in conversation with the kind of “measuring” that happens through the forms of computational analysis you mention wanting to try brings up lots of interesting questions about value and decision making. I’m not sure I follow quite yet which specific relationships you’re interested in diagramming. This is likely because as someone who is not formally trained as a librarian, I’m not as steeped in the awareness of how rich or complicated or unresolved the relationship between congressional subject heading and call numbers might be. Perhaps that’s an area in which the diagram itself could prove illustrative. I’m also curious to hear more about the way in which those correlations between subject heading and call number map to degree offerings and… perhaps… student and faculty scholarly production? In other words, do the proportions of reference materials physically and quantitatively demonstrate a correlation to the number of capstone projects students produce for various disciplines? Perhaps that is a bit too far of a reach, though. Looking forward to your next steps!

  3. Posted November 19, 2016 at 4:36 pm | Permalink

    Lauren, I love that you’ve spent as much time on figuring out why you are doing this project (which has very practical, real world consequences for you and your library’s patrons) as how you are executing it. Your project uses big data to help explain and justify how your institution might be transformed in the future. My mind went to the same scope-creep kind of considerations as yours did (for example, it would be cool and useful to graph usage in each subject area category, as you suggest, but that’s another huge data project in its own right). Just correlating the areas of academic interest/focus in the collection seems like an ambitious enough place to start. Like Lisa, I thought a table of the LOC subject listings would be a good place to start. As a historian, I was thrilled at how large a word “History” is in your Wordle! I think the biggest problem in getting the data ready for analysis will be those instances (of which I assume there are many) where the subject of the item doesn’t easily map against your departmental structure at the college (e.g., many schools don’t have discrete academic programs in the social sciences, such as Sociology or Political Science, choosing instead to lump them together in agglomerated programs called “Social Sciences,” which is how they are at several CUNY community colleges).

