Data Project: Korean tweets with “hashtag I am a feminist”

Introduction

1 Leave a comment on paragraph 1 0 Gender inequality is a persistent condition of the South Korean society. A few numbers might give a rough idea of women’s status in the country:

  • 2 Leave a comment on paragraph 2 0
  • In 2013, women were paid 30% less and were employed 23% less than men. (For comparison, the U.S. gender wage gap in 2013 was about 18%)
  • The most recent legislative election in 2016 resulted in the highest ratio of women in the National Assembly’s history: 17%

3 Leave a comment on paragraph 3 0 Violence against women is also frequent. According to the NGO Korea Women’s Hot Line, at least one woman was murdered by her partner every 4 days in 2015, only counting those reported in the news. The Korean government’s official statistics show that among victims of violent crime, around 30% were women in the 90s; however, this number has increased to more than 80% in the 2010s, pointing to an aggravation of misogynic tendencies of the society.

4 Leave a comment on paragraph 4 0 The misogynic tendency of the South Korean society has been the subject of a growing national discussion, one that I have been following online. One important instance in the discussion was the Twitter hashtag #나는페미니스트입니다 (lit. “I am a feminist”), which first appeared on Feb 9 2015.

5 Leave a comment on paragraph 5 0 The hashtag appeared after a series of comparison between the Islamic State (IS) and feminism. One South Korean male teenager went on to join IS, after tweeting out “However, the current era is the era that male are being discriminated against” (reflecting a rather common sentiment among Koreans) and “i hate feminist So I like the ISIS.”

6 Leave a comment on paragraph 6 0 image

7 Leave a comment on paragraph 7 0 A male pop columnist then ignited the hashtag, with a magazine article titled “Brainless feminism is more dangerous than IS,” arguing that feminism is confrontational and divisive, which gives rises to such reactions as the aforementioned teenager’s decision.

8 Leave a comment on paragraph 8 0 In response to such an argument, Twitter users started using the #나는페미니스트입니다 tag in order to self-identify as feminist, and also to problematize the negative social construction of the term. I think this movement was an important moment which eventually led to feminist discourse, which I feel was not significantly addressed in mainstream media or in my own social network, take on the role of a more urgent agenda. This is why I was interested in archiving these tweets; looking at the entirety of the tweets, which I experienced in small chunks in real time, might provide a better insight to what happened.

The data

9 Leave a comment on paragraph 9 0 First, I collected the tweets tagged with #나는페미니스트입니다, along with its variants, #나는_페미니스트입니다 and #나는페미니스트다. Since the Twitter API is restrictive when it comes to searching old tweets, I had to rely on Twitter’s web search interface which allows search from the beginning of the service. One thing about this interface is that only a small number of tweets appear at first; in order to see the rest, one has to scroll to the bottom of the page, which then makes the browser load more tweets. Another caveat (which I did not verify, but saw mentioned in several places) is that web browser will stop loading tweets once the number tweets loaded on the page goes above 3,200. In order to reduce the risk of not retrieving all existing tweets because of this caveat, I segmented my search by using the SINCE and UNTIL operators: this allowed me to search day by day, minimizing the tweets that each search will return.

10 Leave a comment on paragraph 10 0 I decided to collect one year’s worth of data using this specific search query. I wrote a Python script using the Selenium WebDriver and BeautifulSoup packages; the script visited Twitter’s web search page using my search query, scrolled down to the bottom of the page, checked if the scrolling down triggered more tweets to appear, then saved the tweets and metadata parsed from the source HTML, iterating over each day.

11 Leave a comment on paragraph 11 0 2015-02-10 2015-02-14 2015-02-13 2015-02-12 2015-02-11 2015-02-15

12 Leave a comment on paragraph 12 0 Over the span of the year, the hashtag was used in roughly 5,000 tweets. It doesn’t seem like a particularly large number, but I think it had the effect of opening up related discourse. I repeated the search over a longer period of time, with the search query “페미니스트 OR 페미니즘” (lit. “feminist OR feminism,” returning tweets that have either terms in it). This search returned nearly 60,000 tweets during the one year after the first use of #나는페미니스트입니다, compared to 33,700 tweets from 2009 (when Twitter started its service) until before the hashtag appeared.

13 Leave a comment on paragraph 13 0 tweets-feminism-or-feminist

14 Leave a comment on paragraph 14 0 This is evident in the visualization I drew using R, displaying the volume of the tweets per month (total ~90K): there is a dramatic increase of tweets that include “페미니스트” or “페미니즘” in February 2015. For this observation to be more valid I would need to take into account the total volume of tweets written in Korean, regardless of subject, during these periods. However, the drastic change seems to support my hunch.

15 Leave a comment on paragraph 15 0 In addition, I used the unique ids of tweets with hashtags in order to retrieve from the Twitter API a more comprehensive tweet data.

Data files

16 Leave a comment on paragraph 16 0 Tweets with hashtag | Tweets with either ‘feminist’ or ‘feminism’

Analysis

17 Leave a comment on paragraph 17 0 I conducted a basic text analysis of the 5K tweets with #나는페미니스트입니다 and looked at frequently used terms. Using Python’s re and codecs packages, I grabbed the text of each tweet cleaned the data of user handles and weird unicode (which is necessary since I am dealing with a non-Latin language). Then, using the KoNLPy package, which is an NLP tool for the Korean language, I tagged each morpheme with the corresponding part of speech.

18 Leave a comment on paragraph 18 0 A quick list of most common morphemes returns the following list:

19 Leave a comment on paragraph 19 0 common-phonemes common-nouns
However, there is a lot of unnecessary things here. For starters, hashtags and punctutations; also, the parts of speech ‘Eomi’ and ‘Josa’ exist largely for grammatical purposes—hence the adjacent list of common nouns (written with English translation below)

20 Leave a comment on paragraph 20 0 [((‘것’, ‘Noun’), 844), (Thing)
((‘여성’, ‘Noun’), 755), (Female)
((‘페미니스트’, ‘Noun’), 644), (Feminist)
((‘나’, ‘Noun’), 618), (Me)
((‘이’, ‘Noun’), 464), (This)
((‘내’, ‘Noun’), 461), (My)
((‘사람’, ‘Noun’), 459), (Person)
((‘태그’, ‘Noun’), 380), (Tag)
((‘말’, ‘Noun’), 370), (Word)
((‘페미니즘’, ‘Noun’), 363), (Feminism)
((‘여자’, ‘Noun’), 358), (Woman)
((‘더’, ‘Noun’), 333), (More)
((‘수’, ‘Noun’), 327), (Can)
((‘그’, ‘Noun’), 311), (That)
((‘남자’, ‘Noun’), 266), (Man)
((‘해시’, ‘Noun’), 264), (Hash)
((‘거’, ‘Noun’), 245), (Thing)
((‘차별’, ‘Noun’), 238), (Discrimination)
((‘때’, ‘Noun’), 214), (When)
((‘안’, ‘Noun’), 207), (Not)
((‘년’, ‘Noun’), 204), (Girl- derogatory term, but in this case often an appropriated term like it is the case in ‘slutwalk’)
((‘선언’, ‘Noun’), 198), (Declaration)
((‘일’, ‘Noun’), 192), (Work)
((‘남성’, ‘Noun’), 169), (Male)
((‘모든’, ‘Noun’), 167), (All)
((‘저’, ‘Noun’), 166), (I)
((‘평등’, ‘Noun’), 163), (Equality)
((‘생각’, ‘Noun’), 160), (Thought)
((‘우리’, ‘Noun’), 156), (We)
((‘날’, ‘Noun’), 154), (Me)
((‘뭐’, ‘Noun’), 152), (What)
((‘왜’, ‘Noun’), 150), (Why)
((‘트위터’, ‘Noun’), 147), (Twitter)
((‘운동’, ‘Noun’), 144), (Movement)
((‘사회’, ‘Noun’), 143), (Society)
((‘때문’, ‘Noun’), 138), (Because)
((‘지금’, ‘Noun’), 137), (Now)
((‘오늘’, ‘Noun’), 131), (Today)
((‘세상’, ‘Noun’), 126), (World)
((‘인간’, ‘Noun’), 125), (Human)
((‘게’, ‘Noun’), 119), (What)
((‘이유’, ‘Noun’), 119), (Reason)
((‘전’, ‘Noun’), 118), (Before)
((‘앞’, ‘Noun’), 117), (In front of)
((‘분’, ‘Noun’), 115), (Person)
((‘혐오’, ‘Noun’), 114), (Hate)
((‘좀’, ‘Noun’), 114), (A little)
((‘걸’, ‘Noun’), 113), (That)
((‘너무’, ‘Noun’), 111), (Too much)
((‘한국’, ‘Noun’), 107)] (Korea)

21 Leave a comment on paragraph 21 0 While there are interesting terms such as 차별[discrimination], 선언[declaration], 평등[equality], 운동[movement], and 혐오[hate], this is not yet enough to say something decisive about these tweets. A more detailed analysis would require comparison with other corpora (for example, tweets with other hashtags). I also would like to expand the analysis on the tweets with 페미니스트[feminist] OR 페미니즘[feminism].

Further work: Sentiment Analysis

22 Leave a comment on paragraph 22 0 The methods I employed here provide a basic insight on the quantitative aspect of the hashtag movement. However, the data I used is somewhat incomplete in the sense that it needs to be compared against other, non-topical tweets before I can make qualitative judgements about it. In addition, I would like the analysis to be more sophisticated, if time and resource permit.

23 Leave a comment on paragraph 23 0 My further goal with this project is to conduct a sentiment analysis on the tweets with 페미니스트[feminist] OR 페미니즘[feminism], to see if I can notice a difference in attitude when people are using the terms before and after the hashtag movement. There are a few ways to approach this problem, including sentiment word dictionaries and machine learning techniques. Using a sentiment lexicon is the NLP way; using dictionaries that include sentiment polarities and values for words, one can calculate and determine the overall sentiment of a sentence or a document. This might prove tricky because while Korean sentiment lexicon do exist, for example like the KOSAC, I would still need to write the algorithm to determine the sentiment; this looks like a bit more than I can achieve this term, both in terms of time and in terms of skill. Machine learning methods are diverse, but in many cases it involves training data—pre-determined “ground truths” that the algorithm relies on in order to make decisions. I would have to provide the training data that works in this context as well, which would be a laborious process. I nevertheless hope I can eventually engage in the work, for example by tweaking existing tools.

24 Leave a comment on paragraph 24 0  


References

25 Leave a comment on paragraph 25 0 Jang, Hayeon, Munhyong Kim, and Hyopil Shin. “KOSAC: A Full-fledged Korean Sentiment Analysis Corpus.” Sponsors: National Science Council, Executive Yuan, ROC Institute of Linguistics, Academia Sinica NCCU Office of Research and Development (2013): 366.

26 Leave a comment on paragraph 26 0 Eunjeong L. Park, Sungzoon Cho. “KoNLPy: Korean natural language processing in Python”, Proceedings of the 26th Annual Conference on Human & Cognitive Language Technology, Chuncheon, Korea, Oct 2014.

27 Leave a comment on paragraph 27 0 World Economic Forum. Global Gender Gap Report 2015. http://reports.weforum.org/global-gender-gap-report-2015/the-global-gender-gap-index-2015/

28 Leave a comment on paragraph 28 0  

This entry was posted in Uncategorized and tagged , , . Bookmark the permalink. Both comments and trackbacks are currently closed.

8 Comments

  1. Posted November 29, 2016 at 5:28 pm | Permalink

    Achim, this is a very promising project and I’m pleased with how far you’ve been able to carry it forward already. It would be interesting for you to speculate, even at this early stage, about what kinds of data about gender and anti-woman attitudes in Korea you might discover when doing the sentiment analysis of tweets you propose doing in the next stage. Are you assuming that those who self identify as feminist, both females and males (I assume), will be more open to other kinds of identities and political/ideological positions? What larger questions about gender and identity in contemporary Korea is this data analysis likely to reveal?

  2. Posted November 29, 2016 at 8:00 pm | Permalink

    Achim, this is really cool work. I am really interested in the computational aspect, because it is in line with my learning goals, and just as interested in how you constructed your project and the decisions you made. I have recently had the idea to collect posts from the subreddit The_Donald after discovering that I understand so little of the ideology and even rhetorical devices and memes employed there. I think my project will take a very similar direction to what you describe, though as I think of it, its scope already seems rather tremendous!

    Eddy

  3. Posted November 30, 2016 at 3:19 pm | Permalink

    Achim, what a provocative and interesting project topic! I can’t believe someone was so disturbed by feminism that they were driven to join ISIS and that so many men(is it just males?) share the sentiment. I wonder if some of them are “just” trolling or if that can be determined through analysis. Anyway, I hope you are able to move forward with it using machine learning techniques. It seems like an intimidating process but with so much potential. Would it be feasible to do it with a smaller or sample portion of your dataset? Like just using the “feminist” hashtag?

  4. Posted June 3, 2017 at 1:39 pm | Permalink

    HI,

    Thank you so much for posting this. I really appreciate your work. Keep it up. Great work!

    http://kosmiktechnologies.com/selenium/

  5. Posted August 28, 2017 at 2:34 pm | Permalink

    Good work, This data informational index is never again posted due to Twitter’s terms of administration, which we didn’t know about at the time we posted the informational index thanks.

  6. Posted August 30, 2017 at 4:32 am | Permalink

    Great to see this Post When I have perused an article that I find important and agreeable I will dependably look at the remarks http://certswarrior.com/. I will regularly remark on remarks, as I do like the communication.

  7. Posted August 31, 2017 at 7:21 am | Permalink

    I am continually scanning on the web for articles that can help me. There is clearly a great deal to think about this. I think you made some great focuses in Features moreover. Continue working, incredible occupation! CertsChief
    http://www.certschief.com/

  8. Posted April 15, 2018 at 9:25 am | Permalink

Additional comments powered byBackType

  • Archives

  • Welcome to Digital Praxis 2016-2017

    Encouraging students think about the impact advancements in digital technology have on the future of scholarship from the moment they enter the Graduate Center, the Digital Praxis Seminar is a year-long sequence of two three-credit courses that familiarize students with a variety of digital tools and methods through lectures offered by high-profile scholars and technologists, hands-on workshops, and collaborative projects. Students enrolled in the two-course sequence will complete their first year at the GC having been introduced to a broad range of ways to critically evaluate and incorporate digital technologies in their academic research and teaching. In addition, they will have explored a particular area of digital scholarship and/or pedagogy of interest to them, produced a digital project in collaboration with fellow students, and established a digital portfolio that can be used to display their work. The two connected three-credit courses will be offered during the Fall and Spring semesters as MALS classes for master’s students and Interdisciplinary Studies courses for doctoral students.

    The syllabus for the course can be found at cuny.is/dps17.

  • Categories

Skip to toolbar