The Linguistic Association of Korea Electronic Journal

Vol. 27, No. 2 (June 2019)

Identifying Suicide Notes Using Forensic Linguistics and Machine Learning

Yong-hun Lee & Gihyun Joh

Pages : 171-191

DOI : https://doi.org/10.24303/lakdoi.2019.27.2.171

Abstract

Lee, Yong-hun & Joh, Gihyun. (2019). Identifying suicide notes using forensic linguistics and machine learning. The Linguistic Association of Korea Journal, 27(2), 171-191. This paper shows how the characteristic properties of suicide notes can be identified with analysis methods from forensic linguistics and how that knowledge can be applied to machine learning research. For this purpose, a corpus of six texts was compiled from Virginia Woolf's literary works and suicide notes. Each text was then analyzed with the LIWC (Linguistic Inquiry and Word Count) software. Because the LIWC output was complex, its dimensionality was reduced using Principal Component Analysis (PCA). The PCA showed that, even though all the texts were written by the same author, the suicide notes were clearly distinguished from the literary works. The LIWC results were then used as input to a machine learning technique, a Support Vector Machine (SVM), and the classification accuracy was measured using six real texts and three hypothetical texts. The SVM distinguished the suicide notes from the literary works with 100% accuracy. The current study demonstrates that the linguistic properties of texts can be used to distinguish suicide notes from other types of writing and that they can be used in machine learning research.
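
The pipeline described in the abstract (LIWC features, then PCA, then an SVM) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature values are invented, the four category names and the three-versus-three split of the six texts are assumptions, and the linear SVM is trained with plain subgradient descent on the hinge loss in place of whatever solver the study actually used.

```python
import numpy as np

# Invented LIWC-style percentages, NOT the paper's data. Assumed columns:
# first-person singular, negative emotion, death words, positive emotion.
X = np.array([
    [4.1, 1.2, 0.1, 3.0],   # literary text (hypothetical)
    [3.8, 1.5, 0.2, 2.7],   # literary text (hypothetical)
    [4.5, 1.1, 0.1, 3.2],   # literary text (hypothetical)
    [9.6, 3.9, 0.9, 1.1],   # suicide note (hypothetical)
    [10.2, 4.3, 1.1, 0.9],  # suicide note (hypothetical)
    [9.1, 3.6, 0.8, 1.3],   # suicide note (hypothetical)
])
y = np.array([-1, -1, -1, 1, 1, 1])  # -1 = literary work, +1 = suicide note

# Standardize each feature, then run PCA via SVD of the standardized matrix.
mu, sd = X.mean(axis=0), X.std(axis=0)
Z = (X - mu) / sd
_, S, components = np.linalg.svd(Z, full_matrices=False)
explained = S**2 / np.sum(S**2)  # share of variance per principal component


def project(rows, k=2):
    """Standardize rows with the training statistics and keep the top-k PCs."""
    return ((rows - mu) / sd) @ components[:k].T


def train_linear_svm(P, labels, lam=0.01, lr=0.1, epochs=500):
    """Full-batch subgradient descent on the L2-regularized hinge loss."""
    w, b = np.zeros(P.shape[1]), 0.0
    n = len(labels)
    for _ in range(epochs):
        viol = labels * (P @ w + b) < 1          # points inside the margin
        grad_w = lam * w - P[viol].T @ labels[viol] / n
        grad_b = -labels[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(P, w, b):
    """Return +1 (suicide note) or -1 (literary work) for each row."""
    return np.where(P @ w + b >= 0, 1, -1)


w, b = train_linear_svm(project(X), y)

# Two further invented texts, standing in for unseen inputs.
new_texts = np.array([
    [9.8, 4.0, 1.0, 1.0],   # note-like profile (hypothetical)
    [4.0, 1.3, 0.1, 2.9],   # literary-like profile (hypothetical)
])
print("variance explained:", np.round(explained, 3))
print("training predictions:", predict(project(X), w, b))
print("new-text predictions:", predict(project(new_texts), w, b))
```

With only a handful of well-separated texts, perfect accuracy on such toy data says little about generalization; the sketch is meant only to make the three processing steps concrete.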

Keywords

# suicide notes # forensic linguistics # Linguistic Inquiry and Word Count # principal component analysis # machine learning
