´ëÇѾð¾îÇÐȸ ÀüÀÚÀú³Î

´ëÇѾð¾îÇÐȸ

Vol. 31, No. 2 (June 2023)

Analyzing Suicide Notes with Forensic Linguistics and Deep Learning Techniques

Yong-hun Lee & Gihyun Joh

Pages : 101-122

DOI : https://doi.org/10.24303/lakdoi.2023.31.2.101

Abstract

Lee, Yong-hun & Gihyun Joh. (2023). Analyzing suicide notes with forensic linguistics and deep learning techniques. The Linguistic Association of Korea Journal, 31(2), 101-122. This paper analyzes suicide notes and ordinary texts using forensic linguistics and deep learning techniques. Two corpora were compiled for the analysis: one of suicide notes (SNs) and one of ordinary texts (OTs). The first comprised seven files and the second eight. Each text in the corpora was then linguistically analyzed with Linguistic Inquiry and Word Count (LIWC). Since the analysis results had 72 dimensions per text, both PCA and t-SNE (dimensionality reduction techniques in deep learning) were applied to visualize the results. The analysis yielded three observations: (i) suicide notes could be distinguished from ordinary texts, (ii) suicide notes could be distinguished from ordinary texts even when the same author wrote both, and (iii) novels written from a first-person protagonist's point of view also differed from the suicide notes, although both types of text favored the first-person pronoun I.
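The dimensionality reduction step described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy are available, and it substitutes random placeholder values for the LIWC output, since the paper's data are not reproduced here. The variable names, the standardization step, and the t-SNE settings (such as `perplexity=5`) are illustrative assumptions, not the authors' reported configuration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for LIWC output: 15 texts x 72 dimensions.
# These values are random placeholders, NOT data from the paper.
rng = np.random.default_rng(0)
n_sn, n_ot, n_dims = 7, 8, 72
features = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(n_sn, n_dims)),  # stand-in "suicide notes"
    rng.normal(loc=1.0, scale=1.0, size=(n_ot, n_dims)),  # stand-in "ordinary texts"
])
labels = ["SN"] * n_sn + ["OT"] * n_ot

# Standardize each dimension (an assumption; the abstract does not
# say how the LIWC scores were scaled before reduction).
scaled = StandardScaler().fit_transform(features)

# PCA: linear projection onto the two directions of greatest variance.
pca_2d = PCA(n_components=2).fit_transform(scaled)

# t-SNE: nonlinear embedding that tries to keep neighboring texts close.
# Perplexity must be smaller than the number of texts (15 here).
tsne_2d = TSNE(
    n_components=2, perplexity=5, init="pca", random_state=0
).fit_transform(scaled)

# Print the 2-D PCA coordinates; these could be passed to any plotting library.
for label, (x, y) in zip(labels, pca_2d):
    print(f"{label}\tPC1={x:.2f}\tPC2={y:.2f}")
```

With only 15 texts, t-SNE layouts are sensitive to the perplexity and random seed, so a plot produced this way is best read alongside the PCA projection rather than on its own.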

Keywords

# suicide notes # forensic linguistics # Linguistic Inquiry and Word Count # PCA # t-SNE
