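Title A Study on a Preprocessor and Dictionary Expansion for Handling Unanalyzed Words in Social Media Texts
Authors Seong-Yong Choi (최성용), Dong-Hyok Shin (신동혁), Jeesun Nam (남지순)
Volume / Issue Vol. 25 / No. 4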
Source The Linguistic Association of Korea Journal, 25(4), 193-226
Publication date 2017. 12. 31.
Abstract Choi, Seong-Yong, Shin, Dong-Hyok & Nam, Jeesun. (2017). A methodology for building linguistic resources that recognize unanalyzed sequences in social media texts. The Linguistic Association of Korea Journal, 25(4), 193-226. This study analyzes the linguistic problems posed by unanalyzed tokens in social media (SM) texts and proposes methodologies for handling them effectively. As the number of SM users has grown, so has the need to analyze such texts. Unanalyzed tokens, however, severely hamper the overall performance of SM text processing. The study proposes two methodologies: 1) a normalization process using a preprocessing module named the Preprocessing Grammar Table (PGT), which corrects frequent unanalyzed sequences such as orthographic errors and spacing errors; 2) a lexicon-based method utilizing the DECO dictionary and Local Grammar Graph (LGG). When PGT and an enhanced DECO dictionary are applied to SM texts, preprocessing performance improves considerably, with 87% of the unanalyzed tokens removed, which demonstrates the value of the approach.

Attachments
  10.최성용 외.pdf
  10.최성용 외.hwp
 
 
 