 
Title: A Study on a Preprocessor and Lexicon Extension for Handling Unanalyzed Words in Social Media Texts
Authors: Choi, Seong-Yong · Shin, Dong-Hyok · Nam, Jeesun
Volume / Issue: Vol. 25 / No. 4
Pages: 193-226
Publication date: 2017. 12. 31.
Abstract: Choi, Seong-Yong, Shin, Dong-Hyok & Nam, Jeesun. (2017). A methodology for building linguistic resources that recognize unanalyzed sequences in social media texts. The Linguistic Association of Korea Journal, 25(4), 193-226. This study analyzes the linguistic problems posed by unanalyzed tokens in Social Media (SM) texts and proposes methodologies for handling them effectively. As the number of SM users has grown, the need to analyze such texts has increased significantly. However, unanalyzed tokens severely hamper the overall performance of processing SM textual data. This study proposes two methodologies: 1) a normalization process using a preprocessing module named Preprocessing Grammar Table (PGT) to correct frequent unanalyzed sequences such as orthographic errors and spacing errors; 2) a lexicon-based method utilizing the DECO dictionary and Local Grammar Graph (LGG). Applying PGT and an enhanced DECO dictionary to SM texts considerably improves preprocessing performance, removing 87% of the unanalyzed tokens, a result that underscores the significance of the research.
Attachments
  10.최성용 외.pdf
  10.최성용 외.hwp
 
 
 