The Linguistic Association of Korea Electronic Journal

The Linguistic Association of Korea

Table of Contents

Vol. 24, No. 4 (December 2016)

English Caption Writing Assessment Using Abstract Meaning Representation

Kim, Dong-Sung

Pages : 235-260


Abstract

Kim, Dong-Sung. (2016). English Caption Writing Assessment Using Abstract Meaning Representation. The Linguistic Association of Korea Journal, 24(4), 235-260. Because storytelling has long been used to evaluate the development of language skills, English proficiency tests such as TOEIC include a caption writing test. This paper investigates how linguistically motivated features can be used to score a picture-description writing test automatically. Specifically, we build scoring models whose features follow the principles of relevancy, appropriateness, and task-detailed description. For the experiment, we collect a corpus of captions written for several images. We statistically compare the performance of 9 statistical assessment factors and find that Abstract Meaning Representation (AMR) produces the best results in predicting human raters' scores. AMR also performs best at capturing similar logico-semantic structure across different sentential forms.
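
The closing claim of the abstract, that AMR captures similar logico-semantic structure across different sentential forms, can be illustrated with a minimal sketch. It is not the paper's scoring model. It assumes each caption has already been reduced to a set of AMR-style (head, relation, dependent) triples; the triples below are hand-written for illustration, and the name `triple_f1` is hypothetical. Unlike a full AMR metric such as Smatch, the sketch does not search over variable alignments.

```python
from typing import Set, Tuple

# An AMR-style edge: (head concept, relation, dependent concept).
Triple = Tuple[str, str, str]


def triple_f1(candidate: Set[Triple], reference: Set[Triple]) -> float:
    """F1 over triples shared by two graphs; 0.0 when nothing matches."""
    matched = len(candidate & reference)
    if matched == 0:
        return 0.0
    precision = matched / len(candidate)
    recall = matched / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical, hand-written triples (not parser output).
# "The boy kicks the ball." and "The ball is kicked by the boy."
# differ on the surface but share one predicate-argument structure.
active = {("kick-01", "ARG0", "boy"), ("kick-01", "ARG1", "ball")}
passive = {("kick-01", "ARG0", "boy"), ("kick-01", "ARG1", "ball")}
# "The boy kicks the stone." shares only the agent edge.
partial = {("kick-01", "ARG0", "boy"), ("kick-01", "ARG1", "stone")}
# "The girl reads a book." shares nothing with the reference.
off_topic = {("read-01", "ARG0", "girl"), ("read-01", "ARG1", "book")}

if __name__ == "__main__":
    for name, graph in [("passive", passive), ("partial", partial), ("off_topic", off_topic)]:
        print(name, triple_f1(graph, active))
```

Under these assumptions the passive paraphrase scores the same as the active reference, while a surface measure based on word order would penalize it. That is the property the abstract attributes to AMR.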

Keywords

# Machine Learning # Automatic Writing Assessment # Natural Language Processing # Computer-Assisted Language Learning
