
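Dilin Liu (刘迪麟) holds a PhD in English from Oklahoma State University. He is Professor Emeritus of linguistics and applied linguistics in the Department of English at the University of Alabama and Qihang Chair Professor at Dalian University of Foreign Languages. An internationally known expert in corpus linguistics and sociolinguistics, he has long worked on corpus-based and cognitive-linguistic approaches to teaching English grammar and vocabulary, and has carried out in-depth cognitive analyses of a number of language phenomena. He has published more than 80 scholarly works, including six monographs and one edited volume with leading international publishers, and more than 40 articles in high-impact international journals such as Applied Linguistics, Cognitive Linguistics, ELT Journal, International Journal of Corpus Linguistics, and Modern Language Journal.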
A multi-dimensional comparison of the effectiveness and efficiency of association measures in collocation extraction
Yaochen Deng and Dilin Liu
Because of the ubiquity and importance of collocations in language use/learning, how to effectively and efficiently identify target collocations has been a topic of great interest. Although some studies have evaluated some of the existing association measures (AMs) used in the automatic identification of collocations, the results so far have been inconsistent and unclear due to various limitations of those studies. Hence, this study makes a multi-dimensional evaluation of the effectiveness and efficiency of seven major AMs in the identification of three types of collocations across five genres and seven corpora of different sizes. The results indicate that while a few AMs, such as Log Likelihood Ratio and Cubic Mutual Information (MI3), are consistently more effective and efficient than the other five AMs examined, no AM alone may be adequate for identifying different types of collocations across different genres and corpus sizes. Research implications are also discussed.
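For readers unfamiliar with the two measures named in the abstract, the following is a minimal sketch of how Log Likelihood Ratio and MI3 are commonly computed from a 2x2 contingency table of co-occurrence counts. It assumes the standard textbook formulations (MI3 as log2 of the cubed observed frequency over the expected frequency, and the log-likelihood statistic summed over all four cells); the abstract does not give the exact formulas used in the study, and the function and parameter names here are illustrative only.

```python
import math


def contingency(o11, f1, f2, n):
    """Return observed and expected cell counts for a word pair.

    o11: number of times word1 and word2 co-occur (assumed input)
    f1, f2: total frequencies of word1 and word2
    n: total number of co-occurrence slots in the corpus
    """
    observed = [o11, f1 - o11, f2 - o11, n - f1 - f2 + o11]
    expected = [
        f1 * f2 / n,
        f1 * (n - f2) / n,
        (n - f1) * f2 / n,
        (n - f1) * (n - f2) / n,
    ]
    return observed, expected


def mi3(o11, f1, f2, n):
    """Cubic Mutual Information: log2(O11^3 / E11)."""
    e11 = f1 * f2 / n
    return math.log2(o11 ** 3 / e11)


def log_likelihood(o11, f1, f2, n):
    """Log Likelihood Ratio: 2 * sum(O * ln(O / E)) over the four cells.

    Cells with an observed count of zero contribute nothing to the sum.
    """
    observed, expected = contingency(o11, f1, f2, n)
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
```

Both functions return higher scores for pairs that co-occur more often than their individual frequencies would predict. When the observed counts equal the expected counts, the log-likelihood score is zero.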
1. Introduction
2. Background and rationale: Key issues regarding collocation definition/identification
2.1 Definition and types of collocations
2.2 Existing AMs and studies on the effectiveness and efficiency of AMs
3. Methodology
3.1 AMs and factors included for evaluation and comparison
3.2 Corpora used
3.3 Tools and procedures used for data analysis and AM evaluation/comparison
4. Results and discussion
4.1 Results for Research Question 1: Variations among AMs in the general corpus
4.2 Results for Research Question 2: Effects of genres
4.3 Results for Research Question 3: Effects of collocation types
4.4 Results for Research Question 4: Effects of text length
4.5 Summary discussion
5. Conclusion
