This site was last updated on 2024-02-21.
Important course information will be posted on this web page and announced in class. You are responsible for all material that appears here and should check this page for updates frequently.
Computational Linguistics (CL) is now a very active sub-discipline in applied linguistics. Its main focus is computational text analytics, that is, using computational tools, techniques, and algorithms to process and understand natural language data (in spoken or written form). This course therefore introduces strategies and common workflows that data scientists widely use to extract insights from natural language data. We will focus on the processing of textual data.
Potential topics may include:
This course is extremely hands-on and will guide students through classic examples of task-oriented implementations in theme-based, in-class tutorial sessions. The main coding language used in this course is Python, and we will make extensive use of it. The course assumes that every enrolled student has a working knowledge of Python. (If you are not sure whether you fulfill this prerequisite, please contact the instructor first.)
A test on Python basics will be conducted in the first week of class to ensure that every enrolled student fulfills the prerequisite. (More specifically, you are assumed to have a working knowledge of all the concepts covered in the book Lean Python: Learn Just Enough Python to Build Useful Tools.) Those who fail the Python basics test are NOT advised to take this course.
Please note that this course is designed specifically for linguistics majors in the humanities. Computer science majors should note that the course will not give a thorough description of the mathematical operations behind the algorithms; we focus more on practical implementation.
(The schedule is tentative and subject to change. Please pay attention to the announcements made during the class.)
|Course Orientation and Computational Linguistics Overview
|Machine Learning Basics: Regression and Classification
|Naïve Bayes, Logistic Regression
|Feature Engineering and Text Vectorization
|Common NLP Tasks (Guest Speaker: Robin Lin from Droidtown Linguistic Tech. Co. Ltd.)
|Neural Network: A Primer
|Deep Learning NLP and Word/Doc Embeddings
|Sequence Model I: RNN and Neural Language Model
|Sequence Model II: LSTM and GRU
|Sequence Model III: Sequence-to-Sequence Model & Attention
|Transformer, BERT, Transfer Learning, and Explainable AI
|LLM, RAG, and Multimodal Processing
All course materials are available on the course website as a series of online packets (i.e., handouts, script source code, etc.). Please consult the instructor for the direct link to the course materials.
While I have made every attempt to ensure that the information contained on the Website is correct, I am not responsible for any errors or omissions, or for the results obtained from the use of this information. All information on the Website is provided “as is”, with no guarantee of completeness, accuracy, timeliness or of the results obtained from the use of this information, and without warranty of any kind, express or implied.
You may print a copy of any part of this website for your personal or non-commercial use. Without the author’s prior written consent, you may not disclose confidential information of the website (e.g., log-in username and password) to any third party.