[September 14, 2022] If you are interested in doing a natural language processing (NLP) research project that involves designing and implementing a statistical data analytics algorithm and a web-based service in PHP with JSON and AJAX, please send Prof. Maiga Chang your application (including a cover letter, CV, official or unofficial transcript, and student letter) by SEPTEMBER 30, 2022. The opportunity remains open until filled.
Prof. Maiga Chang (http://maiga.athabascau.ca) at the School of Computing and Information Systems is looking for an eligible student who can work full-time from home for a full 16 weeks between January and April 2023 (a face-to-face experience will be facilitated if permitted) and who will apply for an NSERC USRA (Undergraduate Student Research Award) to carry out the research project "Valid N-Grams Identification Web Service based on Statistical Natural Language Processing Techniques".
NSERC USRA Eligibility:
Check the eligibility requirements for the NSERC USRA at http://www.nserc-crsng.gc.ca/Students-Etudiants/UG-PC/USRA-BRPC_eng.asp or see the summary list below. To be eligible, you must:
- be a Canadian citizen, a permanent resident of Canada or a Protected Person under subsection 95(2) of the Immigration and Refugee Protection Act (Canada), as of the deadline date for applications at the institution;
- have obtained, over the previous years of study, a second-class cumulative average (normally a 3.5/4.0);
- not have held a USRA in the current fiscal year (i.e., April 1, 2022 to March 31, 2023);
- not have held three USRAs throughout your undergraduate university career;
- have completed all the course requirements of at least the first year of university study (or two academic terms) of your bachelor's degree;
- have been registered in at least one of the two terms immediately before holding the award in a bachelor's degree program at an eligible institution;
- have not started a program of graduate studies in the natural sciences or engineering at any time;
- not have higher degrees in the natural sciences or engineering;
- be engaged on a full-time basis in research and development activities in the natural sciences or engineering during the tenure of the award.
Over the past couple of years, the VIP Research Group (https://youtube.vipresearchgroup.ca), led by Prof. Maiga Chang, has built several AI and Natural Language Processing (NLP) based web applications (https://ask4summary.vipresearch.ca) and a web service (https://ws-nlp.vipresearch.ca) for summary generation and word/sentence similarity calculation. They can handle multi-sentence paragraphs in different languages, including English, French, and Hindi.
The VIP Research Group previously designed and developed a method and workflow that enable a computer to self-learn vocabulary (also known as "ngrams" in the NLP research area) from three sources: DBpedia Long Abstract, Google Books Ngrams, and DBpedia Tiny Diamond labels. The "n" in the term "ngram" is a number that indicates how many words the ngram contains. Every individual word in an ngram has its own Part-of-Speech (PoS) tag, which depends on the context of the sentence and the ngrams in which the word appears. A natural language toolkit, library, or service can easily retrieve the Part-of-Speech tags of an ngram.
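As a rough illustration of what an ngram is, the sketch below lists every run of n consecutive words in a sentence. The function name `extract_ngrams`, the sample sentence, and the whitespace tokenization are assumptions made for this example only; this is not the VIP Research Group's implementation.

```python
def extract_ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sentence with k tokens yields k - n + 1 ngrams (none if k < n).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample sentence, split on whitespace for simplicity.
tokens = "natural language processing is fun".split()
bigrams = extract_ngrams(tokens, 2)   # ngrams with n = 2
trigrams = extract_ngrams(tokens, 3)  # ngrams with n = 3
```

Here `bigrams` holds two-word combinations such as ("natural", "language"), and `trigrams` holds three-word combinations such as ("natural", "language", "processing").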
Many ngrams (word combinations) can be extracted from any piece of content. However, to provide accurate and relevant services (e.g., search engines, chatbots, question answering), the important and meaningful ngrams in the given content need to be identified before a useful, high-quality response can be generated. This research aims to investigate a method for answering the question "which ngrams are more important than others" and for helping to identify valid ngrams in a given piece of text.
The student in this NSERC USRA research project will work on the following tasks:
- designing and implementing a statistical method/algorithm to narrow down the important and common PoS tag combinations from existing ngrams collected from the three sources;
- creating a browser-based dashboard that presents the statistical method step by step in a visual way, so that everyone can understand how the important and common PoS tag combinations are identified;
- enhancing and improving the current ngrampos web service v1.1 (see https://ngrampos.vipresearch.ca/#instruct) so that the new version 2.0 can use the proper method (the traditional top-X PoS tag combination method OR the statistical method) to identify and return the valid ngrams for the JSON request package sent by the user; and,
- creating the corresponding Try It function (see https://ngrampos.vipresearch.ca/tryit/index-tryitV1.1.php) and How to Access webpage (see https://ngrampos.vipresearch.ca/instructions/docV1.1.html) for the new ngrampos service version 2.0.
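The tasks above mention a "top-X PoS tag combination method" without defining it. One plausible reading, sketched below purely as an assumption, is to count how often each PoS tag sequence occurs among already collected ngrams, keep the X most frequent sequences, and accept a candidate ngram only if its tag sequence is among them. The function names, the hand-tagged sample data, and the Penn Treebank-style tags are all hypothetical and do not describe the actual ngrampos service.

```python
from collections import Counter


def tag_sequence(tagged_ngram):
    """Reduce a list of (word, tag) pairs to a tuple of tags."""
    return tuple(tag for _, tag in tagged_ngram)


def top_tag_combinations(tagged_ngrams, x):
    """Return the x most frequent PoS tag sequences (assumed method)."""
    counts = Counter(tag_sequence(ngram) for ngram in tagged_ngrams)
    return [combo for combo, _ in counts.most_common(x)]


def is_valid(tagged_ngram, allowed_combinations):
    """Accept an ngram only if its tag sequence is an allowed one."""
    return tag_sequence(tagged_ngram) in set(allowed_combinations)


# Hypothetical, hand-tagged ngrams standing in for the collected sources.
known = [
    [("machine", "NN"), ("learning", "NN")],
    [("search", "NN"), ("engine", "NN")],
    [("web", "NN"), ("service", "NN")],
    [("natural", "JJ"), ("language", "NN")],
    [("neural", "JJ"), ("network", "NN")],
    [("of", "IN"), ("the", "DT")],
]

top2 = top_tag_combinations(known, 2)
accepted = is_valid([("statistical", "JJ"), ("method", "NN")], top2)
rejected = is_valid([("in", "IN"), ("a", "DT")], top2)
```

With this sample data the two most frequent tag sequences are noun-noun and adjective-noun, so the adjective-noun candidate is accepted and the preposition-determiner candidate is rejected. A statistical method, as proposed in the project, would replace the fixed cutoff X with a data-driven criterion.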
Skills and Experience Requirements:
The student should have the following skills, preferably with some experience in each:
- Familiarity with MySQL/MariaDB and SQL commands.
- Information collection, re-organization, summarization, and reporting.
- Using screen capture software and video editing software/tools to create instructional video clips in mp4 format.
- [Asset] Familiarity with Python.
- [Asset] Experience in using a Natural Language Processing library (NLTK) with Python.
- [Asset] Experience in developing standalone Python applications.
- Open until filled.
- Interviews will be scheduled online via Microsoft Teams at https://meeting.vipresearchgroup.ca during the weeks of ***SEPTEMBER 26 to OCTOBER 7, 2022***, until the position is filled.
- Please prepare a very short (up to 15-minute) presentation (and live demonstration, if applicable) to show the required skills you have, e.g., by showing work you have done before.
"STUDENT LETTER" (1-page maximum) needs to describe:
- a. Why you want to undertake this research project;
- b. How the research contribution and experience will advance your academic or professional plans upon completion of your undergraduate degree;
- c. Qualifications (e.g., education, experience, leadership roles, etc.) for this research award;
- d. Previous awards and scholarships, if applicable;
- e. Publications and presentations (i.e., talks, posters) or other forms of knowledge creation and transfer, if applicable;
- f. Additional research experience to date, if applicable; and
- g. An additional page should be used to describe the following, if applicable:
- An explanation for any course withdrawals and/or lower transfer credits and/or semesters of study with fewer than 5 courses, if applicable; and
- If you have taken a full course load (5 courses/semester) and would like to explain exceptional circumstances that may have negatively impacted your GPA, you may do so.