Automating the process of MCQ generation using state-of-the-art Natural Language Processing techniques.
MCQs are the most commonly used assessment type, but creating them is usually cumbersome: a subject matter expert has to read through the content and write appropriate questions along with the corresponding wrong answers (distractors).
Using advanced Natural Language Processing techniques, we will see how we can automate the process of MCQ Generation.
· · ·
How would a teacher create MCQs?
Before moving on to the technical implementation, let's go through how a question author (teacher, tutor, etc.) would approach this problem. This will give us an idea of how to structure the steps required for the solution.
Step 1: Identify key Sentences/Concepts
The first step would be to go through the sentences and figure out the key sentences/concepts being discussed in the given text. This would help us to filter out the sentences which are less important.
Step 2: Identify keywords from sentences
After filtering out the sentences, we need to extract the keywords or keyphrases. Keywords or keyphrases are the important concepts in the sentences. They will serve as the correct answers, on the basis of which the MCQs will be generated.
In the above text, Musk, Tesla, Bitcoin, Dogecoin, etc. are the extracted keywords.
Step 3: Form Multiple Choice Questions
With the knowledge of the given text, the tutor/teacher can then form the multiple choice question. These questions are formed in such a way that the individual keywords or keyphrases are the answers to them.
Along with the question and the correct answer, the tutor/teacher will also create distractors (wrong answer choices). The rule of thumb is that the distractors should be similar to the correct answer, but not so obviously wrong that they give it away. They are meant to confuse the quiz taker.
Posing it as an NLP Problem
The process consists of five steps in total:
- Abstractive/Extractive Summarization.
- Paraphrasing of sentences.
- Keyword/Keyphrase Extraction.
- Question Generation.
- Wrong choices/distractors generation.
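The five steps above can be sketched as a small pipeline in which every step is a pluggable function. This is only a structural sketch: the `MCQ` class, the `generate_mcqs` function and all of its parameter names are hypothetical and chosen for illustration, and the actual models behind each step are left to the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MCQ:
    """One multiple choice question (hypothetical container for illustration)."""
    question: str
    answer: str
    distractors: List[str] = field(default_factory=list)


def generate_mcqs(
    text: str,
    summarize: Callable[[str], str],
    paraphrase: Callable[[str], str],
    extract_keywords: Callable[[str], List[str]],
    make_question: Callable[[str, str], str],
    make_distractors: Callable[[str], List[str]],
    abstractive: bool = False,
) -> List[MCQ]:
    """Chain the five steps; each step is supplied by the caller."""
    # Step 1: summarization keeps only the key sentences/concepts.
    summary = summarize(text)
    # Step 2: paraphrasing is skipped when the summary is already abstractive.
    context = summary if abstractive else paraphrase(summary)
    mcqs = []
    # Step 3: every extracted keyword becomes the correct answer of one MCQ.
    for keyword in extract_keywords(context):
        mcqs.append(
            MCQ(
                # Step 4: generate a question whose answer is the keyword.
                question=make_question(context, keyword),
                answer=keyword,
                # Step 5: generate wrong choices for the keyword.
                distractors=make_distractors(keyword),
            )
        )
    return mcqs
```

Keeping each step behind a plain function makes it easy to swap one model for another without touching the rest of the pipeline.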
Step 1: Abstractive/Extractive Summarization
Step 2: Paraphrasing of sentences
We need to paraphrase the result if we used extractive summarization in the first step. If we used abstractive summarization instead, we can skip this step, since the text was already paraphrased during summarization.
We can choose among the paraphrasing models available in the Hugging Face library.
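As a rough sketch of this step, the snippet below generates a few candidate paraphrases with a sequence-to-sequence model from the Hugging Face `transformers` library and keeps the one that shares the fewest words with the original sentence. Several things here are assumptions rather than part of the method described above: the helper names (`token_overlap`, `pick_paraphrase`, `paraphrase`), the word-overlap selection rule, and the `prefix` argument, which some checkpoints need and others do not. No model name is suggested; `model_name` is whichever paraphrasing checkpoint you pick.

```python
from typing import List


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the lowercase word sets of two sentences."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def pick_paraphrase(original: str, candidates: List[str]) -> str:
    """Return the candidate that shares the fewest words with the original."""
    usable = [
        c for c in candidates
        if c.strip() and c.strip().lower() != original.strip().lower()
    ]
    if not usable:
        # Nothing usable was generated, so fall back to the original sentence.
        return original
    return min(usable, key=lambda c: token_overlap(original, c))


def paraphrase(sentence: str, model_name: str, prefix: str = "",
               num_candidates: int = 3, max_new_tokens: int = 64) -> str:
    """Generate candidates with a seq2seq checkpoint and pick one of them."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(prefix + sentence, return_tensors="pt", truncation=True)
    output_ids = model.generate(
        **inputs,
        num_beams=max(4, num_candidates),  # must be >= num_return_sequences
        num_return_sequences=num_candidates,
        max_new_tokens=max_new_tokens,
    )
    candidates = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
    return pick_paraphrase(sentence, candidates)
```

Picking the least overlapping candidate is only one possible heuristic; a stricter setup would also check that the meaning is preserved.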
Step 3: Keyword/Keyphrase extraction
Given the paraphrased sentences, we next extract the keywords. These keywords serve as the basis for the questions that will be generated in the next step.
MultipartiteRank is one useful algorithm for keyword extraction. We can leverage the open-source library python-keyphrase-extraction (pke) for this task.
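A minimal sketch of this step with the python-keyphrase-extraction library (imported as `pke`) could look like the following. The calls follow the MultipartiteRank usage shown in the library's documentation, but exact signatures can differ between versions, and the library also needs a spaCy English model installed. The parameter values and the `keep_answerable` helper are assumptions added for illustration: the helper drops duplicates and any keyphrase that does not literally occur in the text, since such a phrase could not serve as an answer.

```python
from typing import List, Tuple


def extract_keyphrases(text: str, top_n: int = 10) -> List[Tuple[str, float]]:
    """Rank candidate keyphrases with MultipartiteRank (sketch, needs pke)."""
    # Imported lazily so the helper below works without pke installed.
    import pke

    extractor = pke.unsupervised.MultipartiteRank()
    extractor.load_document(input=text, language="en")
    # Candidates are sequences of nouns, proper nouns and adjectives.
    extractor.candidate_selection(pos={"NOUN", "PROPN", "ADJ"})
    # Build the multipartite graph and rank candidates with a random walk.
    # The values below are illustrative, not tuned for this task.
    extractor.candidate_weighting(alpha=1.1, threshold=0.74, method="average")
    # Returns (keyphrase, score) pairs, best first.
    return extractor.get_n_best(n=top_n)


def keep_answerable(scored: List[Tuple[str, float]], text: str) -> List[str]:
    """Keep keyphrases that occur in the text, without duplicates, in order."""
    lowered = text.lower()
    seen = set()
    kept = []
    for phrase, _score in scored:
        key = phrase.lower().strip()
        if key and key in lowered and key not in seen:
            seen.add(key)
            kept.append(phrase)
    return kept
```

In use, `keep_answerable(extract_keyphrases(text), text)` would give the list of answer candidates for the question generation step.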
Step 4: Question Generation
We have the keywords and the text, which will now be used to generate questions related to each keyword.
We can use the T5 Transformer model that is explicitly trained to take some context and a relevant keyword and generate an appropriate question as shown above.
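To make this step more concrete, here is a hedged sketch of answer-aware question generation with a seq2seq checkpoint loaded through Hugging Face `transformers`. The input format `context: ... answer: ...` and the `question:` output prefix are assumptions: they depend entirely on how the chosen checkpoint was fine-tuned, so check its model card and adjust `build_prompt` and `clean_question` accordingly. All function names are hypothetical, and `model_name` is deliberately left to the reader.

```python
def build_prompt(context: str, answer: str) -> str:
    """Format the model input (assumed format, checkpoint dependent)."""
    return f"context: {context} answer: {answer}"


def clean_question(raw: str) -> str:
    """Strip an optional 'question:' prefix and make sure it ends with '?'."""
    text = raw.strip()
    if text.lower().startswith("question:"):
        text = text[len("question:"):].strip()
    if text and not text.endswith("?"):
        text += "?"
    return text


def generate_question(context: str, answer: str, model_name: str,
                      max_new_tokens: int = 64) -> str:
    """Generate one question whose answer should be the given keyword."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(build_prompt(context, answer),
                       return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, num_beams=4,
                                max_new_tokens=max_new_tokens)
    raw = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return clean_question(raw)
```

Calling `generate_question` once per extracted keyword yields one question per answer. In a real application the tokenizer and model would be loaded once and reused rather than reloaded on every call.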
Step 5: Getting distractors
· · ·
OVERVIEW
The diagram above shows the different processes that we used to generate Multiple Choice Questions from a given text.
Notice how using several techniques in combination helps us solve a real-world problem.
· · ·
This is how we can create Multiple Choice Questions (MCQs) automatically using AI.
If you want to know more about our tool Questgen, which provides a simple and easy-to-use app to automatically create assessments like MCQs, True/False, etc. in one click, read this blog post.