Hello students, today we are sharing the evaluation and quiz answers for every week of the Introduction to Machine Learning course, offered on Coursera completely free of cost. This is a certification course for aspiring students.

If you did not get this course for free, you can apply for financial aid to get it completely free.

Coursera is a large online learning platform that offers many free courses for students. These courses come from various recognized universities, where industry experts and professors teach in a clear and understandable manner.

The correct answers to the Introduction to Machine Learning quizzes are shown in bold below.

**These are the answers to the Introduction to Machine Learning quiz questions from Coursera’s free certification course. These answers have been updated recently and are 100% correct. The final exam is on Monday, April.**

**Introduction to Machine Learning Quiz Answer**

**Week-1**

Week 1 Comprehensive

Question 1: Given the following image of data classifications, which of the following models would you choose?

1 point

- **Logistic regression**
- Multilayer perceptron

Question 2: What is the primary advantage of using multiple filters?

1 point

- More complexity is always better.
- This requires less compute power.
- **This allows the model to look for subtypes of the classification.**
- This is simpler to implement.

Question 3: Which of the following is convolved with layer 2 features, or sub-motifs?

1 point

- Layer 2 feature map
- **Layer 1 feature map**
- Layer 3 feature map

Question 4: Which one of the following best describes transfer learning in the context of document analysis?

1 point

- All parameters of the model are different between individuals.
- **Parameters at the bottom of the model are transferable across all people and documents, while the parameters at the top are different between individuals.**
- All parameters of the model are transferable across all people and documents.
- Parameters at the top of the model are transferable across all people and documents, while the parameters at the bottom are different between individuals.

Question 5: Which of the following are necessary for supervised machine learning? (Choose all that are correct.)

1 point

- **A model**
- **Learning from data**
- **Labeled training data**
- Human to teach the machine

Question 6: What does transfer learning mean in the context of medical imaging?

1 point

- Just as assigning categories to images in ImageNet required millions of images, so too does analyzing medical images require millions of labeled medical images.
- Sufficient labeled radiological images can be used to learn all of the model parameters, so they can be used for ophthalmological or dermatological images.
- Once the convolutional layers are learned from labeled medical images, the top layers can be inferred from the parameters found with data from ImageNet.
- **Weights of convolutional layers learned from ImageNet transfer to medical images, so we only need to learn new parameters at the top of the network.**

Question 7: What new feature did neural networks acquire in 2010?

1 point

- A new computational platform: the GPU
- A new application: image search
- A new operation: convolution
- **A new name: Deep Learning**

Question 8: What decision boundary can logistic regression provide?

1 point

- Arbitrarily complex functions
- Jagged edges
- Smooth curves
- **Linear**

Question 9: What is the primary advantage of having a deep architecture?

1 point

- There is a higher probability that each motif is used in the classifier.
- **The model shares knowledge between motifs through their shared substructures.**
- A model can learn each top-level motif in isolation.
- The parameters of a deep architecture are less expensive to compute.

Question 10: Which of the following gives the best conceptual meaning of convolution?

1 point

- Surveying a feature map for high-level motif.
- Selecting an atomic element from an image.
- Stacking a collection of feature maps.
- **Shifting a filter to every location in an image.**
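To make the idea of shifting a filter to every location concrete, here is a minimal sketch in plain Python. It is not taken from the course; the function name `slide_filter`, the 4x4 toy image, and the 2x2 filter are assumptions chosen for illustration, and the sketch uses no padding and a stride of 1.

```python
def slide_filter(image, kernel):
    """Slide `kernel` over every valid location of `image` (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply the filter with the image patch under it and sum the products.
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical toy data: a bright 2x2 square in the middle of a dark image.
image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
kernel = [
    [1, 1],
    [1, 1],
]
feature_map = slide_filter(image, kernel)
for row in feature_map:
    print(row)
```

The largest value in the resulting feature map sits at the centre, where the filter lines up exactly with the bright square.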

Week 2 Comprehensive

Question 1: What is overfitting?

1 point

- Overfitting refers to the fact that more complexity is always better, which is why deep learning works.
- **Model complexity fits too well to training data and will not generalize in the real-world.**
- Model complexity is perfectly matched to the data.
- Model complexity is not enough to capture the nuance of the data and will under-perform in the real-world.

Question 2: What does the equation for the loss function do conceptually?

1 point

- Mathematically define network outputs
- **Penalize overconfidence**
- Ignore historical statistical developments
- Reward indecision
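As a rough illustration of how a loss can penalize overconfidence, the sketch below assumes the loss in question is the cross-entropy (negative log-likelihood) commonly used for classification. The quiz does not state the equation, so this choice, the function name `cross_entropy`, and the probabilities are all assumptions.

```python
import math


def cross_entropy(p_true_class):
    """Loss for one example, given the probability the model gave the correct class."""
    return -math.log(p_true_class)


unsure = cross_entropy(0.5)            # model hedges between two classes
confident_right = cross_entropy(0.99)  # model is confident and correct
confident_wrong = cross_entropy(0.01)  # model is confident and wrong

print(f"unsure:          {unsure:.3f}")
print(f"confident right: {confident_right:.3f}")
print(f"confident wrong: {confident_wrong:.3f}")
```

Under this assumed loss, a confident wrong prediction costs far more than a hedged one, which is the sense in which overconfidence is penalized.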

Question 3: What are the two main benefits of early stopping?

1 point

- **It helps save computation cost.**
- **It performs better in the real world.**
- It improves the training loss.
- There is rigorous statistical theory on it.
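One common way to implement early stopping is to watch the validation loss and stop once it has not improved for a few epochs. The sketch below is only an illustration of that idea: the `patience` rule, the function name, and the list of validation losses are hypothetical and not part of the course material.

```python
def early_stopping(val_losses, patience=2):
    """Return (best_epoch, last_epoch_run) for a sequence of validation losses."""
    best_epoch = -1
    best_loss = float("inf")
    last_epoch_run = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        last_epoch_run = epoch
        if loss < best_loss:
            # Validation loss improved: remember this epoch's model.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # stop here and skip the remaining epochs
    return best_epoch, last_epoch_run


# Hypothetical validation losses: they fall, then rise as the model overfits.
val_losses = [0.90, 0.70, 0.60, 0.65, 0.72, 0.80, 0.95]
best_epoch, last_epoch_run = early_stopping(val_losses)
print(best_epoch, last_epoch_run)
```

In this toy run the model from epoch 2 is kept and the last two epochs are never computed, which matches the two benefits marked above.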

Question 4: How do we learn our network?

1 point

- **Gradient descent**
- Downhill skiing
- Monte Carlo simulation
- Analytically determine global minimum

Question 5: Why should the test set only be used once?

1 point

- **More than one use can lead to bias.**
- More than one use can lead to overfitting.
- The model cannot learn anything new from subsequent uses.
- It is expensive to use more than once.

Question 6: Why are optimization and validation at odds?

1 point

- **Optimization seeks to do as well as possible on a training set, while validation seeks to generalize to the real world.**
- Optimization seeks to generalize to the real world, while validation seeks to do as well as possible on a validation set.
- Optimization seeks to do as well as possible on a training set, while validation seeks to do as well as possible on a validation set.
- They are not at odds—they have the same goal.

Question 7: What technique is used to minimize loss for a large data set?

1 point

- Newton’s method
- Taylor series expansion
- **Stochastic gradient descent**
- Gradient descent

Question 8: Why is gradient descent computationally expensive for large data sets?

1 point

- Large data sets do not permit computing the loss function, so a more expensive measure is used.
- **Calculating the gradient requires looking at every single data point.**
- Large data sets require deeper models, which have more parameters.
- There are too many local minima for an algorithm to find.

Question 9: Which of the following are benefits of stochastic gradient descent?

1 point

- **With stochastic gradient descent, the update time does not scale with data size.**
- Stochastic gradient descent finds the solution more accurately.
- **Stochastic gradient descent can update many more times than gradient descent.**
- Stochastic gradient descent gets near the solution quickly.
- Stochastic gradient descent finds a more exact gradient than gradient descent.
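The difference in cost per update can be seen in a small sketch. It fits a single weight `w` to noise-free toy data with `y = 3x`; the data, the learning rate of 0.5, the step counts, and the function names are all assumptions made for illustration.

```python
import random


def full_gradient(w, data):
    # Looks at every data point, so the cost of one update grows with len(data).
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def stochastic_gradient(w, data, rng):
    # Looks at one randomly chosen point, so the cost does not depend on len(data).
    x, y = rng.choice(data)
    return 2 * (w * x - y) * x


rng = random.Random(0)  # fixed seed so the run is repeatable
data = [(k / 100, 3 * k / 100) for k in range(1, 101)]  # hypothetical data, y = 3x
learning_rate = 0.5

w_gd = 0.0
for _ in range(5):  # 5 updates, each touching all 100 points
    w_gd -= learning_rate * full_gradient(w_gd, data)

w_sgd = 0.0
for _ in range(500):  # 500 updates, each touching 1 point (same total work)
    w_sgd -= learning_rate * stochastic_gradient(w_sgd, data, rng)

print(f"gradient descent after 5 updates:   w = {w_gd:.4f}")
print(f"stochastic descent after 500 updates: w = {w_sgd:.4f}")
```

Both loops evaluate 500 per-point gradients in total, but the stochastic version gets 100 times as many updates out of that budget. On noisy real data each stochastic step is less exact, so this toy result should not be read as a general guarantee.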

Question 10: Which two of the following describe the purpose of a validation set?

1 point

- To estimate the performance of a model.
- **To pick the best performing model.**
- To test the performance in lieu of real-world data.
- To learn the model parameters.

Week 3 Comprehensive

Question 1: Which of the following is used to distinguish the false positive rate from the false negative rate?

1 point

- Sensitivity
- False Negative
- Negative Predictive Value
- **Specificity**

Question 2: Which of the following is an advantage of hierarchical representation of image features?

1 point

- Eliminating bias.
- Decreasing the computational complexity.
- **Better leveraging all training data.**
- Decreasing variance in the model.

Question 3: Why are nonlinear activation functions preferable?

1 point

- Nonlinear activation functions are preferable because they are used in generalized linear models in statistics.
- **Nonlinear activation functions increase the functional capacity of the neural network by allowing the representation of nonlinear relationships between features in input.**
- Nonlinear activation functions are preferable because they have been used historically.
- Nonlinear activation functions are NOT preferable to linear ones, as they lose information in systems with high variance.

Question 4: Which of the following indicates whether a doctor or machine is doing well at finding positive examples in a data set?

1 point

- Positive Predictive Value
- Likelihood Ratio
- **Sensitivity**
- Specificity
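Sensitivity and specificity both come from the counts in a confusion matrix. The sketch below uses their standard definitions; the counts are hypothetical and only serve to show the arithmetic.

```python
def sensitivity(tp, fn):
    """Fraction of actual positives that were found (true positive rate)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of actual negatives that were correctly rejected (true negative rate)."""
    return tn / (tn + fp)


# Hypothetical counts: 100 patients with the condition, 100 without.
tp, fn = 80, 20  # positives found / positives missed
tn, fp = 90, 10  # negatives correctly cleared / false alarms

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
false_negative_rate = 1 - sens
false_positive_rate = 1 - spec

print(f"sensitivity = {sens:.2f}, false negative rate = {false_negative_rate:.2f}")
print(f"specificity = {spec:.2f}, false positive rate = {false_positive_rate:.2f}")
```

Sensitivity tracks how well positives are found, while specificity is the complement of the false positive rate.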

Question 5: Which of the following can a user choose when designing a convolutional layer? (Choose all that are correct.)

1 point

- **Filter depth**
- **Filter size**
- **Filter number**
- **Filter stride**
- Filter weights

Question 6: What is a fully connected readout?

1 point

- A layer with ten classifications.
- A layer with connections to all feature maps.
- The vectorization of a pooling layer.
- **A layer with a single neuron for each output class.**

Question 7: Which of the following are benefits of pooling? (Choose all that are correct.)

1 point

- **Decreases bias.**
- **Combats overfitting.**
- **Vectorizes the data.**
- **Encourages translational invariance.**
- Reduces computational complexity.

Question 8: Why does transfer learning work?

1 point

- **Top-level features are specialized for a particular task, while low-level features are universal to all images.**
- All layers of filters can be learned by studying the mammalian receptive fields.
- Low-level features are specialized for a particular task, while top-level features are universal to all images.
- All images are composed of pixels with three color channels.

Question 9: Which of the following is the best conceptual definition of one dimensional convolution?

1 point

- “Inverting” of a shape, where the inversion matches a feature.
- **“Sliding” of two signals, where a matched feature gives a high value of convolution.**
- “Intertwining” of two signals, where one wraps around the other to form a feature.
- “Distortion” of one signal, according to the feature shape.

Question 10: How are parameters that minimize the loss function found in practice?

1 point

- Fractal geometry
- Gradient descent
- Simplex algorithm
- **Stochastic gradient descent**

Week 4 Comprehensive

Question 1: What is the continuous bag of words (CBOW) approach?

1 point

- **Vectors for the neighborhood of words are averaged and used to predict word n.**
- Word n is used to predict the words in the neighborhood of word n.
- Word n is learned from a large corpus of words, which a human has labeled.
- The code for word n is fed through a CNN and categorized with a softmax.

Question 2: Which word is a synonym for “word vector”?

1 point

- Norm
- Array
- **Embedding**
- Stack

Question 3: What is the goal of learning word vectors?

1 point

- Find the hidden or latent features in a text.
- Labelling a text corpus, so a human doesn’t have to do it.
- Determine the vocabulary in the codebook.
- **Given a word, predict which words are in its vicinity.**

Question 4: What is natural language processing?

1 point

- Making natural text conform to formal language standards.
- Translating natural text characters to unicode representations.
- Translating human-readable code to machine-readable instructions.
- **Taking natural text and making inferences and predictions.**

Question 5: What is the term for a set of vectors, with one vector for each word in the vocabulary?

1 point

- Space
- Array
- **Codebook**
- Embedding

Question 6: What is meant by “word vector”?

1 point

- The latitude and longitude of the place a word originated.
- **A vector of numbers associated with a word.**
- Assigning a corresponding number to each word.
- A vector consisting of all words in a vocabulary.

Question 7: What is the goal of the recurrent neural network?

1 point

- Learn a series of images that form a video.
- Predict words more efficiently than Skip-Gram.
- **Synthesize a sequence of words.**
- Classify an unlabeled image.

Question 8: What function is the generalization of the logistic function to multiple dimensions?

1 point

- Hyperbolic tangent function
- Exponential log likelihood
- Squash function
- **Softmax function**
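A quick way to check that softmax generalizes the logistic function is to apply it to two scores, `z` and `0`: the first output then equals the logistic function of `z`. The sketch below uses the standard definitions of both functions; the particular scores are arbitrary values picked for illustration.

```python
import math


def logistic(z):
    return 1 / (1 + math.exp(-z))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


z = 1.5
two_class = softmax([z, 0.0])           # two classes: reduces to the logistic function
three_class = softmax([2.0, 1.0, 0.1])  # more classes: probabilities still sum to 1

print(two_class[0], logistic(z))
print(three_class, sum(three_class))
```

With more than two scores, softmax still returns non-negative values that sum to one, so they can be read as class probabilities.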

Question 9: Which model is the state-of-the-art for text synthesis?

1 point

- **Long short-term memory**
- CNN
- Multilayer perceptron
- CBOW

Question 10: What is the Skip-Gram approach?

1 point

- **Word n is used to predict the words in the neighborhood of word n.**
- The code for word n is fed through a CNN and categorized with a softmax.
- Word n is learned from a large corpus of words, which a human has labeled.
- Vectors for the neighborhood of words are averaged and used to predict word n.
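Skip-Gram and CBOW differ only in the direction of the prediction, which shows up in how training pairs are built from a sentence. The sketch below only constructs those pairs and trains no model. The helper names, the window radius of 1, and the example sentence are assumptions for illustration. In an actual CBOW model the neighbour vectors would be averaged before predicting word n.

```python
def context_window(tokens, n, radius):
    """Words within `radius` positions of word n, excluding word n itself."""
    left = tokens[max(0, n - radius):n]
    right = tokens[n + 1:n + 1 + radius]
    return left + right


def skip_gram_pairs(tokens, radius=1):
    # Skip-Gram: word n is the input, each neighbour is a prediction target.
    return [
        (tokens[n], neighbour)
        for n in range(len(tokens))
        for neighbour in context_window(tokens, n, radius)
    ]


def cbow_pairs(tokens, radius=1):
    # CBOW: the neighbours are the input, word n is the prediction target.
    return [
        (context_window(tokens, n, radius), tokens[n])
        for n in range(len(tokens))
    ]


tokens = "the cat sat down".split()  # hypothetical example sentence
sg = skip_gram_pairs(tokens)
cb = cbow_pairs(tokens)

print(sg)
print(cb)
```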

## Conclusion:

Quizzes can be used to generate more interest among students who want to learn in a competitive situation. Two quizzes that work well with students at the middle school level are “The Hollow Square Quiz” and “The Hollow Circle Quiz.” Another, “The Student Created Quiz”, can be used as part of a revision program. Students in each group can get the quiz answers for each topic from our website.