CBSE Class 12 – Data Science Question Paper 2023

Section A

1. Answer any 4 out of the given 6 questions on Employability Skills.

(i) A/An __________ may be defined as an underlying characteristic of a person which results in effective and/or superior performance in a job.
Options:
(a) uncertainty
(b) competence
(c) obstacles
(d) fear

Answer:
(b) competence

(ii) Which of the following is not true for formulas in a spreadsheet?
Options:
(a) A formula always starts with an equal to (=) sign.
(b) A formula is displayed in the formula bar.
(c) Formulae are used to calculate results through arithmetic operations.
(d) In numeric formulae, you cannot make use of operators.

Answer:
(d) In numeric formulae, you cannot make use of operators.

(iii) Given below are two statements, one labelled as Assertion (A) and the other labelled as Reason (R). Select the correct answer from the code given below:

Assertion (A): Physiological motivation can be guided by the need for achievement and the need for affiliation.
Reason (R): The need for achievement is a social form of motivation involving a competitive drive to meet the standards of excellence.

Options:
(a) Both Assertion (A) and Reason (R) are true and Reason (R) is the correct explanation of Assertion (A).
(b) Both Assertion (A) and Reason (R) are true, but Reason (R) is not the correct explanation of Assertion (A).
(c) Assertion (A) is true, but Reason (R) is false.
(d) Assertion (A) is false, but Reason (R) is true.

Answer:
(d) Assertion (A) is false, but Reason (R) is true. The needs for achievement and affiliation are psychological (social) motives, not physiological ones.

(iv) If the column is not wide enough to display the value, which of the following errors is displayed?
Options:
(a) #VALUE!
(b) #DIV/0!
(c) #####
(d) #COLUMN!

Answer:
(c) #####

2. Answer any 5 out of the given 6 questions.

(i) __________ is the right of an individual to have control over how his or her personal information is collected and used.
Options:
(a) Data Privacy
(b) Data Governance
(c) Governing
(d) Data Quality

Answer:
(a) Data Privacy

(ii) Which of the following refers to the process of identifying incorrect, incomplete and inaccurate data?
Options:
(a) Exploratory Data Analysis
(b) Data Cleaning
(c) Univariate Analysis
(d) Data Ethics

Answer:
(b) Data Cleaning

(iii) Every ________ represents the outcome of the test in a decision tree.
Options:
(a) Internal node
(b) External node
(c) Branch
(d) Leaf node

Answer:
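(c) Branch. Each internal node denotes a test on an attribute, each branch represents an outcome of that test, and each leaf node holds a class label.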

(iv) The _________ algorithm is one of the most basic and easy-to-implement supervised machine learning algorithms.
Options:
(a) Decision tree
(b) K-nearest neighbors (K-NN)
(c) Classification
(d) Scoping

Answer:
(b) K-nearest neighbors (K-NN)

(v) __________ is the square root of the variance of the residuals.
Options:
(a) Root Mean Square Deviation
(b) Required Mean Similar Deviation
(c) Root Median Similar Deviation
(d) Regular Mean Square Deviation

Answer:
(a) Root Mean Square Deviation
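
As a rough illustration of the definition above, the sketch below computes the root mean square deviation as the square root of the mean of the squared residuals. The function name `rmsd` and the sample values are assumptions made up for this example; they do not come from the question paper.

```python
import math

def rmsd(actual, predicted):
    """Root mean square deviation between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Residual = actual value minus predicted value.
    squared_residuals = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_residuals) / len(squared_residuals))

# Hypothetical values chosen only for illustration.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 11.0]
print(rmsd(actual, predicted))
```

A value of 0 would mean the predictions match the actual values exactly; larger values indicate a poorer fit.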

(vi) In which type of learning do algorithms act on data without human intervention?
Options:
(a) Unsupervised
(b) Supervised
(c) Limited
(d) Conditional

Answer:
(a) Unsupervised

3. Answer any 5 out of the given 6 questions.

(i) PDP stands for:
Options:
(a) Personal Data Protection Bill
(b) Private Data Protection Bill
(c) Personal Device Protection Bill
(d) Personal Device Prevention Bill

Answer:
(a) Personal Data Protection Bill

(ii) __________ is a more complex form of statistical analysis technique and is used to analyze more than two variables in the data set.
Options:
(a) Graphical method
(b) Multivariate analysis
(c) Univariate analysis
(d) Unsupervised learning technique

Answer:
(b) Multivariate analysis

(iii) Decision trees are used to solve _________ problems.
Options:
(a) Only classification
(b) Only regression
(c) Both classification and regression
(d) All universal

Answer:
(c) Both classification and regression

(iv) Can we use K-NN algorithms for data mining problems?

Answer:
Yes, K-NN (K-nearest neighbors) algorithms can be used for data mining problems, especially for classification and regression tasks.

(v) Y = m X + b is the formula for __________.
Options:
(a) Mean Absolute Error
(b) Root Mean Square Deviation
(c) Simple linear regression
(d) Simple linear classification

Answer:
(c) Simple linear regression

(vi) State whether the following statement is true or false:

The formula for non-linear and linear regression is the same.

Answer:
False. The formula for linear regression is $Y = mX + b$, while non-linear regression uses a different equation based on the specific type of non-linear relationship.

4. Answer any 5 out of the given 6 questions

(i) The Children Online Privacy and Protection Act is a law that deals with:
Options:
(a) Privacy policy for children who are less than the age of 13 years.
(b) Privacy policy for adults.
(c) Public policy for all children.
(d) Protection policy for mothers of newborns.

Answer:
(a) Privacy policy for children who are less than the age of 13 years.

(ii) The process of Exploratory Data Analysis is done with the help of summary statistics and _________ representations.
Options:
(a) Analytical
(b) Graphical
(c) Logical
(d) Sequential

Answer:
(b) Graphical

(iii) Regression trees are used when the dependent variable is non-continuous. Is the given statement true or false?

Answer:
False. Regression trees are used when the dependent variable is continuous. For non-continuous variables, classification trees are used.

(iv) Which of the following is true for the K-NN algorithm?
Options:
(a) Interpretability of the K-NN algorithm is very low.
(b) The K-NN algorithm explicitly has a training step.
(c) The K-NN algorithm is not sensitive to outliers.
(d) K-NN works well with a small number of input variables, but as the number of variables grows, the K-NN algorithm struggles to predict the output of a new data point.

Answer:
(d) K-NN works well with a small number of input variables, but as the number of variables grows, the K-NN algorithm struggles to predict the output of a new data point.

(v) The actual value of the _________ depends on the data and accuracy required.
Options:
(a) RMSE
(b) MAE
(c) NLR
(d) Model

Answer:
(d) Model

(vi) Trigonometric functions are examples of linear functions. Is the given statement true or false?

Answer:
False. Trigonometric functions are non-linear functions.
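
One informal way to see this: a straight line has the same slope between any two points, whereas a trigonometric function does not. The sketch below compares the two. The helper `slopes`, the line Y = 2X + 1, and the sample x values are assumptions chosen for illustration.

```python
import math

def slopes(f, xs):
    """Slope of f between each pair of consecutive x values."""
    return [(f(x1) - f(x0)) / (x1 - x0) for x0, x1 in zip(xs, xs[1:])]

xs = [0.0, 1.0, 2.0, 3.0]
line_slopes = slopes(lambda x: 2 * x + 1, xs)  # hypothetical straight line Y = 2X + 1
sine_slopes = slopes(math.sin, xs)             # trigonometric function

print(line_slopes)  # the same slope on every interval
print(sine_slopes)  # the slope changes from interval to interval
```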

5. Answer any 5 out of the given 6 questions.

(i) Ethics govern the behaviour or actions of an individual.
Options:
(a) True
(b) False

Answer:
(a) True

(ii) Bivariate analysis is also a good way to measure the _________ between the two variables.
Options:
(a) Difference
(b) Mean
(c) Correlation
(d) Median

Answer:
(c) Correlation

(iii) For every possible decision in a decision tree, stemming from the root makes a _________.
Options:
(a) Branch
(b) Tree
(c) Root
(d) Node

Answer:
(a) Branch

(iv) __________ is a non-parametric algorithm, as it does not assume anything about the distribution of the data.
Options:
(a) Cross Validation
(b) Regression
(c) Dataset
(d) K-NN

Answer:
(d) K-NN

(v) What is the basic objective of linear regression?
Options:
(a) To reduce the distance between the line and data points to make it minimum.
(b) To increase the distance between the line and data points to make it maximum.
(c) To find the sum of the distance between the line and data points.
(d) To find the average of the distance between the line and data points.

Answer:
(a) To reduce the distance between the line and data points to make it minimum.

(vi) A __________ regression equation has an intercept on the right-hand side and an explanatory variable with a coefficient.
Options:
(a) Circular
(b) Multiple
(c) Complex
(d) Simple

Answer:
(d) Simple

Section B

Answer any 3 out of the given 5 questions on Employability Skills. Answer each question in 20-30 words.

6. Two benefits of Entrepreneurial Competencies:

Improved Decision-Making: Entrepreneurial competencies help entrepreneurs to make informed and better decisions based on market conditions, financial analysis, and long-term goals.
Increased Ability to Manage Risks: Entrepreneurs with strong competencies are better equipped to identify, assess, and mitigate risks effectively, improving their chances of success.

7. Answer the following questions from the spreadsheet given below:

My Store Spreadsheet:

	A	B	C	D
1	Product	Cost Price	Selling Price
2	Ruler	9	15
3	Crayons	23	25
4	Pens	14	12
5	Cardboard	35	45
6	Highest

(a) Write a formula in cell D2, to calculate the Profit or Loss for the item “Ruler” as the difference of “Cost Price” and “Selling Price”.

Answer:
In cell D2, use the following formula to calculate the Profit or Loss for the item “Ruler”:

=C2-B2

This formula subtracts the Cost Price (cell B2) from the Selling Price (cell C2) to give the Profit or Loss for the “Ruler”.

(b) Write a function in B6 to display the highest cost price of the product.

Answer:
In cell B6, use the following function to display the highest Cost Price from the products listed:

=MAX(B2:B5)

This function will return the highest value from cells B2 to B5, which are the Cost Prices of the products listed.
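
The two spreadsheet results can be cross-checked outside the spreadsheet. The sketch below repeats both calculations in Python using the values from the table above. The variable names are assumptions made for this example.

```python
# Rows copied from the "My Store" spreadsheet: (product, cost price, selling price).
rows = [
    ("Ruler", 9, 15),
    ("Crayons", 23, 25),
    ("Pens", 14, 12),
    ("Cardboard", 35, 45),
]

# Profit or loss per product: selling price minus cost price (negative means a loss).
profit_or_loss = {product: selling - cost for product, cost, selling in rows}

# Highest cost price across all products.
highest_cost = max(cost for _, cost, _ in rows)

print(profit_or_loss)
print(highest_cost)
```

Here "Pens" comes out negative because its selling price is below its cost price.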

8. List any two sources of motivation and inspiration.
Answer:

Personal Goals and Ambitions: Achieving personal goals, such as pursuing a career dream or self-improvement, provides motivation and inspiration.
Role Models and Mentors: Learning from the success stories and advice of mentors or role models can inspire individuals to pursue their own aspirations.

9. Differentiate between internal and external motivation. Give a suitable example.
Answer:

Internal Motivation: This type of motivation comes from within the individual, driven by personal desires, values, or satisfaction.
Example: A person works hard to achieve personal goals, such as mastering a new skill or reaching a fitness target, because they find intrinsic satisfaction in the process.
External Motivation: This type of motivation is driven by external factors, such as rewards, recognition, or approval from others.
Example: A person works hard to meet a deadline or target because they want to receive a bonus or promotion at work.

10. Match the given attitudes of an entrepreneur with their characteristics:

Attitudes	Characteristics
(a) Interpersonal skills	(iii) Ability to work with others
(b) Taking initiative	(i) Ability to take charge and act in a situation before others
(c) Decisiveness	(iv) Ability to make quick and profitable decisions
(d) Perseverance	(ii) Ability to continue to do something even when it is difficult

Answer any 4 out of the given 6 questions in 20-30 words each.

11. Name the areas of focus of data governance.
Answer:

Data Quality Management: Ensuring the accuracy, consistency, and reliability of data across the organization.
Data Security and Privacy: Protecting data from unauthorized access and ensuring compliance with privacy regulations.

12. Differentiate between Univariate analysis and Bivariate analysis.
Answer:

Univariate Analysis: Involves the analysis of a single variable to summarize and find patterns. It helps to describe the distribution, central tendency, and variability of the data. Example: Analyzing the sales data of one product.
Bivariate Analysis: Involves the analysis of two variables to explore the relationship between them. It is used to determine how changes in one variable might affect the other. Example: Analyzing the relationship between advertising expenditure and sales.
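
To make the distinction concrete, the sketch below summarises one variable on its own (univariate) and then measures the correlation between two variables (bivariate). The advertising and sales figures are invented for illustration, and `pearson` is a hypothetical helper written for this example.

```python
import math
import statistics

# Hypothetical monthly figures, invented for illustration only.
advertising = [10, 20, 30, 40, 50]
sales = [25, 45, 60, 85, 100]

# Univariate: describe a single variable by its centre and spread.
sales_mean = statistics.mean(sales)
sales_stdev = statistics.stdev(sales)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Bivariate: how strongly the two variables move together.
print(sales_mean, sales_stdev)
print(pearson(advertising, sales))
```

A coefficient close to 1 suggests that sales rise as advertising rises; it does not by itself show that one causes the other.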

13. Why are decision trees considered to be versatile?
Answer:
Decision trees are considered versatile because they can be used for both classification and regression tasks, and they handle both numerical and categorical data effectively. Additionally, they provide clear, interpretable results in the form of a tree-like structure, making them easy to understand and implement in various scenarios.

14. What is the principle on which K-NN algorithm works?
Answer:
The K-Nearest Neighbors (K-NN) algorithm works on the principle that data points that are close to each other (in terms of distance metrics such as Euclidean distance) are likely to share the same class or outcome. The algorithm classifies a new data point based on the majority class of its K nearest neighbors in the training dataset.
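
The principle described above can be sketched in a few lines: measure the Euclidean distance from the new point to every training point, keep the K closest, and take a majority vote. The function `knn_predict`, the default `k=3`, and the sample points are assumptions for illustration, not a reference implementation.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training point's distance to the query with its label, nearest first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical training data: two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (2, 2)))
print(knn_predict(points, labels, (6, 5)))
```

Note that nothing is learned ahead of time: every prediction scans the whole training set.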

15. When do we use linear regression?
Answer:
Linear regression is used when there is a linear relationship between the dependent variable (the outcome you want to predict) and one or more independent variables (predictors). It is typically used for predicting a continuous outcome based on the input features.

Example: Predicting the price of a house based on its size and location.
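
A minimal sketch of fitting a straight line Y = mX + b by least squares is shown below. It uses house size as the only predictor, which is a simplification of the example above. The function `fit_line` and the size and price figures are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for Y = m X + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

# Hypothetical house sizes and prices, invented for illustration only.
sizes = [50, 70, 90, 110]
prices = [110, 150, 190, 230]

m, b = fit_line(sizes, prices)
predicted_price = m * 100 + b  # prediction for a house of size 100
print(m, b, predicted_price)
```

The slope and intercept are chosen so that the sum of squared vertical distances between the line and the data points is as small as possible.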

16. The formula for non-linear regression is y ~ f(x, β). What do x and y denote in the given formula?
Answer:
In the formula y ~ f(x, β):

x represents the independent variable(s) or input feature(s), which are used to predict the dependent variable.
y represents the dependent variable or output, which is being predicted or modeled.

The function f(x, β) represents the non-linear relationship between the independent and dependent variables, with β denoting the model parameters to be estimated.
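
As a hedged illustration of this notation, the sketch below picks one possible non-linear form for f, an exponential curve, and searches a small grid of candidate parameter values for the pair that minimises the squared error. The model form, the parameter name `beta`, the grid, and the synthetic data are all assumptions. Real fitting routines use more refined optimisation than a grid search.

```python
import math

def f(x, beta):
    """Hypothetical non-linear model: beta[0] * exp(beta[1] * x)."""
    return beta[0] * math.exp(beta[1] * x)

def sum_squared_error(xs, ys, beta):
    """Sum of squared differences between observed y and f(x, beta)."""
    return sum((y - f(x, beta)) ** 2 for x, y in zip(xs, ys))

# Synthetic data generated from known parameters (2.0, 0.5), for illustration only.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [f(x, (2.0, 0.5)) for x in xs]

# Coarse grid of candidate parameter pairs; the pair with the lowest error wins.
candidates = [(a / 2, b / 10) for a in range(1, 9) for b in range(1, 11)]
best_beta = min(candidates, key=lambda beta: sum_squared_error(xs, ys, beta))
print(best_beta)
```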

Answer any 3 out of the given 5 questions in 50-80 words each.

17. Write a short note on General Data Protection Regulation (GDPR).
Answer:
The General Data Protection Regulation (GDPR) is a regulation enacted by the European Union (EU) to protect the privacy and personal data of individuals within the EU and the European Economic Area (EEA). It aims to give individuals more control over their personal data and simplifies the regulatory environment for international business by unifying data protection laws across Europe. Key provisions of the GDPR include:

Data subject rights: Individuals have the right to access, rectify, and erase their personal data.
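Data breach notifications: Organizations must notify the supervisory authority, generally within 72 hours of becoming aware of a breach, and must inform affected individuals when the breach poses a high risk to them.
Consent: Where processing relies on consent, that consent must be freely given, specific, informed, and unambiguous, and individuals can withdraw it at any time.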
Accountability: Organizations must demonstrate compliance with GDPR principles through regular audits and records.
Data Protection by Design: Privacy must be considered from the outset of any project involving personal data.

18. List any four tools and methods used to perform Exploratory Data Analysis (EDA).
Answer:

Descriptive Statistics: Using summary statistics like mean, median, mode, variance, and standard deviation to understand the distribution of the data.
Data Visualization: Tools such as matplotlib, seaborn, or ggplot2 help in creating histograms, box plots, scatter plots, and bar charts to visualize the data distribution and relationships.
Correlation Matrix: Using a correlation matrix to identify relationships between variables and understand how they are correlated.
Missing Value Analysis: Identifying and handling missing data using techniques such as imputation or removing rows/columns with missing values.
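
Two of the methods above, descriptive statistics and missing value analysis, can be sketched with the Python standard library alone. The column of ages, the use of `None` to mark missing entries, and the choice of median imputation are assumptions made for this example.

```python
import statistics

# Hypothetical column with missing entries marked as None.
ages = [23, 25, None, 31, 25, None, 40]

# Missing value analysis: count the gaps and separate the observed values.
missing_count = sum(value is None for value in ages)
observed = [value for value in ages if value is not None]

# Descriptive statistics on the observed values.
summary = {
    "mean": statistics.mean(observed),
    "median": statistics.median(observed),
    "mode": statistics.mode(observed),
    "stdev": statistics.stdev(observed),
}

# Simple imputation: fill each gap with the median of the observed values.
imputed = [summary["median"] if value is None else value for value in ages]

print(missing_count, summary)
print(imputed)
```

Whether to impute or drop missing values depends on how much data is missing and why; the median is only one common choice.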

19. List any four important features of decision trees.
Answer:

Simple and Interpretable: Decision trees provide clear decision-making processes through tree-like structures, making them easy to understand and interpret.
Handles Both Categorical and Numerical Data: Decision trees can process both types of data efficiently without the need for scaling or normalization.
Non-linear Relationships: Decision trees are capable of modeling non-linear relationships, unlike linear models that only model linear patterns.
Feature Selection: Decision trees automatically perform feature selection by choosing the most important features at each node to split the data.
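
How a tree chooses a split can be sketched for a single numeric feature. The example below scores each candidate threshold by weighted Gini impurity and keeps the lowest. Gini impurity is one commonly used criterion and is an assumption here, as are the function names and the study-hours data.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 means perfectly pure)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, score) with the lowest weighted Gini for one numeric feature."""
    best = None
    for threshold in sorted(set(values)):
        left = [label for value, label in zip(values, labels) if value <= threshold]
        right = [label for value, label in zip(values, labels) if value > threshold]
        if not left or not right:
            continue  # a split must send at least one sample to each side
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical data: hours studied and exam outcome.
hours = [1, 2, 3, 6, 7, 8]
result = ["fail", "fail", "fail", "pass", "pass", "pass"]
print(best_threshold(hours, result))
```

A full tree would repeat this search over every feature at every node and stop when the nodes are pure enough or a depth limit is reached.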

20. List any four disadvantages of K-NN compared to other algorithms.
Answer:

High Computational Cost: As K-NN requires calculating the distance between the test point and all training data points, it can be computationally expensive, especially with large datasets.
Sensitive to Irrelevant Features: K-NN’s performance can be affected by irrelevant or redundant features in the data, as it relies heavily on distance metrics.
Memory Intensive: K-NN stores the entire dataset in memory, which can be a problem when dealing with large datasets, as it requires a significant amount of storage.
Slower Prediction Time: For large datasets, K-NN can be slow during prediction time since it needs to compute distances between the test point and each training data point before making a decision.

21. Explain in points the working of K-means clustering algorithm.

Initialization: Choose the number of clusters (K) and randomly select K centroids from the data.
Assigning Points: Assign each data point to the nearest centroid based on the distance.
Recalculate Centroids: Update the centroids by computing the mean of all points in each cluster.
Repeat: Reassign points to new centroids and update centroids until they no longer change.
Convergence: The algorithm stops when centroids stabilize or after a set number of iterations.
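
The steps above can be sketched as follows. This is a minimal illustration, not an optimised implementation: the function name `k_means`, the fixed random seed, the iteration cap, and the sample points are all assumptions made for this example.

```python
import math
import random

def k_means(points, k, max_iterations=100, seed=0):
    """Cluster `points` (tuples of floats) into k groups; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # Initialization: pick k data points at random
    for _ in range(max_iterations):
        # Assigning points: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Recalculate centroids: mean of each cluster (an empty cluster keeps its centroid).
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        # Convergence: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two well-separated groups of points.
data = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = k_means(data, k=2)
print(centroids)
print(clusters)
```

Because the starting centroids are random, different seeds can give different results on less cleanly separated data, so the algorithm is often run several times.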