## Statistical Analysis with Software Application

### This course reviews and expands upon core topics in probability and statistics through the study and practice of data analysis.

0.206The area of the standard normal curve to the right of z=0.82 is _______.
1A perfect positive correlation coefficient is equal to
1What is the value of the standard deviation in a standard normal distribution?
1.02Which is NOT a value of r ?
7How many data mining techniques are there?
7What is the value of quartile 3 in 2,4,4,4,5,5,6,8,9?
7In 2,4,4,4,5,5,6,8,9 the range is
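The quartile and range answers for this data set can be checked with a short sketch. It assumes the default "exclusive" method of Python's `statistics.quantiles`; other quartile conventions can give a different Q3 for the same scores.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 6, 8, 9]

# quantiles() with n=4 returns the three quartile cut points [Q1, Q2, Q3].
# The default "exclusive" method is assumed here.
q1, q2, q3 = statistics.quantiles(scores, n=4)

# The range is the largest score minus the smallest score.
data_range = max(scores) - min(scores)

print("Q3:", q3)
print("Range:", data_range)
```

Under that assumption, Q2 coincides with the median of the scores.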
9If the standard deviation of a distribution is 3, the variance is
9.38In the equation of the regression line represented by Y= 1.24 X + 6.9 if X=2 then Y =?
10A score of 50 lies 2 standard deviations above a mean of 30. What is the value of the standard deviation?
12.25If the standard deviation of a distribution is 3.5, the variance is
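The arithmetic behind the regression, z-score, and variance items above can be sketched as follows. The function and variable names are illustrative only and do not come from the course material.

```python
def predict_y(x, slope=1.24, intercept=6.9):
    """Evaluate the regression line Y = slope * X + intercept."""
    return slope * x + intercept


def sd_from_score(score, mean, z):
    """Solve z = (score - mean) / sd for the standard deviation."""
    return (score - mean) / z


def variance_from_sd(sd):
    """The variance is the square of the standard deviation."""
    return sd ** 2


predicted = predict_y(2)               # regression line at X = 2
sd = sd_from_score(50, 30, 2)          # score 50 is 2 SDs above mean 30
variance_a = variance_from_sd(3)       # SD = 3
variance_b = variance_from_sd(3.5)     # SD = 3.5

print(round(predicted, 2), sd, variance_a, variance_b)
```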
18In α = babaa and β = a^6b^5bb, what is the length of the concatenation of the two strings?
48On an examination given to 1000 students, Jef’s score of 80 was higher than the score of 480 students who took the exam. What is the percentile for Jef’s score?
50What is the value of the mean in a normal probability density function?
51stIf there are 101 scores the median is equal to the _____ranked score.
84A survey of 100 consumers said that the price charged for a kilo of rice could be approximated by a normal distribution with a mean of 35 and a standard deviation of 4. How many are less than 39?
95What percent of data will lie within 2 standard deviations of the mean?
95What is the value of the mean if a score of 110 is 3 standard deviations above the mean?
95A survey of 100 consumers said that the price charged for a kilo of rice could be approximated by a normal distribution with a mean of 35 and a standard deviation of 4. How many of them lie between 27 and 43?
95Empirical rule for a normal distribution that is 2 standard deviations above and below the mean is ________% of data.
99.7Empirical rule for a normal distribution that is 3 standard deviations above and below the mean covers ______% of the data.
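The normal-distribution answers above can be reproduced with a minimal sketch built on `math.erf`. The helper name `normal_cdf` is an assumption made for illustration; the empirical-rule figures (95% and 99.7%) are rounded versions of the computed areas.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative probability P(X <= x) for a normal distribution."""
    z = (x - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Area of the standard normal curve to the right of z = 0.82.
right_of_082 = 1.0 - normal_cdf(0.82)

# Rice price survey: mean 35, standard deviation 4, 100 consumers.
below_39 = 100 * normal_cdf(39, mean=35, sd=4)
between_27_43 = 100 * (normal_cdf(43, 35, 4) - normal_cdf(27, 35, 4))

# Share of data within 3 standard deviations of the mean.
within_3sd = normal_cdf(3) - normal_cdf(-3)

print(round(right_of_082, 3))
print(round(below_39), round(between_27_43))
print(round(100 * within_3sd, 1))
```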
Area Under the CurveAUC means___________.
{ (3,4), (3,5), (2,4), (2,5) }If A={2,3} and B={4,5}, which of the following is a Cartesian product of the two sets?
{3,5,6,10,12}If R={ (3,3), (3,6), (5,5), (5,10), (6,12) } is a binary relation, the range of R is
{3,5,6}If R={ (3,3), (3,6), (5,5), (5,10), (6,12) } is a binary relation, the domain of R is
{A,C,I,S,T}If A= { x/x is a distinct letter in the word "MATHEMATICS"} AND B={x/x is a distinct letter in the word "STATISTICS"} then their intersection is
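The set answers above can be checked with Python's built-in `set` type. This is a sketch only; it takes the last pair of the relation R to be (6, 12), and the variable names are illustrative.

```python
from itertools import product

A = {2, 3}
B = {4, 5}

# Cartesian product A x B: every ordered pair (a, b) with a in A, b in B.
cartesian = set(product(A, B))

# Binary relation R as a set of ordered pairs.
R = {(3, 3), (3, 6), (5, 5), (5, 10), (6, 12)}
domain = {x for x, _ in R}   # first components
range_ = {y for _, y in R}   # second components

# Distinct letters of each word, then their intersection.
letters_math = set("MATHEMATICS")
letters_stat = set("STATISTICS")
common = letters_math & letters_stat

print(sorted(cartesian))
print(sorted(domain), sorted(range_))
print(sorted(common))
```

Because `common` is non-empty the two letter sets are joint, while `A.isdisjoint(B)` is true for A={2,3} and B={4,5}.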
2x3The product of a 2x5 and 5x3 matrices is a ______matrix
5 exabytesHow many bytes of data are generated every two days in today's world?
52ndIf there are 103 scores the median is equal to the _____ranked score.
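For an odd number of scores, the median sits at rank (n + 1) / 2 once the scores are sorted. A minimal sketch, with a hypothetical helper name:

```python
def median_rank(n):
    """Return the 1-based rank of the median among n sorted scores (n odd)."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("this sketch only handles a positive odd count")
    return (n + 1) // 2


print(median_rank(103))
print(median_rank(101))
```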
5x8What is the size of the product of a 5x6 and a 6x8 matrix?
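The size rule used in the matrix product items can be sketched as a small function: an m x n matrix times an n x p matrix gives an m x p matrix, and the product is undefined when the inner dimensions differ. The function name is illustrative.

```python
def product_shape(left, right):
    """Return the (rows, cols) shape of the product of two matrices.

    left and right are (rows, cols) tuples.
    """
    rows_left, cols_left = left
    rows_right, cols_right = right
    if cols_left != rows_right:
        raise ValueError("inner dimensions must match")
    return (rows_left, cols_right)


print(product_shape((2, 5), (5, 3)))
print(product_shape((5, 6), (6, 8)))
```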
AWhich of the matrices is singular?
A + B = B + AWhich of the following is TRUE?
a and bWhat conditions must be satisfied in the development of a  probability function for a discrete random variable?
analysis of algorithmsIt is a process of finding the computational complexity of algorithms.
Text miningAnother term for text analytics.
ANOVAThe following are artifacts used in data analysis EXCEPT:
as x increases y also increases and vice versaPositive correlation means that_______________.
Big beta notationThe following are large inputs EXCEPT
billionExabyte means ________bytes
bivariateData involving two variables.
bivariateData involving two variables are called _________data.
Business IntelligenceIt transforms data into actionable intelligence for business purposes.
business intelligenceIt is used in an organization’s strategic and tactical business decision making.
Business IntelligenceIt offers a  way to examine trends from collected data and derive insights from it.
casualtyThe following are abstract notions EXCEPT
Chi-squareWhich of the following is a continuous distribution?
classificationWhich of the following data mining techniques is predictive?
cluster analysisIt includes identifying groups of data records.
Cluster analysis_____________ includes identifying groups of data records.
collectingThe following processes are used in data analysis EXCEPT:
collecting dataWhich of the following is NOT a goal in data mining?
CollectionThe following are data mining techniques EXCEPT:
computational complexity theoryAnalysis of algorithms is an important part of a broader_____________.
confusion matrixThe classification table that XL Stat can display.
CorrelationIt refers to the degree of relationship between two variables.
DataIt refers to well based theories and sound
data analysisIt has the goal of discovering useful information to support decision making.
data analysisThe process of inspecting, cleansing, transforming and modelling data with the goal of discovering useful information.
databaseWhich is NOT interaction data?
Data MiningIt is a method for discovering patterns in large data sets.
Data miningThe goal is to transform raw data into understandable business information.
Data miningIt is used to discover patterns in large data sets
data visualization___________ uses artifacts to present data visually.
data visualizationIt makes complex data more understandable and usable.
data visualizationRefers to using tools of statistics to present data visually.
dataficationThe creation of data from varied sources and its qualification into information.
datalogyEarlier name for data science.
disjointThe two sets A={2,3} and B={4,5} are said to be
Donald KnuthHe coined the term “analysis of algorithms”.
Eric SchmidtHe pointed out that until 2003, all of mankind had generated just 5 exabytes of data.
ExpectedThe _______value is the weighted average of the values the random variable may assume.
FirthHe proposed the use of a penalized likelihood function.
frameIt views the world in terms of prototypical objects.
Georg CantorHe said that “In mathematics the art of proposing a question must be held of higher value than solving it”.
George E. P. Box“All models are wrong but some are useful.”
Flu trendsIt shows a high correlation between the incidence of flu and searches about flu on
Google MapsWhat is a great example of a data product?
graphWhich is NOT a basic representation technology?
Grouped frequency distributionA distribution where large distribution are displayed.
have the same sizeAddition and subtraction of matrices is possible only if two or more matrices _______.
hiddenThe constant multiplicative factors by which algorithms are related are _______ constants.
Higher than the meanA positive z-score means that the score is
HypergeometricWhich of the following is a discrete distribution?
hypergeometricWhich of the  following  does NOT use continuous distribution?
i and iiWhich pair belongs to the same family of models called GLM? i) logistic regression ii) linear regression iii) multinomial regression iv) probability
imperfectAll representations are ________.
inferenceAny way to get new expressions from old ones.
Intelligent ReasoningIt is a variety of formal calculation, typically deduction.
interactionThe explosion of _______data is the main reason why every 2 days 5 exabytes of data are generated.
interactionA new phenomenon for the explosion of _________data
Internet of thingsIOT means
INTERNISTIt sees a set of prototypes in particular prototypical diseases to be matched against the case at hand.
invertibleMatrix B is
it adheres to the functionWhich is NOT a component of KR?
JavaWhat programming language is used in RapidMiner?
jointThe sets A= { x/x is a distinct letter in the word "MATHEMATICS"} and B={x/x is a distinct letter in the word "STATISTICS"} are said to be
KnimeIt is a powerful tool that shows the network of data.
KnimePrimarily used for data pre-processing.
KnimeIt is  popular among financial data analysts.
Knowledge RepresentationIt is used to enable an entity to determine consequences by thinking rather than acting.
Knowledge RepresentationKR means __________________________.
likelihoodTo estimate the parameters of the model, the ________function is maximized.
logicIt involves a commitment in viewing the world in terms of individual entities and relations.
logisticWhich of the following belongs to the GLM?
logistic regressionA frequently used method as it enables binary variables, sums of binary variables, or polytomous variables to be modelled.
loopWhich of the following is NOT a module in RapidMiner?
MeanThe score easily affected by extreme values is the _________.
MedianThe score NOT easily affected by extreme values.
Medium for pragmatically diligent interpretationThe following are distinct roles that KR plays EXCEPT
Medium of human expressionIt is a language in which we say things about the world.
ModeThe number that occurs most frequently is called________.
multimodalA distribution with 4 modes is said to be a _________distribution.
multinomial logit modelIt corresponds to the case where the dependent variable has more than 2 categories.
network topologyA network purporting to describe family memberships.
normalA bell-shaped distribution that is symmetric about a vertical line.
NormalThe most widely used continuous probability distribution.
normal distributionA  bell shaped curve that is symmetric about a vertical  line.
nullAnother term for an empty set.
null setThe intersection of the two sets A={ 2,3} B={4,5} is a
OneThe integral of all the values of a random variable in a probability density function is equal to______.
ontologicalKR is a set of __________commitments.
OrangeIt is a perfect software which is written in the Python programming language.
OrangeIt is a perfect software for machine learning.
Pearson rWhich of the following is used as a method for Correlation?
Predictive Analytics WorldPAW means____________.
Probability densityWhich function provides the value of a function at any particular value of x but does NOT directly give the probability of the random variable?
PROBITThe most common functions used to link probability to the explanatory variables are the LOGIT model and ________model.
profile likelihoodIt does NOT require the assumption that the parameters are normally distributed.
profile likelihoodThe method that does not require the assumption that parameters are normally distributed.
PythonWhat programming language does Orange use?
PythonWhich of the following is NOT a data mining tool?
Q2=medianWhich of the following statements is TRUE?
R-programmingIt is a free software programming language.
R-programmingWhich is primarily written in C and in Fortran?
Rapid miner_____________ is rated as the number one business analytics software.
reasoningIt is a process that goes on internally, while most things it wishes to reason about exist only externally.
RegressionWhich of the following pertains to a predictive data mining technique?
regressionWhich of the following is a predictive data mining technique?
RegressionThe equation of the _______line predicts the value of Y given X.
ROCIt displays the performance of a model and enables a comparison to be made with other models.
rolesWhich is NOT a KR technology?
rule basedIt views the world in terms of attribute-object-value triples.
run time analysisIt is a theoretical classification that estimates and anticipates the increase in running time for algorithms.
sensitivityThe proportion of well-classified positive events.
sequenceA special type of function where the domain is a  set of consecutive integers.
SociologyThe following provided inspirations of what constitutes intelligent reasoning EXCEPT
space complexityIt relates the length of an algorithm’s input to the number of storage locations it uses.
speakingThese are the data skills that a good data scientist needs to cultivate EXCEPT
Spearman rhoThe method of correlation used for ranked score is ________.
SPSSThe following are software used in data mining EXCEPT
squareA matrix that has the same number of rows and columns is called
StandardThe normal distribution with a mean of 0 and standard deviation of 1.
Statistics AnalyticsWhich of the following is NOT a method used in data analysis?
studioIt is a module in RapidMiner that considers the workflow.
studioIt is used for prototyping in RapidMiner.
surrogateKR as a _________is a substitute for the thing itself.
Text miningIt expands available data enormously since there is so much more text being generated than numbers.
Text analyticsIt extracts meaningful numerical indices from information and makes it available to statistical and
The correct answers are: Mean, Median, ModeWhich of the following is TRUE when a distribution is normal?
there is no mode.If in a distribution all scores are distinct then_____________.
time complexityIt relates the length of an algorithm’s input to the number of steps it takes.
Turing machineAn example of an abstract computer.
unstructuredWhich of the following types of text is processed in text analytics?
unstructuredWhat type of text is processed in text analytics?
veracityThe following are the 3V's of big data EXCEPT
WEKAIt is a collection of machine learning algorithms for data mining tasks.
William GibsonThe person who said that “The future is not google-able”.
worst caseThe function describing the performance of an algorithm is usually an upper bound determined from ______inputs.
x increases y decreasesA negative correlation exists when___________.
Zynga IncorporatedThe developer of FarmVille, a famous game on the internet.
λThe symbol used to indicate strings with no elements.
λNull strings are indicated by