## Statistical Analysis with Software Application

### This course reviews and expands upon core topics in probability and statistics through the study and practice of data analysis.

0.206 | The area of the standard normal curve to the right of z=0.82 is _______. | |
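The right-tail area above can be checked numerically. The sketch below is only an illustration: the helper name `right_tail_area` is hypothetical, and it relies on the standard identity P(Z > z) = 0.5 · erfc(z / √2) for a standard normal variable.

```python
import math


def right_tail_area(z: float) -> float:
    """Area under the standard normal curve to the right of z."""
    # P(Z > z) = 0.5 * erfc(z / sqrt(2)) for a standard normal Z.
    return 0.5 * math.erfc(z / math.sqrt(2))


area = right_tail_area(0.82)
print(round(area, 3))
```

Rounded to three decimals, the result agrees with the tabulated value of 0.206.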

1 | A perfect positive correlation coefficient is equal to | |

1 | What is the value of the standard deviation in a standard normal distribution? | |

1.02 | Which is NOT a value of r ? | |

7 | There are how many data mining techniques? | |

7 | What is value of quartile 3 in 2,4,4,4,5,5,6,8,9 ? | |

7 | In 2,4,4,4,5,5,6,8,9 the range is | |
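Both answers for this data set can be reproduced with the Python standard library. This is a sketch under one assumption: quartile conventions differ between textbooks and software, and the code uses the default "exclusive" method of `statistics.quantiles`, which matches the answer key for this data.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 6, 8, 9]

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

# The range is the largest value minus the smallest value.
data_range = max(data) - min(data)

print(q3, data_range)
```

Other quartile methods (for example `method="inclusive"`) can give a different Q3 for the same data, so the convention used in class should be confirmed.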

9 | If the standard deviation of a distribution is 3, the variance is | |

9.38 | In the equation of the regression line represented by Y= 1.24 X + 6.9 if X=2 then Y =? | |
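The prediction is a direct substitution into the regression line. A minimal sketch, where the function name `predict` is hypothetical and the slope and intercept are the ones given in the question:

```python
def predict(x: float, slope: float = 1.24, intercept: float = 6.9) -> float:
    """Evaluate the regression line Y = slope * X + intercept."""
    return slope * x + intercept


y_hat = predict(2)
print(round(y_hat, 2))
```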

10 | A score of 50 lies 2 standard deviations above a mean of 30. What is the value of the standard deviation? | |

12.25 | If the standard deviation of a distribution is 3.5, the variance is | |

18 | If α = babaa and β = a^6b^5bb, what is the length of the concatenation of the two strings? | |
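The length of a concatenation is the sum of the two lengths. The sketch below expands the exponent notation, reading a^6 as six copies of `a` and b^5 as five copies of `b`; the variable names are illustrative.

```python
alpha = "babaa"                      # length 5
beta = "a" * 6 + "b" * 5 + "bb"      # a^6 b^5 bb, length 6 + 5 + 2 = 13

concatenation = alpha + beta
length = len(concatenation)
print(length)
```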

48 | On an examination given to 1000 students, Jef’s score of 80 was higher than the score of 480 students who took the exam. What is the percentile for Jef’s score? | |

50 | What is the value of the mean in a normal probability density function? | |

51 | If there are 101 scores the median is equal to the _____ranked score. | |
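The percentile and median-rank answers follow from two short formulas. This sketch assumes the simple definition of percentile rank (the percent of scores strictly below the given score), and both function names are hypothetical.

```python
def percentile_rank(scores_below: int, total: int) -> float:
    """Percent of all scores that fall below the given score."""
    return 100 * scores_below / total


def median_rank(n: int) -> int:
    """1-based rank of the median when the number of scores n is odd."""
    if n % 2 == 0:
        raise ValueError("an even count has no single middle rank")
    return (n + 1) // 2


jef_percentile = percentile_rank(480, 1000)
rank_101 = median_rank(101)
rank_103 = median_rank(103)
print(jef_percentile, rank_101, rank_103)
```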

84 | A survey of 100 consumers said that the price charged for a kilo of rice could be approximated by a normal distribution with a mean of 35 and a standard deviation of 4. How many are less than 39? | |

95 | What percent of data will lie within 2 standard deviation of the mean? | |

95 | What is the value of the mean if a score of 110 is 3 standard deviations above the mean? | |

95 | A survey of 100 consumers said that the price charged for a kilo of rice could be approximated by a normal distribution with a mean of 35 and a standard deviation of 4. How many of them lie between 27 and 43? | |

95 | Empirical rule for a normal distribution that is 2 standard deviations above and below the mean is ________% of data. | |

99.7 | Empirical rule for a normal distribution that is 3 standard deviations above and below the mean covers ______% of the data. | |
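The empirical-rule percentages and the rice-price answers can be checked with `statistics.NormalDist` from the standard library. This is a sketch rather than a required method; the helper `share_within` and the variable names are illustrative.

```python
from statistics import NormalDist

standard = NormalDist()  # mean 0, standard deviation 1


def share_within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard.cdf(k) - standard.cdf(-k)


# Percent of data within 1, 2 and 3 standard deviations of the mean.
within = {k: round(100 * share_within(k), 1) for k in (1, 2, 3)}

# Rice-price survey: mean 35, standard deviation 4, 100 consumers.
rice = NormalDist(mu=35, sigma=4)
consumers = 100
below_39 = round(consumers * rice.cdf(39))
between_27_and_43 = round(consumers * (rice.cdf(43) - rice.cdf(27)))

print(within, below_39, between_27_and_43)
```

The exact share within 2 standard deviations is slightly above 95%; the empirical rule rounds it to 95.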

Area Under the Curve | AUC means___________. | |

{(2,4), (2,5), (3,4), (3,5)} | If A = {2,3} and B = {4,5}, which of the following is the Cartesian product of the two sets? | |

{3,5,6,10,12} | If R = {(3,3), (3,6), (5,5), (5,10), (6,12)} is a binary relation, its range is | |

{3,5,6} | If R = {(3,3), (3,6), (5,5), (5,10), (6,12)} is a binary relation, its domain is | |

{A,C,I,S,T} | If A = {x/x is a distinct letter in the word "MATHEMATICS"} and B = {x/x is a distinct letter in the word "STATISTICS"}, then their intersection is | |
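The set answers above can be verified with Python's built-in `set` type and `itertools.product`. The variable names are illustrative only.

```python
from itertools import product

# Cartesian product of A = {2, 3} and B = {4, 5}.
A = {2, 3}
B = {4, 5}
cartesian = set(product(A, B))

# Domain and range of the binary relation R.
R = {(3, 3), (3, 6), (5, 5), (5, 10), (6, 12)}
domain = {x for x, _ in R}
range_ = {y for _, y in R}

# Intersection of the distinct letters of two words.
letters_math = set("MATHEMATICS")
letters_stat = set("STATISTICS")
common = letters_math & letters_stat

print(sorted(cartesian), sorted(domain), sorted(range_), sorted(common))
```

Because `A & B` is empty, the two numeric sets are disjoint, while the two letter sets share five elements.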

2x3 | The product of a 2x5 matrix and a 5x3 matrix is a ______ matrix. | |

48th | On an examination given to 1000 students, Jef’s score of 80 was higher than the score of 480 students who took the exam. What is the percentile for Jef’s score? | |

5 exabytes | How many bytes of data are generated every two days in today's world? | |

52nd | If there are 103 scores the median is equal to the _____ranked score. | |

5x8 | What is the size of the product of a 5x6 matrix and a 6x8 matrix? | |
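The size of a matrix product follows one rule: an m×n matrix times an n×p matrix gives an m×p matrix, and the product is undefined when the inner dimensions differ. A minimal sketch with a hypothetical helper `product_shape`:

```python
def product_shape(left: tuple, right: tuple) -> tuple:
    """Shape of the product of a (rows, cols) matrix with another (rows, cols) matrix."""
    left_rows, left_cols = left
    right_rows, right_cols = right
    if left_cols != right_rows:
        raise ValueError("inner dimensions must match")
    return (left_rows, right_cols)


shape_a = product_shape((5, 6), (6, 8))
shape_b = product_shape((2, 5), (5, 3))
print(shape_a, shape_b)
```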

A | Which of the matrices is singular? | |

A + B = B+ A | Which of the following is TRUE? | |

a and b | What conditions must be satisfied in the development of a probability function for a discrete random variable? | |

analysis of algorithms | It is a process of finding the computational complexity of algorithms. | |

Another term for text analytics. | Another term for text analytics. | |

ANOVA | The following are artifacts used in data analysis EXCEPT: | |

as x increases y also increases and vice versa | Positive correlation means that_______________. | |

Big beta notation | The following are large inputs EXCEPT | |

billion billion | Exabyte means ________bytes | |

bivariate | Data involving two variables. | |

bivariate | Data involving two variables are called _________data. | |

Business Intelligence | It transforms data into actionable intelligence for business purposes. | |

business intelligence | It is used in an organization’s strategic and tactical business decision making. | |

Business Intelligence | It offers a way to examine trends from collected data and derive insights from it. | |

casualty | The following are abstract notions EXCEPT | |

Chi-square | Which of the following is a continuous distribution? | |

classification | Which of the following data mining techniques is predictive? | |

cluster analysis | It includes identifying groups of data records. | |

Cluster analysis | _____________ includes identifying groups of data records. | |

collecting | The following processes are used in data analysis EXCEPT: | |

collecting data | Which of the following is NOT a goal in data mining? | |

Collection | The following are data mining techniques EXCEPT: | |

computational complexity theory | Analysis of algorithms is an important part of the broader _____________. | |

confusion matrix | The classification table that XLSTAT can display. | |

Correlation | It refers to the degree of relationship between two variables? | |

Data Science | It refers to well based theories and sound business judgement. | |

data analysis | It has the goal of discovering useful information to support decision making. | |

data analysis | The process of inspecting, cleansing, transforming, and modelling data with the goal of discovering useful information. | |

database | Which is NOT interaction data? | |

Data Mining | It is a method for discovering patterns in large data sets. | |

Data mining | The goal is to transform raw data into understandable business information. | |

Data mining | It is used to discover patterns in large data sets | |

data visualization | ___________ uses artifacts to present data visually. | |

data visualization | It makes complex data more understandable and usable. | |

data visualization | Refers to using tools of statistics to present data visually. | |

Datafication | The creation of data from varied sources and its quantification into information. | |

datalogy | Earlier name for data science. | |

disjoint | The two sets If A={ 2,3} B={4,5} are said to be | |

Donald Knuth | He coined the term “analysis of algorithms”. | |

Eric Schmidt | He pointed out that until 2003, all of mankind had generated just 5 exabytes of data. | |

Expected | The _______ value is the weighted average of the values the random variable may assume. | |
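The weighted average can be written as E[X] = Σ x · p(x). The sketch below uses a fair six-sided die as an assumed example (it is not from the course material) and first checks the two conditions for a valid discrete probability function: every probability is non-negative and the probabilities sum to 1.

```python
from fractions import Fraction


def expected_value(pmf: dict) -> Fraction:
    """Weighted average of the values, weighted by their probabilities."""
    if any(p < 0 for p in pmf.values()):
        raise ValueError("probabilities must be non-negative")
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in pmf.items())


# Hypothetical example: a fair die, each face with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
mean_roll = expected_value(die)
print(mean_roll)
```

Exact fractions are used so that the sum-to-one check is not affected by floating-point rounding.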

Firth | He proposed the use of a penalized likelihood function. | |

frame | It views the world in thinking of prototypical objects. | |

Georg Cantor | He said that “In mathematics the art of proposing a question must be held of higher value than solving it.” | |

George E. P. Box | “All models are wrong but some are useful.” | |

Google Flu Trends | It shows a high correlation between the incidence of flu and searches about flu on Google. | |

Google Maps | What is a great example of a data product? | |

graph | Which is NOT a basic representation technology? | |

Grouped frequency distribution | A distribution in which large data sets are displayed. | |

Have the same size. | Addition and subtraction of matrices is possible only if the two or more matrices ______. | |

hidden | The constant multiplicative factors by which implementations of an algorithm are related are called _______ constants. | |

Higher than the mean | A positive z-score means that the score is | |

Hypergeometric | Which of the following is a discrete distribution? | |

hypergeometric | Which of the following does NOT use continuous distribution? | |

i and ii | Which pair belongs to the same family of models called GLM? i) logistic ii) linear regression iii) multinomial regression iv) probability | |

imperfect | All representations are ________. | |

inference | Any way to get new expressions from old ones. | |

Intelligent Reasoning | It is a variety of formal calculation, typically deduction. | |

interaction | The explosion of _______data is the main reason why every 2 days 5 exabytes of data are generated. | |

interaction | A new phenomenon for the explosion of _________data | |

Internet of things | IOT means | |

INTERNIST | It sees a set of prototypes in particular prototypical diseases to be matched against the case at hand. | |

invertible | Matrix B is | |

it adheres to the function | Which is NOT a component of KR? | |

Java | What programming language is used in RapidMiner? | |

joint | The sets A = {x/x is a distinct letter in the word "MATHEMATICS"} and B = {x/x is a distinct letter in the word "STATISTICS"} are | |

KNIME | It is a powerful tool that shows the network of data. | |

KNIME | Primarily used for data pre-processing. | |

KNIME | It is popular among financial data analysts. | |

Knowledge Representation | It is used to enable an entity to determine consequences by thinking rather than acting. | |

likelihood | To estimate the parameters of the model, the ________ function is maximized. | |

logic | It involves a commitment in viewing the world in terms of individual entities and relations. | |

logistic | Which of the following belong to the GLM? | |

logistic regression | A frequently used method, as it enables binary variables, sums of binary variables, or polytomous variables to be modelled. | |

loop | Which of the following is NOT a module in RapidMiner? | |

Mean | The score easily affected by extreme values is the _________. | |

Median | The score NOT easily affected by extreme values. | |

Medium for pragmatically diligent interpretation | The following are distinct roles that KR plays EXCEPT | |

Medium of human expression | It is a language that we say things about the world. | |

Mode | The number that occurs most frequently is called________. | |

multimodal | A distribution with 4 modes is said to be a _________distribution. | |

multinomial logit model | It corresponds to the case where the dependent variable has more than 2 categories. | |

network topology | A network purporting to describe family memberships. | |

normal | A bell-shaped distribution that is symmetric about a vertical line? | |

Normal | The most widely used continuous probability distribution. | |

normal distribution | A bell shaped curve that is symmetric about a vertical line. | |

null | Another term for an empty set. | |

null set | The intersection of the two sets A={ 2,3} B={4,5} is a | |

One | The integral of all the values of a random variable in a probability density function is equal to______. | |

ontological | KR is a set of __________commitments. | |

Orange | It is a perfect software suite written in the Python programming language. | |

Orange | It is a perfect software suite for machine learning. | |

Pearson r | Which of the following is used as a method for Correlation? | |
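Pearson's r can be computed directly from its definition: the sum of the products of the deviations from the means, divided by the square root of the product of the two sums of squared deviations. The sketch below uses made-up data to show the two extreme cases, r = 1 (y rises with x) and r = −1 (y falls as x rises); the function and variable names are illustrative.

```python
import math


def pearson_r(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical data, chosen only to illustrate the sign of r.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

r_up = pearson_r(x, rising)
r_down = pearson_r(x, falling)
print(r_up, r_down)
```

Since r always lies between −1 and 1, a value such as 1.02 cannot be a correlation coefficient.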

Predictive Analytics World | PAW means____________. | |

Probability density | Which function provides the value of a function at any particular value of x but does NOT directly give the probability of the random variable? | |

PROBIT | The most common functions used to link probability to the explanatory variables are the LOGIT model and ________model. | |

profile likelihood | It does NOT require the assumption that the parameters are normally distributed. | |

profile likelihood | The method that does not require the assumption that parameters are normally distributed. | |

Python | What programming language does Orange use? | |

Python | Which of the following is NOT a data mining tool? | |

Q2=median | Which of the following statements is TRUE? | |

R-programming | It is a free software programming language. | |

RapidMiner | _____________ is rated as the number one business analytics software. | |

reasoning | It is a process that goes on internally, while most things it wishes to reason about exist only externally. | |

Regression | Which of the following pertains to predictive data mining technique? | |

regression | Which of the following is a predictive data mining technique? | |

Regression | The equation of the _______line predicts the value of Y given X. | |

ROC | It displays the performance of a model and enables a comparison to be made with other models. | |

roles | Which is NOT a KR technology? | |

rule based | It views the world in terms of attribute-object-value triples. | |

run time analysis | It is a theoretical classification that estimates and anticipates the increase in running time for algorithms. | |

sensitivity | The proportion of well-classified positive events. | |

sensitivity | The proportion of well-classified positive events is called _________________. | |
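Sensitivity is read off a confusion matrix as TP / (TP + FN): the share of actual positive events that the model classified as positive. The sketch below uses hypothetical counts and also shows specificity, the matching proportion for negative events, TN / (TN + FP).

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of actual positive events classified as positive."""
    if true_pos + false_neg == 0:
        raise ValueError("no positive events to evaluate")
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of actual negative events classified as negative."""
    if true_neg + false_pos == 0:
        raise ValueError("no negative events to evaluate")
    return true_neg / (true_neg + false_pos)


# Hypothetical confusion-matrix counts, for illustration only.
tp, fn, tn, fp = 40, 10, 45, 5

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(sens, spec)
```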

sequence | A special type of function where the domain is a set of consecutive integers. | |

Sociology | The following provided inspirations of what constitutes intelligent reasoning EXCEPT | |

space complexity | It relates the length of an algorithm’s input to the number of storage locations it uses. | |

speaking | These are the data skills that a good data scientist needs to cultivate EXCEPT | |

Spearman rho | The method of correlation used for ranked score is ________. | |

SPSS | The following are softwares used in data mining EXCEPT | |

square | A matrix that has the same number of rows and columns is called | |

Standard | The normal distribution with a mean of 0 and standard deviation of 1. | |

Statistics Analytics | Which of the following is NOT a method used in data analysis? | |

studio | It is a module in RapidMiner that considers the workflow. | |

studio | It is used for prototyping in RapidMiner. | |

surrogate | KR as a _________is a substitute for the thing itself. | |

Text mining | It expands available data enormously since there is so much more text being generated than numbers. | |

Text analytics | It extracts meaningful numerical indices from information and make it available to statistical and | |

Text Analytics | What is the process of deriving useful information from text? | |

Mean, Median, Mode | Which of the following is TRUE when a distribution is normal? | |

there is no mode. | If in a distribution all scores are distinct then_____________. | |

time complexity | It relates the length of an algorithm’s input to the number of steps it takes. | |

Turing machine | An example of an abstract computer. | |

unstructured | Which of the following type of text is processed in text analytics? | |

unstructured | What type of text is processed in text analytics? | |

veracity | The following are the 3V's of big data EXCEPT | |

WEKA | It is a collection of machine learning algorithms for data mining tasks. | |

William Gibson | The person who said that “The future is not google-able.” | |

worst case | The function describing the performance of an algorithm is usually an upper bound determined from ______inputs. | |

x increases y decreases | A negative correlation exists when___________. | |

Zynga Incorporated | The developer of FarmVille, a famous game on the internet. | |

λ | The symbol used to indicate strings with no elements. | |

λ | Null strings are indicated by |