### MATH6200 Data Analysis

This course reviews and expands upon core topics in probability and statistics through the study and practice of data analysis.

Data Analysis is the process of inspecting, cleansing, transforming and modelling data with the goal of discovering useful information, informing conclusions and supporting decision-making. In practice, this means evaluating data with analytical and statistical tools to aid business decision-making.
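The four stages named in this definition (inspect, cleanse, transform, model) can be illustrated with a minimal sketch. The records, field names and helper functions below are hypothetical and chosen only for illustration; a real analysis would use a much richer model than a mean and standard deviation.

```python
from statistics import mean, stdev

# Hypothetical raw records: (customer id, monthly spend as text).
raw_records = [
    ("c1", "120.50"),
    ("c2", ""),        # missing value
    ("c3", "98.00"),
    ("c4", "n/a"),     # not a number
    ("c5", "143.25"),
]


def is_number(text):
    """Return True if the text can be parsed as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def inspect(records):
    """Inspect: count total rows and rows with a usable spend value."""
    usable = sum(1 for _, spend in records if is_number(spend))
    return {"rows": len(records), "usable": usable}


def cleanse(records):
    """Cleanse: drop rows whose spend cannot be parsed."""
    return [(cid, spend) for cid, spend in records if is_number(spend)]


def transform(records):
    """Transform: convert the text field into a float."""
    return [(cid, float(spend)) for cid, spend in records]


def model(records):
    """Model: summarise the data with a mean and sample standard deviation."""
    values = [spend for _, spend in records]
    return {"mean": mean(values), "stdev": stdev(values)}


summary = inspect(raw_records)
clean = transform(cleanse(raw_records))
stats = model(clean)
print(summary)
print(stats)
```

Keeping each stage in its own function makes it easy to see which rows were discarded and why before any statistics are computed.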

Methods used:

1. Data Mining

2. Text Analytics

3. Business Intelligence

4. Data Visualization

Data Mining is a method of data analysis for discovering patterns in large data sets using methods from statistics, artificial intelligence, machine learning and databases. The goal is to transform raw data into understandable business information. Typical tasks include identifying groups of data records (known as cluster analysis) and identifying anomalies and dependencies between data groups.
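As an illustration of cluster analysis, here is a minimal sketch of k-means on one-dimensional data. The purchase amounts and starting centroids are made up for the example, and k-means is only one of many clustering methods; these notes do not prescribe a particular algorithm.

```python
def assign(values, centroids):
    """Assignment step: attach each value to its nearest centroid."""
    clusters = [[] for _ in centroids]
    for v in values:
        nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
        clusters[nearest].append(v)
    return clusters


def k_means_1d(values, centroids, max_iterations=100):
    """Repeat assignment and update steps until the centroids stop moving."""
    centroids = list(centroids)
    for _ in range(max_iterations):
        clusters = assign(values, centroids)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(values, centroids)


# Hypothetical purchase amounts with two visible groups.
amounts = [10, 12, 11, 50, 52, 49, 51]
centroids, clusters = k_means_1d(amounts, centroids=[0, 60])
print(centroids)
print(clusters)
```

The result depends on the starting centroids, which is why practical implementations usually try several random initialisations.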

Text Analytics is the process of deriving useful information from text. It is accomplished by processing unstructured textual information, extracting meaningful numerical indices from it, and making that information available to statistical and machine learning algorithms for further processing.
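One common way to turn unstructured text into numerical indices is a term-count ("bag of words") representation. The sketch below assumes a very simple tokenizer (lowercase alphabetic runs) and uses made-up review sentences; it is an illustration, not the only way to build such indices.

```python
import re
from collections import Counter

# Hypothetical customer reviews.
documents = [
    "The delivery was late and the box was damaged.",
    "Great product, fast delivery!",
    "Product arrived damaged; support was helpful.",
]


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(docs):
    """Collect every distinct token across all documents, in sorted order."""
    return sorted({token for doc in docs for token in tokenize(doc)})


def term_count_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


vocabulary = build_vocabulary(documents)
matrix = [term_count_vector(doc, vocabulary) for doc in documents]

print(vocabulary)
for row in matrix:
    print(row)
```

Each document becomes a fixed-length row of numbers, which is the form that statistical and machine learning algorithms expect as input.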

Business Intelligence transforms data into actionable intelligence for business purposes and may be used in an organization's strategic and tactical business decision-making. It offers a way for people to examine trends in collected data and derive insights from them.

Data Visualization refers, very simply, to the visual representation of data. In the context of data analysis, it means using the tools of statistics and probability, pivot tables and other artifacts to present data visually. It makes complex data more understandable and usable.
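To make the idea of a pivot table concrete, the following sketch sums hypothetical sales figures by region and quarter and then draws a plain-text bar chart of the regional totals. The data and function names are assumptions for illustration; in practice a plotting library or spreadsheet would normally be used.

```python
from collections import defaultdict

# Hypothetical sales records: (region, quarter, revenue in thousands).
sales = [
    ("North", "Q1", 12), ("North", "Q2", 15),
    ("South", "Q1", 7), ("South", "Q2", 9),
    ("North", "Q1", 3),
]


def pivot(records):
    """Sum revenue per (region, quarter), like a simple pivot table."""
    table = defaultdict(lambda: defaultdict(int))
    for region, quarter, revenue in records:
        table[region][quarter] += revenue
    return {region: dict(columns) for region, columns in table.items()}


def bar_chart(totals, symbol="#"):
    """Render one text bar per category, one symbol per unit of value."""
    width = max(len(label) for label in totals)
    return "\n".join(
        f"{label.ljust(width)} | {symbol * value} {value}"
        for label, value in totals.items()
    )


table = pivot(sales)
region_totals = {region: sum(cols.values()) for region, cols in table.items()}
chart = bar_chart(region_totals)

print(table)
print(chart)
```

Even this crude chart makes the difference between the two regions visible at a glance, which is the point of presenting data visually.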

Data Mining

The 7 most important data mining techniques

1. Tracking patterns

2. Classification (predictive)

3. Association (descriptive)

4. Outlier detection

5. Clustering (descriptive)

6. Regression (predictive)

7. Prediction
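As an example of one descriptive technique from this list, association analysis is often summarised with two quantities: the support of an itemset (the fraction of transactions containing it) and the confidence of a rule A → B (the conditional relative frequency of B given A). The sketch below uses made-up shopping baskets; the function names are illustrative and not taken from any particular tool.

```python
from fractions import Fraction

# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in the itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in baskets if itemset <= basket)
    return Fraction(hits, len(baskets))


def confidence(antecedent, consequent, baskets):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    base = support(antecedent, baskets)
    if base == 0:
        raise ValueError("antecedent never occurs, confidence is undefined")
    return support(set(antecedent) | set(consequent), baskets) / base


print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
```

Confidence is simply an estimate of the conditional probability P(B | A), which connects this technique to the probability topics covered in the course.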

Data Mining tools

1. RapidMiner

2. Orange

3. Weka

4. KNIME

5. R-programming

RapidMiner is a widely used predictive analysis system developed by the company of the same name. It is written in Java and provides an integrated environment for deep learning, text mining, machine learning and predictive analysis.

RapidMiner offers its server both on-premises and in public or private cloud infrastructures, and it is built on a client/server model. It has been rated among the leading business analytics tools.

It consists of three modules:

1. RapidMiner Studio: for workflow design and prototyping

2. RapidMiner Server: to operate predictive data models created in Studio

3. RapidMiner Radoop: executes processes directly in a Hadoop cluster to simplify predictive analysis

ORANGE

It is a software suite for machine learning and data mining. It is particularly strong at data visualization and is component-based software. It is written in Python.

Because it is component-based software, the components of Orange are called "widgets". These widgets range from data visualization and pre-processing to the evaluation of algorithms.

WEKA

It is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. The tool is used in many different applications and includes visualization tools as well as algorithms for data analysis and predictive modelling.

KNIME

KNIME is primarily used for data preprocessing, that is, data extraction, transformation and loading. It is a powerful tool with a GUI that shows the network of data nodes, and it is popular among financial data analysts.

R-programming

R is a free programming language and software environment for statistical computing and graphics. It is written primarily in C and Fortran, and many of its modules are written in R itself. It supports nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering and other techniques.