1. Foreword

This article is a brief introduction to some key concepts in data mining and machine learning. It covers the types of machine learning, the machine learning workflow, how data sets are described, and the types of attribute values. I hope it will be useful to anyone who wants to get started with machine learning and data mining.

2. Four types of machine learning

Classification: given a set of already classified examples, learn to classify a new example
Association: find interesting associations between attributes or combinations of attributes
Clustering: group similar examples together
Numerical prediction: predict a numerical value instead of a class
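
To make the difference between classification and numerical prediction concrete, here is a minimal sketch that uses the same nearest-neighbour idea for both. The data, the attribute meanings and the function names are made up for illustration, and nearest neighbour is only one of many possible learning methods.

```python
# Toy data (made up for illustration): each example is (temperature_c, humidity_pct).
examples = [(30.0, 80.0), (28.0, 75.0), (15.0, 40.0), (12.0, 35.0)]
class_labels = ["no", "no", "yes", "yes"]   # classification target: play outside?
numeric_targets = [2.0, 3.0, 9.0, 10.0]     # numerical target: hours played


def nearest_index(query, points):
    """Return the index of the point closest to query (squared Euclidean distance)."""
    def sq_dist(point):
        return sum((a - b) ** 2 for a, b in zip(point, query))
    return min(range(len(points)), key=lambda i: sq_dist(points[i]))


def classify(query):
    """Classification: predict the class label of the nearest known example."""
    return class_labels[nearest_index(query, examples)]


def predict_number(query):
    """Numerical prediction: predict the numeric value of the nearest known example."""
    return numeric_targets[nearest_index(query, examples)]


new_example = (14.0, 38.0)
predicted_class = classify(new_example)        # a label
predicted_hours = predict_number(new_example)  # a number
```

The only difference between the two tasks here is the kind of value being predicted: a label in one case, a number in the other.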

3. Workflow of machine learning

A machine learning project starts by acquiring the data and pre-processing it: the data set is cleaned and any data that is not useful is removed. The processed data is then used to build a model, and the model is tested until the target conditions are met, after which it is deployed. This is the traditional machine learning process. Most workflows nowadays also include a step to optimise the model, for example by tuning hyperparameters. Many articles on the internet do not explain this concisely, so the description here follows the concise machine learning flow diagram demonstrated by my teacher Ekaterina Komendantskaya.
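
The steps of this workflow can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data is synthetic, the model is a simple k-nearest-neighbour classifier chosen for brevity, and the names `acquire_data`, `clean`, `knn_predict` and `TARGET_ACCURACY` are hypothetical.

```python
import random
from collections import Counter

TARGET_ACCURACY = 0.9  # assumed target condition, chosen for illustration


def acquire_data(seed=0):
    """Acquire data: synthetic (value, label) rows plus two rows with a missing value."""
    rng = random.Random(seed)
    rows = []
    for _ in range(200):
        x = rng.uniform(0.0, 10.0)
        rows.append((x, "high" if x > 6.0 else "low"))
    rows.append((None, "low"))
    rows.append((None, "high"))
    return rows


def clean(rows):
    """Pre-process: drop rows whose value is missing."""
    return [row for row in rows if row[0] is not None]


def knn_predict(train_rows, x, k):
    """Model: majority label among the k training rows nearest to x."""
    nearest = sorted(train_rows, key=lambda row: abs(row[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_rows, eval_rows, k):
    """Test: fraction of evaluation rows that the model labels correctly."""
    correct = sum(knn_predict(train_rows, x, k) == label for x, label in eval_rows)
    return correct / len(eval_rows)


rows = clean(acquire_data())
train_rows, valid_rows, test_rows = rows[:120], rows[120:160], rows[160:]

# Optimise: pick the hyperparameter k that does best on the validation rows.
best_k = max([1, 3, 5], key=lambda k: accuracy(train_rows, valid_rows, k))

# Final test on held-out rows, then decide whether the target condition is met.
test_accuracy = accuracy(train_rows, test_rows, best_k)
ready_to_deploy = test_accuracy >= TARGET_ACCURACY
```

Keeping separate validation and test rows means the hyperparameter is not tuned on the same data that is used to judge the final model.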

4. Description of the data

(1). Instance

An instance is a single example, i.e. one row of data
Input to the learning program = a set of instances (the data set)
Represented as a single relation, or flat file
This is a rather restricted form of input
No relationships between instances can be expressed

(2). Attributes

Each instance is described by a fixed, predefined set of attributes
In practice, however, the number of attributes may vary from instance to instance
The existence of one attribute may depend on the value of another

Often, we are interested in predicting the value of one particular attribute, on the assumption that it is determined by the values of the other attributes. The attribute we want to predict is called the class or the target.

Addendum: in general, for a two-dimensional (tabular) data set, each row is an instance and each column is an attribute.
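
As a small illustration of rows as instances and columns as attributes, the sketch below reads a flat file with Python's standard `csv` module and separates the target from the other attributes. The weather-style rows and the choice of `play` as the target are assumptions made for this example.

```python
import csv
import io

# A tiny flat file (made-up rows): the header names the attributes,
# every following line is one instance.
FLAT_FILE = """outlook,temperature,humidity,windy,play
sunny,85,85,false,no
overcast,83,86,false,yes
rainy,70,96,false,yes
rainy,65,70,true,no
"""

reader = csv.DictReader(io.StringIO(FLAT_FILE))
instances = list(reader)        # one dict per row = one instance
attributes = reader.fieldnames  # column names = attributes

TARGET = "play"                 # the class (target) we want to predict

# Split every instance into its input attributes and its target value.
inputs = [{name: value for name, value in row.items() if name != TARGET}
          for row in instances]
targets = [row[TARGET] for row in instances]
```

Note that `csv` reads every value as a string, so numeric attributes such as `temperature` would still need to be converted before use.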

5. Attribute value types

Four common types of attribute values will be described here.

(1). Nominal

The word nominal comes from the Latin word for name
Values are different symbols
Values are only used as labels or names
There is no implied relationship between nominal values
No ordering or distance measure is defined
Only equality tests can be performed
Also known as categorical

For example, colour (red, yellow…) or country (UK, USA…)
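
A minimal sketch of what can sensibly be done with a nominal attribute: testing for equality, counting, and encoding each value as its own 0/1 column. The colour values and the helper `one_hot` are made up for illustration.

```python
from collections import Counter

colours = ["red", "yellow", "red", "blue", "red"]

# Equality is the only meaningful comparison between nominal values.
same_colour = colours[0] == colours[2]

# Counting equal values is fine, so the most frequent value (the mode) is defined.
counts = Counter(colours)
most_common_colour = counts.most_common(1)[0][0]


def one_hot(value, categories):
    """Encode a nominal value as 0/1 flags, one per category, without implying an order."""
    return [1 if value == category else 0 for category in categories]


# Sorted only to get a reproducible column order; the order itself has no meaning.
categories = sorted(set(colours))
encoded_red = one_hot("red", categories)
```

Python will happily evaluate `"red" < "yellow"`, but that is just alphabetical order of the labels and says nothing about the colours themselves.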

(2). Ordinal

The values have a meaningful order
There is no defined distance between the values
Addition and subtraction are not meaningful

For example, temperature can be expressed as hot, mild and cold
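
The sketch below shows one way to work with an ordinal attribute: map each value to its rank, then use the ranks only for comparing and sorting. The three levels cold < mild < hot and the names `LEVELS`, `RANK` and `is_warmer` are assumptions made for this example.

```python
# Assumed ordering of the levels, from lowest to highest.
LEVELS = ["cold", "mild", "hot"]
RANK = {level: position for position, level in enumerate(LEVELS)}


def is_warmer(a, b):
    """Order comparison is meaningful for ordinal values."""
    return RANK[a] > RANK[b]


readings = ["hot", "cold", "mild", "hot", "mild"]

# Sorting and taking the middle value (the median) only rely on order.
ordered = sorted(readings, key=RANK.get)
median_reading = ordered[len(ordered) // 2]

# RANK["hot"] - RANK["mild"] would run, but the result carries no meaning:
# the ranks encode order only, not how far apart the levels are.
```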

(3). Interval

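Values are ordered and measured in fixed, equal units
The difference between two values is meaningful, but a sum or product is not
The zero point is arbitrary rather than a true absence of the quantity

For example, temperature in degrees Fahrenheit, or calendar years (A.D.)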
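
A quick way to see why the zero point matters is to convert Fahrenheit readings to Celsius: equal differences stay equal, but the ratio between two readings changes with the scale. The sample temperatures below are arbitrary values picked for illustration.

```python
def f_to_c(fahrenheit):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32.0) * 5.0 / 9.0


a_f, b_f, c_f = 40.0, 60.0, 80.0
a_c, b_c, c_c = f_to_c(a_f), f_to_c(b_f), f_to_c(c_f)

# Differences are meaningful: two equal steps in Fahrenheit are still
# two equal steps in Celsius.
equal_steps_f = (b_f - a_f) == (c_f - b_f)
equal_steps_c = abs((b_c - a_c) - (c_c - b_c)) < 1e-9

# Ratios are not meaningful: "twice as hot" depends on where zero happens to be.
ratio_f = c_f / a_f  # 2.0 on the Fahrenheit scale
ratio_c = c_c / a_c  # about 6.0 on the Celsius scale
```

Because the same two temperatures give a ratio of 2 on one scale and about 6 on the other, the ratio cannot be a property of the temperatures themselves.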

(4). Ratio

The measurement scheme defines a true zero point
Ratio values are treated as real numbers
All mathematical operations are allowed
The difference between two interval values is a ratio quantity

For example, the distance between objects
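
For a ratio attribute, the ratio between two values does not depend on the unit of measurement. The sketch below checks this for distances in kilometres and miles, and also shows that differences between calendar years (interval values) behave as ratio quantities. The distances and years are made-up sample values.

```python
KM_PER_MILE = 1.609344  # exact by definition of the international mile

# Distances have a true zero, so "four times as far" holds in any unit.
d1_km, d2_km = 5.0, 20.0
ratio_km = d2_km / d1_km
ratio_miles = (d2_km / KM_PER_MILE) / (d1_km / KM_PER_MILE)

# Calendar years are interval values, but the difference between two years
# is a duration, which is a ratio quantity.
span_a_years = 2000 - 1990  # 10 years
span_b_years = 2020 - 1980  # 40 years
span_ratio = span_b_years / span_a_years  # the second span is 4 times as long
```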

Copyright Notice
Except for the referenced content listed below, this article is the original work of Junhao, and the right of final interpretation belongs to the original author. If there is any infringement, please contact me to have it removed. Please do not reprint this article without my authorization.

6. Reference

[1]. Data Mining Book https://www.cs.waikato.ac.nz/ml/weka/book.html