1. Foreword
This article is a brief introduction to some key concepts in data mining and machine learning. It covers the types of machine learning, the machine learning workflow, how data is described, and the types of attribute values. I hope it will be useful for anyone who wants to learn about machine learning and get started with data mining.
2. Four types of machine learning
- Classification: given a set of classified examples, learn to classify new examples
- Association: find interesting associations between attributes or combinations of attributes
- Clustering: group similar examples together
- Numerical prediction: predict a numerical value instead of a class
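To make the four task types concrete, here is a minimal sketch in plain Python. The data, the function names and the nearest-neighbour approach are all assumptions chosen for illustration; real projects would normally use a library and far more data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: (height_cm, weight_kg, label)
people = [
    (150, 50, "small"), (155, 55, "small"),
    (180, 80, "large"), (185, 90, "large"),
]

def classify(query):
    """Classification: return the label of the closest labelled example."""
    closest = min(
        people,
        key=lambda r: (r[0] - query[0]) ** 2 + (r[1] - query[1]) ** 2,
    )
    return closest[2]

def predict_weight(height):
    """Numerical prediction: return a number (the weight of the closest height)."""
    closest = min(people, key=lambda r: abs(r[0] - height))
    return closest[1]

def cluster_by_height(rows, gap=10):
    """Clustering: group rows without using labels.

    A new group starts whenever two consecutive sorted heights
    differ by more than `gap`.
    """
    groups = []
    for row in sorted(rows):
        if groups and row[0] - groups[-1][-1][0] <= gap:
            groups[-1].append(row)
        else:
            groups.append([row])
    return groups

# Hypothetical shopping baskets for the association example.
baskets = [{"bread", "butter"}, {"bread", "butter", "milk"}, {"milk"}]

def pair_counts(transactions):
    """Association: count how often each pair of items occurs together."""
    counts = Counter()
    for items in transactions:
        counts.update(combinations(sorted(items), 2))
    return counts
```

Classification and numerical prediction both need labelled examples, while clustering and association work on the data alone.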
3. Workflow of machine learning
Machine learning starts with acquiring the data and pre-processing it: cleaning the data set and removing any data that is not useful. The processed data is then used to build a model, and the model is tested until the target conditions are met, after which it is deployed. This is the traditional machine learning process. Most machine learning workflows nowadays also include a step to optimise the model, for example by tuning hyperparameters. Many articles on the internet do not present this concisely, so here I follow the concise machine learning flow diagram demonstrated by my teacher, Ekaterina Komendantskaya.
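The workflow described above can be sketched end to end in a few lines of plain Python. Everything here is an assumption made for illustration: the toy data, the choice of a k-nearest-neighbour model, and the candidate values of k. For brevity the sketch tunes k on the test set; a real project would tune on a separate validation set.

```python
from collections import Counter

# 1. Acquire: hypothetical raw data as (feature, label); None marks a missing value.
raw = [
    (1.0, "low"), (1.2, "low"), (None, "low"), (0.8, "low"), (1.1, "low"),
    (3.0, "high"), (3.3, "high"), (2.9, "high"), (3.1, None), (3.2, "high"),
]

# 2. Pre-process: drop rows with a missing feature or label.
clean = [(x, y) for x, y in raw if x is not None and y is not None]

# 3. Split: every fourth cleaned row is held out for testing.
test = clean[::4]
train = [row for i, row in enumerate(clean) if i % 4 != 0]

def knn_predict(train_rows, x, k):
    """Predict the majority label among the k training rows closest to x."""
    nearest = sorted(train_rows, key=lambda r: abs(r[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train_rows, test_rows, k):
    """Fraction of test rows whose label is predicted correctly."""
    hits = sum(knn_predict(train_rows, x, k) == y for x, y in test_rows)
    return hits / len(test_rows)

# 4. Build, test and optimise: try several values of the hyperparameter k.
best_k = max([1, 3, 5], key=lambda k: accuracy(train, test, k))

# 5. Deploy: expose the chosen model as a single function.
def deployed_model(x):
    return knn_predict(train, x, best_k)
```

Calling `deployed_model(0.9)` then returns the predicted label for a new, unseen value.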
4. Description of the data
(1). Instances
- Individual examples: the rows of the data
- Input to the learning programme = a set of instances (the data set)
- Represented as a single relation, or flat file
- A rather restricted form of input
- No relationships between instances are represented
(2). Attributes
- Each instance is described by a fixed, predefined set of attributes
- In practice, however, the number of attributes may vary between instances
- The existence of one attribute may depend on the value of another
Often, we are interested in predicting the value of one particular attribute. This is possible when its value is determined by the values of the other attributes. The attribute we want to predict is called the class or the target.
Addendum: In general, for two-dimensional (tabular) datasets, each row is an instance and each column is an attribute.
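As a small illustration of instances, attributes and the class, the sketch below reads a flat file with Python's standard csv module. The file contents, the column names and the choice of `play` as the target are hypothetical.

```python
import csv
import io

# Hypothetical flat file: one row per instance, one column per attribute.
flat_file = """outlook,temperature,windy,play
sunny,hot,false,no
overcast,mild,false,yes
rainy,cool,true,no
"""

# Each row becomes a dict mapping attribute name to value.
rows = list(csv.DictReader(io.StringIO(flat_file)))

# The class (target) is the attribute we want to predict.
target = "play"
attributes = [name for name in rows[0] if name != target]

# Separate the input attributes from the class for every instance.
X = [[row[a] for a in attributes] for row in rows]
y = [row[target] for row in rows]
```

Here `X` holds the attribute values of the three instances and `y` holds their class values.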
5. Attribute value types
Four common types of attribute values are described here.
(1). Nominal
- The word nominal comes from the Latin word for name
- Values are distinct symbols
- Values serve only as labels or names
- There is no implied relationship between nominal values
- No ordering or distance measure is defined
- Only equality tests can be performed
- Also known as categorical
For example, colour (red, yellow…), country (UK, USA…)
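A minimal sketch of what can sensibly be done with nominal values, using a made-up list of colours: test for equality and count occurrences.

```python
from collections import Counter

# Hypothetical nominal attribute values.
colours = ["red", "yellow", "red", "blue"]

# Equality is the only meaningful comparison between two nominal values.
same = colours[0] == colours[2]

# Counting relies only on equality, so the most frequent value is well defined.
counts = Counter(colours)
most_common_colour = counts.most_common(1)[0][0]

# Python would also allow "red" < "yellow", but that compares spelling,
# not the colours themselves, so the result carries no meaning here.
```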
(2). Ordinal
- Values have a meaningful order
- There is no defined distance between values
- Addition and subtraction are not meaningful
For example, temperature expressed as hot, mild and cold
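One way to represent an ordinal attribute is to map each value to its rank, so that values can be compared and sorted. The sketch below assumes the scale cold < mild < hot; the names `ORDER`, `RANK` and `is_warmer` are hypothetical.

```python
# Assumed order for illustration: cold < mild < hot.
ORDER = ["cold", "mild", "hot"]
RANK = {value: position for position, value in enumerate(ORDER)}

def is_warmer(a, b):
    """Return True if ordinal value a ranks above ordinal value b."""
    return RANK[a] > RANK[b]

readings = ["hot", "cold", "mild", "cold"]

# Sorting uses only the order of the values.
sorted_readings = sorted(readings, key=RANK.get)

# The ranks 0, 1, 2 are positions, not measurements: "hot" minus "mild"
# is not a meaningful quantity, so the ranks should not be subtracted.
```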
(3). Interval
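- Values are ordered and measured in fixed, equal units
- The difference between two values is meaningful, but a sum or product is not
- The zero point is arbitrary rather than naturally defined
For example, temperature in degrees Fahrenheit, or calendar years (A.D.)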
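A short worked example shows why the arbitrary zero matters. The two temperatures below are made up; converting them from Fahrenheit to Celsius changes their ratio, while their difference converts consistently.

```python
def f_to_c(fahrenheit):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9

# Two hypothetical readings.
a_f, b_f = 80.0, 40.0
a_c, b_c = f_to_c(a_f), f_to_c(b_f)

# The ratio depends on the unit: 2.0 in Fahrenheit, about 6.0 in Celsius.
# So "twice as hot" is not a meaningful statement on an interval scale.
ratio_f = a_f / b_f
ratio_c = a_c / b_c

# The difference is meaningful: it converts by the scale factor 5/9 alone.
diff_f = a_f - b_f
diff_c = a_c - b_c
```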
(4). Ratio
- The measurement scale defines a true zero point
- Ratio quantities are treated as real numbers
- All mathematical operations are allowed
- Subtracting two interval values gives a ratio quantity (for example, the difference between two dates is a duration)
For example, the distance between objects
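The link between interval and ratio values can be sketched with Python's standard datetime module. Calendar dates behave like interval values, while the number of days between two dates is a ratio quantity with a true zero. The dates are chosen arbitrarily.

```python
from datetime import date

# Hypothetical dates: interval values with an arbitrary zero point.
start = date(2020, 1, 1)
middle = date(2020, 1, 11)
end = date(2020, 1, 31)

# Subtracting two dates gives a duration, which is a ratio quantity.
short_days = (middle - start).days
long_days = (end - start).days

# Ratios of durations are meaningful: one period is three times the other.
times_longer = long_days / short_days

# Python's date type does not support adding two dates (it raises TypeError),
# which matches the rule that sums of interval values are meaningless.
```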
Copyright Notice
Apart from the referenced content listed below, this article is the original work of Junhao, and the right of final interpretation belongs to the original author. If there is any infringement, please contact me to have it removed. Please do not reprint it without my authorisation.
6. Reference
[1]. Data Mining Book https://www.cs.waikato.ac.nz/ml/weka/book.html