| Abstract: | The field of data mining has seen rapid strides over the past two decades, especially from
the perspective of the computer science community. While data analysis has been studied
extensively in the conventional field of probability and statistics, data mining is a term
coined by the computer science-oriented community. For computer scientists, issues such as
scalability, usability, and computational implementation are extremely important.
The emergence of data science as a discipline requires the development of a book that
goes beyond the traditional focus of books on fundamental data mining courses only.
The textbook assumes a basic knowledge of probability, statistics, and linear algebra,
which is taught in most undergraduate curricula of science and engineering disciplines.
Therefore, the book can also be used by industrial practitioners who have a working knowledge
of these basic skills. While a stronger mathematical background is helpful for the more
advanced chapters, it is not a prerequisite. Special chapters are also devoted to different
aspects of data mining, such as text data, time-series data, discrete sequences, and graphs.
This kind of specialized treatment is intended to capture the wide diversity of problem
domains in which a data mining problem might arise.
Recent years have seen the emergence of the job description of “data scientists,” who try to
glean knowledge from vast amounts of data. In typical applications, the data types are so
heterogeneous and diverse that the fundamental methods discussed for a multidimensional
data type may not be effective. Therefore, more emphasis needs to be placed on the different
data types and the applications that arise in the context of these different data types. A
comprehensive data mining book must explore the different aspects of data mining, starting
from the fundamentals and then moving to complex data types and their relationships
with the fundamental techniques. While fundamental techniques form an excellent basis
for the further study of data mining, they do not provide a complete picture of the true
complexity of data analysis. This book studies these advanced topics without compromising
the presentation of fundamental methods. Therefore, this book may be used for both
introductory and advanced data mining courses. Until now, no single book has addressed
all these topics in a comprehensive and integrated way. |
| Description: | The book is written in a simple style to make it accessible to undergraduate students and
industrial practitioners with a limited mathematical background. Thus, the book will serve
both as an introductory text and as an advanced text for students, industrial practitioners,
and researchers.
Throughout this book, a vector or a multidimensional data point (including categorical
attributes) is annotated with a bar, such as X̄ or ȳ. A vector or multidimensional point
may be denoted by either small letters or capital letters, as long as it has a bar. Vector dot
products are denoted by centered dots, such as X̄ · Ȳ. A matrix is denoted in capital letters
without a bar, such as R. Throughout the book, the n×d data matrix is denoted by D, with
n points and d dimensions. The individual data points in D are therefore d-dimensional row
vectors. On the other hand, vectors with one component for each data point are usually
n-dimensional column vectors. An example is the n-dimensional column vector ȳ of class
variables of n data points. |
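The notation above can be illustrated with a minimal plain-Python sketch. It assumes a list-of-rows representation for the n×d data matrix D. The helper names `shape`, `dot`, and `column` are hypothetical, and so are the toy values. The sketch shows three things: each row of D is a d-dimensional data point, the class vector ȳ has one entry per data point (n entries), and a dot product X̄ · Ȳ pairs up components of two equal-length vectors.

```python
# Hypothetical toy data (values invented for illustration): n = 3 points, d = 2 dimensions.
D = [
    [1.0, 2.0],   # data point X_1: a d-dimensional row vector
    [3.0, 0.5],   # data point X_2
    [0.0, 4.0],   # data point X_3
]
y = [1, 0, 1]     # n-dimensional column vector of class variables, one per data point


def shape(matrix):
    """Return (n, d) for an n x d data matrix stored as a list of rows."""
    n = len(matrix)
    d = len(matrix[0]) if n > 0 else 0
    # Every data point (row) must have exactly d components.
    if any(len(row) != d for row in matrix):
        raise ValueError("all rows of D must have the same dimensionality d")
    return n, d


def dot(x, z):
    """Dot product X . Z of two vectors of equal length."""
    if len(x) != len(z):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, z))


def column(matrix, j):
    """Values of attribute j across all n points: an n-dimensional column vector."""
    return [row[j] for row in matrix]


n, d = shape(D)
assert len(y) == n  # one class label per row of D

print(dot(D[0], D[1]))          # 1*3 + 2*0.5 = 4.0
print(dot(column(D, 0), y))     # attribute 0 paired with class labels: 1*1 + 3*0 + 0*1 = 1.0
```

Storing D row by row mirrors the convention that data points are row vectors. Extracting a column yields a vector with one component per data point, the same shape as ȳ, which is why such a column can be paired with ȳ in a dot product.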