Title:Data Storage and Preparation

Code:UPA
Ac.Year:2019/2020
Sem:Winter
Curriculums:
Programme  Field/Specialization  Year  Duty
MITAI      NADE                  1st   Compulsory
MITAI      NBIO                  1st   Compulsory
MITAI      NCPS                  1st   Compulsory
MITAI      NEMB                  -     Compulsory
MITAI      NGRI                  -     Compulsory
MITAI      NHPC                  -     Compulsory
MITAI      NIDE                  1st   Compulsory
MITAI      NISD                  1st   Compulsory
MITAI      NISY                  -     Compulsory
MITAI      NMAL                  1st   Compulsory
MITAI      NMAT                  -     Compulsory
MITAI      NNET                  1st   Compulsory
MITAI      NSEC                  -     Compulsory
MITAI      NSEN                  1st   Compulsory
MITAI      NSPE                  1st   Compulsory
MITAI      NVER                  -     Compulsory
MITAI      NVIZ                  1st   Compulsory
Language of Instruction:Czech
Credits:5
Completion:examination (written)
Type of instruction:
Hour/sem  Lectures  Seminar Exercises  Laboratory Exercises  Computer Exercises  Other
Hours:    26        6                  0                     6                   14
          Exams  Tests  Exercises  Laboratories  Other
Points:   60     20     0          0             20
Guarantor:Zendulka Jaroslav, doc. Ing., CSc. (DIFS)
Deputy guarantor:Rychlý Marek, RNDr., Ph.D. (DIFS)
Lecturer:Kolář Dušan, doc. Dr. Ing. (DIFS)
Rychlý Marek, RNDr., Ph.D. (DIFS)
Zendulka Jaroslav, doc. Ing., CSc. (DIFS)
Instructor:Burgetová Ivana, Ing., Ph.D. (DIFS)
Faculty:Faculty of Information Technology BUT
Department:Department of Information Systems FIT BUT
Schedule:
Day  Lesson   Week      Room            Start  End    Lect.Gr.   Groups
Tue  lecture  lectures  E104 E105 E112  08:00  09:50  1MIT 2MIT  xx
 
Learning objectives:
  The aim of the course is to explain the fundamental classification of data and data resources; to give deeper insight into selected database systems (object-relational, spatial, NoSQL, XML, and multimedia) and efficient data manipulation; and to present the process of data mining and knowledge discovery and its individual steps, with a focus on data pre-processing and exploratory analysis.
Description:
  The course introduces the fundamental classification of data from the viewpoint of data mining and knowledge discovery. It also provides insight into selected modern database systems, and particular topics are studied in depth: object-relational databases, spatial databases (including issues connected with spatial data storage and indexing), NoSQL databases, XML databases, and multimedia databases. Advanced queries on relational databases are discussed as well. Next, the process of data mining and knowledge discovery and its individual steps are explained. The explanation focuses on typical tasks performed in data pre-processing before potentially useful knowledge is extracted from the data. The process of data mining and knowledge discovery is illustrated on selected use cases.
Knowledge and skills required for the course:
  Fundamentals of relational data model theory. Formal design of relational databases. Data storage at the internal level. Data safety and integrity. Transactions. Conceptual modeling and database design from a conceptual model. The SQL programming language. Fundamentals of computer graphics. Fundamentals of computational geometry. The object paradigm. Fundamentals of statistics and probability.
Subject specific learning outcomes and competencies:
  Students will be able to classify data from the viewpoint of data mining and knowledge discovery, store and manipulate data in suitable database systems, quickly search for required data, inspect data features, and prepare data for subsequent knowledge extraction.
Generic learning outcomes and competencies:
  - Students improve their ability to manipulate data in various situations
  - Students improve their ability to work on a small project as members of a small team
Why is the course taught:
  The aim of this course is to demonstrate how to work with the complex data around us: how to store such data, how to find one's way around it, how to obtain useful characteristics from it, and how to prepare it for the extraction of hidden information and knowledge by machine learning and other advanced analytical methods.
Syllabus of lectures:
 
  1. Introduction: course contents, data characteristics, introduction to data mining and knowledge discovery, recap of the history of database technology
  2. Object-relational DB, object-relational mapping, advanced SQL features
  3. Spatial DB: spatial data storage and manipulation issues
  4. Spatial DB: possible solutions of spatial data storage
  5. Indexing in spatial DB I - points
  6. Indexing in spatial DB II - multi-dimensional objects
  7. Mid-term exam
  8. Multimedia and XML databases
  9. NoSQL databases
  10. Data mining and knowledge discovery process, data pre-processing in this process - data characteristics, exploratory data analysis
  11. Data pre-processing during data mining and knowledge discovery process - pre-processing methods
  12. Fundamental tasks in data mining and knowledge discovery, examples of corresponding methods
  13. Programming languages used for data mining and knowledge discovery, illustrative use-cases on data mining and knowledge discovery
Syllabus of numerical exercises:
 Demo exercises
  1. Object-relational and spatial databases, data definition and manipulation, peculiarities
  2. Multimedia and XML databases, data indices
  3. NoSQL databases
Syllabus of computer exercises:
 
  1. Application binding to object-relational databases, application building in spatial databases
  2. Multimedia and XML databases, building and exploiting data indices
  3. NoSQL databases in applications
Syllabus - others, projects and individual work of students:
 
  1. Implementation and demonstration of the processing of both structured and unstructured data, where the data may be of various kinds.
Fundamental literature:
 
  • Lemahieu, W., Broucke, S., Baesens, B.: Principles of Database Management. Cambridge University Press, 2018, 780 p.
  • Kim, W. (ed.): Modern Database Systems. ACM Press, 1995, ISBN 0-201-59098-0
  • Melton, J.: Advanced SQL:1999 - Understanding Object-Relational and Other Advanced Features. Morgan Kaufmann, 2002, 562 p., ISBN 1-558-60677-7
  • Shekhar, S., Chawla, S.: Spatial Databases: A Tour. Prentice Hall, 2002/2003, 262 p., ISBN 0-13-017480-7
  • Dunckley, L.: Multimedia Databases: An Object-Relational Approach. Pearson Education, 2003, 464 p., ISBN 0-201-78899-3
  • Gaede, V., Günther, O.: Multidimensional Access Methods. ACM Computing Surveys, Vol. 30, No. 2, 1998, pp. 170-231.
  • Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Third Edition. Morgan Kaufmann Publishers, 2012, 703 p., ISBN 978-0-12-381479-1
Study literature:
 
  • Lecture materials (slides, scripts, etc.)
  • Lemahieu, W., Broucke, S., Baesens, B.: Principles of Database Management. Cambridge University Press, 2018, 780 p.
  • Kim, W. (ed.): Modern Database Systems. ACM Press, 1995, ISBN 0-201-59098-0
  • Melton, J.: Advanced SQL:1999 - Understanding Object-Relational and Other Advanced Features. Morgan Kaufmann, 2002, 562 p., ISBN 1-558-60677-7
  • Shekhar, S., Chawla, S.: Spatial Databases: A Tour. Prentice Hall, 2002/2003, 262 p., ISBN 0-13-017480-7
  • Dunckley, L.: Multimedia Databases: An Object-Relational Approach. Pearson Education, 2003, 464 p., ISBN 0-201-78899-3
  • Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Third Edition. Morgan Kaufmann Publishers, 2012, 703 p., ISBN 978-0-12-381479-1
Controlled instruction:
  
  • Mid-term exam - written form; questions are answered in full sentences; there is no second or alternative attempt. (20 points)
  • Project - 1 project (program development according to a given specification) with appropriate documentation. (20 points)
  • Final exam - written form; students answer the questions in full sentences. The maximum is 60 points; a student must obtain at least 25 points from the final exam, otherwise no points are awarded to the student. The exam has one regular and two resit terms. The regular term is always fully written. Resit terms may be either fully written or combined (a written part in the morning and an oral part in the afternoon of the same day). The form of each resit term is announced as soon as the previous term has been evaluated; the combined form is used when no more than 16 students are assigned to the given term.
Progress assessment:
  
  • Mid-term exam, held on a single date only, so there is no possibility of a second attempt.
  • One project, to be completed and submitted by a given date during the term.
Exam prerequisites:
  At the end of the term, a student must have at least 50% of the points obtainable during the term, that is, at least 20 points out of 40.
Plagiarism and unauthorized collaboration will result in the students involved not being graded, and disciplinary action may be initiated.
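
The point rules stated above (20 points for the mid-term exam, 20 for the project, 60 for the final exam, at least 20 of the 40 term points to be admitted to the exam, and at least 25 points from the final exam) can be sketched as a short calculation. This is only an illustrative sketch: the function and constant names are hypothetical, and it assumes that a final exam result below the minimum simply counts as 0 points, which is one possible reading of the rule.

```python
# Point limits taken from the rules stated in this syllabus.
# All names below are hypothetical and used only for illustration.
TERM_MIN = 20    # at least 50% of the 40 term points (mid-term + project)
FINAL_MIN = 25   # minimum number of points required from the final exam


def admitted_to_exam(midterm: int, project: int) -> bool:
    """Return True if the term points satisfy the exam prerequisite."""
    return midterm + project >= TERM_MIN


def total_points(midterm: int, project: int, final: int) -> int:
    """Sum all points; a final exam below its minimum counts as 0.

    Assumption: "no points are awarded" is read as "the final exam
    contributes 0 points"; the official rule may be applied differently.
    """
    if not admitted_to_exam(midterm, project):
        raise ValueError("exam prerequisite not met")
    final_awarded = final if final >= FINAL_MIN else 0
    return midterm + project + final_awarded
```

For example, under these assumptions a student with 12 points from the mid-term exam and 10 from the project is admitted to the final exam, and a final exam result of 24 points would add nothing to the 22 term points.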
 
