| Stryka, L., Chmelař, P.: Simplified Progressive Data Mining, In: Proceedings of the 16th International Conference on Systems Science, Wroclaw, PL, TUWR, 2007, p. 378-387, ISBN 978-83-7493-340-7 | | Publication language: | English |
|---|
| Original title: | Simplified Progressive Data Mining |
|---|
| Title (cs): | Zjednodušené progresivní dolování dat |
|---|
| Pages: | 378-387 |
|---|
| Proceedings: | Proceedings of the 16th International Conference on Systems Science |
|---|
| Conference: | XVI International Conference on Systems Science: ICSS 2007 |
|---|
| Place: | Wroclaw, PL |
|---|
| Year: | 2007 |
|---|
| ISBN: | 978-83-7493-340-7 |
|---|
| Publisher: | Wroclaw University of Technology |
|---|
| Keywords |
|---|
| On-line data mining, concept hierarchy, frequent patterns, cover, obviosity |
| Annotation |
|---|
| Databases store huge amounts
of data, yet it is very difficult to make decisions based
on this data. We propose the OLAM SE system (Self Explaining On-Line Analytical
Mining), which is similar to Han's OLAM [5] in its idea of interactive data
mining. Its contribution is to simplify on-line analytical data mining for professionals
who understand their data but want more significant, interesting and useful
information. This is done by shielding internal concepts (associations,
classifications, characterizations) and thresholds (supports, confidences) from
the user and by providing a simple graphical interface that suggests the most relevant items.
OLAM SE determines the minimum support value from
the required cover of the data using the entropy coding principle. This is
automatically applied to the structure based on a given concept hierarchy,
where one is present. We also determine a maximum threshold to avoid explaining
knowledge that is obvious. The major part of the data is thus described by frequent
patterns.
The results are presented in a diagram
notation similar to UML. It is a visual graph whose nodes are frequent
data sets, presented as packages that include subpackages (data concepts or
items). Edges represent links or patterns between them. The user can explore these patterns
progressively to get a detailed view of the patterns of
interest. Other possibly interesting sets are offered to the user without any
further action. This approach is well suited to characterization and to descriptive classification
equivalent to normal Bayes. |
| BibTeX: |
|---|
@INPROCEEDINGS{
author = {Lukáš Stryka and Petr Chmelař},
title = {Simplified Progressive Data Mining},
pages = {378--387},
booktitle = {Proceedings of the 16th International Conference on Systems
Science},
year = {2007},
location = {Wroclaw, PL},
publisher = {Wroclaw University of Technology},
ISBN = {978-83-7493-340-7},
language = {english},
url = {http://www.fit.vutbr.cz/research/view_pub.php?id=8454}
} |
|