Journal article

HLOSTA Martin, ZDRÁHAL Zdeněk and ZENDULKA Jaroslav. Are we meeting a deadline? classification goal achievement in time in the presence of imbalanced data. Knowledge-Based Systems. Amsterdam: Elsevier Science, 2018, vol. 2018, no. 160, pp. 278-295. ISSN 0950-7051. Available from: https://www.sciencedirect.com/science/article/pii/S0950705118303496
Publication language: English
Original title: Are we meeting a deadline? classification goal achievement in time in the presence of imbalanced data
Title (cs): Splníme termín? klasifikace dosažení cíle v čase při nevyvážených datech
Pages: 278-295
Place: NL
Year: 2018
URL: https://www.sciencedirect.com/science/article/pii/S0950705118303496
Journal: Knowledge-Based Systems, Vol. 2018, No. 160, Amsterdam, NL
ISSN: 0950-7051
DOI: 10.1016/j.knosys.2018.07.021
Keywords

Classification, imbalanced data, learning analytics, educational data mining
Annotation
This paper addresses the problem of a finite set of entities that are required to achieve a goal within a predefined deadline. For example, a group of students is supposed to submit a homework assignment by a specified cutoff. We are interested in predicting which entities will achieve the goal within the deadline. The predictive models are built using only data from that population, and the predictions are computed at various time instants, taking into account updated data about the entities. The first contribution of the paper is a formal description of the problem. An important characteristic of the proposed model-building method is that it uses the properties of entities that have already achieved the goal. We call this approach "Self-Learning". Since typically only a few entities have achieved the goal at the beginning and their number grows gradually, the problem is inherently imbalanced. To mitigate the imbalance, we improved the Self-Learning method by tackling information loss and by applying several sampling techniques. The original Self-Learning method and its modifications were evaluated in a case study on predicting submission of the first assessment in distance higher education courses. The results show that the proposed improvements outperform the two specified baseline models and the original Self-Learner, and that the best results are achieved when domain-driven techniques are used to tackle the imbalance problem. We also show, using the Wilcoxon signed-rank test, that these improvements are statistically significant.
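The growth of the positive class described in the annotation can be illustrated with a minimal sketch. The function names `labels_at` and `positive_fraction`, the day-based time unit, the use of `None` for entities that never achieve the goal, and the sample submission days are all assumptions made for illustration; they are not taken from the paper and do not reproduce its method.

```python
from typing import List, Optional, Sequence


def labels_at(submission_days: Sequence[Optional[int]], day: int) -> List[int]:
    """Label each entity 1 if it has achieved the goal by `day`, else 0.

    `None` marks an entity that never achieved the goal (hypothetical encoding).
    """
    return [1 if d is not None and d <= day else 0 for d in submission_days]


def positive_fraction(submission_days: Sequence[Optional[int]], day: int) -> float:
    """Share of entities that have already achieved the goal by `day`."""
    labels = labels_at(submission_days, day)
    return sum(labels) / len(labels)


# Hypothetical submission days for six entities; the deadline is day 10.
example = [2, 7, 9, 9, 10, None]
for day in (2, 7, 10):
    print(day, positive_fraction(example, day))
```

Early in the period only a small share of entities carries a positive label, which is the imbalance the annotation refers to; the share grows as the deadline approaches.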
BibTeX:
@ARTICLE{
   author = {Martin Hlosta and Zden{\v{e}}k Zdr{\'{a}}hal and Jaroslav Zendulka},
   title = {Are we meeting a deadline? classification goal achievement in time in the presence of imbalanced data},
   pages = {278--295},
   journal = {Knowledge-Based Systems},
   volume = {2018},
   number = {160},
   year = {2018},
   ISSN = {0950-7051},
   doi = {10.1016/j.knosys.2018.07.021},
   language = {english},
   url = {http://www.fit.vutbr.cz/research/view_pub.php.en.iso-8859-2?id=11826}
}
