Title: Speech Signal Processing
Code: ZRE
Ac. Year: 2019/2020
Sem: Summer
Curriculums:
Programme   Field/Specialization   Year   Duty
IT-MSC-2    MBI                    -      Compulsory-Elective - group S
IT-MSC-2    MBS                    -      Elective
IT-MSC-2    MGM                    1st    Compulsory
IT-MSC-2    MIN                    -      Compulsory-Elective - group C
IT-MSC-2    MIS                    -      Elective
IT-MSC-2    MMM                    -      Elective
IT-MSC-2    MPV                    -      Compulsory-Elective - group G
IT-MSC-2    MSK                    2nd    Compulsory-Elective - group B
MITAI       NADE                   -      Elective
MITAI       NBIO                   -      Elective
MITAI       NCPS                   -      Elective
MITAI       NEMB                   -      Elective
MITAI       NGRI                   -      Elective
MITAI       NHPC                   -      Elective
MITAI       NIDE                   -      Elective
MITAI       NISD                   -      Elective
MITAI       NISY                   -      Elective
MITAI       NMAL                   -      Elective
MITAI       NMAT                   -      Elective
MITAI       NNET                   -      Elective
MITAI       NSEC                   -      Elective
MITAI       NSEN                   -      Elective
MITAI       NSPE                   -      Compulsory
MITAI       NVER                   -      Elective
MITAI       NVIZ                   -      Elective
Language of Instruction: Czech
Public info: http://www.fit.vutbr.cz/study/courses/ZRE/public/
Credits: 5
Completion: examination (written)
Type of instruction:
Hour/sem   Lectures   Seminar Exercises   Laboratory Exercises   Computer Exercises   Other
Hours:     26         2                   0                      12                   12
           Exams      Tests               Exercises              Laboratories         Other
Points:    51         14                  0                      6                    29
Guarantor: Černocký Jan, doc. Dr. Ing. (DCGM)
Deputy guarantor: Grézl František, Ing., Ph.D. (DCGM)
Lecturer: Černocký Jan, doc. Dr. Ing. (DCGM)
Instructors: Mošner Ladislav, Ing. (DCGM)
             Žmolíková Kateřina, Ing. (DCGM)
Faculty: Faculty of Information Technology BUT
Department: Department of Computer Graphics and Multimedia FIT BUT
Follow-ups:
Speech Processing Systems (SRE), DCGM
Schedule:
Day   Lesson    Week       Room   Start   End     Lect.Gr.    Groups
Mon   lecture   lectures   E105   16:00   17:50   1MIT 2MIT   MGM xx
 
Learning objectives:
  To provide students with knowledge of the basic characteristics of the speech signal in relation to human speech production and hearing. To describe basic speech analysis algorithms common to many applications. To give an overview of applications (recognition, synthesis, coding) and to cover practical aspects of implementing speech algorithms.
Description:
  Applications of speech processing, digital processing of speech signals, production and perception of speech, introduction to phonetics, pre-processing and basic parameters of speech, linear-predictive model, cepstrum, fundamental frequency estimation, coding - time domain and vocoders, recognition - DTW and HMM, synthesis. Software and libraries for speech processing.
Learning outcomes and competencies:
  Students will become familiar with the basic characteristics of the speech signal in relation to human speech production and hearing. They will understand basic speech analysis algorithms common to many applications. They will get an overview of applications (recognition, synthesis, coding) and learn about practical aspects of implementing speech algorithms. Students will be able to design a simple speech processing system (a speech activity detector, a recognizer of a limited number of isolated words), including its implementation in application programs.
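  As an illustration of the kind of simple system mentioned above, the following is a minimal sketch of an energy-based speech activity detector. It is not taken from the course materials: the function names, the frame length (160 samples), the hop (80 samples), the -30 dB threshold and the toy signal are all assumptions chosen for the example, and a practical detector would typically need an adaptive threshold and smoothing of the decisions.

```python
import math

def frame_energies(signal, frame_len, hop):
    """Return the short-time log energy (in dB) of each frame of the signal."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # A small constant avoids log(0) on perfectly silent frames.
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies

def detect_speech(signal, frame_len=160, hop=80, threshold_db=-30.0):
    """Flag a frame as active when its log energy exceeds a fixed threshold.

    The threshold is an assumed value for this toy example only.
    """
    return [e > threshold_db for e in frame_energies(signal, frame_len, hop)]

# Toy input at an assumed 8 kHz sampling rate:
# 0.1 s of very weak hum, 0.1 s of a 200 Hz tone, 0.1 s of weak hum again.
fs = 8000
quiet = [0.0001 * math.sin(2 * math.pi * 50 * n / fs) for n in range(800)]
tone = [0.5 * math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]
toy_signal = quiet + tone + quiet

vad_flags = detect_speech(toy_signal)
```

  Each entry of vad_flags corresponds to one 20 ms frame with 10 ms overlap between neighbouring frames; frames that overlap the tone are flagged as active and the remaining frames are not.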
Why is the course taught:
  Speech is the most common form of human communication: when we want to transmit information to others, we mostly do so by speaking. Automatic speech processing is necessary for human-to-human communication (coding, or for example analysis of call center traffic), human-to-computer communication (voice commands, voice search, authentication by voice) and computer-to-human communication (text-to-speech synthesis). The ZRE course will provide you with basic information on how speech is created, which features are usually used to represent it, and what is done with them next. You will see that it is an exciting combination of "soft sciences", signal processing, machine learning, and other ingredients. Know-how acquired in ZRE is also applicable elsewhere: for example, the problem of sequence recognition goes well beyond automatic speech recognition. The course is taught by members of the BUT Speech@FIT group, one of the world's prominent labs in speech data mining research.
Syllabus of lectures:
 
  1. Introduction, applications of speech processing. 
  2. Digital processing of speech signals.
  3. Speech production and its signal processing model. 
  4. Pre-processing and basic parameters of speech, cepstrum.
  5. Linear-predictive model. 
  6. Fundamental frequency estimation.
  7. Speech coding - basics.
  8. CELP speech coding.
  9. Speech recognition - basics, DTW. 
  10. Hidden Markov models HMM. 
  11. Large vocabulary continuous speech recognition (LVCSR) systems. 
  12. Speaker and language recognition. Neural networks in speech processing. 
  13. Text to speech synthesis. 
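 
  To give a flavour of one topic from the list above (lecture 6, fundamental frequency estimation), here is a minimal sketch of one common textbook approach: picking the highest peak of the autocorrelation function within a plausible lag range. This is not necessarily the method presented in the course; the function names, the 50-400 Hz search range and the synthetic test frame are assumptions made for the example.

```python
import math

def autocorrelation(frame, lag):
    """Biased autocorrelation of the frame at the given lag."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))

def estimate_f0(frame, fs, f0_min=50.0, f0_max=400.0):
    """Estimate the fundamental frequency (Hz) of a voiced frame.

    The lag with the largest autocorrelation inside the assumed
    f0_min..f0_max search range is taken as the pitch period.
    """
    lag_min = int(fs / f0_max)
    lag_max = int(fs / f0_min)
    best_lag = max(range(lag_min, lag_max + 1),
                   key=lambda lag: autocorrelation(frame, lag))
    return fs / best_lag

# Synthetic "voiced" frame: a 100 Hz fundamental plus its second harmonic,
# 50 ms long at an assumed 8 kHz sampling rate.
fs = 8000
frame = [math.sin(2 * math.pi * 100 * n / fs)
         + 0.5 * math.sin(2 * math.pi * 200 * n / fs)
         for n in range(400)]

f0 = estimate_f0(frame, fs)
```

  The biased estimator (a sum over fewer terms at larger lags) is used on purpose here: it favours the first pitch period over its multiples. The sketch assumes the frame is voiced and makes no voiced/unvoiced decision.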
Syllabus of numerical exercises:
 
  1. Parameterization, DTW, HMM.
Syllabus of computer exercises:
 
    Matlab is used in all labs except the last one.
  1. Introduction. 
  2. Linear prediction and vector quantization. 
  3. Fundamental frequency estimation and speech coding. 
  4. Basics of classification. 
  5. Recognition - dynamic time warping (DTW).
  6. Recognition - hidden Markov models (HTK).
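 
  The DTW-based recognition named in lab 5 can be sketched as follows. This is a hedged illustration in Python rather than the Matlab used in the labs, and it is not taken from the lab materials: the function names, the Euclidean local distance, the symmetric step pattern and the one-dimensional toy "feature vectors" are all assumptions made for the example.

```python
import math

def local_distance(vec_a, vec_b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec_a, vec_b)))

def dtw_distance(seq_a, seq_b):
    """Accumulated cost of the best alignment of two feature sequences.

    Uses the basic step pattern (horizontal, vertical, diagonal) with
    no slope constraints and no path-length normalization.
    """
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # seq_b frame repeated
                                 cost[i][j - 1],      # seq_a frame repeated
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]

def recognize(utterance, templates):
    """Return the label of the reference template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))

# Hypothetical one-dimensional templates standing in for real feature sequences.
templates = {
    "yes": [[0.0], [1.0], [2.0], [1.0]],
    "no": [[2.0], [2.0], [0.0], [0.0]],
}
# A time-stretched version of the "yes" template.
utterance = [[0.0], [0.0], [1.0], [2.0], [2.0], [1.0]]
recognized = recognize(utterance, templates)
```

  Because the utterance is only a stretched copy of the "yes" template, DTW aligns the two with zero accumulated cost, which is what makes the method tolerant to differences in speaking rate.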
Fundamental literature:
 
  • Gold, B., Morgan, N.: Speech and Audio Signal Processing, 2nd edition, Wiley-Interscience.
  • Rabiner, L. R., Schafer, R. W.: Theory and Applications of Digital Speech Processing, Pearson, 2011.
  • Psutka, J., Müller, L., Matoušek, J., Radová, V.: Mluvíme s počítačem česky, Academia, 2006.
  • Yu, D., Deng, L.: Automatic Speech Recognition, Springer, 2016.
Study literature:
 
  • Gold, B., Morgan, N.: Speech and Audio Signal Processing, 2nd edition, Wiley-Interscience.
  • Rabiner, L. R., Schafer, R. W.: Theory and Applications of Digital Speech Processing, Pearson, 2011.
  • Psutka, J., Müller, L., Matoušek, J., Radová, V.: Mluvíme s počítačem česky, Academia, 2006.
  • Yu, D., Deng, L.: Automatic Speech Recognition, Springer, 2016.
Progress assessment:
  
  • mid-term test 14 pts
  • project 29 pts
  • presentation of results in computer labs 6 pts
 
