Short bio
Kazuya Takeda received his B.E.E., M.E.E., and Doctor of Engineering
degrees from Nagoya University, Nagoya, Japan, in 1983, 1985, and 1994,
respectively. From 1986 to 1989, he was a researcher at Advanced
Telecommunication Research Laboratories (ATR), Osaka, Japan. His main
research interest at ATR was corpus-based speech synthesis. He stayed
at MIT, Cambridge, USA, as a visiting scientist from November 1987 to
April 1988. From 1989 to 1995, he was a researcher and research
supervisor at KDD Research and Development Laboratories, Kamifukuoka,
Japan. He led a research project on the voice-activated telephone
extension (VOATEX) system. He received the Technical Achievement Award
and the Kiyoshi-Awaya Award from the Acoustical Society of Japan for
the development of the VOATEX system. From 1995 to 2003, he was an associate
professor in the Faculty of Engineering. He also led an information
transformation research team at the Center for Integrated Acoustic
Information Research (CIAIR). Since 2003, he has been a professor at
the Graduate School of Information Science. His current research
interest is media signal processing and its applications, including
spatial audio, robust speech recognition, behavior modeling, and
interfaces.
He has authored more than 80 journal papers, 6 books, and more than
100 conference papers.
He is a member of the Acoustical Society of Japan (ASJ), the Institute
of Electronics, Information and Communication Engineers (IEICE), Japan,
the Information Processing Society of Japan (IPSJ), and the IEEE. He is
an executive board member of ASJ. He is an associate editor of IEICE
Transactions on Information and Systems. He is the chair of the Spoken
Language Processing technical group of IPSJ.
Research Topics
Audio Signal Processing
Spatial impression is one of the most important types of information
in audio signals, and not only transmitting but also generating 3D
sound fields has been a fundamental problem in audio signal
processing. The goal of our group's Selective Listening Point (SLP)
audio system project is to realize an audio system that can generate
the sound field at a given location using sounds captured by distant
microphones. The current approach, which combines blind separation of
acoustic signals under real conditions with control of head-related
acoustic transfer functions (HRTFs), can generate natural directional
sounds for anechoic spaces, whereas perceptible degradation is found
in highly reverberant rooms.
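As a rough illustration of the HRTF-controlled rendering step, a
separated source signal can be convolved with a left/right pair of
head-related impulse responses (the time-domain form of HRTFs) to
obtain a two-channel binaural signal. The sketch below is not the SLP
system's implementation; the function names and the toy impulse
responses are hypothetical, and measured responses would be used in
practice.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(source, hrir_left, hrir_right):
    """Filter one separated source with a left/right impulse-response pair."""
    return convolve(source, hrir_left), convolve(source, hrir_right)


# Toy impulse responses (hypothetical values): the source is on the
# listener's left, so the right ear receives the sound two samples
# later and attenuated.
hrir_left = [1.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.6]

source = [1.0, 0.5, -0.25]  # stand-in for one separated source signal
left, right = render_binaural(source, hrir_left, hrir_right)
```

Rendering several separated sources would amount to repeating this
step per source, each with the response pair for its direction
relative to the chosen listening point, and summing the results per
ear.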
Speech Signal Processing
As speech is the most natural way for humans to communicate, spoken
language interfaces are considered to be the best modality for a wide
range of information systems. Particularly in a vehicular environment,
where hands and/or eyes are not free for operating information
devices, the use of speech recognition technology is appropriate.
Since current speech recognition technology relies entirely on
statistical approaches, such as hidden Markov models of speech and
N-gram models of word sequences, a mismatch between training and
testing conditions causes serious degradation of system performance.
Therefore, noise reduction is a very important issue for speech
recognition to succeed in real environments.
Our group is studying speech enhancement technologies based on
multiple regression over spatially distributed microphones for in-car
applications. We have found that the method is very effective at low
SNRs (e.g., -10 dB) and in highly non-stationary environments. By
extending the statistical modeling of the noise contamination process,
we also found an effective noise reduction method for moderate SNR
conditions. The effectiveness of these methods was confirmed through
speech recognition experiments using a standard corpus. In addition to
the signal processing aspect of speech processing, we are
investigating language modeling, field tests of spoken dialogue
systems, and discrimination between speaking and singing voices.
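One way to read the multiple-regression idea is that the log-spectral
value of the clean speech in a frequency band is approximated by a
weighted sum of the log-spectral values observed at the distributed
microphones, with the weights fitted by least squares on frames where
a clean reference is available. The following is a sketch under that
assumption, not the group's actual method; the data are synthetic and
all names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_regression(mic_frames, reference):
    """Least-squares weights (bias first) via the normal equations."""
    rows = [[1.0] + list(frame) for frame in mic_frames]
    dim = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(dim)]
           for i in range(dim)]
    xty = [sum(r[i] * y for r, y in zip(rows, reference))
           for i in range(dim)]
    return solve_linear_system(xtx, xty)


def predict(weights, frame):
    """Estimate the clean log-spectral value for one frame."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], frame))


# Synthetic, mean-removed log-power values for one frequency band at
# three microphones (one row per frame).
frames = [
    [0.0, -0.5, -0.2], [0.2, -0.1, -0.3], [-0.2, 0.1, 0.0],
    [0.5, -0.4, 0.3], [0.1, 0.2, -0.1], [-0.1, -0.3, 0.4],
]
# Synthetic reference generated from known weights, so the fit can be checked.
true_weights = [0.1, 0.2, 0.5, 0.3]
reference = [predict(true_weights, f) for f in frames]

weights = fit_regression(frames, reference)
estimate = predict(weights, [0.3, 0.0, -0.2])
```

In a real setting one set of weights would be fitted per frequency
band, and the reference would come from recordings rather than from
known weights.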
Human Behavior Signal Processing
Recent progress in sensor and communication technologies is making
long-term human sensing with wearable devices possible. As a result,
the target area of signal processing now covers a broad range of
human-observation signals, highlighting the growing importance of
human-behavior signal processing (HBSP).
As a pioneering group in the field of HBSP, we have been working on
modeling driving behaviors. We found that the statistical phase space
of driving, i.e., the joint probability of the headway distance to the
preceding car and the speed of the car, generally represents driving
characteristics, and that drivers' individuality can be extracted
through a Gaussian mixture model (GMM) of the phase space. We also
found that cepstrum analysis is an effective method for deconvolving
the driving action into human dynamics and the command sequence. By
showing that approximately 80% of drivers can be correctly identified
from cepstral features with dynamic parameters, we confirmed the
effectiveness of signal modeling of human behavior. Currently, we are
applying HBSP to the generation of driving behavior.
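The reason cepstrum analysis suits this kind of deconvolution is that
it turns convolution into addition: if an observed signal is a command
sequence convolved with the response of the driver's dynamics, its
cepstrum is the sum of the two components' cepstra. The sketch below
checks this property on toy signals. It assumes circular convolution
for simplicity, and the signals and names are hypothetical rather than
taken from the driving data.

```python
import cmath
import math


def dft(x):
    """Discrete Fourier transform (direct O(n^2) form, fine for short inputs)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(x):
    """Inverse DFT of the log magnitude spectrum of x."""
    n = len(x)
    log_mag = [math.log(abs(v)) for v in dft(x)]
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def circular_convolve(x, h):
    """Circular convolution of two equal-length sequences."""
    n = len(x)
    return [sum(x[m] * h[(t - m) % n] for m in range(n)) for t in range(n)]


# Hypothetical toy signals: a sparse "command" sequence and a short
# impulse response standing in for the driver's dynamics. Both have
# non-zero spectra, so the logarithm is defined at every frequency.
command = [1.0, 0.0, 0.0, -0.3, 0.0, 0.0, 0.0, 0.0]
dynamics = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
observed = circular_convolve(command, dynamics)

c_observed = real_cepstrum(observed)
c_sum = [a + b for a, b in zip(real_cepstrum(command), real_cepstrum(dynamics))]

# Largest difference between the cepstrum of the convolved signal and
# the sum of the component cepstra (zero up to rounding error).
max_gap = max(abs(a - b) for a, b in zip(c_observed, c_sum))
```

Because the components add in the cepstral domain, they can in
principle be separated by keeping different quefrency ranges, which is
how cepstral features are commonly extracted from speech.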