HAI2017 Workshop on “Representation Learning for Human and Robot Cognition”

Information | Speakers | Program | Call for papers | Organizers


Workshop date and place

This workshop will be held at the HAI2017 conference.

Date: October 17th, 2017
Place: CITEC, Bielefeld University, Germany

Aim and scope

Creating intelligent and interactive robots has been the subject of extensive research. Robots are rapidly moving into the center of human environments, where they collaborate with human users in different applications; this requires high-level cognitive functions that allow them to understand and learn from human behavior. To this end, an important challenge that attracts much attention in cognitive science and artificial intelligence is the “Symbol Emergence” problem, which investigates the bottom-up development of symbols through social interaction. This line of research employs representation-learning-based models for understanding language and action in a developmentally plausible manner, so that robots can behave appropriately on their own. This could open the door for robots to understand the syntactic formalisms and semantic references of human speech, and to associate linguistic knowledge with perceptual knowledge so as to collaborate successfully with human users in shared space.

Another interesting approach to studying representation learning is “Cognitive Mirroring”, which refers to artificial systems that make cognitive processes observable, such as models that learn concepts of objects, actions, and/or emotions from humans through interaction. A key idea of this approach is that robots learn the individual characteristics of human cognition rather than acquiring a general representation of cognition. In this way, the characteristics of human cognition become observable and can be measured as changes in model parameters, which is difficult to verify through neuroscience studies alone.

In this workshop, we invite researchers in artificial intelligence, cognitive science, cognitive robotics, and neuroscience to share their knowledge and research findings on representation learning, and to engage in cutting-edge discussions with other experienced researchers so as to help promote this line of research in the Human-Agent Interaction (HAI) community.

Invited speakers

Dr. Beata Joanna Grzyb
Assistant professor

Radboud University
Integration of perception, language and action in natural and artificial cognitive systems

Both children and robots face a difficult problem: they have to learn how to interact with the surrounding world. An appropriate action on an object can be selected using direct perceptual cues (i.e., object affordances) or indirect, semantic knowledge, which includes knowledge of an object's typical functions or memories of past interactions. In adults and older children, the perception of an object and the organization of object-related actions are smoothly integrated. Young children, however, sometimes make serious attempts to perform impossible actions on miniature objects as if they were full-size objects. Examples include children's attempts to sit in a tiny chair, put doll shoes on their own feet, or get inside toy cars. These scale errors offer a unique example of how stored object representations can overpower immediate visual information in the developing mind.
In this talk, I will first introduce our empirical studies on scale errors, which provide strong evidence that these errors result from the growing influence of the conceptual and linguistic systems on action selection processes. I will then present our developmental model of action and name learning, instantiated in a deep learning architecture, which offers important insights into the mechanism underlying scale errors in children. Finally, I will argue that the empirical lens of scale errors in children can inform cognitive robotics on the role that language plays in the mechanisms of perception-action learning, and that, by identifying these mechanisms, cognitive robotics can in turn inform developmental psychology by generating explicit computational hypotheses of how perception, language, and action integrate in children's learning.

Prof. Thomas Hermann
Head of the Ambient Intelligence group

Bielefeld University
Auditory Displays and Interactive Sonification for representation learning and communication

Sonification, the systematic and reproducible auditory display of data, enables users to query multiple auditory views of complex data. This talk will first give a general introduction to auditory display and sonification, with a focus on auditory learning and interaction. This will come alive through selected application examples ranging from interactive sonification (e.g., of sports activity) and process monitoring to exploratory data analysis of high-dimensional data sets. The particular ability of sound to provide a shared resource that couples interacting users will be highlighted. Finally, EmoSonics, an approach that uses sound to convey emotions, will be presented with sound examples, along with an outlook on prospective medical applications.

Prof. Tetsuya Ogata
Head of Laboratory for Intelligent Dynamics and Representation

Waseda University
End-to-End Approach for Behavior Generation and Language Understanding in Robot Systems

In this talk, I will present three topics from our end-to-end learning approach based on deep neural network models, which enables robot systems to recognize the environment and to interact with human beings. The first topic is a multimodal integration model for a humanoid robot using time-delay deep auto-encoders. The proposed mechanism enables the small humanoid NAO to handle different objects by integrating raw camera images, raw sound spectra, and motor joint angles without any dedicated feature extraction mechanism. By retrieving temporal sequences over the learned modalities, the robot can generate and predict object manipulation behaviors or camera images from the current sensory-motor states. The second topic is an enhanced version of the multimodal integration model. This model combines a convolutional neural model with a recurrent neural model, enabling the humanoid robot NEXTAGE to manipulate various objects, including soft materials. The third topic is a linguistic communication model for the robot that follows a sequence-to-sequence manner using a recurrent neural model. In the proposed model, after the network receives a verbal input, its internal state changes according to the first half of the attractors, whose branch structures correspond to semantics. The internal state then shifts to the second half of the attractors to generate the appropriate behavior. The model achieves immediate and repeatable responses to linguistic directions. Remaining problems for robot applications will be discussed at the end of the talk.

Prof. Erhan Oztop
Director of Robotics Laboratory

Ozyegin University
Concept Inception

Symbols have been fundamental to human cultural evolution and the progress of science and technology. We not only use symbols to communicate but also to make plans and build theories. In particular, Artificial Intelligence has classically used symbols to build computational systems that emulate human cognition or some function of it. From a neuroscientific point of view, whether the brain needs symbols to compute and plan is an open question. There are several important findings suggesting that such symbol representations exist in the brain. In general, if we relax the meaning of a symbol, the neural representations of objects and of the effects of one's own and others' actions can be considered 'brain symbols' that are essential for neural computation. In particular, predictive mechanisms over the aforementioned concepts are at the core of intelligent behavior. As such, cognitive roboticists strive to endow robots with such representational power and predictive mechanisms. Classically, this has been done by designing the mechanisms by hand and through expert programming. On the other hand, a powerful recent trend is to endow robots with learning mechanisms that allow them to organize their own sensorimotor experiences into concepts and/or symbols, which can then be used for planning and for understanding others' actions. In general, robot learning refers both to learning low-level sensorimotor skills and to learning high-level concepts. Recently, human-in-the-loop robot learning has served as a successful method for obtaining sensorimotor skills by transferring a human-generated control policy to the robot. A novel direction to be explored now is to exploit the human-in-the-loop robot control framework not only to transfer sensorimotor skills, but also to incept the concepts of the human operator into the target robot.

Prof. Stefan Wermter
Head of Knowledge Technology Institute

University of Hamburg
Crossmodal Learning in Neural Robots

Neural learning approaches and human-robot interaction are often addressed in different research communities. In this talk, we develop and describe neural network models with a particular focus on crossmodal learning for human-robot interaction. Our goal is to better understand grounded, learned communication in humans and machines, and to use the knowledge we gain to improve multisensory integration and social interaction in humanoid domestic robots. In the context of understanding grounded communication, we will present a deep neural network model for emotion expression recognition in human-robot interaction. Furthermore, we will present a model based on self-organizing neural networks for human-action recognition in the context of human-robot assistance. We argue that neural learning of emotion perception, multisensory integration, and the observation of human actions is essential for future human-robot cooperation.

Workshop program

Morning session
09:00 Welcome and introduction
09:10-09:25 Introduction of CREST COLLABORATION Project (Takayuki Nagai)
09:25-10:00 Invited talk 1 (Prof. Stefan Wermter)
10:00-10:35 Invited talk 2 (Prof. Thomas Hermann)
10:35-11:00 Coffee break
11:00-11:35 Invited talk 3 (Prof. Tetsuya Ogata)
11:35-11:50 Oral presentation 1 (Tatsuro Yamada, “Representation Learning of Logical Words via Seq2seq Learning from Linguistic Instructions to Robot Actions”)
11:50-12:05 Oral presentation 2 (Takumi Kawasetsu, “Towards rich representation learning in a tactile domain: a flexible tactile sensor providing a vision-like feature based on the dual inductor”)
12:05-12:20 Oral presentation 3 (Wataru Hashiguchi, “Proposal and Evaluation of An Adaptive Agent for Stress Control Training using Multimodal Biological Signals”)
Lunch break (12:20 – 13:40)
Afternoon session
13:40-14:15 Invited talk 4 (Prof. Erhan Oztop)
14:15-14:50 Invited talk 5 (Dr. Beata Joanna Grzyb)
14:50-15:30 Coffee break
15:30-15:45 Oral presentation 4 (Niyati Rawal, “How does visual attention to face develop in infancy?: A computational account”)
15:45-16:00 Introduction of CREST Cognitive Mirroring Project (Yukie Nagai)
16:00-17:55 Discussion
17:55-18:00 Farewell and conclusion
19:00-21:00 Closed discussion (invitation only)

Call for papers


We invite interdisciplinary papers investigating architectures for representation learning, both to understand human cognitive processes and to realize advanced human-robot collaboration.

The paper (extended abstract) should be 2-4 pages long (including references) in the ACM SIGCHI format (check the link as well).

Please submit your paper via EasyChair (check the link as well).

The paper should include:

    • key research questions,
    • background and related work,
    • research methodology,
    • results to date,
    • remaining or future work.

Topics of interest include, but are not limited to:

    • Computational models for high-level cognitive capabilities,
    • Predictive learning from sensorimotor information,
    • Multimodal interaction and concept formation,
    • Human-robot communication and collaboration based on machine learning,
    • Learning supported by external trainers through demonstration and imitation,
    • Bayesian modeling,
    • Learning with hierarchical and deep architectures,
    • Interactive reinforcement learning.

We also welcome more targeted results that have implications for the broader human-agent interaction community:

    • human-robot interaction,
    • human-virtual agent interaction,
    • interaction with smart homes and smart cars,
    • distributed groupware where people have remote embodiments and representations,
    • and more!


All accepted papers will be presented as spotlight talks and poster presentations, and selected papers will also be presented in the oral session.

Important Dates

    • 30 September 2017 (extended from 01 September 2017): Deadline for paper submission
    • 5 October 2017 (extended from 15 September 2017): Notification of acceptance
    • 5 October 2017 (extended from 30 September 2017): Final camera-ready papers due
    • 17 October 2017: Workshop



Organizers

    • Takato Horii (Osaka University, Japan)
    • Amir Aly (Ritsumeikan University, Japan)
    • Yukie Nagai (National Institute of Information and Communications Technology, Japan)
    • Takayuki Nagai (The University of Electro-Communications, Japan)

Program committee:


This workshop is jointly organized by the JST CREST projects “Cognitive Mirroring: Assisting people with developmental disorders by means of self-understanding and social sharing of cognitive processes” (Research Director: Yukie Nagai) and “Symbol Emergence in Robotics for Future Human-Machine Collaboration” (Research Director: Takayuki Nagai), which share the research goal of designing and understanding human and robot cognition.