July 29, 2019

3200 words · 16 min read

Paper Group ANR 53

The detector principle of constructing artificial neural networks as an alternative to the connectionist paradigm. Eyemotion: Classifying facial expressions in VR using eye-tracking cameras. Automatized Generation of Alphabets of Symbols. Efficient Eye Typing with 9-direction Gaze Estimation. Grounding Symbols in Multi-Modal Instructions. Learning …

The detector principle of constructing artificial neural networks as an alternative to the connectionist paradigm

Title The detector principle of constructing artificial neural networks as an alternative to the connectionist paradigm
Authors Yuri Parzhin
Abstract Artificial neural networks (ANNs) are inadequate models of biological neural networks. This inadequacy manifests itself in the use of an obsolete model of the neuron and of the connectionist paradigm of ANN construction, and it accounts for many shortcomings of ANNs and for the problems of their practical implementation. An alternative principle of ANN construction, called the detector principle, is proposed in the article. The basis of the detector principle is consideration of the binding property of a neuron's input signals. The principle employs a new model of the neuron-detector, a new approach to training ANNs (counter training), and a new approach to forming the ANN architecture.
Tasks
Published 2017-07-12
URL http://arxiv.org/abs/1707.03623v1
PDF http://arxiv.org/pdf/1707.03623v1.pdf
PWC https://paperswithcode.com/paper/the-detector-principle-of-constructing
Repo
Framework
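
One loose reading of the paper's "binding property" is a neuron that responds only to a specific conjunction of input signals rather than to a weighted sum. The sketch below illustrates that reading; the class name, the all-or-nothing firing rule, and the binding-set encoding are our assumptions, not the paper's exact model.

```python
class DetectorNeuron:
    """Hypothetical neuron-detector: fires only when its full binding
    set of input lines is active, unlike a weighted-sum perceptron.
    An illustrative reading of the detector principle, not its exact model."""

    def __init__(self, binding_set):
        self.binding_set = frozenset(binding_set)  # indices that must co-occur

    def fire(self, active_inputs):
        # All-or-nothing response to the bound combination of signals.
        return self.binding_set <= set(active_inputs)

# A detector bound to inputs {0, 2, 5} ignores any partial match.
d = DetectorNeuron([0, 2, 5])
print(d.fire([0, 2, 5, 7]))  # True
print(d.fire([0, 2]))        # False
```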

Eyemotion: Classifying facial expressions in VR using eye-tracking cameras

Title Eyemotion: Classifying facial expressions in VR using eye-tracking cameras
Authors Steven Hickson, Nick Dufour, Avneesh Sud, Vivek Kwatra, Irfan Essa
Abstract One of the main challenges of social interaction in virtual reality settings is that head-mounted displays occlude a large portion of the face, blocking facial expressions and thereby restricting social engagement cues among users. Hence, auxiliary means of sensing and conveying these expressions are needed. We present an algorithm to automatically infer expressions by analyzing only a partially occluded face while the user is engaged in a virtual reality experience. Specifically, we show that images of the user's eyes captured from an IR gaze-tracking camera within a VR headset are sufficient to infer a select subset of facial expressions without the use of any fixed external camera. Using these inferences, we can generate dynamic avatars in real time which function as an expressive surrogate for the user. We propose a novel data collection pipeline as well as a novel approach for increasing CNN accuracy via personalization. Our results show a mean accuracy of 74% ($F1$ of 0.73) among 5 'emotive' expressions and a mean accuracy of 70% ($F1$ of 0.68) among 10 distinct facial action units, outperforming human raters.
Tasks Eye Tracking
Published 2017-07-22
URL http://arxiv.org/abs/1707.07204v2
PDF http://arxiv.org/pdf/1707.07204v2.pdf
PWC https://paperswithcode.com/paper/eyemotion-classifying-facial-expressions-in
Repo
Framework
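
To make the Eyemotion setup concrete, here is a minimal PyTorch sketch of a CNN classifying IR eye crops into five emotive expressions, plus a last-layer fine-tuning step in the spirit of the paper's personalization idea. The architecture, the 64x64 input size, and the freeze-all-but-classifier scheme are illustrative assumptions, not the authors' published network.

```python
import torch
import torch.nn as nn

class EyeExpressionNet(nn.Module):
    def __init__(self, num_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):          # x: (batch, 1, 64, 64) IR eye images
        h = self.features(x).flatten(1)
        return self.classifier(h)

# Personalization in the paper's spirit: fine-tune only the last layer
# on a handful of labelled images from the target user.
model = EyeExpressionNet()
for p in model.features.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)
```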

Automatized Generation of Alphabets of Symbols

Title Automatized Generation of Alphabets of Symbols
Authors Serhii Hamotskyi, Anis Rojbi, Sergii Stirenko, Yuri Gordienko
Abstract In this paper, we discuss the generation of symbols (and alphabets) based on specific user requirements (medium, priorities, type of information that needs to be conveyed). A framework for the generation of alphabets is proposed, and its use for the generation of a shorthand writing system is explored. We discuss the possible use of machine learning and genetic algorithms to gather inputs for generation of such alphabets and for optimization of already generated ones. The alphabets generated using such methods may be used in very different fields, from the creation of synthetic languages and constructed scripts to the creation of sensible commands for multimodal interaction through Human-Computer Interfaces, such as mouse gestures, touchpads, body gestures, eye-tracking cameras, and brain-computer interfaces, especially in applications for elderly care and people with disabilities.
Tasks Eye Tracking
Published 2017-07-16
URL http://arxiv.org/abs/1707.04935v1
PDF http://arxiv.org/pdf/1707.04935v1.pdf
PWC https://paperswithcode.com/paper/automatized-generation-of-alphabets-of
Repo
Framework
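
A hedged sketch of the genetic-algorithm idea: encode each symbol as a short stroke sequence and reward alphabets whose symbols are mutually distinct. The stroke encoding, the Hamming-style distance, and the mutation rate are toy assumptions standing in for the paper's richer fitness inputs.

```python
import random

STROKES = "UDLR"  # up/down/left/right pen strokes (assumed encoding)

def random_symbol(length=4):
    return "".join(random.choice(STROKES) for _ in range(length))

def distance(a, b):
    return sum(x != y for x, y in zip(a, b))

def fitness(alphabet):
    # Minimum pairwise distance: one confusable pair drags the score down.
    return min(distance(a, b) for i, a in enumerate(alphabet)
               for b in alphabet[i + 1:])

def evolve(size=8, generations=200):
    pop = [[random_symbol() for _ in range(size)] for _ in range(30)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        pop = pop[:10]  # keep the fittest alphabets as parents
        while len(pop) < 30:
            child = [s if random.random() > 0.2 else random_symbol()
                     for s in random.choice(pop[:10])]
            pop.append(child)
    return max(pop, key=fitness)

print(evolve())
```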

Efficient Eye Typing with 9-direction Gaze Estimation

Title Efficient Eye Typing with 9-direction Gaze Estimation
Authors Chi Zhang, Rui Yao, Jinpeng Cai
Abstract Vision-based text entry systems aim to help disabled people achieve text communication using eye movement. Most previous methods have employed an existing eye tracker to predict gaze direction and designed an input method based upon that. However, in these methods the eye-tracking quality is easily affected by various factors, and calibration takes a lengthy amount of time. Our paper presents a novel, efficient gaze-based text input method, which has the advantages of low cost and robustness. Users can type in words by looking at an on-screen keyboard and blinking. Rather than estimating gaze angles directly to track the eyes, we introduce a method that divides the human gaze into nine directions. This method can effectively improve the accuracy of making a selection by gaze and blinks. We build a Convolutional Neural Network (CNN) model for 9-direction gaze estimation. On the basis of the 9-direction gaze, we use a nine-key T9 input method, which was widely used on candy-bar phones. Bar phones were very popular decades ago and cultivated strong user habits and language models. To train a robust gaze estimator, we created a large-scale dataset with images of eyes sourced from 25 people. According to the results of our experiments, our CNN model is able to accurately estimate different people's gaze under various lighting conditions and across different devices. In consideration of disabled people's needs, we removed the complex calibration process. The input method can run in screen mode and in a portable off-screen mode. Moreover, the datasets used in our experiments are made available to the community to allow further experimentation.
Tasks Calibration, Eye Tracking, Gaze Estimation
Published 2017-07-03
URL http://arxiv.org/abs/1707.00548v1
PDF http://arxiv.org/pdf/1707.00548v1.pdf
PWC https://paperswithcode.com/paper/efficient-eye-typing-with-9-direction-gaze
Repo
Framework
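
The selection logic implied by the abstract is simple to sketch: a CNN (not shown here) outputs one of nine gaze directions, the direction indexes a T9-style keypad, and a blink confirms the selection. The direction-to-letter-group mapping below is our assumption for illustration; the paper does not specify it.

```python
T9_KEYS = {
    "up-left": "abc",   "up": "def",     "up-right": "ghi",
    "left": "jkl",      "center": "mno", "right": "pqr",
    "down-left": "stu", "down": "vwx",   "down-right": "yz",
}

def type_word(gaze_events):
    """gaze_events: list of (direction, blinked) pairs from the estimator."""
    selected = [T9_KEYS[d] for d, blinked in gaze_events if blinked]
    return selected  # a T9 language model would disambiguate the letters

# Gazing 'up' then 'center', blinking to confirm each selection:
print(type_word([("up", True), ("left", False), ("center", True)]))
# -> ['def', 'mno']
```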

Grounding Symbols in Multi-Modal Instructions

Title Grounding Symbols in Multi-Modal Instructions
Authors Yordan Hristov, Svetlin Penkov, Alex Lascarides, Subramanian Ramamoorthy
Abstract As robots begin to cohabit with humans in semi-structured environments, the need arises to understand instructions involving rich variability, for instance when learning to ground symbols in the physical world. Realistically, this task must cope with small datasets consisting of a particular user's contextual assignment of meaning to terms. We present a method for processing a raw stream of cross-modal input (i.e., linguistic instructions, visual perception of a scene, and a concurrent trace of 3D eye-tracking fixations) to produce a segmentation of objects with a corresponding association to high-level concepts. To test our framework we present experiments in a table-top object manipulation scenario. Our results show that our model learns the user's notion of colour and shape from a small number of physical demonstrations, generalising to identifying physical referents for novel combinations of the words.
Tasks Eye Tracking
Published 2017-06-01
URL http://arxiv.org/abs/1706.00355v1
PDF http://arxiv.org/pdf/1706.00355v1.pdf
PWC https://paperswithcode.com/paper/grounding-symbols-in-multi-modal-instructions
Repo
Framework
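
A toy version of the grounding idea: count co-occurrences between the words of an instruction and the perceptual features of the object the user fixates while speaking. The data format is invented for illustration; the paper's probabilistic model is considerably richer.

```python
from collections import defaultdict

counts = defaultdict(lambda: defaultdict(int))

def observe(instruction, fixated_object_features):
    # Associate every word with the features of the fixated object.
    for word in instruction.lower().split():
        for feature in fixated_object_features:  # e.g. colour/shape labels
            counts[word][feature] += 1

def best_grounding(word):
    return max(counts[word], key=counts[word].get)

observe("pick the red block", ["red", "cube"])
observe("move the red ball", ["red", "sphere"])
print(best_grounding("red"))  # -> 'red': the word grounds to the percept
```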

Learning Graph-Structured Sum-Product Networks for Probabilistic Semantic Maps

Title Learning Graph-Structured Sum-Product Networks for Probabilistic Semantic Maps
Authors Kaiyu Zheng, Andrzej Pronobis, Rajesh P. N. Rao
Abstract We introduce Graph-Structured Sum-Product Networks (GraphSPNs), a probabilistic approach to structured prediction for problems where dependencies between latent variables are expressed in terms of arbitrary, dynamic graphs. While many approaches to structured prediction place strict constraints on the interactions between inferred variables, many real-world problems can only be characterized using complex graph structures of varying size, often contaminated with noise when obtained from real data. Here, we focus on one such problem in the domain of robotics. We demonstrate how GraphSPNs can be used to bolster inference about semantic, conceptual place descriptions using noisy topological relations discovered by a robot exploring large-scale office spaces. Through experiments, we show that GraphSPNs consistently outperform the traditional approach based on undirected graphical models, successfully disambiguating information in global semantic maps built from uncertain, noisy local evidence. We further exploit the probabilistic nature of the model to infer marginal distributions over semantic descriptions of as-yet-unexplored places and detect spatial environment configurations that are novel and incongruent with the known evidence.
Tasks Structured Prediction
Published 2017-09-24
URL http://arxiv.org/abs/1709.08274v2
PDF http://arxiv.org/pdf/1709.08274v2.pdf
PWC https://paperswithcode.com/paper/learning-graph-structured-sum-product
Repo
Framework
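
To make the SPN building block concrete, here is a minimal sum-product network evaluated bottom-up: leaves carry univariate distributions, product nodes factorize, and sum nodes mix. GraphSPNs assemble such networks over dynamic graph structures, which this two-variable toy does not attempt.

```python
class Leaf:
    def __init__(self, var, dist):
        self.var, self.dist = var, dist      # dist: value -> probability
    def value(self, assignment):
        return self.dist[assignment[self.var]]

class Product:
    def __init__(self, children):
        self.children = children
    def value(self, assignment):
        result = 1.0
        for c in self.children:             # factorized sub-scopes
            result *= c.value(assignment)
        return result

class Sum:
    def __init__(self, weighted_children):  # [(weight, node), ...]
        self.weighted_children = weighted_children
    def value(self, assignment):
        return sum(w * c.value(assignment) for w, c in self.weighted_children)

# P(x, y) as a mixture of two independent components.
x0 = Leaf("x", {0: 0.9, 1: 0.1}); y0 = Leaf("y", {0: 0.8, 1: 0.2})
x1 = Leaf("x", {0: 0.2, 1: 0.8}); y1 = Leaf("y", {0: 0.3, 1: 0.7})
root = Sum([(0.6, Product([x0, y0])), (0.4, Product([x1, y1]))])
print(root.value({"x": 1, "y": 1}))  # exact evaluation in one pass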

Reward-Balancing for Statistical Spoken Dialogue Systems using Multi-objective Reinforcement Learning

Title Reward-Balancing for Statistical Spoken Dialogue Systems using Multi-objective Reinforcement Learning
Authors Stefan Ultes, Paweł Budzianowski, Iñigo Casanueva, Nikola Mrkšić, Lina Rojas-Barahona, Pei-Hao Su, Tsung-Hsien Wen, Milica Gašić, Steve Young
Abstract Reinforcement learning is widely used for dialogue policy optimization where the reward function often consists of more than one component, e.g., the dialogue success and the dialogue length. In this work, we propose a structured method for finding a good balance between these components by searching for the optimal reward component weighting. To render this search feasible, we use multi-objective reinforcement learning to significantly reduce the number of training dialogues required. We apply our proposed method to find optimized component weights for six domains and compare them to a default baseline.
Tasks Spoken Dialogue Systems
Published 2017-07-19
URL http://arxiv.org/abs/1707.06299v1
PDF http://arxiv.org/pdf/1707.06299v1.pdf
PWC https://paperswithcode.com/paper/reward-balancing-for-statistical-spoken
Repo
Framework
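
The reward-balancing problem itself is easy to state in code: the return mixes a success bonus with a per-turn penalty, and the question is how to weight them. The sketch below brute-forces a weight grid against a stand-in dialogue simulator; the paper's contribution is using multi-objective RL precisely to avoid retraining a policy for every weight setting.

```python
import itertools
import random

def reward(success, num_turns, w_success, w_turn):
    # Scalarized dialogue return: success bonus minus length penalty.
    return w_success * float(success) - w_turn * num_turns

def simulate_dialogue(policy_strength):
    # Stand-in for a dialogue with a fixed policy (invented for illustration).
    num_turns = random.randint(3, 15)
    success = random.random() < policy_strength
    return success, num_turns

for w_success, w_turn in itertools.product([10, 20, 30], [0.5, 1.0, 2.0]):
    returns = [reward(*simulate_dialogue(0.7), w_success, w_turn)
               for _ in range(1000)]
    print(w_success, w_turn, sum(returns) / len(returns))
```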

Approximate Profile Maximum Likelihood

Title Approximate Profile Maximum Likelihood
Authors Dmitri S. Pavlichin, Jiantao Jiao, Tsachy Weissman
Abstract We propose an efficient algorithm for approximate computation of the profile maximum likelihood (PML), a variant of maximum likelihood maximizing the probability of observing a sufficient statistic rather than the empirical sample. The PML has appealing theoretical properties, but is difficult to compute exactly. Inspired by observations gleaned from exactly solvable cases, we look for an approximate PML solution, which, intuitively, clumps comparably frequent symbols into one symbol. This amounts to lower-bounding a certain matrix permanent by summing over a subgroup of the symmetric group rather than the whole group during the computation. We extensively experiment with the approximate solution, and find the empirical performance of our approach is competitive and sometimes significantly better than state-of-the-art performance for various estimation problems.
Tasks
Published 2017-12-19
URL http://arxiv.org/abs/1712.07177v1
PDF http://arxiv.org/pdf/1712.07177v1.pdf
PWC https://paperswithcode.com/paper/approximate-profile-maximum-likelihood
Repo
Framework
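
The "profile" here is the multiset of symbol multiplicities with the symbol identities discarded. The sketch below computes it and applies the paper's clumping intuition, grouping comparably frequent symbols; the grouping threshold is an arbitrary choice for illustration, not the paper's criterion.

```python
from collections import Counter

def profile(sample):
    # Multiplicities of the observed symbols, identities forgotten.
    return sorted(Counter(sample).values(), reverse=True)

def clump(multiplicities, ratio=1.5):
    """Group multiplicities whose values are within a factor of `ratio`."""
    groups, current = [], [multiplicities[0]]
    for m in multiplicities[1:]:
        if current[0] <= ratio * m:
            current.append(m)
        else:
            groups.append(current)
            current = [m]
    groups.append(current)
    return groups

sample = "abracadabra"
print(profile(sample))         # [5, 2, 2, 1, 1]  (a, b, r, c, d)
print(clump(profile(sample)))  # [[5], [2, 2], [1, 1]]
```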

Learning Human Pose Models from Synthesized Data for Robust RGB-D Action Recognition

Title Learning Human Pose Models from Synthesized Data for Robust RGB-D Action Recognition
Authors Jian Liu, Naveed Akhtar, Ajmal Mian
Abstract We propose Human Pose Models that represent RGB and depth images of human poses independent of clothing textures, backgrounds, lighting conditions, body shapes and camera viewpoints. Learning such universal models requires training images where all factors are varied for every human pose. Capturing such data is prohibitively expensive. Therefore, we develop a framework for synthesizing the training data. First, we learn representative human poses from a large corpus of real motion captured human skeleton data. Next, we fit synthetic 3D humans with different body shapes to each pose and render each from 180 camera viewpoints while randomly varying the clothing textures, background and lighting. Generative Adversarial Networks are employed to minimize the gap between synthetic and real image distributions. CNN models are then learned that transfer human poses to a shared high-level invariant space. The learned CNN models are then used as invariant feature extractors from real RGB and depth frames of human action videos and the temporal variations are modelled by Fourier Temporal Pyramid. Finally, linear SVM is used for classification. Experiments on three benchmark cross-view human action datasets show that our algorithm outperforms existing methods by significant margins for RGB only and RGB-D action recognition.
Tasks Skeleton Based Action Recognition, Temporal Action Localization
Published 2017-07-04
URL http://arxiv.org/abs/1707.00823v2
PDF http://arxiv.org/pdf/1707.00823v2.pdf
PWC https://paperswithcode.com/paper/learning-human-pose-models-from-synthesized
Repo
Framework
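
The Fourier Temporal Pyramid step in the pipeline is straightforward to sketch: at each pyramid level the frame sequence is split into segments, and a few low-frequency FFT magnitudes per segment summarize the temporal variation. The level and coefficient counts below are assumptions for illustration.

```python
import numpy as np

def fourier_temporal_pyramid(features, levels=3, n_coeffs=4):
    """features: (num_frames, feature_dim) array of per-frame descriptors."""
    parts = []
    for level in range(levels):
        for segment in np.array_split(features, 2 ** level, axis=0):
            spectrum = np.abs(np.fft.rfft(segment, axis=0))
            coeffs = spectrum[:n_coeffs]
            if coeffs.shape[0] < n_coeffs:  # pad very short segments
                pad = np.zeros((n_coeffs - coeffs.shape[0], features.shape[1]))
                coeffs = np.vstack([coeffs, pad])
            parts.append(coeffs.ravel())
    return np.concatenate(parts)  # fixed-length video descriptor for an SVM

video = np.random.randn(60, 128)            # 60 frames of 128-d CNN features
print(fourier_temporal_pyramid(video).shape)
```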

Relational Learning and Feature Extraction by Querying over Heterogeneous Information Networks

Title Relational Learning and Feature Extraction by Querying over Heterogeneous Information Networks
Authors Parisa Kordjamshidi, Sameer Singh, Daniel Khashabi, Christos Christodoulopoulos, Mark Sammons, Saurabh Sinha, Dan Roth
Abstract Many real-world systems need to operate on heterogeneous information networks that consist of numerous interacting components of different types. Examples include systems that perform data analysis on biological information networks; social networks; and information extraction systems processing unstructured data to convert raw text to knowledge graphs. Many previous works describe specialized approaches to perform specific types of analysis, mining and learning on such networks. In this work, we propose a unified framework consisting of a data model (a graph with a first-order schema) along with a declarative language for constructing, querying and manipulating such networks in ways that facilitate relational and structured machine learning. In particular, we provide an initial prototype for a relational and graph-traversal query language where queries are directly used as relational features for structured machine learning models. Feature extraction is performed by making declarative graph traversal queries. Learning and inference models can directly operate on this relational representation and augment it with new data and knowledge that, in turn, is integrated seamlessly into the relational structure to support new predictions. We demonstrate this system's capabilities by showcasing tasks in the natural language processing and computational biology domains.
Tasks Knowledge Graphs, Relational Reasoning
Published 2017-07-25
URL http://arxiv.org/abs/1707.07794v1
PDF http://arxiv.org/pdf/1707.07794v1.pdf
PWC https://paperswithcode.com/paper/relational-learning-and-feature-extraction-by
Repo
Framework
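
A toy rendering of the core idea: run declarative-ish traversal queries over a typed graph and use the results directly as relational features. The schema and the query helper are invented for illustration; the paper's language is far more expressive.

```python
import networkx as nx

G = nx.Graph()
G.add_node("doc1", kind="document")
G.add_node("alice", kind="person")
G.add_node("nlp", kind="topic")
G.add_edges_from([("doc1", "alice"), ("doc1", "nlp")])

def neighbors_of_kind(graph, node, kind):
    """A traversal 'query': the typed one-hop neighbourhood of a node."""
    return [n for n in graph.neighbors(node)
            if graph.nodes[n]["kind"] == kind]

# Feature extraction: each query result becomes a feature value.
features = {
    "num_person_links": len(neighbors_of_kind(G, "doc1", "person")),
    "num_topic_links": len(neighbors_of_kind(G, "doc1", "topic")),
}
print(features)  # {'num_person_links': 1, 'num_topic_links': 1}
```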

Two-Stream 3D Convolutional Neural Network for Skeleton-Based Action Recognition

Title Two-Stream 3D Convolutional Neural Network for Skeleton-Based Action Recognition
Authors Hong Liu, Juanhui Tu, Mengyuan Liu
Abstract It remains a challenge to efficiently extract spatial-temporal information from skeleton sequences for 3D human action recognition. Although most recent action recognition methods are based on Recurrent Neural Networks and present outstanding performance, one of the shortcomings of these methods is the tendency to overemphasize temporal information. Since a 3D convolutional neural network (3D CNN) is a powerful tool to simultaneously learn features from both spatial and temporal dimensions by capturing the correlations between three-dimensional signals, this paper proposes a novel two-stream model using a 3D CNN. To the best of our knowledge, this is the first application of a 3D CNN to skeleton-based action recognition. Our method consists of three stages. First, skeleton joints are mapped into a 3D coordinate space, and the spatial and temporal information are then encoded, respectively. Second, 3D CNN models are separately adopted to extract deep features from the two streams. Third, to enhance the ability of the deep features to capture global relationships, we extend each stream into a multi-temporal version. Extensive experiments on the SmartHome dataset and the large-scale NTU RGB-D dataset demonstrate that our method outperforms most RNN-based methods, verifying the complementary property between spatial and temporal information and the robustness to noise.
Tasks 3D Human Action Recognition, Skeleton Based Action Recognition, Temporal Action Localization
Published 2017-05-23
URL http://arxiv.org/abs/1705.08106v2
PDF http://arxiv.org/pdf/1705.08106v2.pdf
PWC https://paperswithcode.com/paper/two-stream-3d-convolutional-neural-network
Repo
Framework
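
A minimal PyTorch sketch of the two-stream structure: each stream consumes a volumetric encoding of the skeleton (the 16x16x16 shape and late score fusion are our assumptions, not the paper's exact encoding) and the class scores are summed.

```python
import torch
import torch.nn as nn

class Stream(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(32, num_classes),
        )
    def forward(self, x):
        return self.net(x)

class TwoStream3DCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.spatial = Stream(num_classes)   # joint-position volumes
        self.temporal = Stream(num_classes)  # joint-motion volumes
    def forward(self, spatial_vol, temporal_vol):
        # Late fusion: sum the per-stream class scores.
        return self.spatial(spatial_vol) + self.temporal(temporal_vol)

model = TwoStream3DCNN()
vol = torch.randn(2, 1, 16, 16, 16)  # (batch, channel, depth, height, width)
print(model(vol, vol).shape)         # torch.Size([2, 10])
```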

Composition by Conversation

Title Composition by Conversation
Authors Donya Quick, Clayton T. Morrison
Abstract Most musical programming languages are developed purely for coding virtual instruments or algorithmic compositions. Although there has been some work in the domain of musical query languages for music information retrieval, there has been little attempt to unify the principles of musical programming and query languages with cognitive and natural language processing models that would facilitate the activity of composition by conversation. We present a prototype framework, called MusECI, that merges these domains, permitting score-level algorithmic composition in a text editor while also supporting connectivity to existing natural language processing frameworks.
Tasks Information Retrieval, Music Information Retrieval
Published 2017-09-07
URL http://arxiv.org/abs/1709.02076v1
PDF http://arxiv.org/pdf/1709.02076v1.pdf
PWC https://paperswithcode.com/paper/composition-by-conversation
Repo
Framework

Towards Automatic 3D Shape Instantiation for Deployed Stent Grafts: 2D Multiple-class and Class-imbalance Marker Segmentation with Equally-weighted Focal U-Net

Title Towards Automatic 3D Shape Instantiation for Deployed Stent Grafts: 2D Multiple-class and Class-imbalance Marker Segmentation with Equally-weighted Focal U-Net
Authors Xiao-Yun Zhou, Celia Riga, Su-Lin Lee, Guang-Zhong Yang
Abstract Robot-assisted Fenestrated Endovascular Aortic Repair (FEVAR) is currently navigated by 2D fluoroscopy, which is insufficiently informative. Previously, a semi-automatic 3D shape instantiation method was developed to instantiate the 3D shape of a main, deployed, fenestrated stent graft from a single fluoroscopy projection in real time, which could help 3D FEVAR navigation and robotic path planning. This semi-automatic method was based on the Robust Perspective-5-Point (RP5P) method, graft gap interpolation and semi-automatic multiple-class marker center determination. In this paper, automatic 3D shape instantiation is achieved via automatic multiple-class marker segmentation and hence automatic multiple-class marker center determination. First, the markers were designed in five different shapes. Then, an Equally-weighted Focal U-Net was proposed to segment the fluoroscopy projections of the customized markers into five classes and hence to determine the marker centers. The proposed Equally-weighted Focal U-Net uses U-Net as the network architecture, an equally-weighted loss function for initial marker segmentation, and then an equally-weighted focal loss function for improving the initial marker segmentation. The proposed network outperformed the traditional Weighted U-Net on this class-imbalanced segmentation task while removing one hyper-parameter, the weight. An overall mean Intersection over Union (mIoU) of 0.6943 was achieved on 78 testing images, where 81.01% of markers were segmented with a center position error < 1.6 mm. Comparable accuracy of 3D shape instantiation was also achieved. The data, trained models and TensorFlow code are available online.
Tasks
Published 2017-11-04
URL http://arxiv.org/abs/1711.01506v4
PDF http://arxiv.org/pdf/1711.01506v4.pdf
PWC https://paperswithcode.com/paper/towards-automatic-3d-shape-instantiation-for
Repo
Framework
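
Here is a PyTorch sketch of a focal loss with equal class weights, in the spirit of the paper's Equally-weighted Focal U-Net; the gamma value and the two-stage training schedule (plain loss first, focal loss after) are not reproduced here.

```python
import torch
import torch.nn.functional as F

def equally_weighted_focal_loss(logits, target, gamma=2.0):
    """logits: (batch, classes, H, W); target: (batch, H, W) of class ids."""
    log_p = F.log_softmax(logits, dim=1)
    ce = F.nll_loss(log_p, target, reduction="none")   # per-pixel CE
    p_t = torch.exp(-ce)                               # prob. of true class
    return ((1 - p_t) ** gamma * ce).mean()            # down-weight easy pixels

logits = torch.randn(2, 5, 32, 32)          # five marker classes
target = torch.randint(0, 5, (2, 32, 32))
print(equally_weighted_focal_loss(logits, target))
```

The focal term (1 - p_t)^gamma is what counters class imbalance here: pixels the network already classifies confidently contribute almost nothing, so the abundant background class cannot dominate the gradient.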

Building Usage Profiles Using Deep Neural Nets

Title Building Usage Profiles Using Deep Neural Nets
Authors Domenic Curro, Konstantinos G. Derpanis, Andriy V. Miranskyy
Abstract To improve software quality, one needs to build test scenarios resembling the usage of a software product in the field. This task is rendered challenging when a product’s customer base is large and diverse. In this scenario, existing profiling approaches, such as operational profiling, are difficult to apply. In this work, we consider publicly available video tutorials of a product to profile usage. Our goal is to construct an automatic approach to extract information about user actions from instructional videos. To achieve this goal, we use a Deep Convolutional Neural Network (DCNN) to recognize user actions. Our pilot study shows that a DCNN trained to recognize user actions in video can classify five different actions in a collection of 236 publicly available Microsoft Word tutorial videos (published on YouTube). In our empirical evaluation we report a mean average precision of 94.42% across all actions. This study demonstrates the efficacy of DCNN-based methods for extracting software usage information from videos. Moreover, this approach may aid in other software engineering activities that require information about customer usage of a product.
Tasks
Published 2017-02-23
URL http://arxiv.org/abs/1702.07424v1
PDF http://arxiv.org/pdf/1702.07424v1.pdf
PWC https://paperswithcode.com/paper/building-usage-profiles-using-deep-neural
Repo
Framework

Empathy in Bimatrix Games

Title Empathy in Bimatrix Games
Authors Brian Powers, Michalis Smyrnakis, Hamidou Tembine
Abstract Although the definition of what empathetic preferences exactly are is still evolving, there is a general consensus in the psychology, science and engineering communities that the evolution of players' behaviors in interactive decision-making problems will be accompanied by the exploitation of their empathy, sympathy, compassion, antipathy, spitefulness, selfishness, altruism, and self-abnegating states in the payoffs. In this article, we study one-shot bimatrix games from a psychological game theory viewpoint. A new empathetic payoff model is calculated to fit empirical observations, and both pure and mixed equilibria are investigated. For a realized empathy structure, the bimatrix game is categorized into one of four generic classes of games. A number of interesting results are derived. A notable level of involvement can be observed in the empathetic one-shot game compared to the non-empathetic one, and this holds even for games with dominated strategies. Partial altruism can help in breaking symmetry, in reducing payoff inequality and in selecting social-welfare-improving and more efficient outcomes. By contrast, partial spite and self-abnegation may worsen payoff equity. Empathetic evolutionary game dynamics are introduced to capture the resulting empathetic evolutionarily stable strategies under a wide range of revision protocols including Brown-von Neumann-Nash, Smith, imitation, replicator, and hybrid dynamics. Finally, mutual support and the Berge solution are investigated and their connection with empathetic preferences is established. We show that pure altruism is logically inconsistent; only by balancing it with some partial selfishness does it create a consistent psychology.
Tasks Decision Making
Published 2017-08-06
URL http://arxiv.org/abs/1708.01910v1
PDF http://arxiv.org/pdf/1708.01910v1.pdf
PWC https://paperswithcode.com/paper/empathy-in-bimatrix-games
Repo
Framework
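
One simple way to see the effect of empathy on a bimatrix game is a linear payoff blend: each player's effective payoff mixes their own payoff with the opponent's, with lambda > 0 as partial altruism and lambda < 0 as spite. This blend is an illustrative reading, not the paper's exact empathetic payoff model.

```python
import numpy as np

def empathetic_payoffs(A, B, lam1, lam2):
    """A, B: row/column player payoff matrices of equal shape."""
    A_emp = (1 - abs(lam1)) * A + lam1 * B  # row player's effective payoffs
    B_emp = (1 - abs(lam2)) * B + lam2 * A  # column player's effective payoffs
    return A_emp, B_emp

# Prisoner's dilemma payoffs (row player A, column player B = A.T).
A = np.array([[3, 0], [5, 1]])
B = A.T
for lam in (0.0, 0.5):
    A_e, _ = empathetic_payoffs(A, B, lam, lam)
    print(f"lambda={lam}:\n{A_e}")
# At lambda=0.5 the blended matrix makes cooperation dominant for the row
# player, illustrating how partial altruism can select better outcomes.
```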