July 29, 2019

Paper Group AWR 179

Language Bootstrapping: Learning Word Meanings From Perception-Action Association


Title	Language Bootstrapping: Learning Word Meanings From Perception-Action Association
Authors	Giampiero Salvi, Luis Montesano, Alexandre Bernardino, José Santos-Victor
Abstract	We address the problem of bootstrapping language acquisition for an artificial system similarly to what is observed in experiments with human infants. Our method works by associating meanings to words in manipulation tasks, as a robot interacts with objects and listens to verbal descriptions of the interactions. The model is based on an affordance network, i.e., a mapping between robot actions, robot perceptions, and the perceived effects of these actions upon objects. We extend the affordance model to incorporate spoken words, which allows us to ground the verbal symbols to the execution of actions and the perception of the environment. The model takes verbal descriptions of a task as the input and uses temporal co-occurrence to create links between speech utterances and the involved objects, actions, and effects. We show that the robot is able to form useful word-to-meaning associations, even without considering grammatical structure in the learning process and in the presence of recognition errors. These word-to-meaning associations are embedded in the robot’s own understanding of its actions. Thus, they can be directly used to instruct the robot to perform tasks and also allow context to be incorporated in the speech recognition task. We believe that the encouraging results with our approach may afford robots with a capacity to acquire language descriptors in their operation’s environment as well as shed some light on how this challenging process develops with human infants.
Tasks	Language Acquisition, Speech Recognition
Published	2017-11-27
URL	http://arxiv.org/abs/1711.09714v1
PDF	http://arxiv.org/pdf/1711.09714v1.pdf
PWC	https://paperswithcode.com/paper/language-bootstrapping-learning-word-meanings
Repo	https://github.com/giampierosalvi/AffordancesAndSpeech
Framework	none
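
The temporal co-occurrence idea in the abstract above can be illustrated with a minimal counting sketch. This is not the authors' model (the paper builds on a probabilistic affordance network); it only estimates how often each meaning is present when a word is heard. The function name `cooccurrence_scores`, the episode format, and the toy utterances are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def cooccurrence_scores(episodes):
    """Estimate P(meaning | word) from co-occurrence counts.

    `episodes` is a list of (words, meanings) pairs: the words heard during
    one interaction and the symbolic meanings (actions, object properties)
    observed in the same interaction. Hypothetical format for illustration.
    """
    word_counts = Counter()
    pair_counts = defaultdict(Counter)
    for words, meanings in episodes:
        for word in set(words):  # count each word once per episode
            word_counts[word] += 1
            for meaning in set(meanings):
                pair_counts[word][meaning] += 1
    return {
        word: {m: c / word_counts[word] for m, c in pair_counts[word].items()}
        for word in word_counts
    }


episodes = [
    (["the", "robot", "grasps", "the", "ball"], {"action=grasp", "shape=sphere"}),
    (["the", "robot", "taps", "the", "ball"], {"action=tap", "shape=sphere"}),
    (["the", "robot", "grasps", "the", "box"], {"action=grasp", "shape=box"}),
]
scores = cooccurrence_scores(episodes)
```

In this toy data, "grasps" is heard only when the grasp action is executed, while a function word such as "the" co-occurs with every meaning and therefore singles out none of them. No grammatical structure is used, which mirrors the claim in the abstract.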

Reconstructing Video from Interferometric Measurements of Time-Varying Sources


Title	Reconstructing Video from Interferometric Measurements of Time-Varying Sources
Authors	Katherine L. Bouman, Michael D. Johnson, Adrian V. Dalca, Andrew A. Chael, Freek Roelofs, Sheperd S. Doeleman, William T. Freeman
Abstract	Very long baseline interferometry (VLBI) makes it possible to recover images of astronomical sources with extremely high angular resolution. Most recently, the Event Horizon Telescope (EHT) has extended VLBI to short millimeter wavelengths with a goal of achieving angular resolution sufficient for imaging the event horizons of nearby supermassive black holes. VLBI provides measurements related to the underlying source image through a sparse set of spatial frequencies. An image can then be recovered from these measurements by making assumptions about the underlying image. One of the most important assumptions made by conventional imaging methods is that over the course of a night’s observation the image is static. However, for quickly evolving sources, such as the galactic center’s supermassive black hole (Sgr A*) targeted by the EHT, this assumption is violated and these conventional imaging approaches fail. In this work we propose a new way to model VLBI measurements that allows us to recover both the appearance and dynamics of an evolving source by reconstructing a video rather than a static image. By modeling VLBI measurements using a Gaussian Markov Model, we are able to propagate information across observations in time to reconstruct a video, while simultaneously learning about the dynamics of the source’s emission region. We demonstrate our proposed Expectation-Maximization (EM) algorithm, StarWarps, on realistic synthetic observations of black holes, and show how it substantially improves results compared to conventional imaging algorithms. Additionally, we demonstrate StarWarps on real VLBI data of the M87 Jet from the VLBA.
Tasks	Image Imputation, Radio Interferometry
Published	2017-11-03
URL	http://arxiv.org/abs/1711.01357v2
PDF	http://arxiv.org/pdf/1711.01357v2.pdf
PWC	https://paperswithcode.com/paper/reconstructing-video-from-interferometric
Repo	https://github.com/achael/eht-imaging
Framework	none

Adversarial Sets for Regularising Neural Link Predictors


Title	Adversarial Sets for Regularising Neural Link Predictors
Authors	Pasquale Minervini, Thomas Demeester, Tim Rocktäschel, Sebastian Riedel
Abstract	In adversarial training, a set of models learn together by pursuing competing goals, usually defined on single data instances. However, in relational learning and other non-i.i.d. domains, goals can also be defined over sets of instances. For example, a link predictor for the is-a relation needs to be consistent with the transitivity property: if is-a(x_1, x_2) and is-a(x_2, x_3) hold, is-a(x_1, x_3) needs to hold as well. Here we use such assumptions for deriving an inconsistency loss, measuring the degree to which the model violates the assumptions on an adversarially-generated set of examples. The training objective is defined as a minimax problem, where an adversary finds the most offending adversarial examples by maximising the inconsistency loss, and the model is trained by jointly minimising a supervised loss and the inconsistency loss on the adversarial examples. This yields the first method that can use function-free Horn clauses (as in Datalog) to regularise any neural link predictor, with complexity independent of the domain size. We show that for several link prediction models, the optimisation problem faced by the adversary has efficient closed-form solutions. Experiments on link prediction benchmarks indicate that given suitable prior knowledge, our method can significantly improve neural link predictors on all relevant metrics.
Tasks	Link Prediction, Relational Reasoning
Published	2017-07-24
URL	http://arxiv.org/abs/1707.07596v1
PDF	http://arxiv.org/pdf/1707.07596v1.pdf
PWC	https://paperswithcode.com/paper/adversarial-sets-for-regularising-neural-link
Repo	https://github.com/uclmr/inferbeddings
Framework	none
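
The inconsistency loss for the transitivity rule described above can be sketched as follows. Several choices here are assumptions for illustration and are not taken from the paper: the DistMult-style `score` function, the use of `min` to score the rule body, the hinge form of the loss, and an adversary that searches a small candidate set by brute force instead of optimising embeddings directly.

```python
import itertools
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def score(rel, subj, obj):
    """Toy DistMult-style link score in (0, 1); an assumed stand-in model."""
    return sigmoid(sum(r * s * o for r, s, o in zip(rel, subj, obj)))


def inconsistency(rel, x1, x2, x3):
    """Violation of: rel(x1, x2) and rel(x2, x3) imply rel(x1, x3)."""
    body = min(score(rel, x1, x2), score(rel, x2, x3))
    head = score(rel, x1, x3)
    return max(0.0, body - head)


def adversary(rel, candidates):
    """Return the triple of candidate embeddings with the largest violation."""
    return max(
        itertools.product(candidates, repeat=3),
        key=lambda triple: inconsistency(rel, *triple),
    )


rel = (1.0, 1.0)
candidates = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
worst = adversary(rel, candidates)
worst_loss = inconsistency(rel, *worst)
```

During training, the model would minimise its supervised loss plus `worst_loss`, while the adversary keeps searching for the triple that maximises it. Because the adversary works on embeddings rather than on entities, the cost of this step does not grow with the number of entities in the knowledge base.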

SAM: Semantic Attribute Modulation for Language Modeling and Style Variation


Title	SAM: Semantic Attribute Modulation for Language Modeling and Style Variation
Authors	Wenbo Hu, Lifeng Hua, Lei Li, Hang Su, Tian Wang, Ning Chen, Bo Zhang
Abstract	This paper presents Semantic Attribute Modulation (SAM) for language modeling and style variation. The semantic attribute modulation includes various document attributes, such as titles, authors, and document categories. We consider two types of attributes (title attributes and category attributes) and a flexible attribute selection scheme that automatically scores them via an attribute attention mechanism. The semantic attributes are embedded into the hidden semantic space as the generation inputs. With the attributes properly harnessed, our proposed SAM can generate interpretable texts with regard to the input attributes. Qualitative analysis, including word semantic analysis and attention values, shows the interpretability of SAM. On several typical text datasets, we empirically demonstrate the superiority of the Semantic Attribute Modulated language model with different combinations of document attributes. Moreover, we present a style variation for the lyric generation using SAM, which shows a strong connection between the style variation and the semantic attributes.
Tasks	Language Modelling
Published	2017-07-01
URL	http://arxiv.org/abs/1707.00117v3
PDF	http://arxiv.org/pdf/1707.00117v3.pdf
PWC	https://paperswithcode.com/paper/sam-semantic-attribute-modulation-for
Repo	https://github.com/yiyang92/sam_caption
Framework	tf

OptNet: Differentiable Optimization as a Layer in Neural Networks


Title	OptNet: Differentiable Optimization as a Layer in Neural Networks
Authors	Brandon Amos, J. Zico Kolter
Abstract	This paper presents OptNet, a network architecture that integrates optimization problems (here, specifically in the form of quadratic programs) as individual layers in larger end-to-end trainable deep networks. These layers encode constraints and complex dependencies between the hidden states that traditional convolutional and fully-connected layers often cannot capture. In this paper, we explore the foundations for such an architecture: we show how techniques from sensitivity analysis, bilevel optimization, and implicit differentiation can be used to exactly differentiate through these layers and with respect to layer parameters; we develop a highly efficient solver for these layers that exploits fast GPU-based batch solves within a primal-dual interior point method, and which provides backpropagation gradients with virtually no additional cost on top of the solve; and we highlight the application of these approaches in several problems. In one notable example, we show that the method is capable of learning to play mini-Sudoku (4x4) given just input and output games, with no a priori information about the rules of the game; this highlights the ability of our architecture to learn hard constraints better than other neural architectures.
Tasks	Bilevel Optimization
Published	2017-03-01
URL	https://arxiv.org/abs/1703.00443v4
PDF	https://arxiv.org/pdf/1703.00443v4.pdf
PWC	https://paperswithcode.com/paper/optnet-differentiable-optimization-as-a-layer
Repo	https://github.com/locuslab/e2e-model-learning
Framework	pytorch
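
The idea of differentiating through an optimization layer can be shown on a much simpler case than the one treated in the paper. The sketch below assumes an equality-constrained quadratic program only (no inequality constraints, no interior point method, no batching), solved through its KKT system with a small hand-written Gaussian elimination. The names `solve`, `kkt_matrix`, `qp_layer`, and `grad_wrt_q` are hypothetical and do not come from the authors' code.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def kkt_matrix(Q, A):
    """Build the symmetric KKT matrix [[Q, A^T], [A, 0]]."""
    n, m = len(Q), len(A)
    top = [list(Q[i]) + [A[j][i] for j in range(m)] for i in range(n)]
    bottom = [list(A[j]) + [0.0] * m for j in range(m)]
    return top + bottom


def qp_layer(Q, q, A, b):
    """Forward pass: argmin_z 0.5 z'Qz + q'z subject to Az = b."""
    solution = solve(kkt_matrix(Q, A), [-v for v in q] + list(b))
    return solution[: len(Q)]


def grad_wrt_q(Q, A, grad_z):
    """Backward pass: map dL/dz to dL/dq by implicit differentiation.

    The KKT conditions give K [z; nu] = [-q; b], so dz/dq is minus the
    top-left block of K^{-1}. K is symmetric, hence one extra solve with
    right-hand side [dL/dz; 0] yields the gradient.
    """
    d = solve(kkt_matrix(Q, A), list(grad_z) + [0.0] * len(A))
    return [-v for v in d[: len(Q)]]


Q = [[2.0, 0.0], [0.0, 2.0]]
q = [-2.0, -5.0]
A = [[1.0, 1.0]]
b = [1.0]
z = qp_layer(Q, q, A, b)            # optimal z for this toy problem
dq = grad_wrt_q(Q, A, [1.0, 0.0])   # gradient of L = z[0] with respect to q
```

The backward pass reuses the same KKT matrix as the forward pass, which is the reason the abstract can describe the gradients as nearly free once the solve has been done. A finite-difference check on `q` is a quick way to confirm the result for a toy problem like this one.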

Predict Responsibly: Improving Fairness and Accuracy by Learning to Defer


Title	Predict Responsibly: Improving Fairness and Accuracy by Learning to Defer
Authors	David Madras, Toniann Pitassi, Richard Zemel
Abstract	In many machine learning applications, there are multiple decision-makers involved, both automated and human. The interaction between these agents often goes unaddressed in algorithmic development. In this work, we explore a simple version of this interaction with a two-stage framework containing an automated model and an external decision-maker. The model can choose to say “Pass”, and pass the decision downstream, as explored in rejection learning. We extend this concept by proposing “learning to defer”, which generalizes rejection learning by considering the effect of other agents in the decision-making process. We propose a learning algorithm which accounts for potential biases held by external decision-makers in a system. Experiments demonstrate that learning to defer can make systems not only more accurate but also less biased. Even when working with inconsistent or biased users, we show that deferring models still greatly improve the accuracy and/or fairness of the entire system.
Tasks	Decision Making
Published	2017-11-17
URL	http://arxiv.org/abs/1711.06664v3
PDF	http://arxiv.org/pdf/1711.06664v3.pdf
PWC	https://paperswithcode.com/paper/predict-responsibly-improving-fairness-and
Repo	https://github.com/dmadras/predict-responsibly
Framework	tf
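
The two-stage pipeline in the abstract above can be sketched with a fixed confidence threshold, which corresponds to the rejection-learning setting that the paper generalizes. This is not the authors' learning algorithm: in learning to defer, the decision to pass is learned and takes the downstream decision-maker's behaviour into account. The function name `system_predictions` and the threshold value are assumptions.

```python
def system_predictions(model_probs, dm_preds, threshold=0.8):
    """Combine a binary model with an external decision-maker (DM).

    The model answers when it is confident enough and otherwise says
    "Pass", in which case the DM's prediction is used instead.
    """
    outputs = []
    for prob, dm_pred in zip(model_probs, dm_preds):
        confidence = max(prob, 1.0 - prob)
        if confidence >= threshold:
            outputs.append((int(prob >= 0.5), "model"))
        else:
            outputs.append((dm_pred, "deferred"))
    return outputs


decisions = system_predictions([0.95, 0.55, 0.10], [0, 0, 1])
```

A fixed threshold ignores how accurate or biased the DM is on a given example, which is the gap the paper's approach is meant to address.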

Generating Natural Adversarial Examples


Title	Generating Natural Adversarial Examples
Authors	Zhengli Zhao, Dheeru Dua, Sameer Singh
Abstract	Due to their complex nature, it is hard to characterize the ways in which machine learning models can misbehave or be exploited when deployed. Recent work on adversarial examples, i.e. inputs with minor perturbations that result in substantially different model predictions, is helpful in evaluating the robustness of these models by exposing the adversarial scenarios where they fail. However, these malicious perturbations are often unnatural, not semantically meaningful, and not applicable to complicated domains such as language. In this paper, we propose a framework to generate natural and legible adversarial examples that lie on the data manifold, by searching in semantic space of dense and continuous data representation, utilizing the recent advances in generative adversarial networks. We present generated adversaries to demonstrate the potential of the proposed approach for black-box classifiers for a wide range of applications such as image classification, textual entailment, and machine translation. We include experiments to show that the generated adversaries are natural, legible to humans, and useful in evaluating and analyzing black-box classifiers.
Tasks	Adversarial Attack, Image Classification, Machine Translation, Natural Language Inference
Published	2017-10-31
URL	http://arxiv.org/abs/1710.11342v2
PDF	http://arxiv.org/pdf/1710.11342v2.pdf
PWC	https://paperswithcode.com/paper/generating-natural-adversarial-examples
Repo	https://github.com/zhengliz/natural-adversary
Framework	tf

Multi-View Dynamic Facial Action Unit Detection


Title	Multi-View Dynamic Facial Action Unit Detection
Authors	Andres Romero, Juan Leon, Pablo Arbelaez
Abstract	We propose a novel convolutional neural network approach to address the fine-grained recognition problem of multi-view dynamic facial action unit detection. We leverage recent gains in large-scale object recognition by formulating the task of predicting the presence or absence of a specific action unit in a still image of a human face as holistic classification. We then explore the design space of our approach by considering both shared and independent representations for separate action units, and also different CNN architectures for combining color and motion information. We then move to the novel setup of the FERA 2017 Challenge, in which we propose a multi-view extension of our approach that operates by first predicting the viewpoint from which the video was taken, and then evaluating an ensemble of action unit detectors that were trained for that specific viewpoint. Our approach is holistic, efficient, and modular, since new action units can be easily included in the overall system. Our approach significantly outperforms the baseline of the FERA 2017 Challenge, with an absolute improvement of 14% on the F1-metric. Additionally, it compares favorably against the winner of the FERA 2017 challenge. Source code is available at https://github.com/BCV-Uniandes/AUNets.
Tasks	Action Unit Detection, Facial Action Unit Detection
Published	2017-04-25
URL	http://arxiv.org/abs/1704.07863v2
PDF	http://arxiv.org/pdf/1704.07863v2.pdf
PWC	https://paperswithcode.com/paper/multi-view-dynamic-facial-action-unit
Repo	https://github.com/BCV-Uniandes/AUNets
Framework	pytorch

Semi-supervised learning of hierarchical representations of molecules using neural message passing


Title	Semi-supervised learning of hierarchical representations of molecules using neural message passing
Authors	Hai Nguyen, Shin-ichi Maeda, Kenta Oono
Abstract	With the rapid increase of compound databases available in medicinal and material science, there is a growing need for learning representations of molecules in a semi-supervised manner. In this paper, we propose an unsupervised hierarchical feature extraction algorithm for molecules (or more generally, graph-structured objects with a fixed number of types of nodes and edges), which is applicable to both unsupervised and semi-supervised tasks. Our method extends the recently proposed Paragraph Vector algorithm and incorporates neural message passing to obtain hierarchical representations of subgraphs. We applied our method to an unsupervised task and demonstrated that it outperforms existing proposed methods in several benchmark datasets. We also experimentally showed that semi-supervised tasks enhanced predictive performance compared with supervised ones with labeled molecules only.
Tasks
Published	2017-11-28
URL	http://arxiv.org/abs/1711.10168v2
PDF	http://arxiv.org/pdf/1711.10168v2.pdf
PWC	https://paperswithcode.com/paper/semi-supervised-learning-of-hierarchical
Repo	https://github.com/pfnet-research/hierarchical-molecular-learning
Framework	none

CBinfer: Change-Based Inference for Convolutional Neural Networks on Video Data


Title	CBinfer: Change-Based Inference for Convolutional Neural Networks on Video Data
Authors	Lukas Cavigelli, Philippe Degen, Luca Benini
Abstract	Extracting per-frame features using convolutional neural networks for real-time processing of video data is currently mainly performed on powerful GPU-accelerated workstations and compute clusters. However, there are many applications such as smart surveillance cameras that require or would benefit from on-site processing. To this end, we propose and evaluate a novel algorithm for change-based evaluation of CNNs for video data recorded with a static camera setting, exploiting the spatio-temporal sparsity of pixel changes. We achieve an average speed-up of 8.6x over a cuDNN baseline on a realistic benchmark with a negligible accuracy loss of less than 0.1% and no retraining of the network. The resulting energy efficiency is 10x higher than that of per-frame evaluation and reaches an equivalent of 328 GOp/s/W on the Tegra X1 platform.
Tasks
Published	2017-04-14
URL	http://arxiv.org/abs/1704.04313v2
PDF	http://arxiv.org/pdf/1704.04313v2.pdf
PWC	https://paperswithcode.com/paper/cbinfer-change-based-inference-for
Repo	https://github.com/lukasc-ch/CBinfer
Framework	pytorch
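
Change-based evaluation, as described in the abstract above, can be illustrated on a single-channel 3x3 convolution. The sketch below is pure Python and only conveys the idea of recomputing outputs whose receptive field contains a changed pixel; it is not the authors' GPU implementation, and the function names, the zero-padding convention, and the default threshold are assumptions.

```python
def conv3x3_at(img, kernel, y, x):
    """3x3 convolution response at (y, x) with zero padding."""
    h, w = len(img), len(img[0])
    total = 0.0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                total += kernel[dy + 1][dx + 1] * img[yy][xx]
    return total


def full_conv(img, kernel):
    """Per-frame baseline: evaluate every output position."""
    return [
        [conv3x3_at(img, kernel, y, x) for x in range(len(img[0]))]
        for y in range(len(img))
    ]


def change_based_update(prev_in, prev_out, frame, kernel, threshold=0.0):
    """Update only outputs affected by pixels that changed more than threshold."""
    h, w = len(frame), len(frame[0])
    new_in = [row[:] for row in prev_in]
    changed = []
    for y in range(h):
        for x in range(w):
            if abs(frame[y][x] - prev_in[y][x]) > threshold:
                new_in[y][x] = frame[y][x]  # accept the change
                changed.append((y, x))
    # Every output within one pixel of a change must be recomputed.
    affected = {
        (y + dy, x + dx)
        for y, x in changed
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if 0 <= y + dy < h and 0 <= x + dx < w
    }
    new_out = [row[:] for row in prev_out]
    for y, x in affected:
        new_out[y][x] = conv3x3_at(new_in, kernel, y, x)
    return new_in, new_out, len(affected)


kernel = [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]
prev = [[float((x + y) % 3) for x in range(6)] for y in range(6)]
prev_out = full_conv(prev, kernel)
frame = [row[:] for row in prev]
frame[2][3] += 5.0  # a single pixel changes between frames
new_in, new_out, n_updated = change_based_update(prev, prev_out, frame, kernel)
```

With a static camera most pixels stay the same between frames, so the set of affected outputs is small compared with the full feature map. In this toy case one changed pixel triggers 9 of the 36 output positions, and the result matches a full re-evaluation because the threshold is zero.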

Group Affect Prediction Using Multimodal Distributions


Title	Group Affect Prediction Using Multimodal Distributions
Authors	Saqib Shamsi, Bhanu Pratap Singh Rawat, Manya Wadhwa
Abstract	We describe our approach towards building an efficient predictive model to detect emotions for a group of people in an image. We propose that training a Convolutional Neural Network (CNN) model on the emotion heatmaps extracted from the image outperforms a CNN model trained entirely on the raw images. The comparison of the models has been done on a recently published dataset of the Emotion Recognition in the Wild (EmotiW) challenge, 2017. The proposed method achieved a validation accuracy of 55.23%, which is 2.44% above the baseline accuracy provided by the EmotiW organizers.
Tasks	Emotion Recognition
Published	2017-09-17
URL	http://arxiv.org/abs/1710.01216v2
PDF	http://arxiv.org/pdf/1710.01216v2.pdf
PWC	https://paperswithcode.com/paper/group-affect-prediction-using-multimodal
Repo	https://github.com/saqibns/cv-aal-2018
Framework	none

CORe50: a New Dataset and Benchmark for Continuous Object Recognition


Title	CORe50: a New Dataset and Benchmark for Continuous Object Recognition
Authors	Vincenzo Lomonaco, Davide Maltoni
Abstract	Continuous/Lifelong learning of high-dimensional data streams is a challenging research problem. In fact, fully retraining models each time new data become available is infeasible, due to computational and storage issues, while naïve incremental strategies have been shown to suffer from catastrophic forgetting. In the context of real-world object recognition applications (e.g., robotic vision), where continuous learning is crucial, very few datasets and benchmarks are available to evaluate and compare emerging techniques. In this work we propose a new dataset and benchmark CORe50, specifically designed for continuous object recognition, and introduce baseline approaches for different continuous learning scenarios.
Tasks	Continuous Object Recognition, Object Recognition
Published	2017-05-09
URL	http://arxiv.org/abs/1705.03550v1
PDF	http://arxiv.org/pdf/1705.03550v1.pdf
PWC	https://paperswithcode.com/paper/core50-a-new-dataset-and-benchmark-for
Repo	https://github.com/vlomonaco/core50
Framework	none

Superpixel-based Semantic Segmentation Trained by Statistical Process Control


Title	Superpixel-based Semantic Segmentation Trained by Statistical Process Control
Authors	Hyojin Park, Jisoo Jeong, Youngjoon Yoo, Nojun Kwak
Abstract	Semantic segmentation, like other fields of computer vision, has seen a remarkable performance advance by the use of deep convolution neural networks. However, considering that neighboring pixels are heavily dependent on each other, both learning and testing of these methods have a lot of redundant operations. To resolve this problem, the proposed network is trained and tested with only 0.37% of total pixels by superpixel-based sampling, which largely reduces the complexity of the upsampling calculation. The hypercolumn feature maps are constructed by a pyramid module in combination with the convolution layers of the base network. Since the proposed method uses a very small number of sampled pixels, the end-to-end learning of the entire network is difficult with a common learning rate for all the layers. In order to resolve this problem, the learning rate after sampling is controlled by statistical process control (SPC) of gradients in each layer. The proposed method performs better than or equal to the conventional methods that use many more samples on the Pascal Context and SUN-RGBD datasets.
Tasks	Semantic Segmentation
Published	2017-06-30
URL	http://arxiv.org/abs/1706.10071v2
PDF	http://arxiv.org/pdf/1706.10071v2.pdf
PWC	https://paperswithcode.com/paper/superpixel-based-semantic-segmentation
Repo	https://github.com/HYOJINPARK/HP-SPS
Framework	caffe2

Mining fine-grained opinions on closed captions of YouTube videos with an attention-RNN


Title	Mining fine-grained opinions on closed captions of YouTube videos with an attention-RNN
Authors	Edison Marrese-Taylor, Jorge A. Balazs, Yutaka Matsuo
Abstract	Video reviews are the natural evolution of written product reviews. In this paper we target this phenomenon and introduce the first dataset created from closed captions of YouTube product review videos as well as a new attention-RNN model for aspect extraction and joint aspect extraction and sentiment classification. Our model provides state-of-the-art performance on aspect extraction without requiring the usage of hand-crafted features on the SemEval ABSA corpus, while it outperforms the baseline on the joint task. In our dataset, the attention-RNN model outperforms the baseline for both tasks, but we observe important performance drops for all models in comparison to SemEval. These results, as well as further experiments on domain adaptation for aspect extraction, suggest that differences between speech and written text, which have been discussed extensively in the literature, also extend to the domain of product reviews, where they are relevant for fine-grained opinion mining.
Tasks	Aspect Extraction, Domain Adaptation, Opinion Mining, Sentiment Analysis
Published	2017-08-08
URL	http://arxiv.org/abs/1708.02420v1
PDF	http://arxiv.org/pdf/1708.02420v1.pdf
PWC	https://paperswithcode.com/paper/mining-fine-grained-opinions-on-closed
Repo	https://github.com/epochx/opinatt
Framework	tf

Recurrent Relational Networks


Title	Recurrent Relational Networks
Authors	Rasmus Berg Palm, Ulrich Paquet, Ole Winther
Abstract	This paper is concerned with learning to solve tasks that require a chain of interdependent steps of relational inference, like answering complex questions about the relationships between objects, or solving puzzles where the smaller elements of a solution mutually constrain each other. We introduce the recurrent relational network, a general purpose module that operates on a graph representation of objects. As a generalization of Santoro et al. [2017]'s relational network, it can augment any neural network model with the capacity to do many-step relational reasoning. We achieve state of the art results on the bAbI textual question-answering dataset with the recurrent relational network, consistently solving 20/20 tasks. As bAbI is not particularly challenging from a relational reasoning point of view, we introduce Pretty-CLEVR, a new diagnostic dataset for relational reasoning. In the Pretty-CLEVR set-up, we can vary the question to control for the number of relational reasoning steps that are required to obtain the answer. Using Pretty-CLEVR, we probe the limitations of multi-layer perceptrons, relational and recurrent relational networks. Finally, we show how recurrent relational networks can learn to solve Sudoku puzzles from supervised training data, a challenging task requiring upwards of 64 steps of relational reasoning. We achieve state-of-the-art results amongst comparable methods by solving 96.6% of the hardest Sudoku puzzles.
Tasks	Question Answering, Relational Reasoning
Published	2017-11-21
URL	http://arxiv.org/abs/1711.08028v4
PDF	http://arxiv.org/pdf/1711.08028v4.pdf
PWC	https://paperswithcode.com/paper/recurrent-relational-networks
Repo	https://github.com/rasmusbergpalm/recurrent-relational-networks
Framework	tf
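
The many-step reasoning described above comes from repeating a message-passing round over a graph of objects. The sketch below shows that loop with hand-written stand-ins for the learned message and update functions; the names `rrn_steps`, `send`, and `update` and the toy chain graph are assumptions for illustration, and nothing here is trained.

```python
def rrn_steps(x, edges, message_fn, update_fn, steps):
    """Run `steps` rounds of message passing on a directed edge list.

    Each round: every edge (i, j) sends message_fn(h_i, h_j) to node j,
    incoming messages are summed, and node j updates its state from its
    previous state, its input x_j, and the summed message.
    """
    h = list(x)
    for _ in range(steps):
        incoming = [0.0] * len(h)
        for i, j in edges:
            incoming[j] += message_fn(h[i], h[j])
        h = [update_fn(h[j], x[j], incoming[j]) for j in range(len(h))]
    return h


# Toy chain 0 - 1 - 2 - 3 with edges in both directions.
chain = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
x = [1.0, 0.0, 0.0, 0.0]


def send(h_i, h_j):
    # Stand-in message function: forward the sender's state.
    return h_i


def update(h, x_j, m):
    # Stand-in update function: a node becomes active once it hears a signal.
    return 1.0 if (h > 0 or m > 0) else 0.0


after_one = rrn_steps(x, chain, send, update, steps=1)
after_three = rrn_steps(x, chain, send, update, steps=3)
```

Information travels one hop per round, so a fact at node 0 needs three rounds to reach node 3. This is the sense in which a task such as Sudoku can require many steps of relational reasoning.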