Paper Group AWR 179
Language Bootstrapping: Learning Word Meanings From Perception-Action Association
Title | Language Bootstrapping: Learning Word Meanings From Perception-Action Association |
Authors | Giampiero Salvi, Luis Montesano, Alexandre Bernardino, José Santos-Victor |
Abstract | We address the problem of bootstrapping language acquisition for an artificial system, similarly to what is observed in experiments with human infants. Our method works by associating meanings with words in manipulation tasks, as a robot interacts with objects and listens to verbal descriptions of the interactions. The model is based on an affordance network, i.e., a mapping between robot actions, robot perceptions, and the perceived effects of these actions upon objects. We extend the affordance model to incorporate spoken words, which allows us to ground the verbal symbols in the execution of actions and the perception of the environment. The model takes verbal descriptions of a task as input and uses temporal co-occurrence to create links between speech utterances and the involved objects, actions, and effects. We show that the robot is able to form useful word-to-meaning associations, even without considering grammatical structure in the learning process and in the presence of recognition errors. These word-to-meaning associations are embedded in the robot's own understanding of its actions. Thus, they can be directly used to instruct the robot to perform tasks, and they also allow the robot to incorporate context into the speech recognition task. We believe that the encouraging results with our approach may afford robots the capacity to acquire language descriptors in their operating environment, as well as shed some light on how this challenging process develops in human infants. |
Tasks | Language Acquisition, Speech Recognition |
Published | 2017-11-27 |
URL | http://arxiv.org/abs/1711.09714v1 |
http://arxiv.org/pdf/1711.09714v1.pdf | |
PWC | https://paperswithcode.com/paper/language-bootstrapping-learning-word-meanings |
Repo | https://github.com/giampierosalvi/AffordancesAndSpeech |
Framework | none |
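The core mechanism described in the abstract — linking words to action, object, and effect symbols through temporal co-occurrence — can be sketched in a few lines. The episodes, words, and symbol names below are invented placeholders, not the authors' affordance-network code:

```python
# Toy sketch: word-to-meaning association by temporal co-occurrence.
from collections import Counter, defaultdict

episodes = [
    # (recognized words in the utterance, symbols active in the interaction)
    (["robot", "grasps", "the", "ball"], ["action:grasp", "object:ball", "effect:moved"]),
    (["robot", "taps", "the", "box"],    ["action:tap",   "object:box",  "effect:still"]),
    (["robot", "grasps", "the", "box"],  ["action:grasp", "object:box",  "effect:moved"]),
]

word_meaning = defaultdict(Counter)   # word -> symbol co-occurrence counts
word_totals = Counter()

for words, symbols in episodes:
    for w in words:
        word_totals[w] += 1
        for s in symbols:
            word_meaning[w][s] += 1

# Normalise counts into P(symbol | word); high-probability pairs are the
# candidate word-to-meaning associations.
for w in sorted(word_meaning):
    assoc = {s: c / word_totals[w] for s, c in word_meaning[w].items()}
    best = max(assoc, key=assoc.get)
    print(f"{w:>6} -> {best} ({assoc[best]:.2f})")
```

Note how a function word like "the" co-occurs with everything and so yields only diffuse associations, consistent with the paper's claim that useful word-to-meaning links can emerge without modeling grammatical structure.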
Reconstructing Video from Interferometric Measurements of Time-Varying Sources
Title | Reconstructing Video from Interferometric Measurements of Time-Varying Sources |
Authors | Katherine L. Bouman, Michael D. Johnson, Adrian V. Dalca, Andrew A. Chael, Freek Roelofs, Sheperd S. Doeleman, William T. Freeman |
Abstract | Very long baseline interferometry (VLBI) makes it possible to recover images of astronomical sources with extremely high angular resolution. Most recently, the Event Horizon Telescope (EHT) has extended VLBI to short millimeter wavelengths with a goal of achieving angular resolution sufficient for imaging the event horizons of nearby supermassive black holes. VLBI provides measurements related to the underlying source image through a sparse set of spatial frequencies. An image can then be recovered from these measurements by making assumptions about the underlying image. One of the most important assumptions made by conventional imaging methods is that over the course of a night’s observation the image is static. However, for quickly evolving sources, such as the galactic center’s supermassive black hole (Sgr A*) targeted by the EHT, this assumption is violated and these conventional imaging approaches fail. In this work we propose a new way to model VLBI measurements that allows us to recover both the appearance and dynamics of an evolving source by reconstructing a video rather than a static image. By modeling VLBI measurements using a Gaussian Markov Model, we are able to propagate information across observations in time to reconstruct a video, while simultaneously learning about the dynamics of the source’s emission region. We demonstrate our proposed Expectation-Maximization (EM) algorithm, StarWarps, on realistic synthetic observations of black holes, and show how it substantially improves results compared to conventional imaging algorithms. Additionally, we demonstrate StarWarps on real VLBI data of the M87 Jet from the VLBA. |
Tasks | Image Imputation, Radio Interferometry |
Published | 2017-11-03 |
URL | http://arxiv.org/abs/1711.01357v2 |
http://arxiv.org/pdf/1711.01357v2.pdf | |
PWC | https://paperswithcode.com/paper/reconstructing-video-from-interferometric |
Repo | https://github.com/achael/eht-imaging |
Framework | none |
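The abstract's Gaussian Markov model of a time-varying source is a linear-Gaussian state-space model, so the information propagation it describes can be illustrated with a plain Kalman filter. This is a toy sketch with made-up dimensions and a random stand-in for the interferometric measurement operator; the actual StarWarps algorithm is an EM procedure that additionally learns the source dynamics:

```python
# Toy sketch: Gaussian Markov video model + Kalman filtering across time steps.
import numpy as np

rng = np.random.default_rng(0)
n, m, T = 16, 4, 10          # pixels per (tiny) frame, measurements per step, steps

A = 0.95 * np.eye(n)         # frame-to-frame dynamics (assumed)
Q = 0.05 * np.eye(n)         # process noise: how fast the source evolves
R = 0.01 * np.eye(m)         # measurement noise

# Simulate a slowly evolving source observed through sparse linear measurements.
x = rng.normal(size=n)
Fs, ys = [], []
for t in range(T):
    x = A @ x + rng.multivariate_normal(np.zeros(n), Q)
    F = rng.normal(size=(m, n))   # stand-in for the interferometric operator
    Fs.append(F)
    ys.append(F @ x + rng.multivariate_normal(np.zeros(m), R))

# Kalman filtering: each frame's estimate pools all measurements so far.
mu, P = np.zeros(n), np.eye(n)
for F, y in zip(Fs, ys):
    mu, P = A @ mu, A @ P @ A.T + Q                         # predict
    S = F @ P @ F.T + R
    K = P @ F.T @ np.linalg.solve(S, np.eye(m))             # Kalman gain
    mu, P = mu + K @ (y - F @ mu), (np.eye(n) - K @ F) @ P  # update
print("posterior mean of final frame:", np.round(mu[:4], 3), "...")
```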
Adversarial Sets for Regularising Neural Link Predictors
Title | Adversarial Sets for Regularising Neural Link Predictors |
Authors | Pasquale Minervini, Thomas Demeester, Tim Rocktäschel, Sebastian Riedel |
Abstract | In adversarial training, a set of models learn together by pursuing competing goals, usually defined on single data instances. However, in relational learning and other non-i.i.d. domains, goals can also be defined over sets of instances. For example, a link predictor for the is-a relation needs to be consistent with the transitivity property: if is-a(x_1, x_2) and is-a(x_2, x_3) hold, is-a(x_1, x_3) needs to hold as well. Here we use such assumptions to derive an inconsistency loss, which measures the degree to which the model violates the assumptions on an adversarially generated set of examples. The training objective is defined as a minimax problem, where an adversary finds the most offending adversarial examples by maximising the inconsistency loss, and the model is trained by jointly minimising a supervised loss and the inconsistency loss on the adversarial examples. This yields the first method that can use function-free Horn clauses (as in Datalog) to regularise any neural link predictor, with complexity independent of the domain size. We show that for several link prediction models, the optimisation problem faced by the adversary has efficient closed-form solutions. Experiments on link prediction benchmarks indicate that given suitable prior knowledge, our method can significantly improve neural link predictors on all relevant metrics. |
Tasks | Link Prediction, Relational Reasoning |
Published | 2017-07-24 |
URL | http://arxiv.org/abs/1707.07596v1 |
http://arxiv.org/pdf/1707.07596v1.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-sets-for-regularising-neural-link |
Repo | https://github.com/uclmr/inferbeddings |
Framework | none |
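A minimal sketch of the minimax signal described above, using a DistMult-style scorer (an assumption; the paper covers several link predictors): the adversary optimises a small set of free entity embeddings to maximise a transitivity inconsistency loss, which the model would then minimise together with its supervised loss:

```python
# Toy sketch: adversarial-set regularisation of a link predictor for transitivity.
import torch

dim, n_adv = 8, 32
rel = torch.nn.Parameter(torch.randn(dim))            # relation embedding (is-a)
adv = torch.nn.Parameter(torch.randn(n_adv, 3, dim))  # adversarial (x1, x2, x3) sets

def score(a, b):
    # DistMult-style scoring for is-a(a, b) (illustrative choice).
    return (a * rel * b).sum(-1)

def inconsistency(x1, x2, x3):
    # Transitivity: is-a(x1,x2) and is-a(x2,x3) should imply is-a(x1,x3).
    body = torch.min(score(x1, x2), score(x2, x3))
    return torch.relu(body - score(x1, x3)).sum()

adv_opt = torch.optim.Adam([adv], lr=0.1)
for _ in range(50):    # inner maximisation: find the most offending examples
    adv_opt.zero_grad()
    (-inconsistency(adv[:, 0], adv[:, 1], adv[:, 2])).backward()
    adv_opt.step()

# Outer step (not shown): minimise supervised loss + this inconsistency loss
# w.r.t. the model parameters, holding the adversarial set fixed.
print("inconsistency loss:", inconsistency(adv[:, 0], adv[:, 1], adv[:, 2]).item())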
SAM: Semantic Attribute Modulation for Language Modeling and Style Variation
Title | SAM: Semantic Attribute Modulation for Language Modeling and Style Variation |
Authors | Wenbo Hu, Lifeng Hua, Lei Li, Hang Su, Tian Wang, Ning Chen, Bo Zhang |
Abstract | This paper presents Semantic Attribute Modulation (SAM) for language modeling and style variation. The semantic attribute modulation includes various document attributes, such as titles, authors, and document categories. We consider two types of attributes (title attributes and category attributes) and a flexible attribute selection scheme that automatically scores them via an attribute attention mechanism. The semantic attributes are embedded into the hidden semantic space as the generation inputs. With the attributes properly harnessed, our proposed SAM can generate interpretable texts with regard to the input attributes. Qualitative analysis, including word semantic analysis and attention values, shows the interpretability of SAM. On several typical text datasets, we empirically demonstrate the superiority of the Semantic Attribute Modulated language model with different combinations of document attributes. Moreover, we present a style variation for lyric generation using SAM, which shows a strong connection between the style variation and the semantic attributes. |
Tasks | Language Modelling |
Published | 2017-07-01 |
URL | http://arxiv.org/abs/1707.00117v3 |
http://arxiv.org/pdf/1707.00117v3.pdf | |
PWC | https://paperswithcode.com/paper/sam-semantic-attribute-modulation-for |
Repo | https://github.com/yiyang92/sam_caption |
Framework | tf |
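One way the attribute modulation could be wired is sketched below: attributes are embedded, scored against the current hidden state by an attention mechanism, and their weighted sum conditions the language model alongside the word embedding. Sizes and the exact wiring are illustrative assumptions, not the paper's architecture:

```python
# Toy sketch: attribute attention modulating an LSTM language model.
import torch
import torch.nn.functional as F

vocab, dim, n_attrs = 100, 32, 4
attr_emb = torch.nn.Embedding(n_attrs, dim)   # title/category attribute embeddings
word_emb = torch.nn.Embedding(vocab, dim)
rnn = torch.nn.LSTMCell(2 * dim, dim)         # input = word embedding + attribute context
out = torch.nn.Linear(dim, vocab)

attrs = torch.tensor([0, 2])                  # attributes attached to this document
h = c = torch.zeros(1, dim)
tokens = torch.tensor([5, 17, 42])

for t in tokens:
    a = attr_emb(attrs)                                  # (n_doc_attrs, dim)
    attn = F.softmax(a @ h.squeeze(0), dim=0)            # score attributes vs. state
    ctx = (attn.unsqueeze(1) * a).sum(0, keepdim=True)   # attribute context vector
    h, c = rnn(torch.cat([word_emb(t).unsqueeze(0), ctx], dim=-1), (h, c))
    logits = out(h)                                      # next-word distribution
print("next-token logits shape:", logits.shape)
```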
OptNet: Differentiable Optimization as a Layer in Neural Networks
Title | OptNet: Differentiable Optimization as a Layer in Neural Networks |
Authors | Brandon Amos, J. Zico Kolter |
Abstract | This paper presents OptNet, a network architecture that integrates optimization problems (here, specifically in the form of quadratic programs) as individual layers in larger end-to-end trainable deep networks. These layers encode constraints and complex dependencies between the hidden states that traditional convolutional and fully-connected layers often cannot capture. In this paper, we explore the foundations for such an architecture: we show how techniques from sensitivity analysis, bilevel optimization, and implicit differentiation can be used to exactly differentiate through these layers and with respect to layer parameters; we develop a highly efficient solver for these layers that exploits fast GPU-based batch solves within a primal-dual interior point method, and which provides backpropagation gradients with virtually no additional cost on top of the solve; and we highlight the application of these approaches in several problems. In one notable example, we show that the method is capable of learning to play mini-Sudoku (4x4) given just input and output games, with no a priori information about the rules of the game; this highlights the ability of our architecture to learn hard constraints better than other neural architectures. |
Tasks | Bilevel Optimization
Published | 2017-03-01 |
URL | https://arxiv.org/abs/1703.00443v4 |
https://arxiv.org/pdf/1703.00443v4.pdf | |
PWC | https://paperswithcode.com/paper/optnet-differentiable-optimization-as-a-layer |
Repo | https://github.com/locuslab/e2e-model-learning |
Framework | pytorch |
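The key idea — exact differentiation through an optimization layer — is easiest to see for an equality-constrained QP, argmin_x 0.5 x'Qx + p'x subject to Ax = b, whose solution satisfies a linear KKT system. Solving that system with a differentiable linear solver gives exact gradients with respect to all layer parameters; the paper extends this implicit differentiation to general QPs with inequality constraints via a primal-dual interior point method (released as the authors' qpth library). A self-contained sketch:

```python
# Minimal illustration of the OptNet idea for an equality-constrained QP.
import torch

def qp_layer(Q, p, A, b):
    n, m = Q.shape[0], A.shape[0]
    K = torch.cat([torch.cat([Q, A.t()], 1),
                   torch.cat([A, torch.zeros(m, m)], 1)], 0)   # KKT matrix
    rhs = torch.cat([-p, b])
    sol = torch.linalg.solve(K, rhs)   # differentiable solve
    return sol[:n]                     # primal solution x*

n, m = 4, 2
L = torch.randn(n, n)
Q = (L @ L.t() + torch.eye(n)).requires_grad_()   # positive definite
p = torch.randn(n, requires_grad=True)
A = torch.randn(m, n, requires_grad=True)
b = torch.randn(m, requires_grad=True)

x_star = qp_layer(Q, p, A, b)
x_star.sum().backward()                # gradients flow through the argmin
print("d(sum x*)/dp =", p.grad)
```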
Predict Responsibly: Improving Fairness and Accuracy by Learning to Defer
Title | Predict Responsibly: Improving Fairness and Accuracy by Learning to Defer |
Authors | David Madras, Toniann Pitassi, Richard Zemel |
Abstract | In many machine learning applications, there are multiple decision-makers involved, both automated and human. The interaction between these agents often goes unaddressed in algorithmic development. In this work, we explore a simple version of this interaction with a two-stage framework containing an automated model and an external decision-maker. The model can choose to say “Pass”, and pass the decision downstream, as explored in rejection learning. We extend this concept by proposing “learning to defer”, which generalizes rejection learning by considering the effect of other agents in the decision-making process. We propose a learning algorithm which accounts for potential biases held by external decision-makers in a system. Experiments demonstrate that learning to defer can make systems not only more accurate but also less biased. Even when working with inconsistent or biased users, we show that deferring models still greatly improve the accuracy and/or fairness of the entire system. |
Tasks | Decision Making |
Published | 2017-11-17 |
URL | http://arxiv.org/abs/1711.06664v3 |
http://arxiv.org/pdf/1711.06664v3.pdf | |
PWC | https://paperswithcode.com/paper/predict-responsibly-improving-fairness-and |
Repo | https://github.com/dmadras/predict-responsibly |
Framework | tf |
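A toy version of the learning-to-defer objective: the model outputs both a prediction and a deferral probability, and wherever it defers, the loss becomes the external decision-maker's loss, so deferral concentrates where the decision-maker is more reliable. The data, the simulated decision-maker, and the loss mixing below are illustrative assumptions, not the paper's exact formulation:

```python
# Toy sketch: learning to defer to an external decision-maker (DM).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d = 10
net = torch.nn.Linear(d, 2)          # one class logit + one deferral logit
opt = torch.optim.Adam(net.parameters(), lr=0.05)

x = torch.randn(512, d)
y = ((x[:, 0] > 0) ^ (x[:, 1] > 0)).float()   # XOR label: too hard for a linear model
dm = torch.where(x[:, 2] > 0, y, 1 - y)       # DM is right only on half the space

for _ in range(300):
    out = net(x)
    p = torch.sigmoid(out[:, 0])              # model's own prediction
    defer = torch.sigmoid(out[:, 1])          # probability of deferring
    model_loss = F.binary_cross_entropy(p, y, reduction="none")
    dm_loss = F.binary_cross_entropy(dm.clamp(1e-4, 1 - 1e-4), y, reduction="none")
    loss = ((1 - defer) * model_loss + defer * dm_loss).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print("deferral rate where DM is right: %.2f" % defer[x[:, 2] > 0].mean())
print("deferral rate where DM is wrong: %.2f" % defer[x[:, 2] <= 0].mean())
```

On this synthetic task the model cannot represent the label at all, so it learns to defer in exactly the region where the simulated decision-maker happens to be correct.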
Generating Natural Adversarial Examples
Title | Generating Natural Adversarial Examples |
Authors | Zhengli Zhao, Dheeru Dua, Sameer Singh |
Abstract | Due to their complex nature, it is hard to characterize the ways in which machine learning models can misbehave or be exploited when deployed. Recent work on adversarial examples, i.e. inputs with minor perturbations that result in substantially different model predictions, is helpful in evaluating the robustness of these models by exposing the adversarial scenarios where they fail. However, these malicious perturbations are often unnatural, not semantically meaningful, and not applicable to complicated domains such as language. In this paper, we propose a framework to generate natural and legible adversarial examples that lie on the data manifold, by searching in the semantic space of a dense and continuous data representation, utilizing recent advances in generative adversarial networks. We present generated adversaries to demonstrate the potential of the proposed approach for black-box classifiers in a wide range of applications such as image classification, textual entailment, and machine translation. We include experiments to show that the generated adversaries are natural, legible to humans, and useful in evaluating and analyzing black-box classifiers. |
Tasks | Adversarial Attack, Image Classification, Machine Translation, Natural Language Inference |
Published | 2017-10-31 |
URL | http://arxiv.org/abs/1710.11342v2 |
http://arxiv.org/pdf/1710.11342v2.pdf | |
PWC | https://paperswithcode.com/paper/generating-natural-adversarial-examples |
Repo | https://github.com/zhengliz/natural-adversary |
Framework | tf |
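The search procedure the abstract describes can be sketched as follows: given a generator G mapping latent codes to inputs and a black-box classifier f, gradually widen a search radius around the input's latent code until a decoded sample flips the prediction. G, f, and the latent code here are stand-ins; the paper uses a trained GAN together with an inverter network and a more careful search:

```python
# Toy sketch: searching a generator's latent space for a natural adversary.
import torch

torch.manual_seed(0)
G = torch.nn.Linear(8, 16)                # stand-in generator: latent -> input
f = lambda x: (x.sum(-1) > 0).long()      # stand-in black-box classifier

z = torch.randn(8)                        # assume an inverter gave us z for the input
label = f(G(z))

best = None
for r in torch.linspace(0.1, 3.0, 30):    # widen the search radius gradually
    for _ in range(64):                   # random latent samples at this radius
        dz = torch.randn(8)
        dz = r * dz / dz.norm()
        if f(G(z + dz)) != label:         # decoded sample flips the prediction
            best = (r.item(), z + dz)
            break
    if best:
        break

if best:
    print("adversary found at latent radius %.2f" % best[0])
```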
Multi-View Dynamic Facial Action Unit Detection
Title | Multi-View Dynamic Facial Action Unit Detection |
Authors | Andres Romero, Juan Leon, Pablo Arbelaez |
Abstract | We propose a novel convolutional neural network approach to address the fine-grained recognition problem of multi-view dynamic facial action unit detection. We leverage recent gains in large-scale object recognition by formulating the task of predicting the presence or absence of a specific action unit in a still image of a human face as a holistic classification problem. We then explore the design space of our approach by considering both shared and independent representations for separate action units, and also different CNN architectures for combining color and motion information. We then move to the novel setup of the FERA 2017 Challenge, in which we propose a multi-view extension of our approach that operates by first predicting the viewpoint from which the video was taken, and then evaluating an ensemble of action unit detectors that were trained for that specific viewpoint. Our approach is holistic, efficient, and modular, since new action units can be easily included in the overall system. Our approach significantly outperforms the baseline of the FERA 2017 Challenge, with an absolute improvement of 14% on the F1 metric. Additionally, it compares favorably against the winner of the FERA 2017 challenge. Source code is available at https://github.com/BCV-Uniandes/AUNets. |
Tasks | Action Unit Detection, Facial Action Unit Detection |
Published | 2017-04-25 |
URL | http://arxiv.org/abs/1704.07863v2 |
http://arxiv.org/pdf/1704.07863v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-view-dynamic-facial-action-unit |
Repo | https://github.com/BCV-Uniandes/AUNets |
Framework | pytorch |
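The multi-view extension is a two-stage dispatch, sketched below with stand-in networks: a view classifier first picks a viewpoint (nine in FERA 2017), and the frame is then scored by the bank of per-AU holistic detectors trained for that view:

```python
# Toy sketch: view prediction, then routing to per-view AU detectors.
import torch

n_views, n_aus = 9, 12
backbone = lambda: torch.nn.Sequential(
    torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, n_aus))

view_net = torch.nn.Sequential(
    torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, n_views))
# One independent bank of holistic AU detectors per viewpoint.
au_nets = torch.nn.ModuleList([backbone() for _ in range(n_views)])

frame = torch.randn(1, 3, 32, 32)
view = view_net(frame).argmax(1).item()    # stage 1: which viewpoint?
au_logits = au_nets[view](frame)           # stage 2: that view's detectors
present = torch.sigmoid(au_logits) > 0.5   # per-AU presence/absence
print("predicted view:", view, "| active AUs:", present.nonzero()[:, 1].tolist())
```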
Semi-supervised learning of hierarchical representations of molecules using neural message passing
Title | Semi-supervised learning of hierarchical representations of molecules using neural message passing |
Authors | Hai Nguyen, Shin-ichi Maeda, Kenta Oono |
Abstract | With the rapid increase of compound databases available in medicinal and material science, there is a growing need for learning representations of molecules in a semi-supervised manner. In this paper, we propose an unsupervised hierarchical feature-extraction algorithm for molecules (or, more generally, graph-structured objects with a fixed number of types of nodes and edges), which is applicable to both unsupervised and semi-supervised tasks. Our method extends the recently proposed Paragraph Vector algorithm and incorporates neural message passing to obtain hierarchical representations of subgraphs. We applied our method to an unsupervised task and demonstrated that it outperforms existing methods on several benchmark datasets. We also showed experimentally that semi-supervised training enhances predictive performance compared with supervised training on labeled molecules only. |
Tasks | |
Published | 2017-11-28 |
URL | http://arxiv.org/abs/1711.10168v2 |
http://arxiv.org/pdf/1711.10168v2.pdf | |
PWC | https://paperswithcode.com/paper/semi-supervised-learning-of-hierarchical |
Repo | https://github.com/pfnet-research/hierarchical-molecular-learning |
Framework | none |
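One round of the neural message passing the method builds on, run on a toy molecule: each atom aggregates transformed neighbour features and updates its state, and pooling node states after increasing numbers of rounds yields representations of increasingly large subgraphs — the hierarchical idea. The graph, features, and update rule below are illustrative, not the paper's exact architecture:

```python
# Toy sketch: neural message passing over a tiny molecular graph.
import torch

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # 4 atoms, bonds as undirected edges
dim = 8
h = torch.randn(4, dim)                    # initial atom features
msg = torch.nn.Linear(dim, dim)            # message function
upd = torch.nn.GRUCell(dim, dim)           # update function

for _ in range(3):                         # 3 rounds -> 3-hop subgraph context
    m = torch.zeros_like(h)
    for i, j in edges:                     # aggregate messages both ways
        m[i] = m[i] + msg(h[j])
        m[j] = m[j] + msg(h[i])
    h = upd(m, h)

graph_repr = h.mean(0)                     # readout; a hierarchical variant would
print(graph_repr.shape)                    # pool subgraph states at several radii
```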
CBinfer: Change-Based Inference for Convolutional Neural Networks on Video Data
Title | CBinfer: Change-Based Inference for Convolutional Neural Networks on Video Data |
Authors | Lukas Cavigelli, Philippe Degen, Luca Benini |
Abstract | Extracting per-frame features using convolutional neural networks for real-time processing of video data is currently mainly performed on powerful GPU-accelerated workstations and compute clusters. However, there are many applications such as smart surveillance cameras that require or would benefit from on-site processing. To this end, we propose and evaluate a novel algorithm for change-based evaluation of CNNs for video data recorded with a static camera setting, exploiting the spatio-temporal sparsity of pixel changes. We achieve an average speed-up of 8.6x over a cuDNN baseline on a realistic benchmark with a negligible accuracy loss of less than 0.1% and no retraining of the network. The resulting energy efficiency is 10x higher than that of per-frame evaluation and reaches an equivalent of 328 GOp/s/W on the Tegra X1 platform. |
Tasks | |
Published | 2017-04-14 |
URL | http://arxiv.org/abs/1704.04313v2 |
http://arxiv.org/pdf/1704.04313v2.pdf | |
PWC | https://paperswithcode.com/paper/cbinfer-change-based-inference-for |
Repo | https://github.com/lukasc-ch/CBinfer |
Framework | pytorch |
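The change-based evaluation idea can be sketched directly: threshold the per-pixel difference against the previous frame, dilate the mask by the filter's receptive field, and reuse cached outputs everywhere else. For clarity this sketch computes the full convolution and masks it; the actual CBinfer kernels compute only the changed locations, which is where the speed-up comes from. Threshold and shapes are illustrative:

```python
# Toy sketch: change-based convolution evaluation for a static-camera video.
import torch
import torch.nn.functional as F

conv = torch.nn.Conv2d(3, 8, 3, padding=1)
prev = torch.randn(1, 3, 64, 64)          # previous frame
out_prev = conv(prev)                     # cached full evaluation

new = prev.clone()
new[..., 20:28, 20:28] += 0.5             # a small moving region

diff = (new - prev).abs().max(1, keepdim=True).values       # per-pixel change
changed = F.max_pool2d((diff > 0.1).float(), 3, 1, 1) > 0   # dilate by the
                                                            # 3x3 receptive field
out_full = conv(new)                                 # stand-in for a sparse kernel
out = torch.where(changed, out_full, out_prev)       # reuse cache where unchanged

print("recomputed outputs: %.1f%%" % (100 * changed.float().mean()))
```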
Group Affect Prediction Using Multimodal Distributions
Title | Group Affect Prediction Using Multimodal Distributions |
Authors | Saqib Shamsi, Bhanu Pratap Singh Rawat, Manya Wadhwa |
Abstract | We describe our approach towards building an efficient predictive model to detect emotions for a group of people in an image. We propose that training a Convolutional Neural Network (CNN) model on the emotion heatmaps extracted from the image outperforms a CNN model trained entirely on the raw images. The comparison of the models has been done on the recently published dataset of the Emotion Recognition in the Wild (EmotiW) challenge, 2017. The proposed method achieved a validation accuracy of 55.23%, which is 2.44% above the baseline accuracy provided by the EmotiW organizers. |
Tasks | Emotion Recognition |
Published | 2017-09-17 |
URL | http://arxiv.org/abs/1710.01216v2 |
http://arxiv.org/pdf/1710.01216v2.pdf | |
PWC | https://paperswithcode.com/paper/group-affect-prediction-using-multimodal |
Repo | https://github.com/saqibns/cv-aal-2018 |
Framework | none |
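The input representation the abstract argues for can be sketched as follows: detected faces are rendered into an emotion heatmap, which a small CNN then classifies. The face boxes, per-face scores, and network below are placeholders; face detection and per-face emotion estimation are outside this sketch:

```python
# Toy sketch: building an emotion heatmap input for a group-affect CNN.
import torch
import torch.nn.functional as F

H = W = 96
faces = [((10, 12, 30, 30), 0.9), ((50, 40, 28, 28), 0.2)]  # (x, y, w, h), positivity

heat = torch.zeros(1, 1, H, W)
for (x, y, w, h), score in faces:
    heat[0, 0, y:y + h, x:x + w] = score    # paint each face's emotion intensity
heat = F.avg_pool2d(heat, 5, 1, 2)          # crude smoothing into a heatmap

cnn = torch.nn.Sequential(
    torch.nn.Conv2d(1, 8, 3, padding=1), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
    torch.nn.Linear(8, 3),                  # negative / neutral / positive
)
print("group-emotion logits:", cnn(heat))
```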
CORe50: a New Dataset and Benchmark for Continuous Object Recognition
Title | CORe50: a New Dataset and Benchmark for Continuous Object Recognition |
Authors | Vincenzo Lomonaco, Davide Maltoni |
Abstract | Continuous/Lifelong learning of high-dimensional data streams is a challenging research problem. In fact, fully retraining models each time new data become available is infeasible, due to computational and storage issues, while naïve incremental strategies have been shown to suffer from catastrophic forgetting. In the context of real-world object recognition applications (e.g., robotic vision), where continuous learning is crucial, very few datasets and benchmarks are available to evaluate and compare emerging techniques. In this work we propose a new dataset and benchmark, CORe50, specifically designed for continuous object recognition, and introduce baseline approaches for different continuous learning scenarios. |
Tasks | Continuous Object Recognition, Object Recognition |
Published | 2017-05-09 |
URL | http://arxiv.org/abs/1705.03550v1 |
http://arxiv.org/pdf/1705.03550v1.pdf | |
PWC | https://paperswithcode.com/paper/core50-a-new-dataset-and-benchmark-for |
Repo | https://github.com/vlomonaco/core50 |
Framework | none |
Superpixel-based Semantic Segmentation Trained by Statistical Process Control
Title | Superpixel-based Semantic Segmentation Trained by Statistical Process Control |
Authors | Hyojin Park, Jisoo Jeong, Youngjoon Yoo, Nojun Kwak |
Abstract | Semantic segmentation, like other fields of computer vision, has seen a remarkable performance advance through the use of deep convolutional neural networks. However, considering that neighboring pixels are heavily dependent on each other, both learning and testing of these methods involve many redundant operations. To resolve this problem, the proposed network is trained and tested with only 0.37% of the total pixels via superpixel-based sampling, which greatly reduces the complexity of the upsampling calculation. The hypercolumn feature maps are constructed by a pyramid module in combination with the convolution layers of the base network. Since the proposed method uses a very small number of sampled pixels, end-to-end learning of the entire network is difficult with a common learning rate for all the layers. To resolve this problem, the learning rate after sampling is controlled by statistical process control (SPC) of the gradients in each layer. The proposed method performs as well as or better than conventional methods that use many more samples, on the Pascal Context and SUN-RGBD datasets. |
Tasks | Semantic Segmentation |
Published | 2017-06-30 |
URL | http://arxiv.org/abs/1706.10071v2 |
http://arxiv.org/pdf/1706.10071v2.pdf | |
PWC | https://paperswithcode.com/paper/superpixel-based-semantic-segmentation |
Repo | https://github.com/HYOJINPARK/HP-SPS |
Framework | caffe2 |
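The SPC-based learning-rate control named in the abstract follows the classic control-chart pattern, sketched below: track each layer's gradient norms, and react when a new value falls outside the mean ± 3σ control band. The reaction used here (halving that layer's rate) is an assumption; the paper's exact control rule may differ:

```python
# Toy sketch: per-layer learning-rate control via statistical process control.
import statistics
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 1))
layers = (net[0], net[2])
opt = torch.optim.SGD([{"params": m.parameters(), "lr": 0.1} for m in layers], lr=0.1)
history = [[] for _ in layers]

for step in range(100):
    x = torch.randn(32, 4)
    loss = (net(x) - x.sum(1, keepdim=True)).pow(2).mean()
    opt.zero_grad(); loss.backward()
    for i, m in enumerate(layers):
        g = m.weight.grad.norm().item()
        if len(history[i]) >= 20:                       # enough samples for limits
            mu = statistics.mean(history[i])
            sigma = statistics.stdev(history[i])
            if sigma > 0 and abs(g - mu) > 3 * sigma:   # out-of-control signal
                opt.param_groups[i]["lr"] *= 0.5        # damp this layer's rate
        history[i].append(g)
        history[i] = history[i][-50:]                   # sliding window
    opt.step()

print("final per-layer learning rates:", [pg["lr"] for pg in opt.param_groups])
```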
Mining fine-grained opinions on closed captions of YouTube videos with an attention-RNN
Title | Mining fine-grained opinions on closed captions of YouTube videos with an attention-RNN |
Authors | Edison Marrese-Taylor, Jorge A. Balazs, Yutaka Matsuo |
Abstract | Video reviews are the natural evolution of written product reviews. In this paper we target this phenomenon and introduce the first dataset created from closed captions of YouTube product review videos as well as a new attention-RNN model for aspect extraction and joint aspect extraction and sentiment classification. Our model provides state-of-the-art performance on aspect extraction without requiring the usage of hand-crafted features on the SemEval ABSA corpus, while it outperforms the baseline on the joint task. In our dataset, the attention-RNN model outperforms the baseline for both tasks, but we observe important performance drops for all models in comparison to SemEval. These results, as well as further experiments on domain adaptation for aspect extraction, suggest that differences between speech and written text, which have been discussed extensively in the literature, also extend to the domain of product reviews, where they are relevant for fine-grained opinion mining. |
Tasks | Aspect Extraction, Domain Adaptation, Opinion Mining, Sentiment Analysis |
Published | 2017-08-08 |
URL | http://arxiv.org/abs/1708.02420v1 |
http://arxiv.org/pdf/1708.02420v1.pdf | |
PWC | https://paperswithcode.com/paper/mining-fine-grained-opinions-on-closed |
Repo | https://github.com/epochx/opinatt |
Framework | tf |
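A minimal sketch of an attention-RNN tagger of the kind used above for aspect extraction: a BiLSTM encodes the (transcribed) sentence, each position attends over all hidden states for context, and a linear layer emits BIO aspect tags. The sizes and the dot-product attention form are assumptions, not the paper's exact model:

```python
# Toy sketch: BiLSTM + self-attention tagger for aspect extraction (BIO tags).
import torch
import torch.nn.functional as F

vocab, dim, n_tags = 1000, 32, 3           # tags: O, B-ASPECT, I-ASPECT
emb = torch.nn.Embedding(vocab, dim)
rnn = torch.nn.LSTM(dim, dim, bidirectional=True, batch_first=True)
tag = torch.nn.Linear(4 * dim, n_tags)     # hidden state + attention context

tokens = torch.randint(0, vocab, (1, 7))   # one 7-token caption sentence
h, _ = rnn(emb(tokens))                    # (1, 7, 2*dim)
scores = h @ h.transpose(1, 2)             # each position scores all others
ctx = F.softmax(scores, dim=-1) @ h        # attention context per position
logits = tag(torch.cat([h, ctx], dim=-1))  # (1, 7, n_tags)
print("predicted tags:", logits.argmax(-1).tolist())
```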
Recurrent Relational Networks
Title | Recurrent Relational Networks |
Authors | Rasmus Berg Palm, Ulrich Paquet, Ole Winther |
Abstract | This paper is concerned with learning to solve tasks that require a chain of interdependent steps of relational inference, like answering complex questions about the relationships between objects, or solving puzzles where the smaller elements of a solution mutually constrain each other. We introduce the recurrent relational network, a general purpose module that operates on a graph representation of objects. As a generalization of Santoro et al.’s [2017] relational network, it can augment any neural network model with the capacity to do many-step relational reasoning. We achieve state-of-the-art results on the bAbI textual question-answering dataset with the recurrent relational network, consistently solving 20/20 tasks. As bAbI is not particularly challenging from a relational reasoning point of view, we introduce Pretty-CLEVR, a new diagnostic dataset for relational reasoning. In the Pretty-CLEVR set-up, we can vary the question to control for the number of relational reasoning steps that are required to obtain the answer. Using Pretty-CLEVR, we probe the limitations of multi-layer perceptrons, relational and recurrent relational networks. Finally, we show how recurrent relational networks can learn to solve Sudoku puzzles from supervised training data, a challenging task requiring upwards of 64 steps of relational reasoning. We achieve state-of-the-art results amongst comparable methods by solving 96.6% of the hardest Sudoku puzzles. |
Tasks | Question Answering, Relational Reasoning |
Published | 2017-11-21 |
URL | http://arxiv.org/abs/1711.08028v4 |
http://arxiv.org/pdf/1711.08028v4.pdf | |
PWC | https://paperswithcode.com/paper/recurrent-relational-networks |
Repo | https://github.com/rasmusbergpalm/recurrent-relational-networks |
Framework | tf |
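One reasoning round of a recurrent relational network, sketched on an illustrative fully connected graph: every node aggregates messages computed from pairs of node states, updates its state with a shared recurrent unit while keeping its input features, and a prediction is read out at every step, so the same module can be unrolled for as many reasoning steps as the task demands. Graph, sizes, and readout are assumptions:

```python
# Toy sketch: recurrent relational network unrolled for several reasoning steps.
import torch

n, dim, n_out, steps = 5, 16, 4, 8
edges = [(i, j) for i in range(n) for j in range(n) if i != j]  # fully connected
x = torch.randn(n, dim)                   # node input features (kept every step)
msg = torch.nn.Sequential(torch.nn.Linear(2 * dim, dim), torch.nn.ReLU())
cell = torch.nn.LSTMCell(2 * dim, dim)    # input = node features + message sum
readout = torch.nn.Linear(dim, n_out)

h = torch.zeros(n, dim)
c = torch.zeros(n, dim)
for _ in range(steps):                    # many-step relational reasoning
    m = torch.zeros(n, dim)
    for i, j in edges:                    # messages from pairs of node states
        m[j] = m[j] + msg(torch.cat([h[i], h[j]]))
    h, c = cell(torch.cat([x, m], dim=1), (h, c))
    out = readout(h)                      # a prediction (and loss) at every step
print("per-node outputs:", out.shape)
```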