October 19, 2019

Paper Group ANR 242

Classifying Object Manipulation Actions based on Grasp-types and Motion-Constraints


Title	Classifying Object Manipulation Actions based on Grasp-types and Motion-Constraints
Authors	Kartik Gupta, Darius Burschka, Arnav Bhavsar
Abstract	In this work, we address the challenging problem of fine-grained and coarse-grained recognition of object manipulation actions. Due to variations in geometrical and motion constraints, an object can be manipulated in different ways to perform different sets of actions. Most object manipulation actions also involve subtle movements. This makes object manipulation action recognition difficult with motion information alone. We propose to use grasp and motion-constraint information to recognise and understand action intention with different objects. We also provide an extensive experimental evaluation on the recent Yale Human Grasping dataset, which consists of a large set of 455 manipulation actions. The evaluation involves a) different contemporary multi-class classifiers, and binary classifiers with a one-vs-one multi-class voting scheme, b) differential comparison results based on subsets of attributes involving grasp and motion-constraint information, c) fine-grained and coarse-grained object manipulation action recognition based on fine-grained as well as coarse-grained grasp type information, and d) a comparison between instance-level and sequence-level modeling of object manipulation actions. Our results justify the efficacy of grasp attributes for the task of fine-grained and coarse-grained object manipulation action recognition.
Tasks	Temporal Action Localization
Published	2018-06-20
URL	http://arxiv.org/abs/1806.07574v1
PDF	http://arxiv.org/pdf/1806.07574v1.pdf
PWC	https://paperswithcode.com/paper/classifying-object-manipulation-actions-based
Repo
Framework
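
The one-vs-one multi-class voting scheme mentioned in the abstract can be illustrated with a minimal sketch. The class names, the 1-D feature, and the nearest-centroid pairwise classifiers below are hypothetical stand-ins chosen for illustration; they are not taken from the paper or the Yale Human Grasping dataset.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(sample, classes, pairwise_classifiers):
    """Predict a class by majority vote over all pairwise binary classifiers."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        # Each binary classifier returns either a or b for the sample.
        votes[pairwise_classifiers[(a, b)](sample)] += 1
    # Ties are broken in favour of the class listed first in `classes`.
    return max(classes, key=lambda c: votes[c])


# Hypothetical 1-D centroids for three illustrative grasp categories.
CENTROIDS = {"power": 0.0, "intermediate": 1.0, "precision": 2.0}


def make_pairwise(a, b):
    """Build a toy binary classifier that picks the nearer of two centroids."""
    def classify(x):
        return a if abs(x - CENTROIDS[a]) <= abs(x - CENTROIDS[b]) else b
    return classify


classes = list(CENTROIDS)
pairwise = {(a, b): make_pairwise(a, b) for a, b in combinations(classes, 2)}
prediction = one_vs_one_predict(1.9, classes, pairwise)
```

With k classes the scheme trains k(k-1)/2 binary classifiers, so each one only has to separate two classes at a time.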

A Constrained Coupled Matrix-Tensor Factorization for Learning Time-evolving and Emerging Topics


Title	A Constrained Coupled Matrix-Tensor Factorization for Learning Time-evolving and Emerging Topics
Authors	Sanaz Bahargam, Evangelos E. Papalexakis
Abstract	Topic discovery has witnessed significant growth as a field of data mining at large. In particular, time-evolving topic discovery, where the evolution of a topic is taken into account, has been instrumental in understanding the historical context of an emerging topic in a dynamic corpus. Traditionally, time-evolving topic discovery has focused on this notion of time. However, especially in settings where content is contributed by a community or a crowd, an orthogonal notion of time is the one that pertains to the level of expertise of the content creator: the more experienced the creator, the more advanced the topic. In this paper, we propose a novel time-evolving topic discovery method which, in addition to extracting topics, is able to identify the evolution of each topic over time, as well as its level of difficulty, as inferred from the level of expertise of its main contributors. Our method is based on a novel formulation of Constrained Coupled Matrix-Tensor Factorization, which adopts constraints that are well motivated for, and, as we demonstrate, essential to, high-quality topic discovery. We qualitatively evaluate our approach using real data from the Physics and Programming Stack Exchange forums, and we were able to identify topics of varying levels of difficulty which can be linked to external events, such as the announcement of gravitational waves by the LIGO lab in the Physics forum. We provide a quantitative evaluation of our method by conducting a user study where experts were asked to judge the coherence and quality of the extracted topics. Finally, our proposed method has implications for automatic curriculum design using the extracted topics, where the notion of the level of difficulty is necessary for the proper modeling of prerequisites and advanced concepts.
Tasks
Published	2018-06-30
URL	http://arxiv.org/abs/1807.00122v1
PDF	http://arxiv.org/pdf/1807.00122v1.pdf
PWC	https://paperswithcode.com/paper/a-constrained-coupled-matrix-tensor
Repo
Framework

MOBIUS: Model-Oblivious Binarized Neural Networks


Title	MOBIUS: Model-Oblivious Binarized Neural Networks
Authors	Hiromasa Kitai, Jason Paul Cruz, Naoto Yanai, Naohisa Nishida, Tatsumi Oba, Yuji Unagami, Tadanori Teruya, Nuttapong Attrapadung, Takahiro Matsuda, Goichiro Hanaoka
Abstract	Privacy-preserving frameworks in which a computational resource provider receives encrypted data from a client and returns prediction results without decrypting the data, i.e., oblivious neural networks or encrypted prediction, have been studied for machine learning prediction services. In this work, we present MOBIUS (Model-Oblivious BInary neUral networkS), a new system that combines Binarized Neural Networks (BNNs) and secure computation based on secret sharing as tools for scalable and fast privacy-preserving machine learning. BNNs improve computational performance by binarizing values in training to $-1$ and $+1$, while secure computation based on secret sharing supports a variety of fast computations on encrypted data via modulo operations with a short bit length. However, combining these tools is not trivial because their operations have different algebraic structures and the use of BNNs downgrades prediction accuracy in general. MOBIUS uses improved procedures of BNNs and secure computation that have compatible algebraic structures without downgrading prediction accuracy. We created an implementation of MOBIUS in C++ using the ABY library (NDSS 2015). We then conducted experiments using the MNIST dataset, and the results show that MOBIUS can return a prediction within 0.76 seconds, which is six times faster than SecureML (IEEE S&P 2017). MOBIUS allows a client to request encrypted prediction and allows a trainer to obliviously publish an encrypted model to a cloud provided by a computational resource provider, i.e., without revealing the original model itself to the provider.
Tasks
Published	2018-11-29
URL	http://arxiv.org/abs/1811.12028v1
PDF	http://arxiv.org/pdf/1811.12028v1.pdf
PWC	https://paperswithcode.com/paper/mobius-model-oblivious-binarized-neural
Repo
Framework
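
The two building blocks named in the abstract, binarization to -1/+1 and secret sharing via modulo operations with a short bit length, can be sketched as follows. This is a simplified illustration and not the MOBIUS protocol: the 16-bit modulus is an assumed value, and the weights are treated as public here so that each party can work locally, whereas a model-oblivious setting keeps the weights secret as well and needs an interactive multiplication step that is not shown.

```python
import secrets

MODULUS = 2 ** 16  # assumed short bit length for the shares (hypothetical choice)


def binarize(x):
    """Map a real value to -1 or +1, as in a binarized network."""
    return 1 if x >= 0 else -1


def share(value, modulus=MODULUS):
    """Split an integer into two additive shares modulo `modulus`."""
    r = secrets.randbelow(modulus)
    return r, (value - r) % modulus


def reconstruct(share0, share1, modulus=MODULUS):
    """Add the shares and map the result back to a signed integer."""
    v = (share0 + share1) % modulus
    return v - modulus if v >= modulus // 2 else v


def local_dot(share_vector, public_weights, modulus=MODULUS):
    """Dot product of one party's shares with public weights, done locally."""
    return sum(s * w for s, w in zip(share_vector, public_weights)) % modulus


inputs = [3, -5, 2]
weights = [binarize(w) for w in (0.7, -0.2, 0.1)]  # becomes [1, -1, 1]
shares = [share(x) for x in inputs]
party0 = [s0 for s0, _ in shares]
party1 = [s1 for _, s1 in shares]
# Neither party sees the inputs, yet the recombined result is the true dot product.
result = reconstruct(local_dot(party0, weights), local_dot(party1, weights))
```

Because the weights are only -1 or +1, the local computation reduces to additions and subtractions modulo a small power of two, which is one reason the two techniques are attractive to combine.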

Learning to See the Invisible: End-to-End Trainable Amodal Instance Segmentation


Title	Learning to See the Invisible: End-to-End Trainable Amodal Instance Segmentation
Authors	Patrick Follmann, Rebecca König, Philipp Härtinger, Michael Klostermann
Abstract	Semantic amodal segmentation is a recently proposed extension to instance-aware segmentation that includes the prediction of the invisible region of each object instance. We present the first all-in-one end-to-end trainable model for semantic amodal segmentation that predicts the amodal instance masks as well as their visible and invisible part in a single forward pass. In a detailed analysis, we provide experiments to show which architecture choices are beneficial for an all-in-one amodal segmentation model. On the COCO amodal dataset, our model outperforms the current baseline for amodal segmentation by a large margin. To further evaluate our model, we provide two new datasets with ground truth for semantic amodal segmentation, D2S amodal and COCOA cls. For both datasets, our model provides a strong baseline performance. Using special data augmentation techniques, we show that amodal segmentation on D2S amodal is possible with reasonable performance, even without providing amodal training data.
Tasks	Data Augmentation, Instance Segmentation, Semantic Segmentation
Published	2018-04-24
URL	http://arxiv.org/abs/1804.08864v1
PDF	http://arxiv.org/pdf/1804.08864v1.pdf
PWC	https://paperswithcode.com/paper/learning-to-see-the-invisible-end-to-end
Repo
Framework

Auto-Encoding Scene Graphs for Image Captioning


Title	Auto-Encoding Scene Graphs for Image Captioning
Authors	Xu Yang, Kaihua Tang, Hanwang Zhang, Jianfei Cai
Abstract	We propose Scene Graph Auto-Encoder (SGAE) that incorporates the language inductive bias into the encoder-decoder image captioning framework for more human-like captions. Intuitively, we humans use the inductive bias to compose collocations and contextual inference in discourse. For example, when we see the relation 'person on bike', it is natural to replace 'on' with 'ride' and infer 'person riding bike on a road' even if the 'road' is not evident. Therefore, exploiting such bias as a language prior is expected to make conventional encoder-decoder models less likely to overfit to the dataset bias and more focused on reasoning. Specifically, we use the scene graph, a directed graph ($\mathcal{G}$) where an object node is connected by adjective nodes and relationship nodes, to represent the complex structural layout of both image ($\mathcal{I}$) and sentence ($\mathcal{S}$). In the textual domain, we use SGAE to learn a dictionary ($\mathcal{D}$) that helps to reconstruct sentences in the $\mathcal{S}\rightarrow \mathcal{G} \rightarrow \mathcal{D} \rightarrow \mathcal{S}$ pipeline, where $\mathcal{D}$ encodes the desired language prior; in the vision-language domain, we use the shared $\mathcal{D}$ to guide the encoder-decoder in the $\mathcal{I}\rightarrow \mathcal{G}\rightarrow \mathcal{D} \rightarrow \mathcal{S}$ pipeline. Thanks to the scene graph representation and shared dictionary, the inductive bias is transferred across domains in principle. We validate the effectiveness of SGAE on the challenging MS-COCO image captioning benchmark, e.g., our SGAE-based single-model achieves a new state-of-the-art $127.8$ CIDEr-D on the Karpathy split, and a competitive $125.5$ CIDEr-D (c40) on the official server even compared to other ensemble models.
Tasks	Image Captioning
Published	2018-12-06
URL	http://arxiv.org/abs/1812.02378v3
PDF	http://arxiv.org/pdf/1812.02378v3.pdf
PWC	https://paperswithcode.com/paper/auto-encoding-scene-graphs-for-image
Repo
Framework

Learning random-walk label propagation for weakly-supervised semantic segmentation


Title	Learning random-walk label propagation for weakly-supervised semantic segmentation
Authors	Paul Vernaza, Manmohan Chandraker
Abstract	Large-scale training for semantic segmentation is challenging due to the expense of obtaining training data for this task relative to other vision tasks. We propose a novel training approach to address this difficulty. Given cheaply-obtained sparse image labelings, we propagate the sparse labels to produce guessed dense labelings. A standard CNN-based segmentation network is trained to mimic these labelings. The label-propagation process is defined via random-walk hitting probabilities, which leads to a differentiable parameterization with uncertainty estimates that are incorporated into our loss. We show that by learning the label-propagator jointly with the segmentation predictor, we are able to effectively learn semantic edges given no direct edge supervision. Experiments also show that training a segmentation network in this way outperforms the naive approach.
Tasks	Semantic Segmentation, Weakly-Supervised Semantic Segmentation
Published	2018-02-01
URL	http://arxiv.org/abs/1802.00470v1
PDF	http://arxiv.org/pdf/1802.00470v1.pdf
PWC	https://paperswithcode.com/paper/learning-random-walk-label-propagation-for
Repo
Framework
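
To make the idea of random-walk hitting probabilities concrete, here is a toy sketch on a 1-D chain of pixels with two labeled seeds. Everything in it is an assumption made for illustration: the paper works on images and learns the affinities jointly with the segmentation network, while this example uses hand-picked edge weights and a plain Gauss-Seidel iteration.

```python
def hitting_probabilities(weights, seeds, iterations=2000):
    """Probability of reaching a seed labeled 1 before a seed labeled 0.

    weights[i] is the positive affinity between chain nodes i and i + 1;
    seeds maps a node index to its known label (0 or 1).
    """
    n = len(weights) + 1
    p = [float(seeds.get(i, 0.5)) for i in range(n)]
    for _ in range(iterations):
        for i in range(n):
            if i in seeds:
                continue  # seed values stay fixed
            total = 0.0
            acc = 0.0
            if i > 0:
                total += weights[i - 1]
                acc += weights[i - 1] * p[i - 1]
            if i < n - 1:
                total += weights[i]
                acc += weights[i] * p[i + 1]
            # An unlabeled node takes the affinity-weighted mean of its neighbours.
            p[i] = acc / total
    return p


seeds = {0: 0, 4: 1}
uniform = hitting_probabilities([1.0, 1.0, 1.0, 1.0], seeds)
# A weak affinity between nodes 2 and 3 behaves like a semantic edge.
with_edge = hitting_probabilities([1.0, 1.0, 0.01, 1.0], seeds)
```

With uniform affinities the propagated labels interpolate linearly between the seeds; with one weak affinity they switch sharply across that link, which is the behaviour that lets a learned propagator express edges.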

Generative Adversarial Self-Imitation Learning


Title	Generative Adversarial Self-Imitation Learning
Authors	Yijie Guo, Junhyuk Oh, Satinder Singh, Honglak Lee
Abstract	This paper explores a simple regularizer for reinforcement learning by proposing Generative Adversarial Self-Imitation Learning (GASIL), which encourages the agent to imitate past good trajectories via the generative adversarial imitation learning framework. Instead of directly maximizing rewards, GASIL focuses on reproducing past good trajectories, which can potentially make long-term credit assignment easier when rewards are sparse and delayed. GASIL can be easily combined with any policy gradient objective by using GASIL as a learned shaped reward function. Our experimental results show that GASIL improves the performance of proximal policy optimization on 2D Point Mass and MuJoCo environments with delayed reward and stochastic dynamics.
Tasks	Imitation Learning
Published	2018-12-03
URL	http://arxiv.org/abs/1812.00950v1
PDF	http://arxiv.org/pdf/1812.00950v1.pdf
PWC	https://paperswithcode.com/paper/generative-adversarial-self-imitation
Repo
Framework

Learning behavioral context recognition with multi-stream temporal convolutional networks


Title	Learning behavioral context recognition with multi-stream temporal convolutional networks
Authors	Aaqib Saeed, Tanir Ozcelebi, Stojan Trajanovski, Johan Lukkien
Abstract	Smart devices of everyday use (such as smartphones and wearables) are increasingly integrated with sensors that provide immense amounts of information about a person’s daily life, such as behavior and context. The automatic and unobtrusive sensing of behavioral context can help develop solutions for assisted living, fitness tracking, sleep monitoring, and several other fields. Towards addressing this issue, we raise the question: can a machine learn to recognize a diverse set of contexts and activities in real life through joint learning from raw multi-modal signals (e.g., accelerometer, gyroscope, and audio)? In this paper, we propose a multi-stream temporal convolutional network to address the problem of multi-label behavioral context recognition. A four-stream network architecture handles learning from each modality with a contextualization module which incorporates extracted representations to infer a user’s context. Our empirical evaluation suggests that a deep convolutional network trained end-to-end achieves an optimal recognition rate. Furthermore, the presented architecture can be extended to include similar sensors for performance improvements and handles missing modalities through multi-task learning without any manual feature engineering on a highly imbalanced and sparsely labeled dataset.
Tasks	Feature Engineering, Multi-Task Learning
Published	2018-08-27
URL	http://arxiv.org/abs/1808.08766v1
PDF	http://arxiv.org/pdf/1808.08766v1.pdf
PWC	https://paperswithcode.com/paper/learning-behavioral-context-recognition-with
Repo
Framework

Pattern Localization in Time Series through Signal-To-Model Alignment in Latent Space


Title	Pattern Localization in Time Series through Signal-To-Model Alignment in Latent Space
Authors	Steven Van Vaerenbergh, Ignacio Santamaria, Victor Elvira, Matteo Salvatori
Abstract	In this paper, we study the problem of locating a predefined sequence of patterns in a time series. In particular, the studied scenario assumes a theoretical model is available that contains the expected locations of the patterns. This problem is found in several contexts, and it is commonly solved by first synthesizing a time series from the model, and then aligning it to the true time series through dynamic time warping. We propose a technique that increases the similarity of both time series before aligning them, by mapping them into a latent correlation space. The mapping is learned from the data through a machine-learning setup. Experiments on data from non-destructive testing demonstrate that the proposed approach shows significant improvements over the state of the art.
Tasks	Time Series
Published	2018-02-16
URL	http://arxiv.org/abs/1802.05910v2
PDF	http://arxiv.org/pdf/1802.05910v2.pdf
PWC	https://paperswithcode.com/paper/pattern-localization-in-time-series-through
Repo
Framework
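
The baseline the abstract describes, aligning a model-synthesized series to the measured one with dynamic time warping, can be sketched in a few lines. This shows only classic DTW on raw values; the latent-space mapping the paper proposes is not shown, and the two toy series are made up for illustration.

```python
def dtw(a, b):
    """Classic dynamic time warping; returns (distance, alignment path)."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


synthetic = [0, 0, 1, 0, 0]  # model says the pattern sits at index 2
measured = [0, 0, 0, 1, 0]   # in the measured signal it arrives one step later
distance, path = dtw(synthetic, measured)
located = [j for i, j in path if i == 2]  # measured indices matched to the pattern
```

Reading the alignment path at the model's expected pattern index gives the pattern's location in the measured series.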

Accelerated Bayesian Optimization through Weight-Prior Tuning


Title	Accelerated Bayesian Optimization through Weight-Prior Tuning
Authors	Alistair Shilton, Sunil Gupta, Santu Rana, Pratibha Vellanki, Laurence Park, Cheng Li, Svetha Venkatesh, Alessandra Sutti, David Rubin, Thomas Dorin, Alireza Vahid, Murray Height, Teo Slezak
Abstract	Bayesian optimization (BO) is a widely-used method for optimizing expensive (to evaluate) problems. At the core of most BO methods is the modeling of the objective function using a Gaussian Process (GP) whose covariance is selected from a set of standard covariance functions. From a weight-space view, this models the objective as a linear function in a feature space implied by the given covariance K, with an arbitrary Gaussian weight prior ${\bf w} \sim \mathcal{N} ({\bf 0}, {\bf I})$. In many practical applications there is data available that has a similar (covariance) structure to the objective, but which, having different form, cannot be used directly in standard transfer learning. In this paper we show how such auxiliary data may be used to construct a GP covariance corresponding to a more appropriate weight prior for the objective function. Building on this, we show that we may accelerate BO by modeling the objective function using this (learned) weight prior, which we demonstrate on both test functions and a practical application to short-polymer fibre manufacture.
Tasks	Bayesian Optimisation, Transfer Learning
Published	2018-05-21
URL	https://arxiv.org/abs/1805.07852v2
PDF	https://arxiv.org/pdf/1805.07852v2.pdf
PWC	https://paperswithcode.com/paper/kernel-pre-training-in-feature-space-via-m
Repo
Framework
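
The weight-space view in the abstract can be written out as a short derivation. The feature map $\varphi$ and the prior covariance $\boldsymbol{\Sigma}$ are notation assumed here for illustration; how the paper builds the weight prior from auxiliary data is not shown.

```latex
% Weight-space view: the GP is a linear model in the feature space implied by K.
\[
f(\mathbf{x}) = \mathbf{w}^\top \varphi(\mathbf{x}), \quad
\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})
\;\Longrightarrow\;
\operatorname{Cov}\left[f(\mathbf{x}), f(\mathbf{x}')\right]
= \varphi(\mathbf{x})^\top \varphi(\mathbf{x}')
= K(\mathbf{x}, \mathbf{x}')
\]

% Assumed notation: replacing the identity with a general prior covariance.
\[
\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})
\;\Longrightarrow\;
K_{\Sigma}(\mathbf{x}, \mathbf{x}')
= \varphi(\mathbf{x})^\top \boldsymbol{\Sigma}\, \varphi(\mathbf{x}')
\]
```

Both lines follow from $\operatorname{Cov}[f(\mathbf{x}), f(\mathbf{x}')] = \varphi(\mathbf{x})^\top \mathbb{E}[\mathbf{w}\mathbf{w}^\top] \varphi(\mathbf{x}')$ for a zero-mean prior, so changing the weight prior is equivalent to changing the covariance function.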

Multimodal Deep Neural Networks using Both Engineered and Learned Representations for Biodegradability Prediction


Title	Multimodal Deep Neural Networks using Both Engineered and Learned Representations for Biodegradability Prediction
Authors	Garrett B. Goh, Khushmeen Sakloth, Charles Siegel, Abhinav Vishnu, Jim Pfaendtner
Abstract	Deep learning algorithms excel at extracting patterns from raw data, and with large datasets, they have been very successful in computer vision and natural language applications. However, in other domains, large datasets from which to learn representations may not exist. In this work, we develop a novel multimodal CNN-MLP neural network architecture that utilizes both domain-specific feature engineering and learned representations from raw data. We illustrate the effectiveness of such network designs in the chemical sciences, for predicting biodegradability. DeepBioD, a multimodal CNN-MLP network, is more accurate than either standalone network design, and achieves a classification error rate of 0.125 that is 27% lower than the current state-of-the-art. Thus, our work indicates that combining traditional feature engineering with representation learning can be effective, particularly in situations where labeled data is limited.
Tasks	Feature Engineering, Representation Learning
Published	2018-08-13
URL	http://arxiv.org/abs/1808.04456v2
PDF	http://arxiv.org/pdf/1808.04456v2.pdf
PWC	https://paperswithcode.com/paper/multimodal-deep-neural-networks-using-both
Repo
Framework

Heuristic Feature Selection for Clickbait Detection


Title	Heuristic Feature Selection for Clickbait Detection
Authors	Matti Wiegmann, Michael Völske, Benno Stein, Matthias Hagen, Martin Potthast
Abstract	We study feature selection as a means to optimize the baseline clickbait detector employed at the Clickbait Challenge 2017. The challenge’s task is to score the “clickbaitiness” of a given Twitter tweet on a scale from 0 (no clickbait) to 1 (strong clickbait). Unlike most other approaches submitted to the challenge, the baseline approach is based on manual feature engineering and does not compete out of the box with many of the deep learning-based approaches. We show that scaling up feature selection efforts to heuristically identify better-performing feature subsets catapults the performance of the baseline classifier to second rank overall, beating 12 other competing approaches and improving over the baseline performance by 20%. This demonstrates that traditional classification approaches can still keep up with deep learning on this task.
Tasks	Clickbait Detection, Feature Engineering, Feature Selection
Published	2018-02-04
URL	http://arxiv.org/abs/1802.01191v1
PDF	http://arxiv.org/pdf/1802.01191v1.pdf
PWC	https://paperswithcode.com/paper/heuristic-feature-selection-for-clickbait
Repo
Framework
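
One common heuristic for searching feature subsets is greedy forward selection, sketched below. It is shown only as an example of the general idea; the abstract does not say which heuristics the authors used, and the feature names and gain values are hypothetical.

```python
def greedy_forward_selection(features, score, max_features=None):
    """Add one feature at a time, keeping the one that improves the score most."""
    selected = []
    best = score(selected)
    remaining = list(features)
    while remaining and (max_features is None or len(selected) < max_features):
        scored = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored, key=lambda t: t[0])
        if top_score <= best:
            break  # no remaining feature improves the score
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


# Hypothetical per-feature gains standing in for a cross-validated score.
GAIN = {"n_chars": 0.3, "has_question_mark": 0.2, "n_hashtags": -0.1}


def toy_score(subset):
    """Toy additive score; a real score would retrain and evaluate a model."""
    return sum(GAIN[f] for f in subset)


selected, best = greedy_forward_selection(list(GAIN), toy_score)
```

In practice `score` would train and validate the classifier on each candidate subset, which is what makes scaling up the search expensive.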

Predicting Future Lane Changes of Other Highway Vehicles using RNN-based Deep Models


Title	Predicting Future Lane Changes of Other Highway Vehicles using RNN-based Deep Models
Authors	Sajan Patel, Brent Griffin, Kristofer Kusano, Jason J. Corso
Abstract	In the event of sensor failure, autonomous vehicles need to safely execute emergency maneuvers while avoiding other vehicles on the road. To accomplish this, the sensor-failed vehicle must predict the future semantic behaviors of other drivers, such as lane changes, as well as their future trajectories given a recent window of past sensor observations. We address the first issue of semantic behavior prediction in this paper, which is a precursor to trajectory prediction, by introducing a framework that leverages the power of recurrent neural networks (RNNs) and graphical models. Our goal is to predict the future categorical driving intent, for lane changes, of neighboring vehicles up to three seconds into the future given as little as a one-second window of past LIDAR, GPS, inertial, and map data. We collect real-world data containing over 20 hours of highway driving using an autonomous Toyota vehicle. We propose a composite RNN model by adopting the methodology of Structural Recurrent Neural Networks (RNNs) to learn factor functions and take advantage of both the high-level structure of graphical models and the sequence modeling power of RNNs, which we expect to afford more transparent modeling and activity than opaque, single RNN models. To demonstrate our approach, we validate our model using authentic interstate highway driving to predict the future lane change maneuvers of other vehicles neighboring our autonomous vehicle. We find that our composite Structural RNN outperforms baselines by as much as 12% in balanced accuracy metrics.
Tasks	Accuracy Metrics, Autonomous Vehicles, Trajectory Prediction
Published	2018-01-12
URL	https://arxiv.org/abs/1801.04340v4
PDF	https://arxiv.org/pdf/1801.04340v4.pdf
PWC	https://paperswithcode.com/paper/predicting-future-lane-changes-of-other
Repo
Framework
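
Balanced accuracy, the metric family cited in the abstract, is the mean of per-class recall, which matters when one class (such as staying in lane) dominates. The sketch below uses that standard definition; the label names and counts are made up, and the paper's exact metric variants may differ.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced labels: most vehicles keep their lane.
y_true = ["keep"] * 8 + ["left", "right"]
y_pred = ["keep"] * 10  # a trivial predictor that never predicts a lane change

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
```

The trivial predictor scores 0.8 on plain accuracy but only one third on balanced accuracy, which is why the balanced figure is the more informative one for rare events like lane changes.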

Exploration of Numerical Precision in Deep Neural Networks


Title	Exploration of Numerical Precision in Deep Neural Networks
Authors	Zhaoqi Li, Yu Ma, Catalina Vajiac, Yunkai Zhang
Abstract	Reduced numerical precision is a common technique to reduce computational cost in many Deep Neural Networks (DNNs). While it has been observed that DNNs are resilient to small errors and noise, no general result exists that is capable of predicting a given DNN system architecture’s sensitivity to reduced precision. In this project, we emulate arbitrary bit-width using a specified floating-point representation with a truncation method, which is applied to the neural network after each batch. We explore the impact of several model parameters on the network’s training accuracy and show results on the MNIST dataset. We then present a preliminary theoretical investigation of the error scaling in both forward and backward propagations. We end with a discussion of the implications of these results as well as the potential for generalization to other network architectures.
Tasks
Published	2018-05-03
URL	http://arxiv.org/abs/1805.01078v1
PDF	http://arxiv.org/pdf/1805.01078v1.pdf
PWC	https://paperswithcode.com/paper/exploration-of-numerical-precision-in-deep
Repo
Framework
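
Emulating a reduced bit-width by truncation can be sketched by zeroing the low-order mantissa bits of a 32-bit float. This is an assumed scheme for illustration; the abstract does not specify the exact floating-point representation or truncation rule the authors used.

```python
import struct


def truncate_mantissa(x, mantissa_bits):
    """Keep the top `mantissa_bits` of the 23-bit float32 mantissa, zero the rest."""
    if not 0 <= mantissa_bits <= 23:
        raise ValueError("mantissa_bits must be between 0 and 23")
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    drop = 23 - mantissa_bits
    mask = (0xFFFFFFFF >> drop) << drop  # clears the lowest `drop` bits
    (y,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return y


def truncate_weights(weights, mantissa_bits):
    """Apply the truncation to every weight, e.g. after each training batch."""
    return [truncate_mantissa(w, mantissa_bits) for w in weights]


low_precision = truncate_weights([3.14159, -3.14159, 1.75], 4)
```

Truncation leaves the sign and exponent untouched and rounds the magnitude toward zero, so the relative error per value is bounded by 2 to the power of minus `mantissa_bits`.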

Generalization in quasi-periodic environments


Title	Generalization in quasi-periodic environments
Authors	Giovanni Bellettini, Alessandro Betti, Marco Gori
Abstract	By and large the behavior of stochastic gradient is regarded as a challenging problem, and it is often presented in the framework of statistical machine learning. This paper offers a novel view on the analysis of on-line models of learning that arises when dealing with a generalized version of stochastic gradient that is based on dissipative dynamics. In order to face the complex evolution of these models, a systematic treatment is proposed which is based on energy balance equations that are derived by means of the Caldirola-Kanai (CK) Hamiltonian. According to these equations, learning can be regarded as an ordering process which corresponds with the decrement of the loss function. Finally, the main result established in this paper is that in the case of quasi-periodic environments, where the pattern novelty is progressively limited as time goes by, the system dynamics yields an asymptotically consistent solution in the weight space, that is, the solution maps similar patterns to the same decision.
Tasks
Published	2018-07-14
URL	http://arxiv.org/abs/1807.05343v1
PDF	http://arxiv.org/pdf/1807.05343v1.pdf
PWC	https://paperswithcode.com/paper/generalization-in-quasi-periodic-environments
Repo
Framework