Paper Group ANR 242
Classifying Object Manipulation Actions based on Grasp-types and Motion-Constraints
Title | Classifying Object Manipulation Actions based on Grasp-types and Motion-Constraints |
Authors | Kartik Gupta, Darius Burschka, Arnav Bhavsar |
Abstract | In this work, we address the challenging problem of fine-grained and coarse-grained recognition of object manipulation actions. Due to variations in geometrical and motion constraints, different manipulation actions can be performed with the same object to accomplish different sets of tasks, and most object manipulation actions involve only subtle movements. This makes object manipulation action recognition difficult using motion information alone. We propose to use grasp and motion-constraint information to recognise actions and understand action intention with different objects. We also provide an extensive experimental evaluation on the recent Yale Human Grasping dataset, which consists of a large set of 455 manipulation actions. The evaluation covers a) different contemporary multi-class classifiers, as well as binary classifiers with a one-vs-one multi-class voting scheme, b) comparative results based on subsets of attributes involving grasp and motion-constraint information, c) fine-grained and coarse-grained object manipulation action recognition based on fine-grained as well as coarse-grained grasp-type information, and d) a comparison between instance-level and sequence-level modeling of object manipulation actions. Our results justify the efficacy of grasp attributes for fine-grained and coarse-grained object manipulation action recognition. |
Tasks | Temporal Action Localization |
Published | 2018-06-20 |
URL | http://arxiv.org/abs/1806.07574v1 |
http://arxiv.org/pdf/1806.07574v1.pdf | |
PWC | https://paperswithcode.com/paper/classifying-object-manipulation-actions-based |
Repo | |
Framework | |
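The evaluation described above pairs contemporary multi-class classifiers with binary classifiers under a one-vs-one voting scheme over grasp and motion-constraint attributes. Below is a minimal sketch of that voting setup using scikit-learn; the attribute matrix, label set, and classifier choice are placeholders, not the Yale Human Grasping dataset's actual schema.

```python
# Minimal sketch: one-vs-one voting over grasp / motion-constraint attributes.
# The feature matrix X (one row per manipulation instance) and labels y are
# hypothetical placeholders, not the dataset's actual attribute schema.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_instances, n_attributes, n_actions = 455, 12, 6
X = rng.random((n_instances, n_attributes))       # grasp-type + motion-constraint attributes
y = rng.integers(0, n_actions, size=n_instances)  # action labels (fine- or coarse-grained)

# One binary SVM per pair of action classes; the final label is a majority vote.
clf = OneVsOneClassifier(LinearSVC(dual=False))
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.3f}")
```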
A Constrained Coupled Matrix-Tensor Factorization for Learning Time-evolving and Emerging Topics
Title | A Constrained Coupled Matrix-Tensor Factorization for Learning Time-evolving and Emerging Topics |
Authors | Sanaz Bahargam, Evangelos E. Papalexakis |
Abstract | Topic discovery has witnessed significant growth as a field of data mining at large. In particular, time-evolving topic discovery, where the evolution of a topic is taken into account, has been instrumental in understanding the historical context of an emerging topic in a dynamic corpus. Traditionally, time-evolving topic discovery has focused on this notion of time. However, especially in settings where content is contributed by a community or a crowd, an orthogonal notion of time is the one that pertains to the level of expertise of the content creator: the more experienced the creator, the more advanced the topic. In this paper, we propose a novel time-evolving topic discovery method which, in addition to the extracted topics, is able to identify the evolution of each topic over time, as well as its level of difficulty, as inferred from the level of expertise of its main contributors. Our method is based on a novel formulation of Constrained Coupled Matrix-Tensor Factorization, which adopts constraints that are well motivated for, and, as we demonstrate, essential to, high-quality topic discovery. We qualitatively evaluate our approach using real data from the Physics and Programming Stack Exchange forums, and we were able to identify topics of varying levels of difficulty which can be linked to external events, such as the announcement of gravitational waves by the LIGO lab in the Physics forum. We provide a quantitative evaluation of our method by conducting a user study in which experts were asked to judge the coherence and quality of the extracted topics. Finally, our proposed method has implications for automatic curriculum design using the extracted topics, where the notion of level of difficulty is necessary for proper modeling of prerequisites and advanced concepts. |
Tasks | |
Published | 2018-06-30 |
URL | http://arxiv.org/abs/1807.00122v1 |
http://arxiv.org/pdf/1807.00122v1.pdf | |
PWC | https://paperswithcode.com/paper/a-constrained-coupled-matrix-tensor |
Repo | |
Framework | |
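The core computation is a coupled matrix-tensor factorization: a three-way tensor and a side matrix that share one mode are factorized jointly. The following is a rough alternating-least-squares sketch under assumed shapes, with a crude non-negativity projection; the paper's constrained formulation and its specific constraints are richer than this.

```python
# Rough sketch of a coupled matrix-tensor factorization with a shared factor,
# solved by alternating least squares; the clipping step is only a crude
# stand-in for the paper's actual constraints.
import numpy as np
from scipy.linalg import khatri_rao

def coupled_cmtf(X, Y, rank, n_iters=50, seed=0):
    """X: (I, J, K) tensor and Y: (I, M) matrix coupled on the first mode."""
    I, J, K = X.shape
    M = Y.shape[1]
    rng = np.random.default_rng(seed)
    A, B, C, D = (rng.random((n, rank)) for n in (I, J, K, M))
    eye = 1e-9 * np.eye(rank)                     # tiny ridge for numerical safety

    X0 = X.reshape(I, J * K)                      # mode-0 unfolding (last index fastest)
    X1 = X.transpose(1, 0, 2).reshape(J, I * K)   # mode-1 unfolding
    X2 = X.transpose(2, 0, 1).reshape(K, I * J)   # mode-2 unfolding

    for _ in range(n_iters):
        # The shared factor A is fit to both the tensor and the side matrix.
        A = np.linalg.solve((B.T @ B) * (C.T @ C) + D.T @ D + eye,
                            (X0 @ khatri_rao(B, C) + Y @ D).T).T
        B = np.linalg.solve((A.T @ A) * (C.T @ C) + eye, (X1 @ khatri_rao(A, C)).T).T
        C = np.linalg.solve((A.T @ A) * (B.T @ B) + eye, (X2 @ khatri_rao(A, B)).T).T
        D = np.linalg.solve(A.T @ A + eye, (Y.T @ A).T).T
        A, B, C, D = (np.maximum(F, 0) for F in (A, B, C, D))  # crude projection
    return A, B, C, D

# Toy usage: a (words x posts x expertise-level) tensor coupled with a side matrix.
rng = np.random.default_rng(1)
X, Y = rng.random((30, 20, 5)), rng.random((30, 8))
A, B, C, D = coupled_cmtf(X, Y, rank=4)
print(A.shape, B.shape, C.shape, D.shape)
```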
MOBIUS: Model-Oblivious Binarized Neural Networks
Title | MOBIUS: Model-Oblivious Binarized Neural Networks |
Authors | Hiromasa Kitai, Jason Paul Cruz, Naoto Yanai, Naohisa Nishida, Tatsumi Oba, Yuji Unagami, Tadanori Teruya, Nuttapong Attrapadung, Takahiro Matsuda, Goichiro Hanaoka |
Abstract | A privacy-preserving framework in which a computational resource provider receives encrypted data from a client and returns prediction results without decrypting the data (i.e., oblivious neural networks or encrypted prediction) has been studied for machine learning services that provide predictions. In this work, we present MOBIUS (Model-Oblivious BInary neUral networkS), a new system that combines Binarized Neural Networks (BNNs) and secure computation based on secret sharing as tools for scalable and fast privacy-preserving machine learning. BNNs improve computational performance by binarizing values in training to $-1$ and $+1$, while secure computation based on secret sharing provides fast and versatile computation on encrypted values via modulo operations with a short bit length. However, combining these tools is not trivial because their operations have different algebraic structures and the use of BNNs generally degrades prediction accuracy. MOBIUS uses improved procedures for BNNs and secure computation that have compatible algebraic structures, without degrading prediction accuracy. We implemented MOBIUS in C++ using the ABY library (NDSS 2015). We then conducted experiments on the MNIST dataset, and the results show that MOBIUS can return a prediction within 0.76 seconds, which is six times faster than SecureML (IEEE S&P 2017). MOBIUS allows a client to request encrypted prediction and allows a trainer to obliviously publish an encrypted model to a cloud run by a computational resource provider, i.e., without revealing the original model itself to the provider. |
Tasks | |
Published | 2018-11-29 |
URL | http://arxiv.org/abs/1811.12028v1 |
http://arxiv.org/pdf/1811.12028v1.pdf | |
PWC | https://paperswithcode.com/paper/mobius-model-oblivious-binarized-neural |
Repo | |
Framework | |
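Two ingredients the abstract combines can each be shown in a few lines: binarizing values to $-1$ and $+1$, and evaluating a linear layer on additively secret-shared inputs so that no single share reveals the plaintext. The toy sketch below illustrates those ideas only; it is not the MOBIUS protocol itself, which is built on the ABY framework and keeps the model oblivious.

```python
# Toy sketch: (1) binarize a layer's values to {-1, +1}; (2) evaluate the layer
# on additive secret shares mod 2**16 so neither share reveals the plaintext.
# This is NOT the MOBIUS protocol, only an illustration of the two ideas.
import numpy as np

Q = 2 ** 16  # small ring modulus for the additive shares

def binarize(x):
    """Map real values to {-1, +1}, as in a binarized neural network layer."""
    return np.where(x >= 0, 1, -1).astype(np.int64)

def share(x, rng):
    """Split an integer vector into two additive shares mod Q."""
    r = rng.integers(0, Q, size=x.shape)
    return r, (x - r) % Q

def linear_on_shares(W, x0, x1):
    """Each party applies the weights to its share; since the map is linear
    mod Q, the reconstructed sum equals W @ x mod Q."""
    return (W @ x0) % Q, (W @ x1) % Q

rng = np.random.default_rng(0)
W = binarize(rng.normal(size=(4, 8)))   # binarized weights (public here only for
                                        # simplicity; MOBIUS hides the model)
x = binarize(rng.normal(size=8)) % Q    # client input, embedded into the ring
x0, x1 = share(x, rng)                  # client sends one share to each party
y0, y1 = linear_on_shares(W, x0, x1)
y = (y0 + y1) % Q                       # reconstructed layer output
assert np.array_equal(y, (W @ x) % Q)
print(y)
```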
Learning to See the Invisible: End-to-End Trainable Amodal Instance Segmentation
Title | Learning to See the Invisible: End-to-End Trainable Amodal Instance Segmentation |
Authors | Patrick Follmann, Rebecca König, Philipp Härtinger, Michael Klostermann |
Abstract | Semantic amodal segmentation is a recently proposed extension to instance-aware segmentation that includes the prediction of the invisible region of each object instance. We present the first all-in-one, end-to-end trainable model for semantic amodal segmentation that predicts the amodal instance masks as well as their visible and invisible parts in a single forward pass. In a detailed analysis, we provide experiments showing which architecture choices are beneficial for an all-in-one amodal segmentation model. On the COCO amodal dataset, our model outperforms the current baseline for amodal segmentation by a large margin. To further evaluate our model, we provide two new datasets with ground truth for semantic amodal segmentation: D2S amodal and COCOA cls. For both datasets, our model provides a strong baseline performance. Using special data augmentation techniques, we show that amodal segmentation on D2S amodal is possible with reasonable performance, even without providing amodal training data. |
Tasks | Data Augmentation, Instance Segmentation, Semantic Segmentation |
Published | 2018-04-24 |
URL | http://arxiv.org/abs/1804.08864v1 |
http://arxiv.org/pdf/1804.08864v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-see-the-invisible-end-to-end |
Repo | |
Framework | |
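The invisible (occluded) region of an instance is, by definition, its amodal mask minus its visible mask, which is what the all-in-one model predicts in one pass. A tiny sketch of that mask relationship on binary arrays follows; the network heads themselves are not reproduced.

```python
# Tiny sketch of the mask relationship in amodal instance segmentation: the
# invisible (occluded) part of an instance is its amodal mask minus its
# visible mask. The paper's network heads are not reproduced here.
import numpy as np

amodal = np.array([[0, 1, 1, 1],
                   [0, 1, 1, 1],
                   [0, 1, 1, 1]], dtype=bool)     # full extent of the object
visible = np.array([[0, 1, 1, 0],
                    [0, 1, 1, 0],
                    [0, 1, 1, 0]], dtype=bool)    # what the camera actually sees

invisible = amodal & ~visible                      # occluded region
occlusion_rate = invisible.sum() / amodal.sum()    # fraction of the object hidden
print(invisible.astype(int))
print(f"occlusion rate: {occlusion_rate:.2f}")
```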
Auto-Encoding Scene Graphs for Image Captioning
Title | Auto-Encoding Scene Graphs for Image Captioning |
Authors | Xu Yang, Kaihua Tang, Hanwang Zhang, Jianfei Cai |
Abstract | We propose the Scene Graph Auto-Encoder (SGAE), which incorporates the language inductive bias into the encoder-decoder image captioning framework for more human-like captions. Intuitively, we humans use the inductive bias to compose collocations and contextual inference in discourse. For example, when we see the relation `person on bike', it is natural to replace `on' with `ride' and infer `person riding bike on a road', even if the `road' is not evident. Therefore, exploiting such bias as a language prior is expected to make conventional encoder-decoder models less likely to overfit to the dataset bias and to focus on reasoning. Specifically, we use the scene graph, a directed graph ($\mathcal{G}$) where an object node is connected by adjective nodes and relationship nodes, to represent the complex structural layout of both image ($\mathcal{I}$) and sentence ($\mathcal{S}$). In the textual domain, we use SGAE to learn a dictionary ($\mathcal{D}$) that helps to reconstruct sentences in the $\mathcal{S}\rightarrow \mathcal{G} \rightarrow \mathcal{D} \rightarrow \mathcal{S}$ pipeline, where $\mathcal{D}$ encodes the desired language prior; in the vision-language domain, we use the shared $\mathcal{D}$ to guide the encoder-decoder in the $\mathcal{I}\rightarrow \mathcal{G}\rightarrow \mathcal{D} \rightarrow \mathcal{S}$ pipeline. Thanks to the scene graph representation and shared dictionary, the inductive bias is transferred across domains in principle. We validate the effectiveness of SGAE on the challenging MS-COCO image captioning benchmark, e.g., our SGAE-based single model achieves a new state-of-the-art $127.8$ CIDEr-D on the Karpathy split, and a competitive $125.5$ CIDEr-D (c40) on the official server even compared to other ensemble models. |
Tasks | Image Captioning |
Published | 2018-12-06 |
URL | http://arxiv.org/abs/1812.02378v3 |
http://arxiv.org/pdf/1812.02378v3.pdf | |
PWC | https://paperswithcode.com/paper/auto-encoding-scene-graphs-for-image |
Repo | |
Framework | |
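The shared dictionary $\mathcal{D}$ acts as a learned memory: a graph-node feature is re-encoded as an attention-weighted combination of dictionary entries, which is how the prior learned in the sentence pipeline can be reused in the image pipeline. Below is a minimal sketch of such a dictionary lookup; SGAE's exact re-encoding and training objective may differ in details.

```python
# Minimal sketch of re-encoding a feature through a learned dictionary D: the
# feature is replaced by an attention-weighted sum of dictionary atoms. This
# only illustrates the "shared dictionary as language prior" idea; SGAE's
# exact formulation may differ.
import torch
import torch.nn.functional as F

class DictionaryReEncoder(torch.nn.Module):
    def __init__(self, n_atoms=1000, dim=512):
        super().__init__()
        self.D = torch.nn.Parameter(torch.randn(n_atoms, dim) * 0.02)

    def forward(self, x):                         # x: (batch, dim) graph-node features
        attn = F.softmax(x @ self.D.t(), dim=-1)  # similarity to each atom
        return attn @ self.D                      # re-encoded feature (batch, dim)

# The same module (shared weights) would sit after the sentence->graph encoder
# during pre-training and after the image->graph encoder during captioning.
reenc = DictionaryReEncoder()
x = torch.randn(4, 512)
print(reenc(x).shape)  # torch.Size([4, 512])
```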
Learning random-walk label propagation for weakly-supervised semantic segmentation
Title | Learning random-walk label propagation for weakly-supervised semantic segmentation |
Authors | Paul Vernaza, Manmohan Chandraker |
Abstract | Large-scale training for semantic segmentation is challenging due to the expense of obtaining training data for this task relative to other vision tasks. We propose a novel training approach to address this difficulty. Given cheaply-obtained sparse image labelings, we propagate the sparse labels to produce guessed dense labelings. A standard CNN-based segmentation network is trained to mimic these labelings. The label-propagation process is defined via random-walk hitting probabilities, which leads to a differentiable parameterization with uncertainty estimates that are incorporated into our loss. We show that by learning the label-propagator jointly with the segmentation predictor, we are able to effectively learn semantic edges given no direct edge supervision. Experiments also show that training a segmentation network in this way outperforms the naive approach. |
Tasks | Semantic Segmentation, Weakly-Supervised Semantic Segmentation |
Published | 2018-02-01 |
URL | http://arxiv.org/abs/1802.00470v1 |
http://arxiv.org/pdf/1802.00470v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-random-walk-label-propagation-for |
Repo | |
Framework | |
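The label propagation is defined through random-walk hitting probabilities: for each class, the probability that a walk from an unlabeled pixel reaches that class's seeds first, which on a graph with absorbing seeds reduces to a sparse linear solve. The sketch below is the forward computation only, on a toy graph; the paper additionally makes this step differentiable and learns the edge weights.

```python
# Small sketch of label propagation via absorbing random walks: class scores of
# unlabeled nodes are the probabilities of hitting each class's seeds first,
# obtained by solving (I - P_UU) F_U = P_UL @ Y_L. Toy fixed graph and seeds.
import numpy as np

def propagate(W, labels):
    """W: (n, n) symmetric non-negative affinities; labels: length-n array with
    class ids for seed nodes and -1 for unlabeled nodes. Returns (n, C) scores."""
    n = W.shape[0]
    classes = sorted(set(labels[labels >= 0]))
    P = W / W.sum(axis=1, keepdims=True)          # row-stochastic transitions
    U = np.where(labels < 0)[0]                   # unlabeled nodes
    L = np.where(labels >= 0)[0]                  # seed (absorbing) nodes
    Y_L = np.eye(len(classes))[[classes.index(labels[i]) for i in L]]
    F = np.zeros((n, len(classes)))
    F[L] = Y_L
    F[U] = np.linalg.solve(np.eye(len(U)) - P[np.ix_(U, U)], P[np.ix_(U, L)] @ Y_L)
    return F

# 4-node chain: node 0 is a class-0 seed, node 3 a class-1 seed, 1 and 2 unlabeled.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
labels = np.array([0, -1, -1, 1])
print(propagate(W, labels).round(2))
```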
Generative Adversarial Self-Imitation Learning
Title | Generative Adversarial Self-Imitation Learning |
Authors | Yijie Guo, Junhyuk Oh, Satinder Singh, Honglak Lee |
Abstract | This paper explores a simple regularizer for reinforcement learning by proposing Generative Adversarial Self-Imitation Learning (GASIL), which encourages the agent to imitate past good trajectories via a generative adversarial imitation learning framework. Instead of directly maximizing rewards, GASIL focuses on reproducing past good trajectories, which can potentially make long-term credit assignment easier when rewards are sparse and delayed. GASIL can easily be combined with any policy gradient objective by using it as a learned shaped reward function. Our experimental results show that GASIL improves the performance of proximal policy optimization on 2D Point Mass and MuJoCo environments with delayed rewards and stochastic dynamics. |
Tasks | Imitation Learning |
Published | 2018-12-03 |
URL | http://arxiv.org/abs/1812.00950v1 |
http://arxiv.org/pdf/1812.00950v1.pdf | |
PWC | https://paperswithcode.com/paper/generative-adversarial-self-imitation |
Repo | |
Framework | |
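GASIL's shaping works by training a discriminator to separate past good trajectories from the agent's current ones and adding its log-output to the environment reward inside any policy-gradient method. A compact sketch of that shaping step follows; buffer maintenance and the PPO update are omitted, and the mixing coefficient is a placeholder.

```python
# Compact sketch of GASIL-style reward shaping: a discriminator D is trained to
# separate past good trajectories (label 1) from current ones (label 0), and
# log D(s, a) is added to the environment reward. Buffer maintenance and the
# policy-gradient update (e.g. PPO) are omitted; alpha is a placeholder.
import torch
import torch.nn.functional as F

class Discriminator(torch.nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(obs_dim + act_dim, hidden), torch.nn.Tanh(),
            torch.nn.Linear(hidden, 1))

    def forward(self, s, a):
        return torch.sigmoid(self.net(torch.cat([s, a], dim=-1)))

def discriminator_loss(D, good_s, good_a, cur_s, cur_a):
    # Past good trajectories play the role of "expert" demonstrations.
    return (F.binary_cross_entropy(D(good_s, good_a), torch.ones(len(good_s), 1)) +
            F.binary_cross_entropy(D(cur_s, cur_a), torch.zeros(len(cur_s), 1)))

def shaped_reward(D, s, a, env_reward, alpha=0.1):
    with torch.no_grad():
        return env_reward + alpha * torch.log(D(s, a) + 1e-8).squeeze(-1)
```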
Learning behavioral context recognition with multi-stream temporal convolutional networks
Title | Learning behavioral context recognition with multi-stream temporal convolutional networks |
Authors | Aaqib Saeed, Tanir Ozcelebi, Stojan Trajanovski, Johan Lukkien |
Abstract | Smart devices of everyday use (such as smartphones and wearables) are increasingly integrated with sensors that provide immense amounts of information about a person's daily life, such as behavior and context. The automatic and unobtrusive sensing of behavioral context can help develop solutions for assisted living, fitness tracking, sleep monitoring, and several other fields. Towards addressing this issue, we raise the question: can a machine learn to recognize a diverse set of contexts and activities in real life through joint learning from raw multi-modal signals (e.g., accelerometer, gyroscope, and audio)? In this paper, we propose a multi-stream temporal convolutional network to address the problem of multi-label behavioral context recognition. A four-stream network architecture handles learning from each modality, with a contextualization module that incorporates the extracted representations to infer a user's context. Our empirical evaluation suggests that a deep convolutional network trained end-to-end achieves an optimal recognition rate. Furthermore, the presented architecture can be extended to include similar sensors for performance improvements, and it handles missing modalities through multi-task learning, without any manual feature engineering, on a highly imbalanced and sparsely labeled dataset. |
Tasks | Feature Engineering, Multi-Task Learning |
Published | 2018-08-27 |
URL | http://arxiv.org/abs/1808.08766v1 |
http://arxiv.org/pdf/1808.08766v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-behavioral-context-recognition-with |
Repo | |
Framework | |
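The architecture is four per-modality temporal-convolution streams whose outputs are merged by a contextualization module before multi-label prediction. The skeletal PyTorch sketch below uses assumed channel counts, window length, and label count; the paper's exact layer sizes and fusion details are not reproduced.

```python
# Skeletal sketch of a multi-stream temporal convolutional network: one Conv1d
# stream per sensor modality, a fusion ("contextualization") MLP, and a
# multi-label sigmoid head. Channel counts, window length, and label count are
# placeholder assumptions, not the paper's exact values.
import torch
import torch.nn as nn

class Stream(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1))                # -> (batch, 64, 1)

    def forward(self, x):                           # x: (batch, channels, time)
        return self.net(x).squeeze(-1)              # -> (batch, 64)

class MultiStreamTCN(nn.Module):
    def __init__(self, modality_channels=(3, 3, 3, 40), n_labels=51):
        super().__init__()
        self.streams = nn.ModuleList(Stream(c) for c in modality_channels)
        self.context = nn.Sequential(                # contextualization / fusion module
            nn.Linear(64 * len(modality_channels), 128), nn.ReLU(),
            nn.Linear(128, n_labels))

    def forward(self, inputs):                       # list of per-modality tensors
        feats = [s(x) for s, x in zip(self.streams, inputs)]
        return torch.sigmoid(self.context(torch.cat(feats, dim=-1)))

model = MultiStreamTCN()
batch = [torch.randn(2, c, 128) for c in (3, 3, 3, 40)]  # acc, gyro, mag, audio features
print(model(batch).shape)  # torch.Size([2, 51])
```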
Pattern Localization in Time Series through Signal-To-Model Alignment in Latent Space
Title | Pattern Localization in Time Series through Signal-To-Model Alignment in Latent Space |
Authors | Steven Van Vaerenbergh, Ignacio Santamaria, Victor Elvira, Matteo Salvatori |
Abstract | In this paper, we study the problem of locating a predefined sequence of patterns in a time series. In particular, the studied scenario assumes a theoretical model is available that contains the expected locations of the patterns. This problem is found in several contexts, and it is commonly solved by first synthesizing a time series from the model, and then aligning it to the true time series through dynamic time warping. We propose a technique that increases the similarity of both time series before aligning them, by mapping them into a latent correlation space. The mapping is learned from the data through a machine-learning setup. Experiments on data from non-destructive testing demonstrate that the proposed approach shows significant improvements over the state of the art. |
Tasks | Time Series |
Published | 2018-02-16 |
URL | http://arxiv.org/abs/1802.05910v2 |
http://arxiv.org/pdf/1802.05910v2.pdf | |
PWC | https://paperswithcode.com/paper/pattern-localization-in-time-series-through |
Repo | |
Framework | |
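The pipeline is: synthesize a reference series from the model, map both the reference and the measured signal into a learned latent space, and align them with dynamic time warping. Below is a bare-bones DTW sketch with a placeholder identity mapping standing in for the learned latent map, which is the paper's actual contribution.

```python
# Bare-bones sketch of the alignment step: both series are mapped to a latent
# space and then aligned with dynamic time warping (DTW). The latent mapping
# below is a placeholder identity; the paper learns it from data so that the
# synthesized and measured series become more similar before alignment.
import numpy as np

def latent_map(series):
    return series  # placeholder for the learned latent mapping

def dtw(a, b):
    """Classic O(len(a)*len(b)) DTW distance between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

synthesized = np.sin(np.linspace(0, 6, 80))          # series generated from the model
measured = np.sin(np.linspace(0, 6, 100) + 0.3)      # observed, warped series
print(dtw(latent_map(synthesized), latent_map(measured)))
```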
Accelerated Bayesian Optimization through Weight-Prior Tuning
Title | Accelerated Bayesian Optimization through Weight-Prior Tuning |
Authors | Alistair Shilton, Sunil Gupta, Santu Rana, Pratibha Vellanki, Laurence Park, Cheng Li, Svetha Venkatesh, Alessandra Sutti, David Rubin, Thomas Dorin, Alireza Vahid, Murray Height, Teo Slezak |
Abstract | Bayesian optimization (BO) is a widely-used method for optimizing expensive (to evaluate) problems. At the core of most BO methods is the modeling of the objective function using a Gaussian Process (GP) whose covariance is selected from a set of standard covariance functions. From a weight-space view, this models the objective as a linear function in a feature space implied by the given covariance K, with an arbitrary Gaussian weight prior ${\bf w} \sim \mathcal{N} ({\bf 0}, {\bf I})$. In many practical applications there is data available that has a similar (covariance) structure to the objective, but which, having different form, cannot be used directly in standard transfer learning. In this paper we show how such auxiliary data may be used to construct a GP covariance corresponding to a more appropriate weight prior for the objective function. Building on this, we show that we may accelerate BO by modeling the objective function using this (learned) weight prior, which we demonstrate on both test functions and a practical application to short-polymer fibre manufacture. |
Tasks | Bayesian Optimisation, Transfer Learning |
Published | 2018-05-21 |
URL | https://arxiv.org/abs/1805.07852v2 |
https://arxiv.org/pdf/1805.07852v2.pdf | |
PWC | https://paperswithcode.com/paper/kernel-pre-training-in-feature-space-via-m |
Repo | |
Framework | |
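In the weight-space view, the GP covariance is $k(x, x') = \phi(x)^\top \Sigma_w \phi(x')$, and the auxiliary data is used to obtain a better weight-prior covariance $\Sigma_w$ than the identity. The sketch below illustrates that idea with an assumed feature map and a simple empirical estimator; it is not the paper's exact construction.

```python
# Rough sketch of the weight-space idea: with feature map phi, a GP covariance
# is k(x, x') = phi(x)^T Sigma_w phi(x'). Here Sigma_w is estimated as the
# empirical covariance of ridge-regression weights fitted to a few auxiliary
# functions assumed to share structure with the objective. Illustration only.
import numpy as np

def phi(x, degree=5):
    """Simple polynomial feature map on scalar inputs (an assumed choice)."""
    return np.vander(np.ravel(x), degree + 1, increasing=True)

def fit_weights(x, y, lam=1e-3):
    P = phi(x)
    return np.linalg.solve(P.T @ P + lam * np.eye(P.shape[1]), P.T @ y)

# Auxiliary functions assumed to have a structure similar to the objective.
xs = np.linspace(-1, 1, 30)
aux_ws = np.stack([fit_weights(xs, np.sin(a * xs) + 0.1 * a * xs**2)
                   for a in (2.0, 2.5, 3.0, 3.5)])
Sigma_w = np.cov(aux_ws, rowvar=False) + 1e-6 * np.eye(aux_ws.shape[1])

def learned_kernel(x1, x2):
    return phi(x1) @ Sigma_w @ phi(x2).T

# This learned covariance would replace a standard kernel in BO's GP surrogate.
print(learned_kernel(np.array([0.1, 0.4]), np.array([0.1, 0.4])).shape)  # (2, 2)
```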
Multimodal Deep Neural Networks using Both Engineered and Learned Representations for Biodegradability Prediction
Title | Multimodal Deep Neural Networks using Both Engineered and Learned Representations for Biodegradability Prediction |
Authors | Garrett B. Goh, Khushmeen Sakloth, Charles Siegel, Abhinav Vishnu, Jim Pfaendtner |
Abstract | Deep learning algorithms excel at extracting patterns from raw data, and with large datasets, they have been very successful in computer vision and natural language applications. However, in other domains, large datasets from which to learn representations may not exist. In this work, we develop a novel multimodal CNN-MLP neural network architecture that utilizes both domain-specific feature engineering and learned representations from raw data. We illustrate the effectiveness of such network designs in the chemical sciences, for predicting biodegradability. DeepBioD, a multimodal CNN-MLP network, is more accurate than either standalone network design and achieves a classification error rate of 0.125, which is 27% lower than the current state of the art. Thus, our work indicates that combining traditional feature engineering with representation learning can be effective, particularly in situations where labeled data is limited. |
Tasks | Feature Engineering, Representation Learning |
Published | 2018-08-13 |
URL | http://arxiv.org/abs/1808.04456v2 |
http://arxiv.org/pdf/1808.04456v2.pdf | |
PWC | https://paperswithcode.com/paper/multimodal-deep-neural-networks-using-both |
Repo | |
Framework | |
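The network has two branches: a CNN over a raw-data representation and an MLP over engineered descriptors, fused before the classifier head. The skeletal sketch below uses assumed input shapes and layer sizes; the paper's actual CNN design and descriptor set are not reproduced.

```python
# Skeletal sketch of a two-branch multimodal network: a CNN branch over a raw
# representation and an MLP branch over engineered descriptors, fused before
# the classifier. Input shapes and layer sizes are placeholder assumptions.
import torch
import torch.nn as nn

class CNNMLP(nn.Module):
    def __init__(self, n_descriptors=200, n_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(                     # raw-data branch (e.g. 1x80x80 image)
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten())                             # -> (batch, 32)
        self.mlp = nn.Sequential(                     # engineered-feature branch
            nn.Linear(n_descriptors, 64), nn.ReLU())
        self.head = nn.Linear(32 + 64, n_classes)

    def forward(self, image, descriptors):
        return self.head(torch.cat([self.cnn(image), self.mlp(descriptors)], dim=-1))

model = CNNMLP()
logits = model(torch.randn(4, 1, 80, 80), torch.randn(4, 200))
print(logits.shape)  # torch.Size([4, 2])
```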
Heuristic Feature Selection for Clickbait Detection
Title | Heuristic Feature Selection for Clickbait Detection |
Authors | Matti Wiegmann, Michael Völske, Benno Stein, Matthias Hagen, Martin Potthast |
Abstract | We study feature selection as a means to optimize the baseline clickbait detector employed at the Clickbait Challenge 2017. The challenge’s task is to score the “clickbaitiness” of a given Twitter tweet on a scale from 0 (no clickbait) to 1 (strong clickbait). Unlike most other approaches submitted to the challenge, the baseline approach is based on manual feature engineering and does not compete out of the box with many of the deep learning-based approaches. We show that scaling up feature selection efforts to heuristically identify better-performing feature subsets catapults the performance of the baseline classifier to second rank overall, beating 12 other competing approaches and improving over the baseline performance by 20%. This demonstrates that traditional classification approaches can still keep up with deep learning on this task. |
Tasks | Clickbait Detection, Feature Engineering, Feature Selection |
Published | 2018-02-04 |
URL | http://arxiv.org/abs/1802.01191v1 |
http://arxiv.org/pdf/1802.01191v1.pdf | |
PWC | https://paperswithcode.com/paper/heuristic-feature-selection-for-clickbait |
Repo | |
Framework | |
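Scaling up feature selection here means searching feature subsets heuristically rather than exhaustively. Below is a compact greedy forward-selection sketch with cross-validated scoring on synthetic data; the challenge's actual features, its regression-style clickbaitiness score, and the paper's specific heuristics are not reproduced, and a simple classifier is used only to show the subset-search loop.

```python
# Compact sketch of heuristic feature-subset search: greedy forward selection
# with cross-validated scoring. X and y below are synthetic placeholders, not
# the clickbait features or the challenge's scoring target.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def greedy_forward_selection(X, y, max_features=10):
    remaining, selected, best_score = list(range(X.shape[1])), [], -np.inf
    while remaining and len(selected) < max_features:
        scored = [(cross_val_score(LogisticRegression(max_iter=1000),
                                   X[:, selected + [f]], y, cv=5).mean(), f)
                  for f in remaining]
        score, best_f = max(scored)
        if score <= best_score:            # stop when no candidate improves the subset
            break
        best_score = score
        selected.append(best_f)
        remaining.remove(best_f)
    return selected, best_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = (X[:, 3] + X[:, 7] - X[:, 11] + 0.5 * rng.normal(size=300) > 0).astype(int)
print(greedy_forward_selection(X, y))
```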
Predicting Future Lane Changes of Other Highway Vehicles using RNN-based Deep Models
Title | Predicting Future Lane Changes of Other Highway Vehicles using RNN-based Deep Models |
Authors | Sajan Patel, Brent Griffin, Kristofer Kusano, Jason J. Corso |
Abstract | In the event of sensor failure, autonomous vehicles need to safely execute emergency maneuvers while avoiding other vehicles on the road. To accomplish this, the sensor-failed vehicle must predict the future semantic behaviors of other drivers, such as lane changes, as well as their future trajectories, given a recent window of past sensor observations. We address the first issue of semantic behavior prediction in this paper, which is a precursor to trajectory prediction, by introducing a framework that leverages the power of recurrent neural networks (RNNs) and graphical models. Our goal is to predict the future categorical driving intent, for lane changes, of neighboring vehicles up to three seconds into the future, given as little as a one-second window of past LIDAR, GPS, inertial, and map data. We collect real-world data containing over 20 hours of highway driving using an autonomous Toyota vehicle. We propose a composite RNN model by adopting the methodology of Structural RNNs to learn factor functions and take advantage of both the high-level structure of graphical models and the sequence modeling power of RNNs, which we expect to afford more transparent modeling than opaque, single-RNN models. To demonstrate our approach, we validate our model using authentic interstate highway driving to predict the future lane change maneuvers of other vehicles neighboring our autonomous vehicle. We find that our composite Structural RNN outperforms baselines by as much as 12% in balanced accuracy metrics. |
Tasks | Accuracy Metrics, Autonomous Vehicles, Trajectory Prediction |
Published | 2018-01-12 |
URL | https://arxiv.org/abs/1801.04340v4 |
https://arxiv.org/pdf/1801.04340v4.pdf | |
PWC | https://paperswithcode.com/paper/predicting-future-lane-changes-of-other |
Repo | |
Framework | |
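The prediction task is: given a short window of past per-vehicle features, classify the future lane-change intent. The sketch below is a bare-bones recurrent classifier; it is a plain GRU, not the composite Structural RNN of the paper, and the feature dimensions are placeholders.

```python
# Bare-bones sketch of the prediction task: classify future lane-change intent
# (left / keep / right) from a short window of past per-vehicle features with a
# recurrent model. This is a plain GRU classifier, not the paper's composite
# Structural RNN; feature dimensions are placeholder assumptions.
import torch
import torch.nn as nn

class IntentRNN(nn.Module):
    def __init__(self, n_features=10, hidden=64, n_intents=3):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_intents)

    def forward(self, x):              # x: (batch, time, features), e.g. a 1 s window at 10 Hz
        _, h = self.rnn(x)
        return self.head(h[-1])        # logits over {left, keep, right}

model = IntentRNN()
window = torch.randn(8, 10, 10)        # 8 neighbouring vehicles, 10 timesteps, 10 features
print(model(window).shape)             # torch.Size([8, 3])
```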
Exploration of Numerical Precision in Deep Neural Networks
Title | Exploration of Numerical Precision in Deep Neural Networks |
Authors | Zhaoqi Li, Yu Ma, Catalina Vajiac, Yunkai Zhang |
Abstract | Reduced numerical precision is a common technique to reduce computational cost in many Deep Neural Networks (DNNs). While it has been observed that DNNs are resilient to small errors and noise, no general result exists that is capable of predicting a given DNN system architecture’s sensitivity to reduced precision. In this project, we emulate arbitrary bit-width using a specified floating-point representation with a truncation method, which is applied to the neural network after each batch. We explore the impact of several model parameters on the network’s training accuracy and show results on the MNIST dataset. We then present a preliminary theoretical investigation of the error scaling in both forward and backward propagations. We end with a discussion of the implications of these results as well as the potential for generalization to other network architectures. |
Tasks | |
Published | 2018-05-03 |
URL | http://arxiv.org/abs/1805.01078v1 |
http://arxiv.org/pdf/1805.01078v1.pdf | |
PWC | https://paperswithcode.com/paper/exploration-of-numerical-precision-in-deep |
Repo | |
Framework | |
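The emulation truncates each value to a specified number of mantissa bits after every batch. A small sketch of such a truncation for float32 arrays follows; the project's exact rounding mode may differ (this version simply zeroes the low mantissa bits, i.e., truncates toward zero).

```python
# Small sketch of emulating reduced precision by zeroing the low mantissa bits
# of float32 values, e.g. applied to the weights after each batch. The exact
# rounding scheme used in the project may differ.
import numpy as np

def truncate_mantissa(x, keep_bits):
    """Keep only the top `keep_bits` of the 23-bit float32 mantissa."""
    assert 0 <= keep_bits <= 23
    drop = 23 - keep_bits
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    mask = np.uint32((0xFFFFFFFF >> drop) << drop)   # clear the low `drop` mantissa bits
    return (bits & mask).view(np.float32)

w = np.array([0.1, -1.2345678, 3.14159265], dtype=np.float32)
for k in (23, 10, 4):
    print(k, truncate_mantissa(w, k))
```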
Generalization in quasi-periodic environments
Title | Generalization in quasi-periodic environments |
Authors | Giovanni Bellettini, Alessandro Betti, Marco Gori |
Abstract | By and large, the behavior of stochastic gradient is regarded as a challenging problem, and it is often presented in the framework of statistical machine learning. This paper offers a novel view on the analysis of on-line models of learning that arises when dealing with a generalized version of stochastic gradient based on dissipative dynamics. In order to face the complex evolution of these models, a systematic treatment is proposed which is based on energy balance equations derived by means of the Caldirola-Kanai (CK) Hamiltonian. According to these equations, learning can be regarded as an ordering process which corresponds to the decrease of the loss function. Finally, the main result established in this paper is that, in the case of quasi-periodic environments, where the pattern novelty is progressively limited as time goes by, the system dynamics yields an asymptotically consistent solution in the weight space; that is, the solution maps similar patterns to the same decision. |
Tasks | |
Published | 2018-07-14 |
URL | http://arxiv.org/abs/1807.05343v1 |
http://arxiv.org/pdf/1807.05343v1.pdf | |
PWC | https://paperswithcode.com/paper/generalization-in-quasi-periodic-environments |
Repo | |
Framework | |
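For reference, the Caldirola-Kanai construction attaches explicit exponential time dependence to the kinetic and potential terms so that Hamilton's equations reproduce damped, dissipative dynamics; in the learning analogy the potential plays the role of the loss over the weights. Its standard form is shown below; the paper's adaptation to weights and the loss may use different notation.

```latex
% Standard Caldirola-Kanai Hamiltonian for damping rate gamma; in the learning
% analogy q plays the role of the weights and V of the loss function.
H_{\mathrm{CK}}(q, p, t) = e^{-\gamma t}\,\frac{p^{2}}{2m} + e^{\gamma t}\,V(q),
\qquad
\dot{q} = e^{-\gamma t}\,\frac{p}{m}, \qquad
\dot{p} = -e^{\gamma t}\,\nabla V(q)
\;\;\Longrightarrow\;\;
m\ddot{q} + m\gamma\dot{q} + \nabla V(q) = 0 .
```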