January 30, 2020

Paper Group ANR 268

Learning to Discover Novel Visual Categories via Deep Transfer Clustering. Widely Linear Complex-valued Autoencoder: Dealing with Noncircularity in Generative-Discriminative Models. Can You Trust This Prediction? Auditing Pointwise Reliability After Learning. On guiding video object segmentation. Learning based Methods for Code Runtime Complexity P …

Learning to Discover Novel Visual Categories via Deep Transfer Clustering

Title Learning to Discover Novel Visual Categories via Deep Transfer Clustering
Authors Kai Han, Andrea Vedaldi, Andrew Zisserman
Abstract We consider the problem of discovering novel object categories in an image collection. While these images are unlabelled, we also assume prior knowledge of related but different image classes. We use such prior knowledge to reduce the ambiguity of clustering, and improve the quality of the newly discovered classes. Our contributions are twofold. The first contribution is to extend Deep Embedded Clustering to a transfer learning setting; we also improve the algorithm by introducing a representation bottleneck, temporal ensembling, and consistency. The second contribution is a method to estimate the number of classes in the unlabelled data. This also transfers knowledge from the known classes, using them as probes to diagnose different choices for the number of classes in the unlabelled subset. We thoroughly evaluate our method, substantially outperforming state-of-the-art techniques in a large number of benchmarks, including ImageNet, OmniGlot, CIFAR-100, CIFAR-10, and SVHN.
Tasks Omniglot, Transfer Learning
Published 2019-08-26
URL https://arxiv.org/abs/1908.09884v1
PDF https://arxiv.org/pdf/1908.09884v1.pdf
PWC https://paperswithcode.com/paper/learning-to-discover-novel-visual-categories
Repo
Framework
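
The first contribution above extends Deep Embedded Clustering (DEC). As background only, here is a minimal pure-Python sketch of the standard DEC soft assignment (a Student's t kernel between embeddings and centroids), its sharpened target distribution, and the KL loss between them. It is not the authors' Deep Transfer Clustering: the representation bottleneck, temporal ensembling, and consistency terms are omitted, and the function names are hypothetical.

```python
import math

def soft_assign(z, centroids, alpha=1.0):
    """DEC soft assignment q_ij proportional to (1 + ||z_i - mu_j||^2 / alpha)^(-(alpha + 1) / 2)."""
    q = []
    for zi in z:
        k = [(1.0 + sum((a - b) ** 2 for a, b in zip(zi, mu)) / alpha) ** (-(alpha + 1.0) / 2.0)
             for mu in centroids]
        s = sum(k)
        q.append([kj / s for kj in k])  # normalise over clusters
    return q

def target_distribution(q):
    """Sharpened targets p_ij proportional to q_ij^2 / f_j, where f_j = sum_i q_ij (soft cluster size)."""
    n_clusters = len(q[0])
    f = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(n_clusters)]
        s = sum(w)
        p.append([wj / s for wj in w])
    return p

def kl_divergence(p, q):
    """KL(P || Q) summed over points and clusters: the loss DEC minimises."""
    return sum(pij * math.log(pij / qij)
               for prow, qrow in zip(p, q) for pij, qij in zip(prow, qrow) if pij > 0)

# Toy 2-D embeddings forming two well-separated groups.
z = [[0.0, 0.1], [0.2, -0.1], [3.0, 3.1], [2.9, 2.8]]
centroids = [[0.0, 0.0], [3.0, 3.0]]
q = soft_assign(z, centroids)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

Training on this loss pushes each embedding towards the cluster it is already most confident about, which is why prior knowledge from labelled classes helps reduce the ambiguity of the initial assignments.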

Widely Linear Complex-valued Autoencoder: Dealing with Noncircularity in Generative-Discriminative Models

Title Widely Linear Complex-valued Autoencoder: Dealing with Noncircularity in Generative-Discriminative Models
Authors Zeyang Yu, Shengxi Li, Danilo Mandic
Abstract We propose a new structure for the complex-valued autoencoder by introducing additional degrees of freedom into its design through a widely linear (WL) transform. The corresponding widely linear backpropagation algorithm is also developed using the $\mathbb{CR}$ calculus, to unify the gradient calculation of the cost function and the underlying WL model. More specifically, all the existing complex-valued autoencoders employ the strictly linear transform, which is optimal only when the complex-valued outputs of each network layer are independent of the conjugate of the inputs. In addition, the widely linear model which underpins our work allows us to consider all the second-order statistics of inputs. This provides more freedom in the design and enhanced optimization opportunities, as compared to the state-of-the-art. Furthermore, we show that the most widely adopted cost function, i.e., the mean squared error, is not best suited for the complex domain, as it is a real quantity with a single degree of freedom, while both the phase and the amplitude information need to be optimized. To resolve this issue, we design a new cost function, which is capable of controlling the balance between the phase and the amplitude contribution to the solution. The experimental results verify the superior performance of the proposed autoencoder together with the new cost function, especially for the imaging scenarios where the phase preserves extensive information on edges and shapes.
Tasks
Published 2019-03-05
URL http://arxiv.org/abs/1903.02014v1
PDF http://arxiv.org/pdf/1903.02014v1.pdf
PWC https://paperswithcode.com/paper/widely-linear-complex-valued-autoencoder
Repo
Framework
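
The difference between a strictly linear and a widely linear transform can be shown in a few lines. In the standard widely linear form, a layer computes y = Wx + Vx*, where x* is the complex conjugate of the input; setting V = 0 recovers the strictly linear y = Wx. The sketch below is a plain-Python illustration of that definition only (no autoencoder, no CR-calculus backpropagation, no phase/amplitude cost), with hypothetical function names.

```python
def strictly_linear(W, x):
    """y = W x for a complex matrix W and complex vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def widely_linear(W, V, x):
    """y = W x + V conj(x): V adds degrees of freedom acting on the conjugate input."""
    conj_x = [xi.conjugate() for xi in x]
    return [a + b for a, b in zip(strictly_linear(W, x), strictly_linear(V, conj_x))]

x = [1 + 2j, -0.5 + 1j]
identity = [[1, 0], [0, 1]]
zeros = [[0, 0], [0, 0]]

# With V = 0 the widely linear map reduces to the strictly linear one.
same = widely_linear(identity, zeros, x) == strictly_linear(identity, x)

# With W = 0 and V = I it outputs conj(x), which no fixed W can reproduce for every x.
conjugated = widely_linear(zeros, identity, x)
```

Conjugation is the simplest mapping outside the strictly linear family; more generally, the extra V term lets a layer exploit the pseudo-covariance E[xx^T] of noncircular inputs, not just the covariance E[xx^H].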

Can You Trust This Prediction? Auditing Pointwise Reliability After Learning

Title Can You Trust This Prediction? Auditing Pointwise Reliability After Learning
Authors Peter Schulam, Suchi Saria
Abstract To use machine learning in high stakes applications (e.g. medicine), we need tools for building confidence in the system and evaluating whether it is reliable. Methods to improve model reliability often require new learning algorithms (e.g. using Bayesian inference to obtain uncertainty estimates). An alternative is to audit a model after it is trained. In this paper, we describe resampling uncertainty estimation (RUE), an algorithm to audit the pointwise reliability of predictions. Intuitively, RUE estimates the amount that a prediction would change if the model had been fit on different training data. The algorithm uses the gradient and Hessian of the model’s loss function to create an ensemble of predictions. Experimentally, we show that RUE more effectively detects inaccurate predictions than existing tools for auditing reliability subsequent to training. We also show that RUE can create predictive distributions that are competitive with state-of-the-art methods like Monte Carlo dropout, probabilistic backpropagation, and deep ensembles, but does not depend on specific algorithms at train-time like these methods do.
Tasks Bayesian Inference
Published 2019-01-02
URL http://arxiv.org/abs/1901.00403v2
PDF http://arxiv.org/pdf/1901.00403v2.pdf
PWC https://paperswithcode.com/paper/can-you-trust-this-prediction-auditing
Repo
Framework
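
To make "uses the gradient and Hessian of the model's loss function to create an ensemble" concrete, here is a hedged NumPy sketch of the general idea for ordinary least squares. Under stated assumptions (bootstrap resampling weights, the full-data Hessian reused for every resample, a single Newton step from the fitted parameters), it approximates refitting on resampled training data and returns the resulting ensemble of predictions at a query point. It is not the paper's exact RUE algorithm, and `rue_style_ensemble` is a hypothetical name.

```python
import numpy as np

def rue_style_ensemble(X, y, x_new, n_boot=200, seed=0):
    """Approximate bootstrap refits of a least-squares model via one Newton step each."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    theta = np.linalg.lstsq(X, y, rcond=None)[0]        # fitted parameters
    residuals = X @ theta - y
    grads = X * residuals[:, None]                        # per-example gradients of 0.5 * r_i^2
    H_inv = np.linalg.inv(X.T @ X / n)                    # inverse Hessian of the mean loss
    preds = []
    for _ in range(n_boot):
        w = rng.multinomial(n, np.full(n, 1.0 / n))       # bootstrap resampling counts
        # Gradient of the reweighted loss at theta; the unweighted mean gradient is zero there.
        g_w = ((w - 1.0)[:, None] * grads).sum(axis=0) / n
        theta_b = theta - H_inv @ g_w                     # one Newton step toward the resampled fit
        preds.append(x_new @ theta_b)
    return np.array(preds)

# Noisy line y = 2 + 3x observed on x in [-1, 1].
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.uniform(-1, 1, 50)])
y = 2.0 + 3.0 * X[:, 1] + rng.normal(0, 0.1, 50)
near = rue_style_ensemble(X, y, np.array([1.0, 0.0]))   # inside the training range
far = rue_style_ensemble(X, y, np.array([1.0, 10.0]))   # far outside it
```

The spread of each ensemble serves as a pointwise reliability score: `far.std()` should come out much larger than `near.std()`, flagging the extrapolated prediction as less trustworthy without retraining the model.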

On guiding video object segmentation

Title On guiding video object segmentation
Authors Diego Ortego, Kevin McGuinness, Juan C. SanMiguel, Eric Arazo, José M. Martínez, Noel E. O’Connor
Abstract This paper presents a novel approach for segmenting moving objects in unconstrained environments using guided convolutional neural networks. This guiding process relies on foreground masks from independent algorithms (e.g., state-of-the-art algorithms) to implement an attention mechanism that incorporates the spatial location of foreground and background to compute their separate representations. Our approach initially extracts two kinds of features for each frame using colour and optical flow information. These features are combined following a multiplicative scheme to benefit from their complementarity. The unified colour and motion features are later processed to obtain the separate foreground and background representations. Then, both independent representations are concatenated and decoded to perform foreground segmentation. Experiments conducted on the challenging DAVIS 2016 dataset demonstrate that our guided representations outperform not only non-guided representations but also recent, top-performing video object segmentation algorithms.
Tasks Optical Flow Estimation, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published 2019-04-25
URL http://arxiv.org/abs/1904.11256v1
PDF http://arxiv.org/pdf/1904.11256v1.pdf
PWC https://paperswithcode.com/paper/on-guiding-video-object-segmentation
Repo
Framework
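
The data flow described above (multiplicative fusion of colour and motion features, then mask-guided separation into foreground and background representations that are concatenated for decoding) can be sketched with NumPy arrays. This is a simplified illustration under assumptions: feature maps share the shape (C, H, W), the guiding mask is a single (H, W) map in [0, 1], and a plain elementwise mask stands in for the paper's attention mechanism. The function name is hypothetical and no decoder is included.

```python
import numpy as np

def guided_representation(colour_feat, flow_feat, fg_mask):
    """Fuse appearance and motion features, then split them with a foreground mask.

    colour_feat, flow_feat: (C, H, W) feature maps.
    fg_mask: (H, W) mask in [0, 1] from an independent foreground algorithm.
    """
    fused = colour_feat * flow_feat                 # multiplicative fusion
    fg = fused * fg_mask[None, :, :]                # foreground representation
    bg = fused * (1.0 - fg_mask)[None, :, :]        # background representation
    return np.concatenate([fg, bg], axis=0)         # (2C, H, W), input to a decoder

rng = np.random.default_rng(0)
colour = rng.random((4, 8, 8))
flow = rng.random((4, 8, 8))
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0                                # square foreground region
rep = guided_representation(colour, flow, mask)
```

Because the two halves are computed with complementary masks, they add back up to the fused features; the decoder sees where the guiding algorithm placed the object without losing the background context.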

Learning based Methods for Code Runtime Complexity Prediction

Title Learning based Methods for Code Runtime Complexity Prediction
Authors Jagriti Sikka, Kushal Satya, Yaman Kumar, Shagun Uppal, Rajiv Ratn Shah, Roger Zimmermann
Abstract Predicting the runtime complexity of a piece of code is an arduous task. Even for humans, predicting time complexity with high fidelity for arbitrary code requires subtle analysis and comprehensive knowledge of algorithms. By reduction from Turing's halting problem, determining the runtime complexity of arbitrary code exactly is undecidable in general. Nevertheless, an approximate solution to such a task can give developers real-time feedback on the efficiency of their code. In this work, we model this problem as a machine learning task and check its feasibility with thorough analysis. Due to the lack of any open-source dataset for this task, we propose our own annotated dataset, CoRCoD: Code Runtime Complexity Dataset, extracted from online judges. We establish baselines using two different approaches, feature engineering and code embeddings, achieve state-of-the-art results, and compare their performances. Such solutions can be widely useful in potential applications like automatically grading coding assignments, IDE-integrated tools for static code analysis, and others.
Tasks Feature Engineering
Published 2019-11-04
URL https://arxiv.org/abs/1911.01155v1
PDF https://arxiv.org/pdf/1911.01155v1.pdf
PWC https://paperswithcode.com/paper/learning-based-methods-for-code-runtime
Repo
Framework
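
For the feature-engineering baseline, a natural hand-crafted feature is loop nesting depth, since depth 2 often (but not always) signals quadratic time. The sketch below extracts such features with Python's standard `ast` module. The feature names are hypothetical, parsing Python source is an assumption made for convenience (CoRCoD itself is drawn from online-judge submissions), and comprehensions and recursion are ignored here.

```python
import ast

LOOP_NODES = (ast.For, ast.While, ast.AsyncFor)

def max_loop_depth(tree):
    """Deepest nesting of for/while loops in a parsed syntax tree."""
    def depth(node, current):
        best = current
        for child in ast.iter_child_nodes(node):
            d = current + 1 if isinstance(child, LOOP_NODES) else current
            best = max(best, depth(child, d))
        return best
    return depth(tree, 0)

def loop_features(source):
    """A tiny hypothetical feature vector for a complexity classifier."""
    tree = ast.parse(source)
    n_loops = sum(isinstance(node, LOOP_NODES) for node in ast.walk(tree))
    return {"n_loops": n_loops, "max_loop_depth": max_loop_depth(tree)}

code = """
for i in range(n):
    for j in range(n):
        total += a[i] * b[j]
while k > 0:
    k //= 2
"""
features = loop_features(code)
```

Here `features` is `{'n_loops': 3, 'max_loop_depth': 2}`. The halving `while` loop shows the limits of such counts: it runs in O(log k), which a pure nesting feature cannot distinguish from a linear loop, one reason learned code embeddings are worth comparing against.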

Non-Rigid Structure-From-Motion by Rank-One Basis Shapes

Title Non-Rigid Structure-From-Motion by Rank-One Basis Shapes
Authors Sami S. Brandt, Hanno Ackermann
Abstract In this paper, we show that the affine, non-rigid structure-from-motion problem can be solved by rank-one, thus degenerate, basis shapes. It is a natural reformulation of the classic low-rank method by Bregler et al., where it was assumed that the deformable 3D structure is generated by a linear combination of rigid basis shapes. The non-rigid shape will be decomposed into the mean shape and the degenerate shapes, constructed from the right singular vectors of the low-rank decomposition. The right singular vectors are affinely back-projected into the 3D space, and the affine back-projections will also be solved as part of the factorisation. By construction, a direct interpretation for the right singular vectors of the low-rank decomposition will also follow: they can be seen as principal components, hence, the first variant of our method is referred to as Rank-1-PCA. The second variant, referred to as Rank-1-ICA, additionally estimates the orthogonal transform that maps the deformation modes to modes that are as statistically independent as possible. It has the advantage of pinpointing statistically dependent subspaces related to, for instance, lip movements on human faces. Moreover, in contrast to prior works, no predefined dimensionality for the subspaces is imposed. The experiments on several datasets show that the method achieves better results than the state-of-the-art, it can be computed faster, and it provides an intuitive interpretation for the deformation modes.
Tasks
Published 2019-04-30
URL http://arxiv.org/abs/1904.13271v1
PDF http://arxiv.org/pdf/1904.13271v1.pdf
PWC https://paperswithcode.com/paper/non-rigid-structure-from-motion-by-rank-one
Repo
Framework
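
The starting point of the method, a low-rank factorisation of the measurement matrix, can itself be read as a sum of rank-one terms, each built from one right singular vector. The NumPy sketch below shows only that generic fact on synthetic data: it centres each row of a 2F x P measurement matrix (removing per-frame translation under an affine camera) and writes the rank-k SVD approximation as k rank-one matrices. It does not perform the paper's affine back-projection into 3D or the ICA rotation, and the function name is hypothetical.

```python
import numpy as np

def rank_one_terms(W, k):
    """Row-centre W and express its rank-k SVD approximation as k rank-one matrices."""
    t = W.mean(axis=1, keepdims=True)                  # per-row translation
    U, s, Vt = np.linalg.svd(W - t, full_matrices=False)
    # Each term sigma_i * u_i v_i^T is rank one; Vt[i] is the i-th right singular vector.
    terms = [s[i] * np.outer(U[:, i], Vt[i]) for i in range(k)]
    return t, terms

# Synthetic measurements: F = 10 frames (20 rows of x/y coordinates), P = 30 points,
# generated from a rank-3 structure plus a per-row translation.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))
B = rng.normal(size=(3, 30))
W = A @ B + rng.normal(size=(20, 1))
t, terms = rank_one_terms(W, 3)
reconstruction = t + sum(terms)
```

Because the centred data here have rank 3, three rank-one terms reconstruct `W` exactly (up to floating-point error); on real tracks the number of terms trades off fidelity against the number of deformation modes.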

An efficient multi-language Video Search Engine to facilitate the HADJ and the UMRA

Title An efficient multi-language Video Search Engine to facilitate the HADJ and the UMRA
Authors Mohamed Hamroun, Sonia Lajmi
Abstract Video clips have become the most important and prominent multimedia documents for illustrating the rituals of Hajj and Umrah. It is therefore necessary to develop a system that facilitates access to information about the duties, the pillars, the stages, and the prayers. In this paper, we present a new project that implements a search engine over a large video database, enabling pilgrims to retrieve the information they care about quickly and accurately. The project is based on two techniques: (a) a weighting method that determines the degree of affiliation of a video clip to a particular topic, and (b) organizing the data in several layers.
Tasks
Published 2019-04-17
URL http://arxiv.org/abs/1904.08418v1
PDF http://arxiv.org/pdf/1904.08418v1.pdf
PWC https://paperswithcode.com/paper/an-efficient-multi-language-video-search
Repo
Framework

Towards Effective Human-AI Teams: The Case of Collaborative Packing

Title Towards Effective Human-AI Teams: The Case of Collaborative Packing
Authors Gilwoo Lee, Christoforos Mavrogiannis, Siddhartha S. Srinivasa
Abstract We focus on the problem of designing an artificial agent (AI), capable of assisting a human user to complete a task. Our goal is to guide human users towards optimal task performance while keeping their cognitive load as low as possible. Our insight is that doing so requires an understanding of human decision making for the task domain at hand. In this work, we consider the domain of collaborative packing, in which an AI agent provides placement recommendations to a human user. As a first step, we explore the mechanisms underlying human packing strategies. We conducted a user study in which 100 human participants completed a series of packing tasks in a virtual environment. We analyzed their packing strategies and discovered spatial and temporal patterns, such as that humans tend to place larger items at corners first. We expect that imbuing an artificial agent with an understanding of this spatiotemporal structure will enable improved assistance, which will be reflected in the task performance and the human perception of the AI. Ongoing work involves the development of a framework that incorporates the extracted insights to predict and manipulate human decision making towards an efficient trajectory of low cognitive load and high efficiency. A follow-up study will evaluate our framework against a set of baselines featuring alternative strategies of assistance. Our eventual goal is the deployment and evaluation of our framework on an autonomous robotic manipulator, actively assisting users on a packing task.
Tasks Decision Making
Published 2019-09-14
URL https://arxiv.org/abs/1909.06527v3
PDF https://arxiv.org/pdf/1909.06527v3.pdf
PWC https://paperswithcode.com/paper/towards-effective-human-ai-teams-the-case-of
Repo
Framework

Discriminative Online Learning for Fast Video Object Segmentation

Title Discriminative Online Learning for Fast Video Object Segmentation
Authors Andreas Robinson, Felix Järemo Lawin, Martin Danelljan, Fahad Shahbaz Khan, Michael Felsberg
Abstract We address the highly challenging problem of video object segmentation. Given only the initial mask, the task is to segment the target in the subsequent frames. In order to effectively handle appearance changes and similar background objects, a robust representation of the target is required. Previous approaches either rely on fine-tuning a segmentation network on the first frame, or employ generative appearance models. Although partially successful, these methods often suffer from impractically low frame rates or unsatisfactory robustness. We propose a novel approach, based on a dedicated target appearance model that is exclusively learned online to discriminate between the target and background image regions. Importantly, we design a specialized loss and customized optimization techniques to enable highly efficient online training. Our light-weight target model is integrated into a carefully designed segmentation network, trained offline to enhance the predictions generated by the target model. Extensive experiments are performed on three datasets. Our approach achieves an overall score of over 70 on YouTube-VOS, while operating at 25 frames per second.
Tasks Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published 2019-04-18
URL http://arxiv.org/abs/1904.08630v1
PDF http://arxiv.org/pdf/1904.08630v1.pdf
PWC https://paperswithcode.com/paper/discriminative-online-learning-for-fast-video
Repo
Framework
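
The core idea, a light-weight target model learned online to discriminate target from background pixels, can be illustrated with a deliberately simple stand-in. The NumPy sketch below fits a linear per-pixel classifier by closed-form ridge regression on synthetic features from a first frame. This swaps in ridge regression for the paper's specialized loss and optimization, and the function names, features, and regularization value are all assumptions.

```python
import numpy as np

def fit_target_model(features, labels, lam=1e-2):
    """Ridge regression w = (X^T X + lam I)^-1 X^T y on per-pixel features.

    features: (N, D) pixel features; labels: (N,) with 1 = target, 0 = background.
    """
    D = features.shape[1]
    return np.linalg.solve(features.T @ features + lam * np.eye(D), features.T @ labels)

def target_scores(features, w):
    """Per-pixel target confidence; a segmentation network would refine these."""
    return features @ w

# Synthetic first-frame features: target pixels cluster around +1, background around -1.
rng = np.random.default_rng(0)
target = rng.normal(1.0, 0.5, size=(100, 8))
background = rng.normal(-1.0, 0.5, size=(100, 8))
X = np.hstack([np.vstack([target, background]), np.ones((200, 1))])  # constant column acts as a bias
y = np.concatenate([np.ones(100), np.zeros(100)])
w = fit_target_model(X, y)
scores = target_scores(X, w)
```

Solving a small linear system is cheap, which is the property that makes online updating feasible: in later frames one could append newly segmented pixels to the training set and refit, keeping the frame rate high.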

Optimization and Learning with Information Streams: Time-varying Algorithms and Applications

Title Optimization and Learning with Information Streams: Time-varying Algorithms and Applications
Authors Emiliano Dall’Anese, Andrea Simonetto, Stephen Becker, Liam Madden
Abstract There is a growing cross-disciplinary effort in the broad domain of optimization and learning with streams of data, applied to settings where traditional batch optimization techniques cannot produce solutions at time scales that match the inter-arrival times of the data points due to computational and/or communication bottlenecks. Special types of online algorithms can handle this situation, and this article focuses on such time-varying optimization algorithms, with emphasis on Machine Learning and Signal Processing, as well as data-driven Control. Approaches for the design of time-varying or online first-order optimization methods are discussed, with emphasis on algorithms that can handle errors in the gradient, as may arise when the gradient is estimated. Insights on performance metrics and accompanying claims are provided, along with evidence of cases where algorithms that are provably convergent in batch optimization may perform poorly in an online regime. The role of distributed computation is discussed. Illustrative numerical examples for a number of applications of broad interest are provided to convey key ideas.
Tasks
Published 2019-10-17
URL https://arxiv.org/abs/1910.08123v2
PDF https://arxiv.org/pdf/1910.08123v2.pdf
PWC https://paperswithcode.com/paper/optimization-and-learning-with-information
Repo
Framework
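
A minimal example of the setting described above: online gradient descent on a time-varying cost f_t(x) = 0.5 (x - c_t)^2 whose minimizer c_t drifts, taking only one step per time instant with a noisy (estimated) gradient. The drift model, step size, and noise level are assumptions chosen for illustration, and the function name is hypothetical.

```python
import math
import random

def track_moving_minimizer(steps=500, step_size=0.5, noise=0.05, seed=0):
    """Online gradient descent with inexact gradients on f_t(x) = 0.5 * (x - c_t)^2.

    Returns the tracking error |x_{t+1} - c_t| after each update.
    """
    rng = random.Random(seed)
    x = 0.0
    errors = []
    for t in range(steps):
        c_t = math.sin(0.01 * t)                    # drifting optimum
        grad = (x - c_t) + rng.gauss(0.0, noise)    # estimated gradient with additive error
        x -= step_size * grad                       # a single step per arriving data point
        errors.append(abs(x - c_t))
    return errors

errors = track_moving_minimizer()
steady_state = sum(errors[-100:]) / 100
```

The error does not converge to zero as it would in batch optimization; instead it settles at a floor set by the drift of c_t and the gradient noise, which is why tracking error, rather than convergence, is the natural performance metric in the online regime.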

Context Aware Machine Learning

Title Context Aware Machine Learning
Authors Yun Zeng
Abstract We propose a principle for exploring context in machine learning models. Starting with a simple assumption that each observation may or may not depend on its context, a conditional probability distribution is decomposed into two parts: context-free and context-sensitive. Then by employing the log-linear word production model for relating random variables to their embedding space representation and making use of the convexity of natural exponential function, we show that the embedding of an observation can also be decomposed into a weighted sum of two vectors, representing its context-free and context-sensitive parts, respectively. This simple treatment of context provides a unified view of many existing deep learning models, leading to revisions of these models able to achieve a significant performance boost. Specifically, our upgraded version of a recent sentence embedding model not only outperforms the original one by a large margin, but also leads to a new, principled approach for compositing the embeddings of bag-of-words features, as well as a new architecture for modeling attention in deep neural networks. More surprisingly, our new principle provides a novel understanding of the gates and equations defined by the long short term memory model, which also leads to a new model that is able to converge significantly faster and achieve much lower prediction errors. Furthermore, our principle also inspires a new type of generic neural network layer that better resembles real biological neurons than the traditional linear mapping plus nonlinear activation based architecture. Its multi-layer extension provides a new principle for deep neural networks which subsumes residual network (ResNet) as its special case, and its extension to convolutional neural network model accounts for irrelevant input (e.g., background in an image) in addition to filtering.
Tasks Sentence Embedding
Published 2019-01-10
URL http://arxiv.org/abs/1901.03415v2
PDF http://arxiv.org/pdf/1901.03415v2.pdf
PWC https://paperswithcode.com/paper/context-aware-machine-learning
Repo
Framework
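
The key decomposition, an embedding written as a weighted sum of a context-free vector and a context-sensitive vector, can be sketched as a gate. In the illustration below, the weight p is the probability that the observation depends on its context, computed as a sigmoid of a score; the sigmoid parameterization and the function name are assumptions (in the spirit of the LSTM gates mentioned above), not the paper's derived weights.

```python
import math

def context_aware_embedding(v_free, v_ctx, score):
    """e = (1 - p) * v_free + p * v_ctx, with p = sigmoid(score) = P(observation uses its context)."""
    p = 1.0 / (1.0 + math.exp(-score))
    return [(1.0 - p) * a + p * b for a, b in zip(v_free, v_ctx)]

v_free = [1.0, 0.0]   # context-free part of a word's embedding
v_ctx = [0.0, 1.0]    # context-sensitive part (e.g. derived from neighbouring words)

balanced = context_aware_embedding(v_free, v_ctx, 0.0)      # p = 0.5
mostly_free = context_aware_embedding(v_free, v_ctx, -20.0) # p close to 0
```

When the score strongly favours independence from context, the embedding collapses to its context-free part; an identity-like shortcut of this kind is the intuition behind the claim that the multi-layer extension subsumes ResNet.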

Multi-domain Conversation Quality Evaluation via User Satisfaction Estimation

Title Multi-domain Conversation Quality Evaluation via User Satisfaction Estimation
Authors Praveen Kumar Bodigutla, Lazaros Polymenakos, Spyros Matsoukas
Abstract An automated metric to evaluate dialogue quality is vital for optimizing data-driven dialogue management. The common approach of relying on explicit user feedback during a conversation is intrusive and sparse. Current models to estimate user satisfaction use limited feature sets and employ annotation schemes with limited generalizability to conversations spanning multiple domains. To address these gaps, we created a new Response Quality annotation scheme, introduced five new domain-independent feature sets, and experimented with six machine learning models to estimate User Satisfaction at both turn and dialogue level. Response Quality ratings achieved a high correlation (0.76) with explicit turn-level user ratings. Using the new feature sets we introduced, a Gradient Boosting Regression model achieved the best prediction performance for ratings on a 1-5 scale, both on 26 seen domains (linear correlation ~0.79) and on one new multi-turn domain (linear correlation 0.67). We observed a 16% relative improvement (from 68% to 79%) in binary ("satisfactory/dissatisfactory") class prediction accuracy of a domain-independent dialogue-level satisfaction estimation model after including predicted turn-level satisfaction ratings as features.
Tasks Dialogue Management
Published 2019-11-18
URL https://arxiv.org/abs/1911.08567v1
PDF https://arxiv.org/pdf/1911.08567v1.pdf
PWC https://paperswithcode.com/paper/multi-domain-conversation-quality-evaluation
Repo
Framework
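
Two quantities in this abstract are easy to recompute. The "16% relative improvement" is the gain divided by the baseline accuracy, (79 - 68) / 68, which is about 16.2%, not the 11-point absolute difference. The "linear correlation" figures are, presumably, Pearson correlations between predicted and reference ratings; a plain-Python version is included for reference, with hypothetical function names.

```python
import math

def relative_improvement(before, after):
    """Relative change with respect to the baseline value."""
    return (after - before) / before

def pearson(xs, ys):
    """Pearson (linear) correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

gain = relative_improvement(68, 79)   # accuracy 68% -> 79%
print(round(100 * gain, 1))           # 16.2
```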

Sequence Tagging with Contextual and Non-Contextual Subword Representations: A Multilingual Evaluation

Title Sequence Tagging with Contextual and Non-Contextual Subword Representations: A Multilingual Evaluation
Authors Benjamin Heinzerling, Michael Strube
Abstract Pretrained contextual and non-contextual subword embeddings have become available in over 250 languages, allowing massively multilingual NLP. However, while there is no dearth of pretrained embeddings, the distinct lack of systematic evaluations makes it difficult for practitioners to choose between them. In this work, we conduct an extensive evaluation comparing non-contextual subword embeddings, namely FastText and BPEmb, and a contextual representation method, namely BERT, on multilingual named entity recognition and part-of-speech tagging. We find that overall, a combination of BERT, BPEmb, and character representations works best across languages and tasks. A more detailed analysis reveals different strengths and weaknesses: Multilingual BERT performs well in medium- to high-resource languages, but is outperformed by non-contextual subword embeddings in a low-resource setting.
Tasks Named Entity Recognition, Part-Of-Speech Tagging
Published 2019-06-04
URL https://arxiv.org/abs/1906.01569v1
PDF https://arxiv.org/pdf/1906.01569v1.pdf
PWC https://paperswithcode.com/paper/sequence-tagging-with-contextual-and-non
Repo
Framework
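
BPEmb provides embeddings for byte-pair-encoding (BPE) subword units. To show what those units are, here is the textbook BPE merge-learning loop in plain Python: start from characters and repeatedly fuse the most frequent adjacent symbol pair. This is a sketch of the general algorithm on a toy word-frequency table, not BPEmb's actual training pipeline, and the function name is hypothetical.

```python
from collections import Counter

def learn_bpe(word_counts, num_merges):
    """Learn BPE merges from a {word: frequency} table."""
    vocab = {tuple(word): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count                  # frequency-weighted pair counts
        if not pairs:
            break
        best = max(pairs, key=pairs.get)                # most frequent adjacent pair
        merges.append(best)
        new_vocab = {}
        for symbols, count in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])  # fuse the pair into one symbol
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + count
        vocab = new_vocab
    return merges, vocab

merges, vocab = learn_bpe({"lower": 5, "lowest": 3, "newer": 6, "wider": 2}, num_merges=4)
```

The first two merges here are `('w', 'e')` and then `('we', 'r')`, so frequent suffixes such as "wer" become single units. Because any word decomposes into known subwords, such vocabularies can cover low-resource languages, consistent with the finding that non-contextual subword embeddings remain competitive there.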

A Two-Stage Approach to Few-Shot Learning for Image Recognition

Title A Two-Stage Approach to Few-Shot Learning for Image Recognition
Authors Debasmit Das, C. S. George Lee
Abstract This paper proposes a multi-layer neural network structure for few-shot image recognition of novel categories. The proposed multi-layer neural network architecture encodes transferable knowledge extracted from a large annotated dataset of base categories. This architecture is then applied to novel categories containing only a few samples. The transfer of knowledge is carried out at the feature-extraction and the classification levels distributed across the two training stages. In the first training stage, we introduce the relative feature to capture the structure of the data as well as obtain a low-dimensional discriminative space. Secondly, we account for the variable variance of different categories by using a network to predict the variance of each class. Classification is then performed by computing the Mahalanobis distance to the mean-class representation in contrast to previous approaches that used the Euclidean distance. In the second training stage, a category-agnostic mapping is learned from the mean-sample representation to its corresponding class-prototype representation. This is because the mean-sample representation may not accurately represent the novel category prototype. Finally, we evaluate the proposed network structure on four standard few-shot image recognition datasets, where our proposed few-shot learning system produces competitive performance compared to previous work. We also extensively study and analyze the contribution of each component of our proposed framework.
Tasks Few-Shot Learning
Published 2019-12-10
URL https://arxiv.org/abs/1912.04973v1
PDF https://arxiv.org/pdf/1912.04973v1.pdf
PWC https://paperswithcode.com/paper/a-two-stage-approach-to-few-shot-learning-for
Repo
Framework
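
Why Mahalanobis instead of Euclidean distance matters is easiest to see with two classes of very different spread. In the sketch below, each class has a mean and a per-dimension (diagonal) variance; in the paper a network predicts the variance, whereas here the values are simply given, and the classes, numbers, and function names are hypothetical.

```python
def mahalanobis_sq(x, mean, var):
    """Squared Mahalanobis distance under a diagonal covariance."""
    return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))

def classify(x, class_means, class_vars):
    """Assign x to the class with the smallest (squared) Mahalanobis distance."""
    return min(class_means, key=lambda c: mahalanobis_sq(x, class_means[c], class_vars[c]))

class_means = {"A": [0.0, 0.0], "B": [3.0, 0.0]}
class_vars = {"A": [0.1, 0.1], "B": [4.0, 4.0]}      # A is tight, B is broad
unit_vars = {"A": [1.0, 1.0], "B": [1.0, 1.0]}       # equal unit variances = Euclidean distance

query = [1.2, 0.0]
euclidean_label = classify(query, class_means, unit_vars)     # "A": 1.44 vs 3.24
mahalanobis_label = classify(query, class_means, class_vars)  # "B": 14.4 vs 0.81
```

The query is geometrically closer to A's mean, but lies far outside A's tight spread and well inside B's broad one; accounting for per-class variance flips the decision.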

Correlated Parameters to Accurately Measure Uncertainty in Deep Neural Networks

Title Correlated Parameters to Accurately Measure Uncertainty in Deep Neural Networks
Authors Konstantin Posch, Jürgen Pilz
Abstract In this article, a novel approach for training deep neural networks using Bayesian techniques is presented. The Bayesian methodology allows for an easy evaluation of model uncertainty and additionally is robust to overfitting. These are commonly the two main problems that classical, i.e., non-Bayesian, architectures struggle with. The proposed approach applies variational inference in order to approximate the intractable posterior distribution. In particular, the variational distribution is defined as a product of multiple multivariate normal distributions with tridiagonal covariance matrices. Each normal distribution belongs either to the weights or to the biases of one network layer. The layer-wise a posteriori variances are defined based on the corresponding expectation values, and the correlations are assumed to be identical. Therefore, only a few additional parameters need to be optimized compared to non-Bayesian settings. The novel approach is successfully evaluated on the popular benchmark datasets MNIST and CIFAR-10.
Tasks
Published 2019-04-02
URL http://arxiv.org/abs/1904.01334v1
PDF http://arxiv.org/pdf/1904.01334v1.pdf
PWC https://paperswithcode.com/paper/correlated-parameters-to-accurately-measure
Repo
Framework
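
The structure of one factor of the variational distribution described above can be sketched with NumPy: a covariance whose correlation matrix is tridiagonal with a single shared correlation rho between neighbouring parameters, sampled through its Cholesky factor. The per-parameter standard deviations are simply given here (how the paper derives them from the expectation values is not reproduced), and the function names are hypothetical. A tridiagonal correlation matrix with constant rho has eigenvalues 1 + 2 rho cos(k pi / (n + 1)) for k = 1..n, so it is positive definite for any size whenever |rho| < 0.5.

```python
import numpy as np

def tridiagonal_covariance(std, rho):
    """Sigma = D R D with R tridiagonal: ones on the diagonal, rho on the first off-diagonals."""
    n = len(std)
    R = np.eye(n) + rho * (np.eye(n, k=1) + np.eye(n, k=-1))
    D = np.diag(std)
    return D @ R @ D

def sample(mean, std, rho, rng):
    """Draw one sample via the Cholesky factor (raises LinAlgError if Sigma is not positive definite)."""
    L = np.linalg.cholesky(tridiagonal_covariance(std, rho))
    return mean + L @ rng.standard_normal(len(mean))

rng = np.random.default_rng(0)
std = np.array([0.1, 0.2, 0.3, 0.4, 0.5])   # one layer with five parameters
Sigma = tridiagonal_covariance(std, 0.4)
draw = sample(np.zeros(5), std, 0.4, rng)
```

Only one extra scalar, rho, is shared across the layer on top of the variances, which matches the point that few additional parameters are needed compared with a non-Bayesian network.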