April 1, 2020

Paper Group ANR 418

Predicting Regression Probability Distributions with Imperfect Data Through Optimal Transformations. Generating Object Stamps. Reinforcement Learning with Goal-Distance Gradient. Applying Gene Expression Programming for Solving One-Dimensional Bin-Packing Problems. TapLab: A Fast Framework for Semantic Video Segmentation Tapping into Compressed-Dom …

Predicting Regression Probability Distributions with Imperfect Data Through Optimal Transformations

Title Predicting Regression Probability Distributions with Imperfect Data Through Optimal Transformations
Authors Jerome H. Friedman
Abstract The goal of regression analysis is to predict the value of a numeric outcome variable y given a vector of joint values of other (predictor) variables x. Usually a particular x-vector does not specify a repeatable value for y, but rather a probability distribution of possible y-values, p(y|x). This distribution has a location, scale and shape, all of which can depend on x, and are needed to infer likely values for y given x. Regression methods usually assume that training data y-values are perfect numeric realizations from some well-behaved p(y|x). Often actual training data y-values are discrete, truncated and/or arbitrarily censored. Regression procedures based on an optimal transformation strategy are presented for estimating location, scale and shape of p(y|x) as general functions of x, in the possible presence of such imperfect training data. In addition, validation diagnostics are presented to ascertain the quality of the solutions.
Published 2020-01-27
URL https://arxiv.org/abs/2001.10102v1
PDF https://arxiv.org/pdf/2001.10102v1.pdf
PWC https://paperswithcode.com/paper/predicting-regression-probability
Repo
Framework

Generating Object Stamps

Title Generating Object Stamps
Authors Youssef Alami Mejjati, Zejiang Shen, Michael Snower, Aaron Gokaslan, Oliver Wang, James Tompkin, Kwang In Kim
Abstract We present an algorithm to generate diverse foreground objects and composite them into background images using a GAN architecture. Given an object class, a user-provided bounding box, and a background image, we first use a mask generator to create an object shape, and then use a texture generator to fill the mask such that the texture integrates with the background. By separating the problem of object insertion into these two stages, we show that our model allows us to improve the realism of diverse object generation that also agrees with the provided background image. Our results on the challenging COCO dataset show improved overall quality and diversity compared to state-of-the-art object insertion approaches.
Published 2020-01-01
URL https://arxiv.org/abs/2001.02595v2
PDF https://arxiv.org/pdf/2001.02595v2.pdf
PWC https://paperswithcode.com/paper/generating-object-stamps
Repo
Framework

Reinforcement Learning with Goal-Distance Gradient

Title Reinforcement Learning with Goal-Distance Gradient
Authors Kai Jiang, XiaoLong Qin
Abstract Reinforcement learning usually uses feedback rewards from the environment to train agents. But rewards in real environments are sparse, and some environments provide no rewards at all. Most current methods struggle to perform well in sparse-reward or non-reward environments. Although using shaped rewards is effective when solving sparse reward tasks, it is limited to specific problems and learning is also susceptible to local optima. We propose a model-free method that does not rely on environmental rewards to solve the problem of sparse rewards in the general environment. Our method uses the minimum number of transitions between states as the distance to replace the environmental rewards, and proposes a goal-distance gradient to achieve policy improvement. We also introduce a bridge point planning method based on the characteristics of our method to improve exploration efficiency, thereby solving more complex tasks. Experiments show that our method performs better on sparse reward and local optimal problems in complex environments than previous work.
Published 2020-01-01
URL https://arxiv.org/abs/2001.00127v2
PDF https://arxiv.org/pdf/2001.00127v2.pdf
PWC https://paperswithcode.com/paper/reinforcement-learning-with-goal-distance
Repo
Framework
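
The abstract above replaces rewards with the minimum number of transitions between states. As a rough illustration of that distance only (not the authors' learning method), the sketch below computes it exactly with breadth-first search on a small grid. The grid world, the `grid_neighbors` helper, and the assumption that every move is reversible are hypothetical choices made for this example; the paper targets settings where such a distance has to be learned.

```python
from collections import deque


def grid_neighbors(width, height, walls=frozenset()):
    """Build a hypothetical 4-connected grid world as an adjacency map."""
    cells = {(x, y) for x in range(width) for y in range(height)} - set(walls)
    return {
        (x, y): [
            n
            for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
            if n in cells
        ]
        for (x, y) in cells
    }


def transition_distances(neighbors, goal):
    """Return the minimum number of transitions from each state to `goal`.

    Moves are assumed reversible, so a single BFS started at the goal
    yields the distance from every reachable state.
    """
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        state = queue.popleft()
        for nxt in neighbors[state]:
            if nxt not in dist:
                dist[nxt] = dist[state] + 1
                queue.append(nxt)
    return dist


# A 3x3 grid with a two-cell wall forces a detour around the obstacle.
world = grid_neighbors(3, 3, walls={(1, 0), (1, 1)})
distances = transition_distances(world, goal=(2, 0))
```

A greedy agent that always steps to the neighbor with the smallest distance reaches the goal without any reward signal, which is the intuition behind using the distance in place of rewards.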

Applying Gene Expression Programming for Solving One-Dimensional Bin-Packing Problems

Title Applying Gene Expression Programming for Solving One-Dimensional Bin-Packing Problems
Authors Najla Akram Al-Saati
Abstract This work aims to study and explore the use of Gene Expression Programming (GEP) in solving the on-line Bin-Packing problem. The main idea is to show how GEP can automatically find acceptable heuristic rules to solve the problem efficiently and economically. The one-dimensional Bin-Packing problem is considered in the course of this work with the constraint of minimizing the number of bins filled with the given pieces. Experimental data include instances of benchmark test data taken from Falkenauer (1996) for One-dimensional Bin-Packing Problems. Results show that GEP can be used as a very powerful and flexible tool for finding interesting compact rules suited for the problem. The impact of functions is also investigated to show how they can affect and influence the success rates when they appear in rules. High success rates are gained with smaller population size and fewer generations compared to previous work performed using Genetic Programming.
Published 2020-01-13
URL https://arxiv.org/abs/2001.09923v1
PDF https://arxiv.org/pdf/2001.09923v1.pdf
PWC https://paperswithcode.com/paper/applying-gene-expression-programming-for
Repo
Framework
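
For readers unfamiliar with the on-line setting mentioned in the abstract above, the sketch below shows a classic hand-written heuristic rule, first-fit, in which each piece is placed as it arrives without knowledge of later pieces. This is a well-known baseline and not a rule evolved by GEP; the function name and the integer piece sizes are illustrative assumptions.

```python
def first_fit(items, capacity):
    """Pack items one at a time into the first bin that still has room.

    This is the textbook first-fit rule, used here only as an example of
    the kind of on-line heuristic that an evolved rule would compete with.
    """
    bins = []  # each bin is the list of piece sizes placed in it
    for size in items:
        if size > capacity:
            raise ValueError(f"piece of size {size} exceeds bin capacity")
        for contents in bins:
            if sum(contents) + size <= capacity:
                contents.append(size)
                break
        else:
            # No open bin could take the piece, so open a new one.
            bins.append([size])
    return bins


packed = first_fit([4, 8, 1, 4, 2, 1], capacity=10)
# packed == [[4, 1, 4, 1], [8, 2]]
```

The quantity being minimized is `len(packed)`, the number of bins used.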

TapLab: A Fast Framework for Semantic Video Segmentation Tapping into Compressed-Domain Knowledge

Title TapLab: A Fast Framework for Semantic Video Segmentation Tapping into Compressed-Domain Knowledge
Authors Junyi Feng, Songyuan Li, Xi Li, Fei Wu, Qi Tian, Ming-Hsuan Yang, Haibin Ling
Abstract Real-time semantic video segmentation is a challenging task due to the strict requirements of inference speed. Recent approaches mainly devote great efforts to reducing the model size for high efficiency. In this paper, we rethink this problem from a different viewpoint: using knowledge contained in compressed videos. We propose a simple and effective framework, dubbed TapLab, to tap into resources from the compressed domain. Specifically, we design a fast feature warping module using motion vectors for acceleration. To reduce the noise introduced by motion vectors, we design a residual-guided correction module and a residual-guided frame selection module using residuals. Compared with the state-of-the-art fast semantic image segmentation models, our proposed TapLab significantly reduces redundant computations, running around 3 times faster with comparable accuracy for 1024x2048 video. The experimental results show that TapLab achieves 70.6% mIoU on the Cityscapes dataset at 99.8 FPS with a single GPU card. A high-speed version even reaches the speed of 160+ FPS.
Tasks Semantic Segmentation, Video Semantic Segmentation
Published 2020-03-30
URL https://arxiv.org/abs/2003.13260v1
PDF https://arxiv.org/pdf/2003.13260v1.pdf
PWC https://paperswithcode.com/paper/taplab-a-fast-framework-for-semantic-video
Repo
Framework
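
The feature warping idea in the abstract above can be pictured with a toy example: instead of recomputing features for a new frame, values from the previous frame are copied along the motion vectors stored in the compressed stream. The sketch below is a simplified illustration under stated assumptions (one integer `(dy, dx)` vector per position, pointing from the current position to its source in the previous frame, with out-of-range sources clamped to the border). It is not the TapLab module, which also relies on residual-guided correction and frame selection.

```python
def warp_features(prev_features, motion_vectors):
    """Copy previous-frame features along per-position motion vectors.

    prev_features: 2D list of feature values from the previous frame.
    motion_vectors: 2D list of (dy, dx) integer offsets; the feature at
        (y, x) in the current frame is taken from (y + dy, x + dx) in the
        previous frame. Sources outside the frame are clamped to the edge.
    """
    height = len(prev_features)
    width = len(prev_features[0])
    warped = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dy, dx = motion_vectors[y][x]
            src_y = min(max(y + dy, 0), height - 1)
            src_x = min(max(x + dx, 0), width - 1)
            warped[y][x] = prev_features[src_y][src_x]
    return warped


# Content that moved one position to the right is fetched from the left.
previous = [[1, 2, 3], [4, 5, 6]]
vectors = [[(0, -1)] * 3 for _ in range(2)]
current = warp_features(previous, vectors)
# current == [[1, 1, 2], [4, 4, 5]]
```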

Multi-Path Region Mining For Weakly Supervised 3D Semantic Segmentation on Point Clouds

Title Multi-Path Region Mining For Weakly Supervised 3D Semantic Segmentation on Point Clouds
Authors Jiacheng Wei, Guosheng Lin, Kim-Hui Yap, Tzu-Yi Hung, Lihua Xie
Abstract Point clouds provide intrinsic geometric information and surface context for scene understanding. Existing methods for point cloud segmentation require a large amount of fully labeled data. Using advanced depth sensors, collection of large scale 3D datasets is no longer a cumbersome process. However, manually producing point-level labels on a large scale dataset is time- and labor-intensive. In this paper, we propose a weakly supervised approach to predict point-level results using weak labels on 3D point clouds. We introduce our multi-path region mining module to generate pseudo point-level labels from a classification network trained with weak labels. It mines the localization cues for each class from various aspects of the network feature using different attention modules. Then, we use the point-level pseudo labels to train a point cloud segmentation network in a fully supervised manner. To the best of our knowledge, this is the first method that uses cloud-level weak labels on raw 3D space to train a point cloud semantic segmentation network. In our setting, the 3D weak labels only indicate the classes that appeared in our input sample. We discuss both scene- and subcloud-level weak labels on raw 3D point cloud data and perform in-depth experiments on them. On the ScanNet dataset, our result trained with subcloud-level labels is compatible with some fully supervised methods.
Tasks 3D Semantic Segmentation, Scene Understanding, Semantic Segmentation
Published 2020-03-29
URL https://arxiv.org/abs/2003.13035v1
PDF https://arxiv.org/pdf/2003.13035v1.pdf
PWC https://paperswithcode.com/paper/multi-path-region-mining-for-weakly
Repo
Framework

Pathological speech detection using x-vector embeddings

Title Pathological speech detection using x-vector embeddings
Authors Catarina Botelho, Francisco Teixeira, Thomas Rolland, Alberto Abad, Isabel Trancoso
Abstract The potential of speech as a non-invasive biomarker to assess a speaker’s health has been repeatedly supported by the results of multiple works, for both physical and psychological conditions. Traditional systems for speech-based disease classification have focused on carefully designed knowledge-based features. However, these features may not represent the disease’s full symptomatology, and may even overlook its more subtle manifestations. This has prompted researchers to move in the direction of general speaker representations that inherently model symptoms, such as Gaussian Supervectors, i-vectors, and x-vectors. In this work, we focus on the latter, to assess their applicability as a general feature extraction method to the detection of Parkinson’s disease (PD) and obstructive sleep apnea (OSA). We test our approach against knowledge-based features and i-vectors, and report results for two European Portuguese corpora, for OSA and PD, as well as for an additional Spanish corpus for PD. Both x-vector and i-vector models were trained with an out-of-domain European Portuguese corpus. Our results show that x-vectors are able to perform better than knowledge-based features in same-language corpora. Moreover, while x-vectors performed similarly to i-vectors in matched conditions, they significantly outperform them when domain-mismatch occurs.
Published 2020-03-02
URL https://arxiv.org/abs/2003.00864v2
PDF https://arxiv.org/pdf/2003.00864v2.pdf
PWC https://paperswithcode.com/paper/pathological-speech-detection-using-x-vector
Repo
Framework

SNIFF: Reverse Engineering of Neural Networks with Fault Attacks

Title SNIFF: Reverse Engineering of Neural Networks with Fault Attacks
Authors Jakub Breier, Dirmanto Jap, Xiaolu Hou, Shivam Bhasin, Yang Liu
Abstract Neural networks have been shown to be vulnerable against fault injection attacks. These attacks change the physical behavior of the device during the computation, resulting in a change of value that is currently being computed. They can be realized by various fault injection techniques, ranging from clock/voltage glitching to application of lasers to rowhammer. In this paper we explore the possibility to reverse engineer neural networks with the usage of fault attacks. SNIFF stands for sign bit flip fault, which enables the reverse engineering by changing the sign of intermediate values. We develop the first exact extraction method on deep-layer feature extractor networks that provably allows the recovery of the model parameters. Our experiments with Keras library show that the precision error for the parameter recovery for the tested networks is less than $10^{-13}$ with the usage of 64-bit floats, which improves the current state of the art by 6 orders of magnitude. Additionally, we discuss the protection techniques against fault injection attacks that can be applied to enhance the fault resistance.
Published 2020-02-23
URL https://arxiv.org/abs/2002.11021v1
PDF https://arxiv.org/pdf/2002.11021v1.pdf
PWC https://paperswithcode.com/paper/sniff-reverse-engineering-of-neural-networks
Repo
Framework
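
The fault model named in the abstract above, a sign bit flip on an intermediate value, is easy to picture for the 64-bit floats it mentions. The sketch below only demonstrates what flipping bit 63 of an IEEE 754 double does to the value and to a ReLU output. It does not implement the parameter-recovery procedure, and the helper names are illustrative assumptions.

```python
import struct

SIGN_BIT = 1 << 63  # most significant bit of an IEEE 754 double


def flip_sign_bit(value):
    """Return `value` with only its sign bit inverted."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", bits ^ SIGN_BIT))
    return flipped


def relu(value):
    return max(0.0, value)


pre_activation = 2.0
faulted = flip_sign_bit(pre_activation)
# The fault changes the sign and leaves the magnitude untouched, so a
# ReLU that was active before the fault is switched off after it.
clean_output = relu(pre_activation)
faulted_output = relu(faulted)
```

Because the magnitude is preserved exactly, comparing outputs with and without the fault leaks information about the affected intermediate value, which is the kind of signal an extraction method can build on.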

Using the Split Bregman Algorithm to Solve the Self-Repelling Snake Model

Title Using the Split Bregman Algorithm to Solve the Self-Repelling Snake Model
Authors Huizhu Pan, Jintao Song, Wanquan Liu, Ling Li, Guanglu Zhou, Lu Tan, Shichu Chen
Abstract Preserving the contour topology during image segmentation is useful in many practical scenarios. By keeping the contours isomorphic, it is possible to prevent over-segmentation and under-segmentation, as well as to adhere to given topologies. The self-repelling snake model (SR) is a variational model that preserves contour topology by combining a non-local repulsion term with the geodesic active contour model (GAC). The SR is traditionally solved using the additive operator splitting (AOS) scheme. Although this solution is stable, the memory requirement grows quickly as the image size increases. In our paper, we propose an alternative solution to the SR using the Split Bregman method. Our algorithm breaks the problem down into simpler subproblems to use lower-order evolution equations and approximation schemes. The memory usage is significantly reduced as a result. Experiments show comparable performance to the original algorithm with shorter iteration times.
Published 2020-03-28
URL https://arxiv.org/abs/2003.12693v1
PDF https://arxiv.org/pdf/2003.12693v1.pdf
PWC https://paperswithcode.com/paper/using-the-split-bregman-algorithm-to-solve
Repo
Framework
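
Split Bregman schemes typically introduce an auxiliary variable whose subproblem has a closed-form solution, the soft-thresholding (shrinkage) operator. The sketch below shows that generic operator only; whether the self-repelling snake solver uses exactly this form is not stated in the abstract above, so treat it as background on the method family rather than the paper's algorithm.

```python
def shrink(x, gamma):
    """Soft-thresholding: move x toward zero by gamma, stopping at zero.

    This is the closed-form minimizer of gamma * |d| + 0.5 * (d - x) ** 2
    over d, the scalar subproblem that appears in many Split Bregman
    iterations.
    """
    magnitude = max(abs(x) - gamma, 0.0)
    return magnitude if x >= 0 else -magnitude


def shrink_all(values, gamma):
    """Apply the shrinkage operator elementwise to a list of values."""
    return [shrink(v, gamma) for v in values]


thresholded = shrink_all([-3.0, -0.5, 0.0, 0.5, 3.0], gamma=1.0)
# Entries with magnitude below gamma become zero; the rest move toward
# zero by exactly gamma.
```

Each application costs one pass over the data and needs no linear system, which is one reason splitting methods of this kind can be light on memory.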

Biased Stochastic Gradient Descent for Conditional Stochastic Optimization

Title Biased Stochastic Gradient Descent for Conditional Stochastic Optimization
Authors Yifan Hu, Siqi Zhang, Xin Chen, Niao He
Abstract Conditional Stochastic Optimization (CSO) covers a variety of applications ranging from meta-learning and causal inference to invariant learning. However, constructing unbiased gradient estimates in CSO is challenging due to the composition structure. As an alternative, we propose a biased stochastic gradient descent (BSGD) algorithm and study the bias-variance tradeoff under different structural assumptions. We establish the sample complexities of BSGD for strongly convex, convex, and weakly convex objectives, under smooth and non-smooth conditions. We also provide matching lower bounds of BSGD for convex CSO objectives. Extensive numerical experiments are conducted to illustrate the performance of BSGD on robust logistic regression, model-agnostic meta-learning (MAML), and instrumental variable regression (IV).
Tasks Causal Inference, Meta-Learning, Stochastic Optimization
Published 2020-02-25
URL https://arxiv.org/abs/2002.10790v1
PDF https://arxiv.org/pdf/2002.10790v1.pdf
Repo
Framework
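
To make the composition structure mentioned in the abstract above concrete, assume the common CSO form of minimizing E_xi[ f_xi( E_{eta|xi}[ g_eta(x, xi) ] ) ]. A biased gradient step then replaces the inner expectation with an average over `inner_m` samples before applying the outer derivative. The scalar sketch below follows that recipe; the formulation, the toy problem, and all function and parameter names are assumptions made for illustration, not code from the paper.

```python
import random


def bsgd(x0, sample_outer, sample_inner, g, g_grad, f_grad,
         inner_m, lr, steps, rng):
    """Scalar biased SGD for a nested-expectation objective.

    Each step draws one outer sample xi and inner_m inner samples eta,
    averages g and its derivative over the inner samples, and applies
    the chain rule at the averaged value. The estimate is biased in
    general because f_grad is evaluated at a noisy inner average.
    """
    x = x0
    for _ in range(steps):
        xi = sample_outer(rng)
        etas = [sample_inner(rng, xi) for _ in range(inner_m)]
        g_hat = sum(g(x, xi, eta) for eta in etas) / inner_m
        g_grad_hat = sum(g_grad(x, xi, eta) for eta in etas) / inner_m
        x -= lr * g_grad_hat * f_grad(g_hat, xi)
    return x


def toy_run(seed=0):
    """Hypothetical toy problem: f(u) = u**4 / 4 + u**2 / 2 and
    g = x - xi + eta, with xi ~ N(1, 0.5) and eta ~ N(0, 1).
    By symmetry the objective is minimized at x = 1."""
    rng = random.Random(seed)
    return bsgd(
        x0=0.0,
        sample_outer=lambda r: r.gauss(1.0, 0.5),
        sample_inner=lambda r, xi: r.gauss(0.0, 1.0),
        g=lambda x, xi, eta: x - xi + eta,
        g_grad=lambda x, xi, eta: 1.0,
        f_grad=lambda u, xi: u ** 3 + u,
        inner_m=8,
        lr=0.005,
        steps=4000,
        rng=rng,
    )


estimate = toy_run()
```

Raising `inner_m` lowers the bias of each step at the cost of more inner samples per iteration, which is the tradeoff the abstract refers to.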

General Partial Label Learning via Dual Bipartite Graph Autoencoder

Title General Partial Label Learning via Dual Bipartite Graph Autoencoder
Authors Brian Chen, Bo Wu, Alireza Zareian, Hanwang Zhang, Shih-Fu Chang
Abstract We formulate a practical yet challenging problem: General Partial Label Learning (GPLL). Compared to the traditional Partial Label Learning (PLL) problem, GPLL relaxes the supervision assumption from instance-level (a label set partially labels an instance) to group-level: 1) a label set partially labels a group of instances, where the within-group instance-label link annotations are missing, and 2) cross-group links are allowed, meaning that instances in a group may be partially linked to the label set from another group. Such ambiguous group-level supervision is more practical in real-world scenarios as additional annotation on the instance-level is no longer required, e.g., face-naming in videos where the group consists of faces in a frame, labeled by a name set in the corresponding caption. In this paper, we propose a novel graph convolutional network (GCN) called Dual Bipartite Graph Autoencoder (DB-GAE) to tackle the label ambiguity challenge of GPLL. First, we exploit the cross-group correlations to represent the instance groups as dual bipartite graphs: within-group and cross-group, which reciprocally complement each other to resolve the linking ambiguities. Second, we design a GCN autoencoder to encode and decode them, where the decodings are considered as the refined results. It is worth noting that DB-GAE is self-supervised and transductive, as it only uses the group-level supervision without a separate offline training stage. Extensive experiments on two real-world datasets demonstrate that DB-GAE significantly outperforms the best baseline by an absolute 0.159 F1-score and 24.8% accuracy. We further offer analysis on various levels of label ambiguities.
Published 2020-01-05
URL https://arxiv.org/abs/2001.01290v1
PDF https://arxiv.org/pdf/2001.01290v1.pdf
PWC https://paperswithcode.com/paper/general-partial-label-learning-via-dual
Repo
Framework

Representing Unordered Data Using Multiset Automata and Complex Numbers

Title Representing Unordered Data Using Multiset Automata and Complex Numbers
Authors Justin DeBenedetto, David Chiang
Abstract Unordered, variable-sized inputs arise in many settings across multiple fields. The ability for set- and multiset- oriented neural networks to handle this type of input has been the focus of much work in recent years. We propose to represent multisets using complex-weighted multiset automata and show how the multiset representations of certain existing neural architectures can be viewed as special cases of ours. Namely, (1) we provide a new theoretical and intuitive justification for the Transformer model’s representation of positions using sinusoidal functions, and (2) we extend the DeepSets model to use complex numbers, enabling it to outperform the existing model on an extension of one of their tasks.
Published 2020-01-02
URL https://arxiv.org/abs/2001.00610v1
PDF https://arxiv.org/pdf/2001.00610v1.pdf
PWC https://paperswithcode.com/paper/representing-unordered-data-using-multiset-1
Repo
Framework

The Costs and Benefits of Goal-Directed Attention in Deep Convolutional Neural Networks

Title The Costs and Benefits of Goal-Directed Attention in Deep Convolutional Neural Networks
Abstract Attention in machine learning is largely bottom-up, whereas people also deploy top-down, goal-directed attention. Motivated by neuroscience research, we evaluated a plug-and-play, top-down attention layer that is easily added to existing deep convolutional neural networks (DCNNs). In object recognition tasks, increasing top-down attention has benefits (increasing hit rates) and costs (increasing false alarm rates). At a moderate level, attention improves sensitivity (i.e., increases $d^\prime$) at only a moderate increase in bias for tasks involving standard images, blended images, and natural adversarial images. These theoretical results suggest that top-down attention can effectively reconfigure general-purpose DCNNs to better suit the current task goal. We hope our results continue the fruitful dialog between neuroscience and machine learning.
Published 2020-02-06
URL https://arxiv.org/abs/2002.02342v1
PDF https://arxiv.org/pdf/2002.02342v1.pdf
PWC https://paperswithcode.com/paper/the-costs-and-benefits-of-goal-directed
Repo
Framework
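
The abstract above reports results in terms of hit rates, false alarm rates, sensitivity ($d^\prime$) and bias. Under the standard equal-variance signal detection model, these are related as in the sketch below. The criterion `c` is one common bias measure and is an assumption here, since the abstract does not say which bias statistic the authors use.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity: separation between signal and noise distributions.

    Both rates must lie strictly between 0 and 1, since the inverse
    normal CDF is undefined at the endpoints.
    """
    return _z(hit_rate) - _z(false_alarm_rate)


def criterion(hit_rate, false_alarm_rate):
    """Response bias c; negative values mean a liberal 'yes' tendency."""
    return -0.5 * (_z(hit_rate) + _z(false_alarm_rate))


# Raising both hits and false alarms can leave sensitivity roughly
# unchanged while shifting the bias toward saying 'yes'.
baseline = (d_prime(0.70, 0.10), criterion(0.70, 0.10))
more_attention = (d_prime(0.90, 0.30), criterion(0.90, 0.30))
```

This is why the abstract treats higher hit rates as a benefit and higher false alarm rates as a cost: only the difference of their z-scores counts as improved sensitivity.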

Set-Constrained Viterbi for Set-Supervised Action Segmentation

Title Set-Constrained Viterbi for Set-Supervised Action Segmentation
Authors Jun Li, Sinisa Todorovic
Abstract This paper is about weakly supervised action segmentation, where the ground truth specifies only a set of actions present in a training video, but not their true temporal ordering. Prior work typically uses a classifier that independently labels video frames for generating the pseudo ground truth, and multiple instance learning for training the classifier. We extend this framework by specifying an HMM, which accounts for co-occurrences of action classes and their temporal lengths, and by explicitly training the HMM on a Viterbi-based loss. Our first contribution is the formulation of a new set-constrained Viterbi algorithm (SCV). Given a video, the SCV generates the MAP action segmentation that satisfies the ground truth. This prediction is used as a framewise pseudo ground truth in our HMM training. Our second contribution in training is a new regularization of feature affinities between training videos that share the same action classes. Evaluation on action segmentation and alignment on the Breakfast, MPII Cooking2, Hollywood Extended datasets demonstrates our significant performance improvement for the two tasks over prior work.
Tasks Action Segmentation, Multiple Instance Learning
Published 2020-02-27
URL https://arxiv.org/abs/2002.11925v2
PDF https://arxiv.org/pdf/2002.11925v2.pdf
PWC https://paperswithcode.com/paper/set-constrained-viterbi-for-set-supervised
Repo
Framework