February 1, 2020

3141 words 15 mins read

Paper Group AWR 223

Adding Intuitive Physics to Neural-Symbolic Capsules Using Interaction Networks. Multi-Similarity Loss with General Pair Weighting for Deep Metric Learning. FLO: Fast and Lightweight Hyperparameter Optimization for AutoML. End-to-End Probabilistic Inference for Nonstationary Audio Analysis. 3D Convolutional Neural Networks Image Registration Based …

Adding Intuitive Physics to Neural-Symbolic Capsules Using Interaction Networks

Title Adding Intuitive Physics to Neural-Symbolic Capsules Using Interaction Networks
Authors Michael Kissner, Helmut Mayer
Abstract Many current methods for learning intuitive physics are based on interaction networks and similar approaches. However, they rely on information that has historically proven difficult to estimate directly from image data. We aim to narrow this gap by inferring all the needed semantic information from raw pixel data in the form of a scene graph. Our approach is based on neural-symbolic capsules, which identify which objects in the scene are static, dynamic, elastic, or rigid, infer possible joints between them, and extract their collision information. By integrating all of this with interaction networks, we demonstrate how our method is able to learn intuitive physics directly from image sequences and apply its knowledge to new scenes and objects, resulting in an inverse-simulation pipeline.
Tasks
Published 2019-05-23
URL https://arxiv.org/abs/1905.09891v2
PDF https://arxiv.org/pdf/1905.09891v2.pdf
PWC https://paperswithcode.com/paper/adding-intuitive-physics-to-neural-symbolic
Repo https://github.com/Kayzaks/VividNet
Framework none
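
The physics core the capsules feed into is an interaction network: a relation model computes pairwise effects, which are aggregated per receiving object and turned into updated object states. Below is a minimal PyTorch sketch of that step, after Battaglia et al. (2016); layer sizes and names are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class InteractionNetwork(nn.Module):
    """Minimal relational step: effects from object pairs, aggregated
    per receiver, then an object-wise state update."""
    def __init__(self, obj_dim, rel_dim, eff_dim=32, hid=64):
        super().__init__()
        self.rel_model = nn.Sequential(
            nn.Linear(2 * obj_dim + rel_dim, hid), nn.ReLU(),
            nn.Linear(hid, eff_dim))
        self.obj_model = nn.Sequential(
            nn.Linear(obj_dim + eff_dim, hid), nn.ReLU(),
            nn.Linear(hid, obj_dim))

    def forward(self, objects, senders, receivers, rel_attrs):
        # objects: (N, obj_dim); senders, receivers: (E,) long indices;
        # rel_attrs: (E, rel_dim), e.g. joint/collision attributes
        pairs = torch.cat([objects[senders], objects[receivers], rel_attrs], -1)
        effects = self.rel_model(pairs)                  # one effect per relation
        agg = torch.zeros(objects.size(0), effects.size(-1),
                          device=objects.device)
        agg.index_add_(0, receivers, effects)            # sum effects per object
        return self.obj_model(torch.cat([objects, agg], -1))  # next states
```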

Multi-Similarity Loss with General Pair Weighting for Deep Metric Learning

Title Multi-Similarity Loss with General Pair Weighting for Deep Metric Learning
Authors Xun Wang, Xintong Han, Weilin Huang, Dengke Dong, Matthew R. Scott
Abstract A family of loss functions built on pair-based computation has been proposed in the literature, providing a myriad of solutions for deep metric learning. In this paper, we provide a general weighting framework for understanding recent pair-based loss functions. Our contributions are three-fold: (1) we establish a General Pair Weighting (GPW) framework, which casts the sampling problem of deep metric learning into a unified view of pair weighting through gradient analysis, providing a powerful tool for understanding recent pair-based loss functions; (2) we show that with GPW, various existing pair-based methods can be compared and discussed comprehensively, with clear differences and key limitations identified; (3) we propose a new loss called multi-similarity loss (MS loss) under the GPW framework, implemented in two iterative steps (i.e., mining and weighting). This allows it to fully consider three types of similarity for pair weighting, providing a more principled approach for collecting and weighting informative pairs. Finally, the proposed MS loss obtains new state-of-the-art performance on four image retrieval benchmarks, where it outperforms the most recent approaches, such as ABE (Kim et al., ECCV 2018) and HTL, by a large margin: 60.6% to 65.7% on CUB200, and 80.9% to 88.0% on the In-Shop Clothes Retrieval dataset at Recall@1. Code is available at https://github.com/MalongTech/research-ms-loss.
Tasks Image Retrieval, Metric Learning
Published 2019-04-14
URL https://arxiv.org/abs/1904.06627v3
PDF https://arxiv.org/pdf/1904.06627v3.pdf
PWC https://paperswithcode.com/paper/multi-similarity-loss-with-general-pair
Repo https://github.com/MalongTech/research-ms-loss
Framework pytorch
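
The MS loss itself is compact enough to sketch. Below is a hedged PyTorch version of the two iterative steps (mining, then weighting), using hyperparameter values typical of the paper; the official repository above is the reference implementation.

```python
import torch

def multi_similarity_loss(embeddings, labels, alpha=2.0, beta=50.0,
                          lam=1.0, eps=0.1):
    """MS loss sketch. embeddings: (B, D), L2-normalised; labels: (B,)."""
    sim = embeddings @ embeddings.t()                    # cosine similarities
    terms = []
    for i in range(sim.size(0)):
        pos_mask = labels == labels[i]
        pos_mask[i] = False
        pos_sim = sim[i][pos_mask]
        neg_sim = sim[i][labels != labels[i]]
        if pos_sim.numel() == 0 or neg_sim.numel() == 0:
            continue
        # Step 1, mining: keep only informative (hard) pairs.
        hard_neg = neg_sim[neg_sim + eps > pos_sim.min()]
        hard_pos = pos_sim[pos_sim - eps < neg_sim.max()]
        if hard_neg.numel() == 0 or hard_pos.numel() == 0:
            continue
        # Step 2, weighting: soft weights via log-sum-exp around margin lam.
        pos_term = torch.log1p(torch.exp(-alpha * (hard_pos - lam)).sum()) / alpha
        neg_term = torch.log1p(torch.exp(beta * (hard_neg - lam)).sum()) / beta
        terms.append(pos_term + neg_term)
    return torch.stack(terms).mean() if terms else sim.new_zeros(())
```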

FLO: Fast and Lightweight Hyperparameter Optimization for AutoML

Title FLO: Fast and Lightweight Hyperparameter Optimization for AutoML
Authors Chi Wang, Qingyun Wu
Abstract Integrating ML models into software is of growing interest. Building accurate models requires the right choice of hyperparameters for the training procedures (learners) on a given training dataset. AutoML tools provide APIs to automate this choice, which usually involves many trials with different hyperparameters for a given training dataset. Since training and evaluating complex models can be time- and resource-consuming, existing AutoML solutions require a long time or large resources to produce accurate models for large-scale training data. This prevents AutoML from being embedded in software that needs to repeatedly tune hyperparameters and produce models to be consumed by other components, such as large-scale data systems. We present FLO, a fast and lightweight hyperparameter optimization method, and use it to build an efficient AutoML solution. Our method optimizes for minimal evaluation cost instead of the number of iterations to find accurate models. Our main idea is to leverage a holistic consideration of the relations among model complexity, evaluation cost, and accuracy. FLO has strong anytime performance and significantly outperforms Bayesian optimization and random search for hyperparameter tuning on a large open-source AutoML benchmark. Our AutoML solution also outperforms top-ranked AutoML libraries on a majority of the tasks in this benchmark.
Tasks AutoML, Hyperparameter Optimization
Published 2019-11-12
URL https://arxiv.org/abs/1911.04706v1
PDF https://arxiv.org/pdf/1911.04706v1.pdf
PWC https://paperswithcode.com/paper/flo-fast-and-lightweight-hyperparameter
Repo https://github.com/Shmuelnaaman/Fast_Lightweight_Hyperparameter-Optimization-
Framework none
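
The abstract does not spell out the algorithm, so the sketch below only illustrates the stated idea of optimizing for evaluation cost: try cheap, low-complexity configurations first and escalate complexity only while the accuracy gained justifies the time spent. This is an assumption-laden illustration, not the authors' method; `train_eval` and `configs_by_complexity` are hypothetical inputs.

```python
import time

def cost_frugal_search(train_eval, configs_by_complexity, budget_s=60.0):
    """Illustrative cost-frugal loop: cheapest configurations first,
    escalating model complexity only while it keeps paying off."""
    best_cfg, best_acc, spent = None, 0.0, 0.0
    for level in configs_by_complexity:          # ordered cheap -> expensive
        improved = False
        for cfg in level:
            start = time.time()
            acc = train_eval(cfg)                # user-supplied train + evaluate
            spent += time.time() - start
            if acc > best_acc:
                best_cfg, best_acc, improved = cfg, acc, True
            if spent > budget_s:                 # anytime: return best so far
                return best_cfg, best_acc
        if not improved:                         # extra complexity stopped helping
            break
    return best_cfg, best_acc
```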

End-to-End Probabilistic Inference for Nonstationary Audio Analysis

Title End-to-End Probabilistic Inference for Nonstationary Audio Analysis
Authors William J. Wilkinson, Michael Riis Andersen, Joshua D. Reiss, Dan Stowell, Arno Solin
Abstract A typical audio signal processing pipeline includes multiple disjoint analysis stages, including calculation of a time-frequency representation followed by spectrogram-based feature analysis. We show how time-frequency analysis and nonnegative matrix factorisation can be jointly formulated as a spectral mixture Gaussian process model with nonstationary priors over the amplitude variance parameters. Further, we formulate this nonlinear model’s state space representation, making it amenable to infinite-horizon Gaussian process regression with approximate inference via expectation propagation, which scales linearly in the number of time steps and quadratically in the state dimensionality. By doing so, we are able to process audio signals with hundreds of thousands of data points. We demonstrate, on various tasks with empirical data, how this inference scheme outperforms more standard techniques that rely on extended Kalman filtering.
Tasks
Published 2019-01-31
URL http://arxiv.org/abs/1901.11436v5
PDF http://arxiv.org/pdf/1901.11436v5.pdf
PWC https://paperswithcode.com/paper/end-to-end-probabilistic-inference-for
Repo https://github.com/AaltoML/nonstationary-audio-gp
Framework none
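
The efficiency claim rests on a state-space reformulation: each quasi-periodic kernel component becomes a stochastically driven damped oscillator, so filtering runs linearly in the number of time steps. Below is a single-component NumPy Kalman filter sketch of that idea; the paper adds expectation propagation and an infinite-horizon approximation on top, and the parameters here are illustrative.

```python
import numpy as np

def kalman_filter_component(y, dt, omega, ell, q, r):
    """O(T) filtering for one quasi-periodic kernel component,
    modelled as a stochastically driven damped oscillator."""
    lam = 1.0 / ell                               # damping from the lengthscale
    c, s = np.cos(omega * dt), np.sin(omega * dt)
    A = np.exp(-lam * dt) * np.array([[c, -s], [s, c]])   # rotate + decay
    Q = (1.0 - np.exp(-2.0 * lam * dt)) * q * np.eye(2)   # keeps variance at q
    H = np.array([[1.0, 0.0]])                    # observe the first state
    m, P = np.zeros(2), q * np.eye(2)
    means = np.empty(len(y))
    for t, yt in enumerate(y):
        m, P = A @ m, A @ P @ A.T + Q             # predict
        S = H @ P @ H.T + r                       # innovation variance
        K = P @ H.T / S                           # Kalman gain
        m = m + (K * (yt - H @ m)).ravel()        # update mean
        P = P - K @ H @ P                         # update covariance
        means[t] = m[0]
    return means
```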

3D Convolutional Neural Networks Image Registration Based on Efficient Supervised Learning from Artificial Deformations

Title 3D Convolutional Neural Networks Image Registration Based on Efficient Supervised Learning from Artificial Deformations
Authors Hessam Sokooti, Bob de Vos, Floris Berendsen, Mohsen Ghafoorian, Sahar Yousefi, Boudewijn P. F. Lelieveldt, Ivana Isgum, Marius Staring
Abstract We propose a supervised nonrigid image registration method, trained using artificial displacement vector fields (DVF), for which we propose and compare three network architectures. The artificial DVFs allow training in a fully supervised and voxel-wise dense manner, but without the cost usually associated with the creation of densely labeled data. We propose a scheme to artificially generate DVFs, and for chest CT registration augment these with simulated respiratory motion. The proposed architectures are embedded in a multi-stage approach to increase the capture range of the proposed networks and thereby more accurately predict larger displacements. The proposed method, RegNet, is evaluated on multiple databases of chest CT scans and achieves a target registration error of 2.32 $\pm$ 5.33 mm and 1.86 $\pm$ 2.12 mm on the SPREAD and DIR-Lab-4DCT studies, respectively. The average inference time of RegNet with two stages is about 2.2 s.
Tasks Image Registration
Published 2019-08-27
URL https://arxiv.org/abs/1908.10235v1
PDF https://arxiv.org/pdf/1908.10235v1.pdf
PWC https://paperswithcode.com/paper/3d-convolutional-neural-networks-image
Repo https://github.com/hsokooti/RegNet
Framework tf
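
A minimal sketch of the artificial-DVF idea: generate a random but smooth displacement field, warp a scan with it, and use the (warped scan, original scan, DVF) triple as a free, densely labeled training example. The magnitude and smoothing values below are illustrative assumptions, and the paper's respiratory-motion augmentation is not modeled here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def random_smooth_dvf(shape, max_disp=8.0, smoothing=16.0, seed=None):
    """Random smooth displacement vector field, in voxels, shape (3, D, H, W)."""
    rng = np.random.default_rng(seed)
    dvf = np.stack([gaussian_filter(rng.standard_normal(shape), smoothing)
                    for _ in range(3)])
    return dvf * (max_disp / (np.abs(dvf).max() + 1e-8))

def warp(volume, dvf):
    """Resample the volume along the displaced coordinates (linear interp)."""
    grid = np.meshgrid(*map(np.arange, volume.shape), indexing="ij")
    coords = np.stack([g + d for g, d in zip(grid, dvf)])
    return map_coordinates(volume, coords, order=1, mode="nearest")

# One free training example: network input (warp(ct, dvf), ct), target dvf.
```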

SHE: A Fast and Accurate Deep Neural Network for Encrypted Data

Title SHE: A Fast and Accurate Deep Neural Network for Encrypted Data
Authors Qian Lou, Lei Jiang
Abstract Homomorphic Encryption (HE) is one of the most promising security solutions for emerging Machine Learning as a Service (MLaaS). Leveled-HE (LHE)-enabled Convolutional Neural Networks (LHECNNs) have been proposed to implement MLaaS while avoiding the large bootstrapping overhead. However, prior LHECNNs incur significant computing overhead yet achieve only low inference accuracy, due to their polynomial approximation activations and poolings. Stacking many polynomial approximation activation layers in a network greatly reduces inference accuracy, since the polynomial approximation errors distort the output distribution of the next batch normalization layer. The polynomial approximation activations and poolings have thus become the obstacle to a fast and accurate LHECNN model. In this paper, we propose SHE, a Shift-accumulation-based LHE-enabled deep neural network, for fast and accurate inference on encrypted data. We use the binary-operation-friendly Leveled Fast Homomorphic Encryption over Torus (LTFHE) encryption scheme to implement ReLU activations and max poolings. We also adopt logarithmic quantization to accelerate inference by replacing expensive LTFHE multiplications with cheap LTFHE shifts, and propose a mixed-bitwidth accumulator to accelerate accumulations. Since the LTFHE ReLU activations, max poolings, shifts, and accumulations have small multiplicative depth overhead, SHE can implement much deeper network architectures with more convolutional and activation layers. Our experimental results show that SHE achieves state-of-the-art inference accuracy and reduces inference latency by 76.21% ~ 94.23% over prior LHECNNs on MNIST and CIFAR-10. The source code of SHE is available at https://github.com/qianlou/SHE.
Tasks Quantization
Published 2019-06-01
URL https://arxiv.org/abs/1906.00148v2
PDF https://arxiv.org/pdf/1906.00148v2.pdf
PWC https://paperswithcode.com/paper/she-a-fast-and-accurate-privacy-preserving
Repo https://github.com/qianlou/SHE
Framework none
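
Much of the latency saving comes from logarithmic quantization: with weights rounded to signed powers of two, every multiplication becomes a sign flip plus a shift. Below is a plaintext NumPy sketch of that arithmetic substitution; the real system performs the shifts homomorphically on LTFHE ciphertexts, and weight magnitudes in (0, 1] are an assumption of this sketch.

```python
import numpy as np

def log_quantize(w, min_exp=-8):
    """Round weight magnitudes to powers of two: w ~ sign * 2**exp."""
    sign = np.sign(w).astype(np.int64)
    exp = np.clip(np.round(np.log2(np.abs(w) + 1e-12)),
                  min_exp, 0).astype(np.int64)
    return sign, exp

def shift_mul(x_int, sign, exp):
    """x * w replaced by a sign flip plus a right shift (exp <= 0)."""
    return sign * (x_int >> (-exp))
```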

World Discovery Models

Title World Discovery Models
Authors Mohammad Gheshlaghi Azar, Bilal Piot, Bernardo Avila Pires, Jean-Bastien Grill, Florent Altché, Rémi Munos
Abstract As humans, we are driven by a strong desire to seek novelty in our world. Moreover, upon observing a novel pattern, we are capable of refining our understanding of the world based on the new information: humans can discover their world. The outstanding ability of the human mind for discovery has led to many breakthroughs in science, art, and technology. Here we investigate the possibility of building an agent capable of discovering its world using modern AI technology. In particular, we introduce NDIGO (Neural Differential Information Gain Optimisation), a self-supervised discovery model that seeks new information to construct a global view of its world from partial and noisy observations. Our experiments on controlled 2-D navigation tasks show that NDIGO outperforms state-of-the-art information-seeking methods in terms of the quality of the learned representation. The improvement in performance is particularly significant in the presence of white or structured noise, where other information-seeking methods follow the noise instead of discovering their world.
Tasks
Published 2019-02-20
URL http://arxiv.org/abs/1902.07685v3
PDF http://arxiv.org/pdf/1902.07685v3.pdf
PWC https://paperswithcode.com/paper/world-discovery-models
Repo https://github.com/amalF/ML-notes
Framework none
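
Paraphrasing the abstract, NDIGO rewards an observation in proportion to how much it improves the agent's predictions of future observations, which is why unpredictable noise earns nothing. A schematic sketch of that reward signal follows; names are hypothetical, and the exact horizons and predictors are in the paper.

```python
def ndigo_intrinsic_reward(nll_without_obs, nll_with_obs):
    """Reward = reduction in future-observation prediction loss gained
    by conditioning on the newest observation. Noise that is equally
    unpredictable either way cancels out, so the agent does not chase
    white noise."""
    return nll_without_obs - nll_with_obs
```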

Self-Supervised Learning of Depth and Motion Under Photometric Inconsistency

Title Self-Supervised Learning of Depth and Motion Under Photometric Inconsistency
Authors Tianwei Shen, Lei Zhou, Zixin Luo, Yao Yao, Shiwei Li, Jiahui Zhang, Tian Fang, Long Quan
Abstract The self-supervised learning of depth and pose from monocular sequences provides an attractive solution, using the photometric consistency of nearby frames so that it depends far less on ground-truth data. In this paper, we address the case in which the assumptions of previous self-supervised approaches are violated due to the dynamic nature of real-world scenes. Rather than handling the noise as uncertainty, our key idea is to incorporate more robust geometric quantities and enforce internal consistency in the temporal image sequence. As demonstrated on commonly used benchmark datasets, the proposed method substantially improves on state-of-the-art methods in both depth and relative pose estimation for monocular image sequences, without adding inference overhead.
Tasks Pose Estimation
Published 2019-09-19
URL https://arxiv.org/abs/1909.09115v1
PDF https://arxiv.org/pdf/1909.09115v1.pdf
PWC https://paperswithcode.com/paper/self-supervised-learning-of-depth-and-motion
Repo https://github.com/hlzz/DeepMatchVO
Framework tf
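
The baseline self-supervision signal the paper starts from (and then robustifies with geometric consistency terms) is photometric: back-project target pixels with the predicted depth, reproject them into the source frame with the predicted pose, and compare intensities. A simplified single-scale PyTorch sketch, ignoring occlusion and validity masks:

```python
import torch
import torch.nn.functional as F

def photometric_loss(img_t, img_s, depth_t, T_ts, K):
    """Warp the source image into the target view with predicted depth
    and relative pose T_ts (B, 4, 4), then compare intensities (L1).
    K is the (3, 3) camera intrinsic matrix."""
    B, _, H, W = img_t.shape
    ys, xs = torch.meshgrid(torch.arange(H, device=img_t.device),
                            torch.arange(W, device=img_t.device),
                            indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).float().view(3, -1)
    cam = (K.inverse() @ pix) * depth_t.view(B, 1, -1)       # back-project
    cam = torch.cat([cam, torch.ones(B, 1, H * W, device=cam.device)], 1)
    proj = K @ (T_ts @ cam)[:, :3]                           # into source frame
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)           # perspective divide
    grid = torch.stack([2 * uv[:, 0] / (W - 1) - 1,          # to [-1, 1] coords
                        2 * uv[:, 1] / (H - 1) - 1], -1).view(B, H, W, 2)
    warped = F.grid_sample(img_s, grid, align_corners=True)
    return (warped - img_t).abs().mean()
```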

Guided Dialog Policy Learning: Reward Estimation for Multi-Domain Task-Oriented Dialog

Title Guided Dialog Policy Learning: Reward Estimation for Multi-Domain Task-Oriented Dialog
Authors Ryuichi Takanobu, Hanlin Zhu, Minlie Huang
Abstract Dialog policy decides what and how a task-oriented dialog system will respond, and plays a vital role in delivering effective conversations. Many studies apply reinforcement learning to learn a dialog policy with a reward function that requires elaborate design and pre-specified user goals. With the growing need to handle complex goals across multiple domains, such manually designed reward functions cannot cope with the complexity of real-world tasks. To this end, we propose Guided Dialog Policy Learning, a novel algorithm based on Adversarial Inverse Reinforcement Learning for joint reward estimation and policy optimization in multi-domain task-oriented dialog. The proposed approach estimates the reward signal and infers the user goal in the dialog sessions. The reward estimator evaluates state-action pairs so that it can guide the dialog policy at each dialog turn. Extensive experiments on a multi-domain dialog dataset show that the dialog policy guided by the learned reward function achieves remarkably higher task success than state-of-the-art baselines.
Tasks
Published 2019-08-28
URL https://arxiv.org/abs/1908.10719v1
PDF https://arxiv.org/pdf/1908.10719v1.pdf
PWC https://paperswithcode.com/paper/guided-dialog-policy-learning-reward
Repo https://github.com/truthless11/GDPL
Framework pytorch
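
The reward estimator is the AIRL-style component: a learned discriminator scores state-action pairs, and the policy is trained against the resulting reward each turn. A schematic PyTorch sketch using the generic AIRL reward form follows; dimensions and the exact parameterisation are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RewardEstimator(nn.Module):
    """Discriminator f(s, a) scoring state-action pairs each dialog turn."""
    def __init__(self, s_dim, a_dim, hid=128):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(s_dim + a_dim, hid), nn.Tanh(),
                               nn.Linear(hid, 1))

    def forward(self, s, a):
        return self.f(torch.cat([s, a], dim=-1)).squeeze(-1)

def airl_reward(f_sa, log_pi_a):
    """Generic AIRL reward: r(s, a) = f(s, a) - log pi(a|s)."""
    return f_sa - log_pi_a
```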

Equivariant Multi-View Networks

Title Equivariant Multi-View Networks
Authors Carlos Esteves, Yinshuang Xu, Christine Allen-Blanchette, Kostas Daniilidis
Abstract Several popular approaches to 3D vision tasks process multiple views of the input independently with deep neural networks pre-trained on natural images, achieving view permutation invariance through a single round of pooling over all views. We argue that this operation discards important information and leads to subpar global descriptors. In this paper, we propose a group convolutional approach to multiple view aggregation where convolutions are performed over a discrete subgroup of the rotation group, thus enabling joint reasoning over all views in an equivariant (instead of invariant) fashion, up to the very last layer. We further develop this idea to operate on smaller discrete homogeneous spaces of the rotation group, where a polar view representation is used to maintain equivariance with only a fraction of the number of input views. We set the new state of the art in several large-scale 3D shape retrieval tasks, and show additional applications to panoramic scene classification.
Tasks 3D Shape Retrieval, Scene Classification
Published 2019-04-01
URL https://arxiv.org/abs/1904.00993v2
PDF https://arxiv.org/pdf/1904.00993v2.pdf
PWC https://paperswithcode.com/paper/equivariant-multi-view-networks
Repo https://github.com/daniilidis-group/emvn
Framework none
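
The central operation is a group convolution over a discrete subgroup of the rotation group, so rotating the input views permutes (rather than destroys) the feature maps until the final layer. A toy PyTorch sketch for a cyclic group of N rotations; the paper works with larger discrete rotation groups and homogeneous spaces.

```python
import torch

def cyclic_group_conv(x, w):
    """x: (B, C, N) features over N discrete rotations; w: (C_out, C, N).
    Rotating the input rolls the output: equivariance, not invariance."""
    N = x.shape[-1]
    return torch.stack(
        [torch.einsum("bcn,dcn->bd", x, w.roll(shifts=g, dims=-1))
         for g in range(N)], dim=-1)              # (B, C_out, N)
```

Invariant pooling over the group, e.g. `out.max(dim=-1)`, is then deferred to the very last layer, which is the point the abstract makes against early pooling.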

Syntax-aware Neural Semantic Role Labeling

Title Syntax-aware Neural Semantic Role Labeling
Authors Qingrong Xia, Zhenghua Li, Min Zhang, Meishan Zhang, Guohong Fu, Rui Wang, Luo Si
Abstract Semantic role labeling (SRL), also known as shallow semantic parsing, is an important yet challenging task in NLP. Motivated by the close correlation between syntactic and semantic structures, traditional discrete-feature-based SRL approaches make heavy use of syntactic features. In contrast, deep-neural-network-based approaches usually encode the input sentence as a word sequence without considering the syntactic structures. In this work, we investigate several previous approaches for encoding syntactic trees, and thoroughly study whether extra syntax-aware representations are beneficial for neural SRL models. Experiments on the benchmark CoNLL-2005 dataset show that syntax-aware SRL approaches can effectively improve performance over a strong baseline with external word representations from ELMo. With the extra syntax-aware representations, our approaches achieve new state-of-the-art results of 85.6 F1 (single model) and 86.6 F1 (ensemble) on the test data, outperforming the corresponding strong baselines with ELMo by 0.8 and 1.0, respectively. A detailed error analysis is conducted to gain more insight into the investigated approaches.
Tasks Semantic Parsing, Semantic Role Labeling
Published 2019-07-22
URL https://arxiv.org/abs/1907.09312v1
PDF https://arxiv.org/pdf/1907.09312v1.pdf
PWC https://paperswithcode.com/paper/syntax-aware-neural-semantic-role-labeling-2
Repo https://github.com/KiroSummer/Syntax-aware-Neural-SRL
Framework pytorch
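
As a flavour of what "syntax-aware representations" means here, the simplest member of the family concatenates each word's embedding with that of its dependency head; the paper investigates several richer tree encoders. A minimal, purely illustrative PyTorch sketch:

```python
import torch

def syntax_aware_repr(word_emb, heads):
    """word_emb: (T, D); heads: (T,) index of each token's dependency
    head. Concatenate each word with its head word's embedding."""
    return torch.cat([word_emb, word_emb[heads]], dim=-1)
```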

Using Segmentation Masks in the ICCV 2019 Learning to Drive Challenge

Title Using Segmentation Masks in the ICCV 2019 Learning to Drive Challenge
Authors Antonia Lovjer, Minsu Yeom, Benedikt D. Schifferer, Iddo Drori
Abstract In this work we predict vehicle speed and steering angle from camera image frames. Our key contribution is the use of an external pre-trained neural network for segmentation. We augment the raw images with their segmentation masks and mirror images. We ensemble three diverse neural network models: (i) a CNN using a single image and its segmentation mask, (ii) a stacked CNN taking as input a sequence of images and segmentation masks, and (iii) a bidirectional GRU extracting image features using a pre-trained ResNet34, DenseNet121, and our own single-image CNN model. We achieve the second-best performance for MSE of the steering angle and the second-best performance overall, winning 2nd place in the ICCV Learning to Drive challenge. We make our models and code publicly available.
Tasks
Published 2019-10-23
URL https://arxiv.org/abs/1910.10317v1
PDF https://arxiv.org/pdf/1910.10317v1.pdf
PWC https://paperswithcode.com/paper/using-segmentation-masks-in-the-iccv-2019
Repo https://github.com/AntoniaLovjer/learntodrive
Framework pytorch
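
Two of the mechanical steps are easy to sketch: stacking the external segmentation mask as an extra input channel, and combining the three models' predictions. The combination rule below is a plain (weighted) average, which is an assumption; the abstract does not give the exact ensembling weights.

```python
import torch

def augment_with_mask(image, seg_mask):
    """image: (B, 3, H, W); seg_mask: (B, 1, H, W) from the external
    pre-trained segmentation network, stacked as an extra channel."""
    return torch.cat([image, seg_mask.float()], dim=1)

def ensemble(preds, weights=None):
    """Average per-model speed/angle predictions (optionally weighted)."""
    stacked = torch.stack(preds)
    if weights is None:
        return stacked.mean(dim=0)
    w = torch.tensor(weights, dtype=stacked.dtype)
    w = w.view(-1, *([1] * (stacked.dim() - 1)))
    return (stacked * w).sum(dim=0) / w.sum()
```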

Convolutional Character Networks

Title Convolutional Character Networks
Authors Linjie Xing, Zhi Tian, Weilin Huang, Matthew R. Scott
Abstract Recent progress has been made on developing a unified framework for joint text detection and recognition in natural images, but existing joint models have mostly been built on a two-stage framework involving ROI pooling, which can degrade performance on the recognition task. In this work, we propose convolutional character networks, referred to as CharNet, a one-stage model that processes both tasks simultaneously in one pass. CharNet directly outputs bounding boxes of words and characters, with corresponding character labels. We use the character as the basic element, allowing us to overcome the main difficulty of existing approaches that attempted to optimize text detection jointly with an RNN-based recognition branch. In addition, we develop an iterative character detection approach able to transfer the character detection capability learned from synthetic data to real-world images. These technical improvements result in a simple, compact, yet powerful one-stage model that works reliably on multi-orientation and curved text. We evaluate CharNet on three standard benchmarks, where it consistently outperforms the state-of-the-art approaches [25, 24] by a large margin, e.g., with improvements of 65.33%->71.08% (with generic lexicon) on ICDAR 2015, and 54.0%->69.23% on Total-Text, on end-to-end text recognition. Code is available at: https://github.com/MalongTech/research-charnet.
Tasks Scene Text Detection
Published 2019-10-17
URL https://arxiv.org/abs/1910.07954v1
PDF https://arxiv.org/pdf/1910.07954v1.pdf
PWC https://paperswithcode.com/paper/convolutional-character-networks
Repo https://github.com/MalongTech/research-charnet
Framework pytorch
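
Since CharNet outputs word boxes plus labeled character boxes in one pass, a word transcription can be read off by grouping characters into the word box that contains them. The decoding sketch below is a simplified illustration, not the model's actual branch.

```python
def group_chars_into_words(word_boxes, char_boxes, char_labels):
    """Assign each character to the word box containing its centre and
    read labels out left to right. Boxes are (x1, y1, x2, y2) tuples."""
    words = [[] for _ in word_boxes]
    for (x1, y1, x2, y2), ch in zip(char_boxes, char_labels):
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        for i, (wx1, wy1, wx2, wy2) in enumerate(word_boxes):
            if wx1 <= cx <= wx2 and wy1 <= cy <= wy2:
                words[i].append((cx, ch))
                break
    return ["".join(ch for _, ch in sorted(group)) for group in words]
```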

Active Collaborative Sensing for Energy Breakdown

Title Active Collaborative Sensing for Energy Breakdown
Authors Yiling Jia, Nipun Batra, Hongning Wang, Kamin Whitehouse
Abstract Residential homes account for roughly one-fourth of total energy usage worldwide. Providing an appliance-level energy breakdown has been shown to induce positive behavioral changes that can reduce energy consumption by 15%. Existing approaches for energy breakdown either require hardware installation in every target home or demand a large set of energy sensor data for model training. However, very few homes in the world have installed sub-meters (sensors measuring individual appliance energy), and the cost of retrofitting a home with extensive sub-metering eats into the funds available for energy-saving retrofits. As a result, strategically deploying sensing hardware to maximize the reconstruction accuracy of sub-metered readings in non-instrumented homes, while minimizing deployment costs, becomes necessary and promising. In this work, we develop an active learning solution based on low-rank tensor completion for energy breakdown. We propose to actively deploy energy sensors to appliances in selected homes, with the goal of improving the prediction accuracy of the completed tensor at minimum sensor deployment cost. We empirically evaluate our approach on the largest public energy dataset, collected in Austin, Texas, USA, from 2013 to 2017. The results show that our approach performs better with a fixed number of installed sensors than the state-of-the-art, a result also supported by our theoretical analysis.
Tasks Active Learning
Published 2019-09-02
URL https://arxiv.org/abs/1909.00525v1
PDF https://arxiv.org/pdf/1909.00525v1.pdf
PWC https://paperswithcode.com/paper/active-collaborative-sensing-for-energy
Repo https://github.com/yilingjia/ActSense
Framework none
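
The active-learning loop reduces to a selection rule: instrument the (home, appliance) pair whose reading would most improve the completed tensor per unit deployment cost. A schematic greedy scorer under that assumption; in the paper the uncertainty comes from the low-rank tensor posterior, whereas here it is simply an input array.

```python
import numpy as np

def pick_next_sensor(uncertainty, cost, deployed):
    """Greedy rule: highest predictive uncertainty per unit deployment
    cost among (home, appliance) pairs not yet instrumented."""
    score = np.where(deployed, -np.inf, uncertainty / cost)
    return np.unravel_index(np.argmax(score), score.shape)
```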

Noise2Self: Blind Denoising by Self-Supervision

Title Noise2Self: Blind Denoising by Self-Supervision
Authors Joshua Batson, Loic Royer
Abstract We propose a general framework for denoising high-dimensional measurements which requires no prior on the signal, no estimate of the noise, and no clean training data. The only assumption is that the noise exhibits statistical independence across different dimensions of the measurement, while the true signal exhibits some correlation. For a broad class of functions (“$\mathcal{J}$-invariant”), it is then possible to estimate the performance of a denoiser from noisy data alone. This allows us to calibrate $\mathcal{J}$-invariant versions of any parameterised denoising algorithm, from the single hyperparameter of a median filter to the millions of weights of a deep neural network. We demonstrate this on natural image and microscopy data, where we exploit noise independence between pixels, and on single-cell gene expression data, where we exploit independence between detections of individual molecules. This framework generalizes recent work on training neural nets from noisy images and on cross-validation for matrix factorization.
Tasks Denoising
Published 2019-01-30
URL https://arxiv.org/abs/1901.11365v2
PDF https://arxiv.org/pdf/1901.11365v2.pdf
PWC https://paperswithcode.com/paper/noise2self-blind-denoising-by-self
Repo https://github.com/czbiohub/noise2self
Framework pytorch
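
The masking trick is easy to state in code: hold out one positional class of pixels, infill it so the network cannot copy those values through, and score the prediction only on the held-out noisy values. A minimal PyTorch sketch for images, assuming `model` is any image-to-image network; the repository above holds the full version.

```python
import torch
import torch.nn.functional as F

def noise2self_loss(model, noisy, grid=4, phase=0):
    """Hold out one class in a grid x grid pixel partition, infill it
    with a local average (J-invariance), score only held-out pixels."""
    B, C, H, W = noisy.shape
    mask = torch.zeros(1, 1, H, W, device=noisy.device)
    di, dj = divmod(phase, grid)
    mask[..., di::grid, dj::grid] = 1.0
    blurred = F.avg_pool2d(noisy, kernel_size=3, stride=1, padding=1)
    masked_in = noisy * (1 - mask) + blurred * mask   # J-invariant input
    pred = model(masked_in)
    # Self-supervised MSE on held-out pixels: up to the noise variance,
    # an unbiased proxy for the true reconstruction error.
    return ((pred - noisy) ** 2 * mask).sum() / (mask.sum() * B * C)
```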