January 29, 2020

2729 words 13 mins read

Paper Group ANR 594

MFPN: A Novel Mixture Feature Pyramid Network of Multiple Architectures for Object Detection

Title MFPN: A Novel Mixture Feature Pyramid Network of Multiple Architectures for Object Detection
Authors Tingting Liang, Yongtao Wang, Qijie Zhao, Huan Zhang, Zhi Tang, Haibin Ling
Abstract Feature pyramids are widely exploited in many detectors to solve the scale variation problem in object detection. In this paper, we first investigate Feature Pyramid Network (FPN) architectures and briefly categorize them into three typical fashions: top-down, bottom-up and fusing-splitting, which have their own merits for detecting small, large, and medium-sized objects, respectively. Further, we design three FPNs of different architectures and propose a novel Mixture Feature Pyramid Network (MFPN) that inherits the merits of all three kinds of FPNs, by assembling them in a parallel multi-branch architecture and mixing the features. MFPN can significantly enhance both one-stage and two-stage FPN-based detectors with about 2 percent Average Precision (AP) improvement on the MS-COCO benchmark, at little cost in runtime latency. By simply assembling MFPN with one-stage and two-stage baseline detectors, we achieve competitive single-model detection results on the COCO detection benchmark without bells and whistles.
Tasks Object Detection
Published 2019-12-20
URL https://arxiv.org/abs/1912.09748v1
PDF https://arxiv.org/pdf/1912.09748v1.pdf
PWC https://paperswithcode.com/paper/mfpn-a-novel-mixture-feature-pyramid-network
Repo
Framework
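The three pyramid styles the abstract names map naturally onto three small transforms over a list of multi-scale features. Below is a minimal PyTorch sketch of that parallel-branch idea, assuming nearest-neighbor resizing and plain averaging as the mixing rule; the function names and the averaging are illustrative choices, not the authors' implementation.

```python
# Hypothetical sketch of MFPN's parallel-branch idea (not the authors' code):
# three simplified pyramid transforms run in parallel and their outputs are mixed.
import torch
import torch.nn.functional as F

def top_down(feats):
    # Propagate semantics from the coarsest level downward (FPN-style).
    out = [feats[-1]]
    for f in reversed(feats[:-1]):
        up = F.interpolate(out[0], size=f.shape[-2:], mode="nearest")
        out.insert(0, f + up)
    return out

def bottom_up(feats):
    # Propagate fine detail from the finest level upward (PANet-style).
    out = [feats[0]]
    for f in feats[1:]:
        down = F.max_pool2d(out[-1], kernel_size=2)
        out.append(f + down)
    return out

def fuse_split(feats):
    # Fuse all levels at a middle resolution, then redistribute.
    mid = feats[len(feats) // 2].shape[-2:]
    fused = sum(F.interpolate(f, size=mid, mode="nearest") for f in feats)
    return [F.interpolate(fused, size=f.shape[-2:], mode="nearest") for f in feats]

def mfpn(feats):
    # Mix the three branches by simple averaging (one plausible mixing rule).
    branches = [top_down(feats), bottom_up(feats), fuse_split(feats)]
    return [sum(levels) / 3.0 for levels in zip(*branches)]

# Toy pyramid: 3 levels, 8 channels, halving spatial size per level.
feats = [torch.randn(1, 8, s, s) for s in (32, 16, 8)]
print([t.shape for t in mfpn(feats)])
```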

Improving Disentangled Representation Learning with the Beta Bernoulli Process

Title Improving Disentangled Representation Learning with the Beta Bernoulli Process
Authors Prashnna Kumar Gyawali, Zhiyuan Li, Cameron Knight, Sandesh Ghimire, B. Milan Horacek, John Sapp, Linwei Wang
Abstract To improve the ability of VAEs to disentangle in the latent space, existing works mostly focus on enforcing independence among the learned latent factors. However, the ability of these models to disentangle often decreases as the complexity of the generative factors increases. In this paper, we investigate the little-explored effect of the modeling capacity of the posterior density on the disentangling ability of the VAE. We note that independence within and complexity of the latent density are two different properties we constrain when regularizing the posterior density: while the former promotes the disentangling ability of the VAE, the latter – if overly limited – creates an unnecessary competition with the data reconstruction objective. Therefore, if we preserve the independence but allow richer modeling capacity in the posterior density, we lift this competition and thereby allow improved independence and data reconstruction at the same time. We investigate this theoretical intuition with a VAE that uses a non-parametric latent factor model, the Indian Buffet Process (IBP), as a latent density able to grow with the complexity of the data. Across three widely used benchmark data sets and two clinical data sets little explored for disentangled learning, we qualitatively and quantitatively demonstrate the improved disentangling performance of IBP-VAE over the state of the art. On the latter two clinical data sets, riddled with complex factors of variation, we further demonstrate that unsupervised disentangling of nuisance factors via IBP-VAE – when combined with a supervised objective – can not only improve task accuracy compared to relevant supervised deep architectures but also facilitate knowledge discovery related to task decision-making. A shorter version of this work will appear in the ICDM 2019 conference proceedings.
Tasks Decision Making, Representation Learning
Published 2019-09-03
URL https://arxiv.org/abs/1909.01839v1
PDF https://arxiv.org/pdf/1909.01839v1.pdf
PWC https://paperswithcode.com/paper/improving-disentangled-representation
Repo
Framework
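For readers unfamiliar with the Indian Buffet Process the abstract builds on, here is a minimal numpy sketch of its stick-breaking (Beta-Bernoulli) construction: feature probabilities shrink geometrically, so the number of active latent factors can grow with data complexity. The alpha, truncation level, and Gaussian factor values are illustrative assumptions.

```python
# A minimal numpy sketch of the stick-breaking Indian Buffet Process construction
# that IBP-VAE builds on (alpha and the truncation level K are illustrative).
import numpy as np

rng = np.random.default_rng(0)
alpha, K, n = 2.0, 10, 5          # concentration, truncation, number of samples

v = rng.beta(alpha, 1.0, size=K)  # stick-breaking fractions v_k ~ Beta(alpha, 1)
pi = np.cumprod(v)                # feature probabilities pi_k = prod_{j<=k} v_j

z = rng.binomial(1, pi, size=(n, K))   # binary feature masks z_nk ~ Bernoulli(pi_k)
a = rng.normal(0.0, 1.0, size=(n, K))  # continuous factor values (Gaussian here)
latent = z * a                         # sparse latent code: active factors only

print("expected active factors per sample:", pi.sum().round(2))
print("first sample's latent code:", latent[0].round(2))
```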

Semi-Cyclic Stochastic Gradient Descent

Title Semi-Cyclic Stochastic Gradient Descent
Authors Hubert Eichner, Tomer Koren, H. Brendan McMahan, Nathan Srebro, Kunal Talwar
Abstract We consider convex SGD updates with a block-cyclic structure, i.e., where each cycle consists of a small number of blocks, each with many samples from a possibly different, block-specific distribution. This situation arises, e.g., in Federated Learning, where the mobile devices available for updates at different times of the day have different characteristics. We show that such a block-cyclic structure can significantly deteriorate the performance of SGD, but propose a simple approach that allows prediction with the same performance guarantees as for i.i.d., non-cyclic sampling.
Tasks
Published 2019-04-23
URL http://arxiv.org/abs/1904.10120v1
PDF http://arxiv.org/pdf/1904.10120v1.pdf
PWC https://paperswithcode.com/paper/semi-cyclic-stochastic-gradient-descent
Repo
Framework
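A toy numpy illustration of the block-cyclic setting, with a hedged reading of the pluralistic remedy: keep an averaged iterate per block and predict with the one matching the block at hand. The two-block logistic-regression setup is invented for illustration.

```python
# Toy block-cyclic SGD (logistic regression) plus per-block averaged iterates.
import numpy as np

rng = np.random.default_rng(0)
d, blocks, steps_per_block, cycles = 5, 2, 200, 10

# Two block-specific distributions (e.g., daytime vs. nighttime devices).
means = [np.ones(d), -np.ones(d)]

w = np.zeros(d)
avg = [np.zeros(d) for _ in range(blocks)]   # per-block running averages
counts = [0] * blocks
lr = 0.1

for _ in range(cycles):
    for b in range(blocks):
        for _ in range(steps_per_block):
            x = rng.normal(means[b], 1.0)
            y = 1.0 if x.sum() + rng.normal() > 0 else -1.0
            # SGD step on the logistic loss log(1 + exp(-y * w.x)).
            grad = -y * x / (1.0 + np.exp(y * (w @ x)))
            w -= lr * grad
            counts[b] += 1
            avg[b] += (w - avg[b]) / counts[b]  # average iterates seen in block b

print("single final iterate:", w.round(2))
print("per-block averages:  ", [a.round(2) for a in avg])
```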

An Effective Approach to Unsupervised Machine Translation

Title An Effective Approach to Unsupervised Machine Translation
Authors Mikel Artetxe, Gorka Labaka, Eneko Agirre
Abstract While machine translation has traditionally relied on large amounts of parallel corpora, a recent research line has managed to train both Neural Machine Translation (NMT) and Statistical Machine Translation (SMT) systems using monolingual corpora only. In this paper, we identify and address several deficiencies of existing unsupervised SMT approaches by exploiting subword information, developing a theoretically well-founded unsupervised tuning method, and incorporating a joint refinement procedure. Moreover, we use our improved SMT system to initialize a dual NMT model, which is further fine-tuned through on-the-fly back-translation. Together, these improvements yield large gains over the previous state of the art in unsupervised machine translation. For instance, we obtain 22.5 BLEU points on English-to-German WMT 2014, 5.5 points more than the previous best unsupervised system and 0.5 points more than the (supervised) shared-task winner back in 2014.
Tasks Machine Translation, Unsupervised Machine Translation
Published 2019-02-04
URL https://arxiv.org/abs/1902.01313v2
PDF https://arxiv.org/pdf/1902.01313v2.pdf
PWC https://paperswithcode.com/paper/an-effective-approach-to-unsupervised-machine
Repo
Framework
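The on-the-fly back-translation step the abstract mentions can be sketched as a dual training loop. The model objects and their translate/train_step/sample_batch methods below are hypothetical stand-ins, not the authors' API:

```python
# A hedged pseudocode-style sketch of on-the-fly back-translation; the model
# objects and method names are hypothetical stand-ins.
def backtranslation_round(src2tgt, tgt2src, mono_src, mono_tgt, batches=100):
    """One round of dual training on synthetic parallel data."""
    for _ in range(batches):
        # Translate monolingual target text back into the source language,
        # then train src->tgt on the resulting synthetic (source, target) pairs.
        tgt_batch = mono_tgt.sample_batch()
        synthetic_src = tgt2src.translate(tgt_batch)
        src2tgt.train_step(source=synthetic_src, target=tgt_batch)

        # Symmetrically for the other direction.
        src_batch = mono_src.sample_batch()
        synthetic_tgt = src2tgt.translate(src_batch)
        tgt2src.train_step(source=synthetic_tgt, target=src_batch)
```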

Neural Networks Learning and Memorization with (almost) no Over-Parameterization

Title Neural Networks Learning and Memorization with (almost) no Over-Parameterization
Authors Amit Daniely
Abstract Many results in recent years have established polynomial-time learnability of various models via neural network algorithms. However, unless the model is linearly separable or the activation is a polynomial, these results require very large networks – much larger than what is needed for the mere existence of a good predictor. In this paper we prove that SGD on depth-two neural networks can memorize samples, learn polynomials with bounded weights, and learn certain kernel spaces, with near-optimal network size, sample complexity, and runtime. In particular, we show that SGD on a depth-two network with $\tilde{O}\left(\frac{m}{d}\right)$ hidden neurons (and hence $\tilde{O}(m)$ parameters) can memorize $m$ randomly labeled points in $\mathbb{S}^{d-1}$.
Tasks
Published 2019-11-22
URL https://arxiv.org/abs/1911.09873v1
PDF https://arxiv.org/pdf/1911.09873v1.pdf
PWC https://paperswithcode.com/paper/neural-networks-learning-and-memorization
Repo
Framework
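A quick sanity check of the parameter count in the last sentence of the abstract: with roughly $m/d$ hidden neurons on inputs in $\mathbb{R}^d$, a depth-two network has about $m$ weights, ignoring log factors. The concrete numbers below are illustrative.

```python
# Back-of-the-envelope check: a depth-two network with m/d hidden neurons
# on d-dimensional inputs has roughly m parameters.
d = 100          # input dimension
m = 10_000       # number of points to memorize
hidden = m // d  # ~O(m/d) hidden neurons, ignoring log factors

first_layer = hidden * d   # weights into the hidden layer
output_layer = hidden      # weights from hidden layer to the scalar output
print(first_layer + output_layer)  # 10_100 ~= m, i.e. O~(m) parameters
```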

Dimension of Reservoir Computers

Title Dimension of Reservoir Computers
Authors Thomas L. Carroll
Abstract A reservoir computer is a complex dynamical system, often created by coupling nonlinear nodes in a network. The nodes are all driven by a common driving signal. In this work, three dimension estimation methods, false nearest neighbor, covariance and Kaplan-Yorke dimensions, are used to estimate the dimension of the reservoir dynamical system. It is shown that the signals in the reservoir system exist on a relatively low dimensional surface. Changing the spectral radius of the reservoir network can increase the fractal dimension of the reservoir signals, leading to an increase in testing error.
Tasks
Published 2019-12-10
URL https://arxiv.org/abs/1912.06472v1
PDF https://arxiv.org/pdf/1912.06472v1.pdf
PWC https://paperswithcode.com/paper/dimension-of-reservoir-computers
Repo
Framework
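A minimal numpy version of the setup: a random reservoir rescaled to a chosen spectral radius, driven by a common signal, with the covariance-based dimension estimated as the number of principal components covering 99% of the variance. The network size, drive, and 99% cutoff are illustrative assumptions.

```python
# A minimal echo-state-style reservoir plus a covariance-based dimension estimate.
import numpy as np

rng = np.random.default_rng(0)
n, T, spectral_radius = 100, 2000, 0.9

W = rng.normal(size=(n, n))
W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # rescale radius
w_in = rng.normal(size=n)

drive = np.sin(0.1 * np.arange(T))        # common driving signal for all nodes
x = np.zeros(n)
states = np.empty((T, n))
for t in range(T):
    x = np.tanh(W @ x + w_in * drive[t])  # nonlinear node update
    states[t] = x

# Covariance dimension: principal components covering 99% of the variance.
var = np.linalg.svd(states - states.mean(0), compute_uv=False) ** 2
dim = int(np.searchsorted(np.cumsum(var) / var.sum(), 0.99) + 1)
print("covariance dimension estimate:", dim)
```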

Differentiable Probabilistic Logic Networks

Title Differentiable Probabilistic Logic Networks
Authors Alexey Potapov, Anatoly Belikov, Vitaly Bogdanov, Alexander Scherbatiy
Abstract Probabilistic logic reasoning is a central component of cognitive architectures such as OpenCog. However, as an integrative architecture, OpenCog facilitates cognitive synergy via hybridization of different inference methods. In this paper, we introduce a differentiable version of Probabilistic Logic Networks, whose rules operate over tensor truth values in such a way that a chain of reasoning steps constructs a computation graph over tensors that accepts truth values of premises from the knowledge base as input and produces truth values of conclusions as output. This allows learning both the truth values of premises and the formulas for rules (specified in a form with trainable weights) by backpropagation, combining subsymbolic optimization with symbolic reasoning.
Tasks
Published 2019-07-10
URL https://arxiv.org/abs/1907.04592v1
PDF https://arxiv.org/pdf/1907.04592v1.pdf
PWC https://paperswithcode.com/paper/differentiable-probabilistic-logic-networks
Repo
Framework
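One reasoning step of this kind can be sketched in PyTorch: the standard PLN deduction formula applied to tensor truth values, with premise strengths made trainable so gradients flow back through the inference step. This is a hedged one-rule toy, not the OpenCog implementation.

```python
# A differentiable PLN-style deduction rule over tensor truth values.
import torch

def deduction(sAB, sBC, sB, sC, eps=1e-6):
    # PLN deduction: strength of A->C from A->B, B->C and term probabilities.
    return sAB * sBC + (1.0 - sAB) * (sC - sB * sBC) / (1.0 - sB + eps)

# Trainable truth values of the premises (strengths in (0, 1) via sigmoid).
logits = torch.zeros(4, requires_grad=True)
target = torch.tensor(0.9)           # desired truth value of the conclusion

opt = torch.optim.SGD([logits], lr=1.0)
for step in range(200):
    sAB, sBC, sB, sC = torch.sigmoid(logits)
    loss = (deduction(sAB, sBC, sB, sC) - target) ** 2
    opt.zero_grad()
    loss.backward()                  # backprop through the reasoning step
    opt.step()

print("learned premise strengths:", torch.sigmoid(logits).detach().numpy().round(2))
```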

Key Fact as Pivot: A Two-Stage Model for Low Resource Table-to-Text Generation

Title Key Fact as Pivot: A Two-Stage Model for Low Resource Table-to-Text Generation
Authors Shuming Ma, Pengcheng Yang, Tianyu Liu, Peng Li, Jie Zhou, Xu Sun
Abstract Table-to-text generation aims to translate structured data into unstructured text. Most existing methods adopt the encoder-decoder framework to learn the transformation, which requires large-scale training samples. However, the lack of large parallel corpora is a major practical problem for many domains. In this work, we consider the scenario of low-resource table-to-text generation, where only limited parallel data is available. We propose a novel model that separates generation into two stages: key fact prediction and surface realization. It first predicts the key facts from the table, and then generates the text from those key facts. Training the key fact predictor needs far less annotated data, while the surface realization stage can be trained with a pseudo-parallel corpus. We evaluate our model on a biography generation dataset. Our model achieves a $27.34$ BLEU score with only $1,000$ parallel examples, while the baseline model only reaches $9.71$ BLEU.
Tasks Table-to-Text Generation, Text Generation
Published 2019-08-08
URL https://arxiv.org/abs/1908.03067v1
PDF https://arxiv.org/pdf/1908.03067v1.pdf
PWC https://paperswithcode.com/paper/key-fact-as-pivot-a-two-stage-model-for-low-1
Repo
Framework
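Structurally, the two stages compose as a simple pipeline. The scorer and the template realizer below are trivial stand-ins for the paper's learned key-fact predictor and seq2seq surface realizer:

```python
# A loose structural sketch of the two-stage pipeline; the scorer and realizer
# are trivial stand-ins for the paper's learned models.
def predict_key_facts(table, scorer, threshold=0.5):
    """Stage 1: keep only the (field, value) pairs the scorer deems key facts."""
    return [(f, v) for f, v in table if scorer(f, v) >= threshold]

def realize(key_facts):
    """Stage 2: surface realization; here a naive template, not a trained decoder."""
    return ", ".join(f"{field} is {value}" for field, value in key_facts) + "."

table = [("name", "Ada Lovelace"), ("born", "1815"), ("hair color", "brown")]
toy_scorer = lambda f, v: 0.9 if f in {"name", "born"} else 0.1
print(realize(predict_key_facts(table, toy_scorer)))
# -> "name is Ada Lovelace, born is 1815."
```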

Uncertainty in Model-Agnostic Meta-Learning using Variational Inference

Title Uncertainty in Model-Agnostic Meta-Learning using Variational Inference
Authors Cuong Nguyen, Thanh-Toan Do, Gustavo Carneiro
Abstract We introduce a new, rigorously formulated Bayesian meta-learning algorithm that learns a prior probability distribution over model parameters for few-shot learning. The proposed algorithm employs gradient-based variational inference to infer the posterior over model parameters for a new task. Our algorithm can be applied to any model architecture and implemented in various machine learning paradigms, including regression and classification. We show that models trained with our meta-learning algorithm are well calibrated and accurate, with state-of-the-art calibration and classification results on two few-shot classification benchmarks (Omniglot and Mini-ImageNet), and competitive results on a multi-modal task-distribution regression.
Tasks Calibration, Few-Shot Learning, Meta-Learning, Omniglot
Published 2019-07-27
URL https://arxiv.org/abs/1907.11864v2
PDF https://arxiv.org/pdf/1907.11864v2.pdf
PWC https://paperswithcode.com/paper/uncertainty-in-model-agnostic-meta-learning
Repo
Framework
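A hedged PyTorch sketch of the core mechanic: adapt a diagonal-Gaussian posterior over the weights of a tiny linear model to a new task by gradient steps on a reparameterized ELBO. The model, prior, KL weight, and step counts are illustrative, and the meta-learning outer loop that would train the prior is omitted.

```python
# Adapt a Gaussian weight posterior to one task via gradient-based VI.
import torch

d = 3
prior_mu, prior_logstd = torch.zeros(d), torch.zeros(d)  # meta-learned in the full method

# Task data: y = x.w* + noise.
w_star = torch.tensor([1.0, -2.0, 0.5])
x = torch.randn(50, d)
y = x @ w_star + 0.1 * torch.randn(50)

mu = prior_mu.clone().requires_grad_(True)
logstd = prior_logstd.clone().requires_grad_(True)
opt = torch.optim.Adam([mu, logstd], lr=0.05)

for _ in range(300):
    eps = torch.randn(d)
    w = mu + eps * logstd.exp()               # reparameterized posterior sample
    nll = ((x @ w - y) ** 2).mean()           # data-fit term (Gaussian likelihood)
    # Closed-form KL between diagonal-Gaussian posterior and prior.
    kl = (prior_logstd - logstd
          + (logstd.exp() ** 2 + (mu - prior_mu) ** 2) / (2 * prior_logstd.exp() ** 2)
          - 0.5).sum()
    loss = nll + 0.01 * kl                    # ELBO up to the KL weighting
    opt.zero_grad(); loss.backward(); opt.step()

print("posterior mean:", mu.detach().numpy().round(1))
```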

Deep ReLU Networks Have Surprisingly Few Activation Patterns

Title Deep ReLU Networks Have Surprisingly Few Activation Patterns
Authors Boris Hanin, David Rolnick
Abstract The success of deep networks has been attributed in part to their expressivity: per parameter, deep networks can approximate a richer class of functions than shallow networks. In ReLU networks, the number of activation patterns is one measure of expressivity, and the maximum number of patterns grows exponentially with depth. However, recent work has shown that the practical expressivity of deep networks – the functions they can learn rather than express – is often far from the theoretical maximum. In this paper, we show that the average number of activation patterns for ReLU networks at initialization is bounded by the total number of neurons raised to the input dimension. We show empirically that this bound, which is independent of the depth, is tight both at initialization and during training, even on memorization tasks that should maximize the number of activation patterns. Our work suggests that realizing the full expressivity of deep networks may not be possible in practice, at least with current methods.
Tasks
Published 2019-06-03
URL https://arxiv.org/abs/1906.00904v2
PDF https://arxiv.org/pdf/1906.00904v2.pdf
PWC https://paperswithcode.com/paper/190600904
Repo
Framework
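The phenomenon is easy to probe empirically. The numpy experiment below counts distinct activation patterns of freshly initialized ReLU networks over sampled inputs; widths, depth, and sample sizes are arbitrary choices.

```python
# Count distinct ReLU activation patterns of a random network over sampled inputs.
import numpy as np

rng = np.random.default_rng(0)

def count_patterns(widths, d_in=2, n_inputs=10000):
    x = rng.normal(size=(n_inputs, d_in))
    patterns = np.empty((n_inputs, 0), dtype=bool)
    h = x
    for w in widths:
        W = rng.normal(size=(h.shape[1], w)) / np.sqrt(h.shape[1])
        pre = h @ W
        patterns = np.hstack([patterns, pre > 0])  # record each neuron's on/off state
        h = np.maximum(pre, 0.0)
    return len({p.tobytes() for p in patterns})

# Same total neuron budget, different depths: the empirical count stays modest
# despite the exponential-in-depth worst case.
print("shallow (1x20):", count_patterns([20]))
print("deep   (4x5): ", count_patterns([5, 5, 5, 5]))
```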

Hyperspectral Super-Resolution via Global-Local Low-Rank Matrix Estimation

Title Hyperspectral Super-Resolution via Global-Local Low-Rank Matrix Estimation
Authors Ruiyuan Wu, Wing-Kin Ma, Xiao Fu, Qiang Li
Abstract Hyperspectral super-resolution (HSR) aims to estimate an image of high spectral and spatial resolution from a pair of co-registered multispectral (MS) and hyperspectral (HS) images, which have coarser spectral and spatial resolution, respectively. In this paper we pursue a low-rank matrix estimation approach to HSR. We assume that the spectral-spatial matrices associated with the whole image and with local areas of the image have low-rank structure. The local low-rank assumption, in particular, aims to provide a more flexible model that accounts for local variation effects due to endmember variability. We formulate the HSR problem as a global-local rank-regularized least-squares problem. By leveraging recent advances in non-convex large-scale optimization, namely the smooth Schatten-p approximation and the accelerated majorization-minimization method, we develop an efficient algorithm for the global-local low-rank problem. Numerical experiments on synthetic, semi-real and real data show that the proposed algorithm outperforms a number of benchmark algorithms in terms of recovery performance.
Tasks Super-Resolution
Published 2019-07-02
URL https://arxiv.org/abs/1907.01149v2
PDF https://arxiv.org/pdf/1907.01149v2.pdf
PWC https://paperswithcode.com/paper/hyperspectral-super-resolution-via-global
Repo
Framework
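As a simplified stand-in for the rank-regularized least-squares idea, the numpy sketch below runs proximal gradient with singular-value soft-thresholding (the nuclear-norm prox) on a noisy low-rank matrix. The paper's actual algorithm uses a smooth Schatten-p surrogate with accelerated majorization-minimization and includes local low-rank terms; both are omitted here.

```python
# Low-rank regularized least squares via singular value soft-thresholding (ISTA).
import numpy as np

def svt(X, tau):
    """Singular value soft-thresholding: prox of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
truth = rng.normal(size=(40, 3)) @ rng.normal(size=(3, 60))  # rank-3 ground truth
obs = truth + 0.1 * rng.normal(size=truth.shape)

X, step, tau = np.zeros_like(obs), 0.5, 2.0
for _ in range(100):
    X = svt(X - step * (X - obs), step * tau)  # gradient step on 0.5||X-obs||^2, then prox

rel_err = np.linalg.norm(X - truth) / np.linalg.norm(truth)
print("recovered rank:", np.linalg.matrix_rank(X), " rel. error:", round(rel_err, 3))
```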

LiDAR ICPS-net: Indoor Camera Positioning based-on Generative Adversarial Network for RGB to Point-Cloud Translation

Title LiDAR ICPS-net: Indoor Camera Positioning based-on Generative Adversarial Network for RGB to Point-Cloud Translation
Authors Ali Ghofrani, Rahil Mahdian Toroghi, Seyed Mojtaba Tabatabaie, Seyed Maziar Tabasi
Abstract Indoor positioning aims at navigation inside areas with no GPS availability and can be employed in many applications such as augmented reality and autonomous driving, especially inside closed areas and tunnels. In this paper, a deep neural-network-based architecture is proposed to address this problem. A tandem set of convolutional neural networks and a Pix2Pix GAN are leveraged to act as the scene classifier, the scene RGB-image-to-point-cloud converter, and the position regressor, respectively. The proposed architecture outperforms previous works, including our recent one, in that it makes the data-generation task easier and more robust against small scene variations, while the positioning accuracy remains remarkably good for both the Cartesian position and the quaternion orientation of the camera.
Tasks Autonomous Driving
Published 2019-11-14
URL https://arxiv.org/abs/1911.05871v1
PDF https://arxiv.org/pdf/1911.05871v1.pdf
PWC https://paperswithcode.com/paper/lidar-icps-net-indoor-camera-positioning
Repo
Framework
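The three-component pipeline can be shown structurally; the objects below are hypothetical stand-ins wired in the order the abstract describes (classifier, then RGB-to-point-cloud GAN, then position regressor):

```python
# A hedged structural sketch of the described pipeline; all objects and methods
# are hypothetical stand-ins, not the authors' models.
def localize(rgb_image, scene_classifier, pix2pix, regressors):
    scene_id = scene_classifier.predict(rgb_image)      # 1: which scene are we in?
    point_cloud = pix2pix.translate(rgb_image)          # 2: RGB -> point-cloud view
    pose = regressors[scene_id].predict(point_cloud)    # 3: (x, y, z) + quaternion
    return pose
```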

Feedback Learning for Improving the Robustness of Neural Networks

Title Feedback Learning for Improving the Robustness of Neural Networks
Authors Chang Song, Zuoguan Wang, Hai Li
Abstract Recent research has revealed that neural networks are vulnerable to adversarial attacks. State-of-the-art defensive techniques add various adversarial examples to training to improve models' adversarial robustness. However, these methods are not universal and cannot defend against unknown or non-adversarial evasion attacks. In this paper, we analyze model robustness in the decision space. A feedback learning method is then proposed to understand how well a model has learned and to facilitate the retraining process that remedies the defects. Evaluations according to a set of distance-based criteria show that our method can significantly improve models' accuracy and robustness against different types of evasion attacks. Moreover, we observe the existence of inter-class inequality and propose to compensate for it by changing the proportions of examples generated for different classes.
Tasks
Published 2019-09-12
URL https://arxiv.org/abs/1909.05443v1
PDF https://arxiv.org/pdf/1909.05443v1.pdf
PWC https://paperswithcode.com/paper/feedback-learning-for-improving-the
Repo
Framework
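A loose numpy reading of the inter-class compensation idea in the final sentence: generate more examples for classes whose measured robustness lags behind. The robustness scores and the inverse weighting are placeholders for the paper's distance-based criteria.

```python
# Inter-class compensation sketch: example proportions inversely proportional
# to per-class robustness (scores below are illustrative placeholders).
import numpy as np

robustness = np.array([0.9, 0.4, 0.7])   # per-class robustness scores
weights = 1.0 / robustness               # weaker classes get more examples
proportions = weights / weights.sum()
budget = 3000                            # total examples to generate
print(np.round(proportions * budget).astype(int))  # -> [ 661 1488  850]
```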

Inconsistency Proofs for ASP: The ASP-DRUPE Format

Title Inconsistency Proofs for ASP: The ASP-DRUPE Format
Authors Mario Alviano, Carmine Dodaro, Johannes K. Fichte, Markus Hecher, Tobias Philipp, Jakob Rath
Abstract Answer Set Programming (ASP) solvers are highly tuned and complex procedures that implicitly solve the consistency problem, i.e., deciding whether a logic program admits an answer set. Verifying whether a claimed answer set is formally a correct answer set of the program can be decided in polynomial time for (normal) programs. However, it is far from immediate to verify whether a program that is claimed to be inconsistent indeed does not admit any answer set. In this paper, we address this problem and develop the new proof format ASP-DRUPE for propositional, disjunctive logic programs, including weight and choice rules. ASP-DRUPE is based on the Reverse Unit Propagation (RUP) format designed for Boolean satisfiability. We establish the correctness of ASP-DRUPE and discuss how to integrate it into modern ASP solvers. Finally, we provide an implementation of ASP-DRUPE in the wasp solver for normal logic programs. This work is under consideration for acceptance in TPLP.
Tasks
Published 2019-07-24
URL https://arxiv.org/abs/1907.10389v1
PDF https://arxiv.org/pdf/1907.10389v1.pdf
PWC https://paperswithcode.com/paper/inconsistency-proofs-for-asp-the-asp-drupe
Repo
Framework
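The RUP principle that ASP-DRUPE inherits is easy to state: a clause C follows by reverse unit propagation from a formula F if unit propagation on F together with the negation of C yields a conflict. Below is a minimal propositional checker (clauses as lists of signed integers, DIMACS-style); this illustrates the SAT-level idea only, not the full ASP-DRUPE format.

```python
# Minimal Reverse Unit Propagation (RUP) check for propositional clauses.
def is_rup(formula, clause):
    assignment = {-lit for lit in clause}        # assume the negation of C
    changed = True
    while changed:
        changed = False
        for c in formula:
            unassigned = [l for l in c if -l not in assignment]
            if not unassigned:
                return True                      # every literal falsified: conflict
            if len(unassigned) == 1 and unassigned[0] not in assignment:
                assignment.add(unassigned[0])    # unit propagation
                changed = True
    return False

# (a) and (not a or b) make (b) a RUP consequence.
print(is_rup([[1], [-1, 2]], [2]))   # True
print(is_rup([[1, 2]], [1]))         # False: propagation finds no conflict
```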

Target Conditioned Sampling: Optimizing Data Selection for Multilingual Neural Machine Translation

Title Target Conditioned Sampling: Optimizing Data Selection for Multilingual Neural Machine Translation
Authors Xinyi Wang, Graham Neubig
Abstract To improve low-resource Neural Machine Translation (NMT) with multilingual corpora, training on the most related high-resource language only is often more effective than using all data available (Neubig and Hu, 2018). However, it is possible that an intelligent data selection strategy can further improve low-resource NMT with data from other auxiliary languages. In this paper, we seek to construct a sampling distribution over all multilingual data, so that it minimizes the training loss of the low-resource language. Based on this formulation, we propose an efficient algorithm, Target Conditioned Sampling (TCS), which first samples a target sentence, and then conditionally samples its source sentence. Experiments show that TCS brings significant gains of up to 2 BLEU on three of four languages we test, with minimal training overhead.
Tasks Low-Resource Neural Machine Translation, Machine Translation
Published 2019-05-20
URL https://arxiv.org/abs/1905.08212v1
PDF https://arxiv.org/pdf/1905.08212v1.pdf
PWC https://paperswithcode.com/paper/target-conditioned-sampling-optimizing-data
Repo
Framework
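The two-step draw that gives TCS its name can be sketched directly: sample a target sentence from the multilingual pool, then sample a source sentence from a conditional distribution over its candidates. The toy pool and the language weighting below are invented placeholders for the learned distribution:

```python
# Target Conditioned Sampling sketch: sample target first, then source | target.
import random

random.seed(0)

# target sentence -> candidate (source sentence, language) pairs
pool = {
    "the cat sits": [("le chat est assis", "fr"), ("die Katze sitzt", "de")],
    "good morning": [("bonjour", "fr"), ("guten Morgen", "de"), ("buenos dias", "es")],
}

def conditional_weights(candidates):
    # P(source | target): a made-up weighting favoring one related language.
    return [2.0 if lang == "de" else 1.0 for _, lang in candidates]

target = random.choice(list(pool))                # step 1: sample a target sentence
cands = pool[target]
source, lang = random.choices(cands, weights=conditional_weights(cands))[0]  # step 2
print(f"sampled pair: ({source!r} [{lang}], {target!r})")
```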