Paper Group ANR 594
MFPN: A Novel Mixture Feature Pyramid Network of Multiple Architectures for Object Detection. Improving Disentangled Representation Learning with the Beta Bernoulli Process. Semi-Cyclic Stochastic Gradient Descent. An Effective Approach to Unsupervised Machine Translation. Neural Networks Learning and Memorization with (almost) no Over-Parameterization …
MFPN: A Novel Mixture Feature Pyramid Network of Multiple Architectures for Object Detection
Title | MFPN: A Novel Mixture Feature Pyramid Network of Multiple Architectures for Object Detection |
Authors | Tingting Liang, Yongtao Wang, Qijie Zhao, Huan Zhang, Zhi Tang, Haibin Ling |
Abstract | Feature pyramids are widely exploited in many detectors to solve the scale variation problem for object detection. In this paper, we first investigate the Feature Pyramid Network (FPN) architectures and briefly categorize them into three typical fashions: top-down, bottom-up and fusing-splitting, which have their own merits for detecting small objects, large objects, and medium-sized objects, respectively. Further, we design three FPNs of different architectures and propose a novel Mixture Feature Pyramid Network (MFPN) which inherits the merits of all these three kinds of FPNs, by assembling the three kinds of FPNs in a parallel multi-branch architecture and mixing the features. MFPN can significantly enhance both one-stage and two-stage FPN-based detectors with about 2 percent Average Precision (AP) increment on the MS-COCO benchmark, at little sacrifice in running-time latency. By simply assembling MFPN with the one-stage and two-stage baseline detectors, we achieve competitive single-model detection results on the COCO detection benchmark without bells and whistles. |
Tasks | Object Detection |
Published | 2019-12-20 |
URL | https://arxiv.org/abs/1912.09748v1 |
https://arxiv.org/pdf/1912.09748v1.pdf | |
PWC | https://paperswithcode.com/paper/mfpn-a-novel-mixture-feature-pyramid-network |
Repo | |
Framework | |
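The abstract above describes three FPN fashions running as parallel branches whose per-level outputs are mixed. The following is a minimal PyTorch-style sketch of that parallel multi-branch idea, assuming simple lateral 1x1 convolutions, nearest-neighbor resizing, and element-wise averaging as the mixing rule; channel sizes and all module names are illustrative, not the authors' implementation.

```python
# Illustrative sketch of a "mixture" feature pyramid: three parallel branches
# (top-down, bottom-up, fusing-splitting) whose per-level outputs are averaged.
# Channel sizes, ops, and the mixing rule are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopDownFPN(nn.Module):
    def __init__(self, in_channels, out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])

    def forward(self, feats):  # feats: shallow -> deep, e.g. [C3, C4, C5]
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 2, -1, -1):  # propagate deep -> shallow
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        return laterals


class BottomUpFPN(nn.Module):
    def __init__(self, in_channels, out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])

    def forward(self, feats):
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        for i in range(1, len(laterals)):  # propagate shallow -> deep
            laterals[i] = laterals[i] + F.adaptive_max_pool2d(
                laterals[i - 1], laterals[i].shape[-2:])
        return laterals


class FusingSplittingFPN(nn.Module):
    def __init__(self, in_channels, out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])

    def forward(self, feats):
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        mid = laterals[len(laterals) // 2].shape[-2:]
        fused = sum(F.interpolate(x, size=mid, mode="nearest") for x in laterals)
        return [F.interpolate(fused, size=x.shape[-2:], mode="nearest") for x in laterals]


class MixtureFPN(nn.Module):
    def __init__(self, in_channels=(512, 1024, 2048), out_channels=256):
        super().__init__()
        self.branches = nn.ModuleList([
            TopDownFPN(in_channels, out_channels),
            BottomUpFPN(in_channels, out_channels),
            FusingSplittingFPN(in_channels, out_channels),
        ])

    def forward(self, feats):
        outs = [branch(feats) for branch in self.branches]
        # Mix the three pyramids level by level (simple averaging here).
        return [sum(level) / len(level) for level in zip(*outs)]


feats = [torch.randn(1, c, s, s) for c, s in [(512, 64), (1024, 32), (2048, 16)]]
pyramid = MixtureFPN()(feats)
print([p.shape for p in pyramid])
```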
Improving Disentangled Representation Learning with the Beta Bernoulli Process
Title | Improving Disentangled Representation Learning with the Beta Bernoulli Process |
Authors | Prashnna Kumar Gyawali, Zhiyuan Li, Cameron Knight, Sandesh Ghimire, B. Milan Horacek, John Sapp, Linwei Wang |
Abstract | To improve the ability of VAE to disentangle in the latent space, existing works mostly focus on enforcing independence among the learned latent factors. However, the ability of these models to disentangle often decreases as the complexity of the generative factors increases. In this paper, we investigate the little-explored effect of the modeling capacity of a posterior density on the disentangling ability of the VAE. We note that the independence within and the complexity of the latent density are two different properties we constrain when regularizing the posterior density: while the former promotes the disentangling ability of VAE, the latter – if overly limited – creates an unnecessary competition with the data reconstruction objective in VAE. Therefore, if we preserve the independence but allow richer modeling capacity in the posterior density, we will lift this competition and thereby allow improved independence and data reconstruction at the same time. We investigate this theoretical intuition with a VAE that utilizes a non-parametric latent factor model, the Indian Buffet Process (IBP), as a latent density that is able to grow with the complexity of the data. Across three widely-used benchmark data sets and two clinical data sets little explored for disentangled learning, we qualitatively and quantitatively demonstrate the improved disentangling performance of IBP-VAE over the state of the art. In the latter two clinical data sets, riddled with complex factors of variation, we further demonstrate that unsupervised disentangling of nuisance factors via IBP-VAE – when combined with a supervised objective – can not only improve task accuracy in comparison to relevant supervised deep architectures but also facilitate knowledge discovery related to task decision-making. A shorter version of this work will appear in the ICDM 2019 conference proceedings. |
Tasks | Decision Making, Representation Learning |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.01839v1 |
https://arxiv.org/pdf/1909.01839v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-disentangled-representation |
Repo | |
Framework | |
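The key ingredient above is the Indian Buffet Process (Beta Bernoulli process) used as a latent prior that can "grow" with the data. Below is a minimal numpy sketch of its stick-breaking construction, assuming a Gaussian slab on active dimensions; the factor count, alpha, and the slab choice are illustrative assumptions, not the paper's model.

```python
# Stick-breaking Indian Buffet Process (IBP) prior: feature-activation
# probabilities decay with the feature index, so only as many latent factors
# "switch on" as the data needs. The Gaussian "slab" on active dimensions is an
# illustrative modelling choice.
import numpy as np

rng = np.random.default_rng(0)


def sample_ibp_latent(n_samples, max_features=20, alpha=5.0):
    # Stick-breaking: v_k ~ Beta(alpha, 1), pi_k = prod_{j<=k} v_j (decreasing in k).
    v = rng.beta(alpha, 1.0, size=max_features)
    pi = np.cumprod(v)
    # Binary feature mask z and continuous values a; the latent code is z * a.
    z = rng.binomial(1, pi, size=(n_samples, max_features))
    a = rng.normal(0.0, 1.0, size=(n_samples, max_features))
    return z * a, pi


latent, pi = sample_ibp_latent(1000)
print("activation prob per factor:", np.round(pi, 3))
print("avg. number of active factors per sample:", (latent != 0).sum(axis=1).mean())
```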
Semi-Cyclic Stochastic Gradient Descent
Title | Semi-Cyclic Stochastic Gradient Descent |
Authors | Hubert Eichner, Tomer Koren, H. Brendan McMahan, Nathan Srebro, Kunal Talwar |
Abstract | We consider convex SGD updates with a block-cyclic structure, i.e. where each cycle consists of a small number of blocks, each with many samples from a possibly different, block-specific, distribution. This situation arises, e.g., in Federated Learning, where the mobile devices available for updates at different times during the day have different characteristics. We show that such a block-cyclic structure can significantly deteriorate the performance of SGD, but propose a simple approach that allows prediction with the same performance guarantees as for i.i.d., non-cyclic sampling. |
Tasks | |
Published | 2019-04-23 |
URL | http://arxiv.org/abs/1904.10120v1 |
http://arxiv.org/pdf/1904.10120v1.pdf | |
PWC | https://paperswithcode.com/paper/semi-cyclic-stochastic-gradient-descent |
Repo | |
Framework | |
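A toy numpy illustration of the setting described above: data arrives in a fixed cyclic block order with block-specific distributions, and a single final SGD iterate is biased toward the last block seen, while a block-specific predictor largely avoids that bias. The averaging rule used to build the per-block predictor here is a simplification for illustration, not the paper's exact construction.

```python
# Toy block-cyclic SGD: each cycle (a "day") visits two blocks with different
# feature distributions. A single final model is biased toward the block seen
# last; a block-specific predictor (here: the average of iterates produced
# while that block was active) removes much of the bias.
import numpy as np

rng = np.random.default_rng(1)
d, cycles, per_block, lr = 5, 50, 200, 0.05
true_w = {0: rng.normal(size=d), 1: -rng.normal(size=d)}  # block-specific targets

def make_batch(block):
    X = rng.normal(size=(per_block, d)) + (2.0 if block == 0 else -2.0)
    y = np.sign(X @ true_w[block])
    return X, y

w = np.zeros(d)
block_iterates = {0: [], 1: []}
for _ in range(cycles):
    for block in (0, 1):                      # blocks appear in a fixed cyclic order
        X, y = make_batch(block)
        for xi, yi in zip(X, y):              # logistic-loss SGD updates
            margin = yi * (w @ xi)
            w += lr * yi * xi / (1.0 + np.exp(margin))
        block_iterates[block].append(w.copy())

def accuracy(weights, block):
    X, y = make_batch(block)
    return np.mean(np.sign(X @ weights) == y)

w_per_block = {b: np.mean(block_iterates[b], axis=0) for b in (0, 1)}
for b in (0, 1):
    print(f"block {b}: single model {accuracy(w, b):.2f}, "
          f"block-specific {accuracy(w_per_block[b], b):.2f}")
```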
An Effective Approach to Unsupervised Machine Translation
Title | An Effective Approach to Unsupervised Machine Translation |
Authors | Mikel Artetxe, Gorka Labaka, Eneko Agirre |
Abstract | While machine translation has traditionally relied on large amounts of parallel corpora, a recent research line has managed to train both Neural Machine Translation (NMT) and Statistical Machine Translation (SMT) systems using monolingual corpora only. In this paper, we identify and address several deficiencies of existing unsupervised SMT approaches by exploiting subword information, developing a theoretically well-founded unsupervised tuning method, and incorporating a joint refinement procedure. Moreover, we use our improved SMT system to initialize a dual NMT model, which is further fine-tuned through on-the-fly back-translation. Together, these improvements yield large gains over the previous state of the art in unsupervised machine translation. For instance, we get 22.5 BLEU points in English-to-German WMT 2014, 5.5 points more than the previous best unsupervised system, and 0.5 points more than the (supervised) shared task winner back in 2014. |
Tasks | Machine Translation, Unsupervised Machine Translation |
Published | 2019-02-04 |
URL | https://arxiv.org/abs/1902.01313v2 |
https://arxiv.org/pdf/1902.01313v2.pdf | |
PWC | https://paperswithcode.com/paper/an-effective-approach-to-unsupervised-machine |
Repo | |
Framework | |
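The fine-tuning step above relies on on-the-fly back-translation of monolingual text in both directions. The sketch below only makes that data flow concrete: the toy word-substitution "models" and the positional "training" step are placeholders of my own, not the paper's SMT initialization or NMT architecture.

```python
# Control-flow sketch of on-the-fly back-translation for a dual NMT model:
# monolingual target text is translated "backwards" by the current tgt->src
# model to build synthetic parallel data for the src->tgt direction, and vice
# versa. The translate/train functions are stand-ins showing the data flow only.
import random

random.seed(0)
mono_src = ["ein kleines haus", "der hund schläft", "das auto ist rot"]
mono_tgt = ["a small house", "the dog sleeps", "the car is red"]


def translate(model, sentence):
    # Placeholder "model": a word-for-word substitution table (unknowns copied).
    return " ".join(model.get(w, w) for w in sentence.split())


def train_step(model, src_sentence, tgt_sentence):
    # Placeholder "training": record word pairs by position as if aligned.
    for s, t in zip(src_sentence.split(), tgt_sentence.split()):
        model.setdefault(s, t)


src2tgt, tgt2src = {"hund": "dog"}, {"dog": "hund"}   # weak initial systems
for step in range(100):
    y = random.choice(mono_tgt)                        # target-side monolingual text
    synthetic_x = translate(tgt2src, y)                # back-translate with tgt->src
    train_step(src2tgt, synthetic_x, y)                # train src->tgt on (x', y)
    x = random.choice(mono_src)                        # and the symmetric direction
    synthetic_y = translate(src2tgt, x)
    train_step(tgt2src, synthetic_y, x)

print(len(src2tgt), "entries in the toy src->tgt table")
```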
Neural Networks Learning and Memorization with (almost) no Over-Parameterization
Title | Neural Networks Learning and Memorization with (almost) no Over-Parameterization |
Authors | Amit Daniely |
Abstract | Many results in recent years established polynomial time learnability of various models via neural network algorithms. However, unless the model is linearly separable, or the activation is a polynomial, these results require very large networks – much larger than what is needed for the mere existence of a good predictor. In this paper we prove that SGD on depth-two neural networks can memorize samples, learn polynomials with bounded weights, and learn certain kernel spaces, with near-optimal network size, sample complexity, and runtime. In particular, we show that SGD on a depth-two network with $\tilde{O}\left(\frac{m}{d}\right)$ hidden neurons (and hence $\tilde{O}(m)$ parameters) can memorize $m$ random labeled points in $\mathbb{S}^{d-1}$. |
Tasks | |
Published | 2019-11-22 |
URL | https://arxiv.org/abs/1911.09873v1 |
https://arxiv.org/pdf/1911.09873v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-networks-learning-and-memorization |
Repo | |
Framework | |
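The "(almost) no over-parameterization" claim comes down to simple parameter counting: a depth-two network with roughly m/d hidden units over d-dimensional inputs has roughly m weights, i.e. about one parameter per memorized label (up to log factors). The numbers below are purely illustrative.

```python
# Back-of-the-envelope parameter count behind the memorization result: a
# depth-two network with ~m/d hidden units over inputs on the (d-1)-sphere has
# ~m weights, i.e. roughly one parameter per memorized label (up to log factors).
m, d = 100_000, 1_000            # illustrative sample count and input dimension
hidden = m // d                  # ~m/d hidden neurons (ignoring the log factors)
params = hidden * d + hidden     # first-layer weights + output-layer weights
print(hidden, "hidden units,", params, "parameters for", m, "labels")
```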
Dimension of Reservoir Computers
Title | Dimension of Reservoir Computers |
Authors | Thomas L. Carroll |
Abstract | A reservoir computer is a complex dynamical system, often created by coupling nonlinear nodes in a network. The nodes are all driven by a common driving signal. In this work, three dimension estimation methods, false nearest neighbor, covariance and Kaplan-Yorke dimensions, are used to estimate the dimension of the reservoir dynamical system. It is shown that the signals in the reservoir system exist on a relatively low dimensional surface. Changing the spectral radius of the reservoir network can increase the fractal dimension of the reservoir signals, leading to an increase in testing error. |
Tasks | |
Published | 2019-12-10 |
URL | https://arxiv.org/abs/1912.06472v1 |
https://arxiv.org/pdf/1912.06472v1.pdf | |
PWC | https://paperswithcode.com/paper/dimension-of-reservoir-computers |
Repo | |
Framework | |
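Of the three dimension estimators mentioned above, the covariance-based one is easy to sketch: drive a small echo-state reservoir with a common signal and count how many principal components of the state trajectory are needed to capture most of its variance. The reservoir sizes, spectral radius, driving signal, and the 99% variance threshold below are illustrative choices, not the paper's settings.

```python
# A small echo-state reservoir driven by a common input signal, plus a
# covariance-based dimension estimate: count the principal components needed
# to explain 99% of the variance of the reservoir states.
import numpy as np

rng = np.random.default_rng(0)
n, T, spectral_radius = 100, 5000, 0.9

A = rng.normal(size=(n, n))
A *= spectral_radius / np.max(np.abs(np.linalg.eigvals(A)))   # rescale connectivity
W_in = rng.normal(size=n)
drive = np.sin(0.1 * np.arange(T)) * np.sin(0.013 * np.arange(T))  # common signal

r = np.zeros(n)
states = np.empty((T, n))
for t in range(T):
    r = np.tanh(A @ r + W_in * drive[t])
    states[t] = r

states = states[500:]                                   # drop the transient
eigvals = np.linalg.eigvalsh(np.cov(states.T))[::-1]    # covariance spectrum
explained = np.cumsum(eigvals) / eigvals.sum()
cov_dim = int(np.searchsorted(explained, 0.99) + 1)
print("covariance-based dimension estimate:", cov_dim, "of", n, "nodes")
```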
Differentiable Probabilistic Logic Networks
Title | Differentiable Probabilistic Logic Networks |
Authors | Alexey Potapov, Anatoly Belikov, Vitaly Bogdanov, Alexander Scherbatiy |
Abstract | Probabilistic logic reasoning is a central component of cognitive architectures such as OpenCog. However, as an integrative architecture, OpenCog facilitates cognitive synergy via hybridization of different inference methods. In this paper, we introduce a differentiable version of Probabilistic Logic Networks, whose rules operate over tensor truth values in such a way that a chain of reasoning steps constructs a computation graph over tensors that accepts truth values of premises from the knowledge base as input and produces truth values of conclusions as output. This allows for learning both the truth values of premises and the formulas for rules (specified in a form with trainable weights) by backpropagation, combining subsymbolic optimization and symbolic reasoning. |
Tasks | |
Published | 2019-07-10 |
URL | https://arxiv.org/abs/1907.04592v1 |
https://arxiv.org/pdf/1907.04592v1.pdf | |
PWC | https://paperswithcode.com/paper/differentiable-probabilistic-logic-networks |
Repo | |
Framework | |
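A minimal PyTorch sketch of the core idea: a reasoning rule written as an ordinary tensor expression lets gradients flow from the conclusion's truth value back to the premises'. The strength formula used here is a commonly cited independence-based PLN deduction formula; confidence components, the toy knowledge base, and the target value are illustrative assumptions.

```python
# Differentiable deduction over tensor truth values: the rule is just a tensor
# expression, so gradients flow from the conclusion back to the premise truth
# values. Confidences are omitted to keep the sketch short.
import torch


def deduction_strength(s_ab, s_bc, s_b, s_c, eps=1e-6):
    # P(C|A) under an independence assumption, given P(B|A), P(C|B), P(B), P(C).
    return s_ab * s_bc + (1.0 - s_ab) * (s_c - s_b * s_bc) / (1.0 - s_b + eps)


# Truth values of premises from a toy knowledge base, marked as trainable.
s_ab = torch.tensor(0.8, requires_grad=True)   # e.g. P(mammal | cat)
s_bc = torch.tensor(0.7, requires_grad=True)   # e.g. P(animal | mammal)
s_b, s_c = torch.tensor(0.3), torch.tensor(0.4)

s_ac = deduction_strength(s_ab, s_bc, s_b, s_c)
loss = (s_ac - 0.95) ** 2                      # push the conclusion toward a target
loss.backward()
print(float(s_ac), float(s_ab.grad), float(s_bc.grad))
```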
Key Fact as Pivot: A Two-Stage Model for Low Resource Table-to-Text Generation
Title | Key Fact as Pivot: A Two-Stage Model for Low Resource Table-to-Text Generation |
Authors | Shuming Ma, Pengcheng Yang, Tianyu Liu, Peng Li, Jie Zhou, Xu Sun |
Abstract | Table-to-text generation aims to translate structured data into unstructured text. Most existing methods adopt the encoder-decoder framework to learn the transformation, which requires large-scale training samples. However, the lack of large parallel data is a major practical problem for many domains. In this work, we consider the scenario of low-resource table-to-text generation, where only limited parallel data is available. We propose a novel model that separates generation into two stages: key fact prediction and surface realization. It first predicts the key facts from the tables, and then generates the text with the key facts. The training of key fact prediction needs much less annotated data, while surface realization can be trained with a pseudo-parallel corpus. We evaluate our model on a biography generation dataset. Our model achieves a $27.34$ BLEU score with only $1,000$ parallel examples, while the baseline model only obtains $9.71$ BLEU. |
Tasks | Table-to-Text Generation, Text Generation |
Published | 2019-08-08 |
URL | https://arxiv.org/abs/1908.03067v1 |
https://arxiv.org/pdf/1908.03067v1.pdf | |
PWC | https://paperswithcode.com/paper/key-fact-as-pivot-a-two-stage-model-for-low-1 |
Repo | |
Framework | |
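A toy sketch of the two-stage factoring described above: stage one selects the "key facts" (table cells predicted to be mentioned), stage two realizes text from only those facts. The rule-based selector and the template realizer below are stand-ins I introduce to make the interface between the stages concrete; they are not the paper's learned models.

```python
# Two-stage pipeline: (1) key-fact prediction selects which table cells will be
# mentioned, (2) surface realization generates text from those facts only.
table = {
    "name": "Ada Lovelace",
    "birth_date": "10 December 1815",
    "occupation": "mathematician",
    "spouse": "William King",
    "nationality": "English",
}

def predict_key_facts(table, keep=("name", "occupation", "nationality")):
    # Stage 1: needs little supervision in the paper; here a fixed attribute set.
    return {k: v for k, v in table.items() if k in keep}

def realize(facts):
    # Stage 2: trainable on pseudo-parallel data in the paper; a template here.
    return f"{facts['name']} was an {facts['nationality']} {facts['occupation']}."

key_facts = predict_key_facts(table)
print(realize(key_facts))
```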
Uncertainty in Model-Agnostic Meta-Learning using Variational Inference
Title | Uncertainty in Model-Agnostic Meta-Learning using Variational Inference |
Authors | Cuong Nguyen, Thanh-Toan Do, Gustavo Carneiro |
Abstract | We introduce a new, rigorously formulated Bayesian meta-learning algorithm that learns a prior probability distribution over model parameters for few-shot learning. The proposed algorithm employs gradient-based variational inference to infer the posterior over model parameters for a new task. Our algorithm can be applied to any model architecture and can be implemented in various machine learning paradigms, including regression and classification. We show that the models trained with our proposed meta-learning algorithm are well calibrated and accurate, with state-of-the-art calibration and classification results on two few-shot classification benchmarks (Omniglot and Mini-ImageNet), and competitive results in a multi-modal task-distribution regression. |
Tasks | Calibration, Few-Shot Learning, Meta-Learning, Omniglot |
Published | 2019-07-27 |
URL | https://arxiv.org/abs/1907.11864v2 |
https://arxiv.org/pdf/1907.11864v2.pdf | |
PWC | https://paperswithcode.com/paper/uncertainty-in-model-agnostic-meta-learning |
Repo | |
Framework | |
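A compact PyTorch sketch of the general shape of gradient-based variational task adaptation: starting from a (here fixed) meta-learned diagonal-Gaussian prior over the weights of a tiny regressor, a few gradient steps fit a per-task posterior by maximizing a reparameterized ELBO, and predictions average over posterior samples. The toy linear model, step counts, and KL weighting are illustrative assumptions, not the paper's exact algorithm, and the outer meta-training loop is omitted.

```python
# Per-task variational adaptation in Bayesian meta-learning: fit a diagonal
# Gaussian posterior over weights by maximizing a reparameterized ELBO
# (expected log-likelihood minus KL to the meta-learned prior).
import torch

torch.manual_seed(0)
n_weights = 2                                     # toy linear model y = w0 * x + w1

# "Meta-learned" prior (would come from the outer loop in the full algorithm).
prior_mu, prior_logvar = torch.zeros(n_weights), torch.zeros(n_weights)

# Per-task variational posterior, initialized at the prior.
mu = prior_mu.clone().requires_grad_(True)
logvar = prior_logvar.clone().requires_grad_(True)

# Few-shot support set for one task: y = 3x - 1 with noise.
x = torch.linspace(-1, 1, 5)
y = 3 * x - 1 + 0.1 * torch.randn(5)

opt = torch.optim.Adam([mu, logvar], lr=0.1)
for step in range(200):
    eps = torch.randn(16, n_weights)              # 16 posterior samples (reparam.)
    w = mu + eps * torch.exp(0.5 * logvar)
    pred = w[:, :1] * x + w[:, 1:]                # broadcast: (16, 5) predictions
    nll = ((pred - y) ** 2).mean()
    kl = 0.5 * (torch.exp(logvar - prior_logvar)
                + (mu - prior_mu) ** 2 / torch.exp(prior_logvar)
                - 1 + prior_logvar - logvar).sum()
    loss = nll + 0.01 * kl                        # down-weighted KL, as is common
    opt.zero_grad()
    loss.backward()
    opt.step()

print("posterior mean weights:", mu.detach().numpy())   # close to [3, -1]
```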
Deep ReLU Networks Have Surprisingly Few Activation Patterns
Title | Deep ReLU Networks Have Surprisingly Few Activation Patterns |
Authors | Boris Hanin, David Rolnick |
Abstract | The success of deep networks has been attributed in part to their expressivity: per parameter, deep networks can approximate a richer class of functions than shallow networks. In ReLU networks, the number of activation patterns is one measure of expressivity, and the maximum number of patterns grows exponentially with the depth. However, recent work has shown that the practical expressivity of deep networks - the functions they can learn rather than express - is often far from the theoretical maximum. In this paper, we show that the average number of activation patterns for ReLU networks at initialization is bounded by the total number of neurons raised to the input dimension. We show empirically that this bound, which is independent of the depth, is tight both at initialization and during training, even on memorization tasks that should maximize the number of activation patterns. Our work suggests that realizing the full expressivity of deep networks may not be possible in practice, at least with current methods. |
Tasks | |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00904v2 |
https://arxiv.org/pdf/1906.00904v2.pdf | |
PWC | https://paperswithcode.com/paper/190600904 |
Repo | |
Framework | |
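A small experiment in the spirit of the paper's measure: count the distinct ReLU activation patterns a randomly initialized network realizes along a one-dimensional line through input space; the count stays far below the exponential-in-depth worst case. Widths, depths, and the initialization scheme below are illustrative choices.

```python
# Count distinct ReLU activation patterns along a 1-D line through input space
# for a randomly initialized deep network. The worst case grows exponentially
# with depth, but the empirical count at initialization is far smaller.
import numpy as np

rng = np.random.default_rng(0)
d_in, width, depth, n_points = 16, 32, 8, 20000

# He-style random initialization.
weights = [rng.normal(0, np.sqrt(2.0 / (d_in if i == 0 else width)),
                      size=((d_in if i == 0 else width), width)) for i in range(depth)]
biases = [rng.normal(0, 0.1, size=width) for _ in range(depth)]

# Inputs along a random line x(t) = a + t * b.
a, b = rng.normal(size=d_in), rng.normal(size=d_in)
x = a + np.linspace(-5, 5, n_points)[:, None] * b

patterns = []
h = x
for W, c in zip(weights, biases):
    pre = h @ W + c
    patterns.append(pre > 0)                           # binary pattern at this layer
    h = np.maximum(pre, 0.0)

flat = np.concatenate(patterns, axis=1)                # (n_points, depth * width) bits
n_patterns = len({bytes(row) for row in flat.astype(np.uint8)})
print(f"{n_patterns} distinct patterns over {n_points} points "
      f"(worst case grows exponentially with depth)")
```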
Hyperspectral Super-Resolution via Global-Local Low-Rank Matrix Estimation
Title | Hyperspectral Super-Resolution via Global-Local Low-Rank Matrix Estimation |
Authors | Ruiyuan Wu, Wing-Kin Ma, Xiao Fu, Qiang Li |
Abstract | Hyperspectral super-resolution (HSR) is a problem that aims to estimate an image of high spectral and spatial resolutions from a pair of co-registered multispectral (MS) and hyperspectral (HS) images, which have coarser spectral and spatial resolutions, respectively. In this paper we pursue a low-rank matrix estimation approach for HSR. We assume that the spectral-spatial matrices associated with the whole image and the local areas of the image have low-rank structures. The local low-rank assumption, in particular, aims to provide a more flexible model for accounting for local variation effects due to endmember variability. We formulate the HSR problem as a global-local rank-regularized least-squares problem. By leveraging recent advances in non-convex large-scale optimization, namely the smooth Schatten-p approximation and the accelerated majorization-minimization method, we develop an efficient algorithm for the global-local low-rank problem. Numerical experiments on synthetic, semi-real and real data show that the proposed algorithm outperforms a number of benchmark algorithms in terms of recovery performance. |
Tasks | Super-Resolution |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01149v2 |
https://arxiv.org/pdf/1907.01149v2.pdf | |
PWC | https://paperswithcode.com/paper/hyperspectral-super-resolution-via-global |
Repo | |
Framework | |
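The smooth Schatten-p surrogate mentioned above replaces the rank penalty with a differentiable function of the singular values. A numpy sketch of a global-local rank-regularized objective using it is shown below; the data-fit term, the tiling of "local" areas, and the weights are placeholders I use for illustration, not the paper's formulation.

```python
# Smooth Schatten-p rank surrogate, phi_p(X) = sum_i (sigma_i(X)^2 + eps)^(p/2),
# used inside a toy global-local rank-regularized least-squares objective.
import numpy as np

rng = np.random.default_rng(0)


def smooth_schatten_p(X, p=0.5, eps=1e-8):
    # Smooth surrogate for rank(X): small p pushes singular values toward zero.
    s = np.linalg.svd(X, compute_uv=False)
    return np.sum((s ** 2 + eps) ** (p / 2))


def objective(X, Y, lam_global=1.0, lam_local=0.5, n_tiles=4):
    fit = np.linalg.norm(X - Y) ** 2                   # placeholder data-fit term
    global_term = lam_global * smooth_schatten_p(X)
    local_term = lam_local * sum(
        smooth_schatten_p(T) for T in np.split(X, n_tiles, axis=1))
    return fit + global_term + local_term


# Toy "spectral-spatial" matrix: a low-rank signal observed with noise.
X_true = rng.normal(size=(60, 5)) @ rng.normal(size=(5, 400))
Y_obs = X_true + 0.05 * rng.normal(size=X_true.shape)
print("toy objective at the true solution:", round(objective(X_true, Y_obs), 2))
```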
LiDAR ICPS-net: Indoor Camera Positioning based-on Generative Adversarial Network for RGB to Point-Cloud Translation
Title | LiDAR ICPS-net: Indoor Camera Positioning based-on Generative Adversarial Network for RGB to Point-Cloud Translation |
Authors | Ali Ghofrani, Rahil Mahdian Toroghi, Seyed Mojtaba Tabatabaie, Seyed Maziar Tabasi |
Abstract | Indoor positioning aims at navigation inside areas with no GPS-data availability and could be employed in many applications such as augmented reality and autonomous driving, especially inside closed areas and tunnels. In this paper, a deep neural network-based architecture is proposed to address this problem. In this regard, a tandem set of convolutional neural networks, as well as a Pix2Pix GAN network, have been leveraged to serve as the scene classifier, the scene RGB-image-to-point-cloud converter, and the position regressor, respectively. The proposed architecture outperforms previous works, including our recent work, in the sense that it makes the data generation task easier and more robust against small scene variations, while the positioning accuracy is remarkably good for both the Cartesian position and the quaternion information of the camera. |
Tasks | Autonomous Driving |
Published | 2019-11-14 |
URL | https://arxiv.org/abs/1911.05871v1 |
https://arxiv.org/pdf/1911.05871v1.pdf | |
PWC | https://paperswithcode.com/paper/lidar-icps-net-indoor-camera-positioning |
Repo | |
Framework | |
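A PyTorch skeleton of the three-stage pipeline described above: a CNN scene classifier picks a sub-area, a Pix2Pix-style generator translates the RGB view into a point-cloud-like representation, and a regressor outputs Cartesian position plus an orientation quaternion. All module bodies are minimal stand-ins that only show how the stages connect; they are not the paper's networks.

```python
# Skeleton of the indoor-positioning pipeline: scene classifier ->
# RGB-to-point-cloud generator (Pix2Pix-style in the paper) -> pose regressor
# producing (x, y, z) and a unit quaternion.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SceneClassifier(nn.Module):
    def __init__(self, n_scenes=8):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3, stride=2, padding=1)
        self.head = nn.Linear(16, n_scenes)

    def forward(self, rgb):
        h = F.adaptive_avg_pool2d(F.relu(self.conv(rgb)), 1).flatten(1)
        return self.head(h)                        # scene / sub-area logits


class RGBToPointCloud(nn.Module):                   # Pix2Pix-style generator slot
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 3, 3, padding=1)    # stand-in for an encoder-decoder

    def forward(self, rgb):
        return self.net(rgb)                        # point-cloud-like encoding


class PoseRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3, stride=2, padding=1)
        self.head = nn.Linear(16, 7)                # (x, y, z) + quaternion (w, x, y, z)

    def forward(self, cloud):
        h = F.adaptive_avg_pool2d(F.relu(self.conv(cloud)), 1).flatten(1)
        out = self.head(h)
        xyz, quat = out[:, :3], F.normalize(out[:, 3:], dim=1)  # unit quaternion
        return xyz, quat


rgb = torch.randn(1, 3, 128, 128)
scene = SceneClassifier()(rgb).argmax(dim=1)        # stage 1: pick the sub-area
cloud = RGBToPointCloud()(rgb)                      # stage 2: RGB -> point-cloud rep.
xyz, quat = PoseRegressor()(cloud)                  # stage 3: position + orientation
print(scene.item(), xyz.shape, quat.shape)
```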
Feedback Learning for Improving the Robustness of Neural Networks
Title | Feedback Learning for Improving the Robustness of Neural Networks |
Authors | Chang Song, Zuoguan Wang, Hai Li |
Abstract | Recent research studies revealed that neural networks are vulnerable to adversarial attacks. State-of-the-art defensive techniques add various adversarial examples in training to improve models' adversarial robustness. However, these methods are not universal and cannot defend against unknown or non-adversarial evasion attacks. In this paper, we analyze model robustness in the decision space. A feedback learning method is then proposed to understand how well a model learns and to facilitate the retraining process of remedying the defects. The evaluations according to a set of distance-based criteria show that our method can significantly improve models' accuracy and robustness against different types of evasion attacks. Moreover, we observe the existence of inter-class inequality and propose to compensate for it by changing the proportions of examples generated in different classes. |
Tasks | |
Published | 2019-09-12 |
URL | https://arxiv.org/abs/1909.05443v1 |
https://arxiv.org/pdf/1909.05443v1.pdf | |
PWC | https://paperswithcode.com/paper/feedback-learning-for-improving-the |
Repo | |
Framework | |
Inconsistency Proofs for ASP: The ASP-DRUPE Format
Title | Inconsistency Proofs for ASP: The ASP-DRUPE Format |
Authors | Mario Alviano, Carmine Dodaro, Johannes K. Fichte, Markus Hecher, Tobias Philipp, Jakob Rath |
Abstract | Answer Set Programming (ASP) solvers are highly tuned and complex procedures that implicitly solve the consistency problem, i.e., deciding whether a logic program admits an answer set. Whether a claimed answer set is indeed a correct answer set of the program can be verified in polynomial time for (normal) programs. However, it is far from immediate to verify that a program claimed to be inconsistent indeed does not admit any answer sets. In this paper, we address this problem and develop the new proof format ASP-DRUPE for propositional, disjunctive logic programs, including weight and choice rules. ASP-DRUPE is based on the Reverse Unit Propagation (RUP) format designed for Boolean satisfiability. We establish correctness of ASP-DRUPE and discuss how to integrate it into modern ASP solvers. Later, we provide an implementation of ASP-DRUPE in the wasp solver for normal logic programs. This work is under consideration for acceptance in TPLP. |
Tasks | |
Published | 2019-07-24 |
URL | https://arxiv.org/abs/1907.10389v1 |
https://arxiv.org/pdf/1907.10389v1.pdf | |
PWC | https://paperswithcode.com/paper/inconsistency-proofs-for-asp-the-asp-drupe |
Repo | |
Framework | |
Target Conditioned Sampling: Optimizing Data Selection for Multilingual Neural Machine Translation
Title | Target Conditioned Sampling: Optimizing Data Selection for Multilingual Neural Machine Translation |
Authors | Xinyi Wang, Graham Neubig |
Abstract | To improve low-resource Neural Machine Translation (NMT) with multilingual corpora, training on the most related high-resource language only is often more effective than using all data available (Neubig and Hu, 2018). However, it is possible that an intelligent data selection strategy can further improve low-resource NMT with data from other auxiliary languages. In this paper, we seek to construct a sampling distribution over all multilingual data, so that it minimizes the training loss of the low-resource language. Based on this formulation, we propose an efficient algorithm, Target Conditioned Sampling (TCS), which first samples a target sentence, and then conditionally samples its source sentence. Experiments show that TCS brings significant gains of up to 2 BLEU on three of four languages we test, with minimal training overhead. |
Tasks | Low-Resource Neural Machine Translation, Machine Translation |
Published | 2019-05-20 |
URL | https://arxiv.org/abs/1905.08212v1 |
https://arxiv.org/pdf/1905.08212v1.pdf | |
PWC | https://paperswithcode.com/paper/target-conditioned-sampling-optimizing-data |
Repo | |
Framework | |
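A toy numpy-free sketch of the two-step sampling that gives the method its name: first draw a target sentence from a distribution over the combined multilingual data, then draw one of its source-side translations conditionally. The uniform distributions, the auxiliary languages, and the example sentences below are placeholders for the learned distributions and low-resource language sets the paper actually optimizes.

```python
# Target Conditioned Sampling, toy version: draw a target sentence y from a
# distribution over all multilingual data, then draw a source sentence x
# conditioned on y from the auxiliary languages that translate it.
import random

random.seed(0)
# target sentence -> available source-side translations (language, sentence)
multilingual = {
    "the cat sleeps": [("fra", "le chat dort"), ("deu", "die katze schläft")],
    "the water is cold": [("fra", "l'eau est froide")],
    "i see the house": [("deu", "ich sehe das haus"), ("fra", "je vois la maison")],
}

def sample_pair():
    y = random.choice(list(multilingual))            # P(y): uniform placeholder
    lang, x = random.choice(multilingual[y])         # P(x | y): uniform placeholder
    return x, y, lang

for _ in range(3):
    x, y, lang = sample_pair()
    print(f"[{lang}] {x}  ->  {y}")
```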