January 29, 2020


Paper Group ANR 670


Weak Supervision helps Emergence of Word-Object Alignment and improves Vision-Language Tasks


Title	Weak Supervision helps Emergence of Word-Object Alignment and improves Vision-Language Tasks
Authors	Corentin Kervadec, Grigory Antipov, Moez Baccouche, Christian Wolf
Abstract	The large adoption of the self-attention (i.e. transformer model) and BERT-like training principles has recently resulted in a number of high performing models on a large panoply of vision-and-language problems (such as Visual Question Answering (VQA), image retrieval, etc.). In this paper we claim that these State-Of-The-Art (SOTA) approaches perform reasonably well in structuring information inside a single modality but, despite their impressive performance, they tend to struggle to identify fine-grained inter-modality relationships. Indeed, such relations are frequently assumed to be implicitly learned during training from application-specific losses, mostly cross-entropy for classification. While most recent works provide inductive bias for inter-modality relationships via cross attention modules, in this work, we demonstrate (1) that the latter assumption does not hold, i.e. modality alignment does not necessarily emerge automatically, and (2) that adding weak supervision for alignment between visual objects and words improves the quality of the learned models on tasks requiring reasoning. In particular, we integrate an object-word alignment loss into SOTA vision-language reasoning models and evaluate it on two tasks: VQA and Language-driven Comparison of Images. We show that the proposed fine-grained inter-modality supervision significantly improves performance on both tasks. In particular, this new learning signal allows obtaining SOTA-level performance on the GQA dataset (VQA task) with pre-trained models without finetuning on the task, and a new SOTA on the NLVR2 dataset (Language-driven Comparison of Images). Finally, we also illustrate the impact of the contribution on the model's reasoning by visualizing attention distributions.
Tasks	Image Retrieval, Question Answering, Visual Question Answering, Word Alignment
Published	2019-12-06
URL	https://arxiv.org/abs/1912.03063v1
PDF	https://arxiv.org/pdf/1912.03063v1.pdf
PWC	https://paperswithcode.com/paper/weak-supervision-helps-emergence-of-word
Repo
Framework

A comparison of apartment rent price prediction using a large dataset: Kriging versus DNN


Title	A comparison of apartment rent price prediction using a large dataset: Kriging versus DNN
Authors	Hajime Seya, Daiki Shiroi
Abstract	The hedonic approach based on a regression model has been widely adopted for the prediction of real estate property price and rent. In particular, a spatial regression technique called Kriging, a method of interpolation that was advanced in the field of spatial statistics, is known to enable high accuracy prediction in light of the spatial dependence of real estate property data. Meanwhile, there has been a rapid increase in machine learning-based prediction using a large (big) dataset and its effectiveness has been demonstrated in previous studies. However, no studies have ever shown the extent to which predictive accuracy differs for Kriging and machine learning techniques using big data. Thus, this study compares the predictive accuracy of apartment rent price in Japan between the nearest neighbor Gaussian processes (NNGP) model, which enables application of Kriging to big data, and the deep neural network (DNN), a representative machine learning technique, with a particular focus on the data sample size (n = 10^4, 10^5, 10^6) and differences in predictive performance. Our analysis showed that, with an increase in sample size, the out-of-sample predictive accuracy of DNN approached that of NNGP and they were nearly equal on the order of n = 10^6. Furthermore, it is suggested that, for both higher and lower end properties whose rent price deviates from the median, DNN may have a higher predictive accuracy than that of NNGP.
Tasks	Gaussian Processes
Published	2019-06-25
URL	https://arxiv.org/abs/1906.11099v1
PDF	https://arxiv.org/pdf/1906.11099v1.pdf
PWC	https://paperswithcode.com/paper/a-comparison-of-apartment-rent-price
Repo
Framework

Meta Learning with Relational Information for Short Sequences


Title	Meta Learning with Relational Information for Short Sequences
Authors	Yujia Xie, Haoming Jiang, Feng Liu, Tuo Zhao, Hongyuan Zha
Abstract	This paper proposes a new meta-learning method – named HARMLESS (HAwkes Relational Meta LEarning method for Short Sequences) for learning heterogeneous point process models from short event sequence data along with a relational network. Specifically, we propose a hierarchical Bayesian mixture Hawkes process model, which naturally incorporates the relational information among sequences into point process modeling. Compared with existing methods, our model can capture the underlying mixed-community patterns of the relational network, which simultaneously encourages knowledge sharing among sequences and facilitates adaptive learning for each individual sequence. We further propose an efficient stochastic variational meta expectation maximization algorithm that can scale to large problems. Numerical experiments on both synthetic and real data show that HARMLESS outperforms existing methods in terms of predicting the future events.
Tasks	Meta-Learning
Published	2019-09-04
URL	https://arxiv.org/abs/1909.02105v1
PDF	https://arxiv.org/pdf/1909.02105v1.pdf
PWC	https://paperswithcode.com/paper/meta-learning-with-relational-information-for
Repo
Framework
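
The abstract builds on Hawkes point processes, in which past events temporarily raise the rate of future events. As a rough illustration only, the sketch below evaluates the conditional intensity and log-likelihood of a univariate Hawkes process with an exponential kernel. The exponential kernel and the names `mu`, `alpha`, and `beta` are assumptions made for this example; the abstract does not state the kernel, and the hierarchical Bayesian mixture and relational network of HARMLESS are not reproduced here.

```python
import math


def hawkes_intensity(t, history, mu, alpha, beta):
    """Intensity at time t: mu + sum over past events of alpha * exp(-beta * (t - t_i)).

    The exponential kernel is an assumption for illustration.
    """
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in history if ti < t)


def hawkes_log_likelihood(events, horizon, mu, alpha, beta):
    """Log-likelihood of event times observed on [0, horizon]."""
    # Sum of log-intensities at the observed event times.
    log_term = sum(
        math.log(hawkes_intensity(t, events, mu, alpha, beta)) for t in events
    )
    # Integral of the intensity over [0, horizon], in closed form for this kernel.
    compensator = mu * horizon + sum(
        (alpha / beta) * (1.0 - math.exp(-beta * (horizon - ti))) for ti in events
    )
    return log_term - compensator
```

With `alpha = 0` the model reduces to a homogeneous Poisson process, which gives a quick sanity check on the likelihood.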

The role of invariance in spectral complexity-based generalization bounds


Title	The role of invariance in spectral complexity-based generalization bounds
Authors	Konstantinos Pitas, Andreas Loukas, Mike Davies, Pierre Vandergheynst
Abstract	Deep convolutional neural networks (CNNs) have been shown to be able to fit a random labeling over data while still being able to generalize well for normal labels. Describing CNN capacity through a posteriori measures of complexity has been recently proposed to tackle this apparent paradox. These complexity measures are usually validated by showing that they correlate empirically with generalization error (GE), being empirically larger for networks trained on random vs normal labels. Focusing on the case of spectral complexity we investigate theoretically and empirically the insensitivity of the complexity measure to invariances relevant to CNNs, and show several limitations of spectral complexity that occur as a result. For a specific formulation of spectral complexity we show that it results in the same upper bound complexity estimates for convolutional and locally connected architectures (which don’t have the same favorable invariance properties). This is contrary to common intuition and empirical results.
Tasks
Published	2019-05-23
URL	https://arxiv.org/abs/1905.09677v2
PDF	https://arxiv.org/pdf/1905.09677v2.pdf
PWC	https://paperswithcode.com/paper/some-limitations-of-norm-based-generalization
Repo
Framework

W-Net: Two-stage U-Net with misaligned data for raw-to-RGB mapping


Title	W-Net: Two-stage U-Net with misaligned data for raw-to-RGB mapping
Authors	Kwang-Hyun Uhm, Seung-Wook Kim, Seo-Won Ji, Sung-Jin Cho, Jun-Pyo Hong, Sung-Jea Ko
Abstract	Recent research on learning a mapping between raw Bayer images and RGB images has progressed with the development of deep convolutional neural networks. A challenging data set, namely the Zurich Raw-to-RGB data set (ZRR), has been released in the AIM 2019 raw-to-RGB mapping challenge. In ZRR, input raw and target RGB images are captured by two different cameras and thus not perfectly aligned. Moreover, camera metadata such as white balance gains and color correction matrix are not provided, which makes the challenge more difficult. In this paper, we explore an effective network structure and a loss function to address these issues. We exploit a two-stage U-Net architecture and also introduce a loss function that is less variant to alignment and more sensitive to color differences. In addition, we show that an ensemble of networks trained with different loss functions can bring a significant performance gain. We demonstrate the superiority of our method by achieving the highest score in terms of both the peak signal-to-noise ratio and the structural similarity and obtaining the second-best mean-opinion-score in the challenge.
Tasks
Published	2019-11-20
URL	https://arxiv.org/abs/1911.08656v3
PDF	https://arxiv.org/pdf/1911.08656v3.pdf
PWC	https://paperswithcode.com/paper/w-net-two-stage-u-net-with-misaligned-data
Repo
Framework

Low-latency Visual SLAM with Appearance-Enhanced Local Map Building


Title	Low-latency Visual SLAM with Appearance-Enhanced Local Map Building
Authors	Yipu Zhao, Wenkai Ye, Patricio A. Vela
Abstract	A local map module is often implemented in modern VO/VSLAM systems to improve data association and pose estimation. Conventionally, the local map contents are determined by co-visibility. While co-visibility is cheap to establish, it utilizes the relatively weak temporal prior (i.e. seen before, likely to be seen now), therefore admitting more features into the local map than necessary. This paper describes an enhancement to co-visibility local map building by incorporating a strong appearance prior, which leads to a more compact local map and latency reduction in downstream data association. The appearance prior collected from the current image influences the local map contents: only the map features visually similar to the current measurements are potentially useful for data association. To that end, mapped features are indexed and queried with Multi-index Hashing (MIH). An online hash table selection algorithm is developed to further reduce the query overhead of MIH and the local map size. The proposed appearance-based local map building method is integrated into a state-of-the-art VO/VSLAM system. When evaluated on two public benchmarks, the size of the local map, as well as the latency of real-time pose tracking in VO/VSLAM are significantly reduced. Meanwhile, the VO/VSLAM mean performance is preserved or improves.
Tasks	Pose Estimation, Pose Tracking
Published	2019-05-19
URL	https://arxiv.org/abs/1905.07797v1
PDF	https://arxiv.org/pdf/1905.07797v1.pdf
PWC	https://paperswithcode.com/paper/low-latency-visual-slam-with-appearance
Repo
Framework
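
The abstract indexes mapped features with Multi-index Hashing (MIH). The sketch below shows the basic idea under simplifying assumptions: a binary descriptor is split into equal substrings, each substring is indexed in its own hash table, and a query collects every feature that matches in at least one table before verifying the full Hamming distance. The class name, the integer encoding of descriptors, and the restriction to search radii smaller than the number of tables are all choices made for this example. The paper's online hash table selection is not implemented.

```python
from collections import defaultdict


class MultiIndexHash:
    """Toy multi-index hash over fixed-width binary codes stored as ints."""

    def __init__(self, n_bits=32, n_tables=4):
        if n_bits % n_tables != 0:
            raise ValueError("n_bits must be divisible by n_tables")
        self.n_tables = n_tables
        self.chunk = n_bits // n_tables          # bits per substring
        self.mask = (1 << self.chunk) - 1
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.codes = {}                          # feature id -> full code

    def _substrings(self, code):
        # Split the code into n_tables disjoint substrings.
        return [(code >> (i * self.chunk)) & self.mask for i in range(self.n_tables)]

    def add(self, feature_id, code):
        self.codes[feature_id] = code
        for table, key in zip(self.tables, self._substrings(code)):
            table[key].append(feature_id)

    def query(self, code, radius):
        """Return ids whose Hamming distance to `code` is at most `radius`.

        Pigeonhole argument: if fewer bits differ than there are substrings,
        at least one substring matches exactly, so exact lookups suffice.
        """
        if radius >= self.n_tables:
            raise ValueError("this sketch only supports radius < n_tables")
        candidates = set()
        for table, key in zip(self.tables, self._substrings(code)):
            candidates.update(table.get(key, ()))
        # Verify candidates against the full code.
        return sorted(
            fid for fid in candidates
            if bin(self.codes[fid] ^ code).count("1") <= radius
        )
```

A full MIH implementation also probes neighbouring buckets to support larger radii; the exact-match version is kept here for brevity.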

Embodied Neuromorphic Vision with Event-Driven Random Backpropagation


Title	Embodied Neuromorphic Vision with Event-Driven Random Backpropagation
Authors	Jacques Kaiser, Alexander Friedrich, J. Camilo Vasquez Tieck, Daniel Reichard, Arne Roennau, Emre Neftci, Rüdiger Dillmann
Abstract	Spike-based communication between biological neurons is sparse and unreliable. This enables the brain to process visual information from the eyes efficiently. Taking inspiration from biology, artificial spiking neural networks coupled with silicon retinas attempt to model these computations. Recent findings in machine learning allowed the derivation of a family of powerful synaptic plasticity rules approximating backpropagation for spiking networks. Are these rules capable of processing real-world visual sensory data? In this paper, we evaluate the performance of Event-Driven Random Back-Propagation (eRBP) at learning representations from event streams provided by a Dynamic Vision Sensor (DVS). First, we show that eRBP matches state-of-the-art performance on the DvsGesture dataset with the addition of a simple covert attention mechanism. By remapping visual receptive fields relative to the center of the motion, this attention mechanism provides translation invariance at low computational cost compared to convolutions. Second, we successfully integrate eRBP in a real robotic setup, where a robotic arm grasps objects according to detected visual affordances. In this setup, visual information is actively sensed by a DVS mounted on a robotic head performing microsaccadic eye movements. We show that our method classifies affordances within 100ms after microsaccade onset, which is comparable to human performance reported in behavioral studies. Our results suggest that advances in neuromorphic technology and plasticity rules enable the development of autonomous robots operating at high speed and low energy consumption.
Tasks
Published	2019-04-09
URL	https://arxiv.org/abs/1904.04805v2
PDF	https://arxiv.org/pdf/1904.04805v2.pdf
PWC	https://paperswithcode.com/paper/embodied-event-driven-random-backpropagation
Repo
Framework

Reducing Artificial Neural Network Complexity: A Case Study on Exoplanet Detection


Title	Reducing Artificial Neural Network Complexity: A Case Study on Exoplanet Detection
Authors	Sebastiaan Koning, Caspar Greeven, Eric Postma
Abstract	Despite their successes in the field of self-learning AI, Convolutional Neural Networks (CNNs) suffer from having too many trainable parameters, impacting computational performance. Several approaches have been proposed to reduce the number of parameters in the visual domain, the Inception architecture [Szegedy et al., 2016] being a prominent example. This raises the question of whether the number of trainable parameters in CNNs can also be reduced for 1D inputs, such as time-series data, without incurring a substantial loss in classification performance. We propose and examine two methods for complexity reduction in AstroNet [Shallue & Vanderburg, 2018], a CNN for automatic classification of time-varying brightness data of stars to detect exoplanets. The first method makes only a tactical reduction of layers in AstroNet while the second method also modifies the original input data by means of a Gaussian pyramid. We conducted our experiments with various degrees of dropout regularization. Our results show only a non-substantial loss in accuracy compared to the original AstroNet, while reducing training time by up to 85 percent. These results show potential for similar reductions in other CNN applications while largely retaining accuracy.
Tasks	Time Series
Published	2019-02-27
URL	http://arxiv.org/abs/1902.10385v1
PDF	http://arxiv.org/pdf/1902.10385v1.pdf
PWC	https://paperswithcode.com/paper/reducing-artificial-neural-network-complexity
Repo
Framework
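
The second method in the abstract preprocesses the input with a Gaussian pyramid. As a hedged illustration of what that means for 1D time-series data, the sketch below repeatedly smooths a signal and keeps every second sample. The 5-tap binomial kernel and the edge clamping are assumptions chosen for this example, not details taken from the paper.

```python
def gaussian_pyramid_1d(signal, levels):
    """Return [signal, level 1, level 2, ...], each level half the previous length.

    Uses a 5-tap binomial approximation of a Gaussian (an assumption here).
    """
    kernel = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)
    pyramid = [list(signal)]
    for _ in range(levels):
        current = pyramid[-1]
        n = len(current)
        if n < 2:
            break  # nothing left to downsample
        smoothed = []
        for i in range(n):
            acc = 0.0
            for k, weight in enumerate(kernel):
                # Clamp indices at the borders so the output keeps the same length.
                j = min(max(i + k - 2, 0), n - 1)
                acc += weight * current[j]
            smoothed.append(acc)
        pyramid.append(smoothed[::2])  # keep every second sample
    return pyramid
```

Each level halves the input length, which is one way a shorter input can reduce the number of parameters and the training cost of a downstream network.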

Smaller Models, Better Generalization


Title	Smaller Models, Better Generalization
Authors	Mayank Sharma, Suraj Tripathi, Abhimanyu Dubey, Jayadeva, Sai Guruju, Nihal Goalla
Abstract	Reducing network complexity has been a major research focus in recent years with the advent of mobile technology. Convolutional Neural Networks that perform various vision tasks without memory overhead are the need of the hour. This paper focuses on qualitative and quantitative analysis of reducing the network complexity using an upper bound on the Vapnik-Chervonenkis dimension, pruning, and quantization. We observe a general trend in improvement of accuracies as we quantize the models. We propose a novel loss function that helps in achieving considerable sparsity at comparable accuracies to that of dense models. We compare various regularizations prevalent in the literature and show the superiority of our method in achieving sparser models that generalize well.
Tasks	Quantization
Published	2019-08-29
URL	https://arxiv.org/abs/1908.11250v1
PDF	https://arxiv.org/pdf/1908.11250v1.pdf
PWC	https://paperswithcode.com/paper/smaller-models-better-generalization
Repo
Framework

Phase Transitions and Cyclic Phenomena in Bandits with Switching Constraints


Title	Phase Transitions and Cyclic Phenomena in Bandits with Switching Constraints
Authors	David Simchi-Levi, Yunzong Xu
Abstract	We consider the classical stochastic multi-armed bandit problem with a constraint on the total cost incurred by switching between actions. We prove matching upper and lower bounds on regret and provide near-optimal algorithms for this problem. Surprisingly, we discover phase transitions and cyclic phenomena of the optimal regret. That is, we show that associated with the multi-armed bandit problem, there are phases defined by the number of arms and switching costs, where the regret upper and lower bounds in each phase remain the same and drop significantly between phases. The results enable us to fully characterize the trade-off between regret and incurred switching cost in the stochastic multi-armed bandit problem, contributing new insights to this fundamental problem. Under the general switching cost structure, the results reveal a deep connection between bandit problems and graph traversal problems, such as the shortest Hamiltonian path problem.
Tasks
Published	2019-05-26
URL	https://arxiv.org/abs/1905.10825v3
PDF	https://arxiv.org/pdf/1905.10825v3.pdf
PWC	https://paperswithcode.com/paper/phase-transitions-and-cyclic-phenomena-in
Repo
Framework

Combining Q-Learning and Search with Amortized Value Estimates


Title	Combining Q-Learning and Search with Amortized Value Estimates
Authors	Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Tobias Pfaff, Theophane Weber, Lars Buesing, Peter W. Battaglia
Abstract	We introduce “Search with Amortized Value Estimates” (SAVE), an approach for combining model-free Q-learning with model-based Monte-Carlo Tree Search (MCTS). In SAVE, a learned prior over state-action values is used to guide MCTS, which estimates an improved set of state-action values. The new Q-estimates are then used in combination with real experience to update the prior. This effectively amortizes the value computation performed by MCTS, resulting in a cooperative relationship between model-free learning and model-based search. SAVE can be implemented on top of any Q-learning agent with access to a model, which we demonstrate by incorporating it into agents that perform challenging physical reasoning tasks and Atari. SAVE consistently achieves higher rewards with fewer training steps, and—in contrast to typical model-based search approaches—yields strong performance with very small search budgets. By combining real experience with information computed during search, SAVE demonstrates that it is possible to improve on both the performance of model-free learning and the computational cost of planning.
Tasks	Q-Learning
Published	2019-12-05
URL	https://arxiv.org/abs/1912.02807v2
PDF	https://arxiv.org/pdf/1912.02807v2.pdf
PWC	https://paperswithcode.com/paper/combining-q-learning-and-search-with-1
Repo
Framework
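
The abstract describes updating the Q-value prior from both real experience and the Q-values estimated by search. One plausible way to express this, offered only as a sketch, is a weighted sum of a standard one-step Q-learning loss and a cross-entropy term that pulls the prior's action distribution toward the search result. The function names, the softmax form of the second term, and the weights `beta_q` and `beta_a` are assumptions for illustration and are not taken from the abstract.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    top = max(values)
    exps = [math.exp((v - top) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def amortization_loss(q_search, q_prior, temperature=1.0):
    """Cross-entropy between softmax(search Q-values) and softmax(prior Q-values)."""
    target = softmax(q_search, temperature)
    predicted = softmax(q_prior, temperature)
    return -sum(p * math.log(q) for p, q in zip(target, predicted))


def td_loss(q_prior, action, reward, next_q_prior, gamma):
    """Squared one-step Q-learning error from a real transition."""
    target = reward + gamma * max(next_q_prior)
    return 0.5 * (target - q_prior[action]) ** 2


def combined_loss(q_prior, q_search, action, reward, next_q_prior,
                  gamma=0.99, beta_q=1.0, beta_a=1.0):
    """Hypothetical weighted sum of the experience term and the search term."""
    return (beta_q * td_loss(q_prior, action, reward, next_q_prior, gamma)
            + beta_a * amortization_loss(q_search, q_prior))
```

The cross-entropy term is smallest when the prior induces the same action distribution as the search, which is the sense in which the search computation is amortized into the prior.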

BookQA: Stories of Challenges and Opportunities


Title	BookQA: Stories of Challenges and Opportunities
Authors	Stefanos Angelidis, Lea Frermann, Diego Marcheggiani, Roi Blanco, Lluís Màrquez
Abstract	We present a system for answering questions based on the full text of books (BookQA), which first selects book passages given a question at hand, and then uses a memory network to reason and predict an answer. To improve generalization, we pretrain our memory network using artificial questions generated from book sentences. We experiment with the recently published NarrativeQA corpus, on the subset of Who questions, which expect book characters as answers. We experimentally show that BERT-based retrieval and pretraining improve over baseline results significantly. At the same time, we confirm that NarrativeQA is a highly challenging data set, and that there is need for novel research in order to achieve high-precision BookQA results. We analyze some of the bottlenecks of the current approach, and we argue that more research is needed on text representation, retrieval of relevant passages, and reasoning, including commonsense knowledge.
Tasks
Published	2019-10-02
URL	https://arxiv.org/abs/1910.00856v1
PDF	https://arxiv.org/pdf/1910.00856v1.pdf
PWC	https://paperswithcode.com/paper/bookqa-stories-of-challenges-and
Repo
Framework

Value function estimation in Markov reward processes: Instance-dependent $\ell_\infty$-bounds for policy evaluation


Title	Value function estimation in Markov reward processes: Instance-dependent $\ell_\infty$-bounds for policy evaluation
Authors	Ashwin Pananjady, Martin J. Wainwright
Abstract	Markov reward processes (MRPs) are used to model stochastic phenomena arising in operations research, control engineering, robotics, artificial intelligence, as well as communication and transportation networks. In many of these cases, such as in the policy evaluation problem encountered in reinforcement learning, the goal is to estimate the long-term value function of such a process without access to the underlying population transition and reward functions. Working with samples generated under the synchronous model, we study the problem of estimating the value function of an infinite-horizon, discounted MRP in the $\ell_\infty$-norm. We analyze both the standard plug-in approach to this problem and a more robust variant, and establish non-asymptotic bounds that depend on the (unknown) problem instance, as well as data-dependent bounds that can be evaluated based on the observed data. We show that these approaches are minimax-optimal up to constant factors over natural sub-classes of MRPs. Our analysis makes use of a leave-one-out decoupling argument tailored to the policy evaluation problem, one which may be of independent interest.
Tasks
Published	2019-09-19
URL	https://arxiv.org/abs/1909.08749v1
PDF	https://arxiv.org/pdf/1909.08749v1.pdf
PWC	https://paperswithcode.com/paper/value-function-estimation-in-markov-reward
Repo
Framework
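
The standard plug-in approach mentioned in the abstract can be sketched as follows: estimate the transition matrix and reward vector from samples, then solve the Bellman equation for the estimated process. The sketch assumes a small finite state space with states numbered from 0, a list of `(reward, next_state)` draws per state as a stand-in for the synchronous sampling model, and fixed-point iteration in place of a matrix inverse. These are illustrative choices, and the paper's robust variant is not shown.

```python
def plug_in_value_estimate(samples, gamma, tol=1e-12, max_iter=100_000):
    """Plug-in estimate of the value function of a discounted MRP.

    samples[s] is a non-empty list of (reward, next_state) pairs drawn from
    state s. Returns V solving V = r_hat + gamma * P_hat V.
    """
    n = len(samples)
    r_hat = [0.0] * n
    p_hat = [[0.0] * n for _ in range(n)]
    for s, draws in enumerate(samples):
        for reward, nxt in draws:
            r_hat[s] += reward / len(draws)      # empirical mean reward
            p_hat[s][nxt] += 1.0 / len(draws)    # empirical transition frequency

    # Fixed-point iteration; a contraction in the sup norm because gamma < 1.
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = [
            r_hat[s] + gamma * sum(p_hat[s][j] * v[j] for j in range(n))
            for s in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v
```

The stopping rule measures the change between iterates in the sup norm, which matches the $\ell_\infty$ error metric studied in the abstract.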

Learning from Thresholds: Fully Automated Classification of Tumor Infiltrating Lymphocytes for Multiple Cancer Types


Title	Learning from Thresholds: Fully Automated Classification of Tumor Infiltrating Lymphocytes for Multiple Cancer Types
Authors	Shahira Abousamra, Le Hou, Rajarsi Gupta, Chao Chen, Dimitris Samaras, Tahsin Kurc, Rebecca Batiste, Tianhao Zhao, Kenneth Shroyer, Joel Saltz
Abstract	Deep learning classifiers for characterization of whole slide tissue morphology require large volumes of annotated data to learn variations across different tissue and cancer types. As is well known, manual generation of digital pathology training data is time consuming and expensive. In this paper, we propose a semi-automated method for annotating a group of similar instances at once, instead of collecting only per-instance manual annotations. This allows for a much larger training set that reflects visual variability across multiple cancer types, and thus the training of a single network which can be automatically applied to each cancer type without human adjustment. We apply our method to the important task of classifying Tumor Infiltrating Lymphocytes (TILs) in H&E images. Prior approaches were trained for individual cancer types, with smaller training sets and human-in-the-loop threshold adjustment. We utilize these thresholded results as large scale “semi-automatic” annotations. Combined with existing manual annotations, our trained deep networks are able to automatically produce better TIL prediction results in 12 cancer types, compared to the human-in-the-loop approach.
Tasks
Published	2019-07-09
URL	https://arxiv.org/abs/1907.03960v1
PDF	https://arxiv.org/pdf/1907.03960v1.pdf
PWC	https://paperswithcode.com/paper/learning-from-thresholds-fully-automated
Repo
Framework

Adaptive Learned Bloom Filter (Ada-BF): Efficient Utilization of the Classifier


Title	Adaptive Learned Bloom Filter (Ada-BF): Efficient Utilization of the Classifier
Authors	Zhenwei Dai, Anshumali Shrivastava
Abstract	Recent work suggests improving the performance of Bloom filters by incorporating a machine learning model as a binary classifier. However, such a learned Bloom filter does not take full advantage of the predicted probability scores. We propose new algorithms that generalize the learned Bloom filter by using the complete spectrum of the score regions. We prove that our algorithms have a lower False Positive Rate (FPR) and memory usage compared with the existing approaches to learned Bloom filters. We also demonstrate the improved performance of our algorithms on real-world datasets.
Tasks
Published	2019-10-21
URL	https://arxiv.org/abs/1910.09131v1
PDF	https://arxiv.org/pdf/1910.09131v1.pdf
PWC	https://paperswithcode.com/paper/adaptive-learned-bloom-filter-ada-bf
Repo
Framework
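
To make the idea of using the full range of classifier scores concrete, the sketch below splits the score range into regions and applies a different number of hash functions in each region, with fewer hashes where the classifier is more confident that an item is a key. This is only an illustration: the class name, the thresholds, the per-region hash counts, and the SHA-256-based hashing are assumptions for this example, and the paper's procedure for tuning these quantities is not reproduced.

```python
import hashlib


class AdaptiveLearnedBloomFilter:
    """Bloom filter whose number of hash functions depends on a classifier score."""

    def __init__(self, score_fn, thresholds, hashes_per_region, n_bits):
        # thresholds are ascending region boundaries; one hash count per region.
        if len(hashes_per_region) != len(thresholds) + 1:
            raise ValueError("need exactly one hash count per score region")
        self.score_fn = score_fn
        self.thresholds = list(thresholds)
        self.hashes_per_region = list(hashes_per_region)
        self.n_bits = n_bits
        self.bits = [False] * n_bits

    def _region(self, item):
        # Index of the score region: number of thresholds the score reaches.
        score = self.score_fn(item)
        return sum(score >= t for t in self.thresholds)

    def _positions(self, item, k):
        # Derive k bit positions from salted SHA-256 digests.
        for i in range(k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        k = self.hashes_per_region[self._region(item)]
        for pos in self._positions(item, k):
            self.bits[pos] = True

    def __contains__(self, item):
        # The same item always maps to the same region and positions,
        # so inserted keys are never reported as absent.
        k = self.hashes_per_region[self._region(item)]
        return all(self.bits[pos] for pos in self._positions(item, k))
```

A usage example with a hypothetical scoring function standing in for a trained classifier:

```python
score = lambda s: 0.9 if s.startswith("key") else 0.1
bloom = AdaptiveLearnedBloomFilter(score, thresholds=[0.5],
                                   hashes_per_region=[4, 1], n_bits=1024)
for key in ("key-1", "key-2", "key-3"):
    bloom.add(key)
print("key-2" in bloom)  # inserted keys are always found
```

Low-scoring queries must pass more hash checks, which is how the sketch trades classifier confidence against false positives.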