July 29, 2019

3091 words · 15 min read

Paper Group AWR 186

Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection

Title Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection
Authors Emre Çakır, Giambattista Parascandolo, Toni Heittola, Heikki Huttunen, Tuomas Virtanen
Abstract Sound events often occur in unstructured environments where they exhibit wide variations in their frequency content and temporal structure. Convolutional neural networks (CNN) are able to extract higher level features that are invariant to local spectral and temporal variations. Recurrent neural networks (RNNs) are powerful in learning the longer-term temporal context in audio signals. CNNs and RNNs as classifiers have recently shown improved performance over established methods in various sound recognition tasks. We combine these two approaches in a Convolutional Recurrent Neural Network (CRNN) and apply it to a polyphonic sound event detection task. We compare the performance of the proposed CRNN method with CNN, RNN, and other established methods, and observe a considerable improvement on four different datasets consisting of everyday sound events.
Tasks Sound Event Detection
Published 2017-02-21
URL http://arxiv.org/abs/1702.06286v1
PDF http://arxiv.org/pdf/1702.06286v1.pdf
PWC https://paperswithcode.com/paper/convolutional-recurrent-neural-networks-for-6
Repo https://github.com/cchinchristopherj/Right-Whale-Unsupervised-Model
Framework tf
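
Below is a minimal sketch of the CRNN idea in the abstract, not the authors' released code: convolutional layers extract features that are invariant to local spectral shifts, a GRU models longer-term temporal context, and a frame-wise sigmoid layer outputs multi-label (polyphonic) event activities. All layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_mels=40, n_events=6, conv_ch=64, rnn_hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, conv_ch, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 5)),                        # pool over frequency only
            nn.Conv2d(conv_ch, conv_ch, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 4)),
        )
        self.rnn = nn.GRU(conv_ch * (n_mels // 5 // 4), rnn_hidden, batch_first=True)
        self.head = nn.Linear(rnn_hidden, n_events)

    def forward(self, x):                                # x: (batch, frames, mel bands)
        h = self.cnn(x.unsqueeze(1))                     # (batch, channels, frames, mel')
        h = h.permute(0, 2, 1, 3).flatten(2)             # (batch, frames, channels * mel')
        h, _ = self.rnn(h)
        return torch.sigmoid(self.head(h))               # frame-wise event probabilities

probs = CRNN()(torch.randn(2, 500, 40))                  # -> (2, 500, 6)
```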

CloudScan - A configuration-free invoice analysis system using recurrent neural networks

Title CloudScan - A configuration-free invoice analysis system using recurrent neural networks
Authors Rasmus Berg Palm, Ole Winther, Florian Laws
Abstract We present CloudScan, an invoice analysis system that requires zero configuration or upfront annotation. In contrast to previous work, CloudScan does not rely on templates of invoice layout; instead, it learns a single global model of invoices that naturally generalizes to unseen invoice layouts. The model is trained using data automatically extracted from end-user provided feedback. This automatic training data extraction removes the requirement for users to annotate the data precisely. We describe a recurrent neural network model that can capture long-range context and compare it to a baseline logistic regression model corresponding to the current CloudScan production system. We train and evaluate the system on 8 important fields using a dataset of 326,471 invoices. The recurrent neural network and baseline model achieve 0.891 and 0.887 average F1 scores respectively on seen invoice layouts. For the harder task of unseen invoice layouts, the recurrent neural network model outperforms the baseline with 0.840 average F1 compared to 0.788.
Tasks
Published 2017-08-24
URL http://arxiv.org/abs/1708.07403v1
PDF http://arxiv.org/pdf/1708.07403v1.pdf
PWC https://paperswithcode.com/paper/cloudscan-a-configuration-free-invoice
Repo https://github.com/naiveHobo/InvoiceNet
Framework none
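
As a rough sketch of the idea, the snippet below tags every token on an invoice with one of the target fields or "other" using a bidirectional LSTM, so context from anywhere in the document can influence each decision. The vocabulary size, feature choice (plain word ids), and class count are assumptions, not the production system's configuration.

```python
import torch
import torch.nn as nn

class InvoiceTagger(nn.Module):
    def __init__(self, vocab_size=30000, n_classes=9, emb=128, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)
        self.rnn = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_classes)       # 8 fields + "other" (assumed)

    def forward(self, tokens):                            # tokens: (batch, doc_len) word ids
        h, _ = self.rnn(self.emb(tokens))
        return self.out(h)                                # per-token field logits

logits = InvoiceTagger()(torch.randint(0, 30000, (1, 200)))   # -> (1, 200, 9)
```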

Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training

Title Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training
Authors Rakshith Shetty, Marcus Rohrbach, Lisa Anne Hendricks, Mario Fritz, Bernt Schiele
Abstract While strong progress has been made in image captioning in recent years, machine and human captions are still quite distinct. A closer look reveals that this is due to the deficiencies in the generated word distribution, vocabulary size, and strong bias in the generators towards frequent captions. Furthermore, humans – rightfully so – generate multiple, diverse captions, due to the inherent ambiguity in the captioning task, which is not considered in today’s systems. To address these challenges, we change the training objective of the caption generator from reproducing groundtruth captions to generating a set of captions that is indistinguishable from human generated captions. Instead of handcrafting such a learning target, we employ adversarial training in combination with an approximate Gumbel sampler to implicitly match the generated distribution to the human one. While our method achieves comparable performance to the state-of-the-art in terms of the correctness of the captions, we generate a set of diverse captions that are significantly less biased and match the word statistics better in several aspects.
Tasks Image Captioning
Published 2017-03-30
URL http://arxiv.org/abs/1703.10476v2
PDF http://arxiv.org/pdf/1703.10476v2.pdf
PWC https://paperswithcode.com/paper/speaking-the-same-language-matching-machine
Repo https://github.com/rakshithShetty/captionGAN
Framework none
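
The approximate Gumbel sampler mentioned in the abstract makes discrete word sampling differentiable; a minimal straight-through Gumbel-softmax sketch is shown below (the temperature and the straight-through variant are illustrative assumptions, not necessarily the paper's exact settings).

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, tau=1.0, hard=True):
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    y = F.softmax((logits + gumbel) / tau, dim=-1)             # relaxed sample
    if hard:                                                   # straight-through:
        index = y.argmax(dim=-1, keepdim=True)                 # discrete forward pass,
        y_hard = torch.zeros_like(y).scatter_(-1, index, 1.0)  # soft gradients backward
        y = (y_hard - y).detach() + y
    return y

word_onehot = gumbel_softmax_sample(torch.randn(4, 10000))     # (batch, vocab) one-hot
```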

Pitfalls and Best Practices in Algorithm Configuration

Title Pitfalls and Best Practices in Algorithm Configuration
Authors Katharina Eggensperger, Marius Lindauer, Frank Hutter
Abstract Good parameter settings are crucial to achieve high performance in many areas of artificial intelligence (AI), such as propositional satisfiability solving, AI planning, scheduling, and machine learning (in particular deep learning). Automated algorithm configuration methods have recently received much attention in the AI community since they replace tedious, irreproducible and error-prone manual parameter tuning and can lead to new state-of-the-art performance. However, practical applications of algorithm configuration are prone to several (often subtle) pitfalls in the experimental design that can render the procedure ineffective. We identify several common issues and propose best practices for avoiding them. As one possibility for automatically handling as many of these as possible, we also propose a tool called GenericWrapper4AC.
Tasks
Published 2017-05-17
URL http://arxiv.org/abs/1705.06058v3
PDF http://arxiv.org/pdf/1705.06058v3.pdf
PWC https://paperswithcode.com/paper/pitfalls-and-best-practices-in-algorithm
Repo https://github.com/mlindauer/GenericWrapper4AC
Framework none
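
To make the role of such a tool concrete, here is a toy wrapper sketch with a hypothetical solver command line (this is not the GenericWrapper4AC API): it runs the target algorithm with a candidate configuration under a cutoff and reports the measured runtime and status, which is exactly the interface a configurator needs and where many of the pitfalls (e.g., timeouts or crashes silently counted as successes) arise.

```python
import subprocess
import time

def run_target(cmd, params, cutoff_s):
    argv = cmd + [f"--{k}={v}" for k, v in params.items()]
    start = time.time()
    try:
        proc = subprocess.run(argv, capture_output=True, timeout=cutoff_s)
        status = "SUCCESS" if proc.returncode == 0 else "CRASHED"
    except subprocess.TimeoutExpired:
        status = "TIMEOUT"
    return {"status": status, "runtime": time.time() - start}

# hypothetical usage; "./solver" and its parameters are placeholders
# result = run_target(["./solver", "instance.cnf"], {"restarts": 100}, cutoff_s=300)
```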

Experiment Segmentation in Scientific Discourse as Clause-level Structured Prediction using Recurrent Neural Networks

Title Experiment Segmentation in Scientific Discourse as Clause-level Structured Prediction using Recurrent Neural Networks
Authors Pradeep Dasigi, Gully A. P. C. Burns, Eduard Hovy, Anita de Waard
Abstract We propose a deep learning model for identifying structure within experiment narratives in scientific literature. We take a sequence labeling approach to this problem, and label clauses within experiment narratives to identify the different parts of the experiment. Our dataset consists of paragraphs taken from open access PubMed papers labeled with rhetorical information as a result of our pilot annotation. Our model is a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) cells that labels clauses. The clause representations are computed by combining word representations using a novel attention mechanism that involves a separate RNN. We compare this model against LSTMs where the input layer has simple or no attention, and against a feature-rich CRF model. Furthermore, we describe how our work could be useful for information extraction from scientific literature.
Tasks Structured Prediction
Published 2017-02-17
URL http://arxiv.org/abs/1702.05398v1
PDF http://arxiv.org/pdf/1702.05398v1.pdf
PWC https://paperswithcode.com/paper/experiment-segmentation-in-scientific
Repo https://github.com/edvisees/sciDT
Framework none
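
A minimal sketch of the two-level structure described above: attention pools word vectors into clause vectors, and an LSTM labels the sequence of clauses. The paper's attention involves a separate RNN; the simple feed-forward scoring, dimensions, and label count below are assumptions.

```python
import torch
import torch.nn as nn

class ClauseLabeler(nn.Module):
    def __init__(self, emb=100, hidden=64, n_labels=7):
        super().__init__()
        self.attn = nn.Linear(emb, 1)                     # word-level attention scores
        self.clause_rnn = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_labels)

    def forward(self, words):              # words: (batch, n_clauses, n_words, emb)
        a = torch.softmax(self.attn(words), dim=2)        # attend over words in a clause
        clauses = (a * words).sum(dim=2)                  # (batch, n_clauses, emb)
        h, _ = self.clause_rnn(clauses)
        return self.out(h)                                # per-clause label logits

logits = ClauseLabeler()(torch.randn(2, 12, 30, 100))     # -> (2, 12, 7)
```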

Cloud Radiative Effect Study Using Sky Camera

Title Cloud Radiative Effect Study Using Sky Camera
Authors Soumyabrata Dev, Shilpa Manandhar, Feng Yuan, Yee Hui Lee, Stefan Winkler
Abstract The analysis of clouds in the earth’s atmosphere is important for a variety of applications, viz. weather reporting, climate forecasting, and solar energy generation. In this paper, we focus our attention on the impact of cloud on the total solar irradiance reaching the earth’s surface. We use a weather station to record the total solar irradiance. Moreover, we employ a collocated ground-based sky camera to automatically compute the instantaneous cloud coverage. We analyze the relationship between the measured solar irradiance and the computed cloud coverage value, and conclude that higher cloud coverage greatly impacts the total solar irradiance. Such studies will immensely help in solar energy generation and forecasting.
Tasks
Published 2017-03-15
URL http://arxiv.org/abs/1703.05591v1
PDF http://arxiv.org/pdf/1703.05591v1.pdf
PWC https://paperswithcode.com/paper/cloud-radiative-effect-study-using-sky-camera
Repo https://github.com/Soumyabrata/cloud-radiative-effect
Framework none
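
For intuition, a very small sketch of computing an instantaneous cloud-coverage ratio from a sky-camera image is given below; the red-blue ratio threshold is an assumption for illustration and may differ from the segmentation method the authors use.

```python
import numpy as np

def cloud_coverage(rgb):                     # rgb: (H, W, 3) array of floats in [0, 1]
    r, b = rgb[..., 0], rgb[..., 2]
    ratio = (b - r) / (b + r + 1e-6)         # clear sky is strongly blue; clouds are not
    cloud_mask = ratio < 0.05                # low blue-red contrast -> likely cloud
    return cloud_mask.mean()                 # fraction of cloudy pixels

coverage = cloud_coverage(np.random.rand(480, 640, 3))
```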

Revised Note on Learning Algorithms for Quadratic Assignment with Graph Neural Networks

Title Revised Note on Learning Algorithms for Quadratic Assignment with Graph Neural Networks
Authors Alex Nowak, Soledad Villar, Afonso S. Bandeira, Joan Bruna
Abstract Inverse problems correspond to a certain type of optimization problems formulated over appropriate input distributions. Recently, there has been a growing interest in understanding the computational hardness of these optimization problems, not only in the worst case, but in an average-complexity sense under this same input distribution. In this revised note, we are interested in studying another aspect of hardness, related to the ability to learn how to solve a problem by simply observing a collection of previously solved instances. These ‘planted solutions’ are used to supervise the training of an appropriate predictive model that parametrizes a broad class of algorithms, with the hope that the resulting model will provide good accuracy-complexity tradeoffs in the average sense. We illustrate this setup on the Quadratic Assignment Problem, a fundamental problem in Network Science. We observe that data-driven models based on Graph Neural Networks offer intriguingly good performance, even in regimes where standard relaxation based techniques appear to suffer.
Tasks
Published 2017-06-22
URL http://arxiv.org/abs/1706.07450v2
PDF http://arxiv.org/pdf/1706.07450v2.pdf
PWC https://paperswithcode.com/paper/revised-note-on-learning-algorithms-for
Repo https://github.com/alexnowakvila/QAP_pt
Framework pytorch
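
A minimal sketch of the data-driven approach, simplified from the paper's siamese GNN: a small graph neural network embeds the nodes of both graphs, and soft node correspondences are read off from the similarity of the embeddings. The layer form, depth, and initial node features are assumptions.

```python
import torch
import torch.nn as nn

class SimpleGNN(nn.Module):
    def __init__(self, dim=32, layers=3):
        super().__init__()
        self.inp = nn.Linear(1, dim)
        self.w_self = nn.ModuleList(nn.Linear(dim, dim) for _ in range(layers))
        self.w_neigh = nn.ModuleList(nn.Linear(dim, dim) for _ in range(layers))

    def forward(self, adj):                       # adj: (n, n) adjacency matrix
        x = self.inp(adj.sum(1, keepdim=True))    # node degree as the initial feature
        for ws, wn in zip(self.w_self, self.w_neigh):
            x = torch.relu(ws(x) + wn(adj @ x))   # combine self and neighbour messages
        return x                                  # (n, dim) node embeddings

gnn = SimpleGNN()
a = (torch.rand(10, 10) > 0.5).float()
b = (torch.rand(10, 10) > 0.5).float()
scores = gnn(a) @ gnn(b).T                        # soft node-correspondence scores
```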

clcNet: Improving the Efficiency of Convolutional Neural Network using Channel Local Convolutions

Title clcNet: Improving the Efficiency of Convolutional Neural Network using Channel Local Convolutions
Authors Dong-Qing Zhang
Abstract Depthwise convolution and grouped convolution have been successfully applied to improve the efficiency of convolutional neural networks (CNN). We suggest that these models can be considered as special cases of a generalized convolution operation, named channel local convolution (CLC), where an output channel is computed using a subset of the input channels. This definition entails computation dependency relations between input and output channels, which can be represented by a channel dependency graph (CDG). By modifying the CDG of grouped convolution, a new CLC kernel named interlaced grouped convolution (IGC) is created. Stacking IGC and GC kernels results in a convolution block (named CLC Block) for approximating regular convolution. By resorting to the CDG as an analysis tool, we derive the rule for setting the meta-parameters of IGC and GC and the framework for minimizing the computational cost. A new CNN model named clcNet is then constructed using CLC blocks, which shows significantly higher computational efficiency and fewer parameters compared to state-of-the-art networks, when being tested using the ImageNet-1K dataset. Source code is available at https://github.com/dqzhang17/clcnet.torch .
Tasks
Published 2017-12-17
URL http://arxiv.org/abs/1712.06145v3
PDF http://arxiv.org/pdf/1712.06145v3.pdf
PWC https://paperswithcode.com/paper/clcnet-improving-the-efficiency-of
Repo https://github.com/dqzhang17/clcnet.torch
Framework torch
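
One plausible reading of an interlaced grouped convolution is sketched below, as my own interpretation rather than the released clcNet code: channels are assigned to groups in a strided, interlaced pattern instead of contiguous blocks, which changes the channel dependency graph relative to a plain grouped convolution.

```python
import torch
import torch.nn as nn

class InterlacedGroupedConv(nn.Module):
    def __init__(self, channels, groups, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size,
                              padding=kernel_size // 2, groups=groups)
        # route channel i to group i % groups before the grouped convolution
        idx = torch.arange(channels).reshape(-1, groups).t().flatten()
        self.register_buffer("perm", idx)

    def forward(self, x):
        return self.conv(x[:, self.perm])

y = InterlacedGroupedConv(64, groups=4)(torch.randn(1, 64, 32, 32))
```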

Variational Continual Learning

Title Variational Continual Learning
Authors Cuong V. Nguyen, Yingzhen Li, Thang D. Bui, Richard E. Turner
Abstract This paper develops variational continual learning (VCL), a simple but general framework for continual learning that fuses online variational inference (VI) and recent advances in Monte Carlo VI for neural networks. The framework can successfully train both deep discriminative models and deep generative models in complex continual learning settings where existing tasks evolve over time and entirely new tasks emerge. Experimental results show that VCL outperforms state-of-the-art continual learning methods on a variety of tasks, avoiding catastrophic forgetting in a fully automatic way.
Tasks Continual Learning
Published 2017-10-29
URL http://arxiv.org/abs/1710.10628v3
PDF http://arxiv.org/pdf/1710.10628v3.pdf
PWC https://paperswithcode.com/paper/variational-continual-learning
Repo https://github.com/aml-team-2/aml-reproducability-challenge
Framework pytorch
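
The core of the VCL objective can be sketched for a single mean-field Gaussian weight vector: the posterior learned on the previous task becomes the prior for the next task, so each task minimizes expected negative log-likelihood plus a KL term to that prior. The shapes and the likelihood term below are placeholders.

```python
import torch

def kl_diag_gaussian(mu_q, logvar_q, mu_p, logvar_p):
    # KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians
    return 0.5 * (logvar_p - logvar_q
                  + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
                  - 1.0).sum()

def vcl_loss(expected_nll, mu_q, logvar_q, mu_prev, logvar_prev):
    # negative ELBO for the current task, with the previous posterior as the prior
    return expected_nll + kl_diag_gaussian(mu_q, logvar_q, mu_prev, logvar_prev)

mu_q = torch.zeros(10, requires_grad=True)
logvar_q = torch.zeros(10, requires_grad=True)
loss = vcl_loss(torch.tensor(1.0), mu_q, logvar_q, torch.zeros(10), torch.zeros(10))
```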

PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes

Title PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes
Authors Yu Xiang, Tanner Schmidt, Venkatraman Narayanan, Dieter Fox
Abstract Estimating the 6D pose of known objects is important for robots to interact with the real world. The problem is challenging due to the variety of objects as well as the complexity of a scene caused by clutter and occlusions between objects. In this work, we introduce PoseCNN, a new Convolutional Neural Network for 6D object pose estimation. PoseCNN estimates the 3D translation of an object by localizing its center in the image and predicting its distance from the camera. The 3D rotation of the object is estimated by regressing to a quaternion representation. We also introduce a novel loss function that enables PoseCNN to handle symmetric objects. In addition, we contribute a large-scale video dataset for 6D object pose estimation named the YCB-Video dataset. Our dataset provides accurate 6D poses of 21 objects from the YCB dataset observed in 92 videos with 133,827 frames. We conduct extensive experiments on our YCB-Video dataset and the OccludedLINEMOD dataset to show that PoseCNN is highly robust to occlusions, can handle symmetric objects, and provides accurate pose estimation using only color images as input. When using depth data to further refine the poses, our approach achieves state-of-the-art results on the challenging OccludedLINEMOD dataset. Our code and dataset are available at https://rse-lab.cs.washington.edu/projects/posecnn/.
Tasks 6D Pose Estimation, 6D Pose Estimation using RGB, 6D Pose Estimation using RGBD, Pose Estimation
Published 2017-11-01
URL http://arxiv.org/abs/1711.00199v3
PDF http://arxiv.org/pdf/1711.00199v3.pdf
PWC https://paperswithcode.com/paper/posecnn-a-convolutional-neural-network-for-6d
Repo https://github.com/yuxng/PoseCNN
Framework tf
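
A minimal sketch of the rotation-regression idea, not the released PoseCNN code: the network outputs a quaternion, which is normalized to unit length, and the loss compares the object's 3D model points rotated by the predicted and ground-truth rotations (the paper's symmetric-object variant instead matches each point to its closest ground-truth point).

```python
import torch

def quat_to_rot(q):                                 # q: (4,) unit quaternion (w, x, y, z)
    w, x, y, z = q
    return torch.stack([
        torch.stack([1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)]),
        torch.stack([2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)]),
        torch.stack([2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]),
    ])

def point_matching_loss(q_pred, q_gt, model_points):   # model_points: (n, 3)
    q_pred = q_pred / q_pred.norm()                    # normalize the regressed quaternion
    diff = model_points @ quat_to_rot(q_pred).T - model_points @ quat_to_rot(q_gt).T
    return (diff ** 2).sum(dim=1).mean()

loss = point_matching_loss(torch.randn(4), torch.tensor([1., 0., 0., 0.]),
                           torch.randn(100, 3))
```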

Reservoir Computing Using Non-Uniform Binary Cellular Automata

Title Reservoir Computing Using Non-Uniform Binary Cellular Automata
Authors Stefano Nichele, Magnus S. Gundersen
Abstract The Reservoir Computing (RC) paradigm utilizes a dynamical system, i.e., a reservoir, and a linear classifier, i.e., a read-out layer, to process data from sequential classification tasks. In this paper the usage of Cellular Automata (CA) as a reservoir is investigated. The use of CA in RC has shown promising results. In this paper, selected state-of-the-art experiments are reproduced. It is shown that some CA-rules perform better than others, and that the reservoir performance is improved by increasing the size of the CA reservoir itself. In addition, the usage of parallel loosely coupled CA-reservoirs, where each reservoir has a different CA-rule, is investigated. The experiments performed on the quasi-uniform CA reservoir provide valuable insights into CA reservoir design. The results herein show that some rules do not work well together, while other combinations work remarkably well. This suggests that non-uniform CA could represent a powerful tool for novel CA reservoir implementations.
Tasks
Published 2017-02-13
URL http://arxiv.org/abs/1702.03812v1
PDF http://arxiv.org/pdf/1702.03812v1.pdf
PWC https://paperswithcode.com/paper/reservoir-computing-using-non-uniform-binary
Repo https://github.com/magnusgundersen/spec
Framework none
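
A small sketch of the CA-reservoir idea (the rule, reservoir width, and number of iterations are illustrative choices): the input bits are written into the CA state, the automaton is iterated for a few steps, and the concatenated states serve as features for a simple linear read-out.

```python
import numpy as np

def step(state, rule=90):                       # one elementary-CA update
    left, right = np.roll(state, 1), np.roll(state, -1)
    idx = 4 * left + 2 * state + right          # 3-cell neighbourhood as a 3-bit number
    return np.right_shift(rule, idx) & 1        # look up the rule's output bit

def reservoir_features(bits, iterations=8, width=64):
    state = np.zeros(width, dtype=int)
    state[:len(bits)] = bits                    # write the input into the CA
    states = [state]
    for _ in range(iterations):
        states.append(step(states[-1]))
    return np.concatenate(states)               # feature vector for a linear read-out

features = reservoir_features(np.array([1, 0, 1, 1, 0]))
```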

Distance Metric Learning using Graph Convolutional Networks: Application to Functional Brain Networks

Title Distance Metric Learning using Graph Convolutional Networks: Application to Functional Brain Networks
Authors Sofia Ira Ktena, Sarah Parisot, Enzo Ferrante, Martin Rajchl, Matthew Lee, Ben Glocker, Daniel Rueckert
Abstract Evaluating similarity between graphs is of major importance in several computer vision and pattern recognition problems, where graph representations are often used to model objects or interactions between elements. The choice of a distance or similarity metric is, however, not trivial and can be highly dependent on the application at hand. In this work, we propose a novel metric learning method to evaluate distance between graphs that leverages the power of convolutional neural networks, while exploiting concepts from spectral graph theory to allow these operations on irregular graphs. We demonstrate the potential of our method in the field of connectomics, where neuronal pathways or functional connections between brain regions are commonly modelled as graphs. In this problem, the definition of an appropriate graph similarity function is critical to unveil patterns of disruptions associated with certain brain disorders. Experimental results on the ABIDE dataset show that our method can learn a graph similarity metric tailored for a clinical application, improving the performance of a simple k-nn classifier by 11.9% compared to a traditional distance metric.
Tasks Graph Similarity, Metric Learning
Published 2017-03-07
URL http://arxiv.org/abs/1703.02161v2
PDF http://arxiv.org/pdf/1703.02161v2.pdf
PWC https://paperswithcode.com/paper/distance-metric-learning-using-graph
Repo https://github.com/sheryl-ai/MemGCN
Framework tf
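
A compact sketch of the siamese idea, with a simplified propagation rule and similarity score: the same graph-convolutional encoder embeds two brain graphs, and their embeddings are compared, so the similarity metric can be trained to separate same-class from different-class pairs.

```python
import torch
import torch.nn as nn

class GCNEmbed(nn.Module):
    def __init__(self, in_dim=16, hidden=32):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hidden)
        self.w2 = nn.Linear(hidden, hidden)

    def forward(self, adj, feats):
        deg = adj.sum(1)
        norm = adj / torch.sqrt(deg[:, None] * deg[None, :] + 1e-6)   # symmetric normalization
        h = torch.relu(self.w1(norm @ feats))
        h = torch.relu(self.w2(norm @ h))
        return h.mean(0)                              # graph-level embedding

embed = GCNEmbed()
def similarity(g1, g2):                               # g = (adjacency, node features)
    return torch.cosine_similarity(embed(*g1), embed(*g2), dim=0)

a = (torch.rand(90, 90), torch.randn(90, 16))
b = (torch.rand(90, 90), torch.randn(90, 16))
score = similarity(a, b)
```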

Free Space Estimation using Occupancy Grids and Dynamic Object Detection

Title Free Space Estimation using Occupancy Grids and Dynamic Object Detection
Authors Raghavender Sahdev
Abstract In this paper we present an approach to estimate free space from a stereo image pair using stochastic occupancy grids. We do this in the domain of autonomous driving on the well-known KITTI benchmark dataset. Later, based on the generated occupancy grid, we match two image sequences to compute a top-view representation of the map, which we use to map the environment. We compute a transformation between the occupancy grids of two successive images and use it to compute the top-view map. Two issues that need to be addressed for mapping are discussed: computing the map and dealing with dynamic objects while computing it. Dynamic objects are detected in successive images based on an idea similar to tracking foreground objects against the background based on motion flow. A novel RANSAC-based segmentation approach is proposed here to address this issue.
Tasks Autonomous Driving, Object Detection
Published 2017-08-16
URL http://arxiv.org/abs/1708.04989v1
PDF http://arxiv.org/pdf/1708.04989v1.pdf
PWC https://paperswithcode.com/paper/free-space-estimation-using-occupancy-grids
Repo https://github.com/raghavendersahdev/Free-Space
Framework none
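
A simplified sketch of the grid-building step (deterministic counting rather than the stochastic grids used in the paper; the camera parameters are placeholders): each disparity pixel is back-projected to a lateral/forward position and accumulated into a top-view cell.

```python
import numpy as np

def occupancy_grid(disparity, fx=700.0, cx=640.0, baseline=0.54,
                   cell=0.2, x_range=20.0, z_range=40.0):
    h, w = disparity.shape
    u = np.tile(np.arange(w, dtype=float), (h, 1))
    valid = disparity > 1.0
    z = fx * baseline / disparity[valid]              # depth from disparity
    x = (u[valid] - cx) * z / fx                      # lateral position
    grid = np.zeros((int(z_range / cell), int(2 * x_range / cell)))
    zi = (z / cell).astype(int)
    xi = ((x + x_range) / cell).astype(int)
    keep = (zi < grid.shape[0]) & (xi >= 0) & (xi < grid.shape[1])
    np.add.at(grid, (zi[keep], xi[keep]), 1)          # accumulate evidence per cell
    return grid

grid = occupancy_grid(np.random.uniform(0.5, 64.0, size=(375, 1242)))
```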

On the Effects of Batch and Weight Normalization in Generative Adversarial Networks

Title On the Effects of Batch and Weight Normalization in Generative Adversarial Networks
Authors Sitao Xiang, Hao Li
Abstract Generative adversarial networks (GANs) are highly effective unsupervised learning frameworks that can generate very sharp data, even for data such as images with complex, highly multimodal distributions. However, GANs are known to be very hard to train, suffering from problems such as mode collapse and disturbing visual artifacts. Batch normalization (BN) techniques have been introduced to help stabilize training. Though BN accelerates training in the beginning, our experiments show that the use of BN can be unstable and negatively impact the quality of the trained model. The evaluation of BN and numerous other recent schemes for improving GAN training is hindered by the lack of an effective objective quality measure for GAN models. To address these issues, we first introduce a weight normalization (WN) approach for GAN training that significantly improves the stability, efficiency and the quality of the generated samples. To allow a methodical evaluation, we introduce squared Euclidean reconstruction error on a test set as a new objective measure, to assess training performance in terms of speed, stability, and quality of generated samples. Our experiments with a standard DCGAN architecture on commonly used datasets (CelebA, LSUN bedroom, and CIFAR-10) indicate that training using WN is generally superior to BN for GANs, achieving 10% lower mean squared loss for reconstruction and significantly better qualitative results than BN. We further demonstrate the stability of WN on a 21-layer ResNet trained with the CelebA dataset. The code for this paper is available at https://github.com/stormraiser/gan-weightnorm-resnet
Tasks
Published 2017-04-13
URL http://arxiv.org/abs/1704.03971v4
PDF http://arxiv.org/pdf/1704.03971v4.pdf
PWC https://paperswithcode.com/paper/on-the-effects-of-batch-and-weight
Repo https://github.com/nardeas/MHGAN
Framework tf
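
For reference, the weight-normalization reparameterization the paper builds on expresses each weight vector as w = g · v / ||v||, decoupling its direction from its scale; a minimal sketch is below, together with the built-in PyTorch hook that applies the same reparameterization to a layer.

```python
import torch
import torch.nn as nn

# manual form of the reparameterization for one layer's weight matrix
v = torch.randn(128, 64, requires_grad=True)
g = torch.ones(128, 1, requires_grad=True)
w = g * v / v.norm(dim=1, keepdim=True)              # w has direction v/||v|| and scale g

# equivalent built-in: the layer's weight is recomputed from (g, v) at each forward pass
layer = nn.utils.weight_norm(nn.Linear(64, 128))
```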

The Sample Complexity of Online One-Class Collaborative Filtering

Title The Sample Complexity of Online One-Class Collaborative Filtering
Authors Reinhard Heckel, Kannan Ramchandran
Abstract We consider the online one-class collaborative filtering (CF) problem that consists of recommending items to users over time in an online fashion based on positive ratings only. This problem arises when users respond only occasionally to a recommendation with a positive rating, and never with a negative one. We study the impact of the probability of a user responding to a recommendation, p_f, on the sample complexity, i.e., the number of ratings required to make ‘good’ recommendations, and ask whether receiving positive and negative ratings, instead of positive ratings only, improves the sample complexity. Both questions arise in the design of recommender systems. We introduce a simple probabilistic user model, and analyze the performance of an online user-based CF algorithm. We prove that after an initial cold start phase, where recommendations are invested in exploring the user’s preferences, this algorithm makes—up to a fraction of the recommendations required for updating the user’s preferences—perfect recommendations. The number of ratings required for the cold start phase is nearly proportional to 1/p_f, and that for updating the user’s preferences is essentially independent of p_f. As a consequence, we find that receiving positive and negative ratings instead of only positive ones improves the number of ratings required for initial exploration by a factor of 1/p_f, which can be significant.
Tasks Recommendation Systems
Published 2017-05-31
URL http://arxiv.org/abs/1706.00061v1
PDF http://arxiv.org/pdf/1706.00061v1.pdf
PWC https://paperswithcode.com/paper/the-sample-complexity-of-online-one-class
Repo https://github.com/Atomu2014/product-nets-distributed
Framework tf
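
As a toy illustration only (this is not the algorithm analyzed in the paper, which interleaves explicit exploration phases), user-based one-class CF can be sketched as follows: compare users through their sets of positively rated items, then recommend an unseen item that is popular among the most similar users.

```python
import numpy as np

def recommend(ratings, user, k_neighbors=10):
    # ratings: (n_users, n_items) binary matrix of positive ratings only
    sims = ratings @ ratings[user]                    # co-rating counts with every user
    sims[user] = -1                                   # exclude the user themselves
    neighbors = np.argsort(sims)[-k_neighbors:]
    scores = ratings[neighbors].sum(axis=0).astype(float)
    scores[ratings[user] > 0] = -np.inf               # never re-recommend a rated item
    return int(np.argmax(scores))

ratings = (np.random.rand(50, 200) < 0.05).astype(int)
item = recommend(ratings, user=3)
```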