Paper Group AWR 186
Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection
Title | Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection |
Authors | Emre Çakır, Giambattista Parascandolo, Toni Heittola, Heikki Huttunen, Tuomas Virtanen |
Abstract | Sound events often occur in unstructured environments where they exhibit wide variations in their frequency content and temporal structure. Convolutional neural networks (CNNs) are able to extract higher-level features that are invariant to local spectral and temporal variations. Recurrent neural networks (RNNs) are powerful in learning the longer-term temporal context in audio signals. CNNs and RNNs as classifiers have recently shown improved performance over established methods in various sound recognition tasks. We combine these two approaches in a convolutional recurrent neural network (CRNN) and apply it to a polyphonic sound event detection task. We compare the performance of the proposed CRNN method with CNN, RNN, and other established methods, and observe a considerable improvement on four different datasets consisting of everyday sound events. |
Tasks | Sound Event Detection |
Published | 2017-02-21 |
URL | http://arxiv.org/abs/1702.06286v1 |
PDF | http://arxiv.org/pdf/1702.06286v1.pdf |
PWC | https://paperswithcode.com/paper/convolutional-recurrent-neural-networks-for-6 |
Repo | https://github.com/cchinchristopherj/Right-Whale-Unsupervised-Model |
Framework | tf |
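The CNN-to-RNN-to-sigmoid data flow described in the abstract can be sketched in plain Python. Everything here (layer sizes, random weights, a simple tanh recurrence standing in for the paper's recurrent cells) is illustrative, not the authors' implementation:

```python
import math
import random

random.seed(0)

def crnn_forward(frames, n_classes=3, kernel=3, hidden=8):
    """Minimal CRNN sketch: temporal conv -> simple RNN -> per-frame sigmoids."""
    n_feat = len(frames[0])
    # random (untrained) weights: this demonstrates the data flow only
    W_c = [[[random.uniform(-0.1, 0.1) for _ in range(n_feat)]
            for _ in range(kernel)] for _ in range(hidden)]
    W_h = [[random.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(hidden)]
    W_x = [[random.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(hidden)]
    W_o = [[random.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(n_classes)]

    # 1) CNN stage: valid temporal convolution over the spectrogram frames + ReLU
    conv = []
    for t in range(len(frames) - kernel + 1):
        feat = []
        for f in range(hidden):
            s = sum(W_c[f][k][j] * frames[t + k][j]
                    for k in range(kernel) for j in range(n_feat))
            feat.append(max(0.0, s))
        conv.append(feat)

    # 2) RNN stage: tanh recurrence captures longer-term temporal context
    h = [0.0] * hidden
    out = []
    for x in conv:
        h = [math.tanh(sum(W_x[i][j] * x[j] for j in range(hidden)) +
                       sum(W_h[i][j] * h[j] for j in range(hidden)))
             for i in range(hidden)]
        # 3) per-frame multi-label sigmoids: polyphony means classes are not exclusive
        out.append([1.0 / (1.0 + math.exp(-sum(W_o[c][j] * h[j] for j in range(hidden))))
                    for c in range(n_classes)])
    return out

frames = [[random.random() for _ in range(5)] for _ in range(10)]
probs = crnn_forward(frames)
```

Each frame gets an independent sigmoid per class, so overlapping (polyphonic) events can be active at the same time.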
CloudScan - A configuration-free invoice analysis system using recurrent neural networks
Title | CloudScan - A configuration-free invoice analysis system using recurrent neural networks |
Authors | Rasmus Berg Palm, Ole Winther, Florian Laws |
Abstract | We present CloudScan, an invoice analysis system that requires zero configuration or upfront annotation. In contrast to previous work, CloudScan does not rely on templates of invoice layout; instead, it learns a single global model of invoices that naturally generalizes to unseen invoice layouts. The model is trained using data automatically extracted from end-user-provided feedback. This automatic training data extraction removes the requirement for users to annotate the data precisely. We describe a recurrent neural network model that can capture long-range context and compare it to a baseline logistic regression model corresponding to the current CloudScan production system. We train and evaluate the system on 8 important fields using a dataset of 326,471 invoices. The recurrent neural network and baseline model achieve 0.891 and 0.887 average F1 scores respectively on seen invoice layouts. For the harder task of unseen invoice layouts, the recurrent neural network model outperforms the baseline with 0.840 average F1 compared to 0.788. |
Tasks | |
Published | 2017-08-24 |
URL | http://arxiv.org/abs/1708.07403v1 |
PDF | http://arxiv.org/pdf/1708.07403v1.pdf |
PWC | https://paperswithcode.com/paper/cloudscan-a-configuration-free-invoice |
Repo | https://github.com/naiveHobo/InvoiceNet |
Framework | none |
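The reported numbers are per-field F1 scores averaged over the 8 extracted fields. A minimal sketch of that metric, using hypothetical per-field counts (not the paper's data):

```python
def f1(tp, fp, fn):
    """F1 = harmonic mean of precision and recall for one extracted field."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# hypothetical (tp, fp, fn) counts for 8 invoice fields, for illustration only
counts = [(90, 5, 5), (80, 10, 10), (85, 5, 10), (88, 6, 6),
          (70, 15, 15), (92, 4, 4), (75, 10, 15), (82, 9, 9)]
avg_f1 = sum(f1(*c) for c in counts) / len(counts)
```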
Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training
Title | Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training |
Authors | Rakshith Shetty, Marcus Rohrbach, Lisa Anne Hendricks, Mario Fritz, Bernt Schiele |
Abstract | While strong progress has been made in image captioning over recent years, machine and human captions are still quite distinct. A closer look reveals that this is due to deficiencies in the generated word distribution and vocabulary size, and to a strong bias in the generators towards frequent captions. Furthermore, humans, rightfully so, generate multiple diverse captions, due to the inherent ambiguity in the captioning task, which is not considered in today's systems. To address these challenges, we change the training objective of the caption generator from reproducing ground-truth captions to generating a set of captions that is indistinguishable from human-generated captions. Instead of handcrafting such a learning target, we employ adversarial training in combination with an approximate Gumbel sampler to implicitly match the generated distribution to the human one. While our method achieves comparable performance to the state of the art in terms of the correctness of the captions, we generate a set of diverse captions that are significantly less biased and match the word statistics better in several respects. |
Tasks | Image Captioning |
Published | 2017-03-30 |
URL | http://arxiv.org/abs/1703.10476v2 |
PDF | http://arxiv.org/pdf/1703.10476v2.pdf |
PWC | https://paperswithcode.com/paper/speaking-the-same-language-matching-machine |
Repo | https://github.com/rakshithShetty/captionGAN |
Framework | none |
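The approximate Gumbel sampler the abstract refers to relaxes a discrete word choice so gradients can flow to the generator. A minimal Gumbel-softmax sketch (the temperature value is an illustrative assumption):

```python
import math
import random

def gumbel_softmax_sample(logits, tau=0.5):
    """Relaxed sample from a categorical distribution via the Gumbel trick.

    Adding Gumbel noise to logits and taking argmax gives an exact sample;
    replacing the argmax with a temperature-tau softmax makes it differentiable.
    """
    g = [-math.log(-math.log(random.random() or 1e-12)) for _ in logits]
    y = [(l + gi) / tau for l, gi in zip(logits, g)]
    m = max(y)                      # subtract max for numerical stability
    e = [math.exp(v - m) for v in y]
    z = sum(e)
    return [v / z for v in e]

sample = gumbel_softmax_sample([2.0, 1.0, 0.1])
```

As tau goes to 0 the output approaches a one-hot word choice; larger tau gives a smoother, more trainable relaxation.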
Pitfalls and Best Practices in Algorithm Configuration
Title | Pitfalls and Best Practices in Algorithm Configuration |
Authors | Katharina Eggensperger, Marius Lindauer, Frank Hutter |
Abstract | Good parameter settings are crucial to achieve high performance in many areas of artificial intelligence (AI), such as propositional satisfiability solving, AI planning, scheduling, and machine learning (in particular deep learning). Automated algorithm configuration methods have recently received much attention in the AI community, since they replace tedious, irreproducible, and error-prone manual parameter tuning and can lead to new state-of-the-art performance. However, practical applications of algorithm configuration are prone to several (often subtle) pitfalls in the experimental design that can render the procedure ineffective. We identify several common issues and propose best practices for avoiding them. As one way of automatically handling as many of these issues as possible, we also propose a tool called GenericWrapper4AC. |
Tasks | |
Published | 2017-05-17 |
URL | http://arxiv.org/abs/1705.06058v3 |
PDF | http://arxiv.org/pdf/1705.06058v3.pdf |
PWC | https://paperswithcode.com/paper/pitfalls-and-best-practices-in-algorithm |
Repo | https://github.com/mlindauer/GenericWrapper4AC |
Framework | none |
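A toy illustration of the configuration loop this paper studies: a random-search configurator evaluating a synthetic target function, with runs censored at a cutoff as a wrapper such as GenericWrapper4AC would enforce. The target function and parameter names are made up for illustration:

```python
import random

random.seed(1)

def run_target(config):
    # stand-in for invoking the real target algorithm through a wrapper;
    # returns a synthetic "runtime" for the given configuration
    return (config["x"] - 0.3) ** 2 + 0.1 * config["y"]

def random_search(n_iters=50, cutoff=10.0):
    """Simplest possible configurator: sample configs, keep the best.

    Censoring each run at the cutoff mirrors how a wrapper enforces
    resource limits on the target algorithm.
    """
    best_cfg, best_cost = None, float("inf")
    for _ in range(n_iters):
        cfg = {"x": random.random(), "y": random.random()}
        cost = min(run_target(cfg), cutoff)
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost

best_cfg, best_cost = random_search()
```

Real configurators (SMAC, irace, ParamILS) are far more sample-efficient, but the pitfalls the paper discusses (e.g. measuring runtime inside the solver rather than in the wrapper) apply to exactly this loop.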
Experiment Segmentation in Scientific Discourse as Clause-level Structured Prediction using Recurrent Neural Networks
Title | Experiment Segmentation in Scientific Discourse as Clause-level Structured Prediction using Recurrent Neural Networks |
Authors | Pradeep Dasigi, Gully A. P. C. Burns, Eduard Hovy, Anita de Waard |
Abstract | We propose a deep learning model for identifying structure within experiment narratives in scientific literature. We take a sequence labeling approach to this problem, and label clauses within experiment narratives to identify the different parts of the experiment. Our dataset consists of paragraphs taken from open access PubMed papers labeled with rhetorical information as a result of our pilot annotation. Our model is a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) cells that labels clauses. The clause representations are computed by combining word representations using a novel attention mechanism that involves a separate RNN. We compare this model against LSTMs where the input layer has simple or no attention, and against a feature-rich CRF model. Furthermore, we describe how our work could be useful for information extraction from scientific literature. |
Tasks | Structured Prediction |
Published | 2017-02-17 |
URL | http://arxiv.org/abs/1702.05398v1 |
PDF | http://arxiv.org/pdf/1702.05398v1.pdf |
PWC | https://paperswithcode.com/paper/experiment-segmentation-in-scientific |
Repo | https://github.com/edvisees/sciDT |
Framework | none |
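The clause representation described above is an attention-weighted combination of word vectors. In the paper the attention scores come from a separate RNN; in this sketch they are simply given as inputs:

```python
import math

def attend(word_vecs, scores):
    """Softmax-normalize the scores, then average the word vectors with them."""
    m = max(scores)                       # subtract max for numerical stability
    e = [math.exp(s - m) for s in scores]
    z = sum(e)
    w = [v / z for v in e]
    dim = len(word_vecs[0])
    return [sum(w[i] * word_vecs[i][d] for i in range(len(word_vecs)))
            for d in range(dim)]

clause_vec = attend([[1.0, 0.0], [3.0, 4.0]], [0.0, 0.0])
```

With uniform scores this reduces to a plain average; trained scores let the model emphasize the words that signal the experiment phase.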
Cloud Radiative Effect Study Using Sky Camera
Title | Cloud Radiative Effect Study Using Sky Camera |
Authors | Soumyabrata Dev, Shilpa Manandhar, Feng Yuan, Yee Hui Lee, Stefan Winkler |
Abstract | The analysis of clouds in the earth's atmosphere is important for a variety of applications, viz. weather reporting, climate forecasting, and solar energy generation. In this paper, we focus our attention on the impact of cloud on the total solar irradiance reaching the earth's surface. We use a weather station to record the total solar irradiance. Moreover, we employ a collocated ground-based sky camera to automatically compute the instantaneous cloud coverage. We analyze the relationship between the measured solar irradiance and the computed cloud coverage value, and conclude that higher cloud coverage greatly impacts the total solar irradiance. Such studies will immensely help in solar energy generation and forecasting. |
Tasks | |
Published | 2017-03-15 |
URL | http://arxiv.org/abs/1703.05591v1 |
PDF | http://arxiv.org/pdf/1703.05591v1.pdf |
PWC | https://paperswithcode.com/paper/cloud-radiative-effect-study-using-sky-camera |
Repo | https://github.com/Soumyabrata/cloud-radiative-effect |
Framework | none |
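Instantaneous cloud coverage from a sky image is essentially the fraction of pixels classified as cloud. A crude sketch using a red-to-blue ratio test; the paper's segmentation is more sophisticated, and the threshold here is an assumption:

```python
def cloud_coverage(pixels, thresh=0.6):
    """Fraction of sky pixels classified as cloud.

    Clear sky is strongly blue (low r/b ratio); grey/white cloud has r close
    to b (ratio near 1). `pixels` is a flat list of (r, g, b) tuples.
    """
    cloudy = sum(1 for (r, g, b) in pixels if b == 0 or r / b > thresh)
    return cloudy / len(pixels)

clear = cloud_coverage([(0, 0, 255)] * 100)     # pure blue sky
overcast = cloud_coverage([(200, 200, 200)] * 100)  # uniform grey cloud
```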
Revised Note on Learning Algorithms for Quadratic Assignment with Graph Neural Networks
Title | Revised Note on Learning Algorithms for Quadratic Assignment with Graph Neural Networks |
Authors | Alex Nowak, Soledad Villar, Afonso S. Bandeira, Joan Bruna |
Abstract | Inverse problems correspond to a certain type of optimization problems formulated over appropriate input distributions. Recently, there has been a growing interest in understanding the computational hardness of these optimization problems, not only in the worst case, but in an average-complexity sense under this same input distribution. In this revised note, we are interested in studying another aspect of hardness, related to the ability to learn how to solve a problem by simply observing a collection of previously solved instances. These ‘planted solutions’ are used to supervise the training of an appropriate predictive model that parametrizes a broad class of algorithms, with the hope that the resulting model will provide good accuracy-complexity tradeoffs in the average sense. We illustrate this setup on the Quadratic Assignment Problem, a fundamental problem in Network Science. We observe that data-driven models based on Graph Neural Networks offer intriguingly good performance, even in regimes where standard relaxation based techniques appear to suffer. |
Tasks | |
Published | 2017-06-22 |
URL | http://arxiv.org/abs/1706.07450v2 |
PDF | http://arxiv.org/pdf/1706.07450v2.pdf |
PWC | https://paperswithcode.com/paper/revised-note-on-learning-algorithms-for |
Repo | https://github.com/alexnowakvila/QAP_pt |
Framework | pytorch |
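One message-passing layer of the kind underlying the paper's graph neural networks can be sketched as follows; the ReLU and the 0.5/0.5 mixing weights are illustrative choices, not the authors' parametrization:

```python
def gnn_layer(adj, feats, w_self=0.5, w_neigh=0.5):
    """One GNN layer: mix each node's features with its neighbors' sum, then ReLU.

    adj is an n x n adjacency matrix, feats an n x d feature matrix,
    both as nested lists.
    """
    n, d = len(adj), len(feats[0])
    out = []
    for i in range(n):
        neigh = [sum(adj[i][j] * feats[j][k] for j in range(n)) for k in range(d)]
        out.append([max(0.0, w_self * feats[i][k] + w_neigh * neigh[k])
                    for k in range(d)])
    return out

# two nodes joined by an edge, scalar features
h = gnn_layer([[0, 1], [1, 0]], [[1.0], [2.0]])
```

Stacking such layers and reading out a node-to-node assignment matrix is, roughly, the data-driven QAP model the note evaluates.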
clcNet: Improving the Efficiency of Convolutional Neural Network using Channel Local Convolutions
Title | clcNet: Improving the Efficiency of Convolutional Neural Network using Channel Local Convolutions |
Authors | Dong-Qing Zhang |
Abstract | Depthwise convolution and grouped convolution have been successfully applied to improve the efficiency of convolutional neural networks (CNNs). We suggest that these models can be considered as special cases of a generalized convolution operation, named channel local convolution (CLC), where an output channel is computed using a subset of the input channels. This definition entails computation dependency relations between input and output channels, which can be represented by a channel dependency graph (CDG). By modifying the CDG of grouped convolution, a new CLC kernel named interlaced grouped convolution (IGC) is created. Stacking IGC and GC kernels results in a convolution block (named CLC Block) for approximating regular convolution. By resorting to the CDG as an analysis tool, we derive the rule for setting the meta-parameters of IGC and GC and the framework for minimizing the computational cost. A new CNN model named clcNet is then constructed using CLC blocks, which shows significantly higher computational efficiency and fewer parameters compared to state-of-the-art networks when tested on the ImageNet-1K dataset. Source code is available at https://github.com/dqzhang17/clcnet.torch . |
Tasks | |
Published | 2017-12-17 |
URL | http://arxiv.org/abs/1712.06145v3 |
PDF | http://arxiv.org/pdf/1712.06145v3.pdf |
PWC | https://paperswithcode.com/paper/clcnet-improving-the-efficiency-of |
Repo | https://github.com/dqzhang17/clcnet.torch |
Framework | torch |
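The channel dependency graph (CDG) argument can be checked directly: grouped convolution (GC) alone leaves a block-diagonal dependency between channels, while stacking interlaced grouped convolution (IGC) and GC connects every output channel to every input channel. The channel and group counts below are illustrative:

```python
def grouped_dep(n_ch, groups):
    # GC: output channel i depends only on inputs in its contiguous group
    size = n_ch // groups
    return [[i // size == j // size for j in range(n_ch)] for i in range(n_ch)]

def interlaced_dep(n_ch, groups):
    # IGC: groups take every `groups`-th channel (interlaced assignment)
    return [[i % groups == j % groups for j in range(n_ch)] for i in range(n_ch)]

def compose(d2, d1):
    # dependency of stacked layers: d1 applied first, then d2
    n = len(d1)
    return [[any(d2[i][k] and d1[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

n = 8
gc_only = grouped_dep(n, 2)
clc_block = compose(grouped_dep(n, 2), interlaced_dep(n, 4))  # IGC then GC
```

With 8 channels, 4 interlaced groups followed by 2 contiguous groups already yields a fully connected CDG, which is the sense in which a CLC block approximates a regular convolution at a fraction of the cost.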
Variational Continual Learning
Title | Variational Continual Learning |
Authors | Cuong V. Nguyen, Yingzhen Li, Thang D. Bui, Richard E. Turner |
Abstract | This paper develops variational continual learning (VCL), a simple but general framework for continual learning that fuses online variational inference (VI) and recent advances in Monte Carlo VI for neural networks. The framework can successfully train both deep discriminative models and deep generative models in complex continual learning settings where existing tasks evolve over time and entirely new tasks emerge. Experimental results show that VCL outperforms state-of-the-art continual learning methods on a variety of tasks, avoiding catastrophic forgetting in a fully automatic way. |
Tasks | Continual Learning |
Published | 2017-10-29 |
URL | http://arxiv.org/abs/1710.10628v3 |
PDF | http://arxiv.org/pdf/1710.10628v3.pdf |
PWC | https://paperswithcode.com/paper/variational-continual-learning |
Repo | https://github.com/aml-team-2/aml-reproducability-challenge |
Framework | pytorch |
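The core of online VI in VCL is that each task's posterior becomes the next task's prior. For a 1-D Gaussian likelihood with known noise this recursion is exact and matches the batch posterior, which the sketch below verifies (VCL itself applies the same idea approximately, via Monte Carlo VI, to neural network weights):

```python
def gaussian_update(mu0, var0, obs, noise_var):
    """Conjugate Gaussian update: precisions add, means are precision-weighted."""
    prec = 1.0 / var0 + len(obs) / noise_var
    var1 = 1.0 / prec
    mu1 = var1 * (mu0 / var0 + sum(obs) / noise_var)
    return mu1, var1

# task 1, then task 2: the posterior of task 1 is the prior for task 2
m1, v1 = gaussian_update(0.0, 1.0, [1.0, 2.0], 0.5)
m2, v2 = gaussian_update(m1, v1, [3.0], 0.5)

# processing all data at once gives the same answer
mb, vb = gaussian_update(0.0, 1.0, [1.0, 2.0, 3.0], 0.5)
```

Catastrophic forgetting corresponds to discarding the old posterior instead of carrying it forward as the prior.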
PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes
Title | PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes |
Authors | Yu Xiang, Tanner Schmidt, Venkatraman Narayanan, Dieter Fox |
Abstract | Estimating the 6D pose of known objects is important for robots to interact with the real world. The problem is challenging due to the variety of objects as well as the complexity of a scene caused by clutter and occlusions between objects. In this work, we introduce PoseCNN, a new Convolutional Neural Network for 6D object pose estimation. PoseCNN estimates the 3D translation of an object by localizing its center in the image and predicting its distance from the camera. The 3D rotation of the object is estimated by regressing to a quaternion representation. We also introduce a novel loss function that enables PoseCNN to handle symmetric objects. In addition, we contribute a large scale video dataset for 6D object pose estimation named the YCB-Video dataset. Our dataset provides accurate 6D poses of 21 objects from the YCB dataset observed in 92 videos with 133,827 frames. We conduct extensive experiments on our YCB-Video dataset and the OccludedLINEMOD dataset to show that PoseCNN is highly robust to occlusions, can handle symmetric objects, and provides accurate pose estimates using only color images as input. When using depth data to further refine the poses, our approach achieves state-of-the-art results on the challenging OccludedLINEMOD dataset. Our code and dataset are available at https://rse-lab.cs.washington.edu/projects/posecnn/. |
Tasks | 6D Pose Estimation, 6D Pose Estimation using RGB, 6D Pose Estimation using RGBD, Pose Estimation |
Published | 2017-11-01 |
URL | http://arxiv.org/abs/1711.00199v3 |
PDF | http://arxiv.org/pdf/1711.00199v3.pdf |
PWC | https://paperswithcode.com/paper/posecnn-a-convolutional-neural-network-for-6d |
Repo | https://github.com/yuxng/PoseCNN |
Framework | tf |
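The translation estimate in the abstract (object centre in the image plus predicted depth) is a pinhole back-projection, and the regressed rotation must be normalized to a unit quaternion. A sketch with hypothetical camera intrinsics:

```python
import math

def backproject(u, v, depth, fx, fy, cx, cy):
    """Recover 3D translation from the object's 2D centre (u, v) and its depth,
    using pinhole intrinsics (focal lengths fx, fy; principal point cx, cy)."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)

def normalize_quat(q):
    """A regressed 4-vector must be unit-norm to represent a rotation."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)

# object centred at the principal point, 2 m away (illustrative intrinsics)
t = backproject(320.0, 240.0, 2.0, 500.0, 500.0, 320.0, 240.0)
q = normalize_quat((1.0, 2.0, 3.0, 4.0))
```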
Reservoir Computing Using Non-Uniform Binary Cellular Automata
Title | Reservoir Computing Using Non-Uniform Binary Cellular Automata |
Authors | Stefano Nichele, Magnus S. Gundersen |
Abstract | The Reservoir Computing (RC) paradigm utilizes a dynamical system, i.e., a reservoir, and a linear classifier, i.e., a read-out layer, to process data from sequential classification tasks. In this paper the usage of Cellular Automata (CA) as a reservoir is investigated. The use of CA in RC has shown promising results. In this paper, selected state-of-the-art experiments are reproduced. It is shown that some CA rules perform better than others, and that reservoir performance improves with the size of the CA reservoir itself. In addition, the usage of parallel loosely coupled CA reservoirs, where each reservoir has a different CA rule, is investigated. The experiments performed on a quasi-uniform CA reservoir provide valuable insights into CA reservoir design. The results herein show that some rules do not work well together, while other combinations work remarkably well. This suggests that non-uniform CA could represent a powerful tool for novel CA reservoir implementations. |
Tasks | |
Published | 2017-02-13 |
URL | http://arxiv.org/abs/1702.03812v1 |
PDF | http://arxiv.org/pdf/1702.03812v1.pdf |
PWC | https://paperswithcode.com/paper/reservoir-computing-using-non-uniform-binary |
Repo | https://github.com/magnusgundersen/spec |
Framework | none |
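A non-uniform CA reservoir can be sketched in a few lines: each cell applies its own elementary-CA rule to its 3-cell neighbourhood, and the read-out layer sees the concatenated states over time. Rule numbers follow Wolfram's encoding; the sizes below are illustrative:

```python
def nonuniform_ca_step(state, rules):
    """One step of a 1-D binary CA where cell i uses its own rule rules[i]."""
    n = len(state)
    return [(rules[i] >> (4 * state[(i - 1) % n] + 2 * state[i] + state[(i + 1) % n])) & 1
            for i in range(n)]

def reservoir_features(state, rules, steps):
    """The read-out layer is trained on the concatenated CA states over time."""
    feats = []
    for _ in range(steps):
        state = nonuniform_ca_step(state, rules)
        feats.extend(state)
    return feats

# uniform rule 90 (neighbour XOR) as a sanity check; a non-uniform reservoir
# would mix different rule numbers in the `rules` list
one_step = nonuniform_ca_step([0, 0, 1, 0, 0], [90] * 5)
feats = reservoir_features([0, 0, 1, 0, 0], [90, 90, 150, 150, 90], 4)
```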
Distance Metric Learning using Graph Convolutional Networks: Application to Functional Brain Networks
Title | Distance Metric Learning using Graph Convolutional Networks: Application to Functional Brain Networks |
Authors | Sofia Ira Ktena, Sarah Parisot, Enzo Ferrante, Martin Rajchl, Matthew Lee, Ben Glocker, Daniel Rueckert |
Abstract | Evaluating similarity between graphs is of major importance in several computer vision and pattern recognition problems, where graph representations are often used to model objects or interactions between elements. The choice of a distance or similarity metric is, however, not trivial and can be highly dependent on the application at hand. In this work, we propose a novel metric learning method to evaluate distance between graphs that leverages the power of convolutional neural networks, while exploiting concepts from spectral graph theory to allow these operations on irregular graphs. We demonstrate the potential of our method in the field of connectomics, where neuronal pathways or functional connections between brain regions are commonly modelled as graphs. In this problem, the definition of an appropriate graph similarity function is critical to unveil patterns of disruptions associated with certain brain disorders. Experimental results on the ABIDE dataset show that our method can learn a graph similarity metric tailored for a clinical application, improving the performance of a simple k-nn classifier by 11.9% compared to a traditional distance metric. |
Tasks | Graph Similarity, Metric Learning |
Published | 2017-03-07 |
URL | http://arxiv.org/abs/1703.02161v2 |
PDF | http://arxiv.org/pdf/1703.02161v2.pdf |
PWC | https://paperswithcode.com/paper/distance-metric-learning-using-graph |
Repo | https://github.com/sheryl-ai/MemGCN |
Framework | tf |
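Spectral graph convolutions of the kind used here are built on the symmetric normalized Laplacian L = I - D^(-1/2) A D^(-1/2), which can be computed directly from an adjacency matrix:

```python
import math

def normalized_laplacian(adj):
    """Symmetric normalized Laplacian of a graph given as an adjacency matrix."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j and deg[i] > 0:
                L[i][j] = 1.0
            elif adj[i][j] and deg[i] > 0 and deg[j] > 0:
                L[i][j] = -adj[i][j] / math.sqrt(deg[i] * deg[j])
    return L

# single edge between two nodes
L = normalized_laplacian([[0, 1], [1, 0]])
```

Polynomial (e.g. Chebyshev) filters in L are what let the convolution operate on irregular graphs such as functional brain networks.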
Free Space Estimation using Occupancy Grids and Dynamic Object Detection
Title | Free Space Estimation using Occupancy Grids and Dynamic Object Detection |
Authors | Raghavender Sahdev |
Abstract | In this paper we present an approach to estimating free space from a stereo image pair using stochastic occupancy grids, in the domain of autonomous driving on the well-known benchmark dataset KITTI. Based on the generated occupancy grids, we then match two image sequences to compute a top-view representation of the map, in order to map the environment: we compute a transformation between the occupancy grids of two successive images and use it to build the top-view map. Two issues that need to be addressed for mapping are discussed: computing the map itself, and dealing with dynamic objects while computing it. Dynamic objects are detected in successive images based on an idea similar to separating foreground objects from background objects using motion flow. A novel RANSAC-based segmentation approach is proposed to address this issue. |
Tasks | Autonomous Driving, Object Detection |
Published | 2017-08-16 |
URL | http://arxiv.org/abs/1708.04989v1 |
PDF | http://arxiv.org/pdf/1708.04989v1.pdf |
PWC | https://paperswithcode.com/paper/free-space-estimation-using-occupancy-grids |
Repo | https://github.com/raghavendersahdev/Free-Space |
Framework | none |
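Stochastic occupancy grids are typically maintained in log-odds form so that successive measurements simply add per cell. A generic sketch; the measurement probability below is an assumption, not the paper's stereo sensor model:

```python
import math

def logodds(p):
    return math.log(p / (1.0 - p))

def prob(l):
    return 1.0 / (1.0 + math.exp(-l))

def update_cell(l, p_hit):
    """Bayesian occupancy update: add the measurement's log-odds to the cell."""
    return l + logodds(p_hit)

l = 0.0                      # prior: p(occupied) = 0.5
for _ in range(2):           # two stereo measurements saying "occupied" w.p. 0.7
    l = update_cell(l, 0.7)
occupied_prob = prob(l)
```

Two independent 0.7 measurements push the cell to about 0.84 occupied, matching the closed-form Bayes update 0.49 / (0.49 + 0.09).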
On the Effects of Batch and Weight Normalization in Generative Adversarial Networks
Title | On the Effects of Batch and Weight Normalization in Generative Adversarial Networks |
Authors | Sitao Xiang, Hao Li |
Abstract | Generative adversarial networks (GANs) are highly effective unsupervised learning frameworks that can generate very sharp data, even for data such as images with complex, highly multimodal distributions. However, GANs are known to be very hard to train, suffering from problems such as mode collapse and disturbing visual artifacts. Batch normalization (BN) techniques have been introduced to address these training difficulties. Though BN accelerates the training in the beginning, our experiments show that the use of BN can be unstable and negatively impact the quality of the trained model. The evaluation of BN and numerous other recent schemes for improving GAN training is hindered by the lack of an effective objective quality measure for GAN models. To address these issues, we first introduce a weight normalization (WN) approach for GAN training that significantly improves the stability, efficiency and the quality of the generated samples. To allow a methodical evaluation, we introduce squared Euclidean reconstruction error on a test set as a new objective measure, to assess training performance in terms of speed, stability, and quality of generated samples. Our experiments with a standard DCGAN architecture on commonly used datasets (CelebA, LSUN bedroom, and CIFAR-10) indicate that training using WN is generally superior to BN for GANs, achieving 10% lower mean squared loss for reconstruction and significantly better qualitative results than BN. We further demonstrate the stability of WN on a 21-layer ResNet trained with the CelebA dataset. The code for this paper is available at https://github.com/stormraiser/gan-weightnorm-resnet |
Tasks | |
Published | 2017-04-13 |
URL | http://arxiv.org/abs/1704.03971v4 |
PDF | http://arxiv.org/pdf/1704.03971v4.pdf |
PWC | https://paperswithcode.com/paper/on-the-effects-of-batch-and-weight |
Repo | https://github.com/nardeas/MHGAN |
Framework | tf |
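Weight normalization reparametrizes each weight vector as w = g * v/||v||, decoupling its direction from its scale; unlike batch normalization it involves no batch statistics, which is why it behaves more stably in GAN training. A minimal sketch:

```python
import math

def weight_norm(v, g):
    """Reparametrize a weight vector: learned direction v, learned scale g.

    The returned vector always has Euclidean norm exactly g, regardless of
    how large or small v is."""
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]

w = weight_norm([3.0, 4.0], 2.0)   # direction (0.6, 0.8), scale 2
```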
The Sample Complexity of Online One-Class Collaborative Filtering
Title | The Sample Complexity of Online One-Class Collaborative Filtering |
Authors | Reinhard Heckel, Kannan Ramchandran |
Abstract | We consider the online one-class collaborative filtering (CF) problem that consists of recommending items to users over time in an online fashion based on positive ratings only. This problem arises when users respond only occasionally to a recommendation with a positive rating, and never with a negative one. We study the impact of the probability of a user responding to a recommendation, p_f, on the sample complexity, i.e., the number of ratings required to make 'good' recommendations, and ask whether receiving positive and negative ratings, instead of positive ratings only, improves the sample complexity. Both questions arise in the design of recommender systems. We introduce a simple probabilistic user model, and analyze the performance of an online user-based CF algorithm. We prove that after an initial cold-start phase, where recommendations are invested in exploring the user's preferences, this algorithm makes perfect recommendations, apart from a fraction of the recommendations required for updating the user's preferences. The number of ratings required for the cold-start phase is nearly proportional to 1/p_f, while the number required for updating the user's preferences is essentially independent of p_f. As a consequence, we find that receiving positive and negative ratings instead of only positive ones improves the number of ratings required for initial exploration by a factor of 1/p_f, which can be significant. |
Tasks | Recommendation Systems |
Published | 2017-05-31 |
URL | http://arxiv.org/abs/1706.00061v1 |
PDF | http://arxiv.org/pdf/1706.00061v1.pdf |
PWC | https://paperswithcode.com/paper/the-sample-complexity-of-online-one-class |
Repo | https://github.com/Atomu2014/product-nets-distributed |
Framework | tf |
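User-based one-class CF works from positive ratings only, e.g. by recommending items liked by the most similar user. A toy sketch with made-up users; the paper's algorithm and its cold-start analysis are far more careful than this:

```python
def jaccard(a, b):
    """Similarity between two users' sets of positively rated items."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(likes, user):
    """Recommend the most similar user's liked items that `user` has not rated.

    `likes` maps each user id to the set of items they rated positively;
    there are no negative ratings in the one-class setting.
    """
    others = [u for u in likes if u != user]
    best = max(others, key=lambda u: jaccard(likes[user], likes[u]))
    return likes[best] - likes[user]

# hypothetical users: u2 shares two likes with u1, u3 shares none
likes = {"u1": {"a", "b", "c"}, "u2": {"a", "b", "d"}, "u3": {"x", "y"}}
recs = recommend(likes, "u1")
```

The cold-start cost the paper quantifies comes from the fact that each of the similarity-estimating recommendations only yields a rating with probability p_f, so roughly 1/p_f times more of them are needed.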