Paper Group AWR 57
Concolic Testing for Deep Neural Networks. Active Neural Localization. Fast Convex Pruning of Deep Neural Networks. A Skeleton-Based Model for Promoting Coherence Among Sentences in Narrative Story Generation. Improving Review Representations with User Attention and Product Attention for Sentiment Classification. dhSegment: A generic deep-learning …
Concolic Testing for Deep Neural Networks
Title | Concolic Testing for Deep Neural Networks |
Authors | Youcheng Sun, Min Wu, Wenjie Ruan, Xiaowei Huang, Marta Kwiatkowska, Daniel Kroening |
Abstract | Concolic testing combines program execution and symbolic analysis to explore the execution paths of a software program. This paper presents the first concolic testing approach for Deep Neural Networks (DNNs). More specifically, we formalise coverage criteria for DNNs that have been studied in the literature, and then develop a coherent method for performing concolic testing to increase test coverage. Our experimental results show the effectiveness of the concolic testing approach in both achieving high coverage and finding adversarial examples. |
Tasks | |
Published | 2018-04-30 |
URL | http://arxiv.org/abs/1805.00089v2 |
PDF | http://arxiv.org/pdf/1805.00089v2.pdf |
PWC | https://paperswithcode.com/paper/concolic-testing-for-deep-neural-networks |
Repo | https://github.com/TrustAI/DeepConcolic |
Framework | tf |
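To make the coverage criteria concrete, here is a minimal sketch of the simplest one, neuron coverage: a neuron counts as covered once at least one test input drives its activation above a threshold. The two-layer ReLU network, random weights, and zero threshold below are illustrative assumptions, not the paper's DeepConcolic implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def neuron_coverage(layers, test_inputs, threshold=0.0):
    """Fraction of hidden neurons driven above `threshold` by at least one test."""
    covered = []
    for W, b in layers:
        acts = relu(test_inputs @ W + b)                # activations of the whole suite
        covered.append((acts > threshold).any(axis=0))  # covered by >= 1 input?
        test_inputs = acts                              # propagate to the next layer
    return np.concatenate(covered).mean()

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(10, 32)), np.zeros(32)),
          (rng.normal(size=(32, 16)), np.zeros(16))]
tests = rng.normal(size=(100, 10))                      # a random test suite
print(f"neuron coverage: {neuron_coverage(layers, tests):.2%}")
```

Concolic testing then searches for new inputs that flip uncovered neurons, which is also where the adversarial examples come from.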
Active Neural Localization
Title | Active Neural Localization |
Authors | Devendra Singh Chaplot, Emilio Parisotto, Ruslan Salakhutdinov |
Abstract | Localization is the problem of estimating the location of an autonomous agent from an observation and a map of the environment. Traditional methods of localization, which filter the belief based on the observations, are sub-optimal in the number of steps required, as they do not decide the actions taken by the agent. We propose “Active Neural Localizer”, a fully differentiable neural network that learns to localize accurately and efficiently. The proposed model incorporates ideas of traditional filtering-based localization methods, by using a structured belief of the state with multiplicative interactions to propagate belief, and combines it with a policy model to localize accurately while minimizing the number of steps required for localization. Active Neural Localizer is trained end-to-end with reinforcement learning. We use a variety of simulation environments for our experiments which include random 2D mazes, random mazes in the Doom game engine and a photo-realistic environment in the Unreal game engine. The results on the 2D environments show the effectiveness of the learned policy in an idealistic setting while results on the 3D environments demonstrate the model’s capability of learning the policy and perceptual model jointly from raw-pixel based RGB observations. We also show that a model trained on random textures in the Doom environment generalizes well to a photo-realistic office space environment in the Unreal engine. |
Tasks | FPS Games, Game of Doom |
Published | 2018-01-24 |
URL | http://arxiv.org/abs/1801.08214v1 |
PDF | http://arxiv.org/pdf/1801.08214v1.pdf |
PWC | https://paperswithcode.com/paper/active-neural-localization |
Repo | https://github.com/devendrachaplot/Neural-Localization |
Framework | pytorch |
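The structured belief with multiplicative interactions reduces to an elementwise Bayes-filter update on a grid. The sketch below is a hand-coded stand-in: in the paper the likelihood comes from a learned perceptual model and the action from a learned policy, whereas the 5x5 maze, deterministic motion model, and peaked likelihood here are assumptions for illustration.

```python
import numpy as np

def belief_update(belief, likelihood):
    """Multiplicative interaction: elementwise product of the propagated belief
    with the observation likelihood, then renormalisation."""
    posterior = belief * likelihood
    return posterior / posterior.sum()

def shift_belief(belief, action):
    """Deterministic motion model on a 2D grid: shift the belief mass by `action`."""
    return np.roll(belief, shift=action, axis=(0, 1))

# Hypothetical 5x5 maze: uniform prior, observation model peaked near cell (2, 3).
belief = np.full((5, 5), 1.0 / 25)
likelihood = np.full((5, 5), 0.05)
likelihood[2, 3] = 1.0
belief = belief_update(shift_belief(belief, (0, 1)), likelihood)
print(belief.round(3))
```

The policy model's job is to choose the `action` sequence that collapses this belief in as few steps as possible.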
Fast Convex Pruning of Deep Neural Networks
Title | Fast Convex Pruning of Deep Neural Networks |
Authors | Alireza Aghasi, Afshin Abdi, Justin Romberg |
Abstract | We develop a fast, tractable technique called Net-Trim for simplifying a trained neural network. The method is a convex post-processing module, which prunes (sparsifies) a trained network layer by layer, while preserving the internal responses. We present a comprehensive analysis of Net-Trim from both the algorithmic and sample complexity standpoints, centered on a fast, scalable convex optimization program. Our analysis includes consistency results between the initial and retrained models before and after Net-Trim application and guarantees on the number of training samples needed to discover a network that can be expressed using a certain number of nonzero terms. Specifically, if there is a set of weights that uses at most $s$ terms that can re-create the layer outputs from the layer inputs, we can find these weights from $\mathcal{O}(s\log(N/s))$ samples, where $N$ is the input size. These theoretical results are similar to those for sparse regression using the Lasso, and our analysis uses some of the same recently developed tools (namely, results on the concentration of measure and convex analysis). Finally, we propose an algorithmic framework based on the alternating direction method of multipliers (ADMM), which allows a fast and simple implementation of Net-Trim for network pruning and compression. |
Tasks | Network Pruning |
Published | 2018-06-17 |
URL | http://arxiv.org/abs/1806.06457v2 |
PDF | http://arxiv.org/pdf/1806.06457v2.pdf |
PWC | https://paperswithcode.com/paper/fast-convex-pruning-of-deep-neural-networks |
Repo | https://github.com/DNNToolBox/Net-Trim-v1 |
Framework | tf |
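Net-Trim's layer-wise convex program has the flavour of a lasso: find the sparsest weights that still reproduce the layer's responses. The sketch below solves an L1-regularised least-squares surrogate with ISTA; the real method uses ADMM and handles the ReLU constraint explicitly, so treat this only as a minimal illustration of the pruning objective, with all data random.

```python
import numpy as np

def soft_threshold(W, t):
    return np.sign(W) * np.maximum(np.abs(W) - t, 0.0)

def prune_layer(X, Y, lam=0.5, steps=500):
    """Lasso-style surrogate for Net-Trim's convex program: find a sparse W whose
    responses X @ W stay close to the original layer responses Y (ISTA iterations)."""
    W = np.zeros((X.shape[1], Y.shape[1]))
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1/L, L = Lipschitz const. of the gradient
    for _ in range(steps):
        grad = X.T @ (X @ W - Y)
        W = soft_threshold(W - step * grad, step * lam)
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                                   # layer inputs
W_true = rng.normal(size=(50, 20)) * (rng.random((50, 20)) < 0.1)  # truly sparse layer
Y = X @ W_true                                                   # layer responses
W_sparse = prune_layer(X, Y)
print("nonzeros:", int((np.abs(W_sparse) > 1e-6).sum()), "of", W_sparse.size)
```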
A Skeleton-Based Model for Promoting Coherence Among Sentences in Narrative Story Generation
Title | A Skeleton-Based Model for Promoting Coherence Among Sentences in Narrative Story Generation |
Authors | Jingjing Xu, Xuancheng Ren, Yi Zhang, Qi Zeng, Xiaoyan Cai, Xu Sun |
Abstract | Narrative story generation is a challenging problem because it requires the generated sentences to have tight semantic connections, which most existing generative models do not handle well. To address this problem, we propose a skeleton-based model to promote the coherence of generated stories. Different from traditional models that generate a complete sentence in one pass, the proposed model first generates the most critical phrases, called the skeleton, and then expands the skeleton into a complete and fluent sentence. The skeleton is not manually defined, but learned by a reinforcement learning method. Compared to state-of-the-art models, our skeleton-based model generates significantly more coherent text according to both human evaluation and automatic evaluation. The G-score is improved by 20.1% in the human evaluation. The code is available at https://github.com/lancopku/Skeleton-Based-Generation-Model |
Tasks | |
Published | 2018-08-21 |
URL | http://arxiv.org/abs/1808.06945v2 |
PDF | http://arxiv.org/pdf/1808.06945v2.pdf |
PWC | https://paperswithcode.com/paper/a-skeleton-based-model-for-promoting |
Repo | https://github.com/lancopku/Skeleton-Based-Generation-Model |
Framework | tf |
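The two-stage pipeline is the key structural idea: produce a skeleton of critical phrases first, then expand it into a fluent sentence. Both stages below are crude hand-written stand-ins (a stopword filter and a joiner) for what the paper learns with reinforcement learning and a trained generator; they show only the shape of the interface.

```python
def extract_skeleton(sentence, stopwords={"the", "a", "an", "to", "of", "and", "was"}):
    """Stage 1 stand-in: keep only the most critical phrases. The paper *learns*
    this step with reinforcement learning; a stopword filter is a crude proxy."""
    return [w for w in sentence.split() if w.lower() not in stopwords]

def expand_skeleton(skeleton):
    """Stage 2 stand-in: expand the skeleton into a fluent sentence. The paper
    uses a trained seq2seq generator; joining tokens just shows the interface."""
    return " ".join(skeleton) + "."

story_line = "The knight rode to the dark castle and found a sleeping dragon"
skeleton = extract_skeleton(story_line)
print("skeleton:", skeleton)
print("expanded:", expand_skeleton(skeleton))
```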
Improving Review Representations with User Attention and Product Attention for Sentiment Classification
Title | Improving Review Representations with User Attention and Product Attention for Sentiment Classification |
Authors | Zhen Wu, Xin-Yu Dai, Cunxiang Yin, Shujian Huang, Jiajun Chen |
Abstract | Neural network methods have achieved great success in review sentiment classification. Recently, some works have achieved further improvement by incorporating user and product information to generate a review representation. However, we observe that in reviews, some words or sentences express a strong user preference, while others tend to indicate a product's characteristics. The two kinds of information play different roles in determining the sentiment label of a review, so it is not reasonable to encode user and product information together into one representation. In this paper, we propose a novel framework to encode user and product information. First, we apply two separate hierarchical neural networks to generate two representations, one with user attention and one with product attention. Then, we design a combined strategy to make full use of the two representations for training and final prediction. The experimental results show that our model clearly outperforms other state-of-the-art methods on the IMDB and Yelp datasets. By visualizing the attention over words related to the user or the product, we validate the observation mentioned above. |
Tasks | Sentiment Analysis |
Published | 2018-01-24 |
URL | http://arxiv.org/abs/1801.07861v1 |
PDF | http://arxiv.org/pdf/1801.07861v1.pdf |
PWC | https://paperswithcode.com/paper/improving-review-representations-with-user |
Repo | https://github.com/wuzhen247/HUAPA |
Framework | tf |
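The separate user and product attentions can be written as the same additive-attention formula conditioned on a different embedding. The sketch below assumes the common parameterisation score_i = v^T tanh(W_h h_i + W_q q); the dimensions, random parameters, and concatenation-based "combined strategy" are illustrative assumptions rather than the paper's exact architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attended_representation(H, q, W_h, W_q, v):
    """Additive attention conditioned on a user (or product) embedding q:
    score_i = v^T tanh(W_h h_i + W_q q); the review vector is the weighted sum."""
    scores = np.tanh(H @ W_h.T + q @ W_q.T) @ v
    alpha = softmax(scores)
    return alpha @ H, alpha

rng = np.random.default_rng(0)
H = rng.normal(size=(12, 64))          # 12 word states from a (hypothetical) encoder
user, product = rng.normal(size=64), rng.normal(size=64)
W_h, W_q, v = rng.normal(size=(64, 64)), rng.normal(size=(64, 64)), rng.normal(size=64)

r_user, _ = attended_representation(H, user, W_h, W_q, v)
r_prod, _ = attended_representation(H, product, W_h, W_q, v)
review_vec = np.concatenate([r_user, r_prod])   # keep both views for prediction
print(review_vec.shape)
```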
dhSegment: A generic deep-learning approach for document segmentation
Title | dhSegment: A generic deep-learning approach for document segmentation |
Authors | Sofia Ares Oliveira, Benoit Seguin, Frederic Kaplan |
Abstract | In recent years, there have been multiple successful attempts at tackling document processing problems separately by designing task-specific hand-tuned strategies. We argue that the diversity of historical document processing tasks makes it impractical to solve them one at a time and shows a need for generic approaches that handle the variability of historical series. In this paper, we address multiple tasks simultaneously, such as page extraction, baseline extraction, layout analysis, and the extraction of multiple typologies of illustrations and photographs. We propose an open-source implementation of a CNN-based pixel-wise predictor coupled with task-dependent post-processing blocks. We show that a single CNN architecture can be used across tasks with competitive results. Moreover, most of the task-specific post-processing steps can be decomposed into a small number of simple, standard, reusable operations, adding to the flexibility of our approach. |
Tasks | |
Published | 2018-04-27 |
URL | https://arxiv.org/abs/1804.10371v2 |
PDF | https://arxiv.org/pdf/1804.10371v2.pdf |
PWC | https://paperswithcode.com/paper/dhsegment-a-generic-deep-learning-approach |
Repo | https://github.com/raphaelBarman/dhSegment |
Framework | tf |
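The "simple and standard reusable operations" the paper decomposes its post-processing into are typically binarisation and connected-component filtering. Here is one such block, assuming a per-pixel probability map as input; the thresholds and the use of scipy.ndimage are illustrative choices, not the dhSegment codebase itself.

```python
import numpy as np
from scipy import ndimage

def extract_regions(prob_map, threshold=0.5, min_area=20):
    """Reusable post-processing ops of the kind the paper composes per task:
    binarise the pixel-wise prediction, then keep connected components by area."""
    binary = prob_map > threshold
    labels, n = ndimage.label(binary)                     # connected components
    sizes = ndimage.sum(binary, labels, range(1, n + 1))  # area of each component
    keep = [i + 1 for i, s in enumerate(sizes) if s >= min_area]
    return np.where(np.isin(labels, keep), labels, 0)

rng = np.random.default_rng(0)
fake_probs = rng.random((64, 64))        # stand-in for the CNN's per-pixel output
regions = extract_regions(fake_probs, threshold=0.9, min_area=5)
print("regions kept:", len(np.unique(regions)) - 1)
```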
The Sound of Pixels
Title | The Sound of Pixels |
Authors | Hang Zhao, Chuang Gan, Andrew Rouditchenko, Carl Vondrick, Josh McDermott, Antonio Torralba |
Abstract | We introduce PixelPlayer, a system that, by leveraging large amounts of unlabeled videos, learns to locate image regions which produce sounds and separate the input sounds into a set of components that represents the sound from each pixel. Our approach capitalizes on the natural synchronization of the visual and audio modalities to learn models that jointly parse sounds and images, without requiring additional manual supervision. Experimental results on a newly collected MUSIC dataset show that our proposed Mix-and-Separate framework outperforms several baselines on source separation. Qualitative results suggest our model learns to ground sounds in vision, enabling applications such as independently adjusting the volume of sound sources. |
Tasks | |
Published | 2018-04-09 |
URL | http://arxiv.org/abs/1804.03160v4 |
PDF | http://arxiv.org/pdf/1804.03160v4.pdf |
PWC | https://paperswithcode.com/paper/the-sound-of-pixels |
Repo | https://github.com/hangzhaomit/Sound-of-Pixels |
Framework | pytorch |
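The Mix-and-Separate trick manufactures supervision from unlabeled video: mix the audio of two clips and ask the network to undo the mix. The sketch below builds the mixture and binary ratio-mask targets; the magnitude-spectrogram shapes and dominance-based masks are common assumptions, and the vision-conditioned network that would predict the masks is omitted.

```python
import numpy as np

def mix_and_separate_targets(spec_a, spec_b):
    """Mix-and-Separate supervision: artificially mix two clips' spectrograms;
    the target binary mask for each source marks where it dominates the mix."""
    mixture = spec_a + spec_b
    mask_a = (spec_a >= spec_b).astype(float)   # binary mask target for source A
    mask_b = 1.0 - mask_a
    return mixture, mask_a, mask_b

rng = np.random.default_rng(0)
spec_a = rng.random((256, 100))    # magnitude spectrogram, clip A
spec_b = rng.random((256, 100))    # magnitude spectrogram, clip B
mixture, mask_a, mask_b = mix_and_separate_targets(spec_a, spec_b)
# A network conditioned on each clip's pixels would be trained to predict
# mask_a / mask_b from `mixture`; applying the mask recovers each source.
print("A recovered ratio:", (mixture * mask_a).sum() / mixture.sum())
```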
AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation
Title | AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation |
Authors | Thibault Groueix, Matthew Fisher, Vladimir G. Kim, Bryan C. Russell, Mathieu Aubry |
Abstract | We introduce a method for learning to generate the surface of 3D shapes. Our approach represents a 3D shape as a collection of parametric surface elements and, in contrast to methods generating voxel grids or point clouds, naturally infers a surface representation of the shape. Beyond its novelty, our new shape generation framework, AtlasNet, comes with significant advantages, such as improved precision and generalization capabilities, and the possibility to generate a shape of arbitrary resolution without memory issues. We demonstrate these benefits and compare to strong baselines on the ShapeNet benchmark for two applications: (i) auto-encoding shapes, and (ii) single-view reconstruction from a still image. We also provide results showing its potential for other applications, such as morphing, parametrization, super-resolution, matching, and co-segmentation. |
Tasks | 3D Surface Generation, Super-Resolution |
Published | 2018-02-15 |
URL | http://arxiv.org/abs/1802.05384v3 |
PDF | http://arxiv.org/pdf/1802.05384v3.pdf |
PWC | https://paperswithcode.com/paper/atlasnet-a-papier-mache-approach-to-learning |
Repo | https://github.com/ThibaultGROUEIX/AtlasNet |
Framework | pytorch |
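AtlasNet's surface element is a small MLP that maps a 2D point sampled from the unit square, concatenated with a shape latent, to a point in 3D; sampling the square more densely yields arbitrarily fine surfaces at no extra memory cost. The one-hidden-layer MLP and random parameters below are assumptions standing in for a trained decoder.

```python
import numpy as np

def atlas_decoder(points_2d, latent, params):
    """One AtlasNet-style surface element: an MLP maps a sampled 2D point,
    concatenated with the shape latent, to a 3D point on the surface."""
    W1, b1, W2, b2 = params
    x = np.concatenate([points_2d, np.tile(latent, (len(points_2d), 1))], axis=1)
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2                       # (n_points, 3) surface samples

rng = np.random.default_rng(0)
latent_dim, hidden = 128, 256
params = (rng.normal(size=(2 + latent_dim, hidden), scale=0.1), np.zeros(hidden),
          rng.normal(size=(hidden, 3), scale=0.1), np.zeros(3))
uv = rng.random((1000, 2))                   # sample the unit square at any density...
surface = atlas_decoder(uv, rng.normal(size=latent_dim), params)
print(surface.shape)                         # ...so output resolution is arbitrary
```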
Audio Tagging With Connectionist Temporal Classification Model Using Sequential Labelled Data
Title | Audio Tagging With Connectionist Temporal Classification Model Using Sequential Labelled Data |
Authors | Yuanbo Hou, Qiuqiang Kong, Shengchen Li |
Abstract | Audio tagging aims to predict one or several labels in an audio clip. Many previous works use weakly labelled data (WLD) for audio tagging, where only presence or absence of sound events is known, but the order of sound events is unknown. To use the order information of sound events, we propose sequential labelled data (SLD), where both the presence or absence and the order information of sound events are known. To utilize SLD in audio tagging, we propose a Convolutional Recurrent Neural Network followed by a Connectionist Temporal Classification (CRNN-CTC) objective function to map from an audio clip spectrogram to SLD. Experiments show that CRNN-CTC obtains an Area Under Curve (AUC) score of 0.986 in audio tagging, outperforming the baseline CRNN of 0.908 and 0.815 with Max Pooling and Average Pooling, respectively. In addition, we show CRNN-CTC has the ability to predict the order of sound events in an audio clip. |
Tasks | Audio Tagging |
Published | 2018-08-06 |
URL | http://arxiv.org/abs/1808.01935v1 |
PDF | http://arxiv.org/pdf/1808.01935v1.pdf |
PWC | https://paperswithcode.com/paper/audio-tagging-with-connectionist-temporal |
Repo | https://github.com/iooops/CS221-Audio-Tagging |
Framework | none |
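The mapping from frame-wise network outputs to an ordered event sequence is exactly what a CTC objective provides. The sketch below wires random logits (standing in for the CRNN) into PyTorch's `nn.CTCLoss`; the dimensions and the three-event targets are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

T, N, C = 50, 4, 7                          # frames, batch, 6 event classes + blank
logits = torch.randn(T, N, C, requires_grad=True)   # stand-in for CRNN outputs
log_probs = logits.log_softmax(dim=-1)

# Sequential labelled data: each clip's ordered event sequence, here 3 events.
targets = torch.randint(low=1, high=C, size=(N, 3))  # label 0 is reserved for blank
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 3, dtype=torch.long)

loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
loss.backward()                             # gradients would train the real CRNN
print(float(loss))
```

At test time, greedy or beam decoding over the frame posteriors recovers the predicted event order, which is how the model predicts the order of sound events.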
Kernel machines that adapt to GPUs for effective large batch training
Title | Kernel machines that adapt to GPUs for effective large batch training |
Authors | Siyuan Ma, Mikhail Belkin |
Abstract | Modern machine learning models are typically trained using Stochastic Gradient Descent (SGD) on massively parallel computing resources such as GPUs. Increasing mini-batch size is a simple and direct way to utilize the parallel computing capacity. For small batches, an increase in batch size results in a proportional reduction in training time, a phenomenon known as linear scaling. However, increasing the batch size beyond a certain value leads to no further improvement in training time. In this paper, we develop the first analytical framework that extends linear scaling to match the parallel computing capacity of a resource. The framework is designed for a class of classical kernel machines. It automatically modifies a standard kernel machine to output a mathematically equivalent prediction function, yet allowing for extended linear scaling, i.e., higher effective parallelization and faster training time on given hardware. The resulting algorithms are accurate, principled and very fast. For example, using a single Titan Xp GPU, training on ImageNet with $1.3\times 10^6$ data points and $1000$ labels takes under an hour, while smaller datasets, such as MNIST, take seconds. As the parameters are chosen analytically, based on theoretical bounds, little tuning beyond selecting the kernel and the kernel parameter is needed, further facilitating the practical use of these methods. |
Tasks | |
Published | 2018-06-15 |
URL | http://arxiv.org/abs/1806.06144v3 |
PDF | http://arxiv.org/pdf/1806.06144v3.pdf |
PWC | https://paperswithcode.com/paper/kernel-machines-that-adapt-to-gpus-for |
Repo | https://github.com/EigenPro/EigenPro2 |
Framework | tf |
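The underlying observation is that the top eigenvalues of the kernel matrix cap the stable step size; damping those eigen-directions flattens the spectrum so that larger steps (and hence larger batches) remain stable. The numpy sketch below applies that idea with a full eigendecomposition and full-gradient iterations, whereas EigenPro estimates the preconditioner from a subsample and runs mini-batch SGD, so this is only a conceptual illustration.

```python
import numpy as np

def gaussian_kernel(X, Z, bw=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bw ** 2))

rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 10)), rng.normal(size=500)
K = gaussian_kernel(X, X)

k = 10                                      # number of eigen-directions to damp
vals, vecs = np.linalg.eigh(K)              # ascending eigenvalues
step = 1.0 / vals[-k - 1]                   # big step: stable once top-k are damped

alpha = np.zeros(500)
for _ in range(100):                        # full-gradient Richardson iteration
    grad = K @ alpha - y                    # residual of the linear system K a = y
    coef = vecs.T @ grad                    # express it in the eigenbasis
    coef[-k:] *= vals[-k - 1] / vals[-k:]   # damp top-k directions (preconditioner)
    alpha -= step * (vecs @ coef)
print("train MSE:", float(np.mean((K @ alpha - y) ** 2)))
```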
Deepcode: Feedback Codes via Deep Learning
Title | Deepcode: Feedback Codes via Deep Learning |
Authors | Hyeji Kim, Yihan Jiang, Sreeram Kannan, Sewoong Oh, Pramod Viswanath |
Abstract | The design of codes for communicating reliably over a statistically well defined channel is an important endeavor involving deep mathematical research and wide-ranging practical applications. In this work, we present the first family of codes obtained via deep learning, which significantly beats state-of-the-art codes designed over several decades of research. The communication channel under consideration is the Gaussian noise channel with feedback, whose study was initiated by Shannon; feedback is known theoretically to improve reliability of communication, but no practical codes that do so have ever been successfully constructed. We break this logjam by integrating information theoretic insights harmoniously with recurrent-neural-network based encoders and decoders to create novel codes that outperform known codes by 3 orders of magnitude in reliability. We also demonstrate several desirable properties of the codes: (a) generalization to larger block lengths, (b) composability with known codes, (c) adaptation to practical constraints. This result also has broader ramifications for coding theory: even when the channel has a clear mathematical model, deep learning methodologies, when combined with channel-specific information-theoretic insights, can potentially beat state-of-the-art codes constructed over decades of mathematical research. |
Tasks | |
Published | 2018-07-02 |
URL | http://arxiv.org/abs/1807.00801v1 |
PDF | http://arxiv.org/pdf/1807.00801v1.pdf |
PWC | https://paperswithcode.com/paper/deepcode-feedback-codes-via-deep-learning |
Repo | https://github.com/hyejikim1/Deepcode |
Framework | tf |
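What the learned encoder exploits is the channel's structure: each transmitted symbol is corrupted by Gaussian noise, and the noisy observation is fed back so later transmissions can correct earlier errors. The loop below simulates that interface with a hand-coded linear encoder in place of Deepcode's RNN; the feedback coefficient, noise level, and per-bit decoding are arbitrary assumptions for illustration.

```python
import numpy as np

def awgn_feedback_round(symbol, noise_std, rng):
    """One use of the feedback channel: the decoder receives symbol + noise, and
    the noisy value is fed back so the encoder can react in later rounds."""
    received = symbol + rng.normal(scale=noise_std)
    return received, received   # (decoder observation, feedback to encoder)

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=8)
feedback, received = 0.0, []
for b in bits:
    # A Deepcode-style RNN encoder would compute the transmit symbol from the
    # current bit *and* the feedback history; a linear stand-in keeps this runnable.
    tx = (2 * b - 1) - 0.1 * feedback
    obs, feedback = awgn_feedback_round(tx, noise_std=0.5, rng=rng)
    received.append(obs)
decoded = (np.array(received) > 0).astype(int)
print("bit errors:", int((decoded != bits).sum()))
```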
Bonnet: An Open-Source Training and Deployment Framework for Semantic Segmentation in Robotics using CNNs
Title | Bonnet: An Open-Source Training and Deployment Framework for Semantic Segmentation in Robotics using CNNs |
Authors | Andres Milioto, Cyrill Stachniss |
Abstract | The ability to interpret a scene is an important capability for a robot that is supposed to interact with its environment. The knowledge of what is in front of the robot is, for example, relevant for navigation, manipulation, or planning. Semantic segmentation labels each pixel of an image with a class label and thus provides a detailed semantic annotation of the surroundings to the robot. Convolutional neural networks (CNNs) are popular methods for addressing this type of problem. The available software for training and the integration of CNNs for real robots, however, is quite fragmented and often difficult to use for non-experts, despite the availability of several high-quality open-source frameworks for neural network implementation and training. In this paper, we propose a tool called Bonnet, which addresses this fragmentation problem by building a higher abstraction that is specific for the semantic segmentation task. It provides a modular approach to simplify the training of a semantic segmentation CNN independently of the used dataset and the intended task. Furthermore, we also address the deployment on a real robotic platform. Thus, we do not propose a new CNN approach in this paper. Instead, we provide a stable and easy-to-use tool to make this technology more approachable in the context of autonomous systems. In this sense, we aim at closing a gap between computer vision research and its use in robotics research. We provide an open-source codebase for training and deployment. The training interface is implemented in Python using TensorFlow and the deployment interface provides a C++ library that can be easily integrated in an existing robotics codebase, a ROS node, and two standalone applications for label prediction in images and videos. |
Tasks | Semantic Segmentation |
Published | 2018-02-25 |
URL | http://arxiv.org/abs/1802.08960v2 |
PDF | http://arxiv.org/pdf/1802.08960v2.pdf |
PWC | https://paperswithcode.com/paper/bonnet-an-open-source-training-and-deployment |
Repo | https://github.com/PRBonn/bonnet |
Framework | tf |
A near Pareto optimal approach to student-supervisor allocation with two sided preferences and workload balance
Title | A near Pareto optimal approach to student-supervisor allocation with two sided preferences and workload balance |
Authors | Victor Sanchez-Anguix, Rithin Chalumuri, Reyhan Aydogan, Vicente Julian |
Abstract | The problem of allocating students to supervisors for the development of a personal project or a dissertation is a crucial activity in the higher education environment, as it enables students to get feedback on their work from an expert and improve their personal, academic, and professional abilities. In this article, we propose a multi-objective and near Pareto optimal genetic algorithm for the allocation of students to supervisors. The allocation takes into consideration the students' and supervisors' preferences on research/project topics, the lower and upper supervision quotas of supervisors, as well as the workload balance amongst supervisors. We introduce novel mutation and crossover operators for the student-supervisor allocation problem. The experiments carried out show that the components of the genetic algorithm are better suited to the problem than classic components, and that the genetic algorithm is capable of producing allocations that are near Pareto optimal in a reasonable time. |
Tasks | |
Published | 2018-12-16 |
URL | http://arxiv.org/abs/1812.06474v1 |
PDF | http://arxiv.org/pdf/1812.06474v1.pdf |
PWC | https://paperswithcode.com/paper/a-near-pareto-optimal-approach-to-student |
Repo | https://github.com/rithinch/Student-Supervisor-Allocation |
Framework | none |
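The fitness such a genetic algorithm optimises must trade off both sides' preferences against workload balance under quotas. The sketch below scalarises those objectives with an arbitrary weight and uses a generic reassignment mutation, where the paper keeps the objectives separate (hence near Pareto optimal) and designs problem-specific mutation and crossover operators; all data here is random.

```python
import numpy as np

def fitness(alloc, s_pref, t_pref, quotas):
    """Weighted sum of both sides' preference satisfaction and workload balance
    (the paper treats these as separate objectives; the 10.0 weight is arbitrary)."""
    pref = sum(s_pref[s, alloc[s]] + t_pref[alloc[s], s] for s in range(len(alloc)))
    loads = np.bincount(alloc, minlength=len(quotas))
    return pref - 10.0 * np.std(loads / quotas)   # penalise uneven relative load

rng = np.random.default_rng(0)
n_students, n_sup = 20, 5
s_pref = rng.random((n_students, n_sup))          # student -> supervisor scores
t_pref = rng.random((n_sup, n_students))          # supervisor -> student scores
quotas = np.full(n_sup, 4)

pop = [rng.integers(n_sup, size=n_students) for _ in range(30)]
for _ in range(300):                              # simple steady-state GA
    scores = [fitness(a, s_pref, t_pref, quotas) for a in pop]
    child = pop[int(np.argmax(scores))].copy()
    child[rng.integers(n_students)] = rng.integers(n_sup)   # generic mutation
    pop[int(np.argmin(scores))] = child           # replace the worst individual
best = max(pop, key=lambda a: fitness(a, s_pref, t_pref, quotas))
print("best fitness:", round(float(fitness(best, s_pref, t_pref, quotas)), 2))
```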
Towards a Better Metric for Evaluating Question Generation Systems
Title | Towards a Better Metric for Evaluating Question Generation Systems |
Authors | Preksha Nema, Mitesh M. Khapra |
Abstract | There has always been criticism for using $n$-gram based similarity metrics, such as BLEU, NIST, etc, for evaluating the performance of NLG systems. However, these metrics continue to remain popular and are recently being used for evaluating the performance of systems which automatically generate questions from documents, knowledge graphs, images, etc. Given the rising interest in such automatic question generation (AQG) systems, it is important to objectively examine whether these metrics are suitable for this task. In particular, it is important to verify whether such metrics used for evaluating AQG systems focus on answerability of the generated question by preferring questions which contain all relevant information such as question type (Wh-types), entities, relations, etc. In this work, we show that current automatic evaluation metrics based on $n$-gram similarity do not always correlate well with human judgments about answerability of a question. To alleviate this problem and as a first step towards better evaluation metrics for AQG, we introduce a scoring function to capture answerability and show that when this scoring function is integrated with existing metrics, they correlate significantly better with human judgments. The scripts and data developed as a part of this work are made publicly available at https://github.com/PrekshaNema25/Answerability-Metric |
Tasks | Knowledge Graphs, Question Generation |
Published | 2018-08-30 |
URL | http://arxiv.org/abs/1808.10192v2 |
PDF | http://arxiv.org/pdf/1808.10192v2.pdf |
PWC | https://paperswithcode.com/paper/towards-a-better-metric-for-evaluating |
Repo | https://github.com/PrekshaNema25/Answerability-Metric |
Framework | none |
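The proposed fix has two parts: a scoring function that rewards answerability-relevant content, and an interpolation with an existing n-gram metric. The toy sketch below uses question-word overlap as a stand-in for the paper's scoring function (which also weights entities and relations), and the relevant-token set and interpolation weight are assumptions.

```python
def answerability(hypothesis, reference,
                  relevant={"who", "what", "where", "when", "why", "how"}):
    """Toy answerability score: overlap on answerability-relevant tokens
    (question words here; the paper also weights entities and relations)."""
    hyp, ref = set(hypothesis.lower().split()), set(reference.lower().split())
    rel_ref = ref & relevant
    return len(hyp & rel_ref) / len(rel_ref) if rel_ref else 0.0

def combined_metric(hypothesis, reference, base_score, alpha=0.7):
    """The integration scheme: interpolate an existing n-gram metric
    (e.g. BLEU, passed in as base_score) with the answerability score."""
    return alpha * answerability(hypothesis, reference) + (1 - alpha) * base_score

ref = "Who wrote the novel War and Peace ?"
hyp = "Who was the author of War and Peace ?"
print(combined_metric(hyp, ref, base_score=0.42))
```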
Boosting Black Box Variational Inference
Title | Boosting Black Box Variational Inference |
Authors | Francesco Locatello, Gideon Dresdner, Rajiv Khanna, Isabel Valera, Gunnar Rätsch |
Abstract | Approximating a probability density in a tractable manner is a central task in Bayesian statistics. Variational Inference (VI) is a popular technique that achieves tractability by choosing a relatively simple variational family. Borrowing ideas from the classic boosting framework, recent approaches attempt to \emph{boost} VI by replacing the selection of a single density with a greedily constructed mixture of densities. In order to guarantee convergence, previous works impose stringent assumptions that require significant effort for practitioners. Specifically, they require a custom implementation of the greedy step (called the LMO) for every probabilistic model with respect to an unnatural variational family of truncated distributions. Our work fixes these issues with novel theoretical and algorithmic insights. On the theoretical side, we show that boosting VI satisfies a relaxed smoothness assumption which is sufficient for the convergence of the functional Frank-Wolfe (FW) algorithm. Furthermore, we rephrase the LMO problem and propose to maximize the Residual ELBO (RELBO) which replaces the standard ELBO optimization in VI. These theoretical enhancements allow for black box implementation of the boosting subroutine. Finally, we present a stopping criterion drawn from the duality gap in the classic FW analyses and exhaustive experiments to illustrate the usefulness of our theoretical and algorithmic contributions. |
Tasks | |
Published | 2018-06-06 |
URL | http://arxiv.org/abs/1806.02185v5 |
PDF | http://arxiv.org/pdf/1806.02185v5.pdf |
PWC | https://paperswithcode.com/paper/boosting-black-box-variational-inference |
Repo | https://github.com/ratschlab/boosting-bbvi |
Framework | tf |
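The boosting loop itself is short: at each round, a linear minimisation oracle (LMO) proposes a new mixture component fitted to what the current approximation misses, and a Frank-Wolfe step blends it in. The 1D grid-search LMO below is a crude stand-in for maximising the paper's RELBO, and the bimodal target, Gaussian component family, and 2/(t+1) step size are illustrative assumptions.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

grid = np.linspace(-6, 6, 1000)
target = 0.5 * normal_pdf(grid, -2, 0.7) + 0.5 * normal_pdf(grid, 2, 0.7)  # bimodal p

mixture = np.zeros_like(grid)
for t in range(1, 6):
    # LMO stand-in: grid-search the component that best covers p's residual mass.
    # (The paper's RELBO makes this step black-box; exact residual fitting is
    # tractable here only because the toy problem is one-dimensional.)
    best = max(((mu, s) for mu in np.linspace(-4, 4, 33) for s in (0.5, 1.0)),
               key=lambda p_: np.sum(np.maximum(target - mixture, 0) *
                                     normal_pdf(grid, *p_)))
    gamma = 2.0 / (t + 1)                    # classic Frank-Wolfe step size
    mixture = (1 - gamma) * mixture + gamma * normal_pdf(grid, *best)

dx = grid[1] - grid[0]
print("L1 error:", float(np.sum(np.abs(mixture - target)) * dx))
```

In the paper, the duality gap from the Frank-Wolfe analysis gives a principled stopping criterion in place of the fixed five rounds used here.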