January 29, 2020

Paper Group ANR 547

Learning to Make Generalizable and Diverse Predictions for Retrosynthesis

Title Learning to Make Generalizable and Diverse Predictions for Retrosynthesis
Authors Benson Chen, Tianxiao Shen, Tommi S. Jaakkola, Regina Barzilay
Abstract We propose a new model for making generalizable and diverse retrosynthetic reaction predictions. Given a target compound, the task is to predict the likely chemical reactants to produce the target. This generative task can be framed as a sequence-to-sequence problem by using the SMILES representations of the molecules. Building on top of the popular Transformer architecture, we propose two novel pre-training methods that construct relevant auxiliary tasks (plausible reactions) for our problem. Furthermore, we incorporate a discrete latent variable model into the architecture to encourage the model to produce a diverse set of alternative predictions. On the 50k subset of reaction examples from the United States patent literature (USPTO-50k) benchmark dataset, our model greatly improves performance over the baseline, while also generating predictions that are more diverse.
Tasks
Published 2019-10-21
URL https://arxiv.org/abs/1910.09688v1
PDF https://arxiv.org/pdf/1910.09688v1.pdf
PWC https://paperswithcode.com/paper/learning-to-make-generalizable-and-diverse
Repo
Framework

Animating Landscape: Self-Supervised Learning of Decoupled Motion and Appearance for Single-Image Video Synthesis

Title Animating Landscape: Self-Supervised Learning of Decoupled Motion and Appearance for Single-Image Video Synthesis
Authors Yuki Endo, Yoshihiro Kanamori, Shigeru Kuriyama
Abstract Automatic generation of a high-quality video from a single image remains a challenging task despite the recent advances in deep generative models. This paper proposes a method that can create a high-resolution, long-term animation using convolutional neural networks (CNNs) from a single landscape image where we mainly focus on skies and waters. Our key observation is that the motion (e.g., moving clouds) and appearance (e.g., time-varying colors in the sky) in natural scenes have different time scales. We thus learn them separately and predict them with decoupled control while handling future uncertainty in both predictions by introducing latent codes. Unlike previous methods that infer output frames directly, our CNNs predict spatially-smooth intermediate data, i.e., for motion, flow fields for warping, and for appearance, color transfer maps, via self-supervised learning, i.e., without explicitly-provided ground truth. These intermediate data are applied not to each previous output frame, but to the input image only once for each output frame. This design is crucial to alleviate error accumulation in long-term predictions, which is the essential problem in previous recurrent approaches. The output frames can be looped like a cinemagraph, and can also be controlled directly by specifying latent codes or indirectly via visual annotations. We demonstrate the effectiveness of our method through comparisons with state-of-the-art methods on video prediction as well as appearance manipulation.
Tasks Video Prediction
Published 2019-10-16
URL https://arxiv.org/abs/1910.07192v1
PDF https://arxiv.org/pdf/1910.07192v1.pdf
PWC https://paperswithcode.com/paper/animating-landscape-self-supervised-learning
Repo
Framework

Omnipush: accurate, diverse, real-world dataset of pushing dynamics with RGB-D video

Title Omnipush: accurate, diverse, real-world dataset of pushing dynamics with RGB-D video
Authors Maria Bauza, Ferran Alet, Yen-Chen Lin, Tomas Lozano-Perez, Leslie P. Kaelbling, Phillip Isola, Alberto Rodriguez
Abstract Pushing is a fundamental robotic skill. Existing work has shown how to exploit models of pushing to achieve a variety of tasks, including grasping under uncertainty, in-hand manipulation and clearing clutter. Such models, however, are approximate, which limits their applicability. Learning-based methods can reason directly from raw sensory data with accuracy, and have the potential to generalize to a wider diversity of scenarios. However, developing and testing such methods requires rich-enough datasets. In this paper we introduce Omnipush, a dataset with a high variety of planar pushing behavior. In particular, we provide 250 pushes for each of 250 objects, all recorded with RGB-D and a high precision tracking system. The objects are constructed so as to systematically explore key factors that affect pushing (the shape of the object and its mass distribution), which have not been broadly explored in previous datasets, and allow the study of generalization in model learning. Omnipush includes a benchmark for meta-learning dynamic models, which requires algorithms that make good predictions and estimate their own uncertainty. We also provide an RGB video prediction benchmark and propose other relevant tasks for which this dataset is suited. Data and code are available at https://web.mit.edu/mcube/omnipush-dataset/.
Tasks Meta-Learning, Video Prediction
Published 2019-10-01
URL https://arxiv.org/abs/1910.00618v1
PDF https://arxiv.org/pdf/1910.00618v1.pdf
PWC https://paperswithcode.com/paper/omnipush-accurate-diverse-real-world-dataset
Repo
Framework

Architecture-aware Network Pruning for Vision Quality Applications

Title Architecture-aware Network Pruning for Vision Quality Applications
Authors Wei-Ting Wang, Han-Lin Li, Wei-Shiang Lin, Cheng-Ming Chiang, Yi-Min Tsai
Abstract Convolutional neural networks (CNNs) deliver impressive results in computer vision and machine learning. However, CNNs incur high computational complexity, especially for vision quality applications because of large image resolution. In this paper, we propose an iterative architecture-aware pruning algorithm with an adaptive magnitude threshold that cooperates with quality-metric measurement simultaneously. We show the performance improvement on vision quality applications and provide a comprehensive analysis with flexible pruning configurations. With the proposed method, the Multiply-Accumulate (MAC) operations of state-of-the-art low-light imaging (SID) and super-resolution (EDSR) networks are reduced by 58% and 37%, respectively, without quality drop. The memory bandwidth (BW) requirements of convolutional layers can also be reduced by 20% to 40%.
Tasks Network Pruning, Super-Resolution
Published 2019-08-05
URL https://arxiv.org/abs/1908.02125v1
PDF https://arxiv.org/pdf/1908.02125v1.pdf
PWC https://paperswithcode.com/paper/architecture-aware-network-pruning-for-vision
Repo
Framework
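
The abstract above refers to pruning with a magnitude threshold. As a rough illustration of plain magnitude pruning only (not the authors' iterative, architecture-aware algorithm, whose details are not given here), the sketch below zeroes the smallest-magnitude weights of a flat weight list until a target sparsity is reached. The function name `magnitude_prune` and the `sparsity` parameter are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Hypothetical helper for illustration; a real pruning pipeline would also
    re-train the network and monitor a quality metric between pruning steps.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # Indices ordered by absolute value; the first n_prune are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


layer = [0.5, -0.1, 0.05, -0.9]
print(magnitude_prune(layer, 0.5))
```

Choosing the threshold by target sparsity rather than by a fixed value keeps the sketch independent of the weight scale; an adaptive scheme would adjust that target per layer.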

SPFlow: An Easy and Extensible Library for Deep Probabilistic Learning using Sum-Product Networks

Title SPFlow: An Easy and Extensible Library for Deep Probabilistic Learning using Sum-Product Networks
Authors Alejandro Molina, Antonio Vergari, Karl Stelzner, Robert Peharz, Pranav Subramani, Nicola Di Mauro, Pascal Poupart, Kristian Kersting
Abstract We introduce SPFlow, an open-source Python library providing a simple interface to inference, learning and manipulation routines for deep and tractable probabilistic models called Sum-Product Networks (SPNs). The library allows one to quickly create SPNs both from data and through a domain specific language (DSL). It efficiently implements several probabilistic inference routines like computing marginals, conditionals and (approximate) most probable explanations (MPEs) along with sampling as well as utilities for serializing, plotting and structure statistics on an SPN. Moreover, many of the algorithms proposed in the literature to learn the structure and parameters of SPNs are readily available in SPFlow. Furthermore, SPFlow is extremely extensible and customizable, allowing users to promptly distill new inference and learning routines by injecting custom code into a lightweight functional-oriented API framework. This is achieved in SPFlow by keeping an internal Python representation of the graph structure that also enables practical compilation of an SPN into a TensorFlow graph, C, CUDA or FPGA custom code, significantly speeding-up computations.
Tasks
Published 2019-01-11
URL http://arxiv.org/abs/1901.03704v1
PDF http://arxiv.org/pdf/1901.03704v1.pdf
PWC https://paperswithcode.com/paper/spflow-an-easy-and-extensible-library-for
Repo
Framework
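
To make the idea of tractable inference in a Sum-Product Network concrete, here is a minimal pure-Python sketch. It deliberately does not use the SPFlow API; the classes `Leaf`, `Product` and `Sum` are hypothetical names for illustration. Marginalizing a variable amounts to letting its leaf evaluate to 1, so a marginal costs a single bottom-up pass.

```python
class Leaf:
    """Bernoulli leaf over one binary variable (illustrative, not SPFlow)."""

    def __init__(self, var, p_true):
        self.var = var
        self.p_true = p_true

    def value(self, evidence):
        x = evidence.get(self.var)
        if x is None:
            # Variable not observed: it is marginalized out.
            return 1.0
        return self.p_true if x else 1.0 - self.p_true


class Product:
    """Product node: children cover disjoint sets of variables."""

    def __init__(self, children):
        self.children = children

    def value(self, evidence):
        result = 1.0
        for child in self.children:
            result *= child.value(evidence)
        return result


class Sum:
    """Sum node: a mixture of children with weights that sum to one."""

    def __init__(self, weighted_children):
        self.weighted_children = weighted_children

    def value(self, evidence):
        return sum(w * c.value(evidence) for w, c in self.weighted_children)


# A two-component mixture over binary variables "a" and "b".
spn = Sum([
    (0.6, Product([Leaf("a", 0.9), Leaf("b", 0.2)])),
    (0.4, Product([Leaf("a", 0.1), Leaf("b", 0.7)])),
])

joint = spn.value({"a": True, "b": True})   # P(a=1, b=1)
marginal = spn.value({"a": True})           # P(a=1), b summed out
print(joint, marginal)
```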

Dual Directed Capsule Network for Very Low Resolution Image Recognition

Title Dual Directed Capsule Network for Very Low Resolution Image Recognition
Authors Maneet Singh, Shruti Nagpal, Richa Singh, Mayank Vatsa
Abstract Very low resolution (VLR) image recognition corresponds to classifying images with resolution 16x16 or less. Though it has widespread applicability when objects are captured at a very large stand-off distance (e.g. surveillance scenario) or from wide angle mobile cameras, it has received limited attention. This research presents a novel Dual Directed Capsule Network model, termed as DirectCapsNet, for addressing VLR digit and face recognition. The proposed architecture utilizes a combination of capsule and convolutional layers for learning an effective VLR recognition model. The architecture also incorporates two novel loss functions: (i) the proposed HR-anchor loss and (ii) the proposed targeted reconstruction loss, in order to overcome the challenges of limited information content in VLR images. The proposed losses use high resolution images as auxiliary data during training to “direct” discriminative feature learning. Multiple experiments for VLR digit classification and VLR face recognition are performed along with comparisons with state-of-the-art algorithms. The proposed DirectCapsNet consistently showcases state-of-the-art results; for example, on the UCCS face database, it shows over 95% face recognition accuracy when 16x16 images are matched with 80x80 images.
Tasks Face Recognition
Published 2019-08-27
URL https://arxiv.org/abs/1908.10027v1
PDF https://arxiv.org/pdf/1908.10027v1.pdf
PWC https://paperswithcode.com/paper/dual-directed-capsule-network-for-very-low
Repo
Framework

A Semantic-based Medical Image Fusion Approach

Title A Semantic-based Medical Image Fusion Approach
Authors Fanda Fan, Yunyou Huang, Lei Wang, Xingwang Xiong, Zihan Jiang, Zhifei Zhang, Jianfeng Zhan
Abstract It is necessary for clinicians to comprehensively analyze patient information from different sources. Medical image fusion is a promising approach to providing overall information from medical images of different modalities. However, existing medical image fusion approaches ignore the semantics of images, making the fused image difficult to understand. In this work, we propose a new evaluation index to measure the semantic loss of the fused image, and put forward a Fusion W-Net (FW-Net) for multimodal medical image fusion. The experimental results are promising: the fused image generated by our approach greatly reduces the semantic information loss, and has better visual effects than five state-of-the-art approaches. Our approach and tool have great potential to be applied in the clinical setting.
Tasks
Published 2019-06-01
URL https://arxiv.org/abs/1906.00225v2
PDF https://arxiv.org/pdf/1906.00225v2.pdf
PWC https://paperswithcode.com/paper/190600225
Repo
Framework

Integration of returns and decomposition of customer orders in e-commerce warehouses

Title Integration of returns and decomposition of customer orders in e-commerce warehouses
Authors Albert H. Schrotenboer, Susanne Wruck, Iris F. A. Vis, Kees Jan Roodbergen
Abstract In picker-to-parts warehouses, order picking is a cost- and labor-intensive operation that must be designed efficiently. It comprises the construction of order batches and the associated order picker routes, and the assignment and sequencing of those batches to multiple order pickers. The ever-increasing competitiveness among e-commerce companies has made the joint optimization of this order picking process inevitable. Inspired by the large number of product returns and the many but small-sized customer orders, we address a new integrated order picking process problem. We integrate the restocking of returned products into regular order picking routes and we allow for the decomposition of customer orders so that multiple batches may contain products from the same customer order. We thereby generalize the existing models on order picking processing. We provide Mixed Integer Programming (MIP) formulations and a tailored adaptive large neighborhood search (ALNS) heuristic that, amongst others, exploits these MIPs. We propose a new set of practically-sized benchmark instances, consisting of up to 5547 products to be picked and 2491 products to be restocked. On those large-scale instances, we show that integrating the restocking of returned products into regular order picker routes results in cost savings of 10 to 15%. Allowing for the decomposition of the customer orders' products results in cost savings of up to 44% compared to not allowing this. Finally, we show that average cost savings of 17.4% can be obtained by using our ALNS instead of heuristics typically used in practice.
Tasks
Published 2019-09-01
URL https://arxiv.org/abs/1909.01794v1
PDF https://arxiv.org/pdf/1909.01794v1.pdf
PWC https://paperswithcode.com/paper/integration-of-returns-and-decomposition-of
Repo
Framework

Subjective Sentiment Analysis for Arabic Newswire Comments

Title Subjective Sentiment Analysis for Arabic Newswire Comments
Authors Sadik Bessou, Rania Aberkane
Abstract This paper presents an approach based on supervised machine learning methods to discriminate between positive, negative and neutral Arabic reviews in online newswire. The corpus is labeled for subjectivity and sentiment analysis (SSA) at the sentence level. The model uses both count and TF-IDF representations and applies six machine learning algorithms (Multinomial Naive Bayes, Support Vector Machines (SVM), Random Forest, Logistic Regression, Multi-layer Perceptron and k-nearest neighbors) using uni-gram and bi-gram features, with the goal of extracting users' sentiment from written text. Experimental results showed that n-gram features could substantially improve performance, and that the Multinomial Naive Bayes approach is the most accurate in predicting topic polarity. Best results were achieved using count vectors trained on a combination of word-based uni-grams and bi-grams, with an overall accuracy of 85.57% over two classes and 65.64% over three classes.
Tasks Sentiment Analysis
Published 2019-11-09
URL https://arxiv.org/abs/1911.03776v1
PDF https://arxiv.org/pdf/1911.03776v1.pdf
PWC https://paperswithcode.com/paper/subjective-sentiment-analysis-for-arabic
Repo
Framework
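
The count representation over word uni-grams and bi-grams mentioned in the abstract above can be sketched in a few lines. This is a minimal, standard-library illustration and not the authors' pipeline; the helper name `ngram_counts` and the English example sentence are assumptions made for readability.

```python
from collections import Counter


def ngram_counts(tokens, n_values=(1, 2)):
    """Count word n-grams for each n in `n_values` (uni- and bi-grams by default)."""
    counts = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


features = ngram_counts("the service was not good".split())
print(features["not good"], features["good"])
```

Bi-grams such as "not good" let a classifier see negation that uni-gram counts alone would miss, which is one plausible reason n-gram features help in sentiment tasks.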

Manifestation of Image Contrast in Deep Networks

Title Manifestation of Image Contrast in Deep Networks
Authors Arash Akbarinia, Karl R. Gegenfurtner
Abstract Contrast is subject to dramatic changes across the visual field, depending on the source of light and scene configurations. Hence, the human visual system has evolved to be more sensitive to contrast than absolute luminance. This feature is equally desired for machine vision: the ability to recognise patterns even when aspects of them are transformed due to variation in local and global contrast. In this work, we thoroughly investigate the impact of image contrast on prominent deep convolutional networks, both during the training and testing phase. The results of conducted experiments testify to an evident deterioration in the accuracy of all state-of-the-art networks on low-contrast images. We demonstrate that “contrast-augmentation” is a sufficient condition to endow a network with invariance to contrast. This practice shows no negative side effects; quite the contrary, it might allow a model to refrain from other illuminance-related over-fittings. This ability can also be achieved by a short fine-tuning procedure, which opens new lines of investigation on mechanisms involved in two networks whose weights are over 99.9% correlated, yet astonishingly produce utterly different outcomes. Our further analysis suggests that the optimisation algorithm is an influential factor, however with a significantly lower effect; and while the choice of an architecture manifests a negligible impact on this phenomenon, the first layers appear to be more critical.
Tasks
Published 2019-02-12
URL http://arxiv.org/abs/1902.04378v1
PDF http://arxiv.org/pdf/1902.04378v1.pdf
PWC https://paperswithcode.com/paper/manifestation-of-image-contrast-in-deep
Repo
Framework

A Corpus for Modeling User and Language Effects in Argumentation on Online Debating

Title A Corpus for Modeling User and Language Effects in Argumentation on Online Debating
Authors Esin Durmus, Claire Cardie
Abstract Existing argumentation datasets have succeeded in allowing researchers to develop computational methods for analyzing the content, structure and linguistic features of argumentative text. They have been much less successful in fostering studies of the effect of “user” traits (characteristics and beliefs of the participants) on the debate/argument outcome as this type of user information is generally not available. This paper presents a dataset of 78,376 debates generated over a 10-year period along with surprisingly comprehensive participant profiles. We also complete an example study using the dataset to analyze the effect of selected user traits on the debate outcome in comparison to the linguistic features typically employed in studies of this kind.
Tasks
Published 2019-06-26
URL https://arxiv.org/abs/1906.11310v1
PDF https://arxiv.org/pdf/1906.11310v1.pdf
PWC https://paperswithcode.com/paper/a-corpus-for-modeling-user-and-language
Repo
Framework

CRNet: Image Super-Resolution Using A Convolutional Sparse Coding Inspired Network

Title CRNet: Image Super-Resolution Using A Convolutional Sparse Coding Inspired Network
Authors Menglei Zhang, Zhou Liu, Lei Yu
Abstract Convolutional Sparse Coding (CSC) has been attracting more and more attention in recent years, for making full use of image global correlation to improve performance on various computer vision applications. However, very few studies focus on solving the CSC-based image Super-Resolution (SR) problem. As a consequence, there has been no significant progress in this area over a period of time. In this paper, we exploit the natural connection between CSC and Convolutional Neural Networks (CNN) to address CSC-based image SR. Specifically, the Convolutional Iterative Soft Thresholding Algorithm (CISTA) is introduced to solve the CSC problem, and it can be implemented using CNN architectures. Then we develop a novel CSC-based SR framework analogous to the traditional SC-based SR methods. Two models inspired by this framework are proposed for pre-/post-upsampling SR, respectively. Compared with recent state-of-the-art SR methods, both of our proposed models show superior performance in terms of both quantitative and qualitative measurements.
Tasks Image Super-Resolution, Super-Resolution
Published 2019-08-03
URL https://arxiv.org/abs/1908.01166v1
PDF https://arxiv.org/pdf/1908.01166v1.pdf
PWC https://paperswithcode.com/paper/crnet-image-super-resolution-using-a
Repo
Framework
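
Iterative soft thresholding algorithms, which the abstract above builds on, repeatedly apply an element-wise shrinkage step. The sketch below shows only that standard shrinkage operator, sign(v) * max(|v| - theta, 0); it is not the paper's CISTA network, and the function name `soft_threshold` is a hypothetical choice.

```python
def soft_threshold(values, theta):
    """Element-wise shrinkage: sign(v) * max(|v| - theta, 0)."""
    out = []
    for v in values:
        magnitude = abs(v) - theta
        if magnitude <= 0:
            # Small coefficients are set exactly to zero, which yields sparsity.
            out.append(0.0)
        else:
            out.append(magnitude if v > 0 else -magnitude)
    return out


print(soft_threshold([3.0, -0.5, -2.0, 0.2], 1.0))
```

The operator resembles a two-sided ReLU with a bias, which is one way to see why an unrolled iteration of this kind maps naturally onto CNN layers.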

Image Captioning with Integrated Bottom-Up and Multi-level Residual Top-Down Attention for Game Scene Understanding

Title Image Captioning with Integrated Bottom-Up and Multi-level Residual Top-Down Attention for Game Scene Understanding
Authors Jian Zheng, Sudha Krishnamurthy, Ruxin Chen, Min-Hung Chen, Zhenhao Ge, Xiaohua Li
Abstract Image captioning has attracted considerable attention in recent years. However, little work has been done for game image captioning which has some unique characteristics and requirements. In this work we propose a novel game image captioning model which integrates bottom-up attention with a new multi-level residual top-down attention mechanism. Firstly, a lower-level residual top-down attention network is added to the Faster R-CNN based bottom-up attention network to address the problem that the latter may lose important spatial information when extracting regional features. Secondly, an upper-level residual top-down attention network is implemented in the caption generation network to better fuse the extracted regional features for subsequent caption prediction. We create two game datasets to evaluate the proposed model. Extensive experiments show that our proposed model outperforms existing baseline models.
Tasks Image Captioning, Scene Understanding
Published 2019-06-16
URL https://arxiv.org/abs/1906.06632v1
PDF https://arxiv.org/pdf/1906.06632v1.pdf
PWC https://paperswithcode.com/paper/image-captioning-with-integrated-bottom-up
Repo
Framework

Explainable Deep Relational Networks for Predicting Compound-Protein Affinities and Contacts

Title Explainable Deep Relational Networks for Predicting Compound-Protein Affinities and Contacts
Authors Mostafa Karimi, Di Wu, Zhangyang Wang, Yang Shen
Abstract Predicting compound-protein affinity is critical for accelerating drug discovery. Recent progress made by machine learning focuses on accuracy but leaves much to be desired for interpretability. Through molecular contacts underlying affinities, our large-scale interpretability assessment finds commonly-used attention mechanisms inadequate. We thus formulate a hierarchical multi-objective learning problem whose predicted contacts form the basis for predicted affinities. We further design a physics-inspired deep relational network, DeepRelations, with intrinsically explainable architecture. Specifically, various atomic-level contacts or “relations” lead to molecular-level affinity prediction. And the embedded attentions are regularized with predicted structural contexts and supervised with partially available training contacts. DeepRelations shows superior interpretability to the state-of-the-art: without compromising affinity prediction, it boosts the AUPRC of contact prediction 9.5, 16.9, 19.3 and 5.7-fold for the test, compound-unique, protein-unique, and both-unique sets, respectively. Our study represents the first dedicated model development and systematic model assessment for interpretable machine learning of compound-protein affinity.
Tasks Drug Discovery, Interpretable Machine Learning
Published 2019-12-29
URL https://arxiv.org/abs/1912.12553v1
PDF https://arxiv.org/pdf/1912.12553v1.pdf
PWC https://paperswithcode.com/paper/explainable-deep-relational-networks-for
Repo
Framework

Personal Dynamic Cost-Aware Sensing for Latent Context Detection

Title Personal Dynamic Cost-Aware Sensing for Latent Context Detection
Authors Saar Tal, Bracha Shapira, Lior Rokach
Abstract In the past decade, the usage of mobile devices has gone far beyond simple activities like calling and texting. Today, smartphones contain multiple embedded sensors and are able to collect useful sensing data about the user and infer the user’s context. The more frequent the sensing, the more accurate the context. However, continuous sensing results in huge energy consumption, decreasing the battery’s lifetime. We propose a novel approach for cost-aware sensing when performing continuous latent context detection. The suggested method dynamically determines the user’s sensor sampling policy based on three factors: (1) the user’s last known context; (2) the predicted information loss using KL-Divergence; and (3) the sensors’ sampling costs. The objective function aims at minimizing both sampling cost and information loss. The method is based on various machine learning techniques including autoencoder neural networks for latent context detection, linear regression for information loss prediction, and convex optimization for determining the optimal sampling policy. To evaluate the suggested method, we performed a series of tests on real-world data recorded at a high-frequency rate; the data was collected from six mobile phone sensors of twenty users over the course of a week. Results show that by applying a dynamic sampling policy, our method naturally balances information loss and energy consumption and outperforms the static approach. We compared the performance of our method with another state-of-the-art dynamic sampling method and demonstrated its consistent superiority in various measures. Our method achieved better results in either sampling cost or information loss, and in some cases improved both.
Tasks
Published 2019-03-13
URL http://arxiv.org/abs/1903.05376v1
PDF http://arxiv.org/pdf/1903.05376v1.pdf
PWC https://paperswithcode.com/paper/personal-dynamic-cost-aware-sensing-for
Repo
Framework
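
The abstract above measures information loss with KL-Divergence. For reference, a minimal sketch of the discrete KL divergence D(p || q) = sum_i p_i * log(p_i / q_i) follows; how the paper constructs the two distributions is not stated here, so the example distributions and the function name `kl_divergence` are assumptions.

```python
import math


def kl_divergence(p, q):
    """Discrete KL divergence D(p || q) in nats for two probability vectors."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # the term 0 * log(0 / q) is taken as 0 by convention
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


# Hypothetical example: a context distribution with full sensing (p)
# versus one estimated under reduced sampling (q).
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
```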