Paper Group AWR 11
Lightweight Image Super-Resolution with Information Multi-distillation Network
Title | Lightweight Image Super-Resolution with Information Multi-distillation Network |
Authors | Zheng Hui, Xinbo Gao, Yunchu Yang, Xiumei Wang |
Abstract | In recent years, single image super-resolution (SISR) methods using deep convolutional neural networks (CNNs) have achieved impressive results. Thanks to the powerful representation capabilities of deep networks, numerous previous methods can learn the complex non-linear mapping between low-resolution (LR) image patches and their high-resolution (HR) versions. However, excessive convolutions limit the application of super-resolution technology on devices with low computing power. Besides, super-resolution at an arbitrary scale factor is a critical issue in practical applications that has not been well solved by previous approaches. To address these issues, we propose a lightweight information multi-distillation network (IMDN) built from cascaded information multi-distillation blocks (IMDB), each of which contains distillation and selective fusion parts. Specifically, the distillation module extracts hierarchical features step by step, and the fusion module aggregates them according to the importance of candidate features, which is evaluated by the proposed contrast-aware channel attention mechanism. To process real images of any size, we develop an adaptive cropping strategy (ACS) to super-resolve block-wise image patches using the same well-trained model. Extensive experiments suggest that the proposed method performs favorably against state-of-the-art SR algorithms in terms of visual quality, memory footprint, and inference time. Code is available at https://github.com/Zheng222/IMDN. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2019-09-26 |
URL | https://arxiv.org/abs/1909.11856v1 |
https://arxiv.org/pdf/1909.11856v1.pdf | |
PWC | https://paperswithcode.com/paper/lightweight-image-super-resolution-with-1 |
Repo | https://github.com/Zheng222/IMDN |
Framework | pytorch |
Don’t Just Scratch the Surface: Enhancing Word Representations for Korean with Hanja
Title | Don’t Just Scratch the Surface: Enhancing Word Representations for Korean with Hanja |
Authors | Kang Min Yoo, Taeuk Kim, Sang-goo Lee |
Abstract | We propose a simple yet effective approach for improving Korean word representations using additional linguistic annotation (i.e. Hanja). We employ cross-lingual transfer learning in training word representations by leveraging the fact that Hanja is closely related to Chinese. We evaluate the intrinsic quality of representations learned through our approach using the word analogy and similarity tests. In addition, we demonstrate their effectiveness on several downstream tasks, including a novel Korean news headline generation task. |
Tasks | Cross-Lingual Transfer, Transfer Learning |
Published | 2019-08-25 |
URL | https://arxiv.org/abs/1908.09282v3 |
https://arxiv.org/pdf/1908.09282v3.pdf | |
PWC | https://paperswithcode.com/paper/dont-just-scratch-the-surface-enhancing-word |
Repo | https://github.com/kaniblu/hanja-sisg |
Framework | none |
Mol-CycleGAN - a generative model for molecular optimization
Title | Mol-CycleGAN - a generative model for molecular optimization |
Authors | Łukasz Maziarka, Agnieszka Pocha, Jan Kaczmarczyk, Krzysztof Rataj, Michał Warchoł |
Abstract | Designing a molecule with desired properties is one of the biggest challenges in drug development, as it requires optimization of chemical compound structures with respect to many complex properties. To augment the compound design process we introduce Mol-CycleGAN - a CycleGAN-based model that generates optimized compounds with high structural similarity to the original ones. Namely, given a molecule our model generates a structurally similar one with an optimized value of the considered property. We evaluate the performance of the model on selected optimization objectives related to structural properties (presence of halogen groups, number of aromatic rings) and to a physicochemical property (penalized logP). In the task of optimization of penalized logP of drug-like molecules our model significantly outperforms previous results. |
Tasks | |
Published | 2019-02-06 |
URL | http://arxiv.org/abs/1902.02119v1 |
http://arxiv.org/pdf/1902.02119v1.pdf | |
PWC | https://paperswithcode.com/paper/mol-cyclegan-a-generative-model-for-molecular |
Repo | https://github.com/ardigen/mol-cycle-gan |
Framework | none |
Neural RGB->D Sensing: Depth and Uncertainty from a Video Camera
Title | Neural RGB->D Sensing: Depth and Uncertainty from a Video Camera |
Authors | Chao Liu, Jinwei Gu, Kihwan Kim, Srinivasa Narasimhan, Jan Kautz |
Abstract | Depth sensing is crucial for 3D reconstruction and scene understanding. Active depth sensors provide dense metric measurements, but often suffer from limitations such as restricted operating ranges, low spatial resolution, sensor interference, and high power consumption. In this paper, we propose a deep learning (DL) method to estimate per-pixel depth and its uncertainty continuously from a monocular video stream, with the goal of effectively turning an RGB camera into an RGB-D camera. Unlike prior DL-based methods, we estimate a depth probability distribution for each pixel rather than a single depth value, leading to an estimate of a 3D depth probability volume for each input frame. These depth probability volumes are accumulated over time under a Bayesian filtering framework as more incoming frames are processed sequentially, which effectively reduces depth uncertainty and improves accuracy, robustness, and temporal stability. Compared to prior work, the proposed approach achieves more accurate and stable results, and generalizes better to new datasets. Experimental results also show the output of our approach can be directly fed into classical RGB-D based 3D scanning methods for 3D scene reconstruction. |
Tasks | 3D Reconstruction, 3D Scene Reconstruction, Scene Understanding |
Published | 2019-01-09 |
URL | http://arxiv.org/abs/1901.02571v1 |
http://arxiv.org/pdf/1901.02571v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-rgb-d-sensing-depth-and-uncertainty |
Repo | https://github.com/NVlabs/neuralrgbd |
Framework | pytorch |
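The abstract above describes accumulating per-pixel depth probability distributions over time under a Bayesian filtering framework. The following is a minimal sketch of that kind of update for a single pixel, assuming a small discrete set of depth bins and leaving out the frame-to-frame warping and any learned components the paper uses. The function names and all numbers are illustrative assumptions, not taken from the paper or its repository.

```python
def bayes_update(prior, likelihood):
    """Fuse the current belief over depth bins with a new per-frame estimate.

    Elementwise product followed by renormalisation. Assumes the two
    distributions overlap, so the normalising constant is non-zero.
    """
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def mean_and_variance(belief, depths):
    """Expected depth and its variance under a discrete belief."""
    mean = sum(b * d for b, d in zip(belief, depths))
    var = sum(b * (d - mean) ** 2 for b, d in zip(belief, depths))
    return mean, var


# Hypothetical depth bins (metres) and per-frame estimates for one pixel.
depths = [1.0, 2.0, 3.0, 4.0]
frames = [
    [0.10, 0.40, 0.40, 0.10],
    [0.10, 0.50, 0.30, 0.10],
    [0.05, 0.60, 0.30, 0.05],
]

belief = [0.25, 0.25, 0.25, 0.25]  # uniform prior before any frame is seen
history = []
for frame in frames:
    belief = bayes_update(belief, frame)
    history.append(mean_and_variance(belief, depths))

for step, (mean, var) in enumerate(history, start=1):
    print(f"frame {step}: depth ~ {mean:.2f} m, variance {var:.3f}")
```

In this toy run the variance shrinks as consistent frames arrive, which is the uncertainty reduction the abstract attributes to the accumulation step.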
Learning to Weight for Text Classification
Title | Learning to Weight for Text Classification |
Authors | Alejandro Moreo Fernández, Andrea Esuli, Fabrizio Sebastiani |
Abstract | In information retrieval (IR) and related tasks, term weighting approaches typically consider the frequency of the term in the document and in the collection in order to compute a score reflecting the importance of the term for the document. In tasks characterized by the presence of training data (such as text classification) it seems logical that the term weighting function should take into account the distribution (as estimated from training data) of the term across the classes of interest. Although ‘supervised term weighting’ approaches that use this intuition have been described before, they have failed to show consistent improvements. In this article we analyse the possible reasons for this failure, and call consolidated assumptions into question. Following this criticism we propose a novel supervised term weighting approach that, instead of relying on any predefined formula, learns a term weighting function optimised on the training set of interest; we dub this approach Learning to Weight (LTW). The experiments that we run on several well-known benchmarks, and using different learning methods, show that our method outperforms previous term weighting approaches in text classification. |
Tasks | Information Retrieval, Text Classification |
Published | 2019-03-28 |
URL | http://arxiv.org/abs/1903.12090v1 |
http://arxiv.org/pdf/1903.12090v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-weight-for-text-classification |
Repo | https://github.com/AlexMoreo/learning-to-weight |
Framework | none |
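The abstract above starts from the conventional weighting that combines a term's frequency in the document with its frequency in the collection. For orientation, here is a minimal sketch of one common variant of that idea (raw term frequency times log inverse document frequency). It illustrates the unsupervised baseline family only, not the LTW method, and the toy corpus is made up.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenised document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        term_freq = Counter(doc)
        weights.append(
            {t: term_freq[t] * math.log(n_docs / doc_freq[t]) for t in term_freq}
        )
    return weights


# Hypothetical three-document collection.
docs = [
    ["cheap", "pills", "cheap"],
    ["meeting", "agenda", "pills"],
    ["meeting", "notes"],
]
weights = tfidf(docs)
print(weights[0])
```

Nothing in this formula looks at class labels, which is the gap that supervised term weighting, as described in the abstract, tries to close.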
Deep Learning Supersampled Scanning Transmission Electron Microscopy
Title | Deep Learning Supersampled Scanning Transmission Electron Microscopy |
Authors | Jeffrey M. Ede |
Abstract | Compressed sensing can increase resolution, and decrease electron dose and scan time of electron microscope point-scan systems with minimal information loss. Building on a history of successful deep learning applications in compressed sensing, we have developed a two-stage multiscale generative adversarial network to supersample scanning transmission electron micrographs with point-scan coverage reduced to 1/16, 1/25, …, 1/100 px. We propose a novel non-adversarial learning policy to train a unified generator for multiple coverages and introduce an auxiliary network to homogenize prioritization of training data with varied signal-to-noise ratios. This achieves root mean square errors of 3.23% and 4.54% at 1/16 px and 1/100 px coverage, respectively; within 1% of errors for networks trained for each coverage individually. Detailed error distributions are presented for unified and individual coverage generators, including errors per output pixel. In addition, we present a baseline one-stage network for a single coverage and investigate numerical precision for web serving. Source code, training data, and pretrained models are publicly available at https://github.com/Jeffrey-Ede/DLSS-STEM |
Tasks | |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10467v2 |
https://arxiv.org/pdf/1910.10467v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-supersampled-scanning |
Repo | https://github.com/Jeffrey-Ede/DLSS-STEM |
Framework | tf |
Integrative Factorization of Bidimensionally Linked Matrices
Title | Integrative Factorization of Bidimensionally Linked Matrices |
Authors | Jun Young Park, Eric F. Lock |
Abstract | Advances in molecular “omics” technologies have motivated new methodology for the integration of multiple sources of high-content biomedical data. However, most statistical methods for integrating multiple data matrices only consider data shared vertically (one cohort on multiple platforms) or horizontally (different cohorts on a single platform). This is limiting for data that take the form of bidimensionally linked matrices (e.g., multiple cohorts measured on multiple platforms), which are increasingly common in large-scale biomedical studies. In this paper, we propose BIDIFAC (Bidimensional Integrative Factorization) for integrative dimension reduction and signal approximation of bidimensionally linked data matrices. Our method factorizes the data into (i) globally shared, (ii) row-shared, (iii) column-shared, and (iv) single-matrix structural components, facilitating the investigation of shared and unique patterns of variability. For estimation we use a penalized objective function that extends the nuclear norm penalization for a single matrix. As an alternative to the complicated rank selection problem, we use results from random matrix theory to choose tuning parameters. We apply our method to integrate two genomics platforms (mRNA and miRNA expression) across two sample cohorts (tumor samples and normal tissue samples) using the breast cancer data from TCGA. We provide R code for fitting BIDIFAC, imputing missing values, and generating simulated data. |
Tasks | Dimensionality Reduction |
Published | 2019-06-09 |
URL | https://arxiv.org/abs/1906.03722v1 |
https://arxiv.org/pdf/1906.03722v1.pdf | |
PWC | https://paperswithcode.com/paper/integrative-factorization-of-bidimensionally |
Repo | https://github.com/lockEF/bidifac |
Framework | none |
Discourse Marker Augmented Network with Reinforcement Learning for Natural Language Inference
Title | Discourse Marker Augmented Network with Reinforcement Learning for Natural Language Inference |
Authors | Boyuan Pan, Yazheng Yang, Zhou Zhao, Yueting Zhuang, Deng Cai, Xiaofei He |
Abstract | Natural Language Inference (NLI), also known as Recognizing Textual Entailment (RTE), is one of the most important problems in natural language processing. It requires inferring the logical relationship between two given sentences. While current approaches mostly focus on the interaction architectures of the sentences, in this paper we propose to transfer knowledge from some important discourse markers to improve the quality of the NLI model. We observe that people usually use discourse markers such as “so” or “but” to represent the logical relationship between two sentences. These words potentially have deep connections with the meanings of the sentences, and thus can be used to help improve their representations. Moreover, we use reinforcement learning to optimize a new objective function with a reward defined by the property of the NLI datasets to make full use of the label information. Experiments show that our method achieves state-of-the-art performance on several large-scale datasets. |
Tasks | Natural Language Inference |
Published | 2019-07-23 |
URL | https://arxiv.org/abs/1907.09692v1 |
https://arxiv.org/pdf/1907.09692v1.pdf | |
PWC | https://paperswithcode.com/paper/discourse-marker-augmented-network-with-1 |
Repo | https://github.com/ZJULearning/DMP |
Framework | tf |
Scalable Training of Inference Networks for Gaussian-Process Models
Title | Scalable Training of Inference Networks for Gaussian-Process Models |
Authors | Jiaxin Shi, Mohammad Emtiyaz Khan, Jun Zhu |
Abstract | Inference in Gaussian process (GP) models is computationally challenging for large data, and often difficult to approximate with a small number of inducing points. We explore an alternative approximation that employs stochastic inference networks for a flexible inference. Unfortunately, for such networks, minibatch training is difficult to be able to learn meaningful correlations over function outputs for a large dataset. We propose an algorithm that enables such training by tracking a stochastic, functional mirror-descent algorithm. At each iteration, this only requires considering a finite number of input locations, resulting in a scalable and easy-to-implement algorithm. Empirical results show comparable and, sometimes, superior performance to existing sparse variational GP methods. |
Tasks | |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.10969v1 |
https://arxiv.org/pdf/1905.10969v1.pdf | |
PWC | https://paperswithcode.com/paper/scalable-training-of-inference-networks-for |
Repo | https://github.com/thjashin/gp-infer-net |
Framework | tf |
A Benchmark Dataset for Learning to Intervene in Online Hate Speech
Title | A Benchmark Dataset for Learning to Intervene in Online Hate Speech |
Authors | Jing Qian, Anna Bethke, Yinyin Liu, Elizabeth Belding, William Yang Wang |
Abstract | Countering online hate speech is a critical yet challenging task, but one which can be aided by the use of Natural Language Processing (NLP) techniques. Previous research has primarily focused on the development of NLP methods to automatically and effectively detect online hate speech while disregarding further action needed to calm and discourage individuals from using hate speech in the future. In addition, most existing hate speech datasets treat each post as an isolated instance, ignoring the conversational context. In this paper, we propose a novel task of generative hate speech intervention, where the goal is to automatically generate responses to intervene during online conversations that contain hate speech. As a part of this work, we introduce two fully-labeled large-scale hate speech intervention datasets collected from Gab and Reddit. These datasets provide conversation segments, hate speech labels, as well as intervention responses written by Mechanical Turk Workers. In this paper, we also analyze the datasets to understand the common intervention strategies and explore the performance of common automatic response generation methods on these new datasets to provide a benchmark for future research. |
Tasks | |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04251v1 |
https://arxiv.org/pdf/1909.04251v1.pdf | |
PWC | https://paperswithcode.com/paper/a-benchmark-dataset-for-learning-to-intervene |
Repo | https://github.com/jing-qian/A-Benchmark-Dataset-for-Learning-to-Intervene-in-Online-Hate-Speech |
Framework | none |
Implicit competitive regularization in GANs
Title | Implicit competitive regularization in GANs |
Authors | Florian Schäfer, Hongkai Zheng, Anima Anandkumar |
Abstract | To improve the stability of GAN training we need to understand why they can produce realistic samples. Presently, this is attributed to properties of the divergence obtained under an optimal discriminator. This argument has a fundamental flaw: If we do not impose regularity of the discriminator, it can exploit visually imperceptible errors of the generator to always achieve the maximal generator loss. In practice, gradient penalties are used to regularize the discriminator. However, this needs a metric on the space of images that captures visual similarity. Such a metric is not known, which explains the limited success of gradient penalties in stabilizing GANs. We argue that the performance of GANs is instead due to the implicit competitive regularization (ICR) arising from the simultaneous optimization of generator and discriminator. ICR promotes solutions that look real to the discriminator and thus leverages its inductive biases to generate realistic images. We show that opponent-aware modelling of generator and discriminator, as present in competitive gradient descent (CGD), can significantly strengthen ICR and thus stabilize GAN training without explicit regularization. In our experiments, we use an existing implementation of WGAN-GP and show that by training it with CGD we can improve the inception score (IS) on CIFAR10 for a wide range of scenarios, without any hyperparameter tuning. The highest IS is obtained by combining CGD with the WGAN-loss, without any explicit regularization. |
Tasks | Image Generation |
Published | 2019-10-13 |
URL | https://arxiv.org/abs/1910.05852v2 |
https://arxiv.org/pdf/1910.05852v2.pdf | |
PWC | https://paperswithcode.com/paper/implicit-competitive-regularization-in-gans-1 |
Repo | https://github.com/devzhk/Implicit-Competitive-Regularization |
Framework | pytorch |
Learning to Predict Without Looking Ahead: World Models Without Forward Prediction
Title | Learning to Predict Without Looking Ahead: World Models Without Forward Prediction |
Authors | C. Daniel Freeman, Luke Metz, David Ha |
Abstract | Much of model-based reinforcement learning involves learning a model of an agent’s world, and training an agent to leverage this model to perform a task more efficiently. While these models are demonstrably useful for agents, every naturally occurring model of the world of which we are aware (e.g., a brain) arose as the byproduct of competing evolutionary pressures for survival, not minimization of a supervised forward-predictive loss via gradient descent. That useful models can arise out of the messy and slow optimization process of evolution suggests that forward-predictive modeling can arise as a side-effect of optimization under the right circumstances. Crucially, this optimization process need not explicitly be a forward-predictive loss. In this work, we introduce a modification to traditional reinforcement learning which we call observational dropout, whereby we limit the agent’s ability to observe the real environment at each timestep. In doing so, we can coerce an agent into learning a world model to fill in the observation gaps during reinforcement learning. We show that the emerged world model, while not explicitly trained to predict the future, can help the agent learn key skills required to perform well in its environment. Videos of our results are available at https://learningtopredict.github.io/ |
Tasks | |
Published | 2019-10-29 |
URL | https://arxiv.org/abs/1910.13038v2 |
https://arxiv.org/pdf/1910.13038v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-predict-without-looking-ahead |
Repo | https://github.com/google/brain-tokyo-workshop |
Framework | none |
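The abstract above describes observational dropout as limiting the agent's ability to observe the real environment at each timestep, with a world model filling the gaps. The following is a rough sketch of that control flow only, assuming a per-step probability `peek_prob` of seeing the real observation. The toy environment, policy and world model are hypothetical stand-ins, and no learning is performed here.

```python
import random


class Counter1D:
    """Hypothetical toy environment: the observation is a running position."""

    def __init__(self):
        self.pos = 0.0

    def step(self, action):
        self.pos += action
        return self.pos


def rollout(env, world_model, policy, first_obs, steps, peek_prob, rng):
    """Run an episode where the real observation is revealed with probability peek_prob.

    On every other step the agent sees the world model's guess, computed
    from what it saw last and the action it took.
    """
    seen = first_obs
    peeks = 0
    trace = []
    for _ in range(steps):
        action = policy(seen)
        real_obs = env.step(action)
        if rng.random() < peek_prob:
            seen = real_obs                    # real observation gets through
            peeks += 1
        else:
            seen = world_model(seen, action)   # gap filled by the model
        trace.append((real_obs, seen))
    return trace, peeks


def policy(obs):
    return 1.0                                 # always move right (illustrative)


def perfect_model(obs, action):
    return obs + action                        # matches the toy dynamics exactly


blind_trace, blind_peeks = rollout(
    Counter1D(), perfect_model, policy, 0.0, steps=5, peek_prob=0.0, rng=random.Random(0)
)
full_trace, full_peeks = rollout(
    Counter1D(), perfect_model, policy, 0.0, steps=5, peek_prob=1.0, rng=random.Random(0)
)
print(blind_trace, blind_peeks, full_peeks)
```

With `peek_prob=0.0` the agent never sees the environment, so its observations stay accurate only because the stand-in model happens to match the toy dynamics. In the setting the abstract describes, the model would have to be learned under that pressure.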
Adversarial Regression. Generative Adversarial Networks for Non-Linear Regression: Theory and Assessment
Title | Adversarial Regression. Generative Adversarial Networks for Non-Linear Regression: Theory and Assessment |
Authors | Yoann Boget |
Abstract | Adversarial Regression is a proposition to perform high dimensional non-linear regression with uncertainty estimation. We used Conditional Generative Adversarial Network to obtain an estimate of the full predictive distribution for a new observation. Generative Adversarial Networks (GAN) are implicit generative models which produce samples from a distribution approximating the distribution of the data. The conditional version of it (CGAN) takes the following expression: $\min\limits_G \max\limits_D V(D, G) = \mathbb{E}_{x\sim p_{r}(x)} [\log(D(x, y))] + \mathbb{E}_{z\sim p_{z}(z)} [\log(1-D(G(z, y)))]$. An approximate solution can be found by training simultaneously two neural networks to model D and G and feeding G with a random noise vector $z$. After training, we have that $G(z, y)\mathrel{\dot\sim} p_{data}(x, y)$. By fixing $y$, we have $G(z|y) \mathrel{\dot\sim} p_{data}(x|y)$. By sampling $z$, we can therefore obtain samples following approximately $p(x|y)$, which is the predictive distribution of $x$ for a new $y$. We ran experiments to test various loss functions, data distributions, sample size, size of the noise vector, etc. Even if we observed differences, no experiment outperformed consistently the others. The quality of CGAN for regression relies on fine-tuning a range of hyperparameters. In a broader view, the results show that CGANs are very promising methods to perform uncertainty estimation for high dimensional non-linear regression. |
Tasks | |
Published | 2019-10-18 |
URL | https://arxiv.org/abs/1910.09106v1 |
https://arxiv.org/pdf/1910.09106v1.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-regression-generative-adversarial |
Repo | https://github.com/yoboget/Adversarial_regression |
Framework | pytorch |
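The abstract above notes that, once a conditional generator is trained, fixing the condition $y$ and sampling the noise $z$ yields draws from an approximate predictive distribution of $x$. Here is a minimal sketch of that sampling step, assuming a scalar output and standard normal noise. `toy_generator` is a hypothetical closed-form stand-in for a trained network, not the model from the paper or its repository.

```python
import random


def toy_generator(z, y):
    """Hypothetical stand-in for a trained CGAN generator G(z, y)."""
    return 2.0 * y + 0.5 * z


def predictive_samples(generator, y, n_samples, rng):
    """Hold the condition y fixed and push fresh noise z through the generator."""
    return [generator(rng.gauss(0.0, 1.0), y) for _ in range(n_samples)]


def summarise(samples, coverage=0.9):
    """Empirical mean and central interval of the sampled predictive distribution."""
    ordered = sorted(samples)
    n = len(ordered)
    tail = (1.0 - coverage) / 2.0
    lo = ordered[int(tail * n)]
    hi = ordered[int((1.0 - tail) * n) - 1]
    mean = sum(ordered) / n
    return mean, lo, hi


rng = random.Random(0)
samples = predictive_samples(toy_generator, y=1.5, n_samples=20000, rng=rng)
mean, lo, hi = summarise(samples)
print(f"mean {mean:.2f}, 90% interval [{lo:.2f}, {hi:.2f}]")
```

The interval width is what provides the uncertainty estimate. With a real generator its quality would depend on how well training approximated the conditional data distribution.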
Brno Urban Dataset – The New Data for Self-Driving Agents and Mapping Tasks
Title | Brno Urban Dataset – The New Data for Self-Driving Agents and Mapping Tasks |
Authors | Adam Ligocki, Ales Jelinek, Ludek Zalud |
Abstract | Autonomous driving is a dynamically growing field of research, where the quality and amount of experimental data are critical. Although several rich datasets are available these days, the demands of researchers and technical possibilities are evolving. Through this paper, we bring a new dataset recorded in Brno, Czech Republic. It offers data from four WUXGA cameras, two 3D LiDARs, an inertial measurement unit, an infrared camera and, in particular, a differential RTK GNSS receiver with centimetre accuracy which, to the best knowledge of the authors, is not available from any other public dataset so far. In addition, all the data are timestamped with sub-millisecond precision to allow a wider range of applications. At the time of publishing of this paper, recordings of more than 350 km of rides in varying environments are shared at: https://github.com/RoboticsBUT/Brno-Urban-Dataset. |
Tasks | Autonomous Driving |
Published | 2019-09-15 |
URL | https://arxiv.org/abs/1909.06897v1 |
https://arxiv.org/pdf/1909.06897v1.pdf | |
PWC | https://paperswithcode.com/paper/brno-urban-dataset-the-new-data-for-self |
Repo | https://github.com/RoboticsBUT/Brno-Urban-Dataset |
Framework | none |
Parallelizable Stack Long Short-Term Memory
Title | Parallelizable Stack Long Short-Term Memory |
Authors | Shuoyang Ding, Philipp Koehn |
Abstract | Stack Long Short-Term Memory (StackLSTM) is useful for various applications such as parsing and string-to-tree neural machine translation, but it is also known to be notoriously difficult to parallelize for GPU training due to the fact that the computations are dependent on discrete operations. In this paper, we tackle this problem by utilizing state access patterns of StackLSTM to homogenize computations with regard to different discrete operations. Our parsing experiments show that the method scales up almost linearly with increasing batch size, and our parallelized PyTorch implementation trains significantly faster compared to the Dynet C++ implementation. |
Tasks | Machine Translation |
Published | 2019-04-06 |
URL | http://arxiv.org/abs/1904.03409v1 |
http://arxiv.org/pdf/1904.03409v1.pdf | |
PWC | https://paperswithcode.com/paper/parallelizable-stack-long-short-term-memory |
Repo | https://github.com/shuoyangd/hoolock |
Framework | pytorch |
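The abstract above attributes the difficulty of batching StackLSTM to computations that depend on discrete operations, and says the method homogenizes them through state access patterns. One way to picture that idea, offered here as an assumption rather than the paper's exact formulation, is to run the same instructions for every batch element and let only a stack pointer move by +1 (push), -1 (pop) or 0 (hold). All names below are illustrative.

```python
def batched_stack_step(buffers, pointers, values, ops):
    """Apply one stack operation per batch element with identical instructions.

    ops[b] is +1 (push), -1 (pop) or 0 (hold). Every element writes its new
    value just above its current top; only the pointer move differs.
    Assumes ops are valid (no pop on an empty stack, no overflow).
    """
    for b in range(len(buffers)):
        buffers[b][pointers[b] + 1] = values[b]  # unconditional write above the top
        pointers[b] += ops[b]                    # the discrete op is just an offset
    return buffers, pointers


# Two stacks with capacity 4; slot 0 is a sentinel meaning "empty".
buffers = [[None] * 4, [None] * 4]
pointers = [0, 0]

steps = [
    (["a", "x"], [+1, +1]),  # both push
    (["b", "y"], [+1, -1]),  # first pushes, second pops
    (["c", "z"], [0, +1]),   # first holds, second pushes
]
for values, ops in steps:
    batched_stack_step(buffers, pointers, values, ops)

tops = [buffers[b][pointers[b]] for b in range(len(buffers))]
print(tops, pointers)
```

The Python loop over `b` is only for readability. In a tensor library the write and the pointer update would each be a single indexed operation over the whole batch, which is what makes the step uniform across elements.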