Paper Group AWR 314
RenderNet: A deep convolutional network for differentiable rendering from 3D shapes
Title | RenderNet: A deep convolutional network for differentiable rendering from 3D shapes |
Authors | Thu Nguyen-Phuoc, Chuan Li, Stephen Balaban, Yong-Liang Yang |
Abstract | Traditional computer graphics rendering pipeline is designed for procedurally generating 2D quality images from 3D shapes with high performance. The non-differentiability due to discrete operations such as visibility computation makes it hard to explicitly correlate rendering parameters and the resulting image, posing a significant challenge for inverse rendering tasks. Recent work on differentiable rendering achieves differentiability either by designing surrogate gradients for non-differentiable operations or via an approximate but differentiable renderer. These methods, however, are still limited when it comes to handling occlusion, and restricted to particular rendering effects. We present RenderNet, a differentiable rendering convolutional network with a novel projection unit that can render 2D images from 3D shapes. Spatial occlusion and shading calculation are automatically encoded in the network. Our experiments show that RenderNet can successfully learn to implement different shaders, and can be used in inverse rendering tasks to estimate shape, pose, lighting and texture from a single image. |
Tasks | |
Published | 2018-06-18 |
URL | http://arxiv.org/abs/1806.06575v3 |
http://arxiv.org/pdf/1806.06575v3.pdf | |
PWC | https://paperswithcode.com/paper/rendernet-a-deep-convolutional-network-for |
Repo | https://github.com/thunguyenphuoc/RenderNet |
Framework | tf |
Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks
Title | Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks |
Authors | Jie Hu, Li Shen, Samuel Albanie, Gang Sun, Andrea Vedaldi |
Abstract | While the use of bottom-up local operators in convolutional neural networks (CNNs) matches well some of the statistics of natural images, it may also prevent such models from capturing contextual long-range feature interactions. In this work, we propose a simple, lightweight approach for better context exploitation in CNNs. We do so by introducing a pair of operators: gather, which efficiently aggregates feature responses from a large spatial extent, and excite, which redistributes the pooled information to local features. The operators are cheap, both in terms of number of added parameters and computational complexity, and can be integrated directly in existing architectures to improve their performance. Experiments on several datasets show that gather-excite can bring benefits comparable to increasing the depth of a CNN at a fraction of the cost. For example, we find ResNet-50 with gather-excite operators is able to outperform its 101-layer counterpart on ImageNet with no additional learnable parameters. We also propose a parametric gather-excite operator pair which yields further performance gains, relate it to the recently-introduced Squeeze-and-Excitation Networks, and analyse the effects of these changes to the CNN feature activation statistics. |
Tasks | |
Published | 2018-10-29 |
URL | http://arxiv.org/abs/1810.12348v3 |
http://arxiv.org/pdf/1810.12348v3.pdf | |
PWC | https://paperswithcode.com/paper/gather-excite-exploiting-feature-context-in |
Repo | https://github.com/hujie-frank/GENet |
Framework | none |
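The gather and excite operators described in the abstract above can be illustrated with a minimal, parameter-free sketch. It assumes the simplest case: gather is a global average over each channel's full spatial extent, and excite gates every position in that channel by the sigmoid of the pooled value. The function names and the nested-list tensor layout are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import math


def gather(feature_map):
    """Gather: average each channel over its full spatial extent."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def excite(feature_map, pooled):
    """Excite: scale every position of a channel by sigmoid(pooled value)."""
    gates = [1.0 / (1.0 + math.exp(-p)) for p in pooled]
    return [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]


def gather_excite(feature_map):
    """Apply gather then excite; no learnable parameters are involved."""
    return excite(feature_map, gather(feature_map))


# Hypothetical input with layout [channel][row][column]: 2 channels of 2x2.
x = [
    [[1.0, -1.0], [2.0, -2.0]],  # channel mean 0.0, so the gate is 0.5
    [[4.0, 4.0], [4.0, 4.0]],    # channel mean 4.0, so the gate is close to 1
]
y = gather_excite(x)
```

Because the gate depends only on pooled statistics, the output keeps the input's shape and the operator adds no parameters, which matches the "no additional learnable parameters" variant mentioned in the abstract.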
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
Title | Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models |
Authors | Kurtland Chua, Roberto Calandra, Rowan McAllister, Sergey Levine |
Abstract | Model-based reinforcement learning (RL) algorithms can attain excellent sample efficiency, but often lag behind the best model-free algorithms in terms of asymptotic performance. This is especially true with high-capacity parametric function approximators, such as deep networks. In this paper, we study how to bridge this gap, by employing uncertainty-aware dynamics models. We propose a new algorithm called probabilistic ensembles with trajectory sampling (PETS) that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation. Our comparison to state-of-the-art model-based and model-free deep RL algorithms shows that our approach matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks, while requiring significantly fewer samples (e.g., 8 and 125 times fewer samples than Soft Actor Critic and Proximal Policy Optimization respectively on the half-cheetah task). |
Tasks | |
Published | 2018-05-30 |
URL | http://arxiv.org/abs/1805.12114v2 |
http://arxiv.org/pdf/1805.12114v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-reinforcement-learning-in-a-handful-of |
Repo | https://github.com/kchua/handful-of-trials |
Framework | tf |
Switch-based Active Deep Dyna-Q: Efficient Adaptive Planning for Task-Completion Dialogue Policy Learning
Title | Switch-based Active Deep Dyna-Q: Efficient Adaptive Planning for Task-Completion Dialogue Policy Learning |
Authors | Yuexin Wu, Xiujun Li, Jingjing Liu, Jianfeng Gao, Yiming Yang |
Abstract | Training task-completion dialogue agents with reinforcement learning usually requires a large number of real user experiences. The Dyna-Q algorithm extends Q-learning by integrating a world model, and thus can effectively boost training efficiency using simulated experiences generated by the world model. The effectiveness of Dyna-Q, however, depends on the quality of the world model - or implicitly, the pre-specified ratio of real vs. simulated experiences used for Q-learning. To this end, we extend the recently proposed Deep Dyna-Q (DDQ) framework by integrating a switcher that automatically determines whether to use a real or simulated experience for Q-learning. Furthermore, we explore the use of active learning for improving sample efficiency, by encouraging the world model to generate simulated experiences in the state-action space where the agent has not (fully) explored. Our results show that by combining switcher and active learning, the new framework named as Switch-based Active Deep Dyna-Q (Switch-DDQ), leads to significant improvement over DDQ and Q-learning baselines in both simulation and human evaluations. |
Tasks | Active Learning, Q-Learning, Task-Completion Dialogue Policy Learning |
Published | 2018-11-19 |
URL | http://arxiv.org/abs/1811.07550v1 |
http://arxiv.org/pdf/1811.07550v1.pdf | |
PWC | https://paperswithcode.com/paper/switch-based-active-deep-dyna-q-efficient |
Repo | https://github.com/CrickWu/Swtich-DDQ |
Framework | none |
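For orientation, the Dyna-Q idea that the abstract above builds on can be sketched in tabular form: each real step updates the Q-table and a world model, and the model then supplies a fixed number of simulated experiences for extra Q-updates. This is plain tabular Dyna-Q on a hypothetical five-state chain, not Deep Dyna-Q or Switch-DDQ; the environment, the hyperparameters, and the fixed `planning_steps` ratio (the pre-specified ratio that the paper's switcher is meant to replace) are all assumptions made for illustration.

```python
import random

N_STATES, GOAL = 5, 4
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def env_step(state, action):
    """Hypothetical deterministic chain; reward 1.0 only on reaching GOAL."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = nxt == GOAL
    return (1.0 if done else 0.0), nxt, done


def q_update(q, s, a, r, s2, done, alpha=0.5, gamma=0.9):
    """One-step Q-learning update, used for real and simulated experience."""
    target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (target - q[(s, a)])


def dyna_q(episodes=30, planning_steps=5, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    model = {}  # world model: (state, action) -> (reward, next_state, done)
    for _episode in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                best = max(q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])
            r, s2, done = env_step(s, a)
            q_update(q, s, a, r, s2, done)   # learn from the real experience
            model[(s, a)] = (r, s2, done)    # record it in the world model
            for _plan in range(planning_steps):
                # Replay a previously seen pair through the world model.
                ps, pa = rng.choice(sorted(model))
                pr, ps2, pdone = model[(ps, pa)]
                q_update(q, ps, pa, pr, ps2, pdone)
            s = s2
    return q


q_table = dyna_q()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(GOAL)]
```

In this sketch every real step triggers the same number of simulated updates regardless of how accurate the model is, which is the dependence on world-model quality that the abstract identifies as a weakness.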
Full Image Recover for Block-Based Compressive Sensing
Title | Full Image Recover for Block-Based Compressive Sensing |
Authors | Xuemei Xie, Chenye Wang, Jiang Du, Guangming Shi |
Abstract | In recent years, compressive sensing (CS) has improved greatly through the application of deep learning technology. For convenience, the input image is usually measured and reconstructed block by block, which usually causes block effects in the reconstructed images. In this paper, we present a novel CNN-based network to solve this problem. In the measurement part, the input image is adaptively measured block by block to acquire a group of measurements. In the reconstruction part, all the measurements from one image are used to reconstruct the full image at the same time. Unlike previous methods that recover the image block by block, our framework recovers the structural information destroyed in the measurement part, and the block effect is removed accordingly. We train the proposed framework with a mean square error (MSE) loss function. Experiments show that there is no block effect at all in the proposed method, and our results outperform existing methods by 1.8 dB. |
Tasks | Compressive Sensing |
Published | 2018-02-01 |
URL | http://arxiv.org/abs/1802.00179v1 |
http://arxiv.org/pdf/1802.00179v1.pdf | |
PWC | https://paperswithcode.com/paper/full-image-recover-for-block-based |
Repo | https://github.com/jiang-du/Perceptual-CS |
Framework | none |
Unsupervised Neural Machine Translation with Weight Sharing
Title | Unsupervised Neural Machine Translation with Weight Sharing |
Authors | Zhen Yang, Wei Chen, Feng Wang, Bo Xu |
Abstract | Unsupervised neural machine translation (NMT) is a recently proposed approach for machine translation which aims to train the model without using any labeled data. The models proposed for unsupervised NMT often use only one shared encoder to map the pairs of sentences from different languages to a shared-latent space, which is weak in keeping the unique and internal characteristics of each language, such as the style, terminology, and sentence structure. To address this issue, we introduce an extension by utilizing two independent encoders but sharing some partial weights which are responsible for extracting high-level representations of the input sentences. Besides, two different generative adversarial networks (GANs), namely the local GAN and global GAN, are proposed to enhance the cross-language translation. With this new approach, we achieve significant improvements on English-German, English-French and Chinese-to-English translation tasks. |
Tasks | Machine Translation |
Published | 2018-04-24 |
URL | http://arxiv.org/abs/1804.09057v1 |
http://arxiv.org/pdf/1804.09057v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-neural-machine-translation-with-1 |
Repo | https://github.com/ZhenYangIACAS/unsupervised-NMT |
Framework | tf |
Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
Title | Hessian-based Analysis of Large Batch Training and Robustness to Adversaries |
Authors | Zhewei Yao, Amir Gholami, Qi Lei, Kurt Keutzer, Michael W. Mahoney |
Abstract | Large batch size training of Neural Networks has been shown to incur accuracy loss when trained with the current methods. The exact underlying reasons for this are still not completely understood. Here, we study large batch size training through the lens of the Hessian operator and robust optimization. In particular, we perform a Hessian based study to analyze exactly how the landscape of the loss function changes when training with large batch size. We compute the true Hessian spectrum, without approximation, by back-propagating the second derivative. Extensive experiments on multiple networks show that saddle-points are not the cause for generalization gap of large batch size training, and the results consistently show that large batch converges to points with noticeably higher Hessian spectrum. Furthermore, we show that robust training allows one to favor flat areas, as points with large Hessian spectrum show poor robustness to adversarial perturbation. We further study this relationship, and provide empirical and theoretical proof that the inner loop for robust training is a saddle-free optimization problem almost everywhere. We present detailed experiments with five different network architectures, including a residual network, tested on MNIST, CIFAR-10, and CIFAR-100 datasets. We have open sourced our method which can be accessed at [1]. |
Tasks | |
Published | 2018-02-22 |
URL | http://arxiv.org/abs/1802.08241v4 |
http://arxiv.org/pdf/1802.08241v4.pdf | |
PWC | https://paperswithcode.com/paper/hessian-based-analysis-of-large-batch |
Repo | https://github.com/mtkwT/Hessian-based-analysis-tensorflow |
Framework | tf |
Evolved Policy Gradients
Title | Evolved Policy Gradients |
Authors | Rein Houthooft, Richard Y. Chen, Phillip Isola, Bradly C. Stadie, Filip Wolski, Jonathan Ho, Pieter Abbeel |
Abstract | We propose a metalearning approach for learning gradient-based reinforcement learning (RL) algorithms. The idea is to evolve a differentiable loss function, such that an agent, which optimizes its policy to minimize this loss, will achieve high rewards. The loss is parametrized via temporal convolutions over the agent’s experience. Because this loss is highly flexible in its ability to take into account the agent’s history, it enables fast task learning. Empirical results show that our evolved policy gradient algorithm (EPG) achieves faster learning on several randomized environments compared to an off-the-shelf policy gradient method. We also demonstrate that EPG’s learned loss can generalize to out-of-distribution test time tasks, and exhibits qualitatively different behavior from other popular metalearning algorithms. |
Tasks | |
Published | 2018-02-13 |
URL | http://arxiv.org/abs/1802.04821v2 |
http://arxiv.org/pdf/1802.04821v2.pdf | |
PWC | https://paperswithcode.com/paper/evolved-policy-gradients |
Repo | https://github.com/openai/EPG |
Framework | none |
False Information on Web and Social Media: A Survey
Title | False Information on Web and Social Media: A Survey |
Authors | Srijan Kumar, Neil Shah |
Abstract | False information can be created and spread easily through the web and social media platforms, resulting in widespread real-world impact. Characterizing how false information proliferates on social platforms and why it succeeds in deceiving readers are critical to develop efficient detection algorithms and tools for early detection. A recent surge of research in this area has aimed to address the key issues using methods based on feature engineering, graph mining, and information modeling. The majority of the research has primarily focused on two broad categories of false information: opinion-based (e.g., fake reviews), and fact-based (e.g., false news and hoaxes). Therefore, in this work, we present a comprehensive survey spanning diverse aspects of false information, namely (i) the actors involved in spreading false information, (ii) rationale behind successfully deceiving readers, (iii) quantifying the impact of false information, (iv) measuring its characteristics across different dimensions, and finally, (v) algorithms developed to detect false information. In doing so, we create a unified framework to describe these recent methods and highlight a number of important directions for future research. |
Tasks | Feature Engineering |
Published | 2018-04-23 |
URL | http://arxiv.org/abs/1804.08559v1 |
http://arxiv.org/pdf/1804.08559v1.pdf | |
PWC | https://paperswithcode.com/paper/false-information-on-web-and-social-media-a |
Repo | https://github.com/bwoodhamilton/client_project_group_3 |
Framework | none |
Identifying Compromised Accounts on Social Media Using Statistical Text Analysis
Title | Identifying Compromised Accounts on Social Media Using Statistical Text Analysis |
Authors | Dominic Seyler, Lunan Li, ChengXiang Zhai |
Abstract | Compromised social media accounts are legitimate user accounts that have been hijacked by a malicious party and can cause various kinds of damage, which makes the detection of these accounts crucial. In this work we propose a novel general framework for discovering compromised accounts by utilizing statistical text analysis. The framework is built on the observation that users will use language that is measurably different from the language that an attacker would use, when the account is compromised. We use the framework to develop specific algorithms based on language modeling and use the similarity of language models of users and attackers as features in a supervised learning setup to identify compromised accounts. Evaluation results on a large Twitter corpus of over 129 million tweets show promising results of the proposed approach. |
Tasks | Language Modelling |
Published | 2018-04-19 |
URL | https://arxiv.org/abs/1804.07247v2 |
https://arxiv.org/pdf/1804.07247v2.pdf | |
PWC | https://paperswithcode.com/paper/identifying-compromised-accounts-on-social |
Repo | https://github.com/dom-s/comp-account-detect |
Framework | none |
i-RevNet: Deep Invertible Networks
Title | i-RevNet: Deep Invertible Networks |
Authors | Jörn-Henrik Jacobsen, Arnold Smeulders, Edouard Oyallon |
Abstract | It is widely believed that the success of deep convolutional networks is based on progressively discarding uninformative variability about the input with respect to the problem at hand. This is supported empirically by the difficulty of recovering images from their hidden representations, in most commonly used network architectures. In this paper we show via a one-to-one mapping that this loss of information is not a necessary condition to learn representations that generalize well on complicated problems, such as ImageNet. Via a cascade of homeomorphic layers, we build the i-RevNet, a network that can be fully inverted up to the final projection onto the classes, i.e. no information is discarded. Building an invertible architecture is difficult, for one, because the local inversion is ill-conditioned, we overcome this by providing an explicit inverse. An analysis of i-RevNets learned representations suggests an alternative explanation for the success of deep networks by a progressive contraction and linear separation with depth. To shed light on the nature of the model learned by the i-RevNet we reconstruct linear interpolations between natural image representations. |
Tasks | |
Published | 2018-02-20 |
URL | http://arxiv.org/abs/1802.07088v1 |
http://arxiv.org/pdf/1802.07088v1.pdf | |
PWC | https://paperswithcode.com/paper/i-revnet-deep-invertible-networks |
Repo | https://github.com/jhjacobsen/pytorch-i-revnet |
Framework | pytorch |
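The "explicit inverse" mentioned in the abstract above can be illustrated with an additive coupling step of the kind used in reversible residual networks: the input is split into two halves, one half passes through unchanged, and the other is shifted by a function of the first. This is a minimal sketch under that assumption; `residual_fn` is a hypothetical stand-in for a learned sub-network, and the block omits the invertible down-sampling that a full architecture would also need.

```python
def residual_fn(x):
    """Hypothetical stand-in for a learned sub-network F; need not be invertible."""
    return [3.0 * v * v + 1.0 for v in x]


def forward(x1, x2):
    """Additive coupling: y1 = x2, y2 = x1 + F(x2)."""
    shift = residual_fn(x2)
    return x2, [a + b for a, b in zip(x1, shift)]


def inverse(y1, y2):
    """Explicit inverse: x2 = y1, x1 = y2 - F(y1)."""
    shift = residual_fn(y1)
    return [a - b for a, b in zip(y2, shift)], y1


# Hypothetical inputs: the two halves of a split feature vector.
x1, x2 = [1.0, 2.0], [0.5, -1.0]
y1, y2 = forward(x1, x2)
x1_rec, x2_rec = inverse(y1, y2)
```

The inverse never has to invert `residual_fn` itself; it only re-evaluates it on a value that the forward pass left untouched, which is why no information is discarded.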
Forecasting Economics and Financial Time Series: ARIMA vs. LSTM
Title | Forecasting Economics and Financial Time Series: ARIMA vs. LSTM |
Authors | Sima Siami-Namini, Akbar Siami Namin |
Abstract | Forecasting time series data is an important subject in economics, business, and finance. Traditionally, there are several techniques to effectively forecast the next lag of time series data such as univariate Autoregressive (AR), univariate Moving Average (MA), Simple Exponential Smoothing (SES), and more notably Autoregressive Integrated Moving Average (ARIMA) with its many variations. In particular, ARIMA model has demonstrated its outperformance in precision and accuracy of predicting the next lags of time series. With the recent advancement in computational power of computers and more importantly developing more advanced machine learning algorithms and approaches such as deep learning, new algorithms are developed to forecast time series data. The research question investigated in this article is whether and how the newly developed deep learning-based algorithms for forecasting time series data, such as “Long Short-Term Memory (LSTM)”, are superior to the traditional algorithms. The empirical studies conducted and reported in this article show that deep learning-based algorithms such as LSTM outperform traditional-based algorithms such as ARIMA model. More specifically, the average reduction in error rates obtained by LSTM is between 84 and 87 percent when compared to ARIMA, indicating the superiority of LSTM to ARIMA. Furthermore, it was noticed that the number of training times, known as “epoch” in deep learning, has no effect on the performance of the trained forecast model and it exhibits a truly random behavior. |
Tasks | Time Series |
Published | 2018-03-16 |
URL | http://arxiv.org/abs/1803.06386v1 |
http://arxiv.org/pdf/1803.06386v1.pdf | |
PWC | https://paperswithcode.com/paper/forecasting-economics-and-financial-time |
Repo | https://github.com/jithurjacob/deeplearning-papernotes |
Framework | none |
ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions
Title | ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions |
Authors | Hongyang Gao, Zhengyang Wang, Shuiwang Ji |
Abstract | Convolutional neural networks (CNNs) have shown great capability of solving various artificial intelligence tasks. However, the increasing model size has raised challenges in employing them in resource-limited applications. In this work, we propose to compress deep models by using channel-wise convolutions, which replace dense connections among feature maps with sparse ones in CNNs. Based on this novel operation, we build light-weight CNNs known as ChannelNets. ChannelNets use three instances of channel-wise convolutions; namely group channel-wise convolutions, depth-wise separable channel-wise convolutions, and the convolutional classification layer. Compared to prior CNNs designed for mobile devices, ChannelNets achieve a significant reduction in terms of the number of parameters and computational cost without loss in accuracy. Notably, our work represents the first attempt to compress the fully-connected classification layer, which usually accounts for about 25% of total parameters in compact CNNs. Experimental results on the ImageNet dataset demonstrate that ChannelNets achieve consistently better performance compared to prior methods. |
Tasks | |
Published | 2018-09-05 |
URL | http://arxiv.org/abs/1809.01330v1 |
http://arxiv.org/pdf/1809.01330v1.pdf | |
PWC | https://paperswithcode.com/paper/channelnets-compact-and-efficient |
Repo | https://github.com/HongyangGao/ChannelNets |
Framework | tf |
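The contrast the abstract above draws between dense and sparse connections among feature maps can be sketched for a single spatial location: a dense 1x1 convolution connects every input channel to every output channel, while a channel-wise convolution slides one small shared kernel along the channel axis. The sketch below assumes stride 1 and no padding; the function names and the example sizes are illustrative assumptions rather than the authors' code.

```python
def channel_wise_conv(channels, kernel):
    """Slide a small shared kernel along the channel axis (stride 1, no padding).

    `channels` holds the values of all feature maps at one spatial location.
    """
    k = len(kernel)
    return [
        sum(kernel[j] * channels[i + j] for j in range(k))
        for i in range(len(channels) - k + 1)
    ]


def dense_param_count(c_in, c_out):
    """Weights in a dense 1x1 convolution: every input feeds every output."""
    return c_in * c_out


def channel_wise_param_count(kernel_size):
    """Weights in a channel-wise convolution: one kernel shared by all outputs."""
    return kernel_size


# Hypothetical example: 4 input channels at one location, kernel of size 3.
out = channel_wise_conv([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0])
```

Each output channel here depends on only three neighbouring input channels, and the weight count is independent of the number of channels, which is where the parameter saving comes from.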
Generalization in Machine Learning via Analytical Learning Theory
Title | Generalization in Machine Learning via Analytical Learning Theory |
Authors | Kenji Kawaguchi, Yoshua Bengio, Vikas Verma, Leslie Pack Kaelbling |
Abstract | This paper introduces a novel measure-theoretic theory for machine learning that does not require statistical assumptions. Based on this theory, a new regularization method in deep learning is derived and shown to outperform previous methods in CIFAR-10, CIFAR-100, and SVHN. Moreover, the proposed theory provides a theoretical basis for a family of practically successful regularization methods in deep learning. We discuss several consequences of our results on one-shot learning, representation learning, deep learning, and curriculum learning. Unlike statistical learning theory, the proposed learning theory analyzes each problem instance individually via measure theory, rather than a set of problem instances via statistics. As a result, it provides different types of results and insights when compared to statistical learning theory. |
Tasks | One-Shot Learning, Representation Learning |
Published | 2018-02-21 |
URL | http://arxiv.org/abs/1802.07426v3 |
http://arxiv.org/pdf/1802.07426v3.pdf | |
PWC | https://paperswithcode.com/paper/generalization-in-machine-learning-via |
Repo | https://github.com/Learning-and-Intelligent-Systems/Analytical-Learning-Theory |
Framework | pytorch |
Bayesian Renewables Scenario Generation via Deep Generative Networks
Title | Bayesian Renewables Scenario Generation via Deep Generative Networks |
Authors | Yize Chen, Pan Li, Baosen Zhang |
Abstract | We present a method to generate renewable scenarios using Bayesian probabilities by implementing the Bayesian generative adversarial network (Bayesian GAN), which is a variant of generative adversarial networks based on two interconnected deep neural networks. By using a Bayesian formulation, generators can be constructed and trained to produce scenarios that capture different salient modes in the data, allowing for better diversity and more accurate representation of the underlying physical process. Compared to conventional statistical models that are often hard to scale or sample from, this method is model-free and can generate samples extremely efficiently. For validation, we use wind and solar time-series data from NREL integration data sets to train the Bayesian GAN. We demonstrate that the proposed method is able to generate clusters of wind scenarios with different variance and mean value, and is able to distinguish and generate wind and solar scenarios simultaneously even if the historical data are intentionally mixed. |
Tasks | |
Published | 2018-02-02 |
URL | http://arxiv.org/abs/1802.00868v1 |
http://arxiv.org/pdf/1802.00868v1.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-renewables-scenario-generation-via |
Repo | https://github.com/chennnnnyize/BayesianRenewablesGAN |
Framework | tf |