October 21, 2019

3546 words 17 mins read

Paper Group AWR 30


Reinforced Evolutionary Neural Architecture Search. Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving. Recommendation Through Mixtures of Heterogeneous Item Relationships. Disentangled Sequential Autoencoder. Disentangling Language and Knowledge in Task-Oriented Dialogs. Diagonal Discriminant …

Reinforced Evolutionary Neural Architecture Search

Title Reinforced Evolutionary Neural Architecture Search
Authors Yukang Chen, Gaofeng Meng, Qian Zhang, Shiming Xiang, Chang Huang, Lisen Mu, Xinggang Wang
Abstract Neural Architecture Search (NAS) is an important yet challenging task in network design due to its high computational consumption. To address this issue, we propose the Reinforced Evolutionary Neural Architecture Search (RENAS), which is an evolutionary method with reinforced mutation for NAS. Our method integrates reinforced mutation into an evolution algorithm for neural architecture exploration, in which a mutation controller is introduced to learn the effects of slight modifications and make mutation actions. The reinforced mutation controller guides the model population to evolve efficiently. Furthermore, as child models can inherit parameters from their parents during evolution, our method requires very limited computational resources. In experiments, we run the proposed search method on CIFAR-10 and obtain a powerful network architecture, RENASNet. This architecture achieves a competitive result on CIFAR-10. The explored network architecture is transferable to ImageNet and achieves a new state-of-the-art accuracy, i.e., 75.7% top-1 accuracy with 5.36M parameters on ImageNet in the mobile setting. We further test its performance on semantic segmentation with DeepLabv3 on the PASCAL VOC. RENASNet outperforms MobileNet-v1, MobileNet-v2 and NASNet. It achieves 75.83% mIOU without being pre-trained on COCO.
Tasks Neural Architecture Search, Semantic Segmentation
Published 2018-08-01
URL http://arxiv.org/abs/1808.00193v3
PDF http://arxiv.org/pdf/1808.00193v3.pdf
PWC https://paperswithcode.com/paper/reinforced-evolutionary-neural-architecture
Repo https://github.com/yukang2017/RENAS
Framework mxnet
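
The mechanism in RENAS is an evolutionary loop in which a learned controller, rather than random chance, chooses how to mutate a parent architecture. Below is a toy, self-contained sketch of that idea; the architecture encoding, the fitness function, and the REINFORCE-style update are all stand-ins, not the paper's actual search space or controller.

```python
# Toy sketch of evolution with a reinforced mutation controller (hypothetical
# fitness; RENAS itself trains child CNNs that inherit parent weights).
import numpy as np

rng = np.random.default_rng(0)
NUM_NODES, NUM_OPS = 5, 4          # tiny search space: 5 positions, 4 candidate ops

def fitness(arch):
    # Stand-in for the validation accuracy of the decoded child network.
    return -np.sum((arch - np.array([1, 3, 0, 2, 1])) ** 2)

# Mutation controller: logits over (position, op) mutation actions, updated with
# a REINFORCE-style rule where the reward is the fitness improvement.
logits = np.zeros((NUM_NODES, NUM_OPS))
population = [rng.integers(0, NUM_OPS, NUM_NODES) for _ in range(8)]

for step in range(200):
    parent = max(rng.choice(len(population), 3, replace=False),
                 key=lambda i: fitness(population[i]))         # tournament selection
    parent_arch = population[parent]
    probs = np.exp(logits) / np.exp(logits).sum()              # softmax over all actions
    action = rng.choice(NUM_NODES * NUM_OPS, p=probs.ravel())
    pos, op = divmod(action, NUM_OPS)
    child = parent_arch.copy()
    child[pos] = op                                            # reinforced mutation
    reward = fitness(child) - fitness(parent_arch)
    grad = -probs
    grad[pos, op] += 1.0                                       # grad of log-prob of the action
    logits += 0.1 * reward * grad                              # policy-gradient update
    population[int(rng.integers(len(population)))] = child     # replace a random member

best = max(population, key=fitness)
print("best architecture:", best, "fitness:", fitness(best))
```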

Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving

Title Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving
Authors Yan Wang, Wei-Lun Chao, Divyansh Garg, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger
Abstract 3D object detection is an essential task in autonomous driving. Recent techniques excel with highly accurate detection rates, provided the 3D input data is obtained from precise but expensive LiDAR technology. Approaches based on cheaper monocular or stereo imagery data have, until now, resulted in drastically lower accuracies — a gap that is commonly attributed to poor image-based depth estimation. However, in this paper we argue that it is not the quality of the data but its representation that accounts for the majority of the difference. Taking the inner workings of convolutional neural networks into consideration, we propose to convert image-based depth maps to pseudo-LiDAR representations — essentially mimicking the LiDAR signal. With this representation we can apply different existing LiDAR-based detection algorithms. On the popular KITTI benchmark, our approach achieves impressive improvements over the existing state-of-the-art in image-based performance — raising the detection accuracy of objects within the 30m range from the previous state-of-the-art of 22% to an unprecedented 74%. At the time of submission our algorithm holds the highest entry on the KITTI 3D object detection leaderboard for stereo-image-based approaches. Our code is publicly available at https://github.com/mileyan/pseudo_lidar.
Tasks 3D Object Detection, 3D object detection from stereo images, Autonomous Driving, Depth Estimation, Object Detection
Published 2018-12-18
URL https://arxiv.org/abs/1812.07179v6
PDF https://arxiv.org/pdf/1812.07179v6.pdf
PWC https://paperswithcode.com/paper/pseudo-lidar-from-visual-depth-estimation
Repo https://github.com/mileyan/pseudo_lidar
Framework pytorch
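
The core of pseudo-LiDAR is a change of representation rather than a new network: every pixel of the estimated depth map is back-projected into a 3D point using the camera intrinsics, and the resulting point cloud is handed to an existing LiDAR-based detector. A minimal sketch of that conversion follows; the intrinsics are placeholder values, and the official repo additionally transforms the points into the LiDAR coordinate frame.

```python
# Back-project a predicted depth map into a pseudo-LiDAR point cloud.
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """depth: (H, W) metric depth map -> (H*W, 3) points in camera coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    z = depth
    x = (u - cx) * z / fx                            # X = (u - cx) * Z / fx
    y = (v - cy) * z / fy                            # Y = (v - cy) * Z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

depth = np.random.uniform(5.0, 60.0, size=(375, 1242))   # stand-in for a predicted depth map
points = depth_to_pseudo_lidar(depth, fx=721.5, fy=721.5, cx=609.6, cy=172.9)  # placeholder intrinsics
print(points.shape)   # (465750, 3) pseudo-LiDAR points, ready for a LiDAR-based detector
```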

Recommendation Through Mixtures of Heterogeneous Item Relationships

Title Recommendation Through Mixtures of Heterogeneous Item Relationships
Authors Wang-Cheng Kang, Mengting Wan, Julian McAuley
Abstract Recommender Systems have proliferated as general-purpose approaches to model a wide variety of consumer interaction data. Specific instances make use of signals ranging from user feedback, item relationships, geographic locality, social influence, etc. Typically, research proceeds by showing that making use of a specific signal (within a carefully designed model) allows for higher-fidelity recommendations on a particular dataset. Of course, the real situation is more nuanced, in which a combination of many signals may be at play, or favored in different proportion by individual users. Here we seek to develop a framework that is capable of combining such heterogeneous item relationships by simultaneously modeling (a) what modality of recommendation is a user likely to be susceptible to at a particular point in time; and (b) what is the best recommendation from each modality. Our method borrows ideas from mixtures-of-experts approaches as well as knowledge graph embeddings. We find that our approach naturally yields more accurate recommendations than alternatives, while also providing intuitive “explanations” behind the recommendations it provides.
Tasks Knowledge Graph Embeddings, Recommendation Systems
Published 2018-08-29
URL https://arxiv.org/abs/1808.10031v1
PDF https://arxiv.org/pdf/1808.10031v1.pdf
PWC https://paperswithcode.com/paper/recommendation-through-mixtures-of
Repo https://github.com/kang205/MoHR
Framework tf
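
The two questions in the abstract map onto a gating network (which relationship is the user susceptible to right now) and relationship-specific scoring functions (the best item under each relationship). The sketch below is an illustrative mixture of translation-style scores, not the paper's exact parameterization; all sizes and the gating features are assumptions.

```python
# Mixture over heterogeneous item relationships: gate * per-relationship score.
import numpy as np

rng = np.random.default_rng(0)
D, NUM_ITEMS, NUM_REL = 16, 100, 4           # embedding dim, items, relationship types

item_emb = rng.normal(size=(NUM_ITEMS, D))
rel_emb = rng.normal(size=(NUM_REL, D))      # translation vector per relationship
W_gate = rng.normal(size=(2 * D, NUM_REL))   # gate conditioned on user state + last item

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def score(user_state, last_item, candidate):
    # (a) which relationship modality is likely now
    gate = softmax(np.concatenate([user_state, item_emb[last_item]]) @ W_gate)
    # (b) candidate quality under each modality: translation-style distance score
    per_rel = -np.linalg.norm(item_emb[last_item] + rel_emb - item_emb[candidate], axis=1)
    return float(gate @ per_rel)

user_state = rng.normal(size=D)
ranked = sorted(range(NUM_ITEMS), key=lambda j: -score(user_state, last_item=3, candidate=j))
print("top-5 recommendations:", ranked[:5])
```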

Disentangled Sequential Autoencoder

Title Disentangled Sequential Autoencoder
Authors Yingzhen Li, Stephan Mandt
Abstract We present a VAE architecture for encoding and generating high dimensional sequential data, such as video or audio. Our deep generative model learns a latent representation of the data which is split into a static and dynamic part, allowing us to approximately disentangle latent time-dependent features (dynamics) from features which are preserved over time (content). This architecture gives us partial control over generating content and dynamics by conditioning on either one of these sets of features. In our experiments on artificially generated cartoon video clips and voice recordings, we show that we can convert the content of a given sequence into another one by such content swapping. For audio, this allows us to convert a male speaker into a female speaker and vice versa, while for video we can separately manipulate shapes and dynamics. Furthermore, we give empirical evidence for the hypothesis that stochastic RNNs as latent state models are more efficient at compressing and generating long sequences than deterministic ones, which may be relevant for applications in video compression.
Tasks Video Compression
Published 2018-03-08
URL http://arxiv.org/abs/1803.02991v2
PDF http://arxiv.org/pdf/1803.02991v2.pdf
PWC https://paperswithcode.com/paper/disentangled-sequential-autoencoder
Repo https://github.com/yatindandi/Disentangled-Sequential-Autoencoder
Framework pytorch
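
The architectural idea is a single content code for the whole clip plus a per-frame dynamics code, with the decoder conditioned on both. A minimal PyTorch sketch of that split follows; layer types and sizes are illustrative rather than the authors' exact model.

```python
# Static/dynamic latent split: one content code f per clip, one dynamics code z_t per frame.
import torch
import torch.nn as nn

class DisentangledSeqAE(nn.Module):
    def __init__(self, x_dim=64, f_dim=16, z_dim=8, h_dim=32):
        super().__init__()
        self.f_enc = nn.LSTM(x_dim, h_dim, batch_first=True)    # content from the whole sequence
        self.f_head = nn.Linear(h_dim, 2 * f_dim)                # mean / log-variance of f
        self.z_enc = nn.LSTM(x_dim, h_dim, batch_first=True)     # per-frame dynamics
        self.z_head = nn.Linear(h_dim, 2 * z_dim)
        self.dec = nn.Sequential(nn.Linear(f_dim + z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def reparam(self, stats):
        mu, logvar = stats.chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

    def forward(self, x):                                        # x: (batch, time, x_dim)
        _, (h_last, _) = self.f_enc(x)
        f = self.reparam(self.f_head(h_last[-1]))                # (batch, f_dim)
        h_t, _ = self.z_enc(x)
        z = self.reparam(self.z_head(h_t))                       # (batch, time, z_dim)
        f_tiled = f.unsqueeze(1).expand(-1, x.size(1), -1)
        return self.dec(torch.cat([f_tiled, z], dim=-1))          # frame reconstructions

x = torch.randn(4, 10, 64)                     # 4 toy clips of 10 frames
print(DisentangledSeqAE()(x).shape)            # torch.Size([4, 10, 64])
# Content swapping: decode with f from clip A and z_t from clip B.
```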

Disentangling Language and Knowledge in Task-Oriented Dialogs

Title Disentangling Language and Knowledge in Task-Oriented Dialogs
Authors Dinesh Raghu, Nikhil Gupta, Mausam
Abstract The Knowledge Base (KB) used for real-world applications, such as booking a movie or restaurant reservation, keeps changing over time. End-to-end neural networks trained for these task-oriented dialogs are expected to be immune to any changes in the KB. However, existing approaches break down when asked to handle such changes. We propose an encoder-decoder architecture (BoSsNet) with a novel Bag-of-Sequences (BoSs) memory, which facilitates the disentangled learning of the response’s language model and its knowledge incorporation. Consequently, the KB can be modified with new knowledge without a drop in interpretability. We find that BoSsNet outperforms state-of-the-art models, with considerable improvements (> 10%) on bAbI OOV test sets and other human-human datasets. We also systematically modify existing datasets to measure disentanglement and show BoSsNet to be robust to KB modifications.
Tasks Language Modelling
Published 2018-05-03
URL http://arxiv.org/abs/1805.01216v3
PDF http://arxiv.org/pdf/1805.01216v3.pdf
PWC https://paperswithcode.com/paper/hierarchical-pointer-generator-memory-network
Repo https://github.com/dair-iitd/BossNet
Framework tf
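
The Bag-of-Sequences memory treats each KB entry as its own token sequence, encoded by a shared encoder into a memory slot; during decoding the model can either emit a language-model word or copy from the attended KB memory, which is what lets the KB change without retraining the language side. The sketch below shows one such decoding step; module names and sizes are assumptions, not BoSsNet's actual design.

```python
# One decoding step with a bag-of-sequences style KB memory (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

D, V = 32, 1000                                    # hidden size, vocabulary size
embed = nn.Embedding(V, D)
kb_encoder = nn.GRU(D, D, batch_first=True)        # shared encoder over each KB entry
decoder_cell = nn.GRUCell(D, D)
gen_head = nn.Linear(D, V)                         # language-model (generate) distribution
copy_gate = nn.Linear(2 * D, 1)                    # choose generate vs. copy

kb_entries = torch.randint(0, V, (5, 4))           # 5 KB entries, 4 tokens each
_, kb_mem = kb_encoder(embed(kb_entries))          # (1, 5, D): one memory slot per entry
kb_mem = kb_mem.squeeze(0)

h = torch.zeros(1, D)                              # decoder state for one toy example
prev = embed(torch.tensor([0]))                    # start token
h = decoder_cell(prev, h)
attn = F.softmax(kb_mem @ h.squeeze(0), dim=0)     # attention over KB memory slots
context = attn @ kb_mem
p_copy = torch.sigmoid(copy_gate(torch.cat([h.squeeze(0), context])))
p_gen = F.softmax(gen_head(h.squeeze(0)), dim=0)
print("copy probability:", float(p_copy), "| top generated word id:", int(p_gen.argmax()))
```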

Diagonal Discriminant Analysis with Feature Selection for High Dimensional Data

Title Diagonal Discriminant Analysis with Feature Selection for High Dimensional Data
Authors Sarah Elizabeth Romanes, John Thomas Ormerod, Jean YH Yang
Abstract We introduce a new method of performing high dimensional discriminant analysis, which we call multiDA. We achieve this by constructing a hybrid model that seamlessly integrates a multiclass diagonal discriminant analysis model and feature selection components. Our feature selection component naturally simplifies to weights which are simple functions of likelihood ratio statistics allowing natural comparisons with traditional hypothesis testing methods. We provide heuristic arguments suggesting desirable asymptotic properties of our algorithm with regards to feature selection. We compare our method with several other approaches, showing marked improvements in regard to prediction accuracy, interpretability of chosen features, and algorithm run time. We demonstrate such strengths of our model by showing strong classification performance on publicly available high dimensional datasets, as well as through multiple simulation studies. We make an R package available implementing our approach.
Tasks Feature Selection
Published 2018-07-04
URL http://arxiv.org/abs/1807.01422v1
PDF http://arxiv.org/pdf/1807.01422v1.pdf
PWC https://paperswithcode.com/paper/diagonal-discriminant-analysis-with-feature
Repo https://github.com/sarahromanes/multiDA
Framework none
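
The two ingredients are a per-feature screen based on likelihood-ratio-style statistics and a multiclass diagonal (independent-feature Gaussian) discriminant rule on the retained features. The numpy sketch below illustrates both with an F-like variance-ratio screen; multiDA's actual weighting and model selection are more refined.

```python
# Diagonal discriminant analysis with a simple per-feature screen (illustrative).
import numpy as np

rng = np.random.default_rng(0)
n, p, K = 300, 500, 3
y = rng.integers(0, K, n)
X = rng.normal(size=(n, p))
X[:, :10] += y[:, None] * 1.5                    # only the first 10 features are informative

# Feature screen: between-class vs. within-class variance ratio (F-like statistic).
means = np.stack([X[y == k].mean(axis=0) for k in range(K)])
within = np.stack([X[y == k].var(axis=0) for k in range(K)]).mean(axis=0)
stat = means.var(axis=0) / (within + 1e-8)
selected = np.argsort(stat)[-10:]                # keep the 10 highest-scoring features

# Multiclass diagonal discriminant rule on the selected features.
Xs = X[:, selected]
mu = np.stack([Xs[y == k].mean(axis=0) for k in range(K)])
var = np.stack([Xs[y == k].var(axis=0) + 1e-6 for k in range(K)])
prior = np.bincount(y) / n

def predict(x):
    loglik = -0.5 * (((x - mu) ** 2) / var + np.log(var)).sum(axis=1) + np.log(prior)
    return int(np.argmax(loglik))

acc = np.mean([predict(Xs[i]) == y[i] for i in range(n)])
print(f"training accuracy on selected features: {acc:.2f}")
```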

Segmentation of Photovoltaic Module Cells in Electroluminescence Images

Title Segmentation of Photovoltaic Module Cells in Electroluminescence Images
Authors Sergiu Deitsch, Claudia Buerhop-Lutz, Andreas Maier, Florian Gallwitz, Christian Riess
Abstract High resolution electroluminescence (EL) images captured in the infrared spectrum allow visual, non-destructive inspection of the quality of photovoltaic (PV) modules. Currently, however, such a visual inspection requires trained experts to discern different kinds of defects, which is time-consuming and expensive. In this work, we propose a robust automated segmentation method for extraction of individual solar cells from EL images of PV modules. Automated segmentation of cells is a key step in automating the visual inspection workflow. It also enables controlled studies on large amounts of data to understand the effects of module degradation over time, a process not yet fully understood. The proposed method infers in several steps a high-level solar module representation from low-level edge features. An important step in the algorithm is to formulate the segmentation problem in terms of lens calibration by exploiting the plumbline constraint. We evaluate our method on a dataset of various solar module types containing a total of 408 solar cells with various defects. Our method robustly solves this task with a median weighted Jaccard index of 95.09% and an $F_1$ score of 97.23%, both indicating a very high similarity between automatically segmented and ground truth solar cell masks.
Tasks Calibration
Published 2018-06-18
URL http://arxiv.org/abs/1806.06530v2
PDF http://arxiv.org/pdf/1806.06530v2.pdf
PWC https://paperswithcode.com/paper/segmentation-of-photovoltaic-module-cells-in
Repo https://github.com/zae-bayern/elpv-dataset
Framework none
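
The reported numbers use the weighted Jaccard index and the F1 score between predicted and ground-truth cell masks. The snippet below illustrates those metrics only, using the common min/max definition of the weighted Jaccard on soft masks; it does not reproduce the edge-based segmentation pipeline itself.

```python
# Weighted Jaccard index and F1 score between a predicted and a ground-truth mask.
import numpy as np

def weighted_jaccard(pred, truth):
    # Ratio of element-wise minima to element-wise maxima (common weighted-Jaccard definition).
    return np.minimum(pred, truth).sum() / np.maximum(pred, truth).sum()

def f1_score(pred, truth, thresh=0.5):
    p, t = pred >= thresh, truth >= thresh
    tp = np.logical_and(p, t).sum()
    return 2 * tp / (p.sum() + t.sum())

truth = np.zeros((100, 100))
truth[20:80, 20:80] = 1.0                      # ground-truth cell mask
pred = np.zeros((100, 100))
pred[25:85, 22:82] = 0.9                       # soft predicted mask
print(f"weighted Jaccard: {weighted_jaccard(pred, truth):.3f}, F1: {f1_score(pred, truth):.3f}")
```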

Unity: A General Platform for Intelligent Agents

Title Unity: A General Platform for Intelligent Agents
Authors Arthur Juliani, Vincent-Pierre Berges, Esh Vckay, Yuan Gao, Hunter Henry, Marwan Mattar, Danny Lange
Abstract Recent advances in Deep Reinforcement Learning and Robotics have been driven by the presence of increasingly realistic and complex simulation environments. Many of the existing platforms, however, provide either unrealistic visuals, inaccurate physics, low task complexity, or a limited capacity for interaction among artificial agents. Furthermore, many platforms lack the ability to flexibly configure the simulation, hence turning the simulation environment into a black-box from the perspective of the learning system. Here we describe a new open source toolkit for creating and interacting with simulation environments using the Unity platform: Unity ML-Agents Toolkit. By taking advantage of Unity as a simulation platform, the toolkit enables the development of learning environments which are rich in sensory and physical complexity, provide compelling cognitive challenges, and support dynamic multi-agent interaction. We detail the platform design, communication protocol, set of example environments, and variety of training scenarios made possible via the toolkit.
Tasks
Published 2018-09-07
URL http://arxiv.org/abs/1809.02627v1
PDF http://arxiv.org/pdf/1809.02627v1.pdf
PWC https://paperswithcode.com/paper/unity-a-general-platform-for-intelligent
Repo https://github.com/saya1984/MonsterCarlo2
Framework tf

Applying Faster R-CNN for Object Detection on Malaria Images

Title Applying Faster R-CNN for Object Detection on Malaria Images
Authors Jane Hung, Deepali Ravel, Stefanie C. P. Lopes, Gabriel Rangel, Odailton Amaral Nery, Benoit Malleret, Francois Nosten, Marcus V. G. Lacerda, Marcelo U. Ferreira, Laurent Rénia, Manoj T. Duraisingh, Fabio T. M. Costa, Matthias Marti, Anne E. Carpenter
Abstract Deep learning based models have had great success in object detection, but the state of the art models have not yet been widely applied to biological image data. We apply for the first time an object detection model previously used on natural images to identify cells and recognize their stages in brightfield microscopy images of malaria-infected blood. Many micro-organisms like malaria parasites are still studied by expert manual inspection and hand counting. This type of object detection task is challenging due to factors like variations in cell shape, density, and color, and uncertainty of some cell classes. In addition, annotated data useful for training is scarce, and the class distribution is inherently highly imbalanced due to the dominance of uninfected red blood cells. We use Faster Region-based Convolutional Neural Network (Faster R-CNN), one of the top performing object detection models in recent years, pre-trained on ImageNet but fine-tuned with our data, and compare it to a baseline, which is based on a traditional approach consisting of cell segmentation, extraction of several single-cell features, and classification using random forests. To conduct our initial study, we collect and label a dataset of 1300 fields of view consisting of around 100,000 individual cells. We demonstrate that Faster R-CNN outperforms our baseline and put the results in the context of human performance.
Tasks Cell Segmentation, Object Detection
Published 2018-04-25
URL http://arxiv.org/abs/1804.09548v2
PDF http://arxiv.org/pdf/1804.09548v2.pdf
PWC https://paperswithcode.com/paper/applying-faster-r-cnn-for-object-detection-on
Repo https://github.com/tobsecret/Awesome_Malaria_Parasite_Imaging_Datasets
Framework none
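
The recipe is the standard fine-tuning one: take a detector pre-trained on generic images and retrain its box-classification head on the malaria classes. The sketch below uses torchvision's Faster R-CNN API rather than the authors' implementation, and the class count and training step are placeholders.

```python
# Fine-tuning a torchvision Faster R-CNN for cell detection (illustrative setup).
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 1 + 7                     # background + cell classes (placeholder count)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

# One illustrative training step on a dummy brightfield image with one box.
model.train()
images = [torch.rand(3, 600, 600)]
targets = [{"boxes": torch.tensor([[50.0, 60.0, 120.0, 140.0]]),
            "labels": torch.tensor([1])}]
losses = model(images, targets)                 # dict of RPN and ROI-head losses
loss = sum(losses.values())
loss.backward()
print({k: round(float(v), 3) for k, v in losses.items()})
```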

Comparison-Based Convolutional Neural Networks for Cervical Cell/Clumps Detection in the Limited Data Scenario

Title Comparison-Based Convolutional Neural Networks for Cervical Cell/Clumps Detection in the Limited Data Scenario
Authors Yixiong Liang, Zhihong Tang, Meng Yan, Jialin Chen, Qing Liu, Yao Xiang
Abstract Automated detection of cervical cancer cells or cell clumps has the potential to significantly reduce error rate and increase productivity in cervical cancer screening. However, most traditional methods rely on the success of accurate cell segmentation and discriminative hand-crafted features extraction. Recently there are emerging deep learning-based methods which train convolutional neural networks (CNN) to classify image patches, but they are computationally expensive. In this paper we propose an efficient CNN-based object detection method for cervical cancer cells/clumps detection. Specifically, we utilize the state-of-the-art two-stage object detection method, the Faster-RCNN with Feature Pyramid Network (FPN), as the baseline and propose a novel comparison detector to deal with the limited data problem. The key idea is to classify proposals by comparing them with reference samples of each category. In addition, we propose to learn the reference samples of the background from data instead of manually choosing them by some heuristic rules. Experimental results show that the proposed Comparison Detector yields significant improvement on the small dataset, achieving a mean Average Precision (mAP) of 26.3% and an Average Recall (AR) of 35.7%, both improving about 20 points compared to the baseline. Moreover, Comparison Detector improved AR by 4.6 points and achieved marginally better performance in terms of mAP compared with the baseline model when trained on the medium dataset. Our method is promising for the development of automation-assisted cervical cancer screening systems. Code is available at https://github.com/kuku-sichuan/ComparisonDetector.
Tasks Cell Segmentation, Few-Shot Learning, Object Detection
Published 2018-10-14
URL https://arxiv.org/abs/1810.05952v5
PDF https://arxiv.org/pdf/1810.05952v5.pdf
PWC https://paperswithcode.com/paper/comparison-detector-a-novel-object-detection
Repo https://github.com/kuku-sichuan/ComparisonDetector
Framework tf
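
The comparison idea replaces the usual learned classification head: each proposal is labeled by its similarity to per-class reference embeddings, with background references also learned from data rather than chosen heuristically. A toy sketch follows; the cosine similarity and all dimensions are illustrative.

```python
# Classify region proposals by comparing them with per-class reference embeddings.
import numpy as np

rng = np.random.default_rng(0)
D, NUM_CLASSES, NUM_BG = 128, 10, 3

class_protos = rng.normal(size=(NUM_CLASSES, D))   # one reference embedding per category
bg_protos = rng.normal(size=(NUM_BG, D))           # learned background references
protos = np.vstack([bg_protos, class_protos])

def classify_proposal(feat):
    sims = protos @ feat / (np.linalg.norm(protos, axis=1) * np.linalg.norm(feat) + 1e-8)
    idx = int(np.argmax(sims))
    return "background" if idx < NUM_BG else f"class {idx - NUM_BG}"

proposal_features = rng.normal(size=(5, D))        # stand-in for ROI-pooled features
print([classify_proposal(f) for f in proposal_features])
```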

Deep Predictive Coding Network with Local Recurrent Processing for Object Recognition

Title Deep Predictive Coding Network with Local Recurrent Processing for Object Recognition
Authors Kuan Han, Haiguang Wen, Yizhen Zhang, Di Fu, Eugenio Culurciello, Zhongming Liu
Abstract Inspired by “predictive coding” - a theory in neuroscience, we develop a bi-directional and dynamic neural network with local recurrent processing, namely predictive coding network (PCN). Unlike feedforward-only convolutional neural networks, PCN includes both feedback connections, which carry top-down predictions, and feedforward connections, which carry bottom-up errors of prediction. Feedback and feedforward connections enable adjacent layers to interact locally and recurrently to refine representations towards minimization of layer-wise prediction errors. When unfolded over time, the recurrent processing gives rise to an increasingly deeper hierarchy of non-linear transformation, allowing a shallow network to dynamically extend itself into an arbitrarily deep network. We train and test PCN for image classification with SVHN, CIFAR and ImageNet datasets. Despite notably fewer layers and parameters, PCN achieves competitive performance compared to classical and state-of-the-art models. Further analysis shows that the internal representations in PCN converge over time and yield increasingly better accuracy in object recognition. Errors of top-down prediction also reveal visual saliency or bottom-up attention.
Tasks Image Classification, Object Recognition
Published 2018-05-19
URL http://arxiv.org/abs/1805.07526v2
PDF http://arxiv.org/pdf/1805.07526v2.pdf
PWC https://paperswithcode.com/paper/deep-predictive-coding-network-with-local
Repo https://github.com/takyamamoto/Local-Predictive_Coding_Network-with_Chainer
Framework none
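
Each pair of adjacent layers runs a local loop: feedback connections predict the lower layer from the higher one, the bottom-up prediction error corrects the higher representation, and unfolding this loop over time makes the effective network deeper. The numpy sketch below shows that update for a single fully connected pair; the real PCN uses convolutional layers and multiple stages.

```python
# Local recurrent predictive-coding update for one pair of layers (toy, fully connected).
import numpy as np

rng = np.random.default_rng(0)
D_LOW, D_HIGH, STEPS, LR = 20, 10, 8, 0.1

W_fb = rng.normal(size=(D_HIGH, D_LOW)) / np.sqrt(D_HIGH)   # feedback: top-down prediction
x = rng.normal(size=D_LOW)                                   # lower-layer activity (input)
r = rng.normal(size=D_HIGH) * 0.1                            # higher-layer representation

for t in range(STEPS):
    pred = r @ W_fb                         # top-down prediction of the lower layer
    err = x - pred                          # bottom-up prediction error
    r = r + LR * (W_fb @ err)               # feedforward correction of the representation
    print(f"step {t}: prediction error {np.linalg.norm(err):.3f}")
```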

Benchmarking Reinforcement Learning Algorithms on Real-World Robots

Title Benchmarking Reinforcement Learning Algorithms on Real-World Robots
Authors A. Rupam Mahmood, Dmytro Korenkevych, Gautham Vasan, William Ma, James Bergstra
Abstract Through many recent successes in simulation, model-free reinforcement learning has emerged as a promising approach to solving continuous control robotic tasks. The research community is now able to reproduce, analyze and build quickly on these results due to open source implementations of learning algorithms and simulated benchmark tasks. To carry forward these successes to real-world applications, it is crucial to withhold utilizing the unique advantages of simulations that do not transfer to the real world and experiment directly with physical robots. However, reinforcement learning research with physical robots faces substantial resistance due to the lack of benchmark tasks and supporting source code. In this work, we introduce several reinforcement learning tasks with multiple commercially available robots that present varying levels of learning difficulty, setup, and repeatability. On these tasks, we test the learning performance of off-the-shelf implementations of four reinforcement learning algorithms and analyze sensitivity to their hyper-parameters to determine their readiness for applications in various real-world tasks. Our results show that with a careful setup of the task interface and computations, some of these implementations can be readily applicable to physical robots. We find that state-of-the-art learning algorithms are highly sensitive to their hyper-parameters and their relative ordering does not transfer across tasks, indicating the necessity of re-tuning them for each task for best performance. On the other hand, the best hyper-parameter configuration from one task may often result in effective learning on held-out tasks even with different robots, providing a reasonable default. We make the benchmark tasks publicly available to enhance reproducibility in real-world reinforcement learning.
Tasks Continuous Control
Published 2018-09-20
URL http://arxiv.org/abs/1809.07731v1
PDF http://arxiv.org/pdf/1809.07731v1.pdf
PWC https://paperswithcode.com/paper/benchmarking-reinforcement-learning
Repo https://github.com/kindredresearch/SenseAct
Framework none

Toward an AI Physicist for Unsupervised Learning

Title Toward an AI Physicist for Unsupervised Learning
Authors Tailin Wu, Max Tegmark
Abstract We investigate opportunities and challenges for improving unsupervised machine learning using four common strategies with a long history in physics: divide-and-conquer, Occam’s razor, unification and lifelong learning. Instead of using one model to learn everything, we propose a novel paradigm centered around the learning and manipulation of theories, which parsimoniously predict both aspects of the future (from past observations) and the domain in which these predictions are accurate. Specifically, we propose a novel generalized-mean-loss to encourage each theory to specialize in its comparatively advantageous domain, and a differentiable description length objective to downweight bad data and “snap” learned theories into simple symbolic formulas. Theories are stored in a “theory hub”, which continuously unifies learned theories and can propose theories when encountering new environments. We test our implementation, the toy “AI Physicist” learning agent, on a suite of increasingly complex physics environments. From unsupervised observation of trajectories through worlds involving random combinations of gravity, electromagnetism, harmonic motion and elastic bounces, our agent typically learns faster and produces mean-squared prediction errors about a billion times smaller than a standard feedforward neural net of comparable complexity, typically recovering integer and rational theory parameters exactly. Our agent successfully identifies domains with different laws of motion also for a nonlinear chaotic double pendulum in a piecewise constant force field.
Tasks
Published 2018-10-24
URL https://arxiv.org/abs/1810.10525v4
PDF https://arxiv.org/pdf/1810.10525v4.pdf
PWC https://paperswithcode.com/paper/toward-an-ai-physicist-for-unsupervised
Repo https://github.com/tailintalent/AI_physicist
Framework pytorch
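
The generalized-mean loss is the piece that drives specialization: with a negative exponent, the combined loss at each data point is dominated by whichever theory already fits it best, so each theory is pushed to improve on its own domain. A sketch with gamma = -1 follows; the exact normalization and exponent schedule are described in the paper.

```python
# Generalized-mean loss over theories: gamma < 0 rewards the best-fitting theory per point.
import numpy as np

def generalized_mean_loss(per_theory_losses, gamma=-1.0):
    # per_theory_losses: (num_theories, num_points) array of per-point losses.
    return float((np.mean(per_theory_losses ** gamma, axis=0) ** (1.0 / gamma)).mean())

losses = np.array([[0.01, 2.0, 3.0],     # theory 1 fits point 1 well
                   [4.0, 0.02, 0.03]])   # theory 2 fits points 2 and 3 well
print("generalized mean (gamma=-1):", generalized_mean_loss(losses))  # dominated by the best fits
print("arithmetic mean:            ", float(losses.mean()))
```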

SingleGAN: Image-to-Image Translation by a Single-Generator Network using Multiple Generative Adversarial Learning

Title SingleGAN: Image-to-Image Translation by a Single-Generator Network using Multiple Generative Adversarial Learning
Authors Xiaoming Yu, Xing Cai, Zhenqiang Ying, Thomas Li, Ge Li
Abstract Image translation is a burgeoning field in computer vision where the goal is to learn the mapping between an input image and an output image. However, most recent methods require multiple generators for modeling different domain mappings, which are inefficient and ineffective on some multi-domain image translation tasks. In this paper, we propose a novel method, SingleGAN, to perform multi-domain image-to-image translations with a single generator. We introduce the domain code to explicitly control the different generative tasks and integrate multiple optimization goals to ensure the translation. Experimental results on several unpaired datasets show superior performance of our model in translation between two domains. Besides, we explore variants of SingleGAN for different tasks, including one-to-many domain translation, many-to-many domain translation and one-to-one domain translation with multimodality. The extended experiments show the universality and extensibility of our model.
Tasks Image-to-Image Translation
Published 2018-10-11
URL http://arxiv.org/abs/1810.04991v1
PDF http://arxiv.org/pdf/1810.04991v1.pdf
PWC https://paperswithcode.com/paper/singlegan-image-to-image-translation-by-a
Repo https://github.com/Xiaoming-Yu/SingleGAN
Framework pytorch
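
The single-generator trick is to feed a domain code alongside the input image, so one network can realize every domain mapping. A minimal PyTorch sketch follows; the real SingleGAN generator is a ResNet-style translator trained with per-domain adversarial losses, and the layers here are placeholders.

```python
# One generator for all domain mappings, steered by a concatenated domain code.
import torch
import torch.nn as nn

class SingleGenerator(nn.Module):
    def __init__(self, img_channels=3, num_domains=3, base=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(img_channels + num_domains, base, 3, padding=1), nn.ReLU(),
            nn.Conv2d(base, base, 3, padding=1), nn.ReLU(),
            nn.Conv2d(base, img_channels, 3, padding=1), nn.Tanh())
        self.num_domains = num_domains

    def forward(self, x, domain):                     # domain: (batch,) integer labels
        code = torch.nn.functional.one_hot(domain, self.num_domains).float()
        code = code[:, :, None, None].expand(-1, -1, x.size(2), x.size(3))
        return self.net(torch.cat([x, code], dim=1))  # domain code selects the target mapping

g = SingleGenerator()
x = torch.randn(2, 3, 64, 64)
out = g(x, torch.tensor([0, 2]))                      # translate to domains 0 and 2
print(out.shape)                                      # torch.Size([2, 3, 64, 64])
```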

DeepFM: An End-to-End Wide & Deep Learning Framework for CTR Prediction

Title DeepFM: An End-to-End Wide & Deep Learning Framework for CTR Prediction
Authors Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He, Zhenhua Dong
Abstract Learning sophisticated feature interactions behind user behaviors is critical in maximizing CTR for recommender systems. Despite great progress, existing methods have a strong bias towards low- or high-order interactions, or rely on expert feature engineering. In this paper, we show that it is possible to derive an end-to-end learning model that emphasizes both low- and high-order feature interactions. The proposed framework, DeepFM, combines the power of factorization machines for recommendation and deep learning for feature learning in a new neural network architecture. Compared to the latest Wide & Deep model from Google, DeepFM has a shared raw feature input to both its “wide” and “deep” components, with no need of feature engineering besides raw features. DeepFM, as a general learning framework, can incorporate various network architectures in its deep component. In this paper, we study two instances of DeepFM where its “deep” component is DNN and PNN respectively, which we denote as DeepFM-D and DeepFM-P. Comprehensive experiments are conducted to demonstrate the effectiveness of DeepFM-D and DeepFM-P over the existing models for CTR prediction, on both benchmark data and commercial data. We conduct an online A/B test in the Huawei App Market, which reveals that DeepFM-D leads to more than 10% improvement of click-through rate in the production environment, compared to a well-engineered LR model. We also cover related practice in deploying our framework in the Huawei App Market.
Tasks Click-Through Rate Prediction, Feature Engineering, Recommendation Systems
Published 2018-04-12
URL http://arxiv.org/abs/1804.04950v2
PDF http://arxiv.org/pdf/1804.04950v2.pdf
PWC https://paperswithcode.com/paper/deepfm-an-end-to-end-wide-deep-learning
Repo https://github.com/Taewook-Ko/Recommender-System-Papers
Framework none
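
The key structural point is that the FM component (low-order, pairwise interactions) and the deep component (high-order interactions) share the same field embeddings, so no manual feature engineering is needed. A compact PyTorch sketch of that shared-embedding design follows; field sizes and the deep network are illustrative, and DeepFM-D/P differ only in the choice of deep component.

```python
# DeepFM-style CTR model: FM and deep components reading shared field embeddings.
import torch
import torch.nn as nn

class DeepFM(nn.Module):
    def __init__(self, field_dims=(100, 50, 20), embed_dim=8):
        super().__init__()
        self.embeds = nn.ModuleList([nn.Embedding(d, embed_dim) for d in field_dims])
        self.linear = nn.ModuleList([nn.Embedding(d, 1) for d in field_dims])
        self.bias = nn.Parameter(torch.zeros(1))
        num_fields = len(field_dims)
        self.deep = nn.Sequential(nn.Linear(num_fields * embed_dim, 32), nn.ReLU(),
                                  nn.Linear(32, 1))

    def forward(self, x):                              # x: (batch, num_fields) of field indices
        emb = torch.stack([e(x[:, i]) for i, e in enumerate(self.embeds)], dim=1)
        # FM second-order term: 0.5 * ((sum of embeddings)^2 - sum of squared embeddings)
        fm = 0.5 * (emb.sum(1).pow(2) - emb.pow(2).sum(1)).sum(1, keepdim=True)
        first = sum(l(x[:, i]) for i, l in enumerate(self.linear)) + self.bias
        deep = self.deep(emb.flatten(1))
        return torch.sigmoid(first + fm + deep)        # predicted click-through probability

model = DeepFM()
x = torch.tensor([[3, 7, 1], [42, 9, 0]])              # two samples, three categorical fields
print(model(x))                                         # CTR estimates in (0, 1)
```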