May 7, 2019


Paper Group AWR 8


Concrete Problems in AI Safety. 3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction. Straight to Shapes: Real-time Detection of Encoded Shapes. A Convolutional Attention Network for Extreme Summarization of Source Code. NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis. Optical Flow Requires Multiple Strategies (but only one network). Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space. Deep Multi-scale Convolutional Neural Network for Dynamic Scene Deblurring. Discovering and Deciphering Relationships Across Disparate Data Modalities. LSTM-based Encoder-Decoder for Multi-sensor Anomaly Detection. Lie-Access Neural Turing Machines. Pyramid Scene Parsing Network. Enhancing the LexVec Distributed Word Representation Model Using Positional Contexts and External Memory. Sequence-to-sequence neural network models for transliteration. Representation Learning with Deconvolution for Multivariate Time Series Classification and Visualization.

Concrete Problems in AI Safety

Title Concrete Problems in AI Safety
Authors Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané
Abstract Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society. In this paper we discuss one such potential impact: the problem of accidents in machine learning systems, defined as unintended and harmful behavior that may emerge from poor design of real-world AI systems. We present a list of five practical research problems related to accident risk, categorized according to whether the problem originates from having the wrong objective function (“avoiding side effects” and “avoiding reward hacking”), an objective function that is too expensive to evaluate frequently (“scalable supervision”), or undesirable behavior during the learning process (“safe exploration” and “distributional shift”). We review previous work in these areas and suggest research directions, with a focus on relevance to cutting-edge AI systems. Finally, we consider the high-level question of how to think most productively about the safety of forward-looking applications of AI.
Tasks Safe Exploration
Published 2016-06-21
URL http://arxiv.org/abs/1606.06565v2
PDF http://arxiv.org/pdf/1606.06565v2.pdf
PWC https://paperswithcode.com/paper/concrete-problems-in-ai-safety
Repo https://github.com/mateuszjurewicz/bornhack_ml_crashcourse
Framework tf
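
A tiny toy illustration (not from the paper) of the “reward hacking” failure mode the abstract describes: an agent that greedily optimizes a misspecified proxy reward drives the true objective down. All functions and numbers here are hypothetical.

```python
def true_objective(x):
    # What we actually want: stay close to x = 1.
    return -(x - 1.0) ** 2

def proxy_reward(x):
    # Misspecified stand-in: grows without bound for large x,
    # so maximizing it drifts away from the true optimum.
    return x

x = 0.0
for step in range(10):
    # Greedy hill climbing on the proxy.
    candidates = [x - 0.5, x, x + 0.5]
    x = max(candidates, key=proxy_reward)

print(f"proxy reward: {proxy_reward(x):.1f}")       # keeps increasing
print(f"true objective: {true_objective(x):.1f}")   # keeps getting worse
```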

3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction

Title 3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction
Authors Christopher B. Choy, Danfei Xu, JunYoung Gwak, Kevin Chen, Silvio Savarese
Abstract Inspired by the recent success of methods that employ shape priors to achieve robust 3D reconstructions, we propose a novel recurrent neural network architecture that we call the 3D Recurrent Reconstruction Neural Network (3D-R2N2). The network learns a mapping from images of objects to their underlying 3D shapes from a large collection of synthetic data. Our network takes in one or more images of an object instance from arbitrary viewpoints and outputs a reconstruction of the object in the form of a 3D occupancy grid. Unlike most of the previous works, our network does not require any image annotations or object class labels for training or testing. Our extensive experimental analysis shows that our reconstruction framework i) outperforms the state-of-the-art methods for single view reconstruction, and ii) enables the 3D reconstruction of objects in situations when traditional SFM/SLAM methods fail (because of lack of texture and/or wide baseline).
Tasks 3D Object Reconstruction, 3D Reconstruction, Object Reconstruction
Published 2016-04-02
URL http://arxiv.org/abs/1604.00449v1
PDF http://arxiv.org/pdf/1604.00449v1.pdf
PWC https://paperswithcode.com/paper/3d-r2n2-a-unified-approach-for-single-and
Repo https://github.com/Amaranth819/3dr2n2-tensorflow
Framework tf
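
A heavily simplified PyTorch sketch of the 3D-R2N2 idea: encode each view with a shared CNN, fuse an arbitrary number of views with a recurrent unit (the paper uses a 3D convolutional LSTM; a plain GRU cell is substituted here), and decode the final state into a voxel occupancy grid. Layer sizes and shapes are illustrative, not the paper’s.

```python
import torch
import torch.nn as nn

class Toy3DR2N2(nn.Module):
    def __init__(self, feat_dim=256, vox=32):
        super().__init__()
        self.vox = vox
        self.encoder = nn.Sequential(              # 2D image encoder
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.rnn = nn.GRUCell(feat_dim, feat_dim)  # fuses any number of views
        self.decoder = nn.Linear(feat_dim, vox ** 3)  # voxel logits

    def forward(self, views):                      # views: (V, B, 3, H, W)
        h = torch.zeros(views.shape[1], self.rnn.hidden_size)
        for v in views:                            # one recurrent update per view
            h = self.rnn(self.encoder(v), h)
        logits = self.decoder(h).view(-1, self.vox, self.vox, self.vox)
        return torch.sigmoid(logits)               # per-voxel occupancy

model = Toy3DR2N2()
grid = model(torch.randn(3, 2, 3, 64, 64))         # 3 views, batch of 2
print(grid.shape)                                  # torch.Size([2, 32, 32, 32])
```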

Straight to Shapes: Real-time Detection of Encoded Shapes

Title Straight to Shapes: Real-time Detection of Encoded Shapes
Authors Saumya Jetley, Michael Sapienza, Stuart Golodetz, Philip H. S. Torr
Abstract Current object detection approaches predict bounding boxes, but these provide little instance-specific information beyond location, scale and aspect ratio. In this work, we propose to directly regress to objects’ shapes in addition to their bounding boxes and categories. It is crucial to find an appropriate shape representation that is compact and decodable, and in which objects can be compared for higher-order concepts such as view similarity, pose variation and occlusion. To achieve this, we use a denoising convolutional auto-encoder to establish an embedding space, and place the decoder after a fast end-to-end network trained to regress directly to the encoded shape vectors. This yields what to the best of our knowledge is the first real-time shape prediction network, running at ~35 FPS on a high-end desktop. With higher-order shape reasoning well-integrated into the network pipeline, the network shows the useful practical quality of generalising to unseen categories similar to the ones in the training set, something that most existing approaches fail to handle.
Tasks Denoising, Object Detection
Published 2016-11-23
URL http://arxiv.org/abs/1611.07932v2
PDF http://arxiv.org/pdf/1611.07932v2.pdf
PWC https://paperswithcode.com/paper/straight-to-shapes-real-time-detection-of
Repo https://github.com/torrvision/straighttoshapes
Framework none
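
A rough PyTorch sketch of the pipeline the abstract describes: a denoising autoencoder learns a compact shape embedding from binary masks, and a detection head regresses boxes, scores, and shape codes that the decoder turns back into masks. All dimensions here are made up.

```python
import torch
import torch.nn as nn

CODE = 20                                  # shape-embedding size

shape_encoder = nn.Sequential(             # 64x64 mask -> CODE-dim code
    nn.Flatten(), nn.Linear(64 * 64, 256), nn.ReLU(), nn.Linear(256, CODE))
shape_decoder = nn.Sequential(             # code -> mask logits
    nn.Linear(CODE, 256), nn.ReLU(), nn.Linear(256, 64 * 64))

def denoising_ae_step(mask, noise=0.1):
    # Corrupt the mask, reconstruct the clean one: denoising objective.
    noisy = (mask + noise * torch.randn_like(mask)).clamp(0, 1)
    logits = shape_decoder(shape_encoder(noisy))
    return nn.functional.binary_cross_entropy_with_logits(
        logits, mask.flatten(1))

print(denoising_ae_step(torch.rand(8, 64, 64)))   # AE training loss

# Detection head: per candidate, predict 4 box coords, a score, and a
# CODE-dim shape vector, trained to match the encoder's embeddings.
det_head = nn.Linear(512, 4 + 1 + CODE)    # 512-d image feature assumed

feat = torch.randn(8, 512)
box, score, code = det_head(feat).split([4, 1, CODE], dim=1)
masks = torch.sigmoid(shape_decoder(code)).view(-1, 64, 64)
print(masks.shape)                         # torch.Size([8, 64, 64])
```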

A Convolutional Attention Network for Extreme Summarization of Source Code

Title A Convolutional Attention Network for Extreme Summarization of Source Code
Authors Miltiadis Allamanis, Hao Peng, Charles Sutton
Abstract Attention mechanisms in neural networks have proved useful for problems in which the input and output do not have fixed dimension. Often there exist features that are locally translation invariant and would be valuable for directing the model’s attention, but previous attentional architectures are not constructed to learn such features specifically. We introduce an attentional neural network that employs convolution on the input tokens to detect local time-invariant and long-range topical attention features in a context-dependent way. We apply this architecture to the problem of extreme summarization of source code snippets into short, descriptive function name-like summaries. Using those features, the model sequentially generates a summary by marginalizing over two attention mechanisms: one that predicts the next summary token based on the attention weights of the input tokens and another that is able to copy a code token as-is directly into the summary. We demonstrate our convolutional attention neural network’s performance on 10 popular Java projects showing that it achieves better performance compared to previous attentional mechanisms.
Tasks
Published 2016-02-09
URL http://arxiv.org/abs/1602.03001v2
PDF http://arxiv.org/pdf/1602.03001v2.pdf
PWC https://paperswithcode.com/paper/a-convolutional-attention-network-for-extreme
Repo https://github.com/samialabed/method-name-prediction
Framework tf
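
A simplified PyTorch sketch of convolutional attention over input code tokens: 1-D convolutions score each token, the scores become attention weights, and a gate mixes “generate from vocabulary” with “copy an input token” as a rough stand-in for the paper’s two attention mechanisms. Sizes are illustrative.

```python
import torch
import torch.nn as nn

V, E, H = 1000, 64, 128                      # vocab, embedding, hidden sizes

embed = nn.Embedding(V, E)
conv = nn.Conv1d(E, H, kernel_size=5, padding=2)  # local conv features
attn_score = nn.Conv1d(H, 1, kernel_size=1)       # per-token score
gen = nn.Linear(E, V)                             # vocabulary logits
copy_gate = nn.Linear(E, 1)                       # P(copy) vs P(generate)

tokens = torch.randint(0, V, (2, 30))             # batch of code snippets
x = embed(tokens).transpose(1, 2)                 # (B, E, L)
alpha = torch.softmax(attn_score(torch.relu(conv(x))).squeeze(1), dim=1)

context = (alpha.unsqueeze(1) * x).sum(dim=2)     # attention-weighted (B, E)
p_copy = torch.sigmoid(copy_gate(context))        # (B, 1)
p_gen = torch.softmax(gen(context), dim=1)        # (B, V)

# Final distribution: mix generation with copying attended input tokens.
p_final = (1 - p_copy) * p_gen
p_final = p_final.scatter_add(1, tokens, p_copy * alpha)
print(p_final.sum(dim=1))                         # ~1.0 per example
```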

NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis

Title NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis
Authors Amir Shahroudy, Jun Liu, Tian-Tsong Ng, Gang Wang
Abstract Recent approaches in depth-based human activity analysis achieved outstanding performance and proved the effectiveness of 3D representation for classification of action classes. Currently available depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of training samples, distinct class labels, camera views and variety of subjects. In this paper we introduce a large-scale dataset for RGB+D human action recognition with more than 56 thousand video samples and 4 million frames, collected from 40 distinct subjects. Our dataset contains 60 different action classes including daily, mutual, and health-related actions. In addition, we propose a new recurrent neural network structure to model the long-term temporal correlation of the features for each body part, and utilize them for better action classification. Experimental results show the advantages of applying deep learning methods over state-of-the-art hand-crafted features on the suggested cross-subject and cross-view evaluation criteria for our dataset. The introduction of this large scale dataset will enable the community to apply, develop and adapt various data-hungry learning techniques for the task of depth-based and RGB+D-based human activity analysis.
Tasks 3D Human Action Recognition, Action Classification, Skeleton Based Action Recognition
Published 2016-04-11
URL http://arxiv.org/abs/1604.02808v1
PDF http://arxiv.org/pdf/1604.02808v1.pdf
PWC https://paperswithcode.com/paper/ntu-rgbd-a-large-scale-dataset-for-3d-human
Repo https://github.com/maxstrobel/HCN-PrototypeLoss-PyTorch
Framework pytorch
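
A minimal PyTorch sketch of the part-aware recurrent idea: split the skeleton’s joints into body-part groups, run a separate LSTM over each part’s coordinate sequence, and classify from the concatenated final states. The joint grouping and sizes here are placeholders.

```python
import torch
import torch.nn as nn

PARTS = {"torso": [0, 1, 2], "left_arm": [3, 4, 5],
         "right_arm": [6, 7, 8], "legs": [9, 10, 11]}  # hypothetical split
H, CLASSES = 64, 60

lstms = nn.ModuleDict({p: nn.LSTM(len(j) * 3, H, batch_first=True)
                       for p, j in PARTS.items()})
classifier = nn.Linear(H * len(PARTS), CLASSES)

skel = torch.randn(4, 50, 12, 3)            # (batch, frames, joints, xyz)
finals = []
for part, joints in PARTS.items():
    seq = skel[:, :, joints, :].flatten(2)  # (B, T, |joints| * 3)
    _, (h_n, _) = lstms[part](seq)
    finals.append(h_n[-1])                  # per-part final state
logits = classifier(torch.cat(finals, dim=1))
print(logits.shape)                         # torch.Size([4, 60])
```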

Optical Flow Requires Multiple Strategies (but only one network)

Title Optical Flow Requires Multiple Strategies (but only one network)
Authors Tal Schuster, Lior Wolf, David Gadot
Abstract We show that the matching problem that underlies optical flow requires multiple strategies, depending on the amount of image motion and other factors. We then study the implications of this observation on training a deep neural network for representing image patches in the context of descriptor based optical flow. We propose a metric learning method, which selects suitable negative samples based on the nature of the true match. This type of training produces a network that displays multiple strategies depending on the input and leads to state of the art results on the KITTI 2012 and KITTI 2015 optical flow benchmarks.
Tasks Metric Learning, Optical Flow Estimation
Published 2016-11-17
URL http://arxiv.org/abs/1611.05607v3
PDF http://arxiv.org/pdf/1611.05607v3.pdf
PWC https://paperswithcode.com/paper/optical-flow-requires-multiple-strategies-but
Repo https://github.com/DediGadot/PatchBatch
Framework none
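
A loose PyTorch sketch of the negative-sampling idea: for each patch and its true match, pick negatives from locations near (but not at) the correct match, so the descriptor must separate look-alikes. The tiny CNN, offsets, and margin are illustrative, not the paper’s values.

```python
import torch
import torch.nn as nn

net = nn.Sequential(                         # tiny patch descriptor
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 64))

def hinge_triplet(anchor, pos, neg, margin=0.2):
    d_pos = (anchor - pos).pow(2).sum(1)
    d_neg = (anchor - neg).pow(2).sum(1)
    return torch.relu(d_pos - d_neg + margin).mean()

def crop(img, y, x, r=4):                    # (1, H, W) -> centered patch
    return img[:, y - r:y + r + 1, x - r:x + r + 1]

img1, img2 = torch.rand(1, 100, 100), torch.rand(1, 100, 100)
y, x, dy, dx = 50, 50, 2, -3                 # true flow of patch (y, x)

a = net(crop(img1, y, x).unsqueeze(0))
p = net(crop(img2, y + dy, x + dx).unsqueeze(0))
# Negative sampled a few pixels from the true match: a "hard" negative
# whose difficulty depends on the nature of the true match.
n = net(crop(img2, y + dy + 3, x + dx + 3).unsqueeze(0))
print(hinge_triplet(a, p, n))
```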

Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space

Title Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space
Authors Anh Nguyen, Jeff Clune, Yoshua Bengio, Alexey Dosovitskiy, Jason Yosinski
Abstract Generating high-resolution, photo-realistic images has been a long-standing goal in machine learning. Recently, Nguyen et al. (2016) showed one interesting way to synthesize novel images by performing gradient ascent in the latent space of a generator network to maximize the activations of one or multiple neurons in a separate classifier network. In this paper we extend this method by introducing an additional prior on the latent code, improving both sample quality and sample diversity, leading to a state-of-the-art generative model that produces high quality images at higher resolutions (227x227) than previous generative models, and does so for all 1000 ImageNet categories. In addition, we provide a unified probabilistic interpretation of related activation maximization methods and call the general class of models “Plug and Play Generative Networks”. PPGNs are composed of 1) a generator network G that is capable of drawing a wide range of image types and 2) a replaceable “condition” network C that tells the generator what to draw. We demonstrate the generation of images conditioned on a class (when C is an ImageNet or MIT Places classification network) and also conditioned on a caption (when C is an image captioning network). Our method also improves the state of the art of Multifaceted Feature Visualization, which generates the set of synthetic inputs that activate a neuron in order to better understand how deep neural networks operate. Finally, we show that our model performs reasonably well at the task of image inpainting. While image models are used in this paper, the approach is modality-agnostic and can be applied to many types of data.
Tasks Image Captioning, Image Inpainting
Published 2016-11-30
URL http://arxiv.org/abs/1612.00005v2
PDF http://arxiv.org/pdf/1612.00005v2.pdf
PWC https://paperswithcode.com/paper/plug-play-generative-networks-conditional
Repo https://github.com/Evolving-AI-Lab/ppgn
Framework caffe2
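
A bare-bones PyTorch sketch of the PPGN-style sampler: iterate gradient ascent on a latent code z so that the generated image drives up a chosen class logit of a separate classifier, with an L2 pull toward the prior and additive noise. G and C are stand-in toy networks and the step sizes are arbitrary.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(100, 3 * 32 * 32), nn.Tanh())   # toy "generator"
C = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)) # toy "classifier"

def ppgn_sample(class_idx, steps=50, eps1=1e-3, eps2=1e-1, eps3=1e-2):
    z = torch.zeros(1, 100, requires_grad=True)
    for _ in range(steps):
        img = G(z).view(1, 3, 32, 32)
        logit = C(img)[0, class_idx]
        grad, = torch.autograd.grad(logit, z)
        with torch.no_grad():
            z += eps1 * (-z)                  # prior term pulls z toward 0
            z += eps2 * grad                  # condition term raises the logit
            z += eps3 * torch.randn_like(z)   # noise term
    return G(z).detach().view(1, 3, 32, 32)

img = ppgn_sample(class_idx=3)
print(img.shape)                              # torch.Size([1, 3, 32, 32])
```

Swapping C for a different condition network (a places classifier, a captioner) changes what gets drawn without retraining G, which is the “plug and play” part.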

Deep Multi-scale Convolutional Neural Network for Dynamic Scene Deblurring

Title Deep Multi-scale Convolutional Neural Network for Dynamic Scene Deblurring
Authors Seungjun Nah, Tae Hyun Kim, Kyoung Mu Lee
Abstract Non-uniform blind deblurring for general dynamic scenes is a challenging computer vision problem, as blurs arise not only from multiple object motions but also from camera shake and scene depth variation. To remove these complicated motion blurs, conventional energy-optimization-based methods rely on simple assumptions, such as the blur kernel being partially uniform or locally linear. Moreover, recent machine learning based methods also depend on synthetic blur datasets generated under these assumptions. This makes conventional deblurring methods fail to remove blurs where the blur kernel is difficult to approximate or parameterize (e.g. object motion boundaries). In this work, we propose a multi-scale convolutional neural network that restores sharp images in an end-to-end manner where blur is caused by various sources. We also present a multi-scale loss function that mimics conventional coarse-to-fine approaches. Furthermore, we propose a new large-scale dataset that provides pairs of realistic blurry images and the corresponding ground-truth sharp images obtained by a high-speed camera. With the proposed model trained on this dataset, we demonstrate empirically that our method achieves state-of-the-art performance in dynamic scene deblurring, both qualitatively and quantitatively.
Tasks Deblurring
Published 2016-12-07
URL http://arxiv.org/abs/1612.02177v2
PDF http://arxiv.org/pdf/1612.02177v2.pdf
PWC https://paperswithcode.com/paper/deep-multi-scale-convolutional-neural-network
Repo https://github.com/SeungjunNah/DeepDeblur_release
Framework torch
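
A small PyTorch sketch of the multi-scale (coarse-to-fine) loss: the network predicts a sharp image at several scales, and each prediction is penalized against a matching downsampled ground truth. The three-level pyramid and plain MSE are illustrative simplifications.

```python
import torch
import torch.nn.functional as F

def multiscale_loss(preds, sharp):
    """preds: list of predictions from coarse to fine, finest last."""
    loss = 0.0
    for pred in preds:
        # Downsample the ground truth to each prediction's resolution.
        target = F.interpolate(sharp, size=pred.shape[-2:],
                               mode="bilinear", align_corners=False)
        loss = loss + F.mse_loss(pred, target)
    return loss / len(preds)

sharp = torch.rand(2, 3, 256, 256)
preds = [torch.rand(2, 3, s, s) for s in (64, 128, 256)]  # 3 scales
print(multiscale_loss(preds, sharp))
```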

Discovering and Deciphering Relationships Across Disparate Data Modalities

Title Discovering and Deciphering Relationships Across Disparate Data Modalities
Authors Joshua T. Vogelstein, Eric Bridgeford, Qing Wang, Carey E. Priebe, Mauro Maggioni, Cencheng Shen
Abstract Understanding the relationships between different properties of data, such as whether a connectome or genome has information about disease status, is becoming increasingly important in modern biological datasets. While existing approaches can test whether two properties are related, they often require unfeasibly large sample sizes in real data scenarios, and do not provide any insight into how or why the procedure reached its decision. Our approach, “Multiscale Graph Correlation” (MGC), is a dependence test that juxtaposes previously disparate data science techniques, including k-nearest neighbors, kernel methods (such as support vector machines), and multiscale analysis (such as wavelets). Other methods typically require double or triple the number of samples to achieve the same statistical power as MGC in a benchmark suite including high-dimensional and nonlinear relationships - spanning polynomial (linear, quadratic, cubic), trigonometric (sinusoidal, circular, ellipsoidal, spiral), geometric (square, diamond, W-shape), and other functions, with dimensionality ranging from 1 to 1000. Moreover, MGC uniquely provides a simple and elegant characterization of the potentially complex latent geometry underlying the relationship, providing insight while maintaining computational efficiency. In several real data applications, including brain imaging and cancer genetics, MGC is the only method that can both detect the presence of a dependency and provide specific guidance for the next experiment and/or analysis to conduct.
Tasks
Published 2016-09-16
URL http://arxiv.org/abs/1609.05148v8
PDF http://arxiv.org/pdf/1609.05148v8.pdf
PWC https://paperswithcode.com/paper/discovering-and-deciphering-relationships
Repo https://github.com/neurodata/r-mgc
Framework none
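
A very simplified NumPy sketch of the MGC idea (not the authors’ implementation; see the linked repo for that): compute double-centered distance matrices, then evaluate a correlation restricted to each pair of k- and l-nearest-neighbor scales and report the largest one.

```python
import numpy as np

def center(d):
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())

def mgc_like(x, y):
    n = len(x)
    dx = np.abs(x[:, None] - x[None, :])    # 1-D distances for brevity
    dy = np.abs(y[:, None] - y[None, :])
    rank_a = np.argsort(np.argsort(dx, axis=1), axis=1)  # row-wise ranks
    rank_b = np.argsort(np.argsort(dy, axis=1), axis=1)
    a, b = center(dx), center(dy)
    best = -np.inf
    for k in range(2, n):                   # all neighborhood scales
        for l in range(2, n):
            mask = (rank_a < k) & (rank_b < l)
            if mask.sum() < 2:
                continue
            av, bv = a[mask], b[mask]
            denom = av.std() * bv.std()
            if denom > 0:
                corr = ((av - av.mean()) * (bv - bv.mean())).mean() / denom
                best = max(best, corr)
    return best

rng = np.random.default_rng(0)
x = rng.normal(size=40)
print(mgc_like(x, x ** 2))                  # strong nonlinear dependence
print(mgc_like(x, rng.normal(size=40)))     # near independence
```

The scale (k, l) at which the correlation peaks is what gives MGC its geometric interpretability; this sketch discards it and keeps only the statistic.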

LSTM-based Encoder-Decoder for Multi-sensor Anomaly Detection

Title LSTM-based Encoder-Decoder for Multi-sensor Anomaly Detection
Authors Pankaj Malhotra, Anusha Ramakrishnan, Gaurangi Anand, Lovekesh Vig, Puneet Agarwal, Gautam Shroff
Abstract Mechanical devices such as engines, vehicles, and aircraft are typically instrumented with numerous sensors to capture the behavior and health of the machine. However, there are often external factors or variables which are not captured by sensors, leading to time-series which are inherently unpredictable. For instance, manual controls and/or unmonitored environmental conditions or load may lead to inherently unpredictable time-series. Detecting anomalies in such scenarios becomes challenging using standard approaches based on mathematical models that rely on stationarity, or prediction models that utilize prediction errors to detect anomalies. We propose a Long Short Term Memory Networks based Encoder-Decoder scheme for Anomaly Detection (EncDec-AD) that learns to reconstruct ‘normal’ time-series behavior, and thereafter uses reconstruction error to detect anomalies. We experiment with three publicly available quasi predictable time-series datasets: power demand, space shuttle, and ECG, and two real-world engine datasets with both predictive and unpredictable behavior. We show that EncDec-AD is robust and can detect anomalies from predictable, unpredictable, periodic, aperiodic, and quasi-periodic time-series. Further, we show that EncDec-AD is able to detect anomalies from short time-series (length as small as 30) as well as long time-series (length as large as 500).
Tasks Anomaly Detection, Outlier Detection, Time Series, Time Series Classification
Published 2016-07-01
URL http://arxiv.org/abs/1607.00148v2
PDF http://arxiv.org/pdf/1607.00148v2.pdf
PWC https://paperswithcode.com/paper/lstm-based-encoder-decoder-for-multi-sensor
Repo https://github.com/freedombenLiu/RNN-Time-series-Anomaly-Detection
Framework pytorch
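
A condensed PyTorch sketch of the EncDec-AD scheme: an LSTM encoder-decoder is trained to reconstruct normal windows, and at test time the reconstruction error serves as the anomaly score (the paper fits a Gaussian to the errors; a plain absolute error is used here for brevity, and the decoder is teacher-forced for simplicity).

```python
import torch
import torch.nn as nn

class EncDecAD(nn.Module):
    def __init__(self, n_features=1, hidden=32):
        super().__init__()
        self.enc = nn.LSTM(n_features, hidden, batch_first=True)
        self.dec = nn.LSTM(n_features, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, x):                   # x: (B, T, F)
        _, state = self.enc(x)              # encoder summarizes the window
        h, _ = self.dec(x, state)           # decoder reconstructs it
        return self.out(h)

model = EncDecAD()
normal = torch.sin(torch.linspace(0, 12, 30)).view(1, 30, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):                        # fit on 'normal' data only
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(normal), normal)
    loss.backward()
    opt.step()

anomalous = normal.clone()
anomalous[0, 15] += 2.0                     # inject a spike
score = (model(anomalous) - anomalous).abs().mean()
baseline = (model(normal) - normal).abs().mean()
print(score > baseline)                     # expected: True (anomaly scores higher)
```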

Lie-Access Neural Turing Machines

Title Lie-Access Neural Turing Machines
Authors Greg Yang, Alexander M. Rush
Abstract External neural memory structures have recently become a popular tool for algorithmic deep learning (Graves et al. 2014, Weston et al. 2014). These models generally utilize differentiable versions of traditional discrete memory-access structures (random access, stacks, tapes) to provide the storage necessary for computational tasks. In this work, we argue that these neural memory systems lack specific structure important for relative indexing, and propose an alternative model, Lie-access memory, that is explicitly designed for the neural setting. In this paradigm, memory is accessed using a continuous head in a key-space manifold. The head is moved via Lie group actions, such as shifts or rotations, generated by a controller, and memory access is performed by linear smoothing in key space. We argue that Lie groups provide a natural generalization of discrete memory structures, such as Turing machines, as they provide inverse and identity operators while maintaining differentiability. To experiment with this approach, we implement a simplified Lie-access neural Turing machine (LANTM) with different Lie groups. We find that this approach is able to perform well on a range of algorithmic tasks.
Tasks
Published 2016-11-09
URL http://arxiv.org/abs/1611.02854v2
PDF http://arxiv.org/pdf/1611.02854v2.pdf
PWC https://paperswithcode.com/paper/lie-access-neural-turing-machines
Repo https://github.com/harvardnlp/lie-access-memory
Framework torch
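
A toy NumPy sketch of Lie-access memory: slots live at points in a 2-D key space, the head is a continuous point moved by group actions (here translations and SO(2) rotations), and a read is an inverse-distance smoothing over slot values. Everything here is a simplification for illustration.

```python
import numpy as np

keys = np.random.default_rng(0).normal(size=(5, 2))   # slot positions
values = np.eye(5)                                    # slot contents

def read(head, temperature=1.0):
    # Linear smoothing in key space: closer slots get more weight.
    d2 = ((keys - head) ** 2).sum(axis=1)
    w = np.exp(-d2 / temperature)
    w /= w.sum()
    return w @ values

def rotate(head, theta):                              # SO(2) group action
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]]) @ head

head = np.zeros(2)
head = head + np.array([0.5, 0.0])                    # shift action
head = rotate(head, np.pi / 2)                        # rotation action
print(read(head))                                     # soft mix of slot values
```

Because shifts and rotations have exact inverses and an identity, the controller can realize relative addressing (step forward, step back) in a fully differentiable way.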

Pyramid Scene Parsing Network

Title Pyramid Scene Parsing Network
Authors Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia
Abstract Scene parsing is challenging due to its unrestricted open vocabulary and diverse scenes. In this paper, we exploit the capability of global context information by different-region-based context aggregation through our pyramid pooling module, together with the proposed pyramid scene parsing network (PSPNet). Our global prior representation is effective for producing good-quality results on the scene parsing task, while PSPNet provides a superior framework for pixel-level prediction tasks. The proposed approach achieves state-of-the-art performance on various datasets. It came first in the ImageNet scene parsing challenge 2016, the PASCAL VOC 2012 benchmark, and the Cityscapes benchmark. A single PSPNet yields a new record mIoU of 85.4% on PASCAL VOC 2012 and accuracy of 80.2% on Cityscapes.
Tasks Lesion Segmentation, Real-Time Semantic Segmentation, Scene Parsing, Semantic Segmentation
Published 2016-12-04
URL http://arxiv.org/abs/1612.01105v2
PDF http://arxiv.org/pdf/1612.01105v2.pdf
PWC https://paperswithcode.com/paper/pyramid-scene-parsing-network
Repo https://github.com/monsieurmona/LinkCollectionAutonomousDriving
Framework none
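
A compact PyTorch sketch of the pyramid pooling module: pool the feature map to several grid sizes, project each with a 1x1 convolution, upsample back, and concatenate with the original features. Channel counts are illustrative; the bin sizes (1, 2, 3, 6) follow the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    def __init__(self, in_ch=64, bins=(1, 2, 3, 6)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(in_ch, in_ch // len(bins), 1))
            for b in bins)

    def forward(self, x):
        h, w = x.shape[-2:]
        # Each branch contributes context at a different spatial scale.
        outs = [x] + [F.interpolate(b(x), size=(h, w), mode="bilinear",
                                    align_corners=False)
                      for b in self.branches]
        return torch.cat(outs, dim=1)     # global context + local features

ppm = PyramidPooling()
feat = torch.randn(1, 64, 60, 60)
print(ppm(feat).shape)                    # torch.Size([1, 128, 60, 60])
```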

Enhancing the LexVec Distributed Word Representation Model Using Positional Contexts and External Memory

Title Enhancing the LexVec Distributed Word Representation Model Using Positional Contexts and External Memory
Authors Alexandre Salle, Marco Idiart, Aline Villavicencio
Abstract In this paper we take a state-of-the-art model for distributed word representation that explicitly factorizes the positive pointwise mutual information (PPMI) matrix using window sampling and negative sampling and address two of its shortcomings. We improve syntactic performance by using positional contexts, and solve the need to store the PPMI matrix in memory by working on aggregate data in external memory. The effectiveness of both modifications is shown using word similarity and analogy tasks.
Tasks
Published 2016-06-03
URL http://arxiv.org/abs/1606.01283v1
PDF http://arxiv.org/pdf/1606.01283v1.pdf
PWC https://paperswithcode.com/paper/enhancing-the-lexvec-distributed-word
Repo https://github.com/alexandres/lexvec
Framework none
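
A miniature NumPy sketch of the positional-contexts idea: instead of one context vector per word, keep one per (word, relative position), and fit word vectors so dot products approximate PPMI values. The corpus, window, and dimensions are toy-sized; the real LexVec additionally uses window sampling, negative sampling, and external-memory PPMI aggregation.

```python
import numpy as np
from collections import Counter

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
widx = {w: i for i, w in enumerate(vocab)}
V, D, WIN = len(vocab), 8, 2

# Count (word, context word, relative offset) triples.
pairs = Counter()
for i, w in enumerate(corpus):
    for off in range(-WIN, WIN + 1):
        if off != 0 and 0 <= i + off < len(corpus):
            pairs[widx[w], widx[corpus[i + off]], off] += 1

total = sum(pairs.values())
row, col = Counter(), Counter()
for (w, c, off), n in pairs.items():
    row[w] += n
    col[c] += n

rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, (V, D))                   # word vectors
C = rng.normal(0, 0.1, (V, 2 * WIN + 1, D))      # positional context vectors

for _ in range(200):                             # SGD on squared error vs PPMI
    for (w, c, off), n in pairs.items():
        ppmi = max(np.log(n * total / (row[w] * col[c])), 0.0)
        err = W[w] @ C[c, off + WIN] - ppmi
        W[w] -= 0.05 * err * C[c, off + WIN]
        C[c, off + WIN] -= 0.05 * err * W[w]

print(W[widx["cat"]] @ W[widx["dog"]])           # words in similar contexts align
```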

Sequence-to-sequence neural network models for transliteration

Title Sequence-to-sequence neural network models for transliteration
Authors Mihaela Rosca, Thomas Breuel
Abstract Transliteration is a key component of machine translation systems and software internationalization. This paper demonstrates that neural sequence-to-sequence models obtain state of the art or close to state of the art results on existing datasets. In an effort to make machine transliteration accessible, we open source a new Arabic to English transliteration dataset and our trained models.
Tasks Machine Translation, Transliteration
Published 2016-10-29
URL http://arxiv.org/abs/1610.09565v1
PDF http://arxiv.org/pdf/1610.09565v1.pdf
PWC https://paperswithcode.com/paper/sequence-to-sequence-neural-network-models
Repo https://github.com/google/transliteration
Framework none
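
A skeletal PyTorch character-level encoder-decoder of the kind the paper evaluates for transliteration: a GRU encodes the source spelling, and its final state seeds a GRU decoder that emits target characters. Vocabularies and sizes are placeholders.

```python
import torch
import torch.nn as nn

SRC, TGT, E, H = 40, 30, 32, 64           # char vocab sizes, embed, hidden

enc_emb, dec_emb = nn.Embedding(SRC, E), nn.Embedding(TGT, E)
encoder = nn.GRU(E, H, batch_first=True)
decoder = nn.GRU(E, H, batch_first=True)
out = nn.Linear(H, TGT)

def transliterate_logits(src_chars, tgt_chars):
    """Teacher-forced logits; real training would shift the decoder
    input one step right relative to the prediction targets."""
    _, h = encoder(enc_emb(src_chars))     # h: (1, B, H) source summary
    dec_h, _ = decoder(dec_emb(tgt_chars), h)
    return out(dec_h)                      # (B, T_tgt, TGT)

src = torch.randint(0, SRC, (8, 12))       # e.g. Arabic character ids
tgt = torch.randint(0, TGT, (8, 10))       # e.g. Latin character ids
logits = transliterate_logits(src, tgt)
loss = nn.functional.cross_entropy(logits.reshape(-1, TGT), tgt.reshape(-1))
print(logits.shape, loss.item())
```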

Representation Learning with Deconvolution for Multivariate Time Series Classification and Visualization

Title Representation Learning with Deconvolution for Multivariate Time Series Classification and Visualization
Authors Zhiguang Wang, Wei Song, Lu Liu, Fan Zhang, Junxiao Xue, Yangdong Ye, Ming Fan, Mingliang Xu
Abstract We propose a new model based on deconvolutional networks and SAX discretization to learn representations for multivariate time series. Deconvolutional networks exploit the powerful expressiveness of deep neural networks in an unsupervised manner. We design a network structure specifically to capture cross-channel correlation with deconvolution, forcing the pooling operation to perform dimension reduction along each position in the individual channel. Discretization based on Symbolic Aggregate Approximation is applied to the feature vectors to further extract a bag of features. We show how this representation and bag of features help with classification. A full comparison with the sequence-distance-based approach is provided to demonstrate the effectiveness of our approach on the standard datasets. We further build the Markov matrix from the discretized deconvolution representation to visualize the time series as complex networks, which show more class-specific statistical properties and clear structures with respect to different labels.
Tasks Dimensionality Reduction, Representation Learning, Time Series, Time Series Classification
Published 2016-10-24
URL http://arxiv.org/abs/1610.07258v3
PDF http://arxiv.org/pdf/1610.07258v3.pdf
PWC https://paperswithcode.com/paper/representation-learning-with-deconvolution
Repo https://github.com/cauchyturing/Deconv_SAX
Framework none
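
A short NumPy sketch of the SAX step applied to a series (here a raw signal rather than the paper’s learned feature vectors): reduce the series with piecewise aggregate approximation (PAA), then map each segment mean to a symbol via Gaussian breakpoints. The alphabet size and segment count are arbitrary choices for illustration.

```python
import numpy as np

# Breakpoints splitting N(0, 1) into 4 equiprobable bins (alphabet "abcd").
CUTS = np.array([-0.6745, 0.0, 0.6745])

def sax(series, segments=8):
    z = (series - series.mean()) / series.std()   # z-normalize
    paa = z.reshape(segments, -1).mean(axis=1)    # piecewise aggregate approx.
    return "".join("abcd"[i] for i in np.digitize(paa, CUTS))

t = np.linspace(0, 2 * np.pi, 64)
print(sax(np.sin(t)))                             # short symbolic word
```

Counting the resulting symbolic words over sliding windows yields the bag-of-features representation the abstract refers to.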