Paper Group AWR 8
Concrete Problems in AI Safety. 3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction. Straight to Shapes: Real-time Detection of Encoded Shapes. A Convolutional Attention Network for Extreme Summarization of Source Code. NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis. Optical Flow Requires Multiple Strategies (but only one network). Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space. Deep Multi-scale Convolutional Neural Network for Dynamic Scene Deblurring. Discovering and Deciphering Relationships Across Disparate Data Modalities. LSTM-based Encoder-Decoder for Multi-sensor Anomaly Detection. Lie-Access Neural Turing Machines. Pyramid Scene Parsing Network. Enhancing the LexVec Distributed Word Representation Model Using Positional Contexts and External Memory. Sequence-to-sequence neural network models for transliteration. Representation Learning with Deconvolution for Multivariate Time Series Classification and Visualization.
Concrete Problems in AI Safety
Title | Concrete Problems in AI Safety |
Authors | Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané |
Abstract | Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society. In this paper we discuss one such potential impact: the problem of accidents in machine learning systems, defined as unintended and harmful behavior that may emerge from poor design of real-world AI systems. We present a list of five practical research problems related to accident risk, categorized according to whether the problem originates from having the wrong objective function (“avoiding side effects” and “avoiding reward hacking”), an objective function that is too expensive to evaluate frequently (“scalable supervision”), or undesirable behavior during the learning process (“safe exploration” and “distributional shift”). We review previous work in these areas and suggest research directions with a focus on relevance to cutting-edge AI systems. Finally, we consider the high-level question of how to think most productively about the safety of forward-looking applications of AI. |
Tasks | Safe Exploration |
Published | 2016-06-21 |
URL | http://arxiv.org/abs/1606.06565v2 |
http://arxiv.org/pdf/1606.06565v2.pdf | |
PWC | https://paperswithcode.com/paper/concrete-problems-in-ai-safety |
Repo | https://github.com/mateuszjurewicz/bornhack_ml_crashcourse |
Framework | tf |
3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction
Title | 3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction |
Authors | Christopher B. Choy, Danfei Xu, JunYoung Gwak, Kevin Chen, Silvio Savarese |
Abstract | Inspired by the recent success of methods that employ shape priors to achieve robust 3D reconstructions, we propose a novel recurrent neural network architecture that we call the 3D Recurrent Reconstruction Neural Network (3D-R2N2). The network learns a mapping from images of objects to their underlying 3D shapes from a large collection of synthetic data. Our network takes in one or more images of an object instance from arbitrary viewpoints and outputs a reconstruction of the object in the form of a 3D occupancy grid. Unlike most of the previous works, our network does not require any image annotations or object class labels for training or testing. Our extensive experimental analysis shows that our reconstruction framework i) outperforms the state-of-the-art methods for single view reconstruction, and ii) enables the 3D reconstruction of objects in situations when traditional SFM/SLAM methods fail (because of lack of texture and/or wide baseline). |
Tasks | 3D Object Reconstruction, 3D Reconstruction, Object Reconstruction |
Published | 2016-04-02 |
URL | http://arxiv.org/abs/1604.00449v1 |
http://arxiv.org/pdf/1604.00449v1.pdf | |
PWC | https://paperswithcode.com/paper/3d-r2n2-a-unified-approach-for-single-and |
Repo | https://github.com/Amaranth819/3dr2n2-tensorflow |
Framework | tf |
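To make the pipeline concrete, here is a minimal PyTorch sketch of the core idea: encode each view with a 2D CNN, fuse an arbitrary number of views with a recurrent unit, and decode the final state into a voxel occupancy grid. The `TinyR2N2` module, its layer sizes, and the GRU-based fusion are illustrative assumptions; the paper uses a 3D convolutional LSTM and a 3D deconvolutional decoder.

```python
import torch
import torch.nn as nn

class TinyR2N2(nn.Module):
    def __init__(self, feat=256, vox=32):
        super().__init__()
        self.vox = vox
        self.encoder = nn.Sequential(               # 2D CNN image encoder
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat),
        )
        self.fuse = nn.GRUCell(feat, feat)           # recurrent multi-view fusion
        self.decoder = nn.Linear(feat, vox ** 3)     # state -> occupancy logits

    def forward(self, views):                        # views: (T, B, 3, H, W)
        h = torch.zeros(views.size(1), self.fuse.hidden_size)
        for t in range(views.size(0)):               # any number of views
            h = self.fuse(self.encoder(views[t]), h)
        logits = self.decoder(h).view(-1, self.vox, self.vox, self.vox)
        return torch.sigmoid(logits)                 # per-voxel occupancy

occ = TinyR2N2()(torch.randn(3, 2, 3, 127, 127))     # 3 views, batch of 2
print(occ.shape)  # torch.Size([2, 32, 32, 32])
```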
Straight to Shapes: Real-time Detection of Encoded Shapes
Title | Straight to Shapes: Real-time Detection of Encoded Shapes |
Authors | Saumya Jetley, Michael Sapienza, Stuart Golodetz, Philip H. S. Torr |
Abstract | Current object detection approaches predict bounding boxes, but these provide little instance-specific information beyond location, scale and aspect ratio. In this work, we propose to directly regress to objects’ shapes in addition to their bounding boxes and categories. It is crucial to find an appropriate shape representation that is compact and decodable, and in which objects can be compared for higher-order concepts such as view similarity, pose variation and occlusion. To achieve this, we use a denoising convolutional auto-encoder to establish an embedding space, and place the decoder after a fast end-to-end network trained to regress directly to the encoded shape vectors. This yields what to the best of our knowledge is the first real-time shape prediction network, running at ~35 FPS on a high-end desktop. With higher-order shape reasoning well-integrated into the network pipeline, the network shows the useful practical quality of generalising to unseen categories similar to the ones in the training set, something that most existing approaches fail to handle. |
Tasks | Denoising, Object Detection |
Published | 2016-11-23 |
URL | http://arxiv.org/abs/1611.07932v2 |
http://arxiv.org/pdf/1611.07932v2.pdf | |
PWC | https://paperswithcode.com/paper/straight-to-shapes-real-time-detection-of |
Repo | https://github.com/torrvision/straighttoshapes |
Framework | none |
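The key design choice here is the learned shape space. Below is a minimal sketch of that component, assuming 64x64 binary masks and 20-D codes (both illustrative sizes, not the paper's configuration): a denoising auto-encoder learns the embedding, and its decoder turns any regressed code back into an instance mask.

```python
import torch
import torch.nn as nn

mask_dim, code_dim = 64 * 64, 20
encoder = nn.Sequential(nn.Linear(mask_dim, 256), nn.ReLU(),
                        nn.Linear(256, code_dim))
decoder = nn.Sequential(nn.Linear(code_dim, 256), nn.ReLU(),
                        nn.Linear(256, mask_dim), nn.Sigmoid())

masks = (torch.rand(8, mask_dim) > 0.5).float()         # toy binary shapes
noisy = masks * (torch.rand_like(masks) > 0.1).float()  # corrupt the inputs
recon = decoder(encoder(noisy))                         # denoising objective
loss = nn.functional.binary_cross_entropy(recon, masks)

# At detection time, the box regressor also emits a code_dim vector per
# box; decoding that vector yields an instance mask:
predicted_code = torch.randn(1, code_dim)               # stand-in for the detector
instance_mask = decoder(predicted_code).view(64, 64) > 0.5
```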
A Convolutional Attention Network for Extreme Summarization of Source Code
Title | A Convolutional Attention Network for Extreme Summarization of Source Code |
Authors | Miltiadis Allamanis, Hao Peng, Charles Sutton |
Abstract | Attention mechanisms in neural networks have proved useful for problems in which the input and output do not have fixed dimension. Often there exist features that are locally translation invariant and would be valuable for directing the model’s attention, but previous attentional architectures are not constructed to learn such features specifically. We introduce an attentional neural network that employs convolution on the input tokens to detect local time-invariant and long-range topical attention features in a context-dependent way. We apply this architecture to the problem of extreme summarization of source code snippets into short, descriptive function name-like summaries. Using those features, the model sequentially generates a summary by marginalizing over two attention mechanisms: one that predicts the next summary token based on the attention weights of the input tokens and another that is able to copy a code token as-is directly into the summary. We demonstrate our convolutional attention neural network’s performance on 10 popular Java projects showing that it achieves better performance compared to previous attentional mechanisms. |
Tasks | |
Published | 2016-02-09 |
URL | http://arxiv.org/abs/1602.03001v2 |
http://arxiv.org/pdf/1602.03001v2.pdf | |
PWC | https://paperswithcode.com/paper/a-convolutional-attention-network-for-extreme |
Repo | https://github.com/samialabed/method-name-prediction |
Framework | tf |
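A minimal sketch of the attention mechanism described above, under assumed toy sizes: 1-D convolutions over token embeddings produce position-wise attention scores, and the next-summary-token distribution marginalizes over a generation term and a copy term that lifts the attention weights onto the source tokens. The fixed mixing weight `lam` is a stand-in for the learned copy probability.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, emb, seq_len = 100, 32, 12                 # illustrative sizes
tokens = torch.randint(vocab, (1, seq_len))       # one code snippet
embed = nn.Embedding(vocab, emb)
conv = nn.Conv1d(emb, 16, kernel_size=3, padding=1)  # local attention features
score = nn.Conv1d(16, 1, kernel_size=1)              # features -> attention score
out = nn.Linear(emb, vocab)

x = embed(tokens).transpose(1, 2)                            # (1, emb, seq_len)
alpha = F.softmax(score(F.relu(conv(x))).squeeze(1), dim=-1) # attention weights

context = torch.bmm(alpha.unsqueeze(1), embed(tokens)).squeeze(1)  # weighted sum
gen = F.softmax(out(context), dim=-1)                        # generate a vocab token
copy = torch.zeros(1, vocab).scatter_add_(1, tokens, alpha)  # copy a source token
lam = 0.5                                  # fixed stand-in for the learned mix
next_token_probs = lam * gen + (1 - lam) * copy
```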
NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis
Title | NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis |
Authors | Amir Shahroudy, Jun Liu, Tian-Tsong Ng, Gang Wang |
Abstract | Recent approaches in depth-based human activity analysis achieved outstanding performance and proved the effectiveness of 3D representation for classification of action classes. Currently available depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of training samples, distinct class labels, camera views and variety of subjects. In this paper we introduce a large-scale dataset for RGB+D human action recognition with more than 56 thousand video samples and 4 million frames, collected from 40 distinct subjects. Our dataset contains 60 different action classes including daily, mutual, and health-related actions. In addition, we propose a new recurrent neural network structure to model the long-term temporal correlation of the features for each body part, and utilize them for better action classification. Experimental results show the advantages of applying deep learning methods over state-of-the-art hand-crafted features on the suggested cross-subject and cross-view evaluation criteria for our dataset. The introduction of this large scale dataset will enable the community to apply, develop and adapt various data-hungry learning techniques for the task of depth-based and RGB+D-based human activity analysis. |
Tasks | 3D Human Action Recognition, Action Classification, Skeleton Based Action Recognition |
Published | 2016-04-11 |
URL | http://arxiv.org/abs/1604.02808v1 |
http://arxiv.org/pdf/1604.02808v1.pdf | |
PWC | https://paperswithcode.com/paper/ntu-rgbd-a-large-scale-dataset-for-3d-human |
Repo | https://github.com/maxstrobel/HCN-PrototypeLoss-PyTorch |
Framework | pytorch |
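The proposed recurrent structure models each body part with its own memory. A rough sketch with an assumed joint grouping and illustrative layer sizes: one LSTM per part runs over that part's 3D joint coordinates, and the concatenated final states are classified into the 60 NTU action classes.

```python
import torch
import torch.nn as nn

parts = {"torso": 8, "left_arm": 4, "right_arm": 4,
         "left_leg": 4, "right_leg": 4}          # assumed joints per group
lstms = nn.ModuleDict({p: nn.LSTM(3 * n, 32, batch_first=True)
                       for p, n in parts.items()})
classifier = nn.Linear(32 * len(parts), 60)      # 60 NTU action classes

def classify(part_seqs):
    # part_seqs: dict part -> (batch, time, 3 * joints) coordinate sequences
    finals = [lstms[p](part_seqs[p])[1][0][-1] for p in parts]  # final hidden states
    return classifier(torch.cat(finals, dim=-1))

seqs = {p: torch.randn(2, 50, 3 * n) for p, n in parts.items()}
print(classify(seqs).shape)  # torch.Size([2, 60])
```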
Optical Flow Requires Multiple Strategies (but only one network)
Title | Optical Flow Requires Multiple Strategies (but only one network) |
Authors | Tal Schuster, Lior Wolf, David Gadot |
Abstract | We show that the matching problem that underlies optical flow requires multiple strategies, depending on the amount of image motion and other factors. We then study the implications of this observation on training a deep neural network for representing image patches in the context of descriptor based optical flow. We propose a metric learning method, which selects suitable negative samples based on the nature of the true match. This type of training produces a network that displays multiple strategies depending on the input and leads to state of the art results on the KITTI 2012 and KITTI 2015 optical flow benchmarks. |
Tasks | Metric Learning, Optical Flow Estimation |
Published | 2016-11-17 |
URL | http://arxiv.org/abs/1611.05607v3 |
http://arxiv.org/pdf/1611.05607v3.pdf | |
PWC | https://paperswithcode.com/paper/optical-flow-requires-multiple-strategies-but |
Repo | https://github.com/DediGadot/PatchBatch |
Framework | none |
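The training signal can be sketched as a hinge loss whose negative sample depends on the true match. In the sketch below, the negative is the hardest non-matching patch descriptor within a small radius of the correct location; this particular selection rule is an illustrative stand-in for the paper's strategy-dependent sampling, not a reproduction of it.

```python
import torch
import torch.nn.functional as F

def patch_loss(anchor, positive, candidates, offsets, radius=8, margin=0.2):
    # anchor, positive: (D,) descriptors of a true match; candidates: (N, D)
    # other descriptors; offsets: (N,) pixel distance of each candidate from
    # the true match location.
    near = offsets <= radius                       # negatives near the match
    pool = candidates[near] if near.any() else candidates
    d_pos = F.pairwise_distance(anchor[None], positive[None])
    d_neg = torch.cdist(anchor[None], pool).min()  # hardest nearby negative
    return F.relu(margin + d_pos - d_neg).mean()

a, p = torch.randn(64), torch.randn(64)
cands, offs = torch.randn(100, 64), torch.randint(1, 30, (100,))
print(patch_loss(a, p, cands, offs))
```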
Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space
Title | Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space |
Authors | Anh Nguyen, Jeff Clune, Yoshua Bengio, Alexey Dosovitskiy, Jason Yosinski |
Abstract | Generating high-resolution, photo-realistic images has been a long-standing goal in machine learning. Recently, Nguyen et al. (2016) showed one interesting way to synthesize novel images by performing gradient ascent in the latent space of a generator network to maximize the activations of one or multiple neurons in a separate classifier network. In this paper we extend this method by introducing an additional prior on the latent code, improving both sample quality and sample diversity, leading to a state-of-the-art generative model that produces high quality images at higher resolutions (227x227) than previous generative models, and does so for all 1000 ImageNet categories. In addition, we provide a unified probabilistic interpretation of related activation maximization methods and call the general class of models “Plug and Play Generative Networks”. PPGNs are composed of 1) a generator network G that is capable of drawing a wide range of image types and 2) a replaceable “condition” network C that tells the generator what to draw. We demonstrate the generation of images conditioned on a class (when C is an ImageNet or MIT Places classification network) and also conditioned on a caption (when C is an image captioning network). Our method also improves the state of the art of Multifaceted Feature Visualization, which generates the set of synthetic inputs that activate a neuron in order to better understand how deep neural networks operate. Finally, we show that our model performs reasonably well at the task of image inpainting. While image models are used in this paper, the approach is modality-agnostic and can be applied to many types of data. |
Tasks | Image Captioning, Image Inpainting |
Published | 2016-11-30 |
URL | http://arxiv.org/abs/1612.00005v2 |
http://arxiv.org/pdf/1612.00005v2.pdf | |
PWC | https://paperswithcode.com/paper/plug-play-generative-networks-conditional |
Repo | https://github.com/Evolving-AI-Lab/ppgn |
Framework | caffe2 |
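The sampling procedure can be sketched as gradient ascent in latent space with a prior term and noise. In the sketch below, `G`, `C`, and `dae` are untrained stand-ins for the generator, the condition network, and the denoising auto-encoder that approximates the prior gradient; the step sizes are illustrative.

```python
import torch
import torch.nn as nn

latent = 64
G = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, 784))  # generator
C = nn.Linear(784, 1000)                       # classifier logits over classes
dae = nn.Linear(latent, latent)                # denoising AE approximating the prior

h = torch.randn(1, latent)
target, e1, e2, e3 = 7, 1e-2, 1.0, 1e-3        # class and illustrative step sizes
for _ in range(100):
    h = h.detach().requires_grad_(True)
    logit = C(G(h))[0, target]                 # condition term: class activation
    grad = torch.autograd.grad(logit, h)[0]
    with torch.no_grad():                      # prior step + condition step + noise
        h = h + e1 * (dae(h) - h) + e2 * grad + e3 * torch.randn_like(h)
image = G(h).detach().view(28, 28)             # final 28x28 sample
```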
Deep Multi-scale Convolutional Neural Network for Dynamic Scene Deblurring
Title | Deep Multi-scale Convolutional Neural Network for Dynamic Scene Deblurring |
Authors | Seungjun Nah, Tae Hyun Kim, Kyoung Mu Lee |
Abstract | Non-uniform blind deblurring for general dynamic scenes is a challenging computer vision problem, as blurs arise not only from multiple object motions but also from camera shake and scene depth variation. To remove these complicated motion blurs, conventional energy-optimization-based methods rely on simple assumptions, such as the blur kernel being partially uniform or locally linear. Moreover, recent machine-learning-based methods also depend on synthetic blur datasets generated under these assumptions. This makes conventional deblurring methods fail to remove blurs where the blur kernel is difficult to approximate or parameterize (e.g. object motion boundaries). In this work, we propose a multi-scale convolutional neural network that restores sharp images in an end-to-end manner where blur is caused by various sources. We also present a multi-scale loss function that mimics conventional coarse-to-fine approaches. Furthermore, we propose a new large-scale dataset that provides pairs of realistic blurry images and the corresponding ground-truth sharp images obtained with a high-speed camera. With the proposed model trained on this dataset, we demonstrate empirically that our method achieves state-of-the-art performance in dynamic scene deblurring, not only qualitatively but also quantitatively. |
Tasks | Deblurring |
Published | 2016-12-07 |
URL | http://arxiv.org/abs/1612.02177v2 |
http://arxiv.org/pdf/1612.02177v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-multi-scale-convolutional-neural-network |
Repo | https://github.com/SeungjunNah/DeepDeblur_release |
Framework | torch |
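The multi-scale loss can be sketched directly: the network emits a sharp prediction at each scale, and an MSE term compares each one against a correspondingly downsampled ground truth. The three scales and equal weights below are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def multiscale_loss(preds, sharp):
    # preds: list of predictions, coarsest first, e.g. [H/4, H/2, H] images
    loss = 0.0
    for pred in preds:
        target = F.interpolate(sharp, size=pred.shape[-2:], mode='bilinear',
                               align_corners=False)  # match ground truth to scale
        loss = loss + F.mse_loss(pred, target) / len(preds)
    return loss

sharp = torch.rand(2, 3, 256, 256)
preds = [torch.rand(2, 3, s, s) for s in (64, 128, 256)]
print(multiscale_loss(preds, sharp))
```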
Discovering and Deciphering Relationships Across Disparate Data Modalities
Title | Discovering and Deciphering Relationships Across Disparate Data Modalities |
Authors | Joshua T. Vogelstein, Eric Bridgeford, Qing Wang, Carey E. Priebe, Mauro Maggioni, Cencheng Shen |
Abstract | Understanding the relationships between different properties of data, such as whether a connectome or genome has information about disease status, is becoming increasingly important in modern biological datasets. While existing approaches can test whether two properties are related, they often require infeasibly large sample sizes in real data scenarios, and do not provide any insight into how or why the procedure reached its decision. Our approach, “Multiscale Graph Correlation” (MGC), is a dependence test that juxtaposes previously disparate data science techniques, including k-nearest neighbors, kernel methods (such as support vector machines), and multiscale analysis (such as wavelets). Other methods typically require double or triple the number of samples to achieve the same statistical power as MGC in a benchmark suite of high-dimensional and nonlinear relationships spanning polynomial (linear, quadratic, cubic), trigonometric (sinusoidal, circular, ellipsoidal, spiral), geometric (square, diamond, W-shape), and other functions, with dimensionality ranging from 1 to 1000. Moreover, MGC uniquely provides a simple and elegant characterization of the potentially complex latent geometry underlying the relationship, providing insight while maintaining computational efficiency. In several real data applications, including brain imaging and cancer genetics, MGC is the only method that can both detect the presence of a dependency and provide specific guidance for the next experiment and/or analysis to conduct. |
Tasks | |
Published | 2016-09-16 |
URL | http://arxiv.org/abs/1609.05148v8 |
http://arxiv.org/pdf/1609.05148v8.pdf | |
PWC | https://paperswithcode.com/paper/discovering-and-deciphering-relationships |
Repo | https://github.com/neurodata/r-mgc |
Framework | none |
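A rough numpy sketch of the statistic's structure, heavily simplified from the published test: center the pairwise distance matrices as in distance correlation, evaluate the correlation restricted to k-nearest-neighbor neighborhoods at every scale (k, l), and report the best local scale. The permutation test and the paper's smoothing of the (k, l) surface are omitted.

```python
import numpy as np

def local_correlations(x, y):
    n = len(x)
    A = np.abs(x[:, None] - x[None, :])          # pairwise distances in x
    B = np.abs(y[:, None] - y[None, :])          # pairwise distances in y
    rx = A.argsort(1).argsort(1)                 # neighbor ranks per row
    ry = B.argsort(1).argsort(1)
    A = A - A.mean(axis=0, keepdims=True)        # simple column centering
    B = B - B.mean(axis=0, keepdims=True)
    best = -np.inf
    for k in range(2, n):                        # neighborhood scale in x
        for l in range(2, n):                    # neighborhood scale in y
            mask = (rx < k) & (ry < l)           # keep only local pairs
            a, b = A[mask], B[mask]
            denom = np.sqrt((a * a).sum() * (b * b).sum())
            if denom > 0:
                best = max(best, float((a * b).sum() / denom))
    return best

rng = np.random.default_rng(0)
x = rng.normal(size=50)
print(local_correlations(x, x ** 2))             # detects nonlinear dependence
```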
LSTM-based Encoder-Decoder for Multi-sensor Anomaly Detection
Title | LSTM-based Encoder-Decoder for Multi-sensor Anomaly Detection |
Authors | Pankaj Malhotra, Anusha Ramakrishnan, Gaurangi Anand, Lovekesh Vig, Puneet Agarwal, Gautam Shroff |
Abstract | Mechanical devices such as engines, vehicles, aircraft, etc., are typically instrumented with numerous sensors to capture the behavior and health of the machine. However, there are often external factors or variables which are not captured by sensors, leading to time-series which are inherently unpredictable. For instance, manual controls and/or unmonitored environmental conditions or load may lead to inherently unpredictable time-series. Detecting anomalies in such scenarios becomes challenging using standard approaches based on mathematical models that rely on stationarity, or prediction models that utilize prediction errors to detect anomalies. We propose a Long Short-Term Memory network based Encoder-Decoder scheme for Anomaly Detection (EncDec-AD) that learns to reconstruct ‘normal’ time-series behavior, and thereafter uses reconstruction error to detect anomalies. We experiment with three publicly available quasi-predictable time-series datasets: power demand, space shuttle, and ECG, and two real-world engine datasets with both predictable and unpredictable behavior. We show that EncDec-AD is robust and can detect anomalies from predictable, unpredictable, periodic, aperiodic, and quasi-periodic time-series. Further, we show that EncDec-AD is able to detect anomalies from short time-series (length as small as 30) as well as long time-series (length as large as 500). |
Tasks | Anomaly Detection, Outlier Detection, Time Series, Time Series Classification |
Published | 2016-07-01 |
URL | http://arxiv.org/abs/1607.00148v2 |
http://arxiv.org/pdf/1607.00148v2.pdf | |
PWC | https://paperswithcode.com/paper/lstm-based-encoder-decoder-for-multi-sensor |
Repo | https://github.com/freedombenLiu/RNN-Time-series-Anomaly-Detection |
Framework | pytorch |
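The scheme reduces to a small amount of code. Below is a minimal sketch: an LSTM encoder compresses a window into its final state, an LSTM decoder reconstructs the window (in reverse order, as in the paper), and the reconstruction error serves as the anomaly score. Two simplifications to note: the paper fits a Gaussian to the error vectors and scores with the Mahalanobis distance, while a plain mean absolute error is used here, and the decoder is fed zeros rather than its own previous output.

```python
import torch
import torch.nn as nn

class EncDecAD(nn.Module):
    def __init__(self, n_feat=1, hidden=32):
        super().__init__()
        self.enc = nn.LSTM(n_feat, hidden, batch_first=True)
        self.dec = nn.LSTM(n_feat, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_feat)

    def forward(self, x):                  # x: (batch, time, n_feat)
        _, state = self.enc(x)             # compress window into final state
        rev, _ = self.dec(torch.zeros_like(x), state)  # teacher-free sketch
        return self.out(rev).flip(1)       # reconstruct in reverse order

model = EncDecAD()
window = torch.sin(torch.linspace(0, 6.28, 30)).view(1, 30, 1)
recon = model(window)
anomaly_score = (window - recon).abs().mean().item()   # high => anomalous
```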
Lie-Access Neural Turing Machines
Title | Lie-Access Neural Turing Machines |
Authors | Greg Yang, Alexander M. Rush |
Abstract | External neural memory structures have recently become a popular tool for algorithmic deep learning (Graves et al. 2014, Weston et al. 2014). These models generally utilize differentiable versions of traditional discrete memory-access structures (random access, stacks, tapes) to provide the storage necessary for computational tasks. In this work, we argue that these neural memory systems lack specific structure important for relative indexing, and propose an alternative model, Lie-access memory, that is explicitly designed for the neural setting. In this paradigm, memory is accessed using a continuous head in a key-space manifold. The head is moved via Lie group actions, such as shifts or rotations, generated by a controller, and memory access is performed by linear smoothing in key space. We argue that Lie groups provide a natural generalization of discrete memory structures, such as Turing machines, as they provide inverse and identity operators while maintaining differentiability. To experiment with this approach, we implement a simplified Lie-access neural Turing machine (LANTM) with different Lie groups. We find that this approach is able to perform well on a range of algorithmic tasks. |
Tasks | |
Published | 2016-11-09 |
URL | http://arxiv.org/abs/1611.02854v2 |
http://arxiv.org/pdf/1611.02854v2.pdf | |
PWC | https://paperswithcode.com/paper/lie-access-neural-turing-machines |
Repo | https://github.com/harvardnlp/lie-access-memory |
Framework | torch |
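A numpy sketch of the read operation on the simplest manifold, the unit circle: memory slots sit at angular keys, the controller moves the head by a rotation (a Lie group action), and reading is smoothing over key distances, which keeps the access differentiable in the head position. Writing new keys and learning the actions, which the full LANTM does, are omitted, and the weighting scheme below is an illustrative choice.

```python
import numpy as np

keys = np.linspace(0, 2 * np.pi, 8, endpoint=False)  # slot positions on the circle
memory = np.random.randn(8, 16)                      # slot contents

def read(head, sharpness=8.0):
    d = np.abs(np.angle(np.exp(1j * (keys - head)))) # geodesic distance to each key
    w = np.exp(-sharpness * d)                       # smoothing weights
    return (w / w.sum()) @ memory                    # blended read vector

head = 0.3
head = head + np.pi / 4                              # group action: rotate the head
print(read(head).shape)                              # (16,)
```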
Pyramid Scene Parsing Network
Title | Pyramid Scene Parsing Network |
Authors | Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia |
Abstract | Scene parsing is challenging due to its unrestricted open vocabulary and diverse scenes. In this paper, we exploit the capability of global context information by different-region-based context aggregation through our pyramid pooling module, together with the proposed pyramid scene parsing network (PSPNet). Our global prior representation is effective at producing good-quality results on the scene parsing task, while PSPNet provides a superior framework for pixel-level prediction tasks. The proposed approach achieves state-of-the-art performance on various datasets: it came first in the ImageNet scene parsing challenge 2016, the PASCAL VOC 2012 benchmark, and the Cityscapes benchmark. A single PSPNet yields a new record mIoU of 85.4% on PASCAL VOC 2012 and accuracy of 80.2% on Cityscapes. |
Tasks | Lesion Segmentation, Real-Time Semantic Segmentation, Scene Parsing, Semantic Segmentation |
Published | 2016-12-04 |
URL | http://arxiv.org/abs/1612.01105v2 |
http://arxiv.org/pdf/1612.01105v2.pdf | |
PWC | https://paperswithcode.com/paper/pyramid-scene-parsing-network |
Repo | https://github.com/monsieurmona/LinkCollectionAutonomousDriving |
Framework | none |
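The pyramid pooling module is easy to sketch: average-pool the backbone features at several grid sizes (the paper uses bins of 1, 2, 3 and 6), reduce each pooled map with a 1x1 convolution, upsample back to the input resolution, and concatenate with the original features before the prediction head. The channel counts below are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    def __init__(self, in_ch=64, bins=(1, 2, 3, 6)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(b),               # pool to b x b grid
                          nn.Conv2d(in_ch, in_ch // len(bins), 1))
            for b in bins)

    def forward(self, x):
        h, w = x.shape[-2:]
        pooled = [F.interpolate(br(x), size=(h, w), mode='bilinear',
                                align_corners=False)
                  for br in self.branches]
        return torch.cat([x, *pooled], dim=1)   # global context + local features

feats = torch.randn(1, 64, 60, 60)
print(PyramidPooling()(feats).shape)  # torch.Size([1, 128, 60, 60])
```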
Enhancing the LexVec Distributed Word Representation Model Using Positional Contexts and External Memory
Title | Enhancing the LexVec Distributed Word Representation Model Using Positional Contexts and External Memory |
Authors | Alexandre Salle, Marco Idiart, Aline Villavicencio |
Abstract | In this paper we take a state-of-the-art model for distributed word representation that explicitly factorizes the positive pointwise mutual information (PPMI) matrix using window sampling and negative sampling and address two of its shortcomings. We improve syntactic performance by using positional contexts, and solve the need to store the PPMI matrix in memory by working on aggregate data in external memory. The effectiveness of both modifications is shown using word similarity and analogy tasks. |
Tasks | |
Published | 2016-06-03 |
URL | http://arxiv.org/abs/1606.01283v1 |
http://arxiv.org/pdf/1606.01283v1.pdf | |
PWC | https://paperswithcode.com/paper/enhancing-the-lexvec-distributed-word |
Repo | https://github.com/alexandres/lexvec |
Framework | none |
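The positional-context modification can be sketched by tagging each context word with its relative offset before counting, so that "dog_-1" and "dog_+2" become distinct columns of the PPMI matrix. The toy corpus below is illustrative, and the paper's external-memory aggregation of counts is not shown.

```python
import numpy as np
from collections import Counter

corpus = "the quick brown fox jumps over the lazy dog".split()
window = 2
pairs = Counter()
for i, w in enumerate(corpus):
    for off in range(-window, window + 1):
        if off != 0 and 0 <= i + off < len(corpus):
            pairs[(w, f"{corpus[i + off]}_{off:+d}")] += 1   # positional context

words = sorted({w for w, _ in pairs})
ctxs = sorted({c for _, c in pairs})
M = np.zeros((len(words), len(ctxs)))
for (w, c), n in pairs.items():
    M[words.index(w), ctxs.index(c)] = n

P = M / M.sum()
with np.errstate(divide='ignore'):                           # log(0) -> -inf is fine
    pmi = np.log(P / (P.sum(1, keepdims=True) @ P.sum(0, keepdims=True)))
ppmi = np.maximum(pmi, 0)                                    # matrix the model factorizes
```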
Sequence-to-sequence neural network models for transliteration
Title | Sequence-to-sequence neural network models for transliteration |
Authors | Mihaela Rosca, Thomas Breuel |
Abstract | Transliteration is a key component of machine translation systems and software internationalization. This paper demonstrates that neural sequence-to-sequence models obtain state of the art or close to state of the art results on existing datasets. In an effort to make machine transliteration accessible, we open source a new Arabic to English transliteration dataset and our trained models. |
Tasks | Machine Translation, Transliteration |
Published | 2016-10-29 |
URL | http://arxiv.org/abs/1610.09565v1 |
http://arxiv.org/pdf/1610.09565v1.pdf | |
PWC | https://paperswithcode.com/paper/sequence-to-sequence-neural-network-models |
Repo | https://github.com/google/transliteration |
Framework | none |
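At its core this is a character-level encoder-decoder. A minimal sketch under assumed toy alphabet sizes: an encoder LSTM reads the source-script characters, and a decoder LSTM initialized with the encoder's final state emits target-script character logits. Attention, beam search, and the paper's specific configurations are omitted.

```python
import torch
import torch.nn as nn

SRC, TGT, EMB, HID = 40, 30, 16, 64              # toy alphabet and layer sizes
src_emb, tgt_emb = nn.Embedding(SRC, EMB), nn.Embedding(TGT, EMB)
enc = nn.LSTM(EMB, HID, batch_first=True)
dec = nn.LSTM(EMB, HID, batch_first=True)
proj = nn.Linear(HID, TGT)

def transliterate_logits(src_ids, tgt_ids):
    # src_ids: (B, S) source characters; tgt_ids: (B, T) shifted targets
    _, state = enc(src_emb(src_ids))             # compress the source word
    out, _ = dec(tgt_emb(tgt_ids), state)        # condition on its final state
    return proj(out)                             # (B, T, TGT) character logits

logits = transliterate_logits(torch.randint(SRC, (2, 7)),
                              torch.randint(TGT, (2, 9)))
print(logits.shape)  # torch.Size([2, 9, 30])
```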
Representation Learning with Deconvolution for Multivariate Time Series Classification and Visualization
Title | Representation Learning with Deconvolution for Multivariate Time Series Classification and Visualization |
Authors | Zhiguang Wang, Wei Song, Lu Liu, Fan Zhang, Junxiao Xue, Yangdong Ye, Ming Fan, Mingliang Xu |
Abstract | We propose a new model based on deconvolutional networks and SAX discretization to learn representations for multivariate time series. Deconvolutional networks fully exploit the powerful expressiveness of deep neural networks in an unsupervised manner. We design a network structure specifically to capture cross-channel correlation with deconvolution, forcing the pooling operation to perform dimension reduction along each position in the individual channel. Discretization based on Symbolic Aggregate Approximation (SAX) is applied to the feature vectors to further extract a bag of features. We show how this representation and bag of features help with classification. A full comparison with sequence-distance-based approaches is provided to demonstrate the effectiveness of our approach on standard datasets. We further build the Markov matrix from the discretized representation produced by the deconvolution to visualize the time series as complex networks, which show more class-specific statistical properties and clearer structure across different labels. |
Tasks | Dimensionality Reduction, Representation Learning, Time Series, Time Series Classification |
Published | 2016-10-24 |
URL | http://arxiv.org/abs/1610.07258v3 |
http://arxiv.org/pdf/1610.07258v3.pdf | |
PWC | https://paperswithcode.com/paper/representation-learning-with-deconvolution |
Repo | https://github.com/cauchyturing/Deconv_SAX |
Framework | none |
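The SAX discretization step can be sketched on its own: z-normalize a series (here standing in for a learned feature vector), reduce it with piecewise aggregate approximation (PAA), and map segment means to symbols using Gaussian breakpoints. The deconvolutional feature learning that precedes this step is not shown, and the segment and alphabet sizes are illustrative.

```python
import numpy as np
from scipy.stats import norm

def sax(series, n_segments=8, alphabet=4):
    z = (series - series.mean()) / (series.std() + 1e-8)   # z-normalize
    paa = z.reshape(n_segments, -1).mean(axis=1)           # segment means (PAA)
    breakpoints = norm.ppf(np.linspace(0, 1, alphabet + 1)[1:-1])
    return np.digitize(paa, breakpoints)                   # symbol indices

x = np.sin(np.linspace(0, 4 * np.pi, 64)) + 0.1 * np.random.randn(64)
print(sax(x))   # a "word" over a 4-letter alphabet, one symbol per segment
```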