Paper Group AWR 31
Automated Treatment Planning in Radiation Therapy using Generative Adversarial Networks
Title | Automated Treatment Planning in Radiation Therapy using Generative Adversarial Networks |
Authors | Rafid Mahmood, Aaron Babier, Andrea McNiven, Adam Diamant, Timothy C. Y. Chan |
Abstract | Knowledge-based planning (KBP) is an automated approach to radiation therapy treatment planning that involves predicting desirable treatment plans, which are then corrected into deliverable ones. We propose a generative adversarial network (GAN) approach for predicting desirable 3D dose distributions that eschews the previous paradigms of site-specific feature engineering and predicting low-dimensional representations of the plan. Experiments on a dataset of oropharyngeal cancer patients show that our approach significantly outperforms previous methods on several clinical satisfaction criteria and similarity metrics. |
Tasks | Feature Engineering |
Published | 2018-07-17 |
URL | http://arxiv.org/abs/1807.06489v1 |
PDF | http://arxiv.org/pdf/1807.06489v1.pdf |
PWC | https://paperswithcode.com/paper/automated-treatment-planning-in-radiation |
Repo | https://github.com/rafidrm/gancer |
Framework | pytorch |
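As a concrete illustration of the setup, here is a minimal pix2pix-style training-loss sketch for conditional dose prediction. The generator `G`, discriminator `D`, and the λ = 100 L1 weight are illustrative assumptions, not the authors' implementation (see the repo above for that).

```python
import torch
import torch.nn as nn

bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

def d_step(D, G, ct, real_dose):
    # Discriminator scores (CT, dose) pairs: clinical pairs -> 1, generated -> 0.
    fake_dose = G(ct).detach()
    real_logit, fake_logit = D(ct, real_dose), D(ct, fake_dose)
    return bce(real_logit, torch.ones_like(real_logit)) + \
           bce(fake_logit, torch.zeros_like(fake_logit))

def g_step(D, G, ct, real_dose, lam=100.0):
    # Generator tries to fool D while staying close to the clinical dose (L1 term).
    fake_dose = G(ct)
    fake_logit = D(ct, fake_dose)
    return bce(fake_logit, torch.ones_like(fake_logit)) + lam * l1(fake_dose, real_dose)
```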
Stochastic Chebyshev Gradient Descent for Spectral Optimization
Title | Stochastic Chebyshev Gradient Descent for Spectral Optimization |
Authors | Insu Han, Haim Avron, Jinwoo Shin |
Abstract | A large class of machine learning techniques requires the solution of optimization problems involving spectral functions of parametric matrices, e.g., the log-determinant and the nuclear norm. Unfortunately, computing the gradient of a spectral function is generally of cubic complexity, so gradient descent methods are rather expensive for optimizing objectives involving the spectral function. Thus, one naturally turns to stochastic gradient methods in the hope that they will provide a way to reduce or altogether avoid the computation of full gradients. However, here a new challenge appears: there is no straightforward way to compute unbiased stochastic gradients for spectral functions. In this paper, we develop unbiased stochastic gradients for spectral-sums, an important subclass of spectral functions. Our unbiased stochastic gradients are based on combining randomized trace estimators with stochastic truncation of the Chebyshev expansions. A careful design of the truncation distribution allows us to offer distributions that are variance-optimal, which is crucial for fast and stable convergence of stochastic gradient methods. We further leverage our proposed stochastic gradients to devise stochastic methods for objective functions involving spectral-sums, and rigorously analyze their convergence rate. The utility of our methods is demonstrated in numerical experiments. |
Tasks | |
Published | 2018-02-18 |
URL | http://arxiv.org/abs/1802.06355v3 |
PDF | http://arxiv.org/pdf/1802.06355v3.pdf |
PWC | https://paperswithcode.com/paper/stochastic-chebyshev-gradient-descent-for |
Repo | https://github.com/EiffL/SpectralFlow |
Framework | tf |
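To make the core estimator concrete, here is a minimal NumPy sketch of an unbiased spectral-sum estimate combining Hutchinson probes with a randomly truncated Chebyshev expansion. The plain geometric truncation distribution is an assumption for illustration; the paper's contribution is designing variance-optimal truncation distributions.

```python
import numpy as np

def chebyshev_coeffs(f, a, b, deg):
    # Interpolation coefficients of f on [a, b] at Chebyshev nodes.
    k = np.arange(deg + 1)
    x = np.cos(np.pi * (k + 0.5) / (deg + 1))            # nodes on [-1, 1]
    y = f(0.5 * (b - a) * x + 0.5 * (b + a))
    c = np.array([2.0 / (deg + 1) * np.sum(y * np.cos(j * np.pi * (k + 0.5) / (deg + 1)))
                  for j in range(deg + 1)])
    c[0] /= 2.0
    return c

def spectral_sum_estimate(A, f, a, b, max_deg=50, p=0.1, n_probe=16, seed=0):
    # Estimate of tr(f(A)) for symmetric A with eigenvalues in [a, b]; unbiased
    # with respect to the degree-max_deg Chebyshev interpolant of f.
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    c = chebyshev_coeffs(f, a, b, max_deg)
    alpha, beta = 2.0 / (b - a), -(b + a) / (b - a)      # maps spectrum to [-1, 1]
    surv = (1.0 - p) ** np.maximum(np.arange(max_deg + 1) - 1, 0)  # P(N >= j)
    est = 0.0
    for _ in range(n_probe):
        v = rng.choice([-1.0, 1.0], size=n)              # Hutchinson probe
        N = min(rng.geometric(p), max_deg)               # random truncation degree
        t_prev = v.copy()                                # T_0(A~) v
        t_curr = alpha * (A @ v) + beta * v              # T_1(A~) v
        acc = c[0] * (v @ t_prev)
        for j in range(1, N + 1):                        # reweighted partial sum
            acc += c[j] / surv[j] * (v @ t_curr)
            t_prev, t_curr = t_curr, 2.0 * (alpha * (A @ t_curr) + beta * t_curr) - t_prev
        est += acc
    return est / n_probe

# e.g. log-determinant of a well-conditioned SPD matrix:
# logdet ~= spectral_sum_estimate(A, np.log, a=lmin, b=lmax)
```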
Trust Region Based Adversarial Attack on Neural Networks
Title | Trust Region Based Adversarial Attack on Neural Networks |
Authors | Zhewei Yao, Amir Gholami, Peng Xu, Kurt Keutzer, Michael Mahoney |
Abstract | Deep neural networks are quite vulnerable to adversarial perturbations. Current state-of-the-art adversarial attack methods typically require very time-consuming hyper-parameter tuning, or require many iterations to solve an optimization-based adversarial attack. To address this problem, we present a new family of trust region based adversarial attacks, with the goal of computing adversarial perturbations efficiently. We propose several attacks based on variants of the trust region optimization method. We test the proposed methods on the CIFAR-10 and ImageNet datasets using several models, including AlexNet, ResNet-50, VGG-16, and DenseNet-121. Our methods achieve comparable results with the Carlini-Wagner (CW) attack, but with a significant speed-up of up to $37\times$ for the VGG-16 model on a Titan Xp GPU. For the case of ResNet-50 on ImageNet, we can bring its classification accuracy down to less than 0.1% with at most $1.5\%$ relative $L_\infty$ (or $L_2$) perturbation, requiring only $1.02$ seconds as compared to $27.04$ seconds for the CW attack. We have open sourced our method, which can be accessed at [1]. |
Tasks | Adversarial Attack |
Published | 2018-12-16 |
URL | http://arxiv.org/abs/1812.06371v1 |
PDF | http://arxiv.org/pdf/1812.06371v1.pdf |
PWC | https://paperswithcode.com/paper/trust-region-based-adversarial-attack-on |
Repo | https://github.com/amirgholami/trattack |
Framework | pytorch |
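A minimal sketch of a generic trust-region attack loop under the $L_\infty$ norm, not the authors' exact variants; `loss_fn` and `grad_fn` are assumed callbacks onto the attacked model.

```python
import numpy as np

def tr_attack(x, loss_fn, grad_fn, radius=0.01, iters=20,
              good=0.9, bad=0.25, grow=2.0, shrink=0.5):
    # loss_fn(x): attack objective to minimize (e.g. margin of the true class
    # over the runner-up); grad_fn(x): its gradient.
    x_adv = x.copy()
    for _ in range(iters):
        g = grad_fn(x_adv)
        step = -radius * np.sign(g)                  # L_inf trust-region solution
        predicted = radius * np.abs(g).sum()         # decrease under the linear model
        actual = loss_fn(x_adv) - loss_fn(x_adv + step)
        rho = actual / (predicted + 1e-12)           # model/reality agreement
        if rho > good:
            radius *= grow                           # trustworthy: enlarge region
        elif rho < bad:
            radius *= shrink                         # poor fit: shrink region
        if actual > 0:                               # accept only improving steps
            x_adv = x_adv + step
    return x_adv
```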
Quantifying the amount of visual information used by neural caption generators
Title | Quantifying the amount of visual information used by neural caption generators |
Authors | Marc Tanti, Albert Gatt, Kenneth P. Camilleri |
Abstract | This paper addresses the sensitivity of neural image caption generators to their visual input. A sensitivity analysis and omission analysis based on image foils is reported, showing that the extent to which image captioning architectures retain and are sensitive to visual information varies depending on the type of word being generated and the position in the caption as a whole. We motivate this work in the context of broader goals in the field to achieve more explainability in AI. |
Tasks | Image Captioning |
Published | 2018-10-12 |
URL | http://arxiv.org/abs/1810.05475v1 |
PDF | http://arxiv.org/pdf/1810.05475v1.pdf |
PWC | https://paperswithcode.com/paper/quantifying-the-amount-of-visual-information |
Repo | https://github.com/mtanti/quantifing-visual-information |
Framework | tf |
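A hedged sketch of an omission-style sensitivity score consistent with the analysis described above; the probabilities and the exact scoring formula are illustrative assumptions.

```python
import numpy as np

def omission_scores(p_with_image, p_without_image):
    # p_with_image[t]:    model probability of caption word t given the image
    # p_without_image[t]: the same probability with the image omitted or foiled
    p_w, p_wo = np.asarray(p_with_image), np.asarray(p_without_image)
    return 1.0 - p_wo / p_w        # near 1: the word leans heavily on the image

print(omission_scores([0.30, 0.10, 0.25], [0.28, 0.09, 0.05]))
# -> [0.067 0.1 0.8]; the third word (e.g. a visual noun) depends most on the image
```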
Predicting Future Instance Segmentation by Forecasting Convolutional Features
Title | Predicting Future Instance Segmentation by Forecasting Convolutional Features |
Authors | Pauline Luc, Camille Couprie, Yann LeCun, Jakob Verbeek |
Abstract | Anticipating future events is an important prerequisite towards intelligent behavior. Video forecasting has been studied as a proxy task towards this goal. Recent work has shown that to predict semantic segmentation of future frames, forecasting at the semantic level is more effective than forecasting RGB frames and then segmenting these. In this paper we consider the more challenging problem of future instance segmentation, which additionally segments out individual objects. To deal with a varying number of output labels per image, we develop a predictive model in the space of fixed-size convolutional features of the Mask R-CNN instance segmentation model. We apply the "detection head" of Mask R-CNN on the predicted features to produce the instance segmentation of future frames. Experiments show that this approach significantly improves over strong baselines based on optical flow and repurposed instance segmentation architectures. |
Tasks | Instance Segmentation, Optical Flow Estimation, Semantic Segmentation, Video Prediction |
Published | 2018-03-30 |
URL | http://arxiv.org/abs/1803.11496v2 |
PDF | http://arxiv.org/pdf/1803.11496v2.pdf |
PWC | https://paperswithcode.com/paper/predicting-future-instance-segmentation-by |
Repo | https://github.com/facebookresearch/instpred |
Framework | pytorch |
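A hedged sketch of the feature-forecasting idea: a small convolutional model maps stacked past features to future ones. The 256-channel FPN width matches common Mask R-CNN configurations, but the depth and widths here are illustrative, not the paper's F2F architecture.

```python
import torch
import torch.nn as nn

class FeatureForecaster(nn.Module):
    # Maps the FPN features of the past `history` frames to the features of a
    # future frame; one such model per pyramid level, and the frozen Mask R-CNN
    # detection head then runs on the predicted features.
    def __init__(self, channels=256, history=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(history * channels, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, channels, 3, padding=1),
        )

    def forward(self, past):                 # past: (B, T, C, H, W)
        b, t, c, h, w = past.shape
        return self.net(past.reshape(b, t * c, h, w))

pred = FeatureForecaster()(torch.randn(2, 4, 256, 64, 64))   # -> (2, 256, 64, 64)
```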
Improving Coverage and Runtime Complexity for Exact Inference in Non-Projective Transition-Based Dependency Parsers
Title | Improving Coverage and Runtime Complexity for Exact Inference in Non-Projective Transition-Based Dependency Parsers |
Authors | Tianze Shi, Carlos Gómez-Rodríguez, Lillian Lee |
Abstract | We generalize Cohen, Gómez-Rodríguez, and Satta's (2011) parser to a family of non-projective transition-based dependency parsers allowing polynomial-time exact inference. This includes novel parsers with better coverage than Cohen et al. (2011), and even a variant that reduces time complexity to $O(n^6)$, improving over the known bounds in exact inference for non-projective transition-based parsing. We hope that this piece of theoretical work inspires the design of novel transition systems with better coverage and better run-time guarantees. Code available at https://github.com/tzshi/nonproj-dp-variants-naacl2018 |
Tasks | |
Published | 2018-04-27 |
URL | http://arxiv.org/abs/1804.10615v2 |
PDF | http://arxiv.org/pdf/1804.10615v2.pdf |
PWC | https://paperswithcode.com/paper/improving-coverage-and-runtime-complexity-for |
Repo | https://github.com/tzshi/nonproj-dp-variants-naacl2018 |
Framework | none |
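For orientation, a minimal transition-based parsing loop in the plain (projective) arc-standard system; the paper's non-projective transition systems and exact-inference dynamic programs are substantially richer.

```python
def parse(words, oracle):
    # Stack/buffer automaton; `oracle` picks the next action (in practice a
    # trained classifier). It must not return "shift" when the buffer is empty.
    stack, buf, arcs = [], list(range(len(words))), []
    while buf or len(stack) > 1:
        action = oracle(stack, buf)
        if action == "shift":
            stack.append(buf.pop(0))
        elif action == "left-arc":         # head = top of stack
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        else:                              # "right-arc": head = second element
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs                            # list of (head, dependent) pairs
```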
Deep Functional Dictionaries: Learning Consistent Semantic Structures on 3D Models from Functions
Title | Deep Functional Dictionaries: Learning Consistent Semantic Structures on 3D Models from Functions |
Authors | Minhyuk Sung, Hao Su, Ronald Yu, Leonidas Guibas |
Abstract | Various 3D semantic attributes such as segmentation masks, geometric features, keypoints, and materials can be encoded as per-point probe functions on 3D geometries. Given a collection of related 3D shapes, we consider how to jointly analyze such probe functions over different shapes, and how to discover common latent structures using a neural network — even in the absence of any correspondence information. Our network is trained on point cloud representations of shape geometry and associated semantic functions on that point cloud. These functions express a shared semantic understanding of the shapes but are not coordinated in any way. For example, in a segmentation task, the functions can be indicator functions of arbitrary sets of shape parts, with the particular combination involved not known to the network. Our network is able to produce a small dictionary of basis functions for each shape, a dictionary whose span includes the semantic functions provided for that shape. Even though our shapes have independent discretizations and no functional correspondences are provided, the network is able to generate latent bases, in a consistent order, that reflect the shared semantic structure among the shapes. We demonstrate the effectiveness of our technique in various segmentation and keypoint selection applications. |
Tasks | |
Published | 2018-05-25 |
URL | http://arxiv.org/abs/1805.09957v3 |
PDF | http://arxiv.org/pdf/1805.09957v3.pdf |
PWC | https://paperswithcode.com/paper/deep-functional-dictionaries-learning |
Repo | https://github.com/mhsung/deep-functional-dictionaries |
Framework | tf |
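A minimal sketch of the span-inclusion idea: the training signal is the residual of projecting a probe function onto the predicted dictionary. The paper's full objective adds structured regularization; this shows only the least-squares core.

```python
import numpy as np

def span_loss(A, f):
    # How well the probe function f lies in the span of the predicted basis A:
    # solve the least-squares projection and measure the residual.
    x, *_ = np.linalg.lstsq(A, f, rcond=None)
    return float(np.sum((A @ x - f) ** 2))

# A: (n_points, k) dictionary predicted for one shape; f: (n_points,) probe function
rng = np.random.default_rng(0)
A, f = rng.standard_normal((1000, 10)), rng.standard_normal(1000)
print(span_loss(A, f))   # drives training toward dictionaries whose span covers f
```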
Open Source Presentation Attack Detection Baseline for Iris Recognition
Title | Open Source Presentation Attack Detection Baseline for Iris Recognition |
Authors | Joseph McGrath, Kevin W. Bowyer, Adam Czajka |
Abstract | This paper proposes the first open source presentation attack detection (PAD) solution known to us for distinguishing between authentic iris images (possibly with clear contact lenses) and irises with textured contact lenses. This software can serve as a baseline in various PAD evaluations, and also as an open-source platform with an up-to-date reference method for iris PAD. The software is written in C++ and Python and uses only open source resources, such as OpenCV. The method does not incorporate iris image segmentation, which may be problematic for unknown fake samples; instead, it makes a best guess to localize the rough position of the iris. The PAD-related features are extracted with Binary Statistical Image Features (BSIF) and classified by an ensemble of classifiers incorporating a support vector machine, a random forest, and a multilayer perceptron. The models attached to the current software have been trained with the NDCLD'15 database and evaluated on the independent datasets included in the LivDet-Iris 2017 competition. The software implements the functionality of retraining the classifiers with any database of authentic and attack images. The accuracy of the current version exceeds 99% when tested on subject-disjoint subsets of NDCLD'15, and oscillates around 85% when tested on the LivDet-Iris 2017 benchmarks, which is on par with the results obtained by the LivDet-Iris 2017 winner. |
Tasks | Iris Recognition, Semantic Segmentation |
Published | 2018-09-26 |
URL | https://arxiv.org/abs/1809.10172v2 |
PDF | https://arxiv.org/pdf/1809.10172v2.pdf |
PWC | https://paperswithcode.com/paper/open-source-presentation-attack-detection |
Repo | https://github.com/CVRL/OpenSourceIrisPAD |
Framework | none |
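A hedged sketch of the BSIF descriptor pipeline mentioned above, with random zero-mean filters standing in for the ICA-learned BSIF filter bank.

```python
import numpy as np
from scipy.signal import convolve2d

def bsif_histogram(img, filters):
    # Each filter response is thresholded at zero to give one bit per pixel;
    # the bits concatenate into an integer code, and the normalized histogram
    # of codes is the texture descriptor fed to the classifier ensemble.
    code = np.zeros(img.shape, dtype=np.int64)
    for i, f in enumerate(filters):
        resp = convolve2d(img, f, mode="same", boundary="symm")
        code |= (resp > 0).astype(np.int64) << i
    hist, _ = np.histogram(code, bins=2 ** len(filters), range=(0, 2 ** len(filters)))
    return hist / hist.sum()

# stand-in: 8 random zero-mean 7x7 filters (real BSIF uses ICA-learned filters)
rng = np.random.default_rng(0)
filters = [f - f.mean() for f in rng.standard_normal((8, 7, 7))]
descriptor = bsif_histogram(rng.random((120, 160)), filters)   # 256-bin histogram
```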
Domain-Specific Human-Inspired Binarized Statistical Image Features for Iris Recognition
Title | Domain-Specific Human-Inspired Binarized Statistical Image Features for Iris Recognition |
Authors | Adam Czajka, Daniel Moreira, Kevin W. Bowyer, Patrick J. Flynn |
Abstract | Binarized statistical image features (BSIF) have been successfully used for texture analysis in many computer vision tasks, including iris recognition and biometric presentation attack detection. Notably, all applications of BSIF in iris recognition to date have used the original BSIF filters, which were trained on image patches extracted from natural images. This paper tests whether domain-specific BSIF filters can give better performance than the default ones. A second question concerns the selection of image patches used to train BSIF: can patches derived from eye-tracking experiments, in which humans perform an iris recognition task, give better performance than random patches? Our results show that (1) domain-specific BSIF features can outperform the default BSIF features, and (2) selecting image patches in a task-specific manner guided by human performance can outperform selecting random patches. These results are important because BSIF is often regarded as a generic texture tool that does not need any domain adaptation, and human-task-guided selection of training patches has never (to our knowledge) been done. This paper follows reproducible-research requirements: the new iris-domain-specific BSIF filters, the patches used in filter training, the database used in testing, and the source code of the designed iris recognition method are made available along with this paper to facilitate applications of this concept. |
Tasks | Domain Adaptation, Eye Tracking, Iris Recognition, Texture Classification |
Published | 2018-07-13 |
URL | http://arxiv.org/abs/1807.05248v2 |
PDF | http://arxiv.org/pdf/1807.05248v2.pdf |
PWC | https://paperswithcode.com/paper/domain-specific-human-inspired-binarized |
Repo | https://github.com/CVRL/domain-specific-BSIF-for-iris-recognition |
Framework | none |
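A hedged sketch of the filter-learning step: ICA over (here random) image patches, whose unmixing vectors become the domain-specific filter bank. Assumes scikit-learn >= 1.1 for the `whiten` argument; the paper's actual contribution, human-guided patch selection, is not shown.

```python
import numpy as np
from sklearn.decomposition import FastICA

def learn_bsif_filters(patches, n_filters=8):
    # patches: (n, side, side) grayscale iris patches. ICA's unmixing vectors,
    # reshaped back to 2-D, serve as the new domain-specific filter bank.
    flat = patches.reshape(len(patches), -1)
    flat = flat - flat.mean(axis=0)                    # remove the mean patch
    ica = FastICA(n_components=n_filters, whiten="unit-variance", random_state=0)
    ica.fit(flat)
    side = patches.shape[1]
    return ica.components_.reshape(n_filters, side, side)

rng = np.random.default_rng(0)
bank = learn_bsif_filters(rng.standard_normal((5000, 7, 7)))   # 8 filters of 7x7
```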
Meta-Learning Update Rules for Unsupervised Representation Learning
Title | Meta-Learning Update Rules for Unsupervised Representation Learning |
Authors | Luke Metz, Niru Maheswaranathan, Brian Cheung, Jascha Sohl-Dickstein |
Abstract | A major goal of unsupervised learning is to discover data representations that are useful for subsequent tasks, without access to supervised labels during training. Typically, this involves minimizing a surrogate objective, such as the negative log likelihood of a generative model, with the hope that representations useful for subsequent tasks will arise as a side effect. In this work, we propose instead to directly target later desired tasks by meta-learning an unsupervised learning rule which leads to representations useful for those tasks. Specifically, we target semi-supervised classification performance, and we meta-learn an algorithm – an unsupervised weight update rule – that produces representations useful for this task. Additionally, we constrain our unsupervised update rule to be a biologically-motivated, neuron-local function, which enables it to generalize to different neural network architectures, datasets, and data modalities. We show that the meta-learned update rule produces useful features and sometimes outperforms existing unsupervised learning techniques. We further show that the meta-learned unsupervised update rule generalizes to train networks with different widths, depths, and nonlinearities. It also generalizes to train on data with randomly permuted input dimensions and even generalizes from image datasets to a text task. |
Tasks | Meta-Learning, Representation Learning, Unsupervised Representation Learning |
Published | 2018-03-31 |
URL | http://arxiv.org/abs/1804.00222v3 |
PDF | http://arxiv.org/pdf/1804.00222v3.pdf |
PWC | https://paperswithcode.com/paper/meta-learning-update-rules-for-unsupervised |
Repo | https://github.com/tensorflow/models/tree/master/research/learning_unsupervised_learning |
Framework | tf |
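A heavily simplified sketch of what "neuron-local" means here: the weight update depends only on quantities available at the synapse, combined by shared meta-parameters. The real meta-learned rule is a small MLP over such quantities; the Hebbian-flavored form below is an illustrative assumption.

```python
import numpy as np

def local_update(w, pre, post, err, theta):
    # w: (n_out, n_in) weights; pre: (n_in,) inputs; post: (n_out,) activations;
    # err: (n_out,) top-down signal; theta: meta-learned coefficients shared
    # across layers, which is what lets the rule transfer to other widths/depths.
    a, b, c = theta
    return w + a * np.outer(post, pre) + b * np.outer(err, pre) + c * w

# a Hebbian-flavored stand-in: in the paper, theta parameterizes a small MLP
# over such local quantities, trained on downstream task performance.
```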
Wizard of Wikipedia: Knowledge-Powered Conversational Agents
Title | Wizard of Wikipedia: Knowledge-Powered Conversational Agents |
Authors | Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, Jason Weston |
Abstract | In open-domain dialogue, intelligent agents should exhibit the use of knowledge; however, there are few convincing demonstrations of this to date. The most popular sequence-to-sequence models typically “generate and hope” generic utterances that can be memorized in the weights of the model when mapping from input utterance(s) to output, rather than employing recalled knowledge as context. Use of knowledge has so far proved difficult, in part because of the lack of a supervised learning benchmark task which exhibits knowledgeable open dialogue with clear grounding. To that end we collect and release a large dataset with conversations directly grounded with knowledge retrieved from Wikipedia. We then design architectures capable of retrieving knowledge, reading and conditioning on it, and finally generating natural responses. Our best performing dialogue models are able to conduct knowledgeable discussions on open-domain topics as evaluated by automatic metrics and human evaluations, while our new benchmark allows for measuring further improvements in this important research direction. |
Tasks | |
Published | 2018-11-03 |
URL | http://arxiv.org/abs/1811.01241v2 |
PDF | http://arxiv.org/pdf/1811.01241v2.pdf |
PWC | https://paperswithcode.com/paper/wizard-of-wikipedia-knowledge-powered |
Repo | https://github.com/NinaTian98369/Papers |
Framework | none |
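A minimal sketch of the knowledge-attention step: score retrieved Wikipedia sentences against the dialogue context and form a weighted summary to condition generation on. Encoders and dimensions are assumptions.

```python
import torch

def attend_knowledge(dialogue_vec, knowledge_vecs):
    # Score each retrieved Wikipedia sentence against the encoded dialogue
    # context, then form a context-weighted knowledge summary to condition on.
    scores = knowledge_vecs @ dialogue_vec           # (K,)
    weights = torch.softmax(scores, dim=0)
    return weights @ knowledge_vecs                  # (d,)

summary = attend_knowledge(torch.randn(256), torch.randn(12, 256))
```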
Action Completion: A Temporal Model for Moment Detection
Title | Action Completion: A Temporal Model for Moment Detection |
Authors | Farnoosh Heidarivincheh, Majid Mirmehdi, Dima Damen |
Abstract | We introduce completion moment detection for actions: the problem of locating the moment of completion, when the action's goal is confidently considered achieved. The paper proposes a joint classification-regression recurrent model that predicts completion from a given frame, and then integrates frame-level contributions to detect the sequence-level completion moment. We introduce a recurrent voting node that predicts, by either classification or regression, each frame's relative position with respect to the completion moment. The method is also capable of detecting incompletion; for example, it can detect a missed ball-catch, as well as the moment at which the ball is safely caught. We test the method on 16 actions from three public datasets, covering sports as well as daily actions. Results show that when combining contributions from frames prior to the completion moment as well as frames post completion, the completion moment is detected within one second in 89% of all tested sequences. |
Tasks | |
Published | 2018-05-17 |
URL | http://arxiv.org/abs/1805.06749v2 |
PDF | http://arxiv.org/pdf/1805.06749v2.pdf |
PWC | https://paperswithcode.com/paper/action-completion-a-temporal-model-for-moment |
Repo | https://github.com/FarnooshHeidari/CompletionDetection |
Framework | none |
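A minimal sketch of the frame-level voting idea: each frame predicts the completion moment's position relative to itself, and confidence-weighted votes are accumulated across the sequence.

```python
import numpy as np

def completion_moment(offsets, confidences):
    # Each frame t votes for frame t + predicted offset (its guess at the
    # completion moment); votes are confidence-weighted and accumulated,
    # and the highest-scoring frame wins.
    n = len(offsets)
    votes = np.zeros(n)
    for t, (off, c) in enumerate(zip(offsets, confidences)):
        target = int(round(t + off))
        if 0 <= target < n:
            votes[target] += c
    return int(np.argmax(votes))

# frames 0-4 all point roughly at frame 3:
print(completion_moment([3, 2, 1, 0, -1], [0.5, 0.6, 0.9, 1.0, 0.8]))  # -> 3
```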
Clebsch-Gordan Nets: a Fully Fourier Space Spherical Convolutional Neural Network
Title | Clebsch-Gordan Nets: a Fully Fourier Space Spherical Convolutional Neural Network |
Authors | Risi Kondor, Zhen Lin, Shubhendu Trivedi |
Abstract | Recent work by Cohen et al. has achieved state-of-the-art results for learning spherical images in a rotation-invariant way by using ideas from group representation theory and noncommutative harmonic analysis. In this paper we propose a generalization of this work that generally exhibits improved performance, but from an implementation point of view is actually simpler. An unusual feature of the proposed architecture is that it uses the Clebsch–Gordan transform as its only source of nonlinearity, thus avoiding repeated forward and backward Fourier transforms. The underlying ideas of the paper generalize to constructing neural networks that are invariant to the action of other compact groups. |
Tasks | |
Published | 2018-06-24 |
URL | http://arxiv.org/abs/1806.09231v2 |
PDF | http://arxiv.org/pdf/1806.09231v2.pdf |
PWC | https://paperswithcode.com/paper/clebsch-gordan-nets-a-fully-fourier-space |
Repo | https://github.com/zlin7/CGNet |
Framework | pytorch |
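The simplest instance of a Clebsch-Gordan nonlinearity output: coupling a degree-l fragment with its conjugate down to degree 0 yields its squared norm, a rotation-invariant scalar. A real CG net also keeps the higher-degree components of the tensor product; this sketch shows only the l = 0 piece.

```python
import numpy as np

def l0_invariants(fragments):
    # Coupling a degree-l fragment with its conjugate down to degree 0 gives,
    # up to normalization, the squared norm of its 2l+1 components -- a
    # rotation-invariant scalar, the simplest Clebsch-Gordan output.
    return {l: float(np.vdot(f, f).real) for l, f in fragments.items()}

# fragments: {l: complex array of shape (2l+1,)} of spherical-harmonic coefficients
frags = {0: np.array([1 + 0j]), 1: np.array([0.2j, 1.0, -0.2j])}
print(l0_invariants(frags))
```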
Anomaly Detection using One-Class Neural Networks
Title | Anomaly Detection using One-Class Neural Networks |
Authors | Raghavendra Chalapathy, Aditya Krishna Menon, Sanjay Chawla |
Abstract | We propose a one-class neural network (OC-NN) model to detect anomalies in complex data sets. OC-NN combines the ability of deep networks to extract a progressively rich representation of data with the one-class objective of creating a tight envelope around normal data. The OC-NN approach breaks new ground for the following crucial reason: data representation in the hidden layer is driven by the OC-NN objective and is thus customized for anomaly detection. This is a departure from other approaches which use a hybrid approach of learning deep features using an autoencoder and then feeding the features into a separate anomaly detection method like one-class SVM (OC-SVM). The hybrid OC-SVM approach is sub-optimal because it is unable to influence representational learning in the hidden layers. A comprehensive set of experiments demonstrates that on complex data sets (like CIFAR and GTSRB), OC-NN performs on par with state-of-the-art methods and outperforms conventional shallow methods in some scenarios. |
Tasks | Anomaly Detection |
Published | 2018-02-18 |
URL | http://arxiv.org/abs/1802.06360v2 |
PDF | http://arxiv.org/pdf/1802.06360v2.pdf |
PWC | https://paperswithcode.com/paper/anomaly-detection-using-one-class-neural |
Repo | https://github.com/raghavchalapathy/oc-nn |
Framework | tf |
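A minimal sketch of the OC-NN idea: a small network produces a scalar score, and the loss is a ν-weighted hinge below a threshold r, minus r itself. Architecture, sizes, and the r-update scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn

class OCNN(nn.Module):
    # One hidden layer feeding a linear score; weight decay on both layers
    # plays the role of the ||w||^2 + ||V||^2 terms in the one-class objective.
    def __init__(self, in_dim, hidden=32):
        super().__init__()
        self.hidden = nn.Linear(in_dim, hidden)
        self.score = nn.Linear(hidden, 1, bias=False)

    def forward(self, x):
        return self.score(torch.sigmoid(self.hidden(x))).squeeze(-1)

def ocnn_loss(scores, r, nu=0.1):
    # Hinge penalty when a (presumed normal) point scores below r, minus r
    # itself, so that roughly a nu-fraction of points may fall below it.
    return torch.mean(torch.relu(r - scores)) / nu - r

# one common scheme: after each epoch, reset r to the nu-th quantile of the scores
```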
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation
Title | Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation |
Authors | Yi Luo, Nima Mesgarani |
Abstract | Single-channel, speaker-independent speech separation methods have recently seen great progress. However, the accuracy, latency, and computational cost of such methods remain insufficient. The majority of the previous methods have formulated the separation problem through the time-frequency representation of the mixed signal, which has several drawbacks, including the decoupling of the phase and magnitude of the signal, the suboptimality of time-frequency representation for speech separation, and the long latency in calculating the spectrograms. To address these shortcomings, we propose a fully-convolutional time-domain audio separation network (Conv-TasNet), a deep learning framework for end-to-end time-domain speech separation. Conv-TasNet uses a linear encoder to generate a representation of the speech waveform optimized for separating individual speakers. Speaker separation is achieved by applying a set of weighting functions (masks) to the encoder output. The modified encoder representations are then inverted back to the waveforms using a linear decoder. The masks are found using a temporal convolutional network (TCN) consisting of stacked 1-D dilated convolutional blocks, which allows the network to model the long-term dependencies of the speech signal while maintaining a small model size. The proposed Conv-TasNet system significantly outperforms previous time-frequency masking methods in separating two- and three-speaker mixtures. Additionally, Conv-TasNet surpasses several ideal time-frequency magnitude masks in two-speaker speech separation as evaluated by both objective distortion measures and subjective quality assessment by human listeners. Finally, Conv-TasNet has a significantly smaller model size and a shorter minimum latency, making it a suitable solution for both offline and real-time speech separation applications. |
Tasks | Speaker Separation, Speech Separation |
Published | 2018-09-20 |
URL | https://arxiv.org/abs/1809.07454v3 |
PDF | https://arxiv.org/pdf/1809.07454v3.pdf |
PWC | https://paperswithcode.com/paper/tasnet-surpassing-ideal-time-frequency |
Repo | https://github.com/kaituoxu/Conv-TasNet |
Framework | pytorch |
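A drastically simplified sketch of the Conv-TasNet pipeline: a learned linear encoder, per-speaker masks from a separator (here two convolutions standing in for the stacked dilated TCN), and a transposed-convolution decoder. All sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyTasNet(nn.Module):
    def __init__(self, n_filters=64, kernel=16, n_spk=2):
        super().__init__()
        stride = kernel // 2
        self.encoder = nn.Conv1d(1, n_filters, kernel, stride=stride, bias=False)
        self.separator = nn.Sequential(      # stand-in for the deep dilated TCN
            nn.Conv1d(n_filters, n_filters, 3, padding=2, dilation=2), nn.PReLU(),
            nn.Conv1d(n_filters, n_spk * n_filters, 1),
        )
        self.decoder = nn.ConvTranspose1d(n_filters, 1, kernel, stride=stride, bias=False)
        self.n_spk, self.n_filters = n_spk, n_filters

    def forward(self, mix):                  # mix: (B, 1, T)
        w = torch.relu(self.encoder(mix))    # encoder representation (B, F, L)
        masks = torch.sigmoid(self.separator(w))
        masks = masks.view(-1, self.n_spk, self.n_filters, w.shape[-1])
        # mask the encoder output per speaker, then decode back to waveforms
        return torch.stack([self.decoder(w * masks[:, s])
                            for s in range(self.n_spk)], dim=1)

est = TinyTasNet()(torch.randn(2, 1, 16000))   # -> (2, 2, 1, 16000)
```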