Paper Group AWR 31
Automated Treatment Planning in Radiation Therapy using Generative Adversarial Networks
Title | Automated Treatment Planning in Radiation Therapy using Generative Adversarial Networks |
Authors | Rafid Mahmood, Aaron Babier, Andrea McNiven, Adam Diamant, Timothy C. Y. Chan |
Abstract | Knowledge-based planning (KBP) is an automated approach to radiation therapy treatment planning that involves predicting desirable treatment plans, which are then corrected into deliverable ones. We propose a generative adversarial network (GAN) approach for predicting desirable 3D dose distributions that eschews the previous paradigms of site-specific feature engineering and predicting low-dimensional representations of the plan. Experiments on a dataset of oropharyngeal cancer patients show that our approach significantly outperforms previous methods on several clinical satisfaction criteria and similarity metrics. |
Tasks | Feature Engineering |
Published | 2018-07-17 |
URL | http://arxiv.org/abs/1807.06489v1 |
PDF | http://arxiv.org/pdf/1807.06489v1.pdf |
PWC | https://paperswithcode.com/paper/automated-treatment-planning-in-radiation |
Repo | https://github.com/rafidrm/gancer |
Framework | pytorch |
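As a concrete illustration of the setup, here is a minimal pix2pix-style training-loss sketch for conditional dose prediction. The generator `G`, discriminator `D`, and the λ = 100 L1 weight are illustrative assumptions, not the authors' implementation (see the repo above for that).

```python
import torch
import torch.nn as nn

bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

def d_step(D, G, ct, real_dose):
    # Discriminator scores (CT, dose) pairs: clinical pairs -> 1, generated -> 0.
    fake_dose = G(ct).detach()
    real_logit, fake_logit = D(ct, real_dose), D(ct, fake_dose)
    return bce(real_logit, torch.ones_like(real_logit)) + \
           bce(fake_logit, torch.zeros_like(fake_logit))

def g_step(D, G, ct, real_dose, lam=100.0):
    # Generator tries to fool D while staying close to the clinical dose (L1 term).
    fake_dose = G(ct)
    fake_logit = D(ct, fake_dose)
    return bce(fake_logit, torch.ones_like(fake_logit)) + lam * l1(fake_dose, real_dose)
```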
Stochastic Chebyshev Gradient Descent for Spectral Optimization
Title | Stochastic Chebyshev Gradient Descent for Spectral Optimization |
Authors | Insu Han, Haim Avron, Jinwoo Shin |
Abstract | A large class of machine learning techniques requires the solution of optimization problems involving spectral functions of parametric matrices, e.g., the log-determinant and the nuclear norm. Unfortunately, computing the gradient of a spectral function is generally of cubic complexity, so gradient descent methods are rather expensive for optimizing objectives involving the spectral function. Thus, one naturally turns to stochastic gradient methods in the hope that they will provide a way to reduce or altogether avoid the computation of full gradients. However, here a new challenge appears: there is no straightforward way to compute unbiased stochastic gradients for spectral functions. In this paper, we develop unbiased stochastic gradients for spectral-sums, an important subclass of spectral functions. Our unbiased stochastic gradients are based on combining randomized trace estimators with stochastic truncation of the Chebyshev expansions. A careful design of the truncation distribution allows us to offer distributions that are variance-optimal, which is crucial for fast and stable convergence of stochastic gradient methods. We further leverage our proposed stochastic gradients to devise stochastic methods for objective functions involving spectral-sums, and rigorously analyze their convergence rate. The utility of our methods is demonstrated in numerical experiments. |
Tasks | |
Published | 2018-02-18 |
URL | http://arxiv.org/abs/1802.06355v3 |
PDF | http://arxiv.org/pdf/1802.06355v3.pdf |
PWC | https://paperswithcode.com/paper/stochastic-chebyshev-gradient-descent-for |
Repo | https://github.com/EiffL/SpectralFlow |
Framework | tf |
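To make the core estimator concrete, here is a minimal NumPy sketch of an unbiased spectral-sum estimate combining Hutchinson probes with a randomly truncated Chebyshev expansion. The plain geometric truncation distribution is an assumption for illustration; the paper's contribution is designing variance-optimal truncation distributions.

```python
import numpy as np

def chebyshev_coeffs(f, a, b, deg):
    # Interpolation coefficients of f on [a, b] at Chebyshev nodes.
    k = np.arange(deg + 1)
    x = np.cos(np.pi * (k + 0.5) / (deg + 1))            # nodes on [-1, 1]
    y = f(0.5 * (b - a) * x + 0.5 * (b + a))
    c = np.array([2.0 / (deg + 1) * np.sum(y * np.cos(j * np.pi * (k + 0.5) / (deg + 1)))
                  for j in range(deg + 1)])
    c[0] /= 2.0
    return c

def spectral_sum_estimate(A, f, a, b, max_deg=50, p=0.1, n_probe=16, seed=0):
    # Estimate of tr(f(A)) for symmetric A with eigenvalues in [a, b]; unbiased
    # with respect to the degree-max_deg Chebyshev interpolant of f.
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    c = chebyshev_coeffs(f, a, b, max_deg)
    alpha, beta = 2.0 / (b - a), -(b + a) / (b - a)      # maps spectrum to [-1, 1]
    surv = (1.0 - p) ** np.maximum(np.arange(max_deg + 1) - 1, 0)  # P(N >= j)
    est = 0.0
    for _ in range(n_probe):
        v = rng.choice([-1.0, 1.0], size=n)              # Hutchinson probe
        N = min(rng.geometric(p), max_deg)               # random truncation degree
        t_prev = v.copy()                                # T_0(A~) v
        t_curr = alpha * (A @ v) + beta * v              # T_1(A~) v
        acc = c[0] * (v @ t_prev)
        for j in range(1, N + 1):                        # reweighted partial sum
            acc += c[j] / surv[j] * (v @ t_curr)
            t_prev, t_curr = t_curr, 2.0 * (alpha * (A @ t_curr) + beta * t_curr) - t_prev
        est += acc
    return est / n_probe

# e.g. log-determinant of a well-conditioned SPD matrix:
# logdet ~= spectral_sum_estimate(A, np.log, a=lmin, b=lmax)
```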
Trust Region Based Adversarial Attack on Neural Networks
Title | Trust Region Based Adversarial Attack on Neural Networks |
Authors | Zhewei Yao, Amir Gholami, Peng Xu, Kurt Keutzer, Michael Mahoney |
Abstract | Deep neural networks are quite vulnerable to adversarial perturbations. Current state-of-the-art adversarial attack methods typically require very time-consuming hyper-parameter tuning, or require many iterations to solve an optimization-based adversarial attack. To address this problem, we present a new family of trust region based adversarial attacks, with the goal of computing adversarial perturbations efficiently. We propose several attacks based on variants of the trust region optimization method. We test the proposed methods on the CIFAR-10 and ImageNet datasets using several models, including AlexNet, ResNet-50, VGG-16, and DenseNet-121. Our methods achieve comparable results with the Carlini-Wagner (CW) attack, but with a significant speed-up of up to $37\times$ for the VGG-16 model on a Titan Xp GPU. For the case of ResNet-50 on ImageNet, we can bring its classification accuracy down to less than 0.1% with at most $1.5\%$ relative $L_\infty$ (or $L_2$) perturbation, requiring only $1.02$ seconds as compared to $27.04$ seconds for the CW attack. We have open sourced our method, which can be accessed at [1]. |
Tasks | Adversarial Attack |
Published | 2018-12-16 |
URL | http://arxiv.org/abs/1812.06371v1 |
PDF | http://arxiv.org/pdf/1812.06371v1.pdf |
PWC | https://paperswithcode.com/paper/trust-region-based-adversarial-attack-on |
Repo | https://github.com/amirgholami/trattack |
Framework | pytorch |
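A minimal sketch of a generic trust-region attack loop under the $L_\infty$ norm, not the authors' exact variants; `loss_fn` and `grad_fn` are assumed callbacks onto the attacked model.

```python
import numpy as np

def tr_attack(x, loss_fn, grad_fn, radius=0.01, iters=20,
              good=0.9, bad=0.25, grow=2.0, shrink=0.5):
    # loss_fn(x): attack objective to minimize (e.g. margin of the true class
    # over the runner-up); grad_fn(x): its gradient.
    x_adv = x.copy()
    for _ in range(iters):
        g = grad_fn(x_adv)
        step = -radius * np.sign(g)                  # L_inf trust-region solution
        predicted = radius * np.abs(g).sum()         # decrease under the linear model
        actual = loss_fn(x_adv) - loss_fn(x_adv + step)
        rho = actual / (predicted + 1e-12)           # model/reality agreement
        if rho > good:
            radius *= grow                           # trustworthy: enlarge region
        elif rho < bad:
            radius *= shrink                         # poor fit: shrink region
        if actual > 0:                               # accept only improving steps
            x_adv = x_adv + step
    return x_adv
```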
Quantifying the amount of visual information used by neural caption generators
Title | Quantifying the amount of visual information used by neural caption generators |
Authors | Marc Tanti, Albert Gatt, Kenneth P. Camilleri |
Abstract | This paper addresses the sensitivity of neural image caption generators to their visual input. A sensitivity analysis and omission analysis based on image foils is reported, showing that the extent to which image captioning architectures retain and are sensitive to visual information varies depending on the type of word being generated and the position in the caption as a whole. We motivate this work in the context of broader goals in the field to achieve more explainability in AI. |
Tasks | Image Captioning |
Published | 2018-10-12 |
URL | http://arxiv.org/abs/1810.05475v1 |
PDF | http://arxiv.org/pdf/1810.05475v1.pdf |
PWC | https://paperswithcode.com/paper/quantifying-the-amount-of-visual-information |
Repo | https://github.com/mtanti/quantifing-visual-information |
Framework | tf |
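A hedged sketch of an omission-style sensitivity score consistent with the analysis described above; the probabilities and the exact scoring formula are illustrative assumptions.

```python
import numpy as np

def omission_scores(p_with_image, p_without_image):
    # p_with_image[t]:    model probability of caption word t given the image
    # p_without_image[t]: the same probability with the image omitted or foiled
    p_w, p_wo = np.asarray(p_with_image), np.asarray(p_without_image)
    return 1.0 - p_wo / p_w        # near 1: the word leans heavily on the image

print(omission_scores([0.30, 0.10, 0.25], [0.28, 0.09, 0.05]))
# -> [0.067 0.1 0.8]; the third word (e.g. a visual noun) depends most on the image
```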
Predicting Future Instance Segmentation by Forecasting Convolutional Features
Title | Predicting Future Instance Segmentation by Forecasting Convolutional Features |
Authors | Pauline Luc, Camille Couprie, Yann LeCun, Jakob Verbeek |
Abstract | Anticipating future events is an important prerequisite towards intelligent behavior. Video forecasting has been studied as a proxy task towards this goal. Recent work has shown that to predict semantic segmentation of future frames, forecasting at the semantic level is more effective than forecasting RGB frames and then segmenting these. In this paper we consider the more challenging problem of future instance segmentation, which additionally segments out individual objects. To deal with a varying number of output labels per image, we develop a predictive model in the space of fixed-size convolutional features of the Mask R-CNN instance segmentation model. We apply the "detection head" of Mask R-CNN on the predicted features to produce the instance segmentation of future frames. Experiments show that this approach significantly improves over strong baselines based on optical flow and repurposed instance segmentation architectures. |
Tasks | Instance Segmentation, Optical Flow Estimation, Semantic Segmentation, Video Prediction |
Published | 2018-03-30 |
URL | http://arxiv.org/abs/1803.11496v2 |
PDF | http://arxiv.org/pdf/1803.11496v2.pdf |
PWC | https://paperswithcode.com/paper/predicting-future-instance-segmentation-by |
Repo | https://github.com/facebookresearch/instpred |
Framework | pytorch |
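A hedged sketch of the feature-forecasting idea: a small convolutional model maps stacked past features to future ones. The 256-channel FPN width matches common Mask R-CNN configurations, but the depth and widths here are illustrative, not the paper's F2F architecture.

```python
import torch
import torch.nn as nn

class FeatureForecaster(nn.Module):
    # Maps the FPN features of the past `history` frames to the features of a
    # future frame; one such model per pyramid level, and the frozen Mask R-CNN
    # detection head then runs on the predicted features.
    def __init__(self, channels=256, history=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(history * channels, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, channels, 3, padding=1),
        )

    def forward(self, past):                 # past: (B, T, C, H, W)
        b, t, c, h, w = past.shape
        return self.net(past.reshape(b, t * c, h, w))

pred = FeatureForecaster()(torch.randn(2, 4, 256, 64, 64))   # -> (2, 256, 64, 64)
```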
Improving Coverage and Runtime Complexity for Exact Inference in Non-Projective Transition-Based Dependency Parsers
Title | Improving Coverage and Runtime Complexity for Exact Inference in Non-Projective Transition-Based Dependency Parsers |
Authors | Tianze Shi, Carlos Gómez-Rodríguez, Lillian Lee |
Abstract | We generalize Cohen, Gómez-Rodríguez, and Satta's (2011) parser to a family of non-projective transition-based dependency parsers allowing polynomial-time exact inference. This includes novel parsers with better coverage than Cohen et al. (2011), and even a variant that reduces time complexity to $O(n^6)$, improving over the known bounds in exact inference for non-projective transition-based parsing. We hope that this piece of theoretical work inspires the design of novel transition systems with better coverage and better run-time guarantees. Code available at https://github.com/tzshi/nonproj-dp-variants-naacl2018 |
Tasks | |
Published | 2018-04-27 |
URL | http://arxiv.org/abs/1804.10615v2 |
PDF | http://arxiv.org/pdf/1804.10615v2.pdf |
PWC | https://paperswithcode.com/paper/improving-coverage-and-runtime-complexity-for |
Repo | https://github.com/tzshi/nonproj-dp-variants-naacl2018 |
Framework | none |
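For orientation, a minimal transition-based parsing loop in the plain (projective) arc-standard system; the paper's non-projective transition systems and exact-inference dynamic programs are substantially richer.

```python
def parse(words, oracle):
    # Stack/buffer automaton; `oracle` picks the next action (in practice a
    # trained classifier). It must not return "shift" when the buffer is empty.
    stack, buf, arcs = [], list(range(len(words))), []
    while buf or len(stack) > 1:
        action = oracle(stack, buf)
        if action == "shift":
            stack.append(buf.pop(0))
        elif action == "left-arc":         # head = top of stack
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        else:                              # "right-arc": head = second element
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs                            # list of (head, dependent) pairs
```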
Deep Functional Dictionaries: Learning Consistent Semantic Structures on 3D Models from Functions
Title | Deep Functional Dictionaries: Learning Consistent Semantic Structures on 3D Models from Functions |
Authors | Minhyuk Sung, Hao Su, Ronald Yu, Leonidas Guibas |
Abstract | Various 3D semantic attributes such as segmentation masks, geometric features, keypoints, and materials can be encoded as per-point probe functions on 3D geometries. Given a collection of related 3D shapes, we consider how to jointly analyze such probe functions over different shapes, and how to discover common latent structures using a neural network — even in the absence of any correspondence information. Our network is trained on point cloud representations of shape geometry and associated semantic functions on that point cloud. These functions express a shared semantic understanding of the shapes but are not coordinated in any way. For example, in a segmentation task, the functions can be indicator functions of arbitrary sets of shape parts, with the particular combination involved not known to the network. Our network is able to produce a small dictionary of basis functions for each shape, a dictionary whose span includes the semantic functions provided for that shape. Even though our shapes have independent discretizations and no functional correspondences are provided, the network is able to generate latent bases, in a consistent order, that reflect the shared semantic structure among the shapes. We demonstrate the effectiveness of our technique in various segmentation and keypoint selection applications. |
Tasks | |
Published | 2018-05-25 |
URL | http://arxiv.org/abs/1805.09957v3 |
PDF | http://arxiv.org/pdf/1805.09957v3.pdf |
PWC | https://paperswithcode.com/paper/deep-functional-dictionaries-learning |
Repo | https://github.com/mhsung/deep-functional-dictionaries |
Framework | tf |
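A minimal sketch of the span-inclusion idea: the training signal is the residual of projecting a probe function onto the predicted dictionary. The paper's full objective adds structured regularization; this shows only the least-squares core.

```python
import numpy as np

def span_loss(A, f):
    # How well the probe function f lies in the span of the predicted basis A:
    # solve the least-squares projection and measure the residual.
    x, *_ = np.linalg.lstsq(A, f, rcond=None)
    return float(np.sum((A @ x - f) ** 2))

# A: (n_points, k) dictionary predicted for one shape; f: (n_points,) probe function
rng = np.random.default_rng(0)
A, f = rng.standard_normal((1000, 10)), rng.standard_normal(1000)
print(span_loss(A, f))   # drives training toward dictionaries whose span covers f
```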
Open Source Presentation Attack Detection Baseline for Iris Recognition
Title | Open Source Presentation Attack Detection Baseline for Iris Recognition |
Authors | Joseph McGrath, Kevin W. Bowyer, Adam Czajka |
Abstract | This paper proposes the first open source presentation attack detection (PAD) solution known to us for distinguishing between authentic iris images (possibly with clear contact lenses) and irises with textured contact lenses. This software can serve as a baseline in various PAD evaluations, and also as an open-source platform with an up-to-date reference method for iris PAD. The software is written in C++ and Python and uses only open source resources, such as OpenCV. The method does not incorporate iris image segmentation, which may be problematic for unknown fake samples; instead, it makes a best guess to localize the rough position of the iris. The PAD-related features are extracted with Binary Statistical Image Features (BSIF) and classified by an ensemble of classifiers incorporating a support vector machine, a random forest, and a multilayer perceptron. The models attached to the current software have been trained with the NDCLD'15 database and evaluated on the independent datasets included in the LivDet-Iris 2017 competition. The software implements the functionality of retraining the classifiers with any database of authentic and attack images. The accuracy of the current version exceeds 99% when tested on subject-disjoint subsets of NDCLD'15, and oscillates around 85% when tested on the LivDet-Iris 2017 benchmarks, which is on par with the results obtained by the LivDet-Iris 2017 winner. |
Tasks | Iris Recognition, Semantic Segmentation |
Published | 2018-09-26 |
URL | https://arxiv.org/abs/1809.10172v2 |
PDF | https://arxiv.org/pdf/1809.10172v2.pdf |
PWC | https://paperswithcode.com/paper/open-source-presentation-attack-detection |
Repo | https://github.com/CVRL/OpenSourceIrisPAD |
Framework | none |
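A hedged sketch of the BSIF descriptor pipeline mentioned above, with random zero-mean filters standing in for the ICA-learned BSIF filter bank.

```python
import numpy as np
from scipy.signal import convolve2d

def bsif_histogram(img, filters):
    # Each filter response is thresholded at zero to give one bit per pixel;
    # the bits concatenate into an integer code, and the normalized histogram
    # of codes is the texture descriptor fed to the classifier ensemble.
    code = np.zeros(img.shape, dtype=np.int64)
    for i, f in enumerate(filters):
        resp = convolve2d(img, f, mode="same", boundary="symm")
        code |= (resp > 0).astype(np.int64) << i
    hist, _ = np.histogram(code, bins=2 ** len(filters), range=(0, 2 ** len(filters)))
    return hist / hist.sum()

# stand-in: 8 random zero-mean 7x7 filters (real BSIF uses ICA-learned filters)
rng = np.random.default_rng(0)
filters = [f - f.mean() for f in rng.standard_normal((8, 7, 7))]
descriptor = bsif_histogram(rng.random((120, 160)), filters)   # 256-bin histogram
```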
Domain-Specific Human-Inspired Binarized Statistical Image Features for Iris Recognition
Title | Domain-Specific Human-Inspired Binarized Statistical Image Features for Iris Recognition |
Authors | Adam Czajka, Daniel Moreira, Kevin W. Bowyer, Patrick J. Flynn |
Abstract | Binarized statistical image features (BSIF) have been successfully used for texture analysis in many computer vision tasks, including iris recognition and biometric presentation attack detection. Notably, all applications of BSIF in iris recognition to date have used the original BSIF filters, which were trained on image patches extracted from natural images. This paper tests whether domain-specific BSIF filters can give better performance than the default ones. A second question concerns the selection of image patches used to train BSIF: can patches derived from eye-tracking experiments, in which humans perform an iris recognition task, give better performance than random patches? Our results show that (1) domain-specific BSIF features can outperform the default BSIF features, and (2) selecting image patches in a task-specific manner guided by human performance can outperform selecting random patches. These results are important because BSIF is often regarded as a generic texture tool that does not need any domain adaptation, and human-task-guided selection of training patches has never (to our knowledge) been done. This paper follows reproducible-research requirements: the new iris-domain-specific BSIF filters, the patches used in filter training, the database used in testing, and the source code of the designed iris recognition method are made available along with this paper to facilitate applications of this concept. |
Tasks | Domain Adaptation, Eye Tracking, Iris Recognition, Texture Classification |
Published | 2018-07-13 |
URL | http://arxiv.org/abs/1807.05248v2 |
PDF | http://arxiv.org/pdf/1807.05248v2.pdf |
PWC | https://paperswithcode.com/paper/domain-specific-human-inspired-binarized |
Repo | https://github.com/CVRL/domain-specific-BSIF-for-iris-recognition |
Framework | none |
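A hedged sketch of the filter-learning step: ICA over (here random) image patches, whose unmixing vectors become the domain-specific filter bank. Assumes scikit-learn >= 1.1 for the `whiten` argument; the paper's actual contribution, human-guided patch selection, is not shown.

```python
import numpy as np
from sklearn.decomposition import FastICA

def learn_bsif_filters(patches, n_filters=8):
    # patches: (n, side, side) grayscale iris patches. ICA's unmixing vectors,
    # reshaped back to 2-D, serve as the new domain-specific filter bank.
    flat = patches.reshape(len(patches), -1)
    flat = flat - flat.mean(axis=0)                    # remove the mean patch
    ica = FastICA(n_components=n_filters, whiten="unit-variance", random_state=0)
    ica.fit(flat)
    side = patches.shape[1]
    return ica.components_.reshape(n_filters, side, side)

rng = np.random.default_rng(0)
bank = learn_bsif_filters(rng.standard_normal((5000, 7, 7)))   # 8 filters of 7x7
```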
Meta-Learning Update Rules for Unsupervised Representation Learning
Title | Meta-Learning Update Rules for Unsupervised Representation Learning |
Authors | Luke Metz, Niru Maheswaranathan, Brian Cheung, Jascha Sohl-Dickstein |
Abstract | A major goal of unsupervised learning is to discover data representations that are useful for subsequent tasks, without access to supervised labels during training. Typically, this involves minimizing a surrogate objective, such as the negative log likelihood of a generative model, with the hope that representations useful for subsequent tasks will arise as a side effect. In this work, we propose instead to directly target later desired tasks by meta-learning an unsupervised learning rule which leads to representations useful for those tasks. Specifically, we target semi-supervised classification performance, and we meta-learn an algorithm – an unsupervised weight update rule – that produces representations useful for this task. Additionally, we constrain our unsupervised update rule to be a biologically-motivated, neuron-local function, which enables it to generalize to different neural network architectures, datasets, and data modalities. We show that the meta-learned update rule produces useful features and sometimes outperforms existing unsupervised learning techniques. We further show that the meta-learned unsupervised update rule generalizes to train networks with different widths, depths, and nonlinearities. It also generalizes to train on data with randomly permuted input dimensions and even generalizes from image datasets to a text task. |
Tasks | Meta-Learning, Representation Learning, Unsupervised Representation Learning |
Published | 2018-03-31 |
URL | http://arxiv.org/abs/1804.00222v3 |
PDF | http://arxiv.org/pdf/1804.00222v3.pdf |
PWC | https://paperswithcode.com/paper/meta-learning-update-rules-for-unsupervised |
Repo | https://github.com/tensorflow/models/tree/master/research/learning_unsupervised_learning |
Framework | tf |
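A heavily simplified sketch of what "neuron-local" means here: the weight update depends only on quantities available at the synapse, combined by shared meta-parameters. The real meta-learned rule is a small MLP over such quantities; the Hebbian-flavored form below is an illustrative assumption.

```python
import numpy as np

def local_update(w, pre, post, err, theta):
    # w: (n_out, n_in) weights; pre: (n_in,) inputs; post: (n_out,) activations;
    # err: (n_out,) top-down signal; theta: meta-learned coefficients shared
    # across layers, which is what lets the rule transfer to other widths/depths.
    a, b, c = theta
    return w + a * np.outer(post, pre) + b * np.outer(err, pre) + c * w

# a Hebbian-flavored stand-in: in the paper, theta parameterizes a small MLP
# over such local quantities, trained on downstream task performance.
```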
Wizard of Wikipedia: Knowledge-Powered Conversational Agents
Title | Wizard of Wikipedia: Knowledge-Powered Conversational Agents |
Authors | Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, Jason Weston |
Abstract | In open-domain dialogue, intelligent agents should exhibit the use of knowledge; however, there are few convincing demonstrations of this to date. The most popular sequence-to-sequence models typically “generate and hope” generic utterances that can be memorized in the weights of the model when mapping from input utterance(s) to output, rather than employing recalled knowledge as context. Use of knowledge has so far proved difficult, in part because of the lack of a supervised learning benchmark task which exhibits knowledgeable open dialogue with clear grounding. To that end we collect and release a large dataset with conversations directly grounded with knowledge retrieved from Wikipedia. We then design architectures capable of retrieving knowledge, reading and conditioning on it, and finally generating natural responses. Our best performing dialogue models are able to conduct knowledgeable discussions on open-domain topics as evaluated by automatic metrics and human evaluations, while our new benchmark allows for measuring further improvements in this important research direction. |
Tasks | |
Published | 2018-11-03 |
URL | http://arxiv.org/abs/1811.01241v2 |
PDF | http://arxiv.org/pdf/1811.01241v2.pdf |
PWC | https://paperswithcode.com/paper/wizard-of-wikipedia-knowledge-powered |
Repo | https://github.com/NinaTian98369/Papers |
Framework | none |
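A minimal sketch of the knowledge-attention step: score retrieved Wikipedia sentences against the dialogue context and form a weighted summary to condition generation on. Encoders and dimensions are assumptions.

```python
import torch

def attend_knowledge(dialogue_vec, knowledge_vecs):
    # Score each retrieved Wikipedia sentence against the encoded dialogue
    # context, then form a context-weighted knowledge summary to condition on.
    scores = knowledge_vecs @ dialogue_vec           # (K,)
    weights = torch.softmax(scores, dim=0)
    return weights @ knowledge_vecs                  # (d,)

summary = attend_knowledge(torch.randn(256), torch.randn(12, 256))
```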
Action Completion: A Temporal Model for Moment Detection
Title | Action Completion: A Temporal Model for Moment Detection |
Authors | Farnoosh Heidarivincheh, Majid Mirmehdi, Dima Damen |
Abstract | We introduce completion moment detection for actions: the problem of locating the moment of completion, when the action's goal is confidently considered achieved. The paper proposes a joint classification-regression recurrent model that predicts completion from a given frame, and then integrates frame-level contributions to detect the sequence-level completion moment. We introduce a recurrent voting node that predicts, by either classification or regression, each frame's relative position with respect to the completion moment. The method is also capable of detecting incompletion; for example, it can detect a missed ball-catch, as well as the moment at which the ball is safely caught. We test the method on 16 actions from three public datasets, covering sports as well as daily actions. Results show that when combining contributions from frames prior to the completion moment as well as frames post completion, the completion moment is detected within one second in 89% of all tested sequences. |
Tasks | |
Published | 2018-05-17 |
URL | http://arxiv.org/abs/1805.06749v2 |
PDF | http://arxiv.org/pdf/1805.06749v2.pdf |
PWC | https://paperswithcode.com/paper/action-completion-a-temporal-model-for-moment |
Repo | https://github.com/FarnooshHeidari/CompletionDetection |
Framework | none |
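A minimal sketch of the frame-level voting idea: each frame predicts the completion moment's position relative to itself, and confidence-weighted votes are accumulated across the sequence.

```python
import numpy as np

def completion_moment(offsets, confidences):
    # Each frame t votes for frame t + predicted offset (its guess at the
    # completion moment); votes are confidence-weighted and accumulated,
    # and the highest-scoring frame wins.
    n = len(offsets)
    votes = np.zeros(n)
    for t, (off, c) in enumerate(zip(offsets, confidences)):
        target = int(round(t + off))
        if 0 <= target < n:
            votes[target] += c
    return int(np.argmax(votes))

# frames 0-4 all point roughly at frame 3:
print(completion_moment([3, 2, 1, 0, -1], [0.5, 0.6, 0.9, 1.0, 0.8]))  # -> 3
```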
Clebsch-Gordan Nets: a Fully Fourier Space Spherical Convolutional Neural Network
Title | Clebsch-Gordan Nets: a Fully Fourier Space Spherical Convolutional Neural Network |
Authors | Risi Kondor, Zhen Lin, Shubhendu Trivedi |
Abstract | Recent work by Cohen et al. has achieved state-of-the-art results for learning spherical images in a rotation-invariant way by using ideas from group representation theory and noncommutative harmonic analysis. In this paper we propose a generalization of this work that generally exhibits improved performance, but from an implementation point of view is actually simpler. An unusual feature of the proposed architecture is that it uses the Clebsch–Gordan transform as its only source of nonlinearity, thus avoiding repeated forward and backward Fourier transforms. The underlying ideas of the paper generalize to constructing neural networks that are invariant to the action of other compact groups. |
Tasks | |
Published | 2018-06-24 |
URL | http://arxiv.org/abs/1806.09231v2 |
PDF | http://arxiv.org/pdf/1806.09231v2.pdf |
PWC | https://paperswithcode.com/paper/clebsch-gordan-nets-a-fully-fourier-space |
Repo | https://github.com/zlin7/CGNet |
Framework | pytorch |
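The simplest instance of a Clebsch-Gordan nonlinearity output: coupling a degree-l fragment with its conjugate down to degree 0 yields its squared norm, a rotation-invariant scalar. A real CG net also keeps the higher-degree components of the tensor product; this sketch shows only the l = 0 piece.

```python
import numpy as np

def l0_invariants(fragments):
    # Coupling a degree-l fragment with its conjugate down to degree 0 gives,
    # up to normalization, the squared norm of its 2l+1 components -- a
    # rotation-invariant scalar, the simplest Clebsch-Gordan output.
    return {l: float(np.vdot(f, f).real) for l, f in fragments.items()}

# fragments: {l: complex array of shape (2l+1,)} of spherical-harmonic coefficients
frags = {0: np.array([1 + 0j]), 1: np.array([0.2j, 1.0, -0.2j])}
print(l0_invariants(frags))
```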
Anomaly Detection using One-Class Neural Networks
Title | Anomaly Detection using One-Class Neural Networks |
Authors | Raghavendra Chalapathy, Aditya Krishna Menon, Sanjay Chawla |
Abstract | We propose a one-class neural network (OC-NN) model to detect anomalies in complex data sets. OC-NN combines the ability of deep networks to extract a progressively rich representation of data with the one-class objective of creating a tight envelope around normal data. The OC-NN approach breaks new ground for the following crucial reason: data representation in the hidden layer is driven by the OC-NN objective and is thus customized for anomaly detection. This is a departure from other approaches which use a hybrid approach of learning deep features using an autoencoder and then feeding the features into a separate anomaly detection method like one-class SVM (OC-SVM). The hybrid OC-SVM approach is sub-optimal because it is unable to influence representational learning in the hidden layers. A comprehensive set of experiments demonstrates that on complex data sets (like CIFAR and GTSRB), OC-NN performs on par with state-of-the-art methods and outperforms conventional shallow methods in some scenarios. |
Tasks | Anomaly Detection |
Published | 2018-02-18 |
URL | http://arxiv.org/abs/1802.06360v2 |
PDF | http://arxiv.org/pdf/1802.06360v2.pdf |
PWC | https://paperswithcode.com/paper/anomaly-detection-using-one-class-neural |
Repo | https://github.com/raghavchalapathy/oc-nn |
Framework | tf |
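A minimal sketch of the OC-NN idea: a small network produces a scalar score, and the loss is a ν-weighted hinge below a threshold r, minus r itself. Architecture, sizes, and the r-update scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn

class OCNN(nn.Module):
    # One hidden layer feeding a linear score; weight decay on both layers
    # plays the role of the ||w||^2 + ||V||^2 terms in the one-class objective.
    def __init__(self, in_dim, hidden=32):
        super().__init__()
        self.hidden = nn.Linear(in_dim, hidden)
        self.score = nn.Linear(hidden, 1, bias=False)

    def forward(self, x):
        return self.score(torch.sigmoid(self.hidden(x))).squeeze(-1)

def ocnn_loss(scores, r, nu=0.1):
    # Hinge penalty when a (presumed normal) point scores below r, minus r
    # itself, so that roughly a nu-fraction of points may fall below it.
    return torch.mean(torch.relu(r - scores)) / nu - r

# one common scheme: after each epoch, reset r to the nu-th quantile of the scores
```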
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation
Title | Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation |
Authors | Yi Luo, Nima Mesgarani |
Abstract | Single-channel, speaker-independent speech separation methods have recently seen great progress. However, the accuracy, latency, and computational cost of such methods remain insufficient. The majority of the previous methods have formulated the separation problem through the time-frequency representation of the mixed signal, which has several drawbacks, including the decoupling of the phase and magnitude of the signal, the suboptimality of time-frequency representation for speech separation, and the long latency in calculating the spectrograms. To address these shortcomings, we propose a fully-convolutional time-domain audio separation network (Conv-TasNet), a deep learning framework for end-to-end time-domain speech separation. Conv-TasNet uses a linear encoder to generate a representation of the speech waveform optimized for separating individual speakers. Speaker separation is achieved by applying a set of weighting functions (masks) to the encoder output. The modified encoder representations are then inverted back to the waveforms using a linear decoder. The masks are found using a temporal convolutional network (TCN) consisting of stacked 1-D dilated convolutional blocks, which allows the network to model the long-term dependencies of the speech signal while maintaining a small model size. The proposed Conv-TasNet system significantly outperforms previous time-frequency masking methods in separating two- and three-speaker mixtures. Additionally, Conv-TasNet surpasses several ideal time-frequency magnitude masks in two-speaker speech separation as evaluated by both objective distortion measures and subjective quality assessment by human listeners. Finally, Conv-TasNet has a significantly smaller model size and a shorter minimum latency, making it a suitable solution for both offline and real-time speech separation applications. |
Tasks | Speaker Separation, Speech Separation |
Published | 2018-09-20 |
URL | https://arxiv.org/abs/1809.07454v3 |
PDF | https://arxiv.org/pdf/1809.07454v3.pdf |
PWC | https://paperswithcode.com/paper/tasnet-surpassing-ideal-time-frequency |
Repo | https://github.com/kaituoxu/Conv-TasNet |
Framework | pytorch |
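A drastically simplified sketch of the Conv-TasNet pipeline: a learned linear encoder, per-speaker masks from a separator (here two convolutions standing in for the stacked dilated TCN), and a transposed-convolution decoder. All sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyTasNet(nn.Module):
    def __init__(self, n_filters=64, kernel=16, n_spk=2):
        super().__init__()
        stride = kernel // 2
        self.encoder = nn.Conv1d(1, n_filters, kernel, stride=stride, bias=False)
        self.separator = nn.Sequential(      # stand-in for the deep dilated TCN
            nn.Conv1d(n_filters, n_filters, 3, padding=2, dilation=2), nn.PReLU(),
            nn.Conv1d(n_filters, n_spk * n_filters, 1),
        )
        self.decoder = nn.ConvTranspose1d(n_filters, 1, kernel, stride=stride, bias=False)
        self.n_spk, self.n_filters = n_spk, n_filters

    def forward(self, mix):                  # mix: (B, 1, T)
        w = torch.relu(self.encoder(mix))    # encoder representation (B, F, L)
        masks = torch.sigmoid(self.separator(w))
        masks = masks.view(-1, self.n_spk, self.n_filters, w.shape[-1])
        # mask the encoder output per speaker, then decode back to waveforms
        return torch.stack([self.decoder(w * masks[:, s])
                            for s in range(self.n_spk)], dim=1)

est = TinyTasNet()(torch.randn(2, 1, 16000))   # -> (2, 2, 1, 16000)
```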