July 30, 2019


Paper Group AWR 14

Evaluating Robustness of Neural Networks with Mixed Integer Programming. Deep Alignment Network: A convolutional neural network for robust face alignment. A Review on Deep Learning Techniques Applied to Semantic Segmentation. Splenomegaly Segmentation using Global Convolutional Kernels and Conditional Generative Adversarial Networks. Deep Residual …

Evaluating Robustness of Neural Networks with Mixed Integer Programming


Title	Evaluating Robustness of Neural Networks with Mixed Integer Programming
Authors	Vincent Tjeng, Kai Xiao, Russ Tedrake
Abstract	Neural networks have demonstrated considerable success on a wide variety of real-world problems. However, networks trained only to optimize for training accuracy can often be fooled by adversarial examples - slightly perturbed inputs that are misclassified with high confidence. Verification of networks enables us to gauge their vulnerability to such adversarial examples. We formulate verification of piecewise-linear neural networks as a mixed integer program. On a representative task of finding minimum adversarial distortions, our verifier is two to three orders of magnitude quicker than the state-of-the-art. We achieve this computational speedup via tight formulations for non-linearities, as well as a novel presolve algorithm that makes full use of all information available. The computational speedup allows us to verify properties on convolutional networks with an order of magnitude more ReLUs than networks previously verified by any complete verifier. In particular, we determine for the first time the exact adversarial accuracy of an MNIST classifier to perturbations with bounded $l_\infty$ norm $\epsilon=0.1$: for this classifier, we find an adversarial example for 4.38% of samples, and a certificate of robustness (to perturbations with bounded norm) for the remainder. Across all robust training procedures and network architectures considered, we are able to certify more samples than the state-of-the-art and find more adversarial examples than a strong first-order attack.
Tasks
Published	2017-11-20
URL	http://arxiv.org/abs/1711.07356v3
PDF	http://arxiv.org/pdf/1711.07356v3.pdf
PWC	https://paperswithcode.com/paper/evaluating-robustness-of-neural-networks-with
Repo	https://github.com/fra31/mmr-universal
Framework	pytorch
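
The abstract above mentions formulating piecewise-linear networks as a mixed integer program with tight formulations for the non-linearities. As a minimal sketch of what such an encoding can look like for a single ReLU, the snippet below checks a commonly used bounded encoding by brute force over the binary variable. The exact constraint set, the function name `relu_mip_feasible`, and the bounds used are assumptions for illustration; they are not taken from this listing or from the linked repository.

```python
def relu_mip_feasible(x, lower, upper):
    """List (a, y_lo, y_hi) for each binary choice a that admits some output y.

    Assumed encoding of y = max(0, x) with known bounds lower <= x <= upper:
        y >= 0
        y >= x
        y <= upper * a
        y <= x - lower * (1 - a)
        a in {0, 1}
    """
    if not lower <= x <= upper:
        raise ValueError("x must lie inside its pre-activation bounds")
    feasible = []
    for a in (0, 1):
        y_lo = max(0.0, x)                          # from y >= 0 and y >= x
        y_hi = min(upper * a, x - lower * (1 - a))  # from the two upper bounds
        if y_lo <= y_hi:
            feasible.append((a, y_lo, y_hi))
    return feasible


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.7, 3.0):
        print(x, relu_mip_feasible(x, lower=-2.0, upper=3.0))
```

For every input inside the bounds, the only outputs the constraints admit coincide with `max(0, x)`, which is why a solver can treat the ReLU exactly. Tighter bounds `lower` and `upper` leave the solver less slack, which is presumably where bound-tightening steps pay off.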

Deep Alignment Network: A convolutional neural network for robust face alignment


Title	Deep Alignment Network: A convolutional neural network for robust face alignment
Authors	Marek Kowalski, Jacek Naruniec, Tomasz Trzcinski
Abstract	In this paper, we propose Deep Alignment Network (DAN), a robust face alignment method based on a deep neural network architecture. DAN consists of multiple stages, where each stage improves the locations of the facial landmarks estimated by the previous stage. Our method uses entire face images at all stages, contrary to the recently proposed face alignment methods that rely on local patches. This is possible thanks to the use of landmark heatmaps which provide visual information about landmark locations estimated at the previous stages of the algorithm. The use of entire face images rather than patches allows DAN to handle face images with large variation in head pose and difficult initializations. An extensive evaluation on two publicly available datasets shows that DAN reduces the state-of-the-art failure rate by up to 70%. Our method has also been submitted for evaluation as part of the Menpo challenge.
Tasks	Face Alignment, Keypoint Detection, Robust Face Alignment
Published	2017-06-06
URL	http://arxiv.org/abs/1706.01789v2
PDF	http://arxiv.org/pdf/1706.01789v2.pdf
PWC	https://paperswithcode.com/paper/deep-alignment-network-a-convolutional-neural
Repo	https://github.com/MarekKowalski/DeepAlignmentNetwork
Framework	tf
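
The abstract above relies on landmark heatmaps to pass landmark locations from one stage to the next. A rough sketch of one way to build such a heatmap follows: each pixel's intensity decays with its distance to the nearest landmark. The intensity function `1 / (1 + distance)` and the name `landmark_heatmap` are assumptions made for illustration, not details confirmed by this listing.

```python
import math


def landmark_heatmap(width, height, landmarks):
    """Return a height x width grid whose values peak at landmark positions.

    landmarks is a list of (x, y) pixel coordinates. Each cell holds
    1 / (1 + distance to the nearest landmark), an assumed intensity function.
    """
    if not landmarks:
        raise ValueError("at least one landmark is required")
    heatmap = []
    for y in range(height):
        row = []
        for x in range(width):
            nearest = min(math.hypot(x - lx, y - ly) for lx, ly in landmarks)
            row.append(1.0 / (1.0 + nearest))
        heatmap.append(row)
    return heatmap


if __name__ == "__main__":
    for row in landmark_heatmap(5, 4, [(1, 2), (3, 0)]):
        print(" ".join(f"{value:.2f}" for value in row))
```

Because the heatmap has the same spatial size as the face image, it can be stacked with the image as an extra input channel, which is how a later stage can see the earlier estimate without cropping local patches.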

A Review on Deep Learning Techniques Applied to Semantic Segmentation


Title	A Review on Deep Learning Techniques Applied to Semantic Segmentation
Authors	Alberto Garcia-Garcia, Sergio Orts-Escolano, Sergiu Oprea, Victor Villena-Martinez, Jose Garcia-Rodriguez
Abstract	Image semantic segmentation is of growing interest to computer vision and machine learning researchers. Many emerging applications need accurate and efficient segmentation mechanisms: autonomous driving, indoor navigation, and even virtual or augmented reality systems, to name a few. This demand coincides with the rise of deep learning approaches in almost every field or application target related to computer vision, including semantic segmentation or scene understanding. This paper provides a review on deep learning methods for semantic segmentation applied to various application areas. Firstly, we describe the terminology of this field as well as mandatory background concepts. Next, the main datasets and challenges are presented to help researchers decide which ones best suit their needs and their targets. Then, existing methods are reviewed, highlighting their contributions and their significance in the field. Finally, quantitative results are given for the described methods and the datasets on which they were evaluated, followed by a discussion of the results. Lastly, we point out a set of promising directions for future work and draw our own conclusions about the state of the art of semantic segmentation using deep learning techniques.
Tasks	Autonomous Driving, Scene Understanding, Semantic Segmentation
Published	2017-04-22
URL	http://arxiv.org/abs/1704.06857v1
PDF	http://arxiv.org/pdf/1704.06857v1.pdf
PWC	https://paperswithcode.com/paper/a-review-on-deep-learning-techniques-applied
Repo	https://github.com/Lxrd-AJ/Advanced_ML
Framework	pytorch

Splenomegaly Segmentation using Global Convolutional Kernels and Conditional Generative Adversarial Networks


Title	Splenomegaly Segmentation using Global Convolutional Kernels and Conditional Generative Adversarial Networks
Authors	Yuankai Huo, Zhoubing Xu, Shunxing Bao, Camilo Bermudez, Andrew J. Plassard, Jiaqi Liu, Yuang Yao, Albert Assad, Richard G. Abramson, Bennett A. Landman
Abstract	Spleen volume estimation using automated image segmentation techniques may be used to detect splenomegaly (abnormally enlarged spleen) on Magnetic Resonance Imaging (MRI) scans. In recent years, Deep Convolutional Neural Networks (DCNN) segmentation methods have demonstrated advantages for abdominal organ segmentation. However, variations in both size and shape of the spleen on MRI images may result in large false positive and false negative labeling when deploying DCNN based methods. In this paper, we propose the Splenomegaly Segmentation Network (SSNet) to address spatial variations when segmenting extraordinarily large spleens. SSNet was designed based on the framework of image-to-image conditional generative adversarial networks (cGAN). Specifically, the Global Convolutional Network (GCN) was used as the generator to reduce false negatives, while the Markovian discriminator (PatchGAN) was used to alleviate false positives. A cohort of clinically acquired 3D MRI scans (both T1 weighted and T2 weighted) from patients with splenomegaly was used to train and test the networks. The experimental results demonstrated a mean Dice coefficient of 0.9260 and a median Dice coefficient of 0.9262 using SSNet on independently tested MRI volumes of patients with splenomegaly.
Tasks	Semantic Segmentation
Published	2017-12-02
URL	http://arxiv.org/abs/1712.00542v1
PDF	http://arxiv.org/pdf/1712.00542v1.pdf
PWC	https://paperswithcode.com/paper/splenomegaly-segmentation-using-global
Repo	https://github.com/MASILab/SSNet
Framework	caffe2
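
The results in the abstract above are reported as Dice coefficients. For readers unfamiliar with the metric, here is a minimal sketch of how it is computed for two binary masks, 2|A ∩ B| / (|A| + |B|). The function name `dice_coefficient` and the convention of returning 1.0 for two empty masks are assumptions for illustration.

```python
def dice_coefficient(predicted, truth):
    """Dice overlap between two flat binary masks of equal length."""
    predicted = list(predicted)
    truth = list(truth)
    if len(predicted) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, t in zip(predicted, truth) if p and t)
    total = sum(1 for p in predicted if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / total


if __name__ == "__main__":
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

A false positive grows the predicted mask and a false negative shrinks the overlap, so both kinds of error lower the score.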

Deep Residual Learning for Instrument Segmentation in Robotic Surgery


Title	Deep Residual Learning for Instrument Segmentation in Robotic Surgery
Authors	Daniil Pakhomov, Vittal Premachandran, Max Allan, Mahdi Azizian, Nassir Navab
Abstract	Detection, tracking, and pose estimation of surgical instruments are crucial tasks for computer assistance during minimally invasive robotic surgery. In the majority of cases, the first step is the automatic segmentation of surgical tools. Prior work has focused on binary segmentation, where the objective is to label every pixel in an image as tool or background. We improve upon previous work in two major ways. First, we leverage recent techniques such as deep residual learning and dilated convolutions to advance binary-segmentation performance. Second, we extend the approach to multi-class segmentation, which lets us segment different parts of the tool, in addition to background. We demonstrate the performance of this method on the MICCAI Endoscopic Vision Challenge Robotic Instruments dataset.
Tasks	Pose Estimation
Published	2017-03-24
URL	http://arxiv.org/abs/1703.08580v1
PDF	http://arxiv.org/pdf/1703.08580v1.pdf
PWC	https://paperswithcode.com/paper/deep-residual-learning-for-instrument
Repo	https://github.com/warmspringwinds/tf-image-segmentation
Framework	tf

Relevance of Unsupervised Metrics in Task-Oriented Dialogue for Evaluating Natural Language Generation


Title	Relevance of Unsupervised Metrics in Task-Oriented Dialogue for Evaluating Natural Language Generation
Authors	Shikhar Sharma, Layla El Asri, Hannes Schulz, Jeremie Zumer
Abstract	Automated metrics such as BLEU are widely used in the machine translation literature. They have also been used recently in the dialogue community for evaluating dialogue response generation. However, previous work in dialogue response generation has shown that these metrics do not correlate strongly with human judgment in the non task-oriented dialogue setting. Task-oriented dialogue responses are expressed on narrower domains and exhibit lower diversity. It is thus reasonable to think that these automated metrics would correlate well with human judgment in the task-oriented setting where the generation task consists of translating dialogue acts into a sentence. We conduct an empirical study to confirm whether this is the case. Our findings indicate that these automated metrics have stronger correlation with human judgments in the task-oriented setting compared to what has been observed in the non task-oriented setting. We also observe that these metrics correlate even better for datasets which provide multiple ground truth reference sentences. In addition, we show that some of the currently available corpora for task-oriented language generation can be solved with simple models and advocate for more challenging datasets.
Tasks	Dialogue Generation, Machine Translation, Text Generation
Published	2017-06-29
URL	http://arxiv.org/abs/1706.09799v1
PDF	http://arxiv.org/pdf/1706.09799v1.pdf
PWC	https://paperswithcode.com/paper/relevance-of-unsupervised-metrics-in-task
Repo	https://github.com/Maluuba/nlg-eval
Framework	none
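
The abstract above notes that word-overlap metrics such as BLEU correlate better with human judgment when several reference sentences are available. The sketch below shows the clipped (modified) n-gram precision at the core of BLEU and how extra references can raise it. It is a simplified illustration, not the API of the linked repository; the function names and example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n=1):
    """Clipped n-gram precision of a candidate against one or more references."""
    candidate_counts = Counter(ngrams(candidate, n))
    if not candidate_counts:
        return 0.0
    # For each n-gram, keep the largest count seen in any single reference.
    max_reference = Counter()
    for reference in references:
        for gram, count in Counter(ngrams(reference, n)).items():
            max_reference[gram] = max(max_reference[gram], count)
    clipped = sum(min(count, max_reference[gram])
                  for gram, count in candidate_counts.items())
    return clipped / sum(candidate_counts.values())


if __name__ == "__main__":
    candidate = "the restaurant is cheap".split()
    ref_a = "the restaurant is inexpensive".split()
    ref_b = "it is a cheap place".split()
    print(modified_precision(candidate, [ref_a]))         # 0.75
    print(modified_precision(candidate, [ref_a, ref_b]))  # 1.0
```

With a single reference, an acceptable paraphrase such as "cheap" is penalized. A second reference that happens to use the same word removes the penalty.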

Latent Multi-task Architecture Learning


Title	Latent Multi-task Architecture Learning
Authors	Sebastian Ruder, Joachim Bingel, Isabelle Augenstein, Anders Søgaard
Abstract	Multi-task learning (MTL) allows deep neural networks to learn from related tasks by sharing parameters with other networks. In practice, however, MTL involves searching an enormous space of possible parameter sharing architectures to find (a) the layers or subspaces that benefit from sharing, (b) the appropriate amount of sharing, and (c) the appropriate relative weights of the different task losses. Recent work has addressed each of the above problems in isolation. In this work we present an approach that learns a latent multi-task architecture that jointly addresses (a)–(c). We present experiments on synthetic data and data from OntoNotes 5.0, including four different tasks and seven different domains. Our extension consistently outperforms previous approaches to learning latent architectures for multi-task problems and achieves up to 15% average error reductions over common approaches to MTL.
Tasks	Multi-Task Learning
Published	2017-05-23
URL	http://arxiv.org/abs/1705.08142v3
PDF	http://arxiv.org/pdf/1705.08142v3.pdf
PWC	https://paperswithcode.com/paper/latent-multi-task-architecture-learning
Repo	https://github.com/sebastianruder/sluice-networks
Framework	none

OctNetFusion: Learning Depth Fusion from Data


Title	OctNetFusion: Learning Depth Fusion from Data
Authors	Gernot Riegler, Ali Osman Ulusoy, Horst Bischof, Andreas Geiger
Abstract	In this paper, we present a learning based approach to depth fusion, i.e., dense 3D reconstruction from multiple depth images. The most common approach to depth fusion is based on averaging truncated signed distance functions, which was originally proposed by Curless and Levoy in 1996. While this method is simple and provides great results, it is not able to reconstruct (partially) occluded surfaces and requires a large number of frames to filter out sensor noise and outliers. Motivated by the availability of large 3D model repositories and recent advances in deep learning, we present a novel 3D CNN architecture that learns to predict an implicit surface representation from the input depth maps. Our learning based method significantly outperforms the traditional volumetric fusion approach in terms of noise reduction and outlier suppression. By learning the structure of real world 3D objects and scenes, our approach is further able to reconstruct occluded regions and to fill in gaps in the reconstruction. We demonstrate that our learning based approach outperforms both vanilla TSDF fusion as well as TV-L1 fusion on the task of volumetric fusion. Further, we demonstrate state-of-the-art 3D shape completion results.
Tasks	3D Reconstruction
Published	2017-04-04
URL	http://arxiv.org/abs/1704.01047v3
PDF	http://arxiv.org/pdf/1704.01047v3.pdf
PWC	https://paperswithcode.com/paper/octnetfusion-learning-depth-fusion-from-data
Repo	https://github.com/griegler/octnetfusion
Framework	none
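
The abstract above uses averaging of truncated signed distance functions as the baseline for depth fusion. A minimal per-voxel sketch of that weighted running average is given below. The function name `fuse_tsdf`, the scalar per-voxel interface, and the example weights are assumptions for illustration; this is not code from the linked repository.

```python
def fuse_tsdf(observations, truncation):
    """Fuse signed-distance observations for one voxel by weighted averaging.

    observations: iterable of (signed_distance, weight) pairs, one per frame.
    truncation:   distances are clamped to [-truncation, +truncation].
    Returns (fused_distance, accumulated_weight).
    """
    fused, total_weight = 0.0, 0.0
    for distance, weight in observations:
        if weight <= 0:
            continue  # frames that do not observe the voxel contribute nothing
        clamped = max(-truncation, min(truncation, distance))
        fused = (total_weight * fused + weight * clamped) / (total_weight + weight)
        total_weight += weight
    return fused, total_weight


if __name__ == "__main__":
    noisy_frames = [(0.02, 1.0), (0.04, 1.0), (0.50, 1.0)]
    print(fuse_tsdf(noisy_frames, truncation=0.1))
```

Averaging only reduces noise as more frames arrive, and a voxel that no frame observes keeps zero weight. That matches the limitations the abstract points out: many frames are needed, and occluded surfaces cannot be filled in.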

VIGAN: Missing View Imputation with Generative Adversarial Networks


Title	VIGAN: Missing View Imputation with Generative Adversarial Networks
Authors	Chao Shang, Aaron Palmer, Jiangwen Sun, Ko-Shin Chen, Jin Lu, Jinbo Bi
Abstract	In an era when big data are becoming the norm, there is less concern with the quantity but more with the quality and completeness of the data. In many disciplines, data are collected from heterogeneous sources, resulting in multi-view or multi-modal datasets. The missing data problem has been challenging to address in multi-view data analysis. In particular, when certain samples lack an entire view of data, the missing view problem arises. Classic multiple imputation or matrix completion methods are hardly effective here because the specific view contains no information on which to base the imputation for such samples. The commonly-used simple method of removing samples with a missing view can dramatically reduce sample size, thus diminishing the statistical power of a subsequent analysis. In this paper, we propose a novel approach for view imputation via generative adversarial networks (GANs), which we name VIGAN. This approach first treats each view as a separate domain and identifies domain-to-domain mappings via a GAN using randomly-sampled data from each view, and then employs a multi-modal denoising autoencoder (DAE) to reconstruct the missing view from the GAN outputs based on paired data across the views. Then, by optimizing the GAN and DAE jointly, our model enables the knowledge integration for domain mappings and view correspondences to effectively recover the missing view. Empirical results on benchmark datasets validate the VIGAN approach by comparing against the state of the art. The evaluation of VIGAN in a genetic study of substance use disorders further proves the effectiveness and usability of this approach in life science.
Tasks	Denoising, Imputation, Matrix Completion
Published	2017-08-22
URL	http://arxiv.org/abs/1708.06724v5
PDF	http://arxiv.org/pdf/1708.06724v5.pdf
PWC	https://paperswithcode.com/paper/vigan-missing-view-imputation-with-generative
Repo	https://github.com/chaoshangcs/VIGAN
Framework	pytorch

Chemception: A Deep Neural Network with Minimal Chemistry Knowledge Matches the Performance of Expert-developed QSAR/QSPR Models


Title	Chemception: A Deep Neural Network with Minimal Chemistry Knowledge Matches the Performance of Expert-developed QSAR/QSPR Models
Authors	Garrett B. Goh, Charles Siegel, Abhinav Vishnu, Nathan O. Hodas, Nathan Baker
Abstract	In the last few years, we have seen the transformative impact of deep learning in many applications, particularly in speech recognition and computer vision. Inspired by Google’s Inception-ResNet deep convolutional neural network (CNN) for image classification, we have developed “Chemception”, a deep CNN for the prediction of chemical properties, using just the images of 2D drawings of molecules. We develop Chemception without providing any additional explicit chemistry knowledge, such as basic concepts like periodicity, or advanced features like molecular descriptors and fingerprints. We then show how Chemception can serve as a general-purpose neural network architecture for predicting toxicity, activity, and solvation properties when trained on a modest database of 600 to 40,000 compounds. When compared to multi-layer perceptron (MLP) deep neural networks trained with ECFP fingerprints, Chemception slightly outperforms in activity and solvation prediction and slightly underperforms in toxicity prediction. Having matched the performance of expert-developed QSAR/QSPR deep learning models, our work demonstrates the plausibility of using deep neural networks to assist in computational chemistry research, where the feature engineering process is performed primarily by a deep learning algorithm.
Tasks	Feature Engineering, Image Classification, Speech Recognition
Published	2017-06-20
URL	http://arxiv.org/abs/1706.06689v1
PDF	http://arxiv.org/pdf/1706.06689v1.pdf
PWC	https://paperswithcode.com/paper/chemception-a-deep-neural-network-with
Repo	https://github.com/Bunseki2/DeepL
Framework	none

Jet Constituents for Deep Neural Network Based Top Quark Tagging


Title	Jet Constituents for Deep Neural Network Based Top Quark Tagging
Authors	Jannicke Pearkes, Wojciech Fedorko, Alison Lister, Colin Gay
Abstract	Recent literature on deep neural networks for tagging of highly energetic jets resulting from top quark decays has focused on image based techniques or multivariate approaches using high-level jet substructure variables. Here, a sequential approach to this task is taken by using an ordered sequence of jet constituents as training inputs. Unlike the majority of previous approaches, this strategy does not result in a loss of information during pixelisation or the calculation of high level features. The jet classification method achieves a background rejection of 45 at a 50% efficiency operating point for reconstruction level jets with transverse momentum range of 600 to 2500 GeV and is insensitive to multiple proton-proton interactions at the levels expected throughout Run 2 of the LHC.
Tasks
Published	2017-04-07
URL	http://arxiv.org/abs/1704.02124v2
PDF	http://arxiv.org/pdf/1704.02124v2.pdf
PWC	https://paperswithcode.com/paper/jet-constituents-for-deep-neural-network
Repo	https://github.com/jpearkes/topo_dnn
Framework	none
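
The abstract above quotes a background rejection of 45 at a 50% efficiency operating point. As a hedged illustration of how such a number can be read off classifier scores, the sketch below picks the score threshold that keeps the requested fraction of signal jets and reports the inverse of the background false positive rate. Defining rejection as 1 / (false positive rate), the function name, and the toy scores are assumptions, not details stated in this listing.

```python
import math


def background_rejection(signal_scores, background_scores, efficiency=0.5):
    """Inverse background false positive rate at a fixed signal efficiency."""
    if not signal_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    ranked = sorted(signal_scores, reverse=True)
    # Keep the top-scoring fraction of signal; the last kept score is the cut.
    n_kept = max(1, math.ceil(efficiency * len(ranked)))
    threshold = ranked[n_kept - 1]
    false_positives = sum(1 for score in background_scores if score >= threshold)
    if false_positives == 0:
        return float("inf")
    return len(background_scores) / false_positives


if __name__ == "__main__":
    signal = [0.9, 0.8, 0.4, 0.3]
    background = [0.85, 0.2, 0.1, 0.05, 0.6, 0.3, 0.2, 0.1]
    print(background_rejection(signal, background))  # 8.0
```

Under this definition, a rejection of 45 would mean that roughly one background jet in 45 passes the cut that keeps half of the top-quark jets. Tied scores at the threshold can push the realized efficiency slightly above the requested value.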

Randomized Nonnegative Matrix Factorization


Title	Randomized Nonnegative Matrix Factorization
Authors	N. Benjamin Erichson, Ariana Mendible, Sophie Wihlborn, J. Nathan Kutz
Abstract	Nonnegative matrix factorization (NMF) is a powerful tool for data mining. However, the emergence of ‘big data’ has severely challenged our ability to compute this fundamental decomposition using deterministic algorithms. This paper presents a randomized hierarchical alternating least squares (HALS) algorithm to compute the NMF. By deriving a smaller matrix from the nonnegative input data, a more efficient nonnegative decomposition can be computed. Our algorithm scales to big data applications while attaining a near-optimal factorization. The proposed algorithm is evaluated using synthetic and real world data and shows substantial speedups compared to deterministic HALS.
Tasks
Published	2017-11-06
URL	http://arxiv.org/abs/1711.02037v2
PDF	http://arxiv.org/pdf/1711.02037v2.pdf
PWC	https://paperswithcode.com/paper/randomized-nonnegative-matrix-factorization
Repo	https://github.com/erichson/ristretto
Framework	none

Global Relation Embedding for Relation Extraction


Title	Global Relation Embedding for Relation Extraction
Authors	Yu Su, Honglei Liu, Semih Yavuz, Izzeddin Gur, Huan Sun, Xifeng Yan
Abstract	We study the problem of textual relation embedding with distant supervision. To combat the wrong labeling problem of distant supervision, we propose to embed textual relations with global statistics of relations, i.e., the co-occurrence statistics of textual and knowledge base relations collected from the entire corpus. This approach turns out to be more robust to the training noise introduced by distant supervision. On a popular relation extraction dataset, we show that the learned textual relation embedding can be used to augment existing relation extraction models and significantly improve their performance. Most remarkably, for the top 1,000 relational facts discovered by the best existing model, the precision can be improved from 83.9% to 89.3%.
Tasks	Relation Extraction
Published	2017-04-19
URL	http://arxiv.org/abs/1704.05958v2
PDF	http://arxiv.org/pdf/1704.05958v2.pdf
PWC	https://paperswithcode.com/paper/global-relation-embedding-for-relation
Repo	https://github.com/ppuliu/GloRE
Framework	tf
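
The abstract above describes global statistics as corpus-wide co-occurrence counts between textual relations and knowledge base relations. A small sketch of collecting and normalizing such counts follows. The function name, the relation strings, and the choice to normalize into a per-textual-relation distribution are hypothetical, chosen only to illustrate the idea, and do not come from the linked repository.

```python
from collections import Counter, defaultdict


def relation_cooccurrence(pairs):
    """Map each textual relation to a distribution over knowledge base relations.

    pairs: iterable of (textual_relation, kb_relation) labels, one per
    distantly supervised sentence in the corpus.
    """
    counts = defaultdict(Counter)
    for textual_relation, kb_relation in pairs:
        counts[textual_relation][kb_relation] += 1
    return {
        textual: {kb: count / sum(kb_counts.values())
                  for kb, count in kb_counts.items()}
        for textual, kb_counts in counts.items()
    }


if __name__ == "__main__":
    corpus = (
        [("was born in", "place_of_birth")] * 3
        + [("was born in", "nationality")]
        + [("works for", "employer")] * 2
    )
    print(relation_cooccurrence(corpus))
```

A single mislabeled sentence barely moves the aggregated distribution, which is one intuition for why corpus-level statistics would be less sensitive to the wrong labeling problem than individual sentence labels.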

Interpretable 3D Human Action Analysis with Temporal Convolutional Networks


Title	Interpretable 3D Human Action Analysis with Temporal Convolutional Networks
Authors	Tae Soo Kim, Austin Reiter
Abstract	The discriminative power of modern deep learning models for 3D human action recognition is growing ever so potent. In conjunction with the recent resurgence of 3D human action representation with 3D skeletons, the quality and the pace of recent progress have been significant. However, the inner workings of state-of-the-art learning based methods in 3D human action recognition still remain mostly black-box. In this work, we propose to use a new class of models known as Temporal Convolutional Neural Networks (TCN) for 3D human action recognition. Compared to popular LSTM-based Recurrent Neural Network models, given interpretable input such as 3D skeletons, TCN provides us a way to explicitly learn readily interpretable spatio-temporal representations for 3D human action recognition. We describe our strategy for re-designing the TCN with interpretability in mind and how such characteristics of the model are leveraged to construct a powerful 3D activity recognition method. Through this work, we wish to take a step towards a spatio-temporal model that is easier to understand, explain and interpret. The resulting model, Res-TCN, achieves state-of-the-art results on the largest 3D human action recognition dataset, NTU-RGBD.
Tasks	3D Human Action Recognition, Activity Recognition, Multimodal Activity Recognition, Skeleton Based Action Recognition, Temporal Action Localization
Published	2017-04-14
URL	http://arxiv.org/abs/1704.04516v1
PDF	http://arxiv.org/pdf/1704.04516v1.pdf
PWC	https://paperswithcode.com/paper/interpretable-3d-human-action-analysis-with
Repo	https://github.com/TaeSoo-Kim/TCNActionRecognition
Framework	none

Audio Super Resolution using Neural Networks


Title	Audio Super Resolution using Neural Networks
Authors	Volodymyr Kuleshov, S. Zayd Enam, Stefano Ermon
Abstract	We introduce a new audio processing technique that increases the sampling rate of signals such as speech or music using deep convolutional neural networks. Our model is trained on pairs of low and high-quality audio examples; at test-time, it predicts missing samples within a low-resolution signal in an interpolation process similar to image super-resolution. Our method is simple and does not involve specialized audio processing techniques; in our experiments, it outperforms baselines on standard speech and music benchmarks at upscaling ratios of 2x, 4x, and 6x. The method has practical applications in telephony, compression, and text-to-speech generation; it demonstrates the effectiveness of feed-forward convolutional architectures on an audio generation task.
Tasks	Audio Generation, Audio Super-Resolution, Super-Resolution
Published	2017-08-02
URL	http://arxiv.org/abs/1708.00853v1
PDF	http://arxiv.org/pdf/1708.00853v1.pdf
PWC	https://paperswithcode.com/paper/audio-super-resolution-using-neural-networks
Repo	https://github.com/Amuzak-NTL/ASR-for-Speech-Recog
Framework	tf
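
The abstract above frames audio super-resolution as predicting the samples missing from a low-resolution signal at a given upscaling ratio. The sketch below illustrates the problem setup only: it drops samples to simulate a low-resolution input and fills them back in by linear interpolation. Both helper names are hypothetical, the subsampling skips the low-pass filtering a real pipeline would normally apply, and linear interpolation is a stand-in baseline rather than the baseline or model used in the paper.

```python
def subsample(signal, ratio):
    """Keep every ratio-th sample (no anti-aliasing filter, for brevity)."""
    return signal[::ratio]


def linear_upsample(low_res, ratio):
    """Fill ratio - 1 linearly interpolated samples between known samples."""
    if not low_res:
        return []
    upsampled = []
    for left, right in zip(low_res, low_res[1:]):
        for step in range(ratio):
            upsampled.append(left + (right - left) * step / ratio)
    upsampled.append(low_res[-1])
    return upsampled


if __name__ == "__main__":
    high_res = [0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
    low_res = subsample(high_res, 4)
    print(low_res)                      # [0.0, 4.0, 0.0]
    print(linear_upsample(low_res, 4))  # recovers this piecewise-linear signal
```

Interpolation reproduces this toy signal only because it is piecewise linear between the kept samples. Real speech and music contain detail between samples that interpolation cannot recover, which is the gap a learned model aims to close.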