February 1, 2020

3098 words 15 mins read

Paper Group AWR 87

Time-to-Event Prediction with Neural Networks and Cox Regression

Title Time-to-Event Prediction with Neural Networks and Cox Regression
Authors Håvard Kvamme, Ørnulf Borgan, Ida Scheel
Abstract New methods for time-to-event prediction are proposed by extending the Cox proportional hazards model with neural networks. Building on methodology from nested case-control studies, we propose a loss function that scales well to large data sets, and enables fitting of both proportional and non-proportional extensions of the Cox model. Through simulation studies, the proposed loss function is verified to be a good approximation for the Cox partial log-likelihood. The proposed methodology is compared to existing methodologies on real-world data sets, and is found to be highly competitive, typically yielding the best performance in terms of Brier score and binomial log-likelihood. A Python package for the proposed methods is available at https://github.com/havakv/pycox.
Tasks Survival Analysis, Time-to-Event Prediction
Published 2019-07-01
URL https://arxiv.org/abs/1907.00825v2
PDF https://arxiv.org/pdf/1907.00825v2.pdf
PWC https://paperswithcode.com/paper/time-to-event-prediction-with-neural-networks
Repo https://github.com/havakv/pycox
Framework pytorch
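
The heart of the paper is a loss that approximates the Cox partial log-likelihood by sampling a small set of controls from each event's risk set, so the cost scales with the number of events rather than the full risk sets. Below is a minimal PyTorch sketch of that idea; the network, control count, and sampling are illustrative assumptions rather than the authors' exact implementation (see pycox for that).

```python
import torch
import torch.nn as nn

def cox_case_control_loss(g_case, g_control):
    """g_case: (n,) risk scores at observed events; g_control: (n, m) scores of
    m sampled controls still at risk at each event time. Approximates the
    negative Cox partial log-likelihood with a sampled risk set."""
    diff = g_control - g_case.unsqueeze(1)       # g_j - g_i for each control
    pad = torch.zeros_like(g_case).unsqueeze(1)  # the case itself contributes exp(0)
    return torch.logsumexp(torch.cat([pad, diff], dim=1), dim=1).mean()

net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
x_case, x_ctrl = torch.randn(64, 10), torch.randn(64, 5, 10)
loss = cox_case_control_loss(net(x_case).squeeze(-1), net(x_ctrl).squeeze(-1))
loss.backward()
```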

The FLoRes Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English

Title The FLoRes Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English
Authors Francisco Guzmán, Peng-Jen Chen, Myle Ott, Juan Pino, Guillaume Lample, Philipp Koehn, Vishrav Chaudhary, Marc’Aurelio Ranzato
Abstract For machine translation, a vast majority of language pairs in the world are considered low-resource because they have little parallel data available. Besides the technical challenges of learning with limited supervision, it is difficult to evaluate methods trained on low-resource language pairs because of the lack of freely and publicly available benchmarks. In this work, we introduce the FLoRes evaluation datasets for Nepali-English and Sinhala-English, based on sentences translated from Wikipedia. Compared to English, these are languages with very different morphology and syntax, for which little out-of-domain parallel data is available and for which relatively large amounts of monolingual data are freely available. We describe our process to collect and cross-check the quality of translations, and we report baseline performance using several learning settings: fully supervised, weakly supervised, semi-supervised, and fully unsupervised. Our experiments demonstrate that current state-of-the-art methods perform rather poorly on this benchmark, posing a challenge to the research community working on low-resource MT. Data and code to reproduce our experiments are available at https://github.com/facebookresearch/flores.
Tasks Machine Translation
Published 2019-02-04
URL https://arxiv.org/abs/1902.01382v3
PDF https://arxiv.org/pdf/1902.01382v3.pdf
PWC https://paperswithcode.com/paper/two-new-evaluation-datasets-for-low-resource
Repo https://github.com/lknlp/lknlp.github.io
Framework none
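
Since FLoRes is an evaluation benchmark, the typical workflow is simply to score system output against the released reference translations. A minimal sketch with sacrebleu, assuming hypothetical plain-text files of hypotheses and references, one sentence per line:

```python
import sacrebleu

# hypothetical file names; the FLoRes repo documents the actual splits
hyps = open("hyp.ne-en.txt", encoding="utf-8").read().splitlines()
refs = open("ref.ne-en.txt", encoding="utf-8").read().splitlines()
bleu = sacrebleu.corpus_bleu(hyps, [refs])  # corpus-level BLEU
print(f"BLEU = {bleu.score:.2f}")
```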

Semantic-Transferable Weakly-Supervised Endoscopic Lesions Segmentation

Title Semantic-Transferable Weakly-Supervised Endoscopic Lesions Segmentation
Authors Jiahua Dong, Yang Cong, Gan Sun, Dongdong Hou
Abstract Weakly-supervised learning under image-level label supervision has been widely applied to semantic segmentation of medical lesion regions. However, 1) most existing models rely on effective constraints to explore the internal representation of lesions, which only produces inaccurate and coarse lesion regions; 2) they ignore the strong probabilistic dependencies between the target lesion dataset (e.g., enteroscopy images) and a well-annotated source disease dataset (e.g., gastroscope images). To better utilize these dependencies, we present a new semantic lesion representation transfer model for weakly-supervised endoscopic lesion segmentation, which can exploit useful knowledge from a relevant fully-labeled disease segmentation task to enhance the performance of the target weakly-labeled lesion segmentation task. More specifically, a pseudo label generator is proposed to leverage seed information and generate highly-confident pseudo pixel labels by incorporating class balance and a super-pixel spatial prior. It can iteratively include more hard-to-transfer samples from the weakly-labeled target dataset into the training set. Afterwards, dynamically searched feature centroids for the same class among different datasets are aligned by accumulating previously-learned features. Meanwhile, adversarial learning is also employed to narrow the gap between the lesions of different datasets in the output space. Finally, we build a new medical endoscopic dataset with 3659 images collected from more than 1100 volunteers. Extensive experiments on our collected dataset and several benchmark datasets validate the effectiveness of our model.
Tasks Semantic Segmentation
Published 2019-08-21
URL https://arxiv.org/abs/1908.07669v2
PDF https://arxiv.org/pdf/1908.07669v2.pdf
PWC https://paperswithcode.com/paper/semantic-transferable-weakly-supervised
Repo https://github.com/JiahuaDong/ICCV2019Publication-Semantic-Transferable-Weakly-Supervised-Endoscopic-Lesions-Segmentation
Framework pytorch
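
The pseudo label generator is the most self-contained piece of the method: keep only high-confidence pixels, with a per-class threshold so frequent classes don't crowd out rare ones. A rough NumPy sketch of class-balanced pseudo-labelling in that spirit; the quantile-based threshold is an assumed stand-in for the paper's exact rule:

```python
import numpy as np

def pseudo_labels(prob, keep_frac=0.2, ignore=255):
    """prob: (C, H, W) softmax output; returns (H, W) pseudo labels with
    low-confidence pixels mapped to `ignore`."""
    conf, label = prob.max(axis=0), prob.argmax(axis=0)
    out = np.full(label.shape, ignore, dtype=np.int64)
    for c in range(prob.shape[0]):
        mask = label == c
        if not mask.any():
            continue
        # class-wise threshold: keep only the top keep_frac most confident pixels
        thr = np.quantile(conf[mask], 1.0 - keep_frac)
        out[mask & (conf >= thr)] = c
    return out
```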

Theoretically Principled Trade-off between Robustness and Accuracy

Title Theoretically Principled Trade-off between Robustness and Accuracy
Authors Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P. Xing, Laurent El Ghaoui, Michael I. Jordan
Abstract We identify a trade-off between robustness and accuracy that serves as a guiding principle in the design of defenses against adversarial examples. Although this problem has been widely studied empirically, much remains unknown concerning the theory underlying this trade-off. In this work, we decompose the prediction error for adversarial examples (robust error) as the sum of the natural (classification) error and the boundary error, and provide a differentiable upper bound using the theory of classification-calibrated loss, which is shown to be the tightest possible upper bound uniform over all probability distributions and measurable predictors. Inspired by our theoretical analysis, we also design a new defense method, TRADES, to trade adversarial robustness off against accuracy. Our proposed algorithm performs well experimentally on real-world datasets. The methodology is the foundation of our entry to the NeurIPS 2018 Adversarial Vision Challenge, in which we won 1st place out of ~2,000 submissions, surpassing the runner-up approach by $11.41\%$ in terms of mean $\ell_2$ perturbation distance.
Tasks Adversarial Attack, Adversarial Defense
Published 2019-01-24
URL https://arxiv.org/abs/1901.08573v3
PDF https://arxiv.org/pdf/1901.08573v3.pdf
PWC https://paperswithcode.com/paper/theoretically-principled-trade-off-between
Repo https://github.com/goldblum/AdversariallyRobustDistillation
Framework pytorch
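
The TRADES objective itself is compact: the natural cross-entropy plus beta times a KL term measuring how far the prediction on a perturbed input drifts from the prediction on the clean input, with the perturbation found by maximizing that same KL. A hedged PyTorch sketch; the PGD step size, iteration count, and epsilon are illustrative defaults, not the authors' full recipe:

```python
import torch
import torch.nn.functional as F

def trades_loss(model, x, y, eps=8/255, step=2/255, iters=10, beta=6.0):
    # inner maximization: find x_adv that maximizes KL to the clean prediction
    p_clean = F.softmax(model(x).detach(), dim=1)
    x_adv = x + 0.001 * torch.randn_like(x)
    for _ in range(iters):
        x_adv = x_adv.detach().requires_grad_(True)
        kl = F.kl_div(F.log_softmax(model(x_adv), dim=1), p_clean,
                      reduction="batchmean")
        grad, = torch.autograd.grad(kl, x_adv)
        x_adv = torch.min(torch.max(x_adv + step * grad.sign(), x - eps),
                          x + eps).clamp(0.0, 1.0)
    # outer minimization: natural loss + beta * boundary (robust) regularizer
    logits = model(x)
    robust_kl = F.kl_div(F.log_softmax(model(x_adv.detach()), dim=1),
                         F.softmax(logits, dim=1), reduction="batchmean")
    return F.cross_entropy(logits, y) + beta * robust_kl
```

Here beta makes the trade-off explicit: larger values buy robustness at the cost of natural accuracy, which is exactly the tension the theory formalizes.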

Cascade R-CNN: High Quality Object Detection and Instance Segmentation

Title Cascade R-CNN: High Quality Object Detection and Instance Segmentation
Authors Zhaowei Cai, Nuno Vasconcelos
Abstract In object detection, the intersection over union (IoU) threshold is frequently used to define positives/negatives. The threshold used to train a detector defines its quality. While the commonly used threshold of 0.5 leads to noisy (low-quality) detections, detection performance frequently degrades for larger thresholds. This paradox of high-quality detection has two causes: 1) overfitting, due to vanishing positive samples for large thresholds, and 2) inference-time quality mismatch between detector and test hypotheses. A multi-stage object detection architecture, the Cascade R-CNN, composed of a sequence of detectors trained with increasing IoU thresholds, is proposed to address these problems. The detectors are trained sequentially, using the output of one detector as the training set for the next. This resampling progressively improves hypothesis quality, guaranteeing a positive training set of equivalent size for all detectors and minimizing overfitting. The same cascade is applied at inference, to eliminate quality mismatches between hypotheses and detectors. An implementation of the Cascade R-CNN without bells or whistles achieves state-of-the-art performance on the COCO dataset, and significantly improves high-quality detection on generic and specific object detection datasets, including VOC, KITTI, CityPerson, and WiderFace. Finally, the Cascade R-CNN is generalized to instance segmentation, with nontrivial improvements over the Mask R-CNN. To facilitate future research, two implementations are made available at https://github.com/zhaoweicai/cascade-rcnn (Caffe) and https://github.com/zhaoweicai/Detectron-Cascade-RCNN (Detectron).
Tasks Instance Segmentation, Object Detection, Semantic Segmentation
Published 2019-06-24
URL https://arxiv.org/abs/1906.09756v1
PDF https://arxiv.org/pdf/1906.09756v1.pdf
PWC https://paperswithcode.com/paper/cascade-r-cnn-high-quality-object-detection
Repo https://github.com/zhaoweicai/cascade-rcnn
Framework tf
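
The cascade itself is easy to sketch: each stage's head refines the previous stage's boxes and is trained against a higher IoU threshold, so later stages see progressively better hypotheses. A toy PyTorch sketch of the stage loop; real implementations re-pool features from the refined boxes at every stage (elided here), and ToyHead/apply_deltas are illustrative stand-ins, not the authors' API:

```python
import torch
import torch.nn as nn

def apply_deltas(boxes, deltas):
    """Toy box refinement in xyxy format: shift centres, rescale sizes."""
    cx = (boxes[:, 0] + boxes[:, 2]) / 2 + deltas[:, 0]
    cy = (boxes[:, 1] + boxes[:, 3]) / 2 + deltas[:, 1]
    w = (boxes[:, 2] - boxes[:, 0]).abs() * deltas[:, 2].exp()
    h = (boxes[:, 3] - boxes[:, 1]).abs() * deltas[:, 3].exp()
    return torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=1)

class ToyHead(nn.Module):
    """Stand-in for an ROI head: per-box class scores and box deltas."""
    def __init__(self, dim=256, classes=80):
        super().__init__()
        self.cls = nn.Linear(dim, classes + 1)
        self.reg = nn.Linear(dim, 4)
    def forward(self, feat):
        return self.cls(feat), self.reg(feat)

heads = nn.ModuleList(ToyHead() for _ in range(3))  # trained at IoU 0.5/0.6/0.7
feat = torch.randn(100, 256)      # per-box features (re-pooled per stage in
boxes = torch.rand(100, 4) * 100  # the real model; held fixed in this toy)
scores = []
for head in heads:
    s, d = head(feat)
    boxes = apply_deltas(boxes, d)      # refined boxes feed the next stage
    scores.append(s.softmax(dim=1))
final_scores = torch.stack(scores).mean(dim=0)  # ensemble of stage scores
```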

Adversarial Generation of Time-Frequency Features with application in audio synthesis

Title Adversarial Generation of Time-Frequency Features with application in audio synthesis
Authors Andrés Marafioti, Nicki Holighaus, Nathanaël Perraudin, Piotr Majdak
Abstract Time-frequency (TF) representations provide powerful and intuitive features for the analysis of time series such as audio, but generative modeling of audio in the TF domain remains a subtle matter. Consequently, neural audio synthesis widely relies on directly modeling the waveform, and previous attempts at unconditionally synthesizing audio from neurally generated invertible TF features still struggle to produce audio of satisfying quality. In this article, focusing on the short-time Fourier transform, we discuss the challenges that arise in audio synthesis based on generated invertible TF features and how to overcome them. We demonstrate the potential of deliberate generative TF modeling by training a generative adversarial network (GAN) on short-time Fourier features. We show that by applying our guidelines, our TF-based network was able to outperform a state-of-the-art GAN generating waveforms directly, despite the similar architectures of the two networks.
Tasks Time Series
Published 2019-02-11
URL https://arxiv.org/abs/1902.04072v2
PDF https://arxiv.org/pdf/1902.04072v2.pdf
PWC https://paperswithcode.com/paper/adversarial-generation-of-time-frequency
Repo https://github.com/MathewStylianidis/TiFGAN-tensorflow2
Framework tf
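
Concretely, the generated features are log-magnitude short-time Fourier coefficients that must be inverted back to a waveform. A small librosa sketch of that round trip; Griffin-Lim is used here as a generic phase-recovery stand-in (the paper studies more careful phase reconstruction), and the window/hop sizes are assumptions:

```python
import librosa
import numpy as np

y, sr = librosa.load(librosa.ex("trumpet"), sr=16000)
S = librosa.stft(y, n_fft=512, hop_length=128)
log_mag = np.log(np.abs(S) + 1e-6)   # the kind of feature a GAN could generate

# inversion: exponentiate the magnitude, recover a plausible phase
y_rec = librosa.griffinlim(np.exp(log_mag), n_fft=512, hop_length=128)
```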

CondConv: Conditionally Parameterized Convolutions for Efficient Inference

Title CondConv: Conditionally Parameterized Convolutions for Efficient Inference
Authors Brandon Yang, Gabriel Bender, Quoc V. Le, Jiquan Ngiam
Abstract Convolutional layers are one of the basic building blocks of modern deep neural networks. One fundamental assumption is that convolutional kernels should be shared for all examples in a dataset. We propose conditionally parameterized convolutions (CondConv), which learn specialized convolutional kernels for each example. Replacing normal convolutions with CondConv enables us to increase the size and capacity of a network, while maintaining efficient inference. We demonstrate that scaling networks with CondConv improves the performance and inference cost trade-off of several existing convolutional neural network architectures on both classification and detection tasks. On ImageNet classification, our CondConv approach applied to EfficientNet-B0 achieves state-of-the-art performance of 78.3% accuracy with only 413M multiply-adds. Code and checkpoints for the CondConv Tensorflow layer and CondConv-EfficientNet models are available at: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/condconv.
Tasks Image Classification, Object Detection
Published 2019-04-10
URL https://arxiv.org/abs/1904.04971v2
PDF https://arxiv.org/pdf/1904.04971v2.pdf
PWC https://paperswithcode.com/paper/soft-conditional-computation
Repo https://github.com/hangg7/deformable-kernels
Framework pytorch
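
The core trick is that a per-example routing function mixes several expert kernels into one kernel before a single convolution, so capacity grows with the number of experts while inference cost stays near that of one convolution. A minimal PyTorch sketch of a CondConv-style layer; the initialization and routing details are simplified assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CondConv2d(nn.Module):
    """CondConv-style layer: route per example, mix expert kernels, convolve."""
    def __init__(self, cin, cout, k=3, n_experts=4):
        super().__init__()
        self.experts = nn.Parameter(torch.randn(n_experts, cout, cin, k, k) * 0.05)
        self.route = nn.Linear(cin, n_experts)  # routing from pooled input
        self.k = k
    def forward(self, x):
        b, cin, h, w = x.shape
        r = torch.sigmoid(self.route(x.mean(dim=(2, 3))))          # (B, E)
        kernel = torch.einsum("be,eoikl->boikl", r, self.experts)  # per-example kernel
        # grouped-conv trick: one conv call with a different kernel per example
        out = F.conv2d(x.reshape(1, b * cin, h, w),
                       kernel.reshape(-1, cin, self.k, self.k),
                       padding=self.k // 2, groups=b)
        return out.reshape(b, -1, h, w)

y = CondConv2d(16, 32)(torch.randn(2, 16, 8, 8))   # -> (2, 32, 8, 8)
```

A layer built this way can drop in wherever an nn.Conv2d(16, 32, 3, padding=1) would go, which is what makes retrofitting existing architectures cheap.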

Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries

Title Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries
Authors Fuwen Tan, Paola Cascante-Bonilla, Xiaoxiao Guo, Hui Wu, Song Feng, Vicente Ordonez
Abstract This paper explores the task of interactive image retrieval using natural language queries, where a user progressively provides input queries to refine a set of retrieval results. Moreover, our work explores this problem in the context of complex image scenes containing multiple objects. We propose Drill-down, an effective framework for encoding multiple queries with an efficient compact state representation that significantly extends current methods for single-round image retrieval. We show that using multiple rounds of natural language queries as input can be surprisingly effective to find arbitrarily specific images of complex scenes. Furthermore, we find that existing image datasets with textual captions can provide a surprisingly effective form of weak supervision for this task. We compare our method with existing sequential encoding and embedding networks, demonstrating superior performance on two proposed benchmarks: automatic image retrieval on a simulated scenario that uses region captions as queries, and interactive image retrieval using real queries from human evaluators.
Tasks Image Retrieval
Published 2019-11-10
URL https://arxiv.org/abs/1911.03826v1
PDF https://arxiv.org/pdf/1911.03826v1.pdf
PWC https://paperswithcode.com/paper/drill-down-interactive-retrieval-of-complex
Repo https://github.com/uvavision/DrillDown
Framework pytorch
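
The compact state can be pictured as a fixed number of slots, each refined by the query rounds it best matches, so different rounds can describe different regions of the scene. A toy sketch of such a slot update; the slot count, hard assignment, and GRU update are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

def update_state(state, q, gru):
    """state: (B, S, D) slot memory; q: (B, D) encoded query for this round."""
    sim = torch.einsum("bd,bsd->bs", q, state)  # which slot does q refine?
    idx = sim.argmax(dim=1)                     # hard assignment per example
    b = torch.arange(q.size(0))
    new = state.clone()
    new[b, idx] = gru(q, state[b, idx])         # GRU update of the chosen slot
    return new

gru = nn.GRUCell(256, 256)
state = torch.zeros(2, 4, 256)                  # 4 slots per example
for q in torch.randn(3, 2, 256):                # three query rounds
    state = update_state(state, q, gru)
```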

Classification via local manifold approximation

Title Classification via local manifold approximation
Authors Didong Li, David B Dunson
Abstract Classifiers label data as belonging to one of a set of groups based on input features. It is challenging to obtain accurate classification performance when the feature distributions in the different classes are complex, with nonlinear, overlapping and intersecting supports. This is particularly true when training data are limited. To address this problem, this article proposes a new type of classifier based on obtaining a local approximation to the support of the data within each class in a neighborhood of the feature to be classified, and assigning the feature to the class having the closest support. This general algorithm is referred to as LOcal Manifold Approximation (LOMA) classification. As a simple and theoretically supported special case having excellent performance in a broad variety of examples, we use spheres for local approximation, obtaining a SPherical Approximation (SPA) classifier. We illustrate substantial gains for SPA over competitors on a variety of challenging simulated and real data examples.
Tasks
Published 2019-03-03
URL http://arxiv.org/abs/1903.00985v1
PDF http://arxiv.org/pdf/1903.00985v1.pdf
PWC https://paperswithcode.com/paper/classification-via-local-manifold
Repo https://github.com/david-dunson/SPAclassifier
Framework none
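
The SPA special case is simple enough to sketch end-to-end: for a test point, fit a sphere to its k nearest training points within each class and assign the class whose sphere passes closest. A NumPy sketch using an algebraic least-squares sphere fit; k and the fitting method are assumed stand-ins for the paper's construction:

```python
import numpy as np

def fit_sphere(P):
    """Algebraic least-squares sphere through points P of shape (k, d):
    |p|^2 = 2 c.p + d0 with d0 = r^2 - |c|^2."""
    A = np.hstack([2 * P, np.ones((len(P), 1))])
    b = (P ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    c, d0 = sol[:-1], sol[-1]
    r = np.sqrt(max(d0 + c @ c, 0.0))
    return c, r

def spa_predict(x, X, y, k=10):
    """Assign x to the class whose locally fitted sphere passes closest."""
    best_cls, best_dist = None, np.inf
    for cls in np.unique(y):
        Xc = X[y == cls]
        nearest = Xc[np.argsort(((Xc - x) ** 2).sum(axis=1))[:k]]
        c, r = fit_sphere(nearest)
        dist = abs(np.linalg.norm(x - c) - r)   # distance to the sphere surface
        if dist < best_dist:
            best_cls, best_dist = cls, dist
    return best_cls
```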

LiDAR Sensor modeling and Data augmentation with GANs for Autonomous driving

Title LiDAR Sensor modeling and Data augmentation with GANs for Autonomous driving
Authors Ahmad El Sallab, Ibrahim Sobh, Mohamed Zahran, Nader Essam
Abstract In the autonomous driving domain, data collection and annotation from real vehicles are expensive and sometimes unsafe. Simulators are often used for data augmentation, which requires realistic sensor models that are hard to formulate and model in closed form. Instead, sensor models can be learned from real data. The main challenge is the absence of a paired dataset, which makes traditional supervised learning techniques unsuitable. In this work, we formulate the problem as image translation from unpaired data and employ CycleGANs to solve the sensor modeling problem for LiDAR, producing realistic LiDAR from simulated LiDAR (sim2real). Further, we generate high-resolution realistic LiDAR from lower-resolution LiDAR (real2real). The LiDAR 3D point cloud is processed in bird's-eye view (BEV) and polar 2D representations. The experimental results show the high potential of the proposed approach.
Tasks Autonomous Driving, Data Augmentation, Image-to-Image Translation, Point Cloud Generation, Sensor Modeling, Unsupervised Image-To-Image Translation
Published 2019-05-17
URL https://arxiv.org/abs/1905.07290v1
PDF https://arxiv.org/pdf/1905.07290v1.pdf
PWC https://paperswithcode.com/paper/lidar-sensor-modeling-and-data-augmentation
Repo https://github.com/ahmadelsallab/lidargan
Framework none
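
Before any GAN enters the picture, the point cloud has to be rasterized into a 2D image. A NumPy sketch of the bird's-eye-view projection, keeping the maximum height per grid cell; the grid extent and resolution are assumptions:

```python
import numpy as np

def bev_image(points, x_range=(0, 60), y_range=(-30, 30), res=0.25):
    """points: (N, 3) LiDAR points (x, y, z) in metres.
    Returns an (H, W) max-height map over the chosen grid."""
    m = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
         (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    p = points[m]
    xi = ((p[:, 0] - x_range[0]) / res).astype(int)
    yi = ((p[:, 1] - y_range[0]) / res).astype(int)
    H = int((x_range[1] - x_range[0]) / res)
    W = int((y_range[1] - y_range[0]) / res)
    img = np.zeros((H, W), dtype=np.float32)
    np.maximum.at(img, (xi, yi), p[:, 2])  # keep the max z per cell
    return img
```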

Covariance Matrix Adaptation Greedy Search Applied to Water Distribution System Optimization

Title Covariance Matrix Adaptation Greedy Search Applied to Water Distribution System Optimization
Authors Mehdi Neshat, Bradley Alexander, Angus Simpson
Abstract Water distribution system (WDS) design is a challenging optimisation problem with a high number of search dimensions and constraints. Accordingly, Evolutionary Algorithms (EAs) have been widely applied to optimise WDSs, minimising cost whilst meeting pressure constraints. This paper proposes a new hybrid evolutionary framework that consists of three distinct phases. The first phase applies CMA-ES, a robust adaptive meta-heuristic for continuous optimisation. This is followed by an upward-greedy search phase to remove pressure violations. Finally, a downward-greedy search phase is used to reduce oversized pipes. To assess the effectiveness of the hybrid method, it was applied to five well-known WDS case studies. The results reveal that the new framework outperforms CMA-ES by itself and other previously applied heuristics on most benchmarks, in terms of both optimisation speed and network cost.
Tasks
Published 2019-09-11
URL https://arxiv.org/abs/1909.04846v1
PDF https://arxiv.org/pdf/1909.04846v1.pdf
PWC https://paperswithcode.com/paper/covariance-matrix-adaptation-greedy-search
Repo https://github.com/a1708192/WDSOP
Framework none
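
A high-level sketch of the three-phase framework is below, using the `cma` package for phase 1. The hydraulic quantities (network_cost, min_pressure) are hypothetical placeholders for an EPANET-style simulation, and the greedy phases are compressed to their essence:

```python
import cma
import numpy as np

DIAMETERS = np.array([100.0, 150, 200, 250, 300, 350, 400])  # assumed options (mm)

def network_cost(d):                      # hypothetical stand-ins for a real
    return float((d ** 1.5).sum())        # hydraulic simulation of the network
def min_pressure(d):
    return float(30.0 - 500.0 / d.min())  # fake minimum node pressure (m)

def objective(x):
    d = DIAMETERS[np.clip(np.round(x), 0, len(DIAMETERS) - 1).astype(int)]
    violation = max(0.0, 20.0 - min_pressure(d))
    return network_cost(d) + 1e6 * violation      # penalised cost

# phase 1: CMA-ES over continuous diameter indices
es = cma.CMAEvolutionStrategy([3.0] * 8, 1.5)
es.optimize(objective, iterations=50)
idx = np.clip(np.round(es.result.xbest), 0, len(DIAMETERS) - 1).astype(int)

# phase 2 (upward greedy) would enlarge pipes until no pressure violations;
# phase 3 (downward greedy): shrink each pipe while pressure still holds
for i in range(len(idx)):
    while idx[i] > 0:
        trial = idx.copy(); trial[i] -= 1
        if min_pressure(DIAMETERS[trial]) >= 20.0:
            idx = trial
        else:
            break
print("final design:", DIAMETERS[idx], "cost:", network_cost(DIAMETERS[idx]))
```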

Characterizing and evaluating adversarial examples for Offline Handwritten Signature Verification

Title Characterizing and evaluating adversarial examples for Offline Handwritten Signature Verification
Authors Luiz G. Hafemann, Robert Sabourin, Luiz S. Oliveira
Abstract The phenomenon of adversarial examples is attracting increasing interest from the Machine Learning community, due to its significant impact on the security of Machine Learning systems. Adversarial examples are similar (in a perceptual notion of similarity) to samples from the data distribution, yet they “fool” a machine learning classifier. For computer vision applications, these are images with carefully crafted but almost imperceptible changes that are misclassified. In this work, we characterize this phenomenon under an existing taxonomy of threats to biometric systems, in particular identifying new attacks for Offline Handwritten Signature Verification systems. We conducted an extensive set of experiments on four widely used datasets: MCYT-75, CEDAR, GPDS-160 and the Brazilian PUC-PR, considering both a CNN-based system and a system using a handcrafted feature extractor (CLBP). We found that attacks that aim to get a genuine signature rejected are easy to generate, even in a limited-knowledge scenario where the attacker has access neither to the trained classifier nor to the signatures used for training. Attacks that get a forgery accepted are harder to produce, and often require a higher level of noise - in most cases no longer “imperceptible”, in contrast to previous findings in object recognition. We also evaluated the impact of two countermeasures on the success rate of the attacks and the amount of noise required for generating successful attacks.
Tasks Object Recognition
Published 2019-01-10
URL http://arxiv.org/abs/1901.03398v1
PDF http://arxiv.org/pdf/1901.03398v1.pdf
PWC https://paperswithcode.com/paper/characterizing-and-evaluating-adversarial
Repo https://github.com/luizgh/sigver
Framework pytorch
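
The easiest of the studied attack families to reproduce is the gradient-sign style attack on a score-based verifier: perturb a genuine signature in the direction that lowers its genuineness score. A short PyTorch sketch, assuming a hypothetical `verifier` network that outputs a genuineness logit:

```python
import torch

def fgsm_reject(verifier, x, eps=0.02):
    """x: (1, 1, H, W) genuine signature image in [0, 1]; returns a
    perturbed copy that lowers the verifier's genuineness score."""
    x = x.clone().requires_grad_(True)
    score = verifier(x)                 # higher = more likely genuine
    score.sum().backward()
    # step against the gradient of the score, then keep pixels valid
    return (x - eps * x.grad.sign()).clamp(0.0, 1.0).detach()
```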

Learning Visual Actions Using Multiple Verb-Only Labels

Title Learning Visual Actions Using Multiple Verb-Only Labels
Authors Michael Wray, Dima Damen
Abstract This work introduces verb-only representations for both recognition and retrieval of visual actions, in video. Current methods neglect legitimate semantic ambiguities between verbs, instead choosing unambiguous subsets of verbs along with objects to disambiguate the actions. We instead propose multiple verb-only labels, which we learn through hard or soft assignment as a regression. This enables learning a much larger vocabulary of verbs, including contextual overlaps of these verbs. We collect multi-verb annotations for three action video datasets and evaluate the verb-only labelling representations for action recognition and cross-modal retrieval (video-to-text and text-to-video). We demonstrate that multi-label verb-only representations outperform conventional single verb labels. We also explore other benefits of a multi-verb representation, including cross-dataset retrieval and verb-type (manner and result verb types) retrieval.
Tasks Cross-Modal Retrieval
Published 2019-07-25
URL https://arxiv.org/abs/1907.11117v2
PDF https://arxiv.org/pdf/1907.11117v2.pdf
PWC https://paperswithcode.com/paper/learning-visual-actions-using-multiple-verb
Repo https://github.com/mwray/Multi-Verb-Labels
Framework none
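
Learning soft assignments "as a regression" boils down to predicting a score per verb and fitting it to soft, annotator-derived targets rather than a single one-hot verb. A toy PyTorch sketch; the vocabulary size, features, and loss choice are illustrative assumptions:

```python
import torch
import torch.nn as nn

V = 90                                   # verb vocabulary size (assumed)
model = nn.Sequential(nn.Linear(2048, 512), nn.ReLU(), nn.Linear(512, V))
feats = torch.randn(32, 2048)            # pre-extracted video features
soft_y = torch.rand(32, V)               # soft relevance of each verb in [0, 1]
pred = torch.sigmoid(model(feats))       # independent score per verb
loss = nn.functional.mse_loss(pred, soft_y)  # regression to soft assignments
loss.backward()
```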

Musical Instrument Classification via Low-Dimensional Feature Vectors

Title Musical Instrument Classification via Low-Dimensional Feature Vectors
Authors Zishuo Zhao, Haoyun Wang
Abstract Music is a mysterious language that conveys feelings and thoughts via different tones and timbres. For a better understanding of timbre in music, we chose music data from 6 representative instruments, analysed their timbre features, and classified them. Instead of following the current trend of black-box classification with neural networks, our project is based on a combination of MFCC and LPC features, augmented with a 6-dimensional feature vector designed by ourselves from observation and experimentation. In our white-box model, we observed significant patterns of sound that distinguish different timbres, and discovered some connections between objective data and subjective senses. With a 32-dimensional feature vector in total and a naive all-pairs SVM, we achieved improved classification accuracy compared to a single tool. We also analysed music pieces downloaded from the Internet, found differing performance across instruments, explored the reasons, and suggested possible ways to improve the performance.
Tasks
Published 2019-09-16
URL https://arxiv.org/abs/1909.08444v1
PDF https://arxiv.org/pdf/1909.08444v1.pdf
PWC https://paperswithcode.com/paper/musical-instrument-classification-via-low
Repo https://github.com/wiku30/SST
Framework none
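
The pipeline maps directly onto standard tooling: MFCC summary statistics plus LPC coefficients as features, and an all-pairs SVM as the classifier (scikit-learn's SVC uses a one-vs-one, i.e. all-pairs, scheme for multi-class by default). A sketch under those assumptions; the paper's exact 32 dimensions, including its hand-designed 6-dimensional vector, are not reproduced here:

```python
import librosa
import numpy as np
from sklearn.svm import SVC

def features(path, sr=22050, order=12):
    """Mean MFCCs plus LPC coefficients for one audio file."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
    lpc = librosa.lpc(y, order=order)[1:]   # drop the leading 1.0
    return np.concatenate([mfcc, lpc])

# X = np.stack([features(p) for p in paths]); y = instrument labels
clf = SVC(kernel="rbf", C=10.0)   # one-vs-one (all-pairs) by default
# clf.fit(X, y)
```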

Arithmetic addition of two integers by deep image classification networks: experiments to quantify their autonomous reasoning ability

Title Arithmetic addition of two integers by deep image classification networks: experiments to quantify their autonomous reasoning ability
Authors Shuaicheng Liu, Zehao Zhang, Kai Song, Bing Zeng
Abstract The unprecedented performance achieved by deep convolutional neural networks for image classification is linked primarily to their ability to capture rich structural features at various layers within networks. Here we design a series of experiments, inspired by children’s learning of the arithmetic addition of two integers, to show that such deep networks can go beyond structural features and learn deeper knowledge. In our experiments, a set of images is constructed, each image containing an arithmetic addition $n+m$ in its central area, and several classification networks are then trained over a subset of images, using the sum as the label. Tests on the excluded images show that, as the image set grows, the networks learn the law of arithmetic addition well enough to build up a strong autonomous reasoning ability. For instance, networks trained on a small percentage of the images can classify a large majority of the remaining images correctly, and many arithmetic additions involving integers never seen during training can also be solved correctly by the trained networks.
Tasks Image Classification
Published 2019-12-10
URL https://arxiv.org/abs/1912.04518v1
PDF https://arxiv.org/pdf/1912.04518v1.pdf
PWC https://paperswithcode.com/paper/arithmetic-addition-of-two-integers-by-deep
Repo https://github.com/kaileysong/arithadd
Framework pytorch
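
The dataset construction is straightforward to mimic: render the string "n+m" in the centre of an image and label the image with the sum. A small Pillow sketch; the image size and the default bitmap font are assumptions:

```python
from PIL import Image, ImageDraw
import random

def addition_image(n, m, size=64):
    """Render 'n+m' centred on a black canvas; return (image, sum label)."""
    img = Image.new("L", (size, size), color=0)
    draw = ImageDraw.Draw(img)
    text = f"{n}+{m}"
    w, h = draw.textbbox((0, 0), text)[2:]   # rendered text extent
    draw.text(((size - w) / 2, (size - h) / 2), text, fill=255)
    return img, n + m

pairs = [(random.randint(0, 99), random.randint(0, 99)) for _ in range(8)]
dataset = [addition_image(n, m) for n, m in pairs]
```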