Paper Group AWR 87
Time-to-Event Prediction with Neural Networks and Cox Regression
Title | Time-to-Event Prediction with Neural Networks and Cox Regression |
Authors | Håvard Kvamme, Ørnulf Borgan, Ida Scheel |
Abstract | New methods for time-to-event prediction are proposed by extending the Cox proportional hazards model with neural networks. Building on methodology from nested case-control studies, we propose a loss function that scales well to large data sets, and enables fitting of both proportional and non-proportional extensions of the Cox model. Through simulation studies, the proposed loss function is verified to be a good approximation for the Cox partial log-likelihood. The proposed methodology is compared to existing methodologies on real-world data sets, and is found to be highly competitive, typically yielding the best performance in terms of Brier score and binomial log-likelihood. A python package for the proposed methods is available at https://github.com/havakv/pycox. |
Tasks | Survival Analysis, Time-to-Event Prediction |
Published | 2019-07-01 |
URL | https://arxiv.org/abs/1907.00825v2 |
PDF | https://arxiv.org/pdf/1907.00825v2.pdf |
PWC | https://paperswithcode.com/paper/time-to-event-prediction-with-neural-networks |
Repo | https://github.com/havakv/pycox |
Framework | pytorch |
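The abstract above measures its proposed loss against the Cox partial log-likelihood. As a point of reference, a minimal NumPy sketch of the standard negative partial log-likelihood (no tie handling) could look like the following. It is not the pycox API and not the paper's case-control loss; the function name and arguments are illustrative assumptions.

```python
import numpy as np


def neg_cox_partial_log_likelihood(risk_scores, durations, events):
    """Mean negative Cox partial log-likelihood (no special handling of ties).

    risk_scores: model outputs g(x_i), one per subject (hypothetical name).
    durations:   observed times T_i.
    events:      1 if the event was observed, 0 if the subject was censored.
    """
    g = np.asarray(risk_scores, dtype=float)
    t = np.asarray(durations, dtype=float)
    d = np.asarray(events, dtype=bool)
    # at_risk[i, j] is True when subject j is still at risk at subject i's time.
    at_risk = t[None, :] >= t[:, None]
    # Stable log of the sum of exp(g_j) over each risk set.
    g_max = g.max()
    log_risk = g_max + np.log((at_risk * np.exp(g - g_max)[None, :]).sum(axis=1))
    # Only subjects with an observed event contribute a term.
    return -np.sum((g - log_risk)[d]) / max(d.sum(), 1)
```

The full risk-set sum costs O(n^2) in this form, which is the scaling issue that motivates the sampled approximation the abstract describes.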
The FLoRes Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English
Title | The FLoRes Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English |
Authors | Francisco Guzmán, Peng-Jen Chen, Myle Ott, Juan Pino, Guillaume Lample, Philipp Koehn, Vishrav Chaudhary, Marc’Aurelio Ranzato |
Abstract | For machine translation, a vast majority of language pairs in the world are considered low-resource because they have little parallel data available. Besides the technical challenges of learning with limited supervision, it is difficult to evaluate methods trained on low-resource language pairs because of the lack of freely and publicly available benchmarks. In this work, we introduce the FLoRes evaluation datasets for Nepali-English and Sinhala-English, based on sentences translated from Wikipedia. Compared to English, these are languages with very different morphology and syntax, for which little out-of-domain parallel data is available and for which relatively large amounts of monolingual data are freely available. We describe our process to collect and cross-check the quality of translations, and we report baseline performance using several learning settings: fully supervised, weakly supervised, semi-supervised, and fully unsupervised. Our experiments demonstrate that current state-of-the-art methods perform rather poorly on this benchmark, posing a challenge to the research community working on low-resource MT. Data and code to reproduce our experiments are available at https://github.com/facebookresearch/flores. |
Tasks | Machine Translation |
Published | 2019-02-04 |
URL | https://arxiv.org/abs/1902.01382v3 |
PDF | https://arxiv.org/pdf/1902.01382v3.pdf |
PWC | https://paperswithcode.com/paper/two-new-evaluation-datasets-for-low-resource |
Repo | https://github.com/lknlp/lknlp.github.io |
Framework | none |
Semantic-Transferable Weakly-Supervised Endoscopic Lesions Segmentation
Title | Semantic-Transferable Weakly-Supervised Endoscopic Lesions Segmentation |
Authors | Jiahua Dong, Yang Cong, Gan Sun, Dongdong Hou |
Abstract | Weakly-supervised learning under image-level labels supervision has been widely applied to semantic segmentation of medical lesions regions. However, 1) most existing models rely on effective constraints to explore the internal representation of lesions, which only produces inaccurate and coarse lesions regions; 2) they ignore the strong probabilistic dependencies between target lesions dataset (e.g., enteroscopy images) and well-to-annotated source diseases dataset (e.g., gastroscope images). To better utilize these dependencies, we present a new semantic lesions representation transfer model for weakly-supervised endoscopic lesions segmentation, which can exploit useful knowledge from relevant fully-labeled diseases segmentation task to enhance the performance of target weakly-labeled lesions segmentation task. More specifically, a pseudo label generator is proposed to leverage seed information to generate highly-confident pseudo pixel labels by incorporating class balance and super-pixel spatial prior. It can iteratively include more hard-to-transfer samples from weakly-labeled target dataset into training set. Afterwards, dynamically searched feature centroids for same class among different datasets are aligned by accumulating previously-learned features. Meanwhile, adversarial learning is also employed in this paper, to narrow the gap between the lesions among different datasets in output space. Finally, we build a new medical endoscopic dataset with 3659 images collected from more than 1100 volunteers. Extensive experiments on our collected dataset and several benchmark datasets validate the effectiveness of our model. |
Tasks | Semantic Segmentation |
Published | 2019-08-21 |
URL | https://arxiv.org/abs/1908.07669v2 |
PDF | https://arxiv.org/pdf/1908.07669v2.pdf |
PWC | https://paperswithcode.com/paper/semantic-transferable-weakly-supervised |
Repo | https://github.com/JiahuaDong/ICCV2019Publication-Semantic-Transferable-Weakly-Supervised-Endoscopic-Lesions-Segmentation |
Framework | pytorch |
Theoretically Principled Trade-off between Robustness and Accuracy
Title | Theoretically Principled Trade-off between Robustness and Accuracy |
Authors | Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P. Xing, Laurent El Ghaoui, Michael I. Jordan |
Abstract | We identify a trade-off between robustness and accuracy that serves as a guiding principle in the design of defenses against adversarial examples. Although this problem has been widely studied empirically, much remains unknown concerning the theory underlying this trade-off. In this work, we decompose the prediction error for adversarial examples (robust error) as the sum of the natural (classification) error and boundary error, and provide a differentiable upper bound using the theory of classification-calibrated loss, which is shown to be the tightest possible upper bound uniform over all probability distributions and measurable predictors. Inspired by our theoretical analysis, we also design a new defense method, TRADES, to trade adversarial robustness off against accuracy. Our proposed algorithm performs well experimentally in real-world datasets. The methodology is the foundation of our entry to the NeurIPS 2018 Adversarial Vision Challenge in which we won the 1st place out of ~2,000 submissions, surpassing the runner-up approach by $11.41\%$ in terms of mean $\ell_2$ perturbation distance. |
Tasks | Adversarial Attack, Adversarial Defense |
Published | 2019-01-24 |
URL | https://arxiv.org/abs/1901.08573v3 |
PDF | https://arxiv.org/pdf/1901.08573v3.pdf |
PWC | https://paperswithcode.com/paper/theoretically-principled-trade-off-between |
Repo | https://github.com/goldblum/AdversariallyRobustDistillation |
Framework | pytorch |
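The decomposition stated in the abstract above (robust error as the sum of natural error and boundary error) can be checked numerically on a toy case. The sketch below assumes a one-dimensional threshold classifier with labels in {-1, +1} and perturbations bounded by eps; the function name and setup are illustrative and are not taken from the TRADES code.

```python
import numpy as np


def error_decomposition(x, y, threshold, eps):
    """Return (natural, boundary, robust) error for f(x) = sign(x - threshold)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    score = x - threshold
    correct = score * y > 0
    # Points within eps of the decision boundary can be pushed across it.
    near_boundary = np.abs(score) <= eps
    natural = np.mean(~correct)
    boundary = np.mean(correct & near_boundary)
    # Worst-case perturbation moves each point by eps toward the wrong side.
    worst_score = score - eps * y
    robust = np.mean(worst_score * y <= 0)
    return natural, boundary, robust
```

In this toy setting the robust error equals the sum of the other two terms, because a point is robustly misclassified exactly when it is already wrong or is correct but lies within eps of the boundary.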
Cascade R-CNN: High Quality Object Detection and Instance Segmentation
Title | Cascade R-CNN: High Quality Object Detection and Instance Segmentation |
Authors | Zhaowei Cai, Nuno Vasconcelos |
Abstract | In object detection, the intersection over union (IoU) threshold is frequently used to define positives/negatives. The threshold used to train a detector defines its quality. While the commonly used threshold of 0.5 leads to noisy (low-quality) detections, detection performance frequently degrades for larger thresholds. This paradox of high-quality detection has two causes: 1) overfitting, due to vanishing positive samples for large thresholds, and 2) inference-time quality mismatch between detector and test hypotheses. A multi-stage object detection architecture, the Cascade R-CNN, composed of a sequence of detectors trained with increasing IoU thresholds, is proposed to address these problems. The detectors are trained sequentially, using the output of a detector as training set for the next. This resampling progressively improves hypotheses quality, guaranteeing a positive training set of equivalent size for all detectors and minimizing overfitting. The same cascade is applied at inference, to eliminate quality mismatches between hypotheses and detectors. An implementation of the Cascade R-CNN without bells or whistles achieves state-of-the-art performance on the COCO dataset, and significantly improves high-quality detection on generic and specific object detection datasets, including VOC, KITTI, CityPerson, and WiderFace. Finally, the Cascade R-CNN is generalized to instance segmentation, with nontrivial improvements over the Mask R-CNN. To facilitate future research, two implementations are made available at https://github.com/zhaoweicai/cascade-rcnn (Caffe) and https://github.com/zhaoweicai/Detectron-Cascade-RCNN (Detectron). |
Tasks | Instance Segmentation, Object Detection, Semantic Segmentation |
Published | 2019-06-24 |
URL | https://arxiv.org/abs/1906.09756v1 |
PDF | https://arxiv.org/pdf/1906.09756v1.pdf |
PWC | https://paperswithcode.com/paper/cascade-r-cnn-high-quality-object-detection |
Repo | https://github.com/zhaoweicai/cascade-rcnn |
Framework | tf |
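The abstract above turns on how the IoU threshold defines positives, and on positives vanishing as the threshold grows. A small pure-Python sketch makes this concrete. Boxes are assumed to be (x1, y1, x2, y2) tuples, and the thresholds 0.5, 0.6 and 0.7 are illustrative values, not settings quoted from the abstract.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def count_positives(proposals, ground_truth, thresholds=(0.5, 0.6, 0.7)):
    """Count proposals labelled positive at each IoU threshold."""
    return {
        u: sum(iou(p, ground_truth) >= u for p in proposals)
        for u in thresholds
    }
```

With a fixed proposal set, the positive count can only shrink as the threshold rises, which is the overfitting risk the cascade's resampling is meant to counter.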
Adversarial Generation of Time-Frequency Features with application in audio synthesis
Title | Adversarial Generation of Time-Frequency Features with application in audio synthesis |
Authors | Andrés Marafioti, Nicki Holighaus, Nathanaël Perraudin, Piotr Majdak |
Abstract | Time-frequency (TF) representations provide powerful and intuitive features for the analysis of time series such as audio. But still, generative modeling of audio in the TF domain is a subtle matter. Consequently, neural audio synthesis widely relies on directly modeling the waveform and previous attempts at unconditionally synthesizing audio from neurally generated invertible TF features still struggle to produce audio at satisfying quality. In this article, focusing on the short-time Fourier transform, we discuss the challenges that arise in audio synthesis based on generated invertible TF features and how to overcome them. We demonstrate the potential of deliberate generative TF modeling by training a generative adversarial network (GAN) on short-time Fourier features. We show that by applying our guidelines, our TF-based network was able to outperform a state-of-the-art GAN generating waveforms directly, despite the similar architecture in the two networks. |
Tasks | Time Series |
Published | 2019-02-11 |
URL | https://arxiv.org/abs/1902.04072v2 |
PDF | https://arxiv.org/pdf/1902.04072v2.pdf |
PWC | https://paperswithcode.com/paper/adversarial-generation-of-time-frequency |
Repo | https://github.com/MathewStylianidis/TiFGAN-tensorflow2 |
Framework | tf |
CondConv: Conditionally Parameterized Convolutions for Efficient Inference
Title | CondConv: Conditionally Parameterized Convolutions for Efficient Inference |
Authors | Brandon Yang, Gabriel Bender, Quoc V. Le, Jiquan Ngiam |
Abstract | Convolutional layers are one of the basic building blocks of modern deep neural networks. One fundamental assumption is that convolutional kernels should be shared for all examples in a dataset. We propose conditionally parameterized convolutions (CondConv), which learn specialized convolutional kernels for each example. Replacing normal convolutions with CondConv enables us to increase the size and capacity of a network, while maintaining efficient inference. We demonstrate that scaling networks with CondConv improves the performance and inference cost trade-off of several existing convolutional neural network architectures on both classification and detection tasks. On ImageNet classification, our CondConv approach applied to EfficientNet-B0 achieves state-of-the-art performance of 78.3% accuracy with only 413M multiply-adds. Code and checkpoints for the CondConv Tensorflow layer and CondConv-EfficientNet models are available at: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/condconv. |
Tasks | Image Classification, Object Detection |
Published | 2019-04-10 |
URL | https://arxiv.org/abs/1904.04971v2 |
PDF | https://arxiv.org/pdf/1904.04971v2.pdf |
PWC | https://paperswithcode.com/paper/soft-conditional-computation |
Repo | https://github.com/hangg7/deformable-kernels |
Framework | pytorch |
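One way to picture the per-example kernels mentioned in the abstract above is a weighted mix of several expert kernels, with the weights computed from the input itself. The NumPy sketch below uses a 1x1 convolution and a sigmoid routing over the globally averaged input. These choices, and all names, are assumptions made for illustration, not the released TensorFlow layer.

```python
import numpy as np


def condconv_1x1(x, experts, routing):
    """Apply an example-specific 1x1 convolution.

    x:       feature map of shape (H, W, C_in).
    experts: expert kernels of shape (n, C_in, C_out).
    routing: routing weights of shape (C_in, n) (hypothetical parameter).
    """
    # Summarise the example, then map the summary to one weight per expert.
    pooled = x.mean(axis=(0, 1))
    alpha = 1.0 / (1.0 + np.exp(-(pooled @ routing)))
    # Mix the expert kernels once, then run a single convolution.
    kernel = np.tensordot(alpha, experts, axes=1)
    return x @ kernel, alpha
```

Because convolution is linear in the kernel, mixing the kernels first gives the same output as running every expert and mixing the results, but it needs only one convolution per example.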
Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries
Title | Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries |
Authors | Fuwen Tan, Paola Cascante-Bonilla, Xiaoxiao Guo, Hui Wu, Song Feng, Vicente Ordonez |
Abstract | This paper explores the task of interactive image retrieval using natural language queries, where a user progressively provides input queries to refine a set of retrieval results. Moreover, our work explores this problem in the context of complex image scenes containing multiple objects. We propose Drill-down, an effective framework for encoding multiple queries with an efficient compact state representation that significantly extends current methods for single-round image retrieval. We show that using multiple rounds of natural language queries as input can be surprisingly effective to find arbitrarily specific images of complex scenes. Furthermore, we find that existing image datasets with textual captions can provide a surprisingly effective form of weak supervision for this task. We compare our method with existing sequential encoding and embedding networks, demonstrating superior performance on two proposed benchmarks: automatic image retrieval on a simulated scenario that uses region captions as queries, and interactive image retrieval using real queries from human evaluators. |
Tasks | Image Retrieval |
Published | 2019-11-10 |
URL | https://arxiv.org/abs/1911.03826v1 |
PDF | https://arxiv.org/pdf/1911.03826v1.pdf |
PWC | https://paperswithcode.com/paper/drill-down-interactive-retrieval-of-complex |
Repo | https://github.com/uvavision/DrillDown |
Framework | pytorch |
Classification via local manifold approximation
Title | Classification via local manifold approximation |
Authors | Didong Li, David B Dunson |
Abstract | Classifiers label data as belonging to one of a set of groups based on input features. It is challenging to obtain accurate classification performance when the feature distributions in the different classes are complex, with nonlinear, overlapping and intersecting supports. This is particularly true when training data are limited. To address this problem, this article proposes a new type of classifier based on obtaining a local approximation to the support of the data within each class in a neighborhood of the feature to be classified, and assigning the feature to the class having the closest support. This general algorithm is referred to as LOcal Manifold Approximation (LOMA) classification. As a simple and theoretically supported special case having excellent performance in a broad variety of examples, we use spheres for local approximation, obtaining a SPherical Approximation (SPA) classifier. We illustrate substantial gains for SPA over competitors on a variety of challenging simulated and real data examples. |
Tasks | |
Published | 2019-03-03 |
URL | http://arxiv.org/abs/1903.00985v1 |
PDF | http://arxiv.org/pdf/1903.00985v1.pdf |
PWC | https://paperswithcode.com/paper/classification-via-local-manifold |
Repo | https://github.com/david-dunson/SPAclassifier |
Framework | none |
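The abstract above describes fitting a local sphere to each class near the query point and assigning the point to the class whose sphere is closest. A rough sketch follows. It swaps in a plain algebraic least-squares sphere fit, which may differ from the estimator in the paper, and the function names and the neighbourhood size k are illustrative assumptions.

```python
import numpy as np


def fit_sphere(points):
    """Least-squares sphere fit using ||p||^2 = 2 c.p + (r^2 - ||c||^2)."""
    A = np.hstack([2.0 * points, np.ones((len(points), 1))])
    b = (points ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = sol[:-1]
    radius = np.sqrt(max(sol[-1] + center @ center, 0.0))
    return center, radius


def spa_predict(x, X, y, k=5):
    """Assign x to the class whose locally fitted sphere is nearest."""
    x = np.asarray(x, dtype=float)
    best_label, best_dist = None, np.inf
    for label in np.unique(y):
        pts = X[y == label]
        # Use only the k points of this class closest to the query.
        nearest = pts[np.argsort(np.linalg.norm(pts - x, axis=1))[:k]]
        center, radius = fit_sphere(nearest)
        dist = abs(np.linalg.norm(x - center) - radius)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

For two classes lying on concentric circles, a query between them is assigned to the circle it is nearer to, even though neither class is linearly separable from the other.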
LiDAR Sensor modeling and Data augmentation with GANs for Autonomous driving
Title | LiDAR Sensor modeling and Data augmentation with GANs for Autonomous driving |
Authors | Ahmad El Sallab, Ibrahim Sobh, Mohamed Zahran, Nader Essam |
Abstract | In the autonomous driving domain, data collection and annotation from real vehicles are expensive and sometimes unsafe. Simulators are often used for data augmentation, which requires realistic sensor models that are hard to formulate and model in closed form. Instead, sensor models can be learned from real data. The main challenge is the absence of a paired data set, which makes traditional supervised learning techniques unsuitable. In this work, we formulate the problem as image translation from unpaired data and employ CycleGANs to solve the sensor modeling problem for LiDAR, producing realistic LiDAR from simulated LiDAR (sim2real). Further, we generate high-resolution, realistic LiDAR from lower-resolution LiDAR (real2real). The LiDAR 3D point cloud is processed in Bird-eye View and Polar 2D representations. The experimental results show a high potential of the proposed approach. |
Tasks | Autonomous Driving, Data Augmentation, Image-to-Image Translation, Point Cloud Generation, Sensor Modeling, Unsupervised Image-To-Image Translation |
Published | 2019-05-17 |
URL | https://arxiv.org/abs/1905.07290v1 |
PDF | https://arxiv.org/pdf/1905.07290v1.pdf |
PWC | https://paperswithcode.com/paper/lidar-sensor-modeling-and-data-augmentation |
Repo | https://github.com/ahmadelsallab/lidargan |
Framework | none |
Covariance Matrix Adaptation Greedy Search Applied to Water Distribution System Optimization
Title | Covariance Matrix Adaptation Greedy Search Applied to Water Distribution System Optimization |
Authors | Mehdi Neshat, Bradley Alexander, Angus Simpson |
Abstract | Water distribution system (WDS) design is a challenging optimisation problem with a high number of search dimensions and constraints. Evolutionary Algorithms (EAs) have therefore been widely applied to optimise WDSs, minimising cost whilst meeting pressure constraints. This paper proposes a new hybrid evolutionary framework that consists of three distinct phases. The first phase applies CMA-ES, a robust adaptive meta-heuristic for continuous optimisation. This is followed by an upward-greedy search phase to remove pressure violations. Finally, a downward-greedy search phase is used to reduce oversized pipes. To assess the effectiveness of the hybrid method, it was applied to five well-known WDS case studies. The results reveal that the new framework outperforms CMA-ES by itself and other previously applied heuristics on most benchmarks in terms of both optimisation speed and network cost. |
Tasks | |
Published | 2019-09-11 |
URL | https://arxiv.org/abs/1909.04846v1 |
PDF | https://arxiv.org/pdf/1909.04846v1.pdf |
PWC | https://paperswithcode.com/paper/covariance-matrix-adaptation-greedy-search |
Repo | https://github.com/a1708192/WDSOP |
Framework | none |
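The downward greedy phase described in the abstract above (shrinking oversized pipes while staying feasible) might be sketched as below. The abstract does not give the visiting order or the stopping rule, so both are assumptions here. The `is_feasible` callback is a hypothetical stand-in for a hydraulic simulation that checks the pressure constraints.

```python
def downward_greedy(diameters, sizes, is_feasible):
    """Repeatedly step each pipe down one size while the design stays feasible.

    diameters:   current diameter of each pipe (each value must be in sizes).
    sizes:       available diameters, sorted in ascending order.
    is_feasible: callable taking a list of diameters, returning True or False
                 (hypothetical stand-in for a pressure-constraint check).
    """
    current = list(diameters)
    improved = True
    while improved:
        improved = False
        for i in range(len(current)):
            idx = sizes.index(current[i])
            if idx == 0:
                continue  # already at the smallest available diameter
            trial = current[:i] + [sizes[idx - 1]] + current[i + 1:]
            if is_feasible(trial):
                current = trial
                improved = True
    return current
```

The loop stops once a full pass over the pipes yields no feasible reduction, so the result is a local optimum with respect to single-pipe, single-step reductions only.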
Characterizing and evaluating adversarial examples for Offline Handwritten Signature Verification
Title | Characterizing and evaluating adversarial examples for Offline Handwritten Signature Verification |
Authors | Luiz G. Hafemann, Robert Sabourin, Luiz S. Oliveira |
Abstract | The phenomenon of Adversarial Examples is attracting increasing interest from the Machine Learning community, due to its significant impact on the security of Machine Learning systems. Adversarial examples are similar (from a perceptual notion of similarity) to samples from the data distribution, yet “fool” a machine learning classifier. For computer vision applications, these are images with carefully crafted but almost imperceptible changes that are misclassified. In this work, we characterize this phenomenon under an existing taxonomy of threats to biometric systems, in particular identifying new attacks for Offline Handwritten Signature Verification systems. We conducted an extensive set of experiments on four widely used datasets: MCYT-75, CEDAR, GPDS-160 and the Brazilian PUC-PR, considering both a CNN-based system and a system using a handcrafted feature extractor (CLBP). We found that attacks that aim to get a genuine signature rejected are easy to generate, even in a limited knowledge scenario, where the attacker has access to neither the trained classifier nor the signatures used for training. Attacks that get a forgery accepted are harder to produce, and often require a higher level of noise, which in most cases is no longer “imperceptible” as in previous findings in object recognition. We also evaluated the impact of two countermeasures on the success rate of the attacks and the amount of noise required for generating successful attacks. |
Tasks | Object Recognition |
Published | 2019-01-10 |
URL | http://arxiv.org/abs/1901.03398v1 |
PDF | http://arxiv.org/pdf/1901.03398v1.pdf |
PWC | https://paperswithcode.com/paper/characterizing-and-evaluating-adversarial |
Repo | https://github.com/luizgh/sigver |
Framework | pytorch |
Learning Visual Actions Using Multiple Verb-Only Labels
Title | Learning Visual Actions Using Multiple Verb-Only Labels |
Authors | Michael Wray, Dima Damen |
Abstract | This work introduces verb-only representations for both recognition and retrieval of visual actions in video. Current methods neglect legitimate semantic ambiguities between verbs, instead choosing unambiguous subsets of verbs along with objects to disambiguate the actions. We instead propose multiple verb-only labels, which we learn through hard or soft assignment as a regression. This enables learning a much larger vocabulary of verbs, including contextual overlaps of these verbs. We collect multi-verb annotations for three action video datasets and evaluate the verb-only labelling representations for action recognition and cross-modal retrieval (video-to-text and text-to-video). We demonstrate that multi-label verb-only representations outperform conventional single verb labels. We also explore other benefits of a multi-verb representation including cross-dataset retrieval and verb type (manner and result verb types) retrieval. |
Tasks | Cross-Modal Retrieval |
Published | 2019-07-25 |
URL | https://arxiv.org/abs/1907.11117v2 |
PDF | https://arxiv.org/pdf/1907.11117v2.pdf |
PWC | https://paperswithcode.com/paper/learning-visual-actions-using-multiple-verb |
Repo | https://github.com/mwray/Multi-Verb-Labels |
Framework | none |
Musical Instrument Classification via Low-Dimensional Feature Vectors
Title | Musical Instrument Classification via Low-Dimensional Feature Vectors |
Authors | Zishuo Zhao, Haoyun Wang |
Abstract | Music is a mysterious language that conveys feeling and thoughts via different tones and timbre. For better understanding of timbre in music, we chose music data of 6 representative instruments, analysed their timbre features and classified them. Instead of following the current trend of neural networks for black-box classification, our project is based on a combination of MFCC and LPC, augmented with a 6-dimensional feature vector that we designed from observation and experimentation. In our white-box model, we observed significant patterns of sound that distinguish different timbres, and discovered some connection between objective data and subjective senses. With a 32-dimensional feature vector in total and a naive all-pairs SVM, we achieved improved classification accuracy compared to a single tool. We also attempted to analyze music pieces downloaded from the Internet, found that performance differed across instruments, explored the reasons and suggested possible ways to improve the performance. |
Tasks | |
Published | 2019-09-16 |
URL | https://arxiv.org/abs/1909.08444v1 |
PDF | https://arxiv.org/pdf/1909.08444v1.pdf |
PWC | https://paperswithcode.com/paper/musical-instrument-classification-via-low |
Repo | https://github.com/wiku30/SST |
Framework | none |
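LPC is one of the two feature families named in the abstract above. As a rough illustration of what such features are, the sketch below estimates linear prediction coefficients with the autocorrelation method by solving the normal equations directly. The function name and the choice of method are assumptions and are not taken from the linked repository.

```python
import numpy as np


def lpc_coefficients(signal, order):
    """Estimate a_1..a_p such that x[n] is approximated by sum_k a_k * x[n - k]."""
    x = np.asarray(signal, dtype=float)
    # Autocorrelation at lags 0..order.
    r = np.array([x[: len(x) - k] @ x[k:] for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:], where R[i, j] = r[|i - j|].
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    return np.linalg.solve(r[lags], r[1:])
```

For a decaying exponential with ratio 0.9, the single order-1 coefficient comes out very close to 0.9. A feature vector for a classifier would concatenate such coefficients with other descriptors.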
Arithmetic addition of two integers by deep image classification networks: experiments to quantify their autonomous reasoning ability
Title | Arithmetic addition of two integers by deep image classification networks: experiments to quantify their autonomous reasoning ability |
Authors | Shuaicheng Liu, Zehao Zhang, Kai Song, Bing Zeng |
Abstract | The unprecedented performance achieved by deep convolutional neural networks for image classification is linked primarily to their ability to capture rich structural features at various layers within networks. Here we design a series of experiments, inspired by children’s learning of the arithmetic addition of two integers, to showcase that such deep networks can go beyond the structural features to learn deeper knowledge. In our experiments, a set of images is constructed, each image containing an arithmetic addition $n+m$ in its central area, and several classification networks are then trained over a subset of images, using the sum as the label. Tests on the excluded images show that, as the image set gets larger, the networks learn the law of arithmetic addition well and so build up a strong autonomous reasoning ability. For instance, networks trained over a small percentage of images can classify a large majority of the remaining images correctly, and many arithmetic additions involving integers that were never seen during training can also be solved correctly by the trained networks. |
Tasks | Image Classification |
Published | 2019-12-10 |
URL | https://arxiv.org/abs/1912.04518v1 |
PDF | https://arxiv.org/pdf/1912.04518v1.pdf |
PWC | https://paperswithcode.com/paper/arithmetic-addition-of-two-integers-by-deep |
Repo | https://github.com/kaileysong/arithadd |
Framework | pytorch |