Paper Group ANR 1114
DeepISP: Towards Learning an End-to-End Image Processing Pipeline. Constrained speaker diarization of TV series based on visual patterns. Weakly Supervised Active Learning with Cluster Annotation. Audiovisual speaker diarization of TV series. Second-order Democratic Aggregation. Class-Aware Adversarial Lung Nodule Synthesis in CT Images. Learning L …
DeepISP: Towards Learning an End-to-End Image Processing Pipeline
Title | DeepISP: Towards Learning an End-to-End Image Processing Pipeline |
Authors | Eli Schwartz, Raja Giryes, Alex M. Bronstein |
Abstract | We present DeepISP, a full end-to-end deep neural model of the camera image signal processing (ISP) pipeline. Our model learns a mapping from the raw low-light mosaiced image to the final visually compelling image and encompasses low-level tasks such as demosaicing and denoising as well as higher-level tasks such as color correction and image adjustment. The training and evaluation of the pipeline were performed on a dedicated dataset containing pairs of low-light and well-lit images captured by a Samsung S7 smartphone camera in both raw and processed JPEG formats. The proposed solution achieves state-of-the-art performance in objective evaluation of PSNR on the subtask of joint denoising and demosaicing. For the full end-to-end pipeline, it achieves better visual quality than the manufacturer's ISP, both in a subjective human assessment and when rated by a deep model trained to assess image quality. |
Tasks | Demosaicking, Denoising |
Published | 2018-01-20 |
URL | http://arxiv.org/abs/1801.06724v2 |
http://arxiv.org/pdf/1801.06724v2.pdf | |
PWC | https://paperswithcode.com/paper/deepisp-towards-learning-an-end-to-end-image |
Repo | |
Framework | |
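The low-level subtask DeepISP learns end-to-end is classically handled by interpolation. As a point of reference (not the paper's network, which is a learned CNN), here is a minimal bilinear demosaicing sketch for an RGGB Bayer mosaic using normalized convolution, assuming raw values in [0, 1]:

```python
import numpy as np

def conv2_same(a, k):
    """3x3 'same' convolution with zero padding, written out as shifts."""
    p = np.pad(a, 1)
    h, w = a.shape
    out = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            out += k[i, j] * p[i:i + h, j:j + w]
    return out

def demosaic_bilinear(raw):
    """Bilinearly demosaic an RGGB Bayer mosaic (H, W) into (H, W, 3) RGB.
    Each channel is interpolated only from the sites where it was sampled,
    by dividing a convolved image by the convolved sampling mask."""
    h, w = raw.shape
    masks = np.zeros((h, w, 3), dtype=bool)
    masks[0::2, 0::2, 0] = True   # red sites
    masks[0::2, 1::2, 1] = True   # green sites on red rows
    masks[1::2, 0::2, 1] = True   # green sites on blue rows
    masks[1::2, 1::2, 2] = True   # blue sites
    k_g = np.array([[0., .25, 0.], [.25, 1., .25], [0., .25, 0.]])
    k_rb = np.array([[.25, .5, .25], [.5, 1., .5], [.25, .5, .25]])
    rgb = np.zeros((h, w, 3))
    for c, k in ((0, k_rb), (1, k_g), (2, k_rb)):
        sampled = np.where(masks[..., c], raw, 0.0)
        rgb[..., c] = conv2_same(sampled, k) / conv2_same(masks[..., c].astype(float), k)
    return rgb
```

The normalized-convolution form keeps sampled pixels exactly and handles image borders without special cases; a learned pipeline such as DeepISP replaces this fixed kernel with trained filters and folds in denoising and color correction.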
Constrained speaker diarization of TV series based on visual patterns
Title | Constrained speaker diarization of TV series based on visual patterns |
Authors | Xavier Bost, Georges Linares |
Abstract | Speaker diarization, usually denoted as the "who spoke when" task, turns out to be particularly challenging when applied to fictional films, where many characters talk in various acoustic conditions (background music, sound effects…). Despite this acoustic variability, such movies exhibit specific visual patterns in the dialogue scenes. In this paper, we introduce a two-step method to achieve speaker diarization in TV series: speaker diarization is first performed locally in the scenes detected as dialogues; the hypothesized local speakers are then merged in a second agglomerative clustering process, with the constraint that speakers locally hypothesized to be distinct must not be assigned to the same cluster. The performance of our approach is compared to that obtained by standard speaker diarization tools applied to the same data. |
Tasks | Speaker Diarization |
Published | 2018-12-18 |
URL | http://arxiv.org/abs/1812.07209v2 |
http://arxiv.org/pdf/1812.07209v2.pdf | |
PWC | https://paperswithcode.com/paper/constrained-speaker-diarization-of-tv-series |
Repo | |
Framework | |
Weakly Supervised Active Learning with Cluster Annotation
Title | Weakly Supervised Active Learning with Cluster Annotation |
Authors | Fábio Perez, Rémi Lebret, Karl Aberer |
Abstract | In this work, we introduce a novel framework that employs cluster annotation to boost active learning by reducing the number of human interactions required to train deep neural networks. Instead of annotating single samples individually, humans can also label clusters, producing a higher number of annotated samples at the cost of a small label error. Our experiments show that the proposed framework requires 82% and 87% fewer human interactions for the CIFAR-10 and EuroSAT datasets, respectively, when compared with fully-supervised training, while maintaining similar performance on the test set. |
Tasks | Active Learning |
Published | 2018-12-31 |
URL | http://arxiv.org/abs/1812.11780v2 |
http://arxiv.org/pdf/1812.11780v2.pdf | |
PWC | https://paperswithcode.com/paper/weakly-supervised-active-learning-with |
Repo | |
Framework | |
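The trade-off in the abstract (many labels at the cost of a small label error) can be sketched as a label-propagation step. This is an illustrative reading, not the paper's exact protocol; the cluster assignments would come from a clustering of learned features:

```python
import numpy as np

def propagate_cluster_labels(assignments, oracle, queries_per_cluster=3, seed=0):
    """For each cluster, query the human oracle on a few member samples and
    propagate the majority label to the whole cluster. Returns the propagated
    labels and the number of human interactions actually spent."""
    rng = np.random.default_rng(seed)
    assignments = np.asarray(assignments)
    labels = np.empty(len(assignments), dtype=int)
    n_interactions = 0
    for c in np.unique(assignments):
        members = np.flatnonzero(assignments == c)
        queried = rng.choice(members,
                             size=min(queries_per_cluster, len(members)),
                             replace=False)
        n_interactions += len(queried)
        votes = np.array([oracle(i) for i in queried])
        labels[members] = np.bincount(votes).argmax()   # majority vote
    return labels, n_interactions
```

With pure clusters, a handful of queries labels every member; impure clusters introduce the small label error the abstract mentions, which the experiments show the downstream network tolerates.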
Audiovisual speaker diarization of TV series
Title | Audiovisual speaker diarization of TV series |
Authors | Xavier Bost, Georges Linarès, Serigne Gueye |
Abstract | Speaker diarization may be difficult to achieve when applied to narrative films, where speakers usually talk in adverse acoustic conditions: background music, sound effects, and wide variations in intonation may hide inter-speaker variability and make audio-based speaker diarization approaches error-prone. On the other hand, such fictional movies exhibit strong regularities at the image level, particularly within dialogue scenes. In this paper, we propose to perform speaker diarization within dialogue scenes of TV series by combining the audio and video modalities: speaker diarization is first performed using each modality, the two resulting partitions of the instance set are then optimally matched, before the remaining instances, corresponding to cases of disagreement between both modalities, are finally processed. The results obtained by applying such a multi-modal approach to fictional films turn out to outperform those obtained by relying on a single modality. |
Tasks | Speaker Diarization |
Published | 2018-12-18 |
URL | http://arxiv.org/abs/1812.07205v2 |
http://arxiv.org/pdf/1812.07205v2.pdf | |
PWC | https://paperswithcode.com/paper/audiovisual-speaker-diarization-of-tv-series |
Repo | |
Framework | |
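The "optimally matched" step above is a cluster-alignment problem between two partitions of the same instances. A small sketch (hypothetical names; brute force over permutations rather than the Hungarian algorithm, which is fine for the handful of speakers in a dialogue scene):

```python
from itertools import permutations
import numpy as np

def match_partitions(audio_part, video_part, n_speakers):
    """Align the clusters of two partitions of the same speech instances by
    maximizing total overlap, and report the instances on which the two
    modalities agree; the remaining instances are the disagreements that a
    later stage must resolve."""
    overlap = np.zeros((n_speakers, n_speakers), dtype=int)
    for a, v in zip(audio_part, video_part):
        overlap[a, v] += 1
    mapping = max(permutations(range(n_speakers)),
                  key=lambda p: sum(overlap[i, p[i]] for i in range(n_speakers)))
    agreed = [i for i, (a, v) in enumerate(zip(audio_part, video_part))
              if mapping[a] == v]
    return dict(enumerate(mapping)), agreed
```

Instances in `agreed` can be labeled confidently from either modality; the abstract's final step processes only the leftover disagreements.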
Second-order Democratic Aggregation
Title | Second-order Democratic Aggregation |
Authors | Tsung-Yu Lin, Subhransu Maji, Piotr Koniusz |
Abstract | Aggregated second-order features extracted from deep convolutional networks have been shown to be effective for texture generation, fine-grained recognition, material classification, and scene understanding. In this paper, we study a class of orderless aggregation functions designed to minimize interference or equalize contributions in the context of second-order features. We show that they can be computed just as efficiently as their first-order counterparts and that they have favorable properties over aggregation by summation. Another line of work has shown that matrix power normalization after aggregation can significantly improve the generalization of second-order representations. We show that matrix power normalization implicitly equalizes contributions during aggregation, thus establishing a connection between matrix normalization techniques and prior work on minimizing interference. Based on this analysis, we present γ-democratic aggregators that interpolate between sum (γ=1) and democratic pooling (γ=0), outperforming both on several classification tasks. Moreover, unlike power normalization, γ-democratic aggregation can be computed in a low-dimensional space by sketching, which allows the use of very high-dimensional second-order features. This results in state-of-the-art performance on several datasets. |
Tasks | Material Classification, Scene Understanding, Texture Synthesis |
Published | 2018-08-22 |
URL | http://arxiv.org/abs/1808.07503v1 |
http://arxiv.org/pdf/1808.07503v1.pdf | |
PWC | https://paperswithcode.com/paper/second-order-democratic-aggregation |
Repo | |
Framework | |
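The γ-interpolation described in the abstract can be sketched with a Sinkhorn-style multiplicative iteration. This is a reading of the abstract, not the paper's exact algorithm: it assumes nonnegative features (e.g. post-ReLU) and takes each feature's target contribution to be its sum-pooling contribution raised to the power γ, so γ=1 leaves plain sum pooling and γ=0 equalizes all contributions:

```python
import numpy as np

def gamma_democratic_weights(X, gamma, n_iter=100, eps=1e-12):
    """Find weights alpha so that each feature's contribution
    alpha_i * (K alpha)_i matches its sum-pooling contribution (K 1)_i
    raised to gamma, via a damped multiplicative (Sinkhorn-style) update."""
    K = X @ X.T
    target = np.maximum(K.sum(axis=1), eps) ** gamma
    alpha = np.ones(len(X))
    for _ in range(n_iter):
        contrib = np.maximum(alpha * (K @ alpha), eps)
        alpha *= np.sqrt(target / contrib)   # sqrt damping for stability
    return alpha

def gamma_democratic_pool(X, gamma):
    """Aggregate the feature set with gamma-democratic weights."""
    return gamma_democratic_weights(X, gamma) @ X
```

At γ=1 the all-ones vector is already a fixed point (sum pooling); at γ=0 the iteration drives every contribution toward 1, which is the equalization the abstract credits with reducing interference.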
Class-Aware Adversarial Lung Nodule Synthesis in CT Images
Title | Class-Aware Adversarial Lung Nodule Synthesis in CT Images |
Authors | Jie Yang, Siqi Liu, Sasa Grbic, Arnaud Arindra Adiyoso Setio, Zhoubing Xu, Eli Gibson, Guillaume Chabin, Bogdan Georgescu, Andrew F. Laine, Dorin Comaniciu |
Abstract | Though large-scale datasets are essential for training deep learning systems, it is expensive to scale up the collection of medical imaging datasets. Synthesizing the objects of interest, such as lung nodules, in medical images based on the distribution of annotated datasets can be helpful for improving supervised learning tasks, especially when the datasets are limited in size and class balance. In this paper, we propose a class-aware adversarial synthesis framework to synthesize lung nodules in CT images. The framework is built with a coarse-to-fine patch in-painter (generator) and two class-aware discriminators. By conditioning on the random latent variables and the target nodule labels, the trained networks are able to generate diverse nodules given the same context. Evaluating on the public LIDC-IDRI dataset, we demonstrate an example application of the proposed framework for improving the accuracy of lung nodule malignancy estimation as a binary classification problem, which is important in the lung screening scenario. We show that combining the real image patches and the synthetic lung nodules in the training set can improve the mean AUC classification score across different network architectures by 2%. |
Tasks | |
Published | 2018-12-28 |
URL | http://arxiv.org/abs/1812.11204v1 |
http://arxiv.org/pdf/1812.11204v1.pdf | |
PWC | https://paperswithcode.com/paper/class-aware-adversarial-lung-nodule-synthesis |
Repo | |
Framework | |
Learning Latent Subspaces in Variational Autoencoders
Title | Learning Latent Subspaces in Variational Autoencoders |
Authors | Jack Klys, Jake Snell, Richard Zemel |
Abstract | Variational autoencoders (VAEs) are widely used deep generative models capable of learning unsupervised latent representations of data. Such representations are often difficult to interpret or control. We consider the problem of unsupervised learning of features correlated to specific labels in a dataset. We propose a VAE-based generative model which we show is capable of extracting features correlated to binary labels in the data and structuring it in a latent subspace which is easy to interpret. Our model, the Conditional Subspace VAE (CSVAE), uses mutual information minimization to learn a low-dimensional latent subspace associated with each label that can easily be inspected and independently manipulated. We demonstrate the utility of the learned representations for attribute manipulation tasks on both the Toronto Face and CelebA datasets. |
Tasks | |
Published | 2018-12-14 |
URL | http://arxiv.org/abs/1812.06190v1 |
http://arxiv.org/pdf/1812.06190v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-latent-subspaces-in-variational |
Repo | |
Framework | |
Progress & Compress: A scalable framework for continual learning
Title | Progress & Compress: A scalable framework for continual learning |
Authors | Jonathan Schwarz, Jelena Luketina, Wojciech M. Czarnecki, Agnieszka Grabska-Barwinska, Yee Whye Teh, Razvan Pascanu, Raia Hadsell |
Abstract | We introduce a conceptually simple and scalable framework for continual learning, in which tasks are learned sequentially. Our method is constant in the number of parameters and is designed to preserve performance on previously encountered tasks while accelerating learning progress on subsequent problems. This is achieved by training a network with two components: a knowledge base, capable of solving previously encountered problems, which is connected to an active column that is employed to efficiently learn the current task. After learning a new task, the active column is distilled into the knowledge base, taking care to protect any previously acquired skills. This cycle of active learning (progression) followed by consolidation (compression) requires no architecture growth, no access to or storing of previous data or tasks, and no task-specific parameters. We demonstrate the progress & compress approach on sequential classification of handwritten alphabets as well as two reinforcement learning domains: Atari games and 3D maze navigation. |
Tasks | Active Learning, Atari Games, Continual Learning |
Published | 2018-05-16 |
URL | http://arxiv.org/abs/1805.06370v2 |
http://arxiv.org/pdf/1805.06370v2.pdf | |
PWC | https://paperswithcode.com/paper/progress-compress-a-scalable-framework-for |
Repo | |
Framework | |
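The compress phase combines two terms: match the active column's predictions, and protect previously acquired skills. A toy sketch of that objective for linear softmax models (the paper uses deep networks and, per its description, an EWC-style consolidation penalty; the linear setting and names here are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def compress_loss(W_kb, W_active, W_kb_old, fisher, X, lam=1.0):
    """Distill the active column (teacher) into the knowledge base (student)
    via a KL term, while a Fisher-weighted quadratic penalty keeps the
    knowledge base close to its pre-distillation weights on parameters that
    mattered for earlier tasks."""
    p_teacher = softmax(X @ W_active)     # freshly learned task
    p_student = softmax(X @ W_kb)         # consolidated knowledge base
    kl = np.mean(np.sum(p_teacher * (np.log(p_teacher + 1e-12)
                                     - np.log(p_student + 1e-12)), axis=1))
    return kl + lam * np.sum(fisher * (W_kb - W_kb_old) ** 2)
```

Minimizing this over `W_kb` after each task realizes the progression/compression cycle: the KL term imports the new skill, the penalty term prevents the import from overwriting old ones.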
Segmentation of Bleeding Regions in Wireless Capsule Endoscopy for Detection of Informative Frames
Title | Segmentation of Bleeding Regions in Wireless Capsule Endoscopy for Detection of Informative Frames |
Authors | Mohsen Hajabdollahi, Reza Esfandiarpoor, Pejman Khadivi, S. M. Reza Soroushmehr, Nader Karimi, Kayvan Najarian, Shadrokh Samavi |
Abstract | Wireless capsule endoscopy (WCE) is an effective means for diagnosis of gastrointestinal disorders. Detection of informative scenes in WCE video could reduce the length of transmitted videos and help the diagnosis procedure. In this paper, we investigate the problem of simplification of neural networks for automatic bleeding region detection inside the capsule endoscopy device. Suitable color channels are selected as neural network inputs, and image classification is conducted using a multi-layer perceptron (MLP) and a convolutional neural network (CNN) separately. Both CNN and MLP structures are simplified to reduce the number of computational operations. The performance of the two simplified networks is evaluated on a WCE bleeding image dataset using the DICE score. Simulation results show that applying simplification methods to both MLP and CNN structures reduces the number of computational operations significantly, with AUC greater than 0.97. Although the CNN performs better than the simplified MLP, the simplified MLP segments bleeding regions with a significantly smaller number of computational operations. Depending on whether a simple structure or a more accurate model is more important, either of the designed structures could be selected for in-capsule implementation. |
Tasks | Image Classification |
Published | 2018-08-23 |
URL | http://arxiv.org/abs/1808.07746v1 |
http://arxiv.org/pdf/1808.07746v1.pdf | |
PWC | https://paperswithcode.com/paper/segmentation-of-bleeding-regions-in-wireless |
Repo | |
Framework | |
Query Adaptive Late Fusion for Image Retrieval
Title | Query Adaptive Late Fusion for Image Retrieval |
Authors | Zhongdao Wang, Liang Zheng, Shengjin Wang |
Abstract | Feature fusion is a commonly used strategy in image retrieval tasks, which aggregates the matching responses of multiple visual features. Feasible sets of features can be either descriptors (SIFT, HSV) for an entire image or the same descriptor for different local parts (face, body). Ideally, the to-be-fused heterogeneous features are assumed to be discriminative and complementary to each other. However, the effectiveness of different features varies dramatically according to different queries. That is to say, for some queries, a feature may be neither discriminative nor complementary to existing ones, while for other queries, the feature suffices. As a result, it is important to estimate the effectiveness of features in a query-adaptive manner. To this end, this article proposes a new late fusion scheme at the score level. We base our method on the observation that the sorted score curves contain patterns that describe their effectiveness. For example, an "L"-shaped curve indicates that the feature is discriminative, while a gradually descending curve suggests a bad feature. As such, this paper introduces a query-adaptive late fusion pipeline. In the hand-crafted version, it can be an unsupervised approach to tasks like particular object retrieval. In the learning version, it can also be applied to supervised tasks like person recognition and pedestrian retrieval, based on a trainable neural module. Extensive experiments are conducted on two object retrieval datasets and one person recognition dataset. We show that our method is able to highlight the good features and suppress the bad ones, is resilient to distractor features, and achieves very competitive retrieval accuracy compared with the state of the art. On an additional person re-identification dataset, the application scope and limitation of the proposed method are studied. |
Tasks | Image Retrieval, Person Recognition, Person Re-Identification |
Published | 2018-10-31 |
URL | http://arxiv.org/abs/1810.13103v1 |
http://arxiv.org/pdf/1810.13103v1.pdf | |
PWC | https://paperswithcode.com/paper/query-adaptive-late-fusion-for-image |
Repo | |
Framework | |
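The sorted-score-curve observation lends itself to a small hand-crafted sketch. This is one plausible scoring of "L-shapedness" for illustration, not the paper's exact statistic:

```python
import numpy as np

def query_adaptive_weights(score_lists):
    """Per query, sort each feature's match scores in descending order,
    normalize by the top score, and rate effectiveness by how little mass the
    tail carries: an 'L'-shaped curve (sharp drop after the top match) signals
    a discriminative feature, a gently descending curve a poor one."""
    weights = []
    for scores in score_lists:
        s = np.sort(np.asarray(scores, dtype=float))[::-1]
        s = s / (s[0] + 1e-12)
        tail = s[1:].mean()          # large tail -> slowly descending curve
        weights.append(1.0 - tail)
    w = np.asarray(weights)
    return w / w.sum()

def fuse(score_lists):
    """Late fusion at the score level: weighted sum of max-normalized scores."""
    w = query_adaptive_weights(score_lists)
    S = np.array([np.asarray(s, float) / (max(s) + 1e-12) for s in score_lists])
    return w @ S
```

Because the weights are recomputed from each query's own score curves, a feature that is discriminative for one query and useless for another is up- and down-weighted accordingly, which is exactly the query-adaptivity the abstract argues for.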
A Parallel Divide-and-Conquer based Evolutionary Algorithm for Large-scale Optimization
Title | A Parallel Divide-and-Conquer based Evolutionary Algorithm for Large-scale Optimization |
Authors | Peng Yang, Ke Tang, Xin Yao |
Abstract | Large-scale optimization problems that involve thousands of decision variables have extensively arisen from various industrial areas. As a powerful optimization tool for many real-world applications, evolutionary algorithms (EAs) fail to solve the emerging large-scale problems both effectively and efficiently. In this paper, we propose a novel Divide-and-Conquer (DC) based EA that can not only produce high-quality solutions by solving sub-problems separately, but also fully exploit the power of parallel computing by solving the sub-problems simultaneously. Existing DC-based EAs, which were deemed to enjoy the same advantages as the proposed algorithm, are shown to be practically incompatible with the parallel computing scheme, unless some trade-offs are made by compromising the solution quality. |
Tasks | |
Published | 2018-12-06 |
URL | http://arxiv.org/abs/1812.02500v1 |
http://arxiv.org/pdf/1812.02500v1.pdf | |
PWC | https://paperswithcode.com/paper/a-parallel-divide-and-conquer-based |
Repo | |
Framework | |
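The divide-and-conquer idea can be sketched with a simple (1+1)-EA per variable group. This is a generic DC-EA skeleton under an (implicit) separability assumption, not the paper's specific algorithm; each group's sub-optimization could run on its own worker:

```python
import numpy as np

def one_plus_one_ea(f, x0, n_gen=300, sigma=0.3, rng=None):
    """Elitist (1+1)-EA: keep a Gaussian-mutated candidate only if it is
    no worse than the incumbent."""
    rng = np.random.default_rng() if rng is None else rng
    x, fx = np.array(x0, dtype=float), f(x0)
    for _ in range(n_gen):
        y = x + rng.normal(0.0, sigma, size=x.shape)
        fy = f(y)
        if fy <= fx:
            x, fx = y, fy
    return x

def dc_ea(f, x0, n_groups, seed=0):
    """Divide-and-conquer EA sketch: split the decision variables into groups,
    optimize each sub-problem with the remaining variables frozen, then
    reassemble the full solution."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for group in np.array_split(np.arange(len(x)), n_groups):
        def sub_objective(xg, group=group):
            y = x.copy()
            y[group] = xg        # evaluate the group in the context of x
            return f(y)
        x[group] = one_plus_one_ea(sub_objective, x[group], rng=rng)
    return x
```

On a non-separable objective the frozen-context evaluation is exactly where naive parallelism and solution quality start to conflict, which is the tension the abstract says existing DC-based EAs run into.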
Zoom-RNN: A Novel Method for Person Recognition Using Recurrent Neural Networks
Title | Zoom-RNN: A Novel Method for Person Recognition Using Recurrent Neural Networks |
Authors | Sina Mokhtarzadeh Azar, Sajjad Azami, Mina Ghadimi Atigh, Mohammad Javadi, Ahmad Nickabadi |
Abstract | The overwhelming popularity of social media has resulted in vast numbers of personal photos being uploaded to the internet every day. Since these photos are taken in unconstrained settings, recognizing the identities of people in them remains a challenge. Studies have indicated that utilizing evidence other than face appearance improves the performance of person recognition systems. In this work, we aim to take advantage of additional cues obtained from different body regions in a zooming-in fashion for person recognition. Hence, we present Zoom-RNN, a novel method based on recurrent neural networks for combining evidence extracted from the whole body, upper body, and head regions. Our model is evaluated on a challenging dataset, namely People In Photo Albums (PIPA), and we demonstrate that employing our system improves the performance of conventional fusion methods by a noticeable margin. |
Tasks | Person Recognition |
Published | 2018-09-24 |
URL | http://arxiv.org/abs/1809.09189v2 |
http://arxiv.org/pdf/1809.09189v2.pdf | |
PWC | https://paperswithcode.com/paper/zoom-rnn-a-novel-method-for-person |
Repo | |
Framework | |
The Importance of Norm Regularization in Linear Graph Embedding: Theoretical Analysis and Empirical Demonstration
Title | The Importance of Norm Regularization in Linear Graph Embedding: Theoretical Analysis and Empirical Demonstration |
Authors | Yihan Gao, Chao Zhang, Jian Peng, Aditya Parameswaran |
Abstract | Learning distributed representations for nodes in graphs is a crucial primitive in network analysis with a wide spectrum of applications. Linear graph embedding methods learn such representations by optimizing the likelihood of both positive and negative edges while constraining the dimension of the embedding vectors. We argue that the generalization performance of these methods is not due to the dimensionality constraint as commonly believed, but rather the small norm of embedding vectors. Both theoretical and empirical evidence is provided to support this argument: (a) we prove that the generalization error of these methods can be bounded by limiting the norm of vectors, regardless of the embedding dimension; (b) we show that the generalization performance of linear graph embedding methods is correlated with the norm of embedding vectors, which is small due to the early stopping of SGD and the vanishing gradients. We performed extensive experiments to validate our analysis and showcased the importance of proper norm regularization in practice. |
Tasks | Graph Embedding |
Published | 2018-02-10 |
URL | http://arxiv.org/abs/1802.03560v2 |
http://arxiv.org/pdf/1802.03560v2.pdf | |
PWC | https://paperswithcode.com/paper/the-importance-of-norm-regularization-in |
Repo | |
Framework | |
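The edge-likelihood objective and the role of explicit norm regularization can be sketched in a few lines. A toy linear embedding trained by SGD on the logistic edge likelihood (illustrative names; the paper's experiments use real graphs and negative sampling):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_embedding(pairs, labels, n_nodes, dim=2, lr=0.1, epochs=200,
                    weight_decay=0.0, seed=0):
    """Linear graph embedding: maximize the logistic likelihood of observed
    positive/negative node pairs under dot-product scores. `weight_decay`
    adds the explicit norm regularization the abstract argues for."""
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(n_nodes, dim))
    for _ in range(epochs):
        for (i, j), y in zip(pairs, labels):
            g = sigmoid(U[i] @ U[j]) - y          # gradient of logistic loss
            gi = g * U[j] + weight_decay * U[i]
            gj = g * U[i] + weight_decay * U[j]
            U[i] -= lr * gi
            U[j] -= lr * gj
    return U
```

Without the decay term, positive pairs keep inflating the embedding norms as SGD runs; in practice (per the abstract) only early stopping and vanishing gradients keep them small, which is why explicit regularization controls generalization more reliably than the dimension constraint.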
Motorcycle detection and classification in urban Scenarios using a model based on Faster R-CNN
Title | Motorcycle detection and classification in urban Scenarios using a model based on Faster R-CNN |
Authors | Jorge E. Espinosa, Sergio A. Velastin, John W. Branch |
Abstract | This paper introduces a Deep Learning Convolutional Neural Network model based on Faster R-CNN for motorcycle detection and classification in urban environments. The model is evaluated in occluded scenarios where more than 60% of the vehicles present a degree of occlusion. For training and evaluation, we introduce a new dataset of 7500 annotated images, captured in real traffic scenes using a drone-mounted camera. Several tests were carried out to design the network, achieving promising results of 75% in average precision (AP), even with the high number of occluded motorbikes, the low angle of capture and the moving camera. The model is also evaluated on low-occlusion datasets, reaching results of up to 92% in AP. |
Tasks | |
Published | 2018-08-07 |
URL | http://arxiv.org/abs/1808.02299v2 |
http://arxiv.org/pdf/1808.02299v2.pdf | |
PWC | https://paperswithcode.com/paper/motorcycle-detection-and-classification-in |
Repo | |
Framework | |
Region-filtering Correlation Tracking
Title | Region-filtering Correlation Tracking |
Authors | Nana Fan, Zhenyu He |
Abstract | Recently, correlation filters have demonstrated excellent performance in visual tracking. However, the base training sample region is larger than the object region, including the Interference Region (IR). The IRs in training samples from cyclic shifts of the base training sample severely degrade the quality of a tracking model. In this paper, we propose the novel Region-filtering Correlation Tracking (RFCT) to address this problem. We directly filter training samples by introducing a spatial map into the standard CF formulation. Compared with existing correlation filter trackers, our proposed tracker has the following advantages: (1) The correlation filter can be learned on a larger search region without the interference of the IR, thanks to the spatial map. (2) Processing training samples with a spatial map is a more general way to control background and target information in training samples. The values of the spatial map are not restricted, so a better spatial map can be explored. (3) The weight proportions of accurate filters are increased to alleviate model corruption. Experiments are performed on two benchmark datasets: OTB-2013 and OTB-2015. Quantitative evaluations on these benchmarks demonstrate that the proposed RFCT algorithm performs favorably against several state-of-the-art methods. |
Tasks | Visual Tracking |
Published | 2018-03-23 |
URL | http://arxiv.org/abs/1803.08687v1 |
http://arxiv.org/pdf/1803.08687v1.pdf | |
PWC | https://paperswithcode.com/paper/region-filtering-correlation-tracking |
Repo | |
Framework | |