October 20, 2019

3706 words 18 mins read

Paper Group AWR 251

Convolutional Analysis Operator Learning: Acceleration and Convergence. Learning Factorized Representations for Open-set Domain Adaptation. Adapted Deep Embeddings: A Synthesis of Methods for $k$-Shot Inductive Transfer Learning. Modeling Sparse Deviations for Compressed Sensing using Generative Models. Analysis of Hand Segmentation in the Wild. De …

Convolutional Analysis Operator Learning: Acceleration and Convergence

Title Convolutional Analysis Operator Learning: Acceleration and Convergence
Authors Il Yong Chun, Jeffrey A. Fessler
Abstract Convolutional operator learning is gaining attention in many signal processing and computer vision applications. Learning kernels has mostly relied on so-called patch-domain approaches that extract and store many overlapping patches across training signals. Due to memory demands, patch-domain methods have limitations when learning kernels from large datasets – particularly with multi-layered structures, e.g., convolutional neural networks – or when applying the learned kernels to high-dimensional signal recovery problems. The so-called convolution approach does not store many overlapping patches, and thus overcomes the memory problems particularly with careful algorithmic designs; it has been studied within the “synthesis” signal model, e.g., convolutional dictionary learning. This paper proposes a new convolutional analysis operator learning (CAOL) framework that learns an analysis sparsifying regularizer with the convolution perspective, and develops a new convergent Block Proximal Extrapolated Gradient method using a Majorizer (BPEG-M) to solve the corresponding block multi-nonconvex problems. To learn diverse filters within the CAOL framework, this paper introduces an orthogonality constraint that enforces a tight-frame filter condition, and a regularizer that promotes diversity between filters. Numerical experiments show that, with sharp majorizers, BPEG-M significantly accelerates the CAOL convergence rate compared to the state-of-the-art block proximal gradient (BPG) method. Numerical experiments for sparse-view computed tomography show that a convolutional sparsifying regularizer learned via CAOL significantly improves reconstruction quality compared to a conventional edge-preserving regularizer. Using more and wider kernels in a learned regularizer better preserves edges in reconstructed images.
Tasks Dictionary Learning
Published 2018-02-15
URL https://arxiv.org/abs/1802.05584v7
PDF https://arxiv.org/pdf/1802.05584v7.pdf
PWC https://paperswithcode.com/paper/convolutional-analysis-operator-learning-1
Repo https://github.com/dahong67/ConvolutionalAnalysisOperatorLearning.jl
Framework none
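
A minimal sketch of the two building blocks the abstract names: hard thresholding for the sparse codes and an SVD projection for the tight-frame filter constraint. Shapes, the threshold `alpha`, and the single alternation below are illustrative assumptions, not the authors' BPEG-M implementation:

```python
import numpy as np
from scipy.signal import convolve2d

def hard_threshold(z, alpha):
    # prox of the l0 penalty alpha * ||z||_0: zero out small entries
    return z * (np.abs(z) > np.sqrt(2.0 * alpha))

def tight_frame_project(D):
    # project the K x R filter matrix onto the tight-frame set
    # {D : D D^T = (1/R) I} via SVD, enforcing filter diversity
    U, _, Vt = np.linalg.svd(D, full_matrices=False)
    return (U @ Vt) / np.sqrt(D.shape[1])

# one illustrative alternation on a random image
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32))
filters = tight_frame_project(rng.standard_normal((8, 9))).reshape(8, 3, 3)
codes = [hard_threshold(convolve2d(x, f, mode='same'), 0.1) for f in filters]
```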

Learning Factorized Representations for Open-set Domain Adaptation

Title Learning Factorized Representations for Open-set Domain Adaptation
Authors Mahsa Baktashmotlagh, Masoud Faraki, Tom Drummond, Mathieu Salzmann
Abstract Domain adaptation for visual recognition has undergone great progress in the past few years. Nevertheless, most existing methods work in the so-called closed-set scenario, assuming that the classes depicted by the target images are exactly the same as those of the source domain. In this paper, we tackle the more challenging, yet more realistic case of open-set domain adaptation, where new, unknown classes can be present in the target data. While, in the unsupervised scenario, one cannot expect to be able to identify each specific new class, we aim to automatically detect which samples belong to these new classes and discard them from the recognition process. To this end, we rely on the intuition that the source and target samples depicting the known classes can be generated by a shared subspace, whereas the target samples from unknown classes come from a different, private subspace. We therefore introduce a framework that factorizes the data into shared and private parts, while encouraging the shared representation to be discriminative. Our experiments on standard benchmarks evidence that our approach significantly outperforms the state-of-the-art in open-set domain adaptation.
Tasks Domain Adaptation
Published 2018-05-31
URL http://arxiv.org/abs/1805.12277v1
PDF http://arxiv.org/pdf/1805.12277v1.pdf
PWC https://paperswithcode.com/paper/learning-factorized-representations-for-open
Repo https://github.com/ChenJinBIT/OSDA
Framework tf
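
A toy numpy rendering of the shared/private factorization intuition above. The bases `S`, `Ps`, `Pt`, the penalty weight `lam`, and the loss form are illustrative assumptions; the paper's model is learned end to end and adds a discriminative term:

```python
import numpy as np

def factorization_loss(Xs, Xt, S, Ps, Pt, lam=0.1):
    # known-class samples from both domains should be explained by the
    # shared basis S; unknown-class target samples fall to the private
    # basis Pt; an orthogonality penalty keeps the parts from overlapping
    def proj(X, B):  # least-squares reconstruction of X from span(B)
        return B @ np.linalg.lstsq(B, X, rcond=None)[0]
    rec = (np.linalg.norm(Xs - proj(Xs, np.hstack([S, Ps]))) ** 2
           + np.linalg.norm(Xt - proj(Xt, np.hstack([S, Pt]))) ** 2)
    ortho = np.linalg.norm(S.T @ Ps) ** 2 + np.linalg.norm(S.T @ Pt) ** 2
    return rec + lam * ortho
```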

Adapted Deep Embeddings: A Synthesis of Methods for $k$-Shot Inductive Transfer Learning

Title Adapted Deep Embeddings: A Synthesis of Methods for $k$-Shot Inductive Transfer Learning
Authors Tyler R. Scott, Karl Ridgeway, Michael C. Mozer
Abstract The focus in machine learning has branched beyond training classifiers on a single task to investigating how previously acquired knowledge in a source domain can be leveraged to facilitate learning in a related target domain, known as inductive transfer learning. Three active lines of research have independently explored transfer learning using neural networks. In weight transfer, a model trained on the source domain is used as an initialization point for a network to be trained on the target domain. In deep metric learning, the source domain is used to construct an embedding that captures class structure in both the source and target domains. In few-shot learning, the focus is on generalizing well in the target domain based on a limited number of labeled examples. We compare state-of-the-art methods from these three paradigms and also explore hybrid adapted-embedding methods that use limited target-domain data to fine-tune embeddings constructed from source-domain data. We conduct a systematic comparison of methods in a variety of domains, varying the number of labeled instances available in the target domain ($k$), as well as the number of target-domain classes. We reach three principal conclusions: (1) Deep embeddings are far superior, compared to weight transfer, as a starting point for inter-domain transfer or model re-use. (2) Our hybrid methods robustly outperform every few-shot learning and every deep metric learning method previously proposed, with a mean error reduction of 34% over state-of-the-art. (3) Among loss functions for discovering embeddings, the histogram loss (Ustinova & Lempitsky, 2016) is most robust. We hope our results will motivate a unification of research in weight transfer, deep metric learning, and few-shot learning.
Tasks Few-Shot Learning, Metric Learning, Transfer Learning
Published 2018-05-22
URL http://arxiv.org/abs/1805.08402v4
PDF http://arxiv.org/pdf/1805.08402v4.pdf
PWC https://paperswithcode.com/paper/adapted-deep-embeddings-a-synthesis-of-1
Repo https://github.com/tylersco/adapted_deep_embeddings
Framework tf
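
For context, the $k$-shot evaluation step shared by the embedding methods compared here reduces to nearest-centroid classification in the (possibly fine-tuned) embedding space. A minimal sketch, with the embedding itself assumed given:

```python
import numpy as np

def nearest_centroid_predict(emb_support, y_support, emb_query):
    # classify queries by the nearest class centroid of the k labeled
    # target-domain examples, all living in a learned embedding space
    classes = np.unique(y_support)
    centroids = np.stack([emb_support[y_support == c].mean(axis=0)
                          for c in classes])
    dists = ((emb_query[:, None, :] - centroids[None]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]
```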

Modeling Sparse Deviations for Compressed Sensing using Generative Models

Title Modeling Sparse Deviations for Compressed Sensing using Generative Models
Authors Manik Dhar, Aditya Grover, Stefano Ermon
Abstract In compressed sensing, a small number of linear measurements can be used to reconstruct an unknown signal. Existing approaches leverage assumptions on the structure of these signals, such as sparsity or the availability of a generative model. A domain-specific generative model can provide a stronger prior and thus allow for recovery with far fewer measurements. However, unlike sparsity-based approaches, existing methods based on generative models guarantee exact recovery only over their support, which is typically only a small subset of the space on which the signals are defined. We propose Sparse-Gen, a framework that allows for sparse deviations from the support set, thereby achieving the best of both worlds by using a domain-specific prior and allowing reconstruction over the full space of signals. Theoretically, our framework provides a new class of signals that can be acquired using compressed sensing, reducing classic sparse vector recovery to a special case and avoiding the restrictive support due to a generative model prior. Empirically, we observe consistent improvements in reconstruction accuracy over competing approaches, especially in the more practical setting of transfer compressed sensing where a generative model for a data-rich, source domain aids sensing on a data-scarce, target domain.
Tasks
Published 2018-07-04
URL http://arxiv.org/abs/1807.01442v2
PDF http://arxiv.org/pdf/1807.01442v2.pdf
PWC https://paperswithcode.com/paper/modeling-sparse-deviations-for-compressed
Repo https://github.com/ermongroup/sparse_gen
Framework tf
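
The Sparse-Gen objective — recover $x = G(z) + \nu$ with $\nu$ sparse — can be sketched with alternating gradient and proximal steps. Here a linear map `G` stands in for the trained generator, and the step size and iteration count are illustrative:

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_gen_recover(A, y, G, z0, lam=0.1, lr=1e-2, iters=500):
    # minimize ||A(Gz + nu) - y||^2 + lam * ||nu||_1: gradient steps
    # on z, proximal (soft-threshold) steps on the sparse deviation nu
    z, nu = z0.copy(), np.zeros(A.shape[1])
    for _ in range(iters):
        r = A @ (G @ z + nu) - y
        z = z - lr * (G.T @ (A.T @ r))
        nu = soft_threshold(nu - lr * (A.T @ r), lr * lam)
    return G @ z + nu
```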

Analysis of Hand Segmentation in the Wild

Title Analysis of Hand Segmentation in the Wild
Authors Aisha Urooj Khan, Ali Borji
Abstract A large number of works in egocentric vision have concentrated on action and object recognition. Detection and segmentation of hands in first-person videos, however, has been explored less. For many applications in this domain, it is necessary to accurately segment not only the hands of the camera wearer but also the hands of others with whom the wearer is interacting. Here, we take an in-depth look at the hand segmentation problem. In the quest for robust hand segmentation methods, we evaluated the performance of state-of-the-art semantic segmentation methods, off the shelf and fine-tuned, on existing datasets. We fine-tune RefineNet, a leading semantic segmentation method, for hand segmentation and find that it does much better than the best contenders. Existing hand segmentation datasets are collected in laboratory settings. To overcome this limitation, we contribute two new datasets: a) EgoYouTubeHands, including egocentric videos containing hands in the wild, and b) HandOverFace, to analyze the performance of our models in the presence of similar-appearance occlusions. We further explore whether conditional random fields can help refine the generated hand segmentations. To demonstrate the benefit of accurate hand maps, we train a CNN for hand-based activity recognition and achieve higher accuracy when the CNN is trained using hand maps produced by the fine-tuned RefineNet. Finally, we annotate a subset of the EgoHands dataset for fine-grained action recognition and show that an accuracy of 58.6% can be achieved by looking at a single hand pose, which is much better than the chance level (12.5%).
Tasks Activity Recognition, Hand Segmentation, Semantic Segmentation, Temporal Action Localization
Published 2018-03-08
URL http://arxiv.org/abs/1803.03317v2
PDF http://arxiv.org/pdf/1803.03317v2.pdf
PWC https://paperswithcode.com/paper/analysis-of-hand-segmentation-in-the-wild
Repo https://github.com/aurooj/Hand-Segmentation-in-the-Wild
Framework none
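
The segmentation quality reported for models like the fine-tuned RefineNet is typically measured as mask IoU; a minimal helper for binary hand masks:

```python
import numpy as np

def mask_iou(pred, gt):
    # intersection-over-union between binary hand masks
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0
```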

Decoupled Classification Refinement: Hard False Positive Suppression for Object Detection

Title Decoupled Classification Refinement: Hard False Positive Suppression for Object Detection
Authors Bowen Cheng, Yunchao Wei, Honghui Shi, Rogerio Feris, Jinjun Xiong, Thomas Huang
Abstract In this paper, we analyze failure cases of state-of-the-art detectors and observe that most hard false positives result from classification instead of localization, and that they have a large negative impact on the performance of object detectors. We conjecture there are three factors: (1) shared feature representations are not optimal due to the mismatched goals of feature learning for classification and localization; (2) multi-task learning helps, yet optimization of the multi-task loss may result in sub-optimal performance on individual tasks; (3) a large receptive field across scales leads to redundant context information for small objects. We demonstrate the potential of detector classification power with a simple, effective, and widely applicable Decoupled Classification Refinement (DCR) network. In particular, DCR places a separate classification network in parallel with the localization network (base detector). With ROI pooling placed on an early stage of the classification network, we enforce an adaptive receptive field in DCR. During training, DCR samples hard false positives from the base detector and trains a strong classifier to refine classification results. During testing, DCR refines all boxes from the base detector. Experiments show competitive results on PASCAL VOC and COCO without any bells and whistles. Our code is available at: https://github.com/bowenc0221/Decoupled-Classification-Refinement.
Tasks Multi-Task Learning, Object Detection
Published 2018-10-05
URL http://arxiv.org/abs/1810.04002v1
PDF http://arxiv.org/pdf/1810.04002v1.pdf
PWC https://paperswithcode.com/paper/decoupled-classification-refinement-hard
Repo https://github.com/bowenc0221/Decoupled-Classification-Refinement
Framework tf
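
A sketch of the hard-false-positive sampling DCR trains on: boxes the base detector scores highly but that barely overlap any ground truth. The thresholds and `top_k` below are illustrative, not the paper's values:

```python
import numpy as np

def sample_hard_false_positives(scores, max_ious, score_thr=0.3,
                                iou_thr=0.3, top_k=32):
    # scores: detector confidence per box; max_ious: best IoU of each
    # box against any ground-truth object. High confidence plus low
    # overlap marks a hard false positive for the refinement classifier.
    idx = np.where((scores > score_thr) & (max_ious < iou_thr))[0]
    return idx[np.argsort(-scores[idx])][:top_k]
```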

Taming VAEs

Title Taming VAEs
Authors Danilo Jimenez Rezende, Fabio Viola
Abstract In spite of remarkable progress in deep latent variable generative modeling, training still remains a challenge due to a combination of optimization and generalization issues. In practice, a combination of heuristic algorithms (such as hand-crafted annealing of KL-terms) is often used in order to achieve the desired results, but such solutions are not robust to changes in model architecture or dataset. The best settings can often vary dramatically from one problem to another, which requires doing expensive parameter sweeps for each new case. Here we build on the idea of training VAEs with additional constraints as a way to control their behaviour. We first present a detailed theoretical analysis of constrained VAEs, expanding our understanding of how these models work. We then introduce and analyze a practical algorithm termed Generalized ELBO with Constrained Optimization, GECO. The main advantage of GECO for the machine learning practitioner is a more intuitive, yet principled, process of tuning the loss. This involves defining a set of constraints, which typically have an explicit relation to the desired model performance, in contrast to tweaking abstract hyper-parameters that implicitly affect the model behavior. Encouraging experimental results on several standard datasets indicate that GECO is a very robust and effective tool to balance reconstruction and compression constraints.
Tasks
Published 2018-10-01
URL http://arxiv.org/abs/1810.00597v1
PDF http://arxiv.org/pdf/1810.00597v1.pdf
PWC https://paperswithcode.com/paper/taming-vaes
Repo https://github.com/denproc/Taming-VAEs
Framework pytorch
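
The heart of GECO is replacing a hand-tuned KL weight with a Lagrange multiplier driven by a reconstruction constraint. A simplified update sketch — the moving-average rate, learning rate, and multiplicative update are assumptions, and the paper's parameterization differs in detail:

```python
import numpy as np

def geco_update(lagrange, constraint_ma, rec_error, tol,
                alpha=0.99, lr=1e-2):
    # constraint C = rec_error - tol should be driven to <= 0; the
    # multiplier grows while it is violated and shrinks once satisfied
    C = rec_error - tol
    constraint_ma = alpha * constraint_ma + (1.0 - alpha) * C
    lagrange = lagrange * np.exp(lr * constraint_ma)
    return lagrange, constraint_ma

# the per-step training loss would then be: kl + lagrange * (rec_error - tol)
```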

Efficient and accurate inversion of multiple scattering with deep learning

Title Efficient and accurate inversion of multiple scattering with deep learning
Authors Yu Sun, Zhihao Xia, Ulugbek S. Kamilov
Abstract Image reconstruction under multiple light scattering is crucial in a number of applications such as diffraction tomography. The reconstruction problem is often formulated as a nonconvex optimization, where a nonlinear measurement model is used to account for multiple scattering and regularization is used to enforce prior constraints on the object. In this paper, we propose a powerful alternative to this optimization-based view of image reconstruction by designing and training a deep convolutional neural network that can invert multiple scattered measurements to produce a high-quality image of the refractive index. Our results on both simulated and experimental datasets show that the proposed approach is substantially faster and achieves higher imaging quality compared to the state-of-the-art methods based on optimization.
Tasks Image Reconstruction
Published 2018-03-18
URL http://arxiv.org/abs/1803.06594v2
PDF http://arxiv.org/pdf/1803.06594v2.pdf
PWC https://paperswithcode.com/paper/efficient-and-accurate-inversion-of-multiple
Repo https://github.com/zouhanrui/2dunet
Framework tf
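
In its very simplest form, "learning the inverse" means fitting a map from measurements back to images on simulated pairs. The linear least-squares fit below is a stand-in for the paper's CNN; all shapes and data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 256))         # forward-scattering stand-in
X = rng.standard_normal((1000, 256))       # "refractive index" images
Y = X @ A.T                                # simulated measurements
W, *_ = np.linalg.lstsq(Y, X, rcond=None)  # learned linear inverse map

x_true = rng.standard_normal(256)
x_hat = (x_true @ A.T) @ W                 # reconstruct from its measurement
```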

Protecting Sensory Data against Sensitive Inferences

Title Protecting Sensory Data against Sensitive Inferences
Authors Mohammad Malekzadeh, Richard G. Clegg, Andrea Cavallaro, Hamed Haddadi
Abstract There is growing concern about how personal data are used when users grant applications direct access to the sensors of their mobile devices. In fact, high-resolution temporal data generated by motion sensors directly reflect the activities of a user and indirectly reflect physical and demographic attributes. In this paper, we propose a feature learning architecture for mobile devices that provides flexible and negotiable privacy-preserving sensor data transmission by appropriately transforming raw sensor data. The objective is to move from the current binary setting of granting or denying an application permission, toward a model that allows users to grant each application permission over a limited range of inferences according to the provided services. The internal structure of each component of the proposed architecture can be flexibly changed, and the trade-off between privacy and utility can be negotiated between the constraints of the user and the underlying application. We validated the proposed architecture in an activity recognition application using two real-world datasets, with the objective of recognizing an activity without disclosing gender as an example of private information. Results show that the proposed framework maintains the usefulness of the transformed data for activity recognition, with an average loss of only around three percentage points, while reducing the possibility of gender classification from more than 90% when using raw sensor data to around 50%, i.e., random guessing. We also present and distribute MotionSense, a new dataset for activity and attribute recognition collected from motion sensors.
Tasks Activity Recognition
Published 2018-02-21
URL http://arxiv.org/abs/1802.07802v4
PDF http://arxiv.org/pdf/1802.07802v4.pdf
PWC https://paperswithcode.com/paper/protecting-sensory-data-against-sensitive
Repo https://github.com/mmalekzadeh/motion-sense
Framework none
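
The trade-off the abstract quantifies can be checked with two numbers per transformation: how much task accuracy is lost, and how far the adversary stays from random guessing. A small helper (names and the example figures are illustrative, echoing the abstract's reported numbers):

```python
def privacy_utility(acc_task_raw, acc_task_tx, acc_private_tx, guess=0.5):
    # utility cost in accuracy points, and residual leakage of the
    # private attribute relative to random guessing
    return acc_task_raw - acc_task_tx, abs(acc_private_tx - guess)

# roughly the abstract's story: ~3-point utility loss, gender near chance
print(privacy_utility(0.95, 0.92, 0.52))   # -> approx (0.03, 0.02)
```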

Strict Very Fast Decision Tree: a memory conservative algorithm for data stream mining

Title Strict Very Fast Decision Tree: a memory conservative algorithm for data stream mining
Authors Victor Guilherme Turrisi da Costa, André Carlos Ponce de Leon Ferreira de Carvalho, Sylvio Barbon Junior
Abstract Dealing with memory and time constraints is a key challenge when learning from data streams with a massive amount of data. Many algorithms have been proposed to handle these difficulties, among them the Very Fast Decision Tree (VFDT) algorithm. Although the VFDT has been widely used in data stream mining, in recent years several authors have suggested modifications to increase its performance, putting aside memory concerns by proposing memory-costly solutions. Besides, most data stream mining solutions have been centred around ensembles, which combine the memory costs of their weak learners, usually VFDTs. To reduce memory cost while keeping predictive performance, this study proposes the Strict VFDT (SVFDT), a novel algorithm based on the VFDT. The SVFDT algorithm minimises unnecessary tree growth, substantially reducing memory usage while keeping competitive predictive performance. Moreover, since it creates much shallower trees than the VFDT, the SVFDT can achieve a shorter processing time. Experiments were carried out comparing the SVFDT with the VFDT on 11 benchmark data stream datasets, assessing the trade-off between accuracy, memory, and processing time. Statistical analysis showed that the proposed algorithm obtained similar predictive performance and significantly reduced processing time and memory use. Thus, the SVFDT is a suitable option for data stream mining with memory and time limitations, and is recommended as a weak learner in ensemble-based solutions.
Tasks
Published 2018-05-16
URL http://arxiv.org/abs/1805.06368v2
PDF http://arxiv.org/pdf/1805.06368v2.pdf
PWC https://paperswithcode.com/paper/strict-very-fast-decision-tree-a-memory
Repo https://github.com/vturrisi/pystream
Framework none
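
The "strict" part can be pictured as a gate on leaf splits: a leaf may split only if its statistics look at least as promising as those of splits accepted so far. The single-statistic gate below is a simplification for illustration; the actual algorithm combines several such tests:

```python
import numpy as np

class StrictSplitGate:
    # accept a split only when the candidate's impurity statistic is at
    # least the running mean over previously accepted splits, which
    # curbs unnecessary tree growth on a stream
    def __init__(self):
        self.history = []

    def allow(self, statistic):
        ok = not self.history or statistic >= np.mean(self.history)
        if ok:
            self.history.append(statistic)
        return ok
```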

Technical Report: Adjudication of Coreference Annotations via Answer Set Optimization

Title Technical Report: Adjudication of Coreference Annotations via Answer Set Optimization
Authors Peter Schüller
Abstract We describe the first automatic approach for merging coreference annotations obtained from multiple annotators into a single gold standard. This merging is subject to certain linguistic hard constraints and optimization criteria that prefer solutions with minimal divergence from annotators. The representation involves an equivalence relation over a large number of elements. We use Answer Set Programming to describe two representations of the problem and four objective functions suitable for different datasets. We provide two structurally different real-world benchmark datasets based on the METU-Sabanci Turkish Treebank and we report our experiences in using the Gringo, Clasp, and Wasp tools for computing optimal adjudication results on these datasets.
Tasks
Published 2018-01-31
URL http://arxiv.org/abs/1802.00033v1
PDF http://arxiv.org/pdf/1802.00033v1.pdf
PWC https://paperswithcode.com/paper/technical-report-adjudication-of-coreference
Repo https://github.com/knowlp/caspr-coreference-tool
Framework none
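
Not the paper's ASP encoding, but the underlying adjudication problem — merge annotators' coreference links into one equivalence relation while staying close to the majority — can be pictured with a toy majority-vote union-find:

```python
from collections import Counter

def merge_annotations(annotations, n_annotators):
    # keep a mention-pair link only when a majority of annotators marked
    # it, then close the kept links into equivalence classes (clusters)
    votes = Counter(p for ann in annotations for p in ann)
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), v in votes.items():
        if 2 * v > n_annotators:
            parent[find(a)] = find(b)
    return {m: find(m) for m in parent}

# two of three annotators link m1-m2; the minority m2-m3 link is dropped
print(merge_annotations([{("m1", "m2")}, {("m1", "m2")}, {("m2", "m3")}], 3))
```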

Infrared and Visible Image Fusion with ResNet and zero-phase component analysis

Title Infrared and Visible Image Fusion with ResNet and zero-phase component analysis
Authors Hui Li, Xiao-Jun Wu, Tariq S. Durrani
Abstract Feature extraction and processing tasks play a key role in image fusion, and fusion performance is directly affected by the features and processing methods used. By contrast, most deep learning-based methods use deep features directly, without feature extraction or processing, which degrades fusion performance in some cases. To address these drawbacks, we propose in this paper a novel fusion framework based on deep features and zero-phase component analysis (ZCA). First, the residual network (ResNet) is used to extract deep features from source images. Then ZCA is utilized to normalize the deep features and obtain initial weight maps. The final weight maps are obtained by applying a soft-max operation to the initial weight maps. Finally, the fused image is reconstructed using a weighted-averaging strategy. Compared with existing fusion methods, experimental results demonstrate that the proposed framework achieves better performance in both objective assessment and visual quality. The code of our fusion algorithm is available at https://github.com/hli1221/imagefusion_resnet50
Tasks Infrared And Visible Image Fusion
Published 2018-06-19
URL https://arxiv.org/abs/1806.07119v7
PDF https://arxiv.org/pdf/1806.07119v7.pdf
PWC https://paperswithcode.com/paper/infrared-and-visible-image-fusion-with-resnet
Repo https://github.com/hli1221/imagefusion_resnet50
Framework none
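
The post-ResNet pipeline the abstract describes — ZCA-normalize deep features, turn them into soft-max weight maps, then weighted-average the sources — in a compact sketch. Features are assumed to arrive as an (H*W, C) array, and the l1 pooling of whitened channels is an assumption:

```python
import numpy as np

def zca_whiten(F):
    # F: (n_pixels, channels) deep features; standard ZCA transform
    Fc = F - F.mean(axis=0)
    C = Fc.T @ Fc / len(Fc)
    U, S, _ = np.linalg.svd(C)
    return Fc @ U @ np.diag(1.0 / np.sqrt(S + 1e-5)) @ U.T

def fuse(img_a, img_b, feat_a, feat_b):
    # per-pixel activity from whitened features -> soft-max weight maps
    wa = np.abs(zca_whiten(feat_a)).sum(axis=1)
    wb = np.abs(zca_whiten(feat_b)).sum(axis=1)
    w = (np.exp(wa) / (np.exp(wa) + np.exp(wb))).reshape(img_a.shape)
    return w * img_a + (1.0 - w) * img_b
```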

On the Computation of Kantorovich-Wasserstein Distances between 2D-Histograms by Uncapacitated Minimum Cost Flows

Title On the Computation of Kantorovich-Wasserstein Distances between 2D-Histograms by Uncapacitated Minimum Cost Flows
Authors Federico Bassetti, Stefano Gualandi, Marco Veneroni
Abstract In this work, we present a method to compute the Kantorovich-Wasserstein distance of order one between a pair of two-dimensional histograms. Recent works in Computer Vision and Machine Learning have shown the benefits of measuring Wasserstein distances of order one between histograms with $n$ bins, by solving a classical transportation problem on very large complete bipartite graphs with $n$ nodes and $n^2$ edges. The main contribution of our work is to approximate the original transportation problem by an uncapacitated min cost flow problem on a reduced flow network of size $O(n)$ that exploits the geometric structure of the cost function. More precisely, when the distance among the bin centers is measured with the 1-norm or the $\infty$-norm, our approach provides an optimal solution. When the distance among bins is measured with the 2-norm: (i) we derive a quantitative estimate on the error between optimal and approximate solution; (ii) given the error, we construct a reduced flow network of size $O(n)$. We numerically show the benefits of our approach by computing Wasserstein distances of order one on a set of greyscale images used as benchmarks in the literature. We show how our approach scales with the size of the images for the 1-norm, 2-norm, and $\infty$-norm ground distances, and we compare it with two other methods that are widely used in the literature.
Tasks
Published 2018-04-02
URL https://arxiv.org/abs/1804.00445v3
PDF https://arxiv.org/pdf/1804.00445v3.pdf
PWC https://paperswithcode.com/paper/on-the-computation-of-kantorovich-wasserstein
Repo https://github.com/stegua/dotlib
Framework none
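
With the 1-norm ground distance the reduced network is especially simple: mass only needs to flow between adjacent bins, so $W_1$ is an uncapacitated min-cost flow on the 4-connected grid with $O(n)$ edges. A sketch using networkx (the integer scaling is an implementation detail for the simplex solver, not part of the method):

```python
import networkx as nx
import numpy as np

def w1_grid(h1, h2, scale=10**6):
    # exact order-1 Wasserstein distance between equal-mass 2D histograms
    # under the 1-norm ground distance: min-cost flow on the grid graph
    d = np.round((h2 - h1) * scale).astype(int)
    d.flat[0] -= d.sum()              # repair rounding so demands sum to 0
    n, m = h1.shape
    G = nx.DiGraph()
    for i in range(n):
        for j in range(m):
            G.add_node((i, j), demand=int(d[i, j]))
            for di, dj in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                if 0 <= i + di < n and 0 <= j + dj < m:
                    G.add_edge((i, j), (i + di, j + dj), weight=1)
    cost, _ = nx.network_simplex(G)
    return cost / scale
```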

Neural Code Comprehension: A Learnable Representation of Code Semantics

Title Neural Code Comprehension: A Learnable Representation of Code Semantics
Authors Tal Ben-Nun, Alice Shoshana Jakobovits, Torsten Hoefler
Abstract With the recent success of embeddings in natural language processing, research has been conducted into applying similar methods to code analysis. Most works attempt to process the code directly or use a syntactic tree representation, treating it like sentences written in a natural language. However, none of the existing methods are sufficient to comprehend program semantics robustly, due to structural features such as function calls, branching, and interchangeable order of statements. In this paper, we propose a novel processing technique to learn code semantics, and apply it to a variety of program analysis tasks. In particular, we stipulate that a robust distributional hypothesis of code applies to both human- and machine-generated programs. Following this hypothesis, we define an embedding space, inst2vec, based on an Intermediate Representation (IR) of the code that is independent of the source programming language. We provide a novel definition of contextual flow for this IR, leveraging both the underlying data- and control-flow of the program. We then analyze the embeddings qualitatively using analogies and clustering, and evaluate the learned representation on three different high-level tasks. We show that even without fine-tuning, a single RNN architecture and fixed inst2vec embeddings outperform specialized approaches for performance prediction (compute device mapping, optimal thread coarsening); and algorithm classification from raw code (104 classes), where we set a new state-of-the-art.
Tasks
Published 2018-06-19
URL http://arxiv.org/abs/1806.07336v3
PDF http://arxiv.org/pdf/1806.07336v3.pdf
PWC https://paperswithcode.com/paper/neural-code-comprehension-a-learnable
Repo https://github.com/spcl/ncc
Framework tf
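
The key preprocessing idea — statements linked by the IR's data or control flow count as co-occurring — reduces to emitting (target, context) pairs for a standard skip-gram trainer. A toy sketch; the IR snippets and flow edges below are made up for illustration:

```python
# contextual flow: indices of IR statements joined by data or control flow
stmts = ["%1 = load i32, i32* %a",
         "%2 = add i32 %1, 1",
         "store i32 %2, i32* %a",
         "br label %loop"]
flow_edges = [(0, 1), (1, 2), (2, 3)]

def context_pairs(edges):
    # symmetric (target, context) index pairs for skip-gram training
    return [(i, j) for a, b in edges for i, j in ((a, b), (b, a))]

print(context_pairs(flow_edges))   # [(0, 1), (1, 0), (1, 2), ...]
```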

Improving Semantic Segmentation via Video Propagation and Label Relaxation

Title Improving Semantic Segmentation via Video Propagation and Label Relaxation
Authors Yi Zhu, Karan Sapra, Fitsum A. Reda, Kevin J. Shih, Shawn Newsam, Andrew Tao, Bryan Catanzaro
Abstract Semantic segmentation requires large amounts of pixel-wise annotations to learn accurate models. In this paper, we present a video prediction-based methodology to scale up training sets by synthesizing new training samples in order to improve the accuracy of semantic segmentation networks. We exploit video prediction models’ ability to predict future frames in order to also predict future labels. A joint propagation strategy is also proposed to alleviate mis-alignments in synthesized samples. We demonstrate that training segmentation models on datasets augmented by the synthesized samples leads to significant improvements in accuracy. Furthermore, we introduce a novel boundary label relaxation technique that makes training robust to annotation noise and propagation artifacts along object boundaries. Our proposed methods achieve state-of-the-art mIoUs of 83.5% on Cityscapes and 82.9% on CamVid. Our single model, without model ensembles, achieves 72.8% mIoU on the KITTI semantic segmentation test set, which surpasses the winning entry of the ROB challenge 2018. Our code and videos can be found at https://nv-adlr.github.io/publication/2018-Segmentation.
Tasks Semantic Segmentation, Video Prediction
Published 2018-12-04
URL https://arxiv.org/abs/1812.01593v3
PDF https://arxiv.org/pdf/1812.01593v3.pdf
PWC https://paperswithcode.com/paper/improving-semantic-segmentation-via-video
Repo https://github.com/NVIDIA/semantic-segmentation
Framework pytorch
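
The boundary label relaxation named above can be written directly: at a boundary pixel, any class present in its local label window counts as correct, so the loss maximizes the summed probability of that class set instead of a single hard label. A sketch, with the window construction assumed done and `probs` given as an (n_pixels, n_classes) array:

```python
import numpy as np

def relaxed_boundary_loss(probs, border_class_sets, eps=1e-12):
    # -log sum_{c in N(p)} P(c) at each boundary pixel p, where N(p) is
    # the set of classes present in the pixel's local label window
    losses = [-np.log(probs[i, sorted(cs)].sum() + eps)
              for i, cs in enumerate(border_class_sets)]
    return float(np.mean(losses))
```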