Paper Group AWR 251
Convolutional Analysis Operator Learning: Acceleration and Convergence
Title | Convolutional Analysis Operator Learning: Acceleration and Convergence |
Authors | Il Yong Chun, Jeffrey A. Fessler |
Abstract | Convolutional operator learning is gaining attention in many signal processing and computer vision applications. Learning kernels has mostly relied on so-called patch-domain approaches that extract and store many overlapping patches across training signals. Due to memory demands, patch-domain methods have limitations when learning kernels from large datasets – particularly with multi-layered structures, e.g., convolutional neural networks – or when applying the learned kernels to high-dimensional signal recovery problems. The so-called convolution approach does not store many overlapping patches, and thus overcomes the memory problems particularly with careful algorithmic designs; it has been studied within the “synthesis” signal model, e.g., convolutional dictionary learning. This paper proposes a new convolutional analysis operator learning (CAOL) framework that learns an analysis sparsifying regularizer with the convolution perspective, and develops a new convergent Block Proximal Extrapolated Gradient method using a Majorizer (BPEG-M) to solve the corresponding block multi-nonconvex problems. To learn diverse filters within the CAOL framework, this paper introduces an orthogonality constraint that enforces a tight-frame filter condition, and a regularizer that promotes diversity between filters. Numerical experiments show that, with sharp majorizers, BPEG-M significantly accelerates the CAOL convergence rate compared to the state-of-the-art block proximal gradient (BPG) method. Numerical experiments for sparse-view computational tomography show that a convolutional sparsifying regularizer learned via CAOL significantly improves reconstruction quality compared to a conventional edge-preserving regularizer. Using more and wider kernels in a learned regularizer better preserves edges in reconstructed images. |
Tasks | Dictionary Learning |
Published | 2018-02-15 |
URL | https://arxiv.org/abs/1802.05584v7 |
https://arxiv.org/pdf/1802.05584v7.pdf | |
PWC | https://paperswithcode.com/paper/convolutional-analysis-operator-learning-1 |
Repo | https://github.com/dahong67/ConvolutionalAnalysisOperatorLearning.jl |
Framework | none |
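As a concrete reference point for the abstract above, the tight-frame-constrained CAOL training problem can be sketched as follows (the notation here is ours, not lifted from the paper): learn $K$ filters $d_k$ that, convolved with each training signal $x_l$, produce sparse feature maps $z_{l,k}$:

$$
\min_{\{d_k\},\{z_{l,k}\}} \; \sum_{l=1}^{L}\sum_{k=1}^{K} \tfrac{1}{2}\,\big\| d_k \circledast x_l - z_{l,k} \big\|_2^2 + \alpha \big\| z_{l,k} \big\|_0 \quad \text{s.t.} \quad \sum_{k=1}^{K} d_k d_k^{\top} = \tfrac{1}{R}\, I,
$$

where $\circledast$ denotes convolution and $R$ is the filter size. The constraint is the tight-frame (orthogonality) condition the abstract mentions, and BPEG-M alternates majorized proximal updates over the filter block and the sparse-code block.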
Learning Factorized Representations for Open-set Domain Adaptation
Title | Learning Factorized Representations for Open-set Domain Adaptation |
Authors | Mahsa Baktashmotlagh, Masoud Faraki, Tom Drummond, Mathieu Salzmann |
Abstract | Domain adaptation for visual recognition has undergone great progress in the past few years. Nevertheless, most existing methods work in the so-called closed-set scenario, assuming that the classes depicted by the target images are exactly the same as those of the source domain. In this paper, we tackle the more challenging, yet more realistic case of open-set domain adaptation, where new, unknown classes can be present in the target data. While, in the unsupervised scenario, one cannot expect to be able to identify each specific new class, we aim to automatically detect which samples belong to these new classes and discard them from the recognition process. To this end, we rely on the intuition that the source and target samples depicting the known classes can be generated by a shared subspace, whereas the target samples from unknown classes come from a different, private subspace. We therefore introduce a framework that factorizes the data into shared and private parts, while encouraging the shared representation to be discriminative. Our experiments on standard benchmarks evidence that our approach significantly outperforms the state-of-the-art in open-set domain adaptation. |
Tasks | Domain Adaptation |
Published | 2018-05-31 |
URL | http://arxiv.org/abs/1805.12277v1 |
http://arxiv.org/pdf/1805.12277v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-factorized-representations-for-open |
Repo | https://github.com/ChenJinBIT/OSDA |
Framework | tf |
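One way to make the shared/private factorization in the abstract concrete (an illustrative formulation under our own notation, not the paper's exact objective): stack source and target features into $X$ and decompose

$$
\min_{D_s, D_p, Z_s, Z_p} \; \big\| X - D_s Z_s - D_p Z_p \big\|_F^2 + \lambda_1\, \Omega_{\text{disc}}(Z_s) + \lambda_2 \big\| Z_p \big\|_{2,1},
$$

where $D_s$ spans the shared subspace generating the known classes, $D_p$ the private subspace, $\Omega_{\text{disc}}$ is a discriminativeness term on the shared codes, and the column-sparse $\ell_{2,1}$ penalty encourages only unknown-class target samples to use the private part. Target samples dominated by the private component are then flagged as unknown and discarded from recognition.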
Adapted Deep Embeddings: A Synthesis of Methods for $k$-Shot Inductive Transfer Learning
Title | Adapted Deep Embeddings: A Synthesis of Methods for $k$-Shot Inductive Transfer Learning |
Authors | Tyler R. Scott, Karl Ridgeway, Michael C. Mozer |
Abstract | The focus in machine learning has branched beyond training classifiers on a single task to investigating how previously acquired knowledge in a source domain can be leveraged to facilitate learning in a related target domain, known as inductive transfer learning. Three active lines of research have independently explored transfer learning using neural networks. In weight transfer, a model trained on the source domain is used as an initialization point for a network to be trained on the target domain. In deep metric learning, the source domain is used to construct an embedding that captures class structure in both the source and target domains. In few-shot learning, the focus is on generalizing well in the target domain based on a limited number of labeled examples. We compare state-of-the-art methods from these three paradigms and also explore hybrid adapted-embedding methods that use limited target-domain data to fine-tune embeddings constructed from source-domain data. We conduct a systematic comparison of methods in a variety of domains, varying the number of labeled instances available in the target domain ($k$), as well as the number of target-domain classes. We reach three principal conclusions: (1) deep embeddings are far superior, compared to weight transfer, as a starting point for inter-domain transfer or model re-use; (2) our hybrid methods robustly outperform every few-shot learning and every deep metric learning method previously proposed, with a mean error reduction of 34% over state-of-the-art; (3) among loss functions for discovering embeddings, the histogram loss (Ustinova & Lempitsky, 2016) is most robust. We hope our results will motivate a unification of research in weight transfer, deep metric learning, and few-shot learning. |
Tasks | Few-Shot Learning, Metric Learning, Transfer Learning |
Published | 2018-05-22 |
URL | http://arxiv.org/abs/1805.08402v4 |
http://arxiv.org/pdf/1805.08402v4.pdf | |
PWC | https://paperswithcode.com/paper/adapted-deep-embeddings-a-synthesis-of-1 |
Repo | https://github.com/tylersco/adapted_deep_embeddings |
Framework | tf |
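A minimal sketch of the adapted-embedding recipe the abstract describes: fine-tune an embedding on the $k$ labeled target examples, then classify by nearest class centroid in the embedding space. The `embed` callable below is a hypothetical stand-in for the adapted network; the authors' actual pipelines and losses differ.

```python
import numpy as np

def nearest_centroid_predict(embed, support_x, support_y, query_x):
    """k-shot classification on top of a learned embedding.

    embed     : callable mapping an array of inputs to an array of vectors
                (hypothetical stand-in for the adapted deep embedding)
    support_x : labeled target-domain examples (k per class)
    support_y : integer labels for support_x
    query_x   : unlabeled examples to classify
    """
    s = embed(support_x)                      # embed the k-shot support set
    q = embed(query_x)                        # embed the queries
    classes = np.unique(support_y)
    # One centroid per class, averaged over its k support embeddings.
    centroids = np.stack([s[support_y == c].mean(axis=0) for c in classes])
    # Assign each query to the nearest centroid (Euclidean distance).
    d = ((q[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return classes[d.argmin(axis=1)]

# Toy usage with a dummy identity "embedding":
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xs = rng.normal(size=(10, 4))
    ys = np.repeat([0, 1], 5)
    print(nearest_centroid_predict(lambda x: x, xs, ys, xs[:3]))
```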
Modeling Sparse Deviations for Compressed Sensing using Generative Models
Title | Modeling Sparse Deviations for Compressed Sensing using Generative Models |
Authors | Manik Dhar, Aditya Grover, Stefano Ermon |
Abstract | In compressed sensing, a small number of linear measurements can be used to reconstruct an unknown signal. Existing approaches leverage assumptions on the structure of these signals, such as sparsity or the availability of a generative model. A domain-specific generative model can provide a stronger prior and thus allow for recovery with far fewer measurements. However, unlike sparsity-based approaches, existing methods based on generative models guarantee exact recovery only over their support, which is typically only a small subset of the space on which the signals are defined. We propose Sparse-Gen, a framework that allows for sparse deviations from the support set, thereby achieving the best of both worlds by using a domain-specific prior and allowing reconstruction over the full space of signals. Theoretically, our framework provides a new class of signals that can be acquired using compressed sensing, reducing classic sparse vector recovery to a special case and avoiding the restrictive support due to a generative model prior. Empirically, we observe consistent improvements in reconstruction accuracy over competing approaches, especially in the more practical setting of transfer compressed sensing where a generative model for a data-rich, source domain aids sensing on a data-scarce, target domain. |
Tasks | |
Published | 2018-07-04 |
URL | http://arxiv.org/abs/1807.01442v2 |
http://arxiv.org/pdf/1807.01442v2.pdf | |
PWC | https://paperswithcode.com/paper/modeling-sparse-deviations-for-compressed |
Repo | https://github.com/ermongroup/sparse_gen |
Framework | tf |
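A natural way to write the Sparse-Gen recovery problem from the abstract, with measurements $y = Ax$, generator $G$, and a sparse deviation $\nu$ (our notation; the paper's exact formulation may use a constrained variant):

$$
\hat{x} = G(\hat{z}) + \hat{\nu}, \qquad (\hat{z}, \hat{\nu}) \in \arg\min_{z,\,\nu} \; \big\| A\,(G(z) + \nu) - y \big\|_2^2 + \lambda \big\| \nu \big\|_1 .
$$

Taking $\lambda \to \infty$ forces $\nu = 0$ and recovers plain generator-range recovery, while dropping $G$ reduces to classic $\ell_1$ sparse recovery; this interpolation is the "best of both worlds" trade-off the abstract claims.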
Analysis of Hand Segmentation in the Wild
Title | Analysis of Hand Segmentation in the Wild |
Authors | Aisha Urooj Khan, Ali Borji |
Abstract | A large number of works in egocentric vision have concentrated on action and object recognition. Detection and segmentation of hands in first-person videos, however, has been explored less. For many applications in this domain, it is necessary to accurately segment not only the hands of the camera wearer but also the hands of others with whom they are interacting. Here, we take an in-depth look at the hand segmentation problem. In the quest for robust hand segmentation methods, we evaluated the performance of state-of-the-art semantic segmentation methods, off-the-shelf and fine-tuned, on existing datasets. We fine-tune RefineNet, a leading semantic segmentation method, for hand segmentation and find that it does much better than the best contenders. Existing hand segmentation datasets are collected in laboratory settings. To overcome this limitation, we contribute by collecting two new datasets: a) EgoYouTubeHands, including egocentric videos containing hands in the wild, and b) HandOverFace, to analyze the performance of our models in the presence of similar-appearance occlusions. We further explore whether conditional random fields can help refine generated hand segmentations. To demonstrate the benefit of accurate hand maps, we train a CNN for hand-based activity recognition and achieve higher accuracy when the CNN is trained using hand maps produced by the fine-tuned RefineNet. Finally, we annotate a subset of the EgoHands dataset for fine-grained action recognition and show that an accuracy of 58.6% can be achieved by just looking at a single hand pose, which is much better than the chance level (12.5%). |
Tasks | Activity Recognition, Hand Segmentation, Semantic Segmentation, Temporal Action Localization |
Published | 2018-03-08 |
URL | http://arxiv.org/abs/1803.03317v2 |
http://arxiv.org/pdf/1803.03317v2.pdf | |
PWC | https://paperswithcode.com/paper/analysis-of-hand-segmentation-in-the-wild |
Repo | https://github.com/aurooj/Hand-Segmentation-in-the-Wild |
Framework | none |
Decoupled Classification Refinement: Hard False Positive Suppression for Object Detection
Title | Decoupled Classification Refinement: Hard False Positive Suppression for Object Detection |
Authors | Bowen Cheng, Yunchao Wei, Honghui Shi, Rogerio Feris, Jinjun Xiong, Thomas Huang |
Abstract | In this paper, we analyze failure cases of state-of-the-art detectors and observe that most hard false positives result from classification instead of localization, and that they have a large negative impact on the performance of object detectors. We conjecture there are three factors: (1) the shared feature representation is not optimal due to the mismatched goals of feature learning for classification and localization; (2) multi-task learning helps, yet optimization of the multi-task loss may result in sub-optimal solutions for individual tasks; (3) a large receptive field for different scales leads to redundant context information for small objects. We demonstrate the potential of detector classification power by a simple, effective, and widely applicable Decoupled Classification Refinement (DCR) network. In particular, DCR places a separate classification network in parallel with the localization network (base detector). With ROI Pooling placed at an early stage of the classification network, we enforce an adaptive receptive field in DCR. During training, DCR samples hard false positives from the base detector and trains a strong classifier to refine classification results. During testing, DCR refines all boxes from the base detector. Experiments show competitive results on PASCAL VOC and COCO without any bells and whistles. Our code is available at: https://github.com/bowenc0221/Decoupled-Classification-Refinement. |
Tasks | Multi-Task Learning, Object Detection |
Published | 2018-10-05 |
URL | http://arxiv.org/abs/1810.04002v1 |
http://arxiv.org/pdf/1810.04002v1.pdf | |
PWC | https://paperswithcode.com/paper/decoupled-classification-refinement-hard |
Repo | https://github.com/bowenc0221/Decoupled-Classification-Refinement |
Framework | tf |
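The test-time behaviour described in the abstract, where DCR re-scores every box from the base detector, can be sketched as below. The score-fusion rule (multiplying detector and DCR confidences) and the `dcr_classifier` interface are our assumptions for illustration, not taken from the released code.

```python
def refine_detections(detections, dcr_classifier, image):
    """Test-time DCR refinement sketch (not the authors' exact code).

    detections     : list of (box, label, score) from the base detector
    dcr_classifier : callable (image, box) -> per-class probabilities;
                     a hypothetical stand-in for the DCR network, which
                     has its own (adaptive) receptive field
    """
    refined = []
    for box, label, score in detections:
        probs = dcr_classifier(image, box)   # re-classify the cropped region
        # Assumption: fuse by multiplying detector and DCR confidences,
        # so confidently wrong detector boxes get suppressed.
        refined.append((box, label, score * probs[label]))
    return refined
```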
Taming VAEs
Title | Taming VAEs |
Authors | Danilo Jimenez Rezende, Fabio Viola |
Abstract | In spite of remarkable progress in deep latent variable generative modeling, training still remains a challenge due to a combination of optimization and generalization issues. In practice, a combination of heuristic algorithms (such as hand-crafted annealing of KL-terms) is often used in order to achieve the desired results, but such solutions are not robust to changes in model architecture or dataset. The best settings can often vary dramatically from one problem to another, which requires doing expensive parameter sweeps for each new case. Here we build on the idea of training VAEs with additional constraints as a way to control their behaviour. We first present a detailed theoretical analysis of constrained VAEs, expanding our understanding of how these models work. We then introduce and analyze a practical algorithm termed Generalized ELBO with Constrained Optimization (GECO). The main advantage of GECO for the machine learning practitioner is a more intuitive, yet principled, process of tuning the loss. This involves defining a set of constraints, which typically have an explicit relation to the desired model performance, in contrast to tweaking abstract hyper-parameters which implicitly affect the model behavior. Encouraging experimental results on several standard datasets indicate that GECO is a very robust and effective tool for balancing reconstruction and compression constraints. |
Tasks | |
Published | 2018-10-01 |
URL | http://arxiv.org/abs/1810.00597v1 |
http://arxiv.org/pdf/1810.00597v1.pdf | |
PWC | https://paperswithcode.com/paper/taming-vaes |
Repo | https://github.com/denproc/Taming-VAEs |
Framework | pytorch |
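The GECO idea, roughly: replace the hand-tuned KL weight with a Lagrange multiplier on an explicit constraint (e.g., reconstruction error below a tolerance $\kappa$), and adapt the multiplier from a moving average of the constraint value. A sketch of one multiplier update, with assumed hyper-parameter names and values:

```python
import numpy as np

def geco_lambda_update(lmbda, c_ma, constraint, alpha=0.99, nu=1e-2):
    """One GECO-style multiplier update (sketch under assumed settings).

    lmbda      : current Lagrange multiplier for the constraint term
    c_ma       : running average of the constraint value
    constraint : current batch constraint, e.g. reconstruction_error - kappa
    """
    c_ma = alpha * c_ma + (1 - alpha) * constraint   # smooth the constraint
    lmbda = lmbda * np.exp(nu * c_ma)                # grow lambda while violated
    return lmbda, c_ma
```

The training loss at each step is then `kl_term + lmbda * constraint`, so the multiplier increases while the constraint is violated and decays once reconstructions are good enough, removing the need for hand-crafted KL annealing.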
Efficient and accurate inversion of multiple scattering with deep learning
Title | Efficient and accurate inversion of multiple scattering with deep learning |
Authors | Yu Sun, Zhihao Xia, Ulugbek S. Kamilov |
Abstract | Image reconstruction under multiple light scattering is crucial in a number of applications such as diffraction tomography. The reconstruction problem is often formulated as a nonconvex optimization, where a nonlinear measurement model is used to account for multiple scattering and regularization is used to enforce prior constraints on the object. In this paper, we propose a powerful alternative to this optimization-based view of image reconstruction by designing and training a deep convolutional neural network that can invert multiple scattered measurements to produce a high-quality image of the refractive index. Our results on both simulated and experimental datasets show that the proposed approach is substantially faster and achieves higher imaging quality compared to the state-of-the-art methods based on optimization. |
Tasks | Image Reconstruction |
Published | 2018-03-18 |
URL | http://arxiv.org/abs/1803.06594v2 |
http://arxiv.org/pdf/1803.06594v2.pdf | |
PWC | https://paperswithcode.com/paper/efficient-and-accurate-inversion-of-multiple |
Repo | https://github.com/zouhanrui/2dunet |
Framework | tf |
Protecting Sensory Data against Sensitive Inferences
Title | Protecting Sensory Data against Sensitive Inferences |
Authors | Mohammad Malekzadeh, Richard G. Clegg, Andrea Cavallaro, Hamed Haddadi |
Abstract | There is growing concern about how personal data are used when users grant applications direct access to the sensors of their mobile devices. In fact, high-resolution temporal data generated by motion sensors directly reflect the activities of a user and indirectly reflect physical and demographic attributes. In this paper, we propose a feature learning architecture for mobile devices that provides flexible and negotiable privacy-preserving sensor data transmission by appropriately transforming raw sensor data. The objective is to move from the current binary setting of granting or denying an application permission, toward a model that allows users to grant each application permission over a limited range of inferences according to the provided services. The internal structure of each component of the proposed architecture can be flexibly changed, and the trade-off between privacy and utility can be negotiated between the constraints of the user and the underlying application. We validated the proposed architecture in an activity recognition application using two real-world datasets, with the objective of recognizing an activity without disclosing gender as an example of private information. Results show that the proposed framework maintains the usefulness of the transformed data for activity recognition, with an average loss of only around three percentage points, while reducing the possibility of gender classification from more than 90% when using raw sensor data to around 50%, i.e., a random guess. We also present and distribute MotionSense, a new dataset for activity and attribute recognition collected from motion sensors. |
Tasks | Activity Recognition |
Published | 2018-02-21 |
URL | http://arxiv.org/abs/1802.07802v4 |
http://arxiv.org/pdf/1802.07802v4.pdf | |
PWC | https://paperswithcode.com/paper/protecting-sensory-data-against-sensitive |
Repo | https://github.com/mmalekzadeh/motion-sense |
Framework | none |
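In spirit, the transformation network is trained to keep task-relevant information while pushing the sensitive inference toward chance; an illustrative objective in that vein (our formulation, not the paper's exact losses or architecture) is

$$
\min_{\theta}\; \mathcal{L}_{\text{task}}\big(f_\theta(x),\, y_{\text{activity}}\big) \;-\; \lambda\, \mathcal{L}_{\text{sens}}\big(g(f_\theta(x)),\, y_{\text{gender}}\big),
$$

where $f_\theta$ transforms the raw sensor stream, $g$ is a classifier probing the sensitive attribute, and $\lambda$ encodes the negotiated privacy/utility trade-off between user and application.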
Strict Very Fast Decision Tree: a memory conservative algorithm for data stream mining
Title | Strict Very Fast Decision Tree: a memory conservative algorithm for data stream mining |
Authors | Victor Guilherme Turrisi da Costa, André Carlos Ponce de Leon Ferreira de Carvalho, Sylvio Barbon Junior |
Abstract | Dealing with memory and time constraints is a current challenge when learning from data streams with a massive amount of data. Many algorithms have been proposed to handle these difficulties, among them the Very Fast Decision Tree (VFDT) algorithm. Although the VFDT has been widely used in data stream mining, in recent years several authors have suggested modifications to increase its performance, putting aside memory concerns by proposing memory-costly solutions. Besides, most data stream mining solutions have been centred around ensembles, which combine the memory costs of their weak learners, usually VFDTs. To reduce the memory cost while keeping the predictive performance, this study proposes the Strict VFDT (SVFDT), a novel algorithm based on the VFDT. The SVFDT algorithm minimises unnecessary tree growth, substantially reducing memory usage while keeping competitive predictive performance. Moreover, since it creates much shallower trees than the VFDT, the SVFDT can achieve shorter processing times. Experiments were carried out comparing the SVFDT with the VFDT on 11 benchmark data stream datasets. This comparison assessed the trade-off between accuracy, memory, and processing time. Statistical analysis showed that the proposed algorithm obtained similar predictive performance and significantly reduced processing time and memory use. Thus, the SVFDT is a suitable option for data stream mining with memory and time limitations, recommended as a weak learner in ensemble-based solutions. |
Tasks | |
Published | 2018-05-16 |
URL | http://arxiv.org/abs/1805.06368v2 |
http://arxiv.org/pdf/1805.06368v2.pdf | |
PWC | https://paperswithcode.com/paper/strict-very-fast-decision-tree-a-memory |
Repo | https://github.com/vturrisi/pystream |
Framework | none |
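For context, the split test that both the VFDT and the SVFDT inherit is the Hoeffding bound; per the abstract, the SVFDT's contribution is to layer stricter acceptance conditions on top so the tree grows only when clearly worthwhile. A sketch of the shared core:

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound epsilon used in VFDT-style split decisions.

    value_range : range R of the split heuristic (e.g. log2(n_classes)
                  for information gain)
    delta       : allowed probability of choosing the wrong attribute
    n           : number of examples seen at the leaf
    """
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def should_split(gain_best, gain_second, value_range, delta, n):
    # VFDT splits when the best attribute beats the runner-up by more
    # than epsilon; the SVFDT adds stricter conditions on top of this.
    return (gain_best - gain_second) > hoeffding_bound(value_range, delta, n)
```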
Technical Report: Adjudication of Coreference Annotations via Answer Set Optimization
Title | Technical Report: Adjudication of Coreference Annotations via Answer Set Optimization |
Authors | Peter Schüller |
Abstract | We describe the first automatic approach for merging coreference annotations obtained from multiple annotators into a single gold standard. This merging is subject to certain linguistic hard constraints and optimization criteria that prefer solutions with minimal divergence from annotators. The representation involves an equivalence relation over a large number of elements. We use Answer Set Programming to describe two representations of the problem and four objective functions suitable for different datasets. We provide two structurally different real-world benchmark datasets based on the METU-Sabanci Turkish Treebank and we report our experiences in using the Gringo, Clasp, and Wasp tools for computing optimal adjudication results on these datasets. |
Tasks | |
Published | 2018-01-31 |
URL | http://arxiv.org/abs/1802.00033v1 |
http://arxiv.org/pdf/1802.00033v1.pdf | |
PWC | https://paperswithcode.com/paper/technical-report-adjudication-of-coreference |
Repo | https://github.com/knowlp/caspr-coreference-tool |
Framework | none |
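The optimization criteria mentioned in the abstract prefer adjudications that diverge minimally from the annotators. As a rough illustration of one such criterion (the paper's ASP encodings and four objective functions differ in detail), one can count mention pairs whose coreference-link decision in a candidate partition contradicts an annotator:

```python
from itertools import combinations

def pairwise_divergence(candidate, annotations):
    """Score a candidate adjudication against annotator partitions.

    candidate, annotations[i] : dicts mapping mention -> chain id,
                                over the same set of mentions.
    Counts mention pairs on which the candidate's link decision
    (same chain vs. different chains) disagrees with an annotator.
    """
    mentions = sorted(candidate)
    cost = 0
    for ann in annotations:
        for a, b in combinations(mentions, 2):
            if (candidate[a] == candidate[b]) != (ann[a] == ann[b]):
                cost += 1
    return cost
```

An ASP solver such as Clasp or Wasp searches over candidate equivalence relations satisfying the linguistic hard constraints and returns one minimising such a cost.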
Infrared and Visible Image Fusion with ResNet and zero-phase component analysis
Title | Infrared and Visible Image Fusion with ResNet and zero-phase component analysis |
Authors | Hui Li, Xiao-Jun Wu, Tariq S. Durrani |
Abstract | Feature extraction and processing tasks play a key role in image fusion, and fusion performance is directly affected by the features and processing methods used. By contrast, most deep learning-based methods use deep features directly, without feature extraction or processing. This leads to degraded fusion performance in some cases. To address these drawbacks, we propose in this paper a novel fusion framework based on deep features and zero-phase component analysis (ZCA). Firstly, a residual network (ResNet) is used to extract deep features from the source images. Then ZCA is utilized to normalize the deep features and obtain initial weight maps. The final weight maps are obtained by applying a soft-max operation to the initial weight maps. Finally, the fused image is reconstructed using a weighted-averaging strategy. Compared with existing fusion methods, experimental results demonstrate that the proposed framework achieves better performance in both objective assessment and visual quality. The code of our fusion algorithm is available at https://github.com/hli1221/imagefusion_resnet50 |
Tasks | Infrared And Visible Image Fusion |
Published | 2018-06-19 |
URL | https://arxiv.org/abs/1806.07119v7 |
https://arxiv.org/pdf/1806.07119v7.pdf | |
PWC | https://paperswithcode.com/paper/infrared-and-visible-image-fusion-with-resnet |
Repo | https://github.com/hli1221/imagefusion_resnet50 |
Framework | none |
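The ZCA normalization step in this pipeline is standard and worth spelling out; a minimal NumPy sketch (in the paper it is applied to ResNet feature maps before the soft-max weight maps are derived):

```python
import numpy as np

def zca_whiten(features, eps=1e-5):
    """ZCA-whiten a set of deep feature vectors.

    features : (n, d) array, e.g. flattened ResNet feature maps.
    Returns features decorrelated by W = U diag(1/sqrt(s+eps)) U^T,
    the whitening transform closest to the identity (zero phase).
    """
    x = features - features.mean(axis=0)        # center the features
    cov = x.T @ x / (x.shape[0] - 1)            # empirical covariance
    u, s, _ = np.linalg.svd(cov)                # eigendecomposition
    w = u @ np.diag(1.0 / np.sqrt(s + eps)) @ u.T
    return x @ w
```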
On the Computation of Kantorovich-Wasserstein Distances between 2D-Histograms by Uncapacitated Minimum Cost Flows
Title | On the Computation of Kantorovich-Wasserstein Distances between 2D-Histograms by Uncapacitated Minimum Cost Flows |
Authors | Federico Bassetti, Stefano Gualandi, Marco Veneroni |
Abstract | In this work, we present a method to compute the Kantorovich-Wasserstein distance of order one between a pair of two-dimensional histograms. Recent works in Computer Vision and Machine Learning have shown the benefits of measuring Wasserstein distances of order one between histograms with $n$ bins by solving a classical transportation problem on very large complete bipartite graphs with $n$ nodes and $n^2$ edges. The main contribution of our work is to approximate the original transportation problem by an uncapacitated min cost flow problem on a reduced flow network of size $O(n)$ that exploits the geometric structure of the cost function. More precisely, when the distance among the bin centers is measured with the 1-norm or the $\infty$-norm, our approach provides an optimal solution. When the distance among bins is measured with the 2-norm: (i) we derive a quantitative estimate on the error between the optimal and approximate solutions; (ii) given the error, we construct a reduced flow network of size $O(n)$. We numerically show the benefits of our approach by computing Wasserstein distances of order one on a set of grey-scale images used as a benchmark in the literature. We show how our approach scales with the size of the images with 1-norm, 2-norm and $\infty$-norm ground distances, and we compare it with two other methods that are widely used in the literature. |
Tasks | |
Published | 2018-04-02 |
URL | https://arxiv.org/abs/1804.00445v3 |
https://arxiv.org/pdf/1804.00445v3.pdf | |
PWC | https://paperswithcode.com/paper/on-the-computation-of-kantorovich-wasserstein |
Repo | https://github.com/stegua/dotlib |
Framework | none |
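For reference, the order-one Kantorovich problem between histograms $\mu, \nu$ with bin centers $x_i$ is the transportation LP

$$
W_1(\mu,\nu) = \min_{\pi \ge 0}\; \sum_{i=1}^{n}\sum_{j=1}^{n} \pi_{ij}\, d(x_i, x_j) \quad \text{s.t.}\quad \sum_{j} \pi_{ij} = \mu_i, \;\; \sum_{i} \pi_{ij} = \nu_j,
$$

whose naive network has $n^2$ edges. The paper's contribution is replacing this with an uncapacitated min-cost flow on an $O(n)$ network that exploits the grid geometry of the ground distance $d$, exactly for the 1-norm and $\infty$-norm and with a quantified error for the 2-norm.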
Neural Code Comprehension: A Learnable Representation of Code Semantics
Title | Neural Code Comprehension: A Learnable Representation of Code Semantics |
Authors | Tal Ben-Nun, Alice Shoshana Jakobovits, Torsten Hoefler |
Abstract | With the recent success of embeddings in natural language processing, research has been conducted into applying similar methods to code analysis. Most works attempt to process the code directly or use a syntactic tree representation, treating it like sentences written in a natural language. However, none of the existing methods are sufficient to comprehend program semantics robustly, due to structural features such as function calls, branching, and interchangeable order of statements. In this paper, we propose a novel processing technique to learn code semantics, and apply it to a variety of program analysis tasks. In particular, we stipulate that a robust distributional hypothesis of code applies to both human- and machine-generated programs. Following this hypothesis, we define an embedding space, inst2vec, based on an Intermediate Representation (IR) of the code that is independent of the source programming language. We provide a novel definition of contextual flow for this IR, leveraging both the underlying data- and control-flow of the program. We then analyze the embeddings qualitatively using analogies and clustering, and evaluate the learned representation on three different high-level tasks. We show that even without fine-tuning, a single RNN architecture and fixed inst2vec embeddings outperform specialized approaches for performance prediction (compute device mapping, optimal thread coarsening); and algorithm classification from raw code (104 classes), where we set a new state-of-the-art. |
Tasks | |
Published | 2018-06-19 |
URL | http://arxiv.org/abs/1806.07336v3 |
http://arxiv.org/pdf/1806.07336v3.pdf | |
PWC | https://paperswithcode.com/paper/neural-code-comprehension-a-learnable |
Repo | https://github.com/spcl/ncc |
Framework | tf |
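The qualitative analogy probes mentioned in the abstract follow the usual embedding-arithmetic pattern; a generic sketch of such a query (not tied to the inst2vec release) is:

```python
import numpy as np

def analogy(emb, vocab, a, b, c, topk=1):
    """Answer 'a is to b as c is to ?' in an embedding space.

    emb   : (V, d) matrix of embeddings, e.g. one row per IR statement
    vocab : list of V strings indexing emb's rows
    The query vector is emb[b] - emb[a] + emb[c]; the nearest rows by
    cosine similarity are returned, excluding the query terms.
    """
    idx = {t: i for i, t in enumerate(vocab)}
    q = emb[idx[b]] - emb[idx[a]] + emb[idx[c]]
    sims = emb @ q / (np.linalg.norm(emb, axis=1) * np.linalg.norm(q) + 1e-12)
    order = [i for i in np.argsort(-sims) if vocab[i] not in (a, b, c)]
    return [vocab[i] for i in order[:topk]]
```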
Improving Semantic Segmentation via Video Propagation and Label Relaxation
Title | Improving Semantic Segmentation via Video Propagation and Label Relaxation |
Authors | Yi Zhu, Karan Sapra, Fitsum A. Reda, Kevin J. Shih, Shawn Newsam, Andrew Tao, Bryan Catanzaro |
Abstract | Semantic segmentation requires large amounts of pixel-wise annotations to learn accurate models. In this paper, we present a video prediction-based methodology to scale up training sets by synthesizing new training samples in order to improve the accuracy of semantic segmentation networks. We exploit video prediction models’ ability to predict future frames in order to also predict future labels. A joint propagation strategy is also proposed to alleviate mis-alignments in synthesized samples. We demonstrate that training segmentation models on datasets augmented by the synthesized samples leads to significant improvements in accuracy. Furthermore, we introduce a novel boundary label relaxation technique that makes training robust to annotation noise and propagation artifacts along object boundaries. Our proposed methods achieve state-of-the-art mIoUs of 83.5% on Cityscapes and 82.9% on CamVid. Our single model, without model ensembles, achieves 72.8% mIoU on the KITTI semantic segmentation test set, which surpasses the winning entry of the ROB challenge 2018. Our code and videos can be found at https://nv-adlr.github.io/publication/2018-Segmentation. |
Tasks | Semantic Segmentation, Video Prediction |
Published | 2018-12-04 |
URL | https://arxiv.org/abs/1812.01593v3 |
https://arxiv.org/pdf/1812.01593v3.pdf | |
PWC | https://paperswithcode.com/paper/improving-semantic-segmentation-via-video |
Repo | https://github.com/NVIDIA/semantic-segmentation |
Framework | pytorch |
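The boundary label relaxation can be read as follows: rather than forcing a single hard label on pixels near object boundaries, score the total probability mass on the set of classes present in the pixel's neighbourhood. A NumPy illustration under that reading of the abstract (the released implementation differs in detail):

```python
import numpy as np

def relaxed_boundary_loss(log_probs, neighbourhood_classes):
    """Boundary label relaxation sketch (not the authors' code).

    log_probs             : (C, H, W) per-pixel class log-probabilities
    neighbourhood_classes : (C, H, W) boolean, True where class c appears
                            in the pixel's boundary neighbourhood (just
                            the ground-truth class for interior pixels)
    Each pixel is scored by the union of admissible classes:
        loss = -log( sum over admissible c of P(c | pixel) )
    which tolerates annotation noise and propagation artifacts at edges.
    """
    probs = np.exp(log_probs)
    union = (probs * neighbourhood_classes).sum(axis=0)   # (H, W)
    return -np.log(np.clip(union, 1e-12, 1.0)).mean()
```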