Paper Group AWR 161
Graph Learning from Filtered Signals: Graph System and Diffusion Kernel Identification
Title | Graph Learning from Filtered Signals: Graph System and Diffusion Kernel Identification |
Authors | Hilmi E. Egilmez, Eduardo Pavez, Antonio Ortega |
Abstract | This paper introduces a novel graph signal processing framework for building graph-based models from classes of filtered signals. In our framework, graph-based modeling is formulated as a graph system identification problem, where the goal is to learn a weighted graph (a graph Laplacian matrix) and a graph-based filter (a function of graph Laplacian matrices). In order to solve the proposed problem, an algorithm is developed to jointly identify a graph and a graph-based filter (GBF) from multiple signal/data observations. Our algorithm is valid under the assumption that GBFs are one-to-one functions. The proposed approach can be applied to learn diffusion (heat) kernels, which are popular in various fields for modeling diffusion processes. In addition, for specific choices of graph-based filters, the proposed problem reduces to a graph Laplacian estimation problem. Our experimental results demonstrate that the proposed algorithm outperforms the current state-of-the-art methods. We also apply our framework to a real climate dataset to model temperature signals. |
Tasks | |
Published | 2018-03-07 |
URL | http://arxiv.org/abs/1803.02553v1 |
http://arxiv.org/pdf/1803.02553v1.pdf | |
PWC | https://paperswithcode.com/paper/graph-learning-from-filtered-signals-graph |
Repo | https://github.com/STAC-USC/Graph_Learning |
Framework | none |
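As a rough illustration of the kind of graph-based filter the paper learns, the sketch below applies a diffusion (heat) kernel h(L) = exp(-tau L), one example of a one-to-one function of the graph Laplacian, to a signal on a toy graph. The graph, the diffusion time tau, and the impulse signal are made up for illustration; this is not the authors' identification algorithm, only the forward filtering model it assumes.

```python
import numpy as np
from scipy.linalg import expm

# Adjacency matrix of a toy 4-node weighted graph (assumed for illustration).
W = np.array([[0, 1, 0, 0],
              [1, 0, 2, 0],
              [0, 2, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W      # combinatorial graph Laplacian
tau = 0.5                           # hypothetical diffusion time

H = expm(-tau * L)                  # diffusion kernel h(L) = exp(-tau * L)
x = np.array([1.0, 0.0, 0.0, 0.0])  # impulse signal on vertex 0
y = H @ x                           # filtered (diffused) signal
print(y)                            # mass spreads along the graph edges
```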
AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods
Title | AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods |
Authors | Zhiming Zhou, Qingru Zhang, Guansong Lu, Hongwei Wang, Weinan Zhang, Yong Yu |
Abstract | Adam has been shown to fail to converge to the optimal solution in certain cases. Researchers have recently proposed several algorithms to avoid this non-convergence issue, but their efficiency turns out to be unsatisfactory in practice. In this paper, we provide new insight into the non-convergence issue of Adam as well as other adaptive learning rate methods. We argue that there exists an inappropriate correlation between the gradient $g_t$ and the second-moment term $v_t$ in Adam ($t$ is the timestep), which means that a large gradient is likely to have a small step size while a small gradient may have a large step size. We demonstrate that such biased step sizes are the fundamental cause of Adam's non-convergence, and we further prove that decorrelating $v_t$ and $g_t$ leads to an unbiased step size for each gradient, thus solving the non-convergence problem of Adam. Finally, we propose AdaShift, a novel adaptive learning rate method that decorrelates $v_t$ and $g_t$ by temporal shifting, i.e., using the temporally shifted gradient $g_{t-n}$ to calculate $v_t$. The experimental results demonstrate that AdaShift addresses the non-convergence issue of Adam while maintaining performance competitive with Adam in terms of both training speed and generalization. |
Tasks | |
Published | 2018-09-29 |
URL | https://arxiv.org/abs/1810.00143v4 |
https://arxiv.org/pdf/1810.00143v4.pdf | |
PWC | https://paperswithcode.com/paper/adashift-decorrelation-and-convergence-of |
Repo | https://github.com/ZhimingZhou/AdaShift-Lipschitz-GANs-MaxGP |
Framework | tf |
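The sketch below illustrates the temporal-shift idea in one dimension: the second-moment term v_t is accumulated from the shifted gradient g_{t-n}, while the step itself uses the current gradient g_t. This is a simplified reading of the abstract, not the published AdaShift algorithm, which also includes a moving window and a block-wise spatial operation; the hyperparameter values are arbitrary.

```python
import numpy as np

def adashift_like_update(grads, lr=0.01, beta2=0.999, n=1, eps=1e-8):
    """Toy 1-D illustration of decorrelating v_t from g_t by temporal shifting."""
    theta, v = 0.0, 0.0
    for t, g_t in enumerate(grads):
        if t < n:                  # not enough history yet: skip the update
            continue
        g_shift = grads[t - n]     # temporally shifted gradient g_{t-n}
        v = beta2 * v + (1 - beta2) * g_shift ** 2
        theta -= lr * g_t / (np.sqrt(v) + eps)   # step size uses the current g_t
    return theta

print(adashift_like_update([1.0, -0.5, 0.3, 0.9, -0.2]))
```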
Unseen Class Discovery in Open-world Classification
Title | Unseen Class Discovery in Open-world Classification |
Authors | Lei Shu, Hu Xu, Bing Liu |
Abstract | This paper concerns open-world classification, where the classifier not only needs to classify test examples into seen classes that have appeared in training but also reject examples from unseen or novel classes that have not appeared in training. Specifically, this paper focuses on discovering the hidden unseen classes of the rejected examples. Clearly, without prior knowledge this is difficult. However, we do have the data from the seen training classes, which can tell us what kind of similarity/difference is expected for examples from the same class or from different classes. It is reasonable to assume that this knowledge can be transferred to the rejected examples and used to discover the hidden unseen classes in them. This paper aims to solve this problem. It first proposes a joint open classification model with a sub-model for classifying whether a pair of examples belongs to the same or different classes. This sub-model can serve as a distance function for clustering to discover the hidden classes of the rejected examples. Experimental results show that the proposed model is highly promising. |
Tasks | |
Published | 2018-01-17 |
URL | http://arxiv.org/abs/1801.05609v1 |
http://arxiv.org/pdf/1801.05609v1.pdf | |
PWC | https://paperswithcode.com/paper/unseen-class-discovery-in-open-world |
Repo | https://github.com/leishu02/EMNLP2017_DOC |
Framework | none |
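To make the "pairwise sub-model as a distance function" idea concrete, here is a minimal sketch: a (here faked) pairwise same-class probability is turned into a distance matrix over the rejected examples, which is then fed to hierarchical clustering to discover hidden classes. The function `pairwise_same_class_prob` is a hypothetical stand-in for the paper's learned sub-model, and the number of clusters is assumed known for simplicity.

```python
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

def pairwise_same_class_prob(a, b):
    # Placeholder for the learned pairwise sub-model: returns P(same class).
    return np.exp(-np.linalg.norm(a - b))

X = np.array([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]])  # rejected examples
n = len(X)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = 1.0 - pairwise_same_class_prob(X[i], X[j])

Z = linkage(squareform(D), method="average")      # hierarchical clustering on the distances
labels = fcluster(Z, t=2, criterion="maxclust")   # assume two hidden classes
print(labels)                                     # e.g. [1 1 2 2]
```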
What can AI do for me: Evaluating Machine Learning Interpretations in Cooperative Play
Title | What can AI do for me: Evaluating Machine Learning Interpretations in Cooperative Play |
Authors | Shi Feng, Jordan Boyd-Graber |
Abstract | Machine learning is an important tool for decision making, but its ethical and responsible application requires rigorous vetting of its interpretability and utility: an understudied problem, particularly for natural language processing models. We propose an evaluation of interpretation on a real task with real human users, where the effectiveness of interpretation is measured by how much it improves human performance. We design a grounded, realistic human-computer cooperative setting using a question answering task, Quizbowl. We recruit both trivia experts and novices to play this game with a computer as their teammate, which communicates its prediction via three different interpretations. We also provide design guidance for natural language processing human-in-the-loop settings. |
Tasks | Decision Making, Question Answering |
Published | 2018-10-23 |
URL | https://arxiv.org/abs/1810.09648v3 |
https://arxiv.org/pdf/1810.09648v3.pdf | |
PWC | https://paperswithcode.com/paper/what-can-ai-do-for-me-evaluating-machine |
Repo | https://github.com/Eric-Wallace/qb_interface |
Framework | none |
Diverse Image-to-Image Translation via Disentangled Representations
Title | Diverse Image-to-Image Translation via Disentangled Representations |
Authors | Hsin-Ying Lee, Hung-Yu Tseng, Jia-Bin Huang, Maneesh Kumar Singh, Ming-Hsuan Yang |
Abstract | Image-to-image translation aims to learn the mapping between two visual domains. There are two main challenges for many applications: 1) the lack of aligned training pairs and 2) multiple possible outputs from a single input image. In this work, we present an approach based on disentangled representation for producing diverse outputs without paired training images. To achieve diversity, we propose to embed images onto two spaces: a domain-invariant content space capturing shared information across domains and a domain-specific attribute space. Our model takes the encoded content features extracted from a given input and the attribute vectors sampled from the attribute space to produce diverse outputs at test time. To handle unpaired training data, we introduce a novel cross-cycle consistency loss based on disentangled representations. Qualitative results show that our model can generate diverse and realistic images on a wide range of tasks without paired training data. For quantitative comparisons, we measure realism with a user study and diversity with a perceptual distance metric. We apply the proposed model to domain adaptation and show competitive performance when compared to the state-of-the-art on the MNIST-M and the LineMod datasets. |
Tasks | Domain Adaptation, Image-to-Image Translation, Synthetic-to-Real Translation |
Published | 2018-08-02 |
URL | http://arxiv.org/abs/1808.00948v1 |
http://arxiv.org/pdf/1808.00948v1.pdf | |
PWC | https://paperswithcode.com/paper/diverse-image-to-image-translation-via |
Repo | https://github.com/taki0112/DRIT-Tensorflow |
Framework | tf |
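The cross-cycle consistency loss can be sketched with toy stand-ins: encode content and attribute, swap attributes across the two domains to translate, then swap back and require the round trip to reconstruct the originals. The "encoders" and "generator" below are trivial slicing/concatenation functions chosen only to make the data flow runnable; they are not the DRIT networks.

```python
import numpy as np

rng = np.random.default_rng(0)
E_c = lambda x: x[:4]                        # toy content encoder (shared space)
E_a = lambda x: x[4:]                        # toy attribute encoder (domain-specific)
G   = lambda c, a: np.concatenate([c, a])    # toy generator

x, y = rng.normal(size=8), rng.normal(size=8)  # unpaired "images" from domains A and B

# First translation: exchange attributes across domains.
u = G(E_c(x), E_a(y))        # content of x with attribute of y
v = G(E_c(y), E_a(x))        # content of y with attribute of x

# Second translation: exchange attributes back.
x_hat = G(E_c(u), E_a(v))
y_hat = G(E_c(v), E_a(u))

# Cross-cycle consistency loss: the round trip should recover the originals.
loss = np.abs(x_hat - x).mean() + np.abs(y_hat - y).mean()
print(loss)   # exactly 0 here because the toy modules invert each other
```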
A Systematic Evaluation of Recent Deep Learning Architectures for Fine-Grained Vehicle Classification
Title | A Systematic Evaluation of Recent Deep Learning Architectures for Fine-Grained Vehicle Classification |
Authors | Krassimir Valev, Arne Schumann, Lars Sommer, Jürgen Beyerer |
Abstract | Fine-grained vehicle classification is the task of classifying make, model, and year of a vehicle. This is a very challenging task, because vehicles of different types but similar color and viewpoint can often look much more similar than vehicles of the same type but differing color and viewpoint. Vehicle make, model, and year, in combination with vehicle color, are of importance in several applications such as vehicle search, re-identification, tracking, and traffic analysis. In this work we investigate the suitability of several recent landmark convolutional neural network (CNN) architectures, which have shown top results on large scale image classification tasks, for the task of fine-grained classification of vehicles. We compare the performance of the networks VGG16, several ResNets, Inception architectures, the recent DenseNets, and MobileNet. For classification we use the Stanford Cars-196 dataset which features 196 different types of vehicles. We investigate several aspects of CNN training, such as data augmentation and training from scratch vs. fine-tuning. Importantly, we introduce no aspects in the architectures or training process which are specific to vehicle classification. Our final model achieves a state-of-the-art classification accuracy of 94.6%, outperforming all related works, even approaches which are specifically tailored for the task, e.g. by including vehicle part detections. |
Tasks | Data Augmentation, Image Classification |
Published | 2018-06-08 |
URL | http://arxiv.org/abs/1806.02987v1 |
http://arxiv.org/pdf/1806.02987v1.pdf | |
PWC | https://paperswithcode.com/paper/a-systematic-evaluation-of-recent-deep |
Repo | https://github.com/OrkhanHI/Grab-AI-Computer-Vision-Challenge |
Framework | pytorch |
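A minimal sketch of the fine-tuning setup the paper evaluates: take an ImageNet-pretrained backbone (here ResNet-50, one of the compared architectures) and replace its classifier head with a 196-way layer for the Cars-196 classes. Data loading, augmentation, and the training schedule are omitted; the optimizer settings and the dummy batch are assumptions for illustration only.

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(pretrained=True)           # ImageNet-pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 196)    # 196 vehicle types in Cars-196

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

images = torch.randn(4, 3, 224, 224)               # dummy batch standing in for Cars-196
labels = torch.randint(0, 196, (4,))
optimizer.zero_grad()
loss = criterion(model(images), labels)            # one fine-tuning step
loss.backward()
optimizer.step()
```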
Pricing Engine: Estimating Causal Impacts in Real World Business Settings
Title | Pricing Engine: Estimating Causal Impacts in Real World Business Settings |
Authors | Matt Goldman, Brian Quistorff |
Abstract | We introduce the Pricing Engine package to enable the use of Double ML estimation techniques in general panel data settings. Customization allows the user to specify first-stage models, first-stage featurization, second-stage treatment selection, and second-stage causal modeling. We also introduce a DynamicDML class that allows the user to generate dynamic treatment-aware forecasts at a range of leads and to understand how the forecasts will vary as a function of causally estimated treatment parameters. The Pricing Engine is built on Python 3.5 and can be run on an Azure ML Workbench environment with the addition of only a few Python packages. This note provides a high-level discussion of the Double ML method, describes the package's intended use and includes an example Jupyter notebook demonstrating application to some publicly available data. Installation of the package and additional technical documentation is available at $\href{https://github.com/bquistorff/pricingengine}{github.com/bquistorff/pricingengine}$. |
Tasks | |
Published | 2018-06-08 |
URL | http://arxiv.org/abs/1806.03285v2 |
http://arxiv.org/pdf/1806.03285v2.pdf | |
PWC | https://paperswithcode.com/paper/pricing-engine-estimating-causal-impacts-in |
Repo | https://github.com/vsemenova/orthoml |
Framework | none |
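The note's Double ML foundation can be sketched in a few lines (this is not the Pricing Engine API, just the textbook residual-on-residual recipe on assumed synthetic data): cross-fit first-stage models for the outcome and the treatment, then regress the outcome residual on the treatment residual to estimate the causal effect.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                       # confounders / features
t = X[:, 0] + rng.normal(size=500)                  # treatment (e.g., log price)
y = 2.0 * t + X[:, 0] + rng.normal(size=500)        # outcome; true effect is 2.0

# First stage: cross-fitted predictions of outcome and treatment from X.
y_res = y - cross_val_predict(RandomForestRegressor(n_estimators=50), X, y, cv=5)
t_res = t - cross_val_predict(RandomForestRegressor(n_estimators=50), X, t, cv=5)

# Second stage: regress residual on residual to recover the treatment effect.
effect = LinearRegression().fit(t_res.reshape(-1, 1), y_res).coef_[0]
print(effect)   # should land close to 2.0
```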
Matching Article Pairs with Graphical Decomposition and Convolutions
Title | Matching Article Pairs with Graphical Decomposition and Convolutions |
Authors | Bang Liu, Di Niu, Haojie Wei, Jinghong Lin, Yancheng He, Kunfeng Lai, Yu Xu |
Abstract | Identifying the relationship between two articles, e.g., whether two articles published from different sources describe the same breaking news, is critical to many document understanding tasks. Existing approaches for modeling and matching sentence pairs do not perform well in matching longer documents, which embody more complex interactions between the enclosed entities than a sentence does. To model article pairs, we propose the Concept Interaction Graph to represent an article as a graph of concepts. We then match a pair of articles by comparing the sentences that enclose the same concept vertex through a series of encoding techniques, and aggregate the matching signals through a graph convolutional network. To facilitate the evaluation of long article matching, we have created two datasets, each consisting of about 30K pairs of breaking news articles covering diverse topics in the open domain. Extensive evaluations of the proposed methods on the two datasets demonstrate significant improvements over a wide range of state-of-the-art methods for natural language matching. |
Tasks | Question Answering, Text Matching |
Published | 2018-02-21 |
URL | https://arxiv.org/abs/1802.07459v2 |
https://arxiv.org/pdf/1802.07459v2.pdf | |
PWC | https://paperswithcode.com/paper/matching-long-text-documents-via-graph |
Repo | https://github.com/BangLiu/ArticlePairMatching |
Framework | pytorch |
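A much-simplified sketch of a Concept Interaction Graph: each concept (here just a hand-picked keyword) becomes a vertex holding the sentences that mention it, and two concepts are linked when they co-occur in a sentence. The real pipeline extracts concepts automatically and then matches vertex-aligned sentence pairs with encoders and a graph convolutional network, none of which is shown here.

```python
import itertools
import networkx as nx

sentences = [
    "The earthquake struck the coastal city at dawn.",
    "Rescue teams reached the coastal city within hours.",
    "The earthquake damaged the main hospital.",
]
concepts = ["earthquake", "coastal city", "hospital"]   # assumed keyword set

G = nx.Graph()
for c in concepts:
    G.add_node(c, sentences=[])
for s in sentences:
    present = [c for c in concepts if c in s.lower()]
    for c in present:                         # attach the sentence to its concepts
        G.nodes[c]["sentences"].append(s)
    for a, b in itertools.combinations(present, 2):
        G.add_edge(a, b)                      # concepts interacting within a sentence

print(list(G.edges()))
print(G.nodes["earthquake"]["sentences"])
```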
SNIP: Single-shot Network Pruning based on Connection Sensitivity
Title | SNIP: Single-shot Network Pruning based on Connection Sensitivity |
Authors | Namhoon Lee, Thalaiyasingam Ajanthan, Philip H. S. Torr |
Abstract | Pruning large neural networks while maintaining their performance is often desirable due to the reduced space and time complexity. In existing methods, pruning is done within an iterative optimization procedure with either heuristically designed pruning schedules or additional hyperparameters, undermining their utility. In this work, we present a new approach that prunes a given network once at initialization prior to training. To achieve this, we introduce a saliency criterion based on connection sensitivity that identifies structurally important connections in the network for the given task. This eliminates the need for both pretraining and the complex pruning schedule while making it robust to architecture variations. After pruning, the sparse network is trained in the standard way. Our method obtains extremely sparse networks with virtually the same accuracy as the reference network on the MNIST, CIFAR-10, and Tiny-ImageNet classification tasks and is broadly applicable to various architectures including convolutional, residual and recurrent networks. Unlike existing methods, our approach enables us to demonstrate that the retained connections are indeed relevant to the given task. |
Tasks | Image Classification, Network Pruning, Object Detection |
Published | 2018-10-04 |
URL | http://arxiv.org/abs/1810.02340v2 |
http://arxiv.org/pdf/1810.02340v2.pdf | |
PWC | https://paperswithcode.com/paper/snip-single-shot-network-pruning-based-on |
Repo | https://github.com/namhoonlee/snip-public |
Framework | tf |
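The connection-sensitivity criterion is simple enough to sketch directly: run one mini-batch through the freshly initialized network, take the gradient of the loss with respect to each weight, score every connection by |g * w|, and keep only the top-k most salient ones. The toy model, batch, and 10% sparsity level below are assumptions for illustration; biases are included in the scoring only for brevity.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 2))
x, y = torch.randn(32, 20), torch.randint(0, 2, (32,))      # one mini-batch

loss = nn.CrossEntropyLoss()(model(x), y)
params = list(model.parameters())
grads = torch.autograd.grad(loss, params)

# Saliency of every connection, pooled over the whole network.
saliencies = torch.cat([(g * p).abs().flatten() for g, p in zip(grads, params)])
k = int(0.1 * saliencies.numel())                  # keep the 10% most salient connections
threshold = torch.topk(saliencies, k).values.min()

masks = [((g * p).abs() >= threshold).float() for g, p in zip(grads, params)]
print([round(m.mean().item(), 3) for m in masks])  # fraction kept per parameter tensor
```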
FlowNet3D: Learning Scene Flow in 3D Point Clouds
Title | FlowNet3D: Learning Scene Flow in 3D Point Clouds |
Authors | Xingyu Liu, Charles R. Qi, Leonidas J. Guibas |
Abstract | Many applications in robotics and human-computer interaction can benefit from understanding 3D motion of points in a dynamic environment, widely known as scene flow. While most previous methods focus on stereo and RGB-D images as input, few try to estimate scene flow directly from point clouds. In this work, we propose a novel deep neural network named $FlowNet3D$ that learns scene flow from point clouds in an end-to-end fashion. Our network simultaneously learns deep hierarchical features of point clouds and flow embeddings that represent point motions, supported by two newly proposed learning layers for point sets. We evaluate the network on both challenging synthetic data from FlyingThings3D and real Lidar scans from KITTI. Trained on synthetic data only, our network successfully generalizes to real scans, outperforming various baselines and showing results competitive with the prior art. We also demonstrate two applications of our scene flow output (scan registration and motion segmentation) to show its potential wide use cases. |
Tasks | Motion Segmentation |
Published | 2018-06-04 |
URL | https://arxiv.org/abs/1806.01411v3 |
https://arxiv.org/pdf/1806.01411v3.pdf | |
PWC | https://paperswithcode.com/paper/flownet3d-learning-scene-flow-in-3d-point |
Repo | https://github.com/xingyul/flownet3d |
Framework | tf |
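A naive sketch of what a flow embedding between two point clouds captures: for every point at time t, look at its nearest neighbours at time t+1 and aggregate their relative displacements. FlowNet3D learns this aggregation over point features with dedicated layers; the brute-force version below only conveys the geometric intuition, and the shifted toy cloud is an assumption.

```python
import numpy as np

def naive_flow_embedding(p1, p2, k=1):
    """p1: (N, 3) points at time t; p2: (M, 3) points at time t+1."""
    embeddings = []
    for q in p1:
        d = np.linalg.norm(p2 - q, axis=1)           # distances to the second cloud
        nn = p2[np.argsort(d)[:k]]                   # k nearest neighbours
        embeddings.append((nn - q).mean(axis=0))     # aggregate relative motion
    return np.asarray(embeddings)                    # (N, 3) coarse motion cue

p1 = np.random.rand(200, 3)
p2 = p1 + np.array([0.05, 0.0, 0.0])                 # whole cloud shifted along x
print(naive_flow_embedding(p1, p2).mean(axis=0))     # close to [0.05, 0, 0] for a small shift
```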
From Third Person to First Person: Dataset and Baselines for Synthesis and Retrieval
Title | From Third Person to First Person: Dataset and Baselines for Synthesis and Retrieval |
Authors | Mohamed Elfeki, Krishna Regmi, Shervin Ardeshir, Ali Borji |
Abstract | First-person (egocentric) and third-person (exocentric) videos are drastically different in nature. The relationship between these two views has been studied in recent years; however, it has yet to be fully explored. In this work, we introduce two datasets (synthetic and natural/real) containing simultaneously recorded egocentric and exocentric videos. We also explore relating the two domains (egocentric and exocentric) in two aspects. First, we synthesize images in the egocentric domain from the exocentric domain using a conditional generative adversarial network (cGAN). We show that with enough training data, our network is capable of hallucinating how the world would look from an egocentric perspective, given an exocentric video. Second, we address the cross-view retrieval problem across the two views. Given an egocentric query frame (or its momentary optical flow), we retrieve its corresponding exocentric frame (or optical flow) from a gallery set. We show that using synthetic data could be beneficial in retrieving real data. We show that performing domain adaptation from the synthetic domain to the natural/real domain is helpful in tasks such as retrieval. We believe that the presented datasets and the proposed baselines offer new opportunities for further research in this direction. The code and dataset are publicly available. |
Tasks | Domain Adaptation, Optical Flow Estimation |
Published | 2018-12-01 |
URL | http://arxiv.org/abs/1812.00104v1 |
http://arxiv.org/pdf/1812.00104v1.pdf | |
PWC | https://paperswithcode.com/paper/from-third-person-to-first-person-dataset-and |
Repo | https://github.com/M-Elfeki/ThirdToFirst |
Framework | none |
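The cross-view retrieval task described above reduces, at test time, to nearest-neighbour search in a shared feature space: given an egocentric query feature, return the most similar exocentric frame from the gallery. The sketch below uses random vectors as placeholder features, since the actual embeddings would come from the paper's trained networks.

```python
import numpy as np

def retrieve(query_feat, gallery_feats):
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    scores = g @ q                       # cosine similarity to every gallery frame
    return int(np.argmax(scores))        # index of the best-matching exocentric frame

gallery = np.random.rand(50, 128)                    # hypothetical exocentric frame features
query = gallery[7] + 0.01 * np.random.rand(128)      # egocentric feature close to frame 7
print(retrieve(query, gallery))                      # -> 7
```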
Frustratingly Easy Meta-Embedding – Computing Meta-Embeddings by Averaging Source Word Embeddings
Title | Frustratingly Easy Meta-Embedding – Computing Meta-Embeddings by Averaging Source Word Embeddings |
Authors | Joshua Coates, Danushka Bollegala |
Abstract | Creating accurate meta-embeddings from pre-trained source embeddings has received attention lately. Methods based on global and locally-linear transformation and concatenation have been shown to produce accurate meta-embeddings. In this paper, we show that the arithmetic mean of two distinct word embedding sets yields a performant meta-embedding that is comparable to or better than more complex meta-embedding learning methods. The result seems counter-intuitive given that vector spaces in different source embeddings are not comparable and cannot be simply averaged. We give insight into why averaging can still produce accurate meta-embeddings despite the incomparability of the source vector spaces. |
Tasks | Word Embeddings |
Published | 2018-04-14 |
URL | http://arxiv.org/abs/1804.05262v1 |
http://arxiv.org/pdf/1804.05262v1.pdf | |
PWC | https://paperswithcode.com/paper/frustratingly-easy-meta-embedding-computing |
Repo | https://github.com/Shujian2015/meta-embedding-paper-list |
Framework | none |
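Since the method itself is just averaging, it fits in a few lines. The sketch below averages the two source vectors for a word (after L2 normalization, a common preprocessing choice assumed here rather than taken from the paper); zero-padding the lower-dimensional source is one simple way to handle mismatched dimensionalities, also assumed for illustration.

```python
import numpy as np

source_a = {"cat": np.array([1.0, 0.0, 0.0]), "dog": np.array([0.0, 1.0, 0.0])}
source_b = {"cat": np.array([0.5, 0.5]),      "dog": np.array([0.1, 0.9])}

def meta_embedding(word, sources):
    vecs = [s[word] / np.linalg.norm(s[word]) for s in sources]   # normalize each source
    dim = max(len(v) for v in vecs)
    padded = [np.pad(v, (0, dim - len(v))) for v in vecs]         # zero-pad to a common length
    return np.mean(padded, axis=0)                                # arithmetic mean

print(meta_embedding("cat", [source_a, source_b]))
```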
Unsupervised Learning for Fast Probabilistic Diffeomorphic Registration
Title | Unsupervised Learning for Fast Probabilistic Diffeomorphic Registration |
Authors | Adrian V. Dalca, Guha Balakrishnan, John Guttag, Mert R. Sabuncu |
Abstract | Traditional deformable registration techniques achieve impressive results and offer a rigorous theoretical treatment, but are computationally intensive since they solve an optimization problem for each image pair. Recently, learning-based methods have facilitated fast registration by learning spatial deformation functions. However, these approaches use restricted deformation models, require supervised labels, or do not guarantee a diffeomorphic (topology-preserving) registration. Furthermore, learning-based registration tools have not been derived from a probabilistic framework that can offer uncertainty estimates. In this paper, we present a probabilistic generative model and derive an unsupervised learning-based inference algorithm that makes use of recent developments in convolutional neural networks (CNNs). We demonstrate our method on a 3D brain registration task, and provide an empirical analysis of the algorithm. Our approach results in state-of-the-art accuracy and very fast runtimes, while providing diffeomorphic guarantees and uncertainty estimates. Our implementation is available online at http://voxelmorph.csail.mit.edu . |
Tasks | |
Published | 2018-05-11 |
URL | http://arxiv.org/abs/1805.04605v2 |
http://arxiv.org/pdf/1805.04605v2.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-learning-for-fast-probabilistic |
Repo | https://github.com/yh854/Rigid-Registration-of-3D-MRI-Based-on-Unsupervised-Learning |
Framework | tf |
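One building block behind the diffeomorphic guarantee is integrating a stationary velocity field by scaling and squaring, which is compact enough to sketch in 2-D. This toy version (with numpy/scipy interpolation, an arbitrary step count, and a constant velocity field) is not the VoxelMorph code, but it shows why the resulting displacement stays well-behaved: the field is only ever composed with small versions of itself.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def integrate_velocity(v, n_steps=6):
    """v: stationary velocity field of shape (2, H, W); returns a displacement field."""
    disp = v / (2 ** n_steps)                        # scaling step
    grid = np.mgrid[0:v.shape[1], 0:v.shape[2]].astype(float)
    for _ in range(n_steps):                         # squaring: disp <- disp o disp
        coords = grid + disp
        warped = np.stack([map_coordinates(disp[c], coords, order=1, mode="nearest")
                           for c in range(2)])
        disp = disp + warped
    return disp

v = np.zeros((2, 32, 32))
v[0] = 1.0                                           # constant unit velocity along rows
print(integrate_velocity(v)[0].mean())               # total displacement close to 1.0
```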
Collaborative Annotation of Semantic Objects in Images with Multi-granularity Supervisions
Title | Collaborative Annotation of Semantic Objects in Images with Multi-granularity Supervisions |
Authors | Lishi Zhang, Chenghan Fu, Jia Li |
Abstract | Per-pixel masks of semantic objects are very useful in many applications, but are tedious to annotate. In this paper, we propose a human-agent collaborative annotation approach that can efficiently generate per-pixel masks of semantic objects in tagged images with multi-granularity supervisions. Given a set of tagged images, a computer agent is first dynamically generated to roughly localize the semantic objects described by the tag. The agent first extracts massive object proposals from an image and then infers the tag-related ones under the weak and strong supervisions from linguistically and visually similar images and previously annotated object masks. By representing such supervisions by over-complete dictionaries, the tag-related object proposals can pop out according to their sparse coding lengths, and are then converted to superpixels with binary labels. After that, human annotators participate in the annotation process by flipping labels and dividing superpixels with mouse clicks, which are used as click supervisions that teach the agent to recover false positives/negatives in processing images with the same tags. Experimental results show that our approach can facilitate the annotation process and generate object masks that are highly consistent with those generated by the LabelMe toolbox. |
Tasks | |
Published | 2018-06-27 |
URL | http://arxiv.org/abs/1806.10269v1 |
http://arxiv.org/pdf/1806.10269v1.pdf | |
PWC | https://paperswithcode.com/paper/collaborative-annotation-of-semantic-objects |
Repo | https://github.com/yuxi120407/transfer_learning |
Framework | none |
Fast End-to-End Trainable Guided Filter
Title | Fast End-to-End Trainable Guided Filter |
Authors | Huikai Wu, Shuai Zheng, Junge Zhang, Kaiqi Huang |
Abstract | Dense pixel-wise image prediction has been advanced by harnessing the capabilities of Fully Convolutional Networks (FCNs). One central issue of FCNs is the limited capacity to handle joint upsampling. To address the problem, we present a novel building block for FCNs, namely the guided filtering layer, which is designed for efficiently generating a high-resolution output given the corresponding low-resolution one and a high-resolution guidance map. Such a layer contains learnable parameters, which can be integrated with FCNs and jointly optimized through end-to-end training. To further take advantage of end-to-end training, we plug in a trainable transformation function for generating the task-specific guidance map. Based on the proposed layer, we present a general framework for pixel-wise image prediction, named deep guided filtering network (DGF). The proposed network is evaluated on five image processing tasks. Experiments on the MIT-Adobe FiveK dataset demonstrate that DGF runs 10-100 times faster and achieves state-of-the-art performance. We also show that DGF helps to improve the performance of multiple computer vision tasks. |
Tasks | |
Published | 2018-03-15 |
URL | https://arxiv.org/abs/1803.05619v2 |
https://arxiv.org/pdf/1803.05619v2.pdf | |
PWC | https://paperswithcode.com/paper/fast-end-to-end-trainable-guided-filter |
Repo | https://github.com/wuhuikai/DeepGuidedFilter |
Framework | pytorch |
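For context, the classic (non-trainable) guided filter that the guided filtering layer builds on is itself only a handful of box-filter operations: fit a local linear model q = a*I + b between the guide I and the input p, then smooth the coefficients. The sketch below is that classic filter in numpy/scipy with made-up radius and epsilon; the paper's contribution is making these operations differentiable layers and learning the guidance map end to end, which is not shown here.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, p, radius=4, eps=1e-3):
    size = 2 * radius + 1
    mean_I, mean_p = uniform_filter(I, size), uniform_filter(p, size)
    cov_Ip = uniform_filter(I * p, size) - mean_I * mean_p
    var_I = uniform_filter(I * I, size) - mean_I * mean_I

    a = cov_Ip / (var_I + eps)                 # per-pixel linear coefficients
    b = mean_p - a * mean_I
    return uniform_filter(a, size) * I + uniform_filter(b, size)

I = np.random.rand(64, 64)                     # high-resolution guidance image
p = I + 0.1 * np.random.rand(64, 64)           # noisy input to be filtered
print(guided_filter(I, p).shape)               # (64, 64) edge-preserving output
```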