October 21, 2019

3177 words 15 mins read

Paper Group AWR 161

Graph Learning from Filtered Signals: Graph System and Diffusion Kernel Identification

Title Graph Learning from Filtered Signals: Graph System and Diffusion Kernel Identification
Authors Hilmi E. Egilmez, Eduardo Pavez, Antonio Ortega
Abstract This paper introduces a novel graph signal processing framework for building graph-based models from classes of filtered signals. In our framework, graph-based modeling is formulated as a graph system identification problem, where the goal is to learn a weighted graph (a graph Laplacian matrix) and a graph-based filter (a function of graph Laplacian matrices). In order to solve the proposed problem, an algorithm is developed to jointly identify a graph and a graph-based filter (GBF) from multiple signal/data observations. Our algorithm is valid under the assumption that GBFs are one-to-one functions. The proposed approach can be applied to learn diffusion (heat) kernels, which are popular in various fields for modeling diffusion processes. In addition, for specific choices of graph-based filters, the proposed problem reduces to a graph Laplacian estimation problem. Our experimental results demonstrate that the proposed algorithm outperforms the current state-of-the-art methods. We also implement our framework on a real climate dataset for modeling of temperature signals.
Tasks
Published 2018-03-07
URL http://arxiv.org/abs/1803.02553v1
PDF http://arxiv.org/pdf/1803.02553v1.pdf
PWC https://paperswithcode.com/paper/graph-learning-from-filtered-signals-graph
Repo https://github.com/STAC-USC/Graph_Learning
Framework none
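
A note on the diffusion (heat) kernels mentioned in the abstract: for a graph with Laplacian L, the heat kernel is the graph-based filter h(L) = exp(-beta L), and the observed data are filtered signals y = h(L) x. The sketch below only illustrates this forward model in numpy/scipy; it is not the authors' joint graph-and-filter identification algorithm, and the toy graph and beta value are made up for illustration.

```python
import numpy as np
from scipy.linalg import expm

# Toy weighted adjacency matrix of a 4-node graph (illustrative, not from the paper).
W = np.array([[0., 1., 0., 2.],
              [1., 0., 3., 0.],
              [0., 3., 0., 1.],
              [2., 0., 1., 0.]])
L = np.diag(W.sum(axis=1)) - W      # combinatorial graph Laplacian

beta = 0.5                          # diffusion parameter (illustrative)
H = expm(-beta * L)                 # heat/diffusion kernel: graph-based filter h(L) = exp(-beta L)

x = np.random.randn(4)              # an arbitrary graph signal
y = H @ x                           # filtered signal of the kind the framework learns from
```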

AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods

Title AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods
Authors Zhiming Zhou, Qingru Zhang, Guansong Lu, Hongwei Wang, Weinan Zhang, Yong Yu
Abstract Adam has been shown to fail to converge to the optimal solution in certain cases. Several algorithms have recently been proposed to avoid this non-convergence, but their efficiency turns out to be unsatisfactory in practice. In this paper, we provide new insight into the non-convergence issue of Adam as well as other adaptive learning rate methods. We argue that there exists an inappropriate correlation between the gradient $g_t$ and the second-moment term $v_t$ in Adam ($t$ is the timestep), which means that a large gradient is likely to receive a small step size while a small gradient may receive a large step size. We demonstrate that such biased step sizes are the fundamental cause of Adam's non-convergence, and we further prove that decorrelating $v_t$ and $g_t$ leads to an unbiased step size for each gradient, thus solving the non-convergence problem of Adam. Finally, we propose AdaShift, a novel adaptive learning rate method that decorrelates $v_t$ and $g_t$ by temporal shifting, i.e., using the temporally shifted gradient $g_{t-n}$ to calculate $v_t$. Experimental results demonstrate that AdaShift addresses the non-convergence issue of Adam while maintaining performance competitive with Adam in terms of both training speed and generalization.
Tasks
Published 2018-09-29
URL https://arxiv.org/abs/1810.00143v4
PDF https://arxiv.org/pdf/1810.00143v4.pdf
PWC https://paperswithcode.com/paper/adashift-decorrelation-and-convergence-of
Repo https://github.com/ZhimingZhou/AdaShift-Lipschitz-GANs-MaxGP
Framework tf
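
A minimal numpy sketch of the temporal-shifting idea from the abstract: the second-moment estimate $v_t$ is computed from a gradient delayed by $n$ steps, $g_{t-n}$, so that the step size is decorrelated from the current gradient $g_t$. This is a simplification (the full AdaShift also aggregates over the kept gradients and offers block-wise variants); the function name and hyperparameters are illustrative.

```python
import numpy as np
from collections import deque

def adashift_sketch(grad_fn, theta, lr=0.1, beta2=0.999, n=10, eps=1e-8, steps=2000):
    """Adam-like update where v_t is built from the gradient of n steps ago."""
    v = np.zeros_like(theta)
    recent = deque(maxlen=n)                  # holds the last n gradients
    for _ in range(steps):
        g = grad_fn(theta)
        if len(recent) == n:                  # wait until a delayed gradient g_{t-n} exists
            g_delayed = recent[0]
            v = beta2 * v + (1 - beta2) * g_delayed ** 2   # second moment from the *shifted* gradient
            theta = theta - lr * g / (np.sqrt(v) + eps)    # step still uses the current gradient g_t
        recent.append(g)
    return theta

# Toy usage: minimize 0.5 * ||theta||^2, whose gradient is theta itself.
print(adashift_sketch(lambda th: th, np.array([5.0, -3.0])))
```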

Unseen Class Discovery in Open-world Classification

Title Unseen Class Discovery in Open-world Classification
Authors Lei Shu, Hu Xu, Bing Liu
Abstract This paper concerns open-world classification, where the classifier not only needs to classify test examples into seen classes that have appeared in training but also reject examples from unseen or novel classes that have not appeared in training. Specifically, this paper focuses on discovering the hidden unseen classes of the rejected examples. Clearly, without prior knowledge this is difficult. However, we do have the data from the seen training classes, which can tell us what kind of similarity/difference is expected for examples from the same class or from different classes. It is reasonable to assume that this knowledge can be transferred to the rejected examples and used to discover the hidden unseen classes in them. This paper aims to solve this problem. It first proposes a joint open classification model with a sub-model for classifying whether a pair of examples belongs to the same or different classes. This sub-model can serve as a distance function for clustering to discover the hidden classes of the rejected examples. Experimental results show that the proposed model is highly promising.
Tasks
Published 2018-01-17
URL http://arxiv.org/abs/1801.05609v1
PDF http://arxiv.org/pdf/1801.05609v1.pdf
PWC https://paperswithcode.com/paper/unseen-class-discovery-in-open-world
Repo https://github.com/leishu02/EMNLP2017_DOC
Framework none
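
The core mechanism in the abstract is a pairwise same-class/different-class sub-model, trained on the seen classes and reused as a distance function when clustering rejected examples. Below is a minimal sketch of that clustering step, assuming a hypothetical pretrained function pairwise_prob(a, b) that returns P(same class) and a known number of hidden classes; neither is part of the paper's actual code.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_rejected(examples, pairwise_prob, n_clusters):
    """Cluster rejected examples using 1 - P(same class) as a pairwise distance."""
    n = len(examples)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            p_same = pairwise_prob(examples[i], examples[j])   # hypothetical pretrained sub-model
            dist[i, j] = dist[j, i] = 1.0 - p_same
    clusterer = AgglomerativeClustering(n_clusters=n_clusters,
                                        metric="precomputed",   # `affinity` on older scikit-learn
                                        linkage="average")
    return clusterer.fit_predict(dist)
```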

What can AI do for me: Evaluating Machine Learning Interpretations in Cooperative Play

Title What can AI do for me: Evaluating Machine Learning Interpretations in Cooperative Play
Authors Shi Feng, Jordan Boyd-Graber
Abstract Machine learning is an important tool for decision making, but its ethical and responsible application requires rigorous vetting of its interpretability and utility: an understudied problem, particularly for natural language processing models. We propose an evaluation of interpretation on a real task with real human users, where the effectiveness of interpretation is measured by how much it improves human performance. We design a grounded, realistic human-computer cooperative setting using a question answering task, Quizbowl. We recruit both trivia experts and novices to play this game with a computer as their teammate, which communicates its prediction via three different interpretations. We also provide design guidance for natural language processing human-in-the-loop settings.
Tasks Decision Making, Question Answering
Published 2018-10-23
URL https://arxiv.org/abs/1810.09648v3
PDF https://arxiv.org/pdf/1810.09648v3.pdf
PWC https://paperswithcode.com/paper/what-can-ai-do-for-me-evaluating-machine
Repo https://github.com/Eric-Wallace/qb_interface
Framework none

Diverse Image-to-Image Translation via Disentangled Representations

Title Diverse Image-to-Image Translation via Disentangled Representations
Authors Hsin-Ying Lee, Hung-Yu Tseng, Jia-Bin Huang, Maneesh Kumar Singh, Ming-Hsuan Yang
Abstract Image-to-image translation aims to learn the mapping between two visual domains. There are two main challenges for many applications: 1) the lack of aligned training pairs and 2) multiple possible outputs from a single input image. In this work, we present an approach based on disentangled representation for producing diverse outputs without paired training images. To achieve diversity, we propose to embed images onto two spaces: a domain-invariant content space capturing shared information across domains and a domain-specific attribute space. Our model takes the encoded content features extracted from a given input and attribute vectors sampled from the attribute space to produce diverse outputs at test time. To handle unpaired training data, we introduce a novel cross-cycle consistency loss based on disentangled representations. Qualitative results show that our model can generate diverse and realistic images on a wide range of tasks without paired training data. For quantitative comparisons, we measure realism with a user study and diversity with a perceptual distance metric. We apply the proposed model to domain adaptation and show competitive performance when compared to the state-of-the-art on the MNIST-M and the LineMod datasets.
Tasks Domain Adaptation, Image-to-Image Translation, Synthetic-to-Real Translation
Published 2018-08-02
URL http://arxiv.org/abs/1808.00948v1
PDF http://arxiv.org/pdf/1808.00948v1.pdf
PWC https://paperswithcode.com/paper/diverse-image-to-image-translation-via
Repo https://github.com/taki0112/DRIT-Tensorflow
Framework tf

A Systematic Evaluation of Recent Deep Learning Architectures for Fine-Grained Vehicle Classification

Title A Systematic Evaluation of Recent Deep Learning Architectures for Fine-Grained Vehicle Classification
Authors Krassimir Valev, Arne Schumann, Lars Sommer, Jürgen Beyerer
Abstract Fine-grained vehicle classification is the task of classifying the make, model, and year of a vehicle. This is a very challenging task, because vehicles of different types but similar color and viewpoint can often look much more similar than vehicles of the same type but differing color and viewpoint. Vehicle make, model, and year, in combination with vehicle color, are of importance in several applications such as vehicle search, re-identification, tracking, and traffic analysis. In this work we investigate the suitability of several recent landmark convolutional neural network (CNN) architectures, which have shown top results on large-scale image classification tasks, for the task of fine-grained classification of vehicles. We compare the performance of the networks VGG16, several ResNets, Inception architectures, the recent DenseNets, and MobileNet. For classification we use the Stanford Cars-196 dataset, which features 196 different types of vehicles. We investigate several aspects of CNN training, such as data augmentation and training from scratch vs. fine-tuning. Importantly, we introduce no aspects in the architectures or training process that are specific to vehicle classification. Our final model achieves a state-of-the-art classification accuracy of 94.6%, outperforming all related works, even approaches which are specifically tailored for the task, e.g. by including vehicle part detections.
Tasks Data Augmentation, Image Classification
Published 2018-06-08
URL http://arxiv.org/abs/1806.02987v1
PDF http://arxiv.org/pdf/1806.02987v1.pdf
PWC https://paperswithcode.com/paper/a-systematic-evaluation-of-recent-deep
Repo https://github.com/OrkhanHI/Grab-AI-Computer-Vision-Challenge
Framework pytorch
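
For the fine-tuning setting the abstract evaluates, a minimal PyTorch sketch: start from an ImageNet-pretrained backbone and replace the classifier head with a 196-way output for Stanford Cars-196. The backbone choice, optimizer, and hyperparameters below are illustrative, not the paper's exact training recipe.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 196                                         # Stanford Cars-196 vehicle types

model = models.resnet50(pretrained=True)                  # ImageNet-pretrained backbone
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)   # new classification head

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_one_epoch(loader, device="cuda"):
    """One pass of standard fine-tuning; `loader` should yield augmented, normalized crops."""
    model.to(device).train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```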

Pricing Engine: Estimating Causal Impacts in Real World Business Settings

Title Pricing Engine: Estimating Causal Impacts in Real World Business Settings
Authors Matt Goldman, Brian Quistorff
Abstract We introduce the Pricing Engine package to enable the use of Double ML estimation techniques in general panel data settings. Customization allows the user to specify first-stage models, first-stage featurization, second-stage treatment selection, and second-stage causal modeling. We also introduce a DynamicDML class that allows the user to generate dynamic treatment-aware forecasts at a range of leads and to understand how the forecasts will vary as a function of causally estimated treatment parameters. The Pricing Engine is built on Python 3.5 and can be run on an Azure ML Workbench environment with the addition of only a few Python packages. This note provides a high-level discussion of the Double ML method, describes the package's intended use, and includes an example Jupyter notebook demonstrating application to some publicly available data. Installation of the package and additional technical documentation is available at $\href{https://github.com/bquistorff/pricingengine}{github.com/bquistorff/pricingengine}$.
Tasks
Published 2018-06-08
URL http://arxiv.org/abs/1806.03285v2
PDF http://arxiv.org/pdf/1806.03285v2.pdf
PWC https://paperswithcode.com/paper/pricing-engine-estimating-causal-impacts-in
Repo https://github.com/vsemenova/orthoml
Framework none
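
A minimal sketch of the Double ML estimator the package wraps, for a partially linear model: cross-fit two first-stage models to residualize the outcome and the treatment (e.g., price), then recover the treatment effect from a residual-on-residual regression. The scikit-learn learners here are illustrative; the actual package adds panel-data handling, featurization, treatment selection, and the DynamicDML forecasting class.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold

def double_ml_effect(X, treatment, outcome, n_splits=5):
    """Cross-fitted Double ML estimate of a scalar treatment effect."""
    y_res = np.zeros(len(outcome))
    t_res = np.zeros(len(treatment))
    for train_idx, test_idx in KFold(n_splits, shuffle=True, random_state=0).split(X):
        m_y = GradientBoostingRegressor().fit(X[train_idx], outcome[train_idx])    # E[Y | X]
        m_t = GradientBoostingRegressor().fit(X[train_idx], treatment[train_idx])  # E[T | X]
        y_res[test_idx] = outcome[test_idx] - m_y.predict(X[test_idx])
        t_res[test_idx] = treatment[test_idx] - m_t.predict(X[test_idx])
    # Residual-on-residual regression gives the causal effect estimate.
    return float(np.dot(t_res, y_res) / np.dot(t_res, t_res))
```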

Matching Article Pairs with Graphical Decomposition and Convolutions

Title Matching Article Pairs with Graphical Decomposition and Convolutions
Authors Bang Liu, Di Niu, Haojie Wei, Jinghong Lin, Yancheng He, Kunfeng Lai, Yu Xu
Abstract Identifying the relationship between two articles, e.g., whether two articles published from different sources describe the same breaking news, is critical to many document understanding tasks. Existing approaches for modeling and matching sentence pairs do not perform well in matching longer documents, which embody more complex interactions between the enclosed entities than a sentence does. To model article pairs, we propose the Concept Interaction Graph to represent an article as a graph of concepts. We then match a pair of articles by comparing the sentences that enclose the same concept vertex through a series of encoding techniques, and aggregate the matching signals through a graph convolutional network. To facilitate the evaluation of long article matching, we have created two datasets, each consisting of about 30K pairs of breaking news articles covering diverse topics in the open domain. Extensive evaluations of the proposed methods on the two datasets demonstrate significant improvements over a wide range of state-of-the-art methods for natural language matching.
Tasks Question Answering, Text Matching
Published 2018-02-21
URL https://arxiv.org/abs/1802.07459v2
PDF https://arxiv.org/pdf/1802.07459v2.pdf
PWC https://paperswithcode.com/paper/matching-long-text-documents-via-graph
Repo https://github.com/BangLiu/ArticlePairMatching
Framework pytorch

SNIP: Single-shot Network Pruning based on Connection Sensitivity

Title SNIP: Single-shot Network Pruning based on Connection Sensitivity
Authors Namhoon Lee, Thalaiyasingam Ajanthan, Philip H. S. Torr
Abstract Pruning large neural networks while maintaining their performance is often desirable due to the reduced space and time complexity. In existing methods, pruning is done within an iterative optimization procedure with either heuristically designed pruning schedules or additional hyperparameters, undermining their utility. In this work, we present a new approach that prunes a given network once at initialization prior to training. To achieve this, we introduce a saliency criterion based on connection sensitivity that identifies structurally important connections in the network for the given task. This eliminates the need for both pretraining and the complex pruning schedule while making it robust to architecture variations. After pruning, the sparse network is trained in the standard way. Our method obtains extremely sparse networks with virtually the same accuracy as the reference network on the MNIST, CIFAR-10, and Tiny-ImageNet classification tasks and is broadly applicable to various architectures including convolutional, residual and recurrent networks. Unlike existing methods, our approach enables us to demonstrate that the retained connections are indeed relevant to the given task.
Tasks Image Classification, Network Pruning, Object Detection
Published 2018-10-04
URL http://arxiv.org/abs/1810.02340v2
PDF http://arxiv.org/pdf/1810.02340v2.pdf
PWC https://paperswithcode.com/paper/snip-single-shot-network-pruning-based-on
Repo https://github.com/namhoonlee/snip-public
Framework tf
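
A minimal numpy sketch of the connection-sensitivity criterion described in the abstract: at initialization, score each connection by the magnitude of the loss gradient with respect to an auxiliary per-connection mask (which works out to |g * w|), normalize the scores, and keep only the top connections before training begins. The paper applies this to whole networks using a single mini-batch; the toy layer and sparsity level here are illustrative.

```python
import numpy as np

def snip_mask(W, grad_W, sparsity):
    """Keep the (1 - sparsity) fraction of connections with largest sensitivity |dL/dW * W|."""
    sensitivity = np.abs(grad_W * W)               # |g_ij * w_ij| at initialization
    scores = sensitivity / sensitivity.sum()       # normalized per-connection saliency
    k = max(1, int(round((1.0 - sparsity) * W.size)))
    threshold = np.sort(scores, axis=None)[-k]     # k-th largest score
    return (scores >= threshold).astype(W.dtype)   # binary mask applied before training

# Usage: prune 90% of a randomly initialized dense layer given a mini-batch gradient.
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 128)) * np.sqrt(2.0 / 256)   # variance-scaled init
grad_W = rng.normal(size=W.shape)                      # stand-in for a mini-batch gradient
mask = snip_mask(W, grad_W, sparsity=0.9)
print(mask.mean())                                     # ~0.1 of connections retained
```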

FlowNet3D: Learning Scene Flow in 3D Point Clouds

Title FlowNet3D: Learning Scene Flow in 3D Point Clouds
Authors Xingyu Liu, Charles R. Qi, Leonidas J. Guibas
Abstract Many applications in robotics and human-computer interaction can benefit from understanding the 3D motion of points in a dynamic environment, widely known as scene flow. While most previous methods focus on stereo and RGB-D images as input, few try to estimate scene flow directly from point clouds. In this work, we propose a novel deep neural network named $FlowNet3D$ that learns scene flow from point clouds in an end-to-end fashion. Our network simultaneously learns deep hierarchical features of point clouds and flow embeddings that represent point motions, supported by two newly proposed learning layers for point sets. We evaluate the network on both challenging synthetic data from FlyingThings3D and real Lidar scans from KITTI. Trained on synthetic data only, our network successfully generalizes to real scans, outperforming various baselines and showing results competitive with the prior art. We also demonstrate two applications of our scene flow output (scan registration and motion segmentation) to show its potential for a wide range of use cases.
Tasks Motion Segmentation
Published 2018-06-04
URL https://arxiv.org/abs/1806.01411v3
PDF https://arxiv.org/pdf/1806.01411v3.pdf
PWC https://paperswithcode.com/paper/flownet3d-learning-scene-flow-in-3d-point
Repo https://github.com/xingyul/flownet3d
Framework tf

From Third Person to First Person: Dataset and Baselines for Synthesis and Retrieval

Title From Third Person to First Person: Dataset and Baselines for Synthesis and Retrieval
Authors Mohamed Elfeki, Krishna Regmi, Shervin Ardeshir, Ali Borji
Abstract First-person (egocentric) and third-person (exocentric) videos are drastically different in nature. The relationship between these two views has been studied in recent years; however, it has yet to be fully explored. In this work, we introduce two datasets (synthetic and natural/real) containing simultaneously recorded egocentric and exocentric videos. We also explore relating the two domains (egocentric and exocentric) in two aspects. First, we synthesize images in the egocentric domain from the exocentric domain using a conditional generative adversarial network (cGAN). We show that with enough training data, our network is capable of hallucinating what the world would look like from an egocentric perspective, given an exocentric video. Second, we address the cross-view retrieval problem across the two views. Given an egocentric query frame (or its momentary optical flow), we retrieve its corresponding exocentric frame (or optical flow) from a gallery set. We show that using synthetic data can be beneficial in retrieving real data, and that performing domain adaptation from the synthetic domain to the natural/real domain is helpful in tasks such as retrieval. We believe that the presented datasets and the proposed baselines offer new opportunities for further research in this direction. The code and dataset are publicly available.
Tasks Domain Adaptation, Optical Flow Estimation
Published 2018-12-01
URL http://arxiv.org/abs/1812.00104v1
PDF http://arxiv.org/pdf/1812.00104v1.pdf
PWC https://paperswithcode.com/paper/from-third-person-to-first-person-dataset-and
Repo https://github.com/M-Elfeki/ThirdToFirst
Framework none

Frustratingly Easy Meta-Embedding – Computing Meta-Embeddings by Averaging Source Word Embeddings

Title Frustratingly Easy Meta-Embedding – Computing Meta-Embeddings by Averaging Source Word Embeddings
Authors Joshua Coates, Danushka Bollegala
Abstract Creating accurate meta-embeddings from pre-trained source embeddings has received attention lately. Methods based on global and locally linear transformation and concatenation have been shown to produce accurate meta-embeddings. In this paper, we show that the arithmetic mean of two distinct word embedding sets yields a performant meta-embedding that is comparable to or better than more complex meta-embedding learning methods. The result seems counter-intuitive given that vector spaces in different source embeddings are not comparable and cannot simply be averaged. We give insight into why averaging can still produce accurate meta-embeddings despite the incomparability of the source vector spaces.
Tasks Word Embeddings
Published 2018-04-14
URL http://arxiv.org/abs/1804.05262v1
PDF http://arxiv.org/pdf/1804.05262v1.pdf
PWC https://paperswithcode.com/paper/frustratingly-easy-meta-embedding-computing
Repo https://github.com/Shujian2015/meta-embedding-paper-list
Framework none
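
The abstract's central recipe is simply averaging the two source vectors of each word. A minimal sketch, assuming both sources have the same dimensionality and restricting to the shared vocabulary; the per-source L2 normalization is a common practical choice and an assumption here, not necessarily the paper's exact procedure.

```python
import numpy as np

def average_meta_embedding(emb_a, emb_b):
    """Map each shared word to the mean of its two (normalized) source vectors."""
    meta = {}
    for word in emb_a.keys() & emb_b.keys():
        va = emb_a[word] / np.linalg.norm(emb_a[word])   # assumption: normalize each source first
        vb = emb_b[word] / np.linalg.norm(emb_b[word])
        meta[word] = 0.5 * (va + vb)
    return meta

# Toy usage with two hypothetical 3-dimensional source embeddings.
source_a = {"cat": np.array([0.1, 0.3, 0.5]), "dog": np.array([0.2, 0.1, 0.4])}
source_b = {"cat": np.array([0.4, 0.0, 0.2]), "dog": np.array([0.3, 0.3, 0.1])}
print(average_meta_embedding(source_a, source_b)["cat"])
```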

Unsupervised Learning for Fast Probabilistic Diffeomorphic Registration

Title Unsupervised Learning for Fast Probabilistic Diffeomorphic Registration
Authors Adrian V. Dalca, Guha Balakrishnan, John Guttag, Mert R. Sabuncu
Abstract Traditional deformable registration techniques achieve impressive results and offer a rigorous theoretical treatment, but are computationally intensive since they solve an optimization problem for each image pair. Recently, learning-based methods have facilitated fast registration by learning spatial deformation functions. However, these approaches use restricted deformation models, require supervised labels, or do not guarantee a diffeomorphic (topology-preserving) registration. Furthermore, learning-based registration tools have not been derived from a probabilistic framework that can offer uncertainty estimates. In this paper, we present a probabilistic generative model and derive an unsupervised learning-based inference algorithm that makes use of recent developments in convolutional neural networks (CNNs). We demonstrate our method on a 3D brain registration task, and provide an empirical analysis of the algorithm. Our approach results in state of the art accuracy and very fast runtimes, while providing diffeomorphic guarantees and uncertainty estimates. Our implementation is available online at http://voxelmorph.csail.mit.edu .
Tasks
Published 2018-05-11
URL http://arxiv.org/abs/1805.04605v2
PDF http://arxiv.org/pdf/1805.04605v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-learning-for-fast-probabilistic
Repo https://github.com/yh854/Rigid-Registration-of-3D-MRI-Based-on-Unsupervised-Learning
Framework tf

Collaborative Annotation of Semantic Objects in Images with Multi-granularity Supervisions

Title Collaborative Annotation of Semantic Objects in Images with Multi-granularity Supervisions
Authors Lishi Zhang, Chenghan Fu, Jia Li
Abstract Per-pixel masks of semantic objects are very useful in many applications, but they are tedious to annotate. In this paper, we propose a human-agent collaborative annotation approach that can efficiently generate per-pixel masks of semantic objects in tagged images with multi-granularity supervisions. Given a set of tagged images, a computer agent is first dynamically generated to roughly localize the semantic objects described by the tag. The agent extracts massive object proposals from an image and then infers the tag-related ones under weak and strong supervisions from linguistically and visually similar images and previously annotated object masks. By representing such supervisions with over-complete dictionaries, the tag-related object proposals can pop out according to their sparse coding length, and are then converted to superpixels with binary labels. After that, human annotators participate in the annotation process by flipping labels and dividing superpixels with mouse clicks, which are used as click supervisions that teach the agent to recover false positives/negatives in processing images with the same tags. Experimental results show that our approach can facilitate the annotation process and generate object masks that are highly consistent with those generated by the LabelMe toolbox.
Tasks
Published 2018-06-27
URL http://arxiv.org/abs/1806.10269v1
PDF http://arxiv.org/pdf/1806.10269v1.pdf
PWC https://paperswithcode.com/paper/collaborative-annotation-of-semantic-objects
Repo https://github.com/yuxi120407/transfer_learning
Framework none

Fast End-to-End Trainable Guided Filter

Title Fast End-to-End Trainable Guided Filter
Authors Huikai Wu, Shuai Zheng, Junge Zhang, Kaiqi Huang
Abstract Dense pixel-wise image prediction has been advanced by harnessing the capabilities of Fully Convolutional Networks (FCNs). One central issue of FCNs is their limited capacity to handle joint upsampling. To address the problem, we present a novel building block for FCNs, namely the guided filtering layer, which is designed to efficiently generate a high-resolution output given the corresponding low-resolution one and a high-resolution guidance map. Such a layer contains learnable parameters, which can be integrated with FCNs and jointly optimized through end-to-end training. To further take advantage of end-to-end training, we plug in a trainable transformation function for generating the task-specific guidance map. Based on the proposed layer, we present a general framework for pixel-wise image prediction, named deep guided filtering network (DGF). The proposed network is evaluated on five image processing tasks. Experiments on the MIT-Adobe FiveK dataset demonstrate that DGF runs 10-100 times faster and achieves state-of-the-art performance. We also show that DGF helps to improve the performance of multiple computer vision tasks.
Tasks
Published 2018-03-15
URL https://arxiv.org/abs/1803.05619v2
PDF https://arxiv.org/pdf/1803.05619v2.pdf
PWC https://paperswithcode.com/paper/fast-end-to-end-trainable-guided-filter
Repo https://github.com/wuhuikai/DeepGuidedFilter
Framework pytorch
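
For context on the building block in the abstract, below is a minimal numpy/scipy sketch of the classical guided filter that the guided filtering layer generalizes: local linear coefficients a and b are computed from box-filtered statistics of the guidance image I and the input p, then smoothed and applied. The paper's contribution, making these operations learnable inside an FCN with a trainable guidance-map transformation, is not reproduced here; the radius and eps values are illustrative.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, p, radius=8, eps=1e-2):
    """Classical guided filter: edge-preserving smoothing of p guided by I (grayscale arrays)."""
    mean = lambda x: uniform_filter(x, size=2 * radius + 1)   # box filter over a (2r+1)^2 window

    mean_I, mean_p = mean(I), mean(p)
    cov_Ip = mean(I * p) - mean_I * mean_p      # local covariance of guidance and input
    var_I = mean(I * I) - mean_I * mean_I       # local variance of the guidance

    a = cov_Ip / (var_I + eps)                  # per-pixel linear coefficients: q ~= a * I + b
    b = mean_p - a * mean_I
    return mean(a) * I + mean(b)                # smooth the coefficients, then apply them

# Usage: edge-preserving smoothing of a noisy image, using the image itself as guidance.
img = np.random.rand(64, 64)
out = guided_filter(img, img, radius=4, eps=1e-3)
```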