May 7, 2019

Paper Group AWR 92

Split-door criterion: Identification of causal effects through auxiliary outcomes


Title	Split-door criterion: Identification of causal effects through auxiliary outcomes
Authors	Amit Sharma, Jake M. Hofman, Duncan J. Watts
Abstract	We present a method for estimating causal effects in time series data when fine-grained information about the outcome of interest is available. Specifically, we examine what we call the split-door setting, where the outcome variable can be split into two parts: one that is potentially affected by the cause being studied and another that is independent of it, with both parts sharing the same (unobserved) confounders. We show that under these conditions, the problem of identification reduces to that of testing for independence among observed variables, and present a method that uses this approach to automatically find subsets of the data that are causally identified. We demonstrate the method by estimating the causal impact of Amazon’s recommender system on traffic to product pages, finding thousands of examples within the dataset that satisfy the split-door criterion. Unlike past studies based on natural experiments that were limited to a single product category, our method applies to a large and representative sample of products viewed on the site. In line with previous work, we find that the widely-used click-through rate (CTR) metric overestimates the causal impact of recommender systems; depending on the product category, we estimate that 50-80% of the traffic attributed to recommender systems would have happened even without any recommendations. We conclude with guidelines for using the split-door criterion as well as a discussion of other contexts where the method can be applied.
Tasks	Recommendation Systems, Time Series
Published	2016-11-28
URL	http://arxiv.org/abs/1611.09414v2
PDF	http://arxiv.org/pdf/1611.09414v2.pdf
PWC	https://paperswithcode.com/paper/split-door-criterion-identification-of-causal
Repo	https://github.com/amit-sharma/splitdoor-causal-criterion
Framework	none
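
The abstract reduces identification to an independence test among observed variables. A minimal sketch of that screening step follows, under stated assumptions: the test is run between the cause series and the part of the outcome that the cause cannot affect, and independence is approximated by a near-zero Pearson correlation. Pearson correlation only detects linear dependence, and the function names and the `threshold` value are hypothetical, not taken from the paper or its repository.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant series carries no linear signal; treat it as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def passes_split_door(cause, unaffected_outcome, threshold=0.1):
    """Keep a time window only if the cause looks independent of the unaffected outcome.

    `threshold` is an illustrative cut-off, not a value from the paper.
    """
    return abs(pearson(cause, unaffected_outcome)) < threshold


# Hypothetical daily counts: recommendation clicks vs. direct visits.
clicks = [1, 2, 3, 4]
direct_tracking_clicks = [2, 4, 6, 8]    # moves with clicks: likely confounded
direct_unrelated = [1, -1, -1, 1]        # no linear relation to clicks

print(passes_split_door(clicks, direct_tracking_clicks))  # window rejected
print(passes_split_door(clicks, direct_unrelated))        # window kept
```

A real analysis would use a proper statistical independence test with a significance level rather than a fixed correlation cut-off.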

A Discrete and Bounded Envy-Free Cake Cutting Protocol for Any Number of Agents


Title	A Discrete and Bounded Envy-Free Cake Cutting Protocol for Any Number of Agents
Authors	Haris Aziz, Simon Mackenzie
Abstract	We consider the well-studied cake cutting problem in which the goal is to find an envy-free allocation based on queries from $n$ agents. The problem has received attention in computer science, mathematics, and economics. It has been a major open problem whether there exists a discrete and bounded envy-free protocol. We resolve the problem by proposing a discrete and bounded envy-free protocol for any number of agents. The maximum number of queries required by the protocol is $n^{n^{n^{n^{n^n}}}}$. We additionally show that even if we do not run our protocol to completion, it can find in at most $n^3{(n^2)}^n$ queries a partial allocation of the cake that achieves proportionality (each agent gets at least $1/n$ of the value of the whole cake) and envy-freeness. Finally we show that an envy-free partial allocation can be computed in at most $n^3{(n^2)}^n$ queries such that each agent gets a connected piece that gives the agent at least $1/(3n)$ of the value of the whole cake.
Tasks
Published	2016-04-13
URL	http://arxiv.org/abs/1604.03655v12
PDF	http://arxiv.org/pdf/1604.03655v12.pdf
PWC	https://paperswithcode.com/paper/a-discrete-and-bounded-envy-free-cake-cutting
Repo	https://github.com/cowtrix/kake
Framework	none
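
The two fairness notions named in the abstract can be checked mechanically once an allocation is known. The sketch below is not the protocol itself; it only verifies envy-freeness and proportionality for a finished allocation. The matrix representation (`values[i][j]` is agent i's value for the piece held by agent j) and the function names are assumptions made for illustration.

```python
from fractions import Fraction as F


def is_envy_free(values):
    """True if no agent values another agent's piece above their own."""
    n = len(values)
    return all(values[i][i] >= values[i][j] for i in range(n) for j in range(n))


def is_proportional(values, whole_cake=F(1)):
    """True if every agent values their own piece at least whole_cake / n.

    `whole_cake` is each agent's value for the entire cake (normalised to 1 here).
    """
    n = len(values)
    return all(values[i][i] * n >= whole_cake for i in range(n))


fair = [
    [F(1, 2), F(1, 4), F(1, 4)],
    [F(1, 3), F(1, 3), F(1, 3)],
    [F(1, 4), F(1, 4), F(1, 2)],
]
unfair = [
    [F(1, 4), F(1, 2), F(1, 4)],  # agent 0 prefers agent 1's piece
    [F(1, 3), F(1, 3), F(1, 3)],
    [F(1, 4), F(1, 4), F(1, 2)],
]

print(is_envy_free(fair), is_proportional(fair))
print(is_envy_free(unfair), is_proportional(unfair))
```

Exact fractions avoid rounding issues when a piece is worth exactly $1/n$ to an agent.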

Hybrid Collaborative Filtering with Autoencoders


Title	Hybrid Collaborative Filtering with Autoencoders
Authors	Florian Strub, Jeremie Mary, Romaric Gaudel
Abstract	Collaborative Filtering aims at exploiting the feedback of users to provide personalised recommendations. Such algorithms look for latent variables in a large sparse matrix of ratings. They can be enhanced by adding side information to tackle the well-known cold start problem. While Neural Networks have had tremendous success in image and speech recognition, they have received less attention in Collaborative Filtering. This is all the more surprising given that Neural Networks are able to discover latent variables in large and heterogeneous datasets. In this paper, we introduce a Collaborative Filtering Neural network architecture, aka CFN, which computes a non-linear Matrix Factorization from sparse rating inputs and side information. We show experimentally on the MovieLens and Douban datasets that CFN outperforms the state of the art and benefits from side information. We provide an implementation of the algorithm as a reusable plugin for Torch, a popular Neural Network framework.
Tasks
Published	2016-03-02
URL	http://arxiv.org/abs/1603.00806v3
PDF	http://arxiv.org/pdf/1603.00806v3.pdf
PWC	https://paperswithcode.com/paper/hybrid-collaborative-filtering-with
Repo	https://github.com/hojinYang/recsys_papers_using_autoencoders
Framework	none

TensorLy: Tensor Learning in Python


Title	TensorLy: Tensor Learning in Python
Authors	Jean Kossaifi, Yannis Panagakis, Anima Anandkumar, Maja Pantic
Abstract	Tensors are higher-order extensions of matrices. While matrix methods form the cornerstone of machine learning and data analysis, tensor methods have been gaining increasing traction. However, software support for tensor operations is not on the same footing. In order to bridge this gap, we have developed TensorLy, a high-level API for tensor methods and deep tensorized neural networks in Python. TensorLy aims to follow the same standards adopted by the main projects of the Python scientific community, and seamlessly integrates with them. Its BSD license makes it suitable for both academic and commercial applications. TensorLy’s backend system allows users to perform computations with NumPy, MXNet, PyTorch, TensorFlow and CuPy. They can be scaled on multiple CPU or GPU machines. In addition, using the deep-learning frameworks as backend allows users to easily design and train deep tensorized neural networks. TensorLy is available at https://github.com/tensorly/tensorly
Tasks
Published	2016-10-29
URL	http://arxiv.org/abs/1610.09555v2
PDF	http://arxiv.org/pdf/1610.09555v2.pdf
PWC	https://paperswithcode.com/paper/tensorly-tensor-learning-in-python
Repo	https://github.com/tensorly/tensorly
Framework	pytorch

Accelerated Convolutions for Efficient Multi-Scale Time to Contact Computation in Julia


Title	Accelerated Convolutions for Efficient Multi-Scale Time to Contact Computation in Julia
Authors	Alexander Amini, Berthold Horn, Alan Edelman
Abstract	Convolutions have long been regarded as fundamental to applied mathematics, physics and engineering. Their mathematical elegance allows for common tasks such as numerical differentiation to be computed efficiently on large data sets. Efficient computation of convolutions is critical to artificial intelligence in real-time applications, like machine vision, where convolutions must be continuously and efficiently computed on tens to hundreds of kilobytes per second. In this paper, we explore how convolutions are used in fundamental machine vision applications. We present an accelerated n-dimensional convolution package in the high performance computing language, Julia, and demonstrate its efficacy in solving the time to contact problem for machine vision. Results are measured against synthetically generated videos and quantitatively assessed according to their mean squared error from the ground truth. We achieve over an order of magnitude decrease in compute time and allocated memory for comparable machine vision applications. All code is packaged and integrated into the official Julia Package Manager to be used in various other scenarios.
Tasks
Published	2016-12-28
URL	http://arxiv.org/abs/1612.08825v1
PDF	http://arxiv.org/pdf/1612.08825v1.pdf
PWC	https://paperswithcode.com/paper/accelerated-convolutions-for-efficient-multi
Repo	https://github.com/aamini/FastConv.jl
Framework	none
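
The abstract mentions numerical differentiation as a common task expressed as a convolution. The sketch below illustrates that idea in plain Python with a central-difference kernel; it is a toy 1-D example, not the accelerated n-dimensional Julia package described in the paper, and the function name `convolve_valid` is hypothetical.

```python
def convolve_valid(signal, kernel):
    """Discrete 1-D convolution, keeping only positions where the kernel fully overlaps."""
    k = len(kernel)
    out = []
    for i in range(len(signal) - k + 1):
        # Convolution flips the kernel: kernel[j] pairs with signal[i + k - 1 - j].
        out.append(sum(signal[i + k - 1 - j] * kernel[j] for j in range(k)))
    return out


# Central-difference kernel: output[i] = (signal[i + 2] - signal[i]) / 2,
# which approximates the derivative at sample i + 1 for unit spacing.
central_difference = [0.5, 0.0, -0.5]

samples = [x * x for x in range(6)]          # f(x) = x^2 at x = 0..5
derivative = convolve_valid(samples, central_difference)
print(derivative)                            # estimates of f'(x) at x = 1..4
```

For a quadratic, the central difference is exact, so the estimates match $f'(x) = 2x$ at the interior samples.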

Discriminative Embeddings of Latent Variable Models for Structured Data


Title	Discriminative Embeddings of Latent Variable Models for Structured Data
Authors	Hanjun Dai, Bo Dai, Le Song
Abstract	Kernel classifiers and regressors designed for structured data, such as sequences, trees and graphs, have significantly advanced a number of interdisciplinary areas such as computational biology and drug design. Typically, kernels are designed beforehand for a data type; they either exploit statistics of the structures or make use of probabilistic generative models, and a discriminative classifier is then learned based on the kernels via convex optimization. However, such an elegant two-stage approach also prevents kernel methods from scaling up to millions of data points and from exploiting discriminative information to learn feature representations. We propose structure2vec, an effective and scalable approach for structured data representation based on the idea of embedding latent variable models into feature spaces, and learning such feature spaces using discriminative information. Interestingly, structure2vec extracts features by performing a sequence of function mappings in a way similar to graphical model inference procedures, such as mean field and belief propagation. In applications involving millions of data points, we showed that structure2vec runs 2 times faster and produces models which are 10,000 times smaller, while at the same time achieving state-of-the-art predictive performance.
Tasks	Latent Variable Models
Published	2016-03-17
URL	https://arxiv.org/abs/1603.05629v5
PDF	https://arxiv.org/pdf/1603.05629v5.pdf
PWC	https://paperswithcode.com/paper/discriminative-embeddings-of-latent-variable
Repo	https://github.com/LeeeWee/Note
Framework	none

Deep Metric Learning via Facility Location


Title	Deep Metric Learning via Facility Location
Authors	Hyun Oh Song, Stefanie Jegelka, Vivek Rathod, Kevin Murphy
Abstract	Learning the representation and the similarity metric in an end-to-end fashion with deep networks has demonstrated outstanding results for clustering and retrieval. However, these recent approaches still suffer from the performance degradation stemming from the local metric training procedure, which is unaware of the global structure of the embedding space. We propose a global metric learning scheme for optimizing the deep metric embedding with the learnable clustering function and the clustering metric (NMI) in a novel structured prediction framework. Our experiments on the CUB200-2011, Cars196, and Stanford online products datasets show state-of-the-art performance on both the clustering and retrieval tasks, measured with the NMI and Recall@K evaluation metrics.
Tasks	Metric Learning, Structured Prediction
Published	2016-12-05
URL	http://arxiv.org/abs/1612.01213v2
PDF	http://arxiv.org/pdf/1612.01213v2.pdf
PWC	https://paperswithcode.com/paper/deep-metric-learning-via-facility-location
Repo	https://github.com/michaelfiman/face_rec_metric_comparison
Framework	tf
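
Recall@K, one of the retrieval metrics named in the abstract, can be sketched in a few lines. The version below assumes the common definition in which a query counts as a hit when at least one of its K nearest neighbours (excluding itself) shares its label; squared Euclidean distance and the function name are illustrative choices, not details confirmed by the abstract.

```python
def recall_at_k(embeddings, labels, k):
    """Fraction of points with a same-label point among their k nearest neighbours."""
    hits = 0
    for i, query in enumerate(embeddings):
        # Squared Euclidean distance to every other point, nearest first.
        neighbours = sorted(
            (sum((a - b) ** 2 for a, b in zip(query, other)), j)
            for j, other in enumerate(embeddings)
            if j != i
        )
        if any(labels[j] == labels[i] for _, j in neighbours[:k]):
            hits += 1
    return hits / len(embeddings)


# Toy 2-D embeddings: two tight clusters plus one "a" point stranded near cluster "b".
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (5.0, 5.4)]
classes = ["a", "a", "b", "b", "a"]

print(recall_at_k(points, classes, 1))
print(recall_at_k(points, classes, 2))
```

The stranded point lowers Recall@1 for its neighbours too, which is why the metric is usually reported for several values of K.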

Log-time and Log-space Extreme Classification


Title	Log-time and Log-space Extreme Classification
Authors	Kalina Jasinska, Nikos Karampatziakis
Abstract	We present LTLS, a technique for multiclass and multilabel prediction that can perform training and inference in logarithmic time and space. LTLS embeds large classification problems into simple structured prediction problems and relies on efficient dynamic programming algorithms for inference. We train LTLS with stochastic gradient descent on a number of multiclass and multilabel datasets and show that despite its small memory footprint it is often competitive with existing approaches.
Tasks	Structured Prediction
Published	2016-11-07
URL	http://arxiv.org/abs/1611.01964v1
PDF	http://arxiv.org/pdf/1611.01964v1.pdf
PWC	https://paperswithcode.com/paper/log-time-and-log-space-extreme-classification
Repo	https://github.com/ievron/wltls
Framework	none

Revisiting Batch Normalization For Practical Domain Adaptation


Title	Revisiting Batch Normalization For Practical Domain Adaptation
Authors	Yanghao Li, Naiyan Wang, Jianping Shi, Jiaying Liu, Xiaodi Hou
Abstract	Deep neural networks (DNN) have shown unprecedented success in various computer vision applications such as image classification and object detection. However, it is still a common annoyance during the training phase that one has to prepare at least thousands of labeled images to fine-tune a network to a specific domain. A recent study (Tommasi et al. 2015) shows that a DNN has a strong dependency on the training dataset, and the learned features cannot be easily transferred to a different but relevant task without fine-tuning. In this paper, we propose a simple yet powerful remedy, called Adaptive Batch Normalization (AdaBN), to increase the generalization ability of a DNN. By modulating the statistics in all Batch Normalization layers across the network, our approach achieves a deep adaptation effect for domain adaptation tasks. In contrast to other deep learning domain adaptation methods, our method does not require additional components, and is parameter-free. It achieves state-of-the-art performance despite its surprising simplicity. Furthermore, we demonstrate that our method is complementary to other existing methods. Combining AdaBN with existing domain adaptation treatments may further improve model performance.
Tasks	Domain Adaptation, Image Classification, Object Detection
Published	2016-03-15
URL	http://arxiv.org/abs/1603.04779v4
PDF	http://arxiv.org/pdf/1603.04779v4.pdf
PWC	https://paperswithcode.com/paper/revisiting-batch-normalization-for-practical
Repo	https://github.com/erlendd/ddan
Framework	tf
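
The abstract describes AdaBN as modulating the statistics in Batch Normalization layers. A minimal sketch of one plausible reading follows: the per-feature mean and variance are re-estimated on target-domain activations, while the learned scale and shift are kept from source training. That reading, the function names, and the toy numbers are assumptions for illustration, not the authors' implementation.

```python
import math


def batch_stats(activations):
    """Per-feature mean and (biased) variance over a batch of activation vectors."""
    n = len(activations)
    dims = len(activations[0])
    means = [sum(row[d] for row in activations) / n for d in range(dims)]
    variances = [
        sum((row[d] - means[d]) ** 2 for row in activations) / n for d in range(dims)
    ]
    return means, variances


def batch_norm(activations, means, variances, gamma, beta, eps=1e-5):
    """Standard BN transform using the supplied statistics."""
    return [
        [
            gamma[d] * (row[d] - means[d]) / math.sqrt(variances[d] + eps) + beta[d]
            for d in range(len(row))
        ]
        for row in activations
    ]


# Learned affine parameters stay fixed (hypothetical values).
gamma, beta = [1.0, 1.0], [0.0, 0.0]

# Target-domain activations have a different scale and offset than the source.
target = [[10.0, 0.0], [12.0, 2.0], [14.0, 4.0]]

# Adaptation step under this sketch: swap in statistics measured on the target data.
target_means, target_vars = batch_stats(target)
adapted = batch_norm(target, target_means, target_vars, gamma, beta)
print(target_means, adapted)
```

No gradient step is involved, which is consistent with the abstract's description of the method as parameter-free.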

Directional Statistics in Machine Learning: a Brief Review


Title	Directional Statistics in Machine Learning: a Brief Review
Authors	Suvrit Sra
Abstract	The modern data analyst must cope with data encoded in various forms: vectors, matrices, strings, graphs, and more. Consequently, statistical and machine learning models tailored to different data encodings are important. We focus on data encoded as normalized vectors, so that their “direction” is more important than their magnitude. Specifically, we consider high-dimensional vectors that lie either on the surface of the unit hypersphere or on the real projective plane. For such data, we briefly review common mathematical models prevalent in machine learning, while also outlining some technical aspects, software, applications, and open mathematical challenges.
Tasks
Published	2016-05-01
URL	http://arxiv.org/abs/1605.00316v1
PDF	http://arxiv.org/pdf/1605.00316v1.pdf
PWC	https://paperswithcode.com/paper/directional-statistics-in-machine-learning-a
Repo	https://github.com/clara-labs/spherecluster
Framework	none
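
To make "direction matters more than magnitude" concrete, the sketch below projects vectors onto the unit hypersphere and computes two standard summary quantities of directional data: the mean direction and the mean resultant length. It is a generic illustration; the function names are hypothetical and nothing here is taken from the review's own software pointers.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so only its direction remains."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("the zero vector has no direction")
    return [x / norm for x in vector]


def mean_direction(vectors):
    """Return (unit mean direction, mean resultant length in [0, 1]).

    Raises ValueError if the directions cancel out exactly.
    """
    units = [normalize(v) for v in vectors]
    n = len(units)
    resultant = [sum(u[d] for u in units) / n for d in range(len(units[0]))]
    length = math.sqrt(sum(x * x for x in resultant))
    return normalize(resultant), length


# Magnitudes differ, but only the directions (east and north) are used.
direction, concentration = mean_direction([(3.0, 0.0), (0.0, 5.0)])
print(direction, concentration)
```

A mean resultant length near 1 indicates tightly clustered directions, while a value near 0 indicates directions spread around the sphere.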

Protein-Ligand Scoring with Convolutional Neural Networks


Title	Protein-Ligand Scoring with Convolutional Neural Networks
Authors	Matthew Ragoza, Joshua Hochuli, Elisa Idrobo, Jocelyn Sunseri, David Ryan Koes
Abstract	Computational approaches to drug discovery can reduce the time and cost associated with experimental assays and enable the screening of novel chemotypes. Structure-based drug design methods rely on scoring functions to rank and predict binding affinities and poses. The ever-expanding amount of protein-ligand binding and structural data enables the use of deep machine learning techniques for protein-ligand scoring. We describe convolutional neural network (CNN) scoring functions that take as input a comprehensive 3D representation of a protein-ligand interaction. A CNN scoring function automatically learns the key features of protein-ligand interactions that correlate with binding. We train and optimize our CNN scoring functions to discriminate between correct and incorrect binding poses and known binders and non-binders. We find that our CNN scoring function outperforms the AutoDock Vina scoring function when ranking poses both for pose prediction and virtual screening.
Tasks	Drug Discovery, Pose Prediction
Published	2016-12-08
URL	http://arxiv.org/abs/1612.02751v1
PDF	http://arxiv.org/pdf/1612.02751v1.pdf
PWC	https://paperswithcode.com/paper/protein-ligand-scoring-with-convolutional
Repo	https://github.com/gnina/gnina
Framework	none

Fitting a 3D Morphable Model to Edges: A Comparison Between Hard and Soft Correspondences


Title	Fitting a 3D Morphable Model to Edges: A Comparison Between Hard and Soft Correspondences
Authors	Anil Bas, William A. P. Smith, Timo Bolkart, Stefanie Wuhrer
Abstract	We propose a fully automatic method for fitting a 3D morphable model to single face images in arbitrary pose and lighting. Our approach relies on geometric features (edges and landmarks) and, inspired by the iterated closest point algorithm, is based on computing hard correspondences between model vertices and edge pixels. We demonstrate that this is superior to previous work that uses soft correspondences to form an edge-derived cost surface that is minimised by nonlinear optimisation.
Tasks
Published	2016-02-02
URL	http://arxiv.org/abs/1602.01125v2
PDF	http://arxiv.org/pdf/1602.01125v2.pdf
PWC	https://paperswithcode.com/paper/fitting-a-3d-morphable-model-to-edges-a
Repo	https://github.com/waps101/3DMM_edges
Framework	none

Visual Dialog


Title	Visual Dialog
Authors	Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M. F. Moura, Devi Parikh, Dhruv Batra
Abstract	We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a question about the image, the agent has to ground the question in the image, infer context from history, and answer the question accurately. Visual Dialog is disentangled enough from a specific downstream task so as to serve as a general test of machine intelligence, while being grounded in vision enough to allow objective evaluation of individual responses and benchmark progress. We develop a novel two-person chat data-collection protocol to curate a large-scale Visual Dialog dataset (VisDial). VisDial v0.9 has been released and contains 1 dialog with 10 question-answer pairs on ~120k images from COCO, with a total of ~1.2M dialog question-answer pairs. We introduce a family of neural encoder-decoder models for Visual Dialog with 3 encoders (Late Fusion, Hierarchical Recurrent Encoder and Memory Network) and 2 decoders (generative and discriminative), which outperform a number of sophisticated baselines. We propose a retrieval-based evaluation protocol for Visual Dialog where the AI agent is asked to sort a set of candidate answers and is evaluated on metrics such as mean-reciprocal-rank of the human response. We quantify the gap between machine and human performance on the Visual Dialog task via human studies. Putting it all together, we demonstrate the first ‘visual chatbot’! Our dataset, code, trained models and visual chatbot are available on https://visualdialog.org
Tasks	Chatbot, Visual Dialog
Published	2016-11-26
URL	http://arxiv.org/abs/1611.08669v5
PDF	http://arxiv.org/pdf/1611.08669v5.pdf
PWC	https://paperswithcode.com/paper/visual-dialog
Repo	https://github.com/batra-mlp-lab/visdial
Framework	torch
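
The retrieval-based evaluation mentioned in the abstract scores an agent by where the human response lands in its sorted candidate list. The sketch below computes mean reciprocal rank under that reading; the data layout (one ranked list and one human answer per question) and the function name are assumptions for illustration, and the example answers are made up.

```python
def mean_reciprocal_rank(ranked_candidates, human_answers):
    """Average of 1 / rank of the human answer across questions (ranks start at 1)."""
    total = 0.0
    for ranking, truth in zip(ranked_candidates, human_answers):
        rank = ranking.index(truth) + 1  # raises ValueError if the answer is missing
        total += 1.0 / rank
    return total / len(human_answers)


# Each inner list is the agent's ordering of candidate answers, best first.
rankings = [
    ["yes", "no", "maybe"],       # human answer ranked 1st
    ["red", "blue", "green"],     # human answer ranked 2nd
    ["two", "three", "one"],      # human answer ranked 3rd
]
truths = ["yes", "blue", "one"]

print(mean_reciprocal_rank(rankings, truths))  # (1 + 1/2 + 1/3) / 3
```

Because the score depends only on rank, it can be computed without judging free-form generated text.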

STD2P: RGBD Semantic Segmentation Using Spatio-Temporal Data-Driven Pooling


Title	STD2P: RGBD Semantic Segmentation Using Spatio-Temporal Data-Driven Pooling
Authors	Yang He, Wei-Chen Chiu, Margret Keuper, Mario Fritz
Abstract	We propose a novel superpixel-based multi-view convolutional neural network for semantic image segmentation. The proposed network produces a high quality segmentation of a single image by leveraging information from additional views of the same scene. Particularly in indoor videos, such as those captured by robotic platforms or handheld and bodyworn RGBD cameras, nearby video frames provide diverse viewpoints and additional context of objects and scenes. To leverage such information, we first compute region correspondences by optical flow and image boundary-based superpixels. Given these region correspondences, we propose a novel spatio-temporal pooling layer to aggregate information over space and time. We evaluate our approach on the NYU-Depth-V2 and the SUN3D datasets and compare it to various state-of-the-art single-view and multi-view approaches. Besides a general improvement over the state-of-the-art, we also show the benefits of making use of unlabeled frames during training for multi-view as well as single-view prediction.
Tasks	Optical Flow Estimation, Semantic Segmentation
Published	2016-04-08
URL	http://arxiv.org/abs/1604.02388v3
PDF	http://arxiv.org/pdf/1604.02388v3.pdf
PWC	https://paperswithcode.com/paper/std2p-rgbd-semantic-segmentation-using-spatio
Repo	https://github.com/SSAW14/STD2P
Framework	none

Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network


Title	Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network
Authors	Wenzhe Shi, Jose Caballero, Ferenc Huszár, Johannes Totz, Andrew P. Aitken, Rob Bishop, Daniel Rueckert, Zehan Wang
Abstract	Recently, several models based on deep neural networks have achieved great success in terms of both reconstruction accuracy and computational performance for single image super-resolution. In these methods, the low resolution (LR) input image is upscaled to the high resolution (HR) space using a single filter, commonly bicubic interpolation, before reconstruction. This means that the super-resolution (SR) operation is performed in HR space. We demonstrate that this is sub-optimal and adds computational complexity. In this paper, we present the first convolutional neural network (CNN) capable of real-time SR of 1080p videos on a single K2 GPU. To achieve this, we propose a novel CNN architecture where the feature maps are extracted in the LR space. In addition, we introduce an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output. By doing so, we effectively replace the handcrafted bicubic filter in the SR pipeline with more complex upscaling filters specifically trained for each feature map, whilst also reducing the computational complexity of the overall SR operation. We evaluate the proposed approach using images and videos from publicly available datasets and show that it performs significantly better (+0.15dB on Images and +0.39dB on Videos) and is an order of magnitude faster than previous CNN-based methods.
Tasks	Image Super-Resolution, Super-Resolution, Video Super-Resolution
Published	2016-09-16
URL	http://arxiv.org/abs/1609.05158v2
PDF	http://arxiv.org/pdf/1609.05158v2.pdf
PWC	https://paperswithcode.com/paper/real-time-single-image-and-video-super
Repo	https://github.com/XueweiMeng/derain_filter
Framework	tf
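
The sub-pixel convolution layer in the abstract upscales low-resolution feature maps into the high-resolution output. The sketch below shows only the rearrangement step often called pixel shuffle, for a single output channel: r*r low-resolution channels of size H x W are interleaved into one rH x rW grid. The learned filters that produce those channels are omitted, and the channel ordering used here is an assumption that may differ from the paper's convention.

```python
def pixel_shuffle(feature_maps, r):
    """Interleave r*r channels (each H x W) into a single (r*H) x (r*W) grid."""
    if len(feature_maps) != r * r:
        raise ValueError("expected exactly r*r channels for one output channel")
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    out = [[0] * (width * r) for _ in range(height * r)]
    for y in range(height * r):
        for x in range(width * r):
            # The sub-pixel offset inside each r x r block selects the channel;
            # the block position selects the low-resolution pixel.
            channel = r * (y % r) + (x % r)
            out[y][x] = feature_maps[channel][y // r][x // r]
    return out


# Four 1 x 2 channels become one 2 x 4 image for an upscaling factor of 2.
channels = [[[0, 1]], [[10, 11]], [[20, 21]], [[30, 31]]]
upscaled = pixel_shuffle(channels, 2)
for row in upscaled:
    print(row)
```

Since every convolution runs at low resolution and the upscaling is a pure memory rearrangement, this layout is consistent with the abstract's claim of reduced computational complexity.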