October 16, 2019


Paper Group ANR 1044

Learning Product Codebooks using Vector Quantized Autoencoders for Image Retrieval. MACNet: Multi-scale Atrous Convolution Networks for Food Places Classification in Egocentric Photo-streams. Mitigating Unwanted Biases with Adversarial Learning. Automatic Graphics Program Generation using Attention-Based Hierarchical Decoder. Faster and More Robust …

Learning Product Codebooks using Vector Quantized Autoencoders for Image Retrieval


Title	Learning Product Codebooks using Vector Quantized Autoencoders for Image Retrieval
Authors	Hanwei Wu, Markus Flierl
Abstract	Vector-Quantized Variational Autoencoders (VQ-VAE)[1] provide an unsupervised model for learning discrete representations by combining vector quantization and autoencoders. In this paper, we study the use of VQ-VAE for representation learning for downstream tasks, such as image retrieval. We first describe the VQ-VAE in the context of an information-theoretic framework. We show that the regularization term on the learned representation is determined by the size of the embedded codebook before the training and it affects the generalization ability of the model. As a result, we introduce a hyperparameter to balance the strength of the vector quantizer and the reconstruction error. By tuning the hyperparameter, the embedded bottleneck quantizer is used as a regularizer that forces the output of the encoder to share a constrained coding space such that learned latent features preserve the similarity relations of the data space. In addition, we provide a search range for finding the best hyperparameter. Finally, we incorporate the product quantization into the bottleneck stage of VQ-VAE and propose an end-to-end unsupervised learning model for the image retrieval task. The product quantizer has the advantage of generating large-size codebooks. Fast retrieval can be achieved by using the lookup tables that store the distance between any pair of sub-codewords. State-of-the-art retrieval results are achieved by the learned codebooks.
Tasks	Image Retrieval, Quantization, Representation Learning
Published	2018-07-12
URL	http://arxiv.org/abs/1807.04629v4
PDF	http://arxiv.org/pdf/1807.04629v4.pdf
PWC	https://paperswithcode.com/paper/learning-product-codebooks-using-vector
Repo
Framework
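
The lookup-table retrieval mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the codebooks below are hand-picked toy values rather than learned ones, and the function names (`encode`, `build_tables`, `table_distance`) are hypothetical. It only shows how a product quantizer with M sub-codebooks of K entries each yields K^M effective codewords, and how the distance between two encoded items reduces to M table lookups.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def encode(x, codebooks):
    """Product-quantize x: one nearest sub-codeword index per sub-space."""
    sub_dim = len(x) // len(codebooks)
    codes = []
    for m, book in enumerate(codebooks):
        sub = x[m * sub_dim:(m + 1) * sub_dim]
        codes.append(min(range(len(book)), key=lambda k: sq_dist(sub, book[k])))
    return codes

def build_tables(codebooks):
    """tables[m][i][j] = squared distance between sub-codewords i and j in sub-space m."""
    return [[[sq_dist(ci, cj) for cj in book] for ci in book] for book in codebooks]

def table_distance(codes_a, codes_b, tables):
    """Approximate squared distance between two encoded items using lookups only."""
    return sum(tables[m][i][j] for m, (i, j) in enumerate(zip(codes_a, codes_b)))

# Toy setup (assumed values): 4-D vectors, M = 2 sub-spaces, K = 2 sub-codewords each.
codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 1.0), (1.0, 0.0)],
]
tables = build_tables(codebooks)
query = encode((0.1, -0.1, 0.9, 0.2), codebooks)
item = encode((0.9, 1.2, 0.1, 0.8), codebooks)
distance = table_distance(query, item, tables)
```

The tables are built once per codebook, so comparing a query code against a database of codes never touches the original vectors.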

MACNet: Multi-scale Atrous Convolution Networks for Food Places Classification in Egocentric Photo-streams


Title	MACNet: Multi-scale Atrous Convolution Networks for Food Places Classification in Egocentric Photo-streams
Authors	Md. Mostafa Kamal Sarker, Hatem A. Rashwan, Estefania Talavera, Syeda Furruka Banu, Petia Radeva, Domenec Puig
Abstract	A first-person (wearable) camera continually captures unscripted interactions of the camera user with objects, people, and scenes, reflecting the user’s personal and relational tendencies. One such tendency is how people interact with food events. Regulating food intake and its duration is of great importance in protecting against diseases. Consequently, this work aims to develop a smart model that can determine how often a person visits food places during a day. This model is based on a deep end-to-end model for automatic food places recognition by analyzing egocentric photo-streams. In this paper, we apply multi-scale Atrous convolution networks to extract the key features related to food places of the input images. The proposed model is evaluated on an in-house private dataset called “EgoFoodPlaces”. Experimental results show promising performance for food places classification in egocentric photo-streams.
Tasks
Published	2018-08-29
URL	http://arxiv.org/abs/1808.09829v1
PDF	http://arxiv.org/pdf/1808.09829v1.pdf
PWC	https://paperswithcode.com/paper/macnet-multi-scale-atrous-convolution
Repo
Framework
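
To make the term "multi-scale atrous convolution" concrete, here is a minimal 1-D sketch. It is not the MACNet architecture: the signal, the kernel, and the function name `atrous_conv1d` are assumptions chosen for illustration. It shows how the same kernel applied at several dilation rates covers progressively wider context without adding weights.

```python
def atrous_conv1d(signal, kernel, rate):
    """Valid 1-D cross-correlation whose kernel taps are `rate` samples apart."""
    span = (len(kernel) - 1) * rate  # receptive field minus one
    return [
        sum(k * signal[i + j * rate] for j, k in enumerate(kernel))
        for i in range(len(signal) - span)
    ]

# Assumed toy input: a ramp, filtered by a simple difference kernel.
signal = [0, 1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

# One response per dilation rate; a multi-scale block would combine these.
multi_scale = {rate: atrous_conv1d(signal, kernel, rate) for rate in (1, 2, 3)}
```

With rate 1 the kernel spans 3 samples, with rate 3 it spans 7, yet each case uses the same three weights.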

Mitigating Unwanted Biases with Adversarial Learning


Title	Mitigating Unwanted Biases with Adversarial Learning
Authors	Brian Hu Zhang, Blake Lemoine, Margaret Mitchell
Abstract	Machine learning is a tool for building models that accurately represent input training data. When undesired biases concerning demographic groups are in the training data, well-trained models will reflect those biases. We present a framework for mitigating such biases by including a variable for the group of interest and simultaneously learning a predictor and an adversary. The input to the network X, here text or census data, produces a prediction Y, such as an analogy completion or income bracket, while the adversary tries to model a protected variable Z, here gender or zip code. The objective is to maximize the predictor’s ability to predict Y while minimizing the adversary’s ability to predict Z. Applied to analogy completion, this method results in accurate predictions that exhibit less evidence of stereotyping Z. When applied to a classification task using the UCI Adult (Census) Dataset, it results in a predictive model that does not lose much accuracy while achieving very close to equality of odds (Hardt, et al., 2016). The method is flexible and applicable to multiple definitions of fairness as well as a wide range of gradient-based learning models, including both regression and classification tasks.
Tasks
Published	2018-01-22
URL	http://arxiv.org/abs/1801.07593v1
PDF	http://arxiv.org/pdf/1801.07593v1.pdf
PWC	https://paperswithcode.com/paper/mitigating-unwanted-biases-with-adversarial
Repo
Framework
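
The equality-of-odds criterion cited in the abstract (Hardt et al., 2016) asks that the true-positive and false-positive rates match across groups of the protected variable Z. The sketch below measures those gaps for a binary predictor and a binary Z. The data are made up and the helper names are hypothetical; it evaluates the fairness criterion only and does not reproduce the adversarial training procedure.

```python
def rate(preds, labels, groups, group, label):
    """Estimate P(pred = 1 | Y = label, Z = group) from samples."""
    selected = [p for p, y, z in zip(preds, labels, groups) if y == label and z == group]
    return sum(selected) / len(selected)

def equalized_odds_gaps(preds, labels, groups):
    """Return (TPR gap, FPR gap) between groups Z = 0 and Z = 1."""
    tpr_gap = abs(rate(preds, labels, groups, 0, 1) - rate(preds, labels, groups, 1, 1))
    fpr_gap = abs(rate(preds, labels, groups, 0, 0) - rate(preds, labels, groups, 1, 0))
    return tpr_gap, fpr_gap

# Hypothetical toy data: binary predictions, true labels Y, protected attribute Z.
preds  = [1, 0, 1, 0, 1, 1, 0, 0]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
tpr_gap, fpr_gap = equalized_odds_gaps(preds, labels, groups)
```

Both gaps are zero for a predictor that satisfies equality of odds exactly; in this toy data the predictor is noticeably better on group 1, so both gaps are large.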

Automatic Graphics Program Generation using Attention-Based Hierarchical Decoder


Title	Automatic Graphics Program Generation using Attention-Based Hierarchical Decoder
Authors	Zhihao Zhu, Zhan Xue, Zejian Yuan
Abstract	Recent progress in deep learning has made it possible to automatically transform a screenshot of a Graphical User Interface (GUI) into code using the encoder-decoder framework. While the commonly adopted image encoder (e.g., a CNN) might be capable of extracting image features to the desired level, interpreting these abstract image features into hundreds of tokens of code poses a particular challenge to the decoding power of the RNN-based code generator. Considering that the code used for describing a GUI is usually hierarchically structured, we propose a new attention-based hierarchical code generation model, which can describe GUI images at a finer level of detail, while also being able to generate hierarchically structured code consistent with the hierarchical layout of the graphic elements in the GUI. Our model follows the encoder-decoder framework, all the components of which can be trained jointly in an end-to-end manner. The experimental results show that our method outperforms other current state-of-the-art methods on both a publicly available GUI-code dataset and a dataset we established ourselves.
Tasks	Code Generation
Published	2018-10-26
URL	http://arxiv.org/abs/1810.11536v1
PDF	http://arxiv.org/pdf/1810.11536v1.pdf
PWC	https://paperswithcode.com/paper/automatic-graphics-program-generation-using
Repo
Framework

Faster and More Robust Mesh-based Algorithms for Obstacle k-Nearest Neighbour


Title	Faster and More Robust Mesh-based Algorithms for Obstacle k-Nearest Neighbour
Authors	Shizhe Zhao, Daniel D. Harabor, David Taniar
Abstract	We are interested in the problem of finding $k$ nearest neighbours in the plane and in the presence of polygonal obstacles (OkNN). Widely used algorithms for OkNN are based on incremental visibility graphs, which means they require costly online visibility checking and have worst-case quadratic running time. Recently, Polyanya, a fast point-to-point pathfinding algorithm, was proposed; it avoids the disadvantages of visibility graphs by searching over an alternative data structure known as a navigation mesh. Previously, we adapted Polyanya to multi-target scenarios by developing two specialised heuristic functions: the Interval heuristic $h_v$ and the Target heuristic $h_t$. Though these methods outperform visibility graph algorithms by orders of magnitude in all our experiments, they are not robust: $h_v$ expands many redundant nodes when the set of neighbours is small, while $h_t$ performs poorly when the set of neighbours is large. In this paper, we propose new algorithms and heuristics for OkNN which perform well regardless of neighbour density.
Tasks
Published	2018-08-13
URL	http://arxiv.org/abs/1808.04043v1
PDF	http://arxiv.org/pdf/1808.04043v1.pdf
PWC	https://paperswithcode.com/paper/faster-and-more-robust-mesh-based-algorithms
Repo
Framework

Pay attention! - Robustifying a Deep Visuomotor Policy through Task-Focused Attention


Title	Pay attention! - Robustifying a Deep Visuomotor Policy through Task-Focused Attention
Authors	Pooya Abolghasemi, Amir Mazaheri, Mubarak Shah, Ladislau Bölöni
Abstract	Several recent studies have demonstrated the promise of deep visuomotor policies for robot manipulator control. Despite impressive progress, these systems are known to be vulnerable to physical disturbances, such as accidental or adversarial bumps that make them drop the manipulated object. They also tend to be distracted by visual disturbances such as objects moving in the robot’s field of view, even if the disturbance does not physically prevent the execution of the task. In this paper, we propose an approach for augmenting a deep visuomotor policy trained through demonstrations with Task Focused visual Attention (TFA). The manipulation task is specified with a natural language text such as ‘move the red bowl to the left’. This allows the visual attention component to concentrate on the current object that the robot needs to manipulate. We show that even in benign environments, the TFA allows the policy to consistently outperform a variant with no attention mechanism. More importantly, the new policy is significantly more robust: it regularly recovers from severe physical disturbances (such as bumps causing it to drop the object) from which the baseline policy, i.e. with no visual attention, almost never recovers. In addition, we show that the proposed policy performs correctly in the presence of a wide class of visual disturbances, exhibiting a behavior reminiscent of human selective visual attention experiments. Our proposed approach consists of a VAE-GAN network which encodes the visual input and feeds it to a Motor network that moves the robot joints. Also, our approach benefits from a teacher network for the TFA that leverages textual input command to robustify the visual encoder against various types of disturbances.
Tasks
Published	2018-09-26
URL	http://arxiv.org/abs/1809.10093v2
PDF	http://arxiv.org/pdf/1809.10093v2.pdf
PWC	https://paperswithcode.com/paper/pay-attention-robustifying-a-deep-visuomotor
Repo
Framework

On Estimating Multi-Attribute Choice Preferences using Private Signals and Matrix Factorization


Title	On Estimating Multi-Attribute Choice Preferences using Private Signals and Matrix Factorization
Authors	Venkata Sriram Siddhardh Nadendla, Cedric Langbort
Abstract	Revealed preference theory studies the possibility of modeling an agent’s revealed preferences and the construction of a consistent utility function. However, modeling an agent’s choices over preference orderings is not always practical and demands strong assumptions on human rationality and data-acquisition abilities. Therefore, we propose a simple generative choice model where agents are assumed to generate the choice probabilities based on latent factor matrices that capture their choice evaluation across multiple attributes. Since the multi-attribute evaluation is typically hidden within the agent’s psyche, we consider a signaling mechanism where agents are provided with choice information through private signals, so that the agent’s choices provide more insight into his/her latent evaluation across multiple attributes. We estimate the choice model via a novel multi-stage matrix factorization algorithm that minimizes the average deviation of the factor estimates from choice data. Simulation results are presented to validate the estimation performance of our proposed algorithm.
Tasks
Published	2018-02-19
URL	http://arxiv.org/abs/1802.07126v1
PDF	http://arxiv.org/pdf/1802.07126v1.pdf
PWC	https://paperswithcode.com/paper/on-estimating-multi-attribute-choice
Repo
Framework

A Comparative Study of Distributional and Symbolic Paradigms for Relational Learning


Title	A Comparative Study of Distributional and Symbolic Paradigms for Relational Learning
Authors	Sebastijan Dumancic, Alberto Garcia-Duran, Mathias Niepert
Abstract	Many real-world domains can be expressed as graphs and, more generally, as multi-relational knowledge graphs. Though reasoning and learning with knowledge graphs has traditionally been addressed by symbolic approaches, recent methods in (deep) representation learning have shown promising results for specialized tasks such as knowledge base completion. These approaches abandon the traditional symbolic paradigm by replacing symbols with vectors in Euclidean space. With few exceptions, symbolic and distributional approaches are explored in different communities and little is known about their respective strengths and weaknesses. In this work, we compare representation learning and relational learning on various relational classification and clustering tasks and analyse the complexity of the rules used implicitly by these approaches. Preliminary results reveal possible indicators that could help in choosing one approach over the other for particular knowledge graphs.
Tasks	Knowledge Base Completion, Knowledge Graphs, Relational Reasoning, Representation Learning
Published	2018-06-29
URL	https://arxiv.org/abs/1806.11391v4
PDF	https://arxiv.org/pdf/1806.11391v4.pdf
PWC	https://paperswithcode.com/paper/on-embeddings-as-an-alternative-paradigm-for
Repo
Framework

Detecting and Correcting for Label Shift with Black Box Predictors


Title	Detecting and Correcting for Label Shift with Black Box Predictors
Authors	Zachary C. Lipton, Yu-Xiang Wang, Alex Smola
Abstract	Faced with distribution shift between training and test set, we wish to detect and quantify the shift, and to correct our classifiers without test set labels. Motivated by medical diagnosis, where diseases (targets) cause symptoms (observations), we focus on label shift, where the label marginal $p(y)$ changes but the conditional $p(x \mid y)$ does not. We propose Black Box Shift Estimation (BBSE) to estimate the test distribution $p(y)$. BBSE exploits arbitrary black box predictors to reduce dimensionality prior to shift correction. While better predictors give tighter estimates, BBSE works even when predictors are biased, inaccurate, or uncalibrated, so long as their confusion matrices are invertible. We prove BBSE’s consistency, bound its error, and introduce a statistical test that uses BBSE to detect shift. We also leverage BBSE to correct classifiers. Experiments demonstrate accurate estimates and improved prediction, even on high-dimensional datasets of natural images.
Tasks	Medical Diagnosis
Published	2018-02-12
URL	http://arxiv.org/abs/1802.03916v3
PDF	http://arxiv.org/pdf/1802.03916v3.pdf
PWC	https://paperswithcode.com/paper/detecting-and-correcting-for-label-shift-with
Repo
Framework
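
The role of the invertible confusion matrix can be seen in a small two-class sketch. Under label shift, the distribution of black-box predictions on the target data satisfies mu = C w, where C is the joint confusion matrix on labelled source data and w holds the ratios of target to source label probabilities; solving for w gives the shift estimate. The function name and the toy predictions below are assumptions, and the paper's error bounds and detection test are not covered.

```python
def bbse_weights(source_preds, source_labels, target_preds):
    """Estimate w[j] = q(y=j) / p(y=j) for two classes by solving C w = mu."""
    n, m = len(source_labels), len(target_preds)
    # C[i][j]: joint frequency of (predicted i, true j) on labelled source data.
    C = [[0.0, 0.0], [0.0, 0.0]]
    for yhat, y in zip(source_preds, source_labels):
        C[yhat][y] += 1.0 / n
    # mu[i]: frequency of predicted class i on unlabelled target data.
    mu = [target_preds.count(0) / m, target_preds.count(1) / m]
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    if abs(det) < 1e-12:
        raise ValueError("confusion matrix is not invertible")
    # Cramer's rule for the 2x2 system.
    w0 = (mu[0] * C[1][1] - C[0][1] * mu[1]) / det
    w1 = (C[0][0] * mu[1] - C[1][0] * mu[0]) / det
    return [w0, w1]

# Hypothetical black-box predictor that is right 80% of the time on each class.
source_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
source_preds  = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
# Unlabelled target predictions, consistent with q(y=1) = 0.8.
target_preds = [0] * 8 + [1] * 17

weights = bbse_weights(source_preds, source_labels, target_preds)
target_prior = [w * 0.5 for w in weights]  # source prior is 0.5 per class
```

Even though the predictor is imperfect, the recovered target prior is close to (0.2, 0.8), because only invertibility of C is needed.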

Bilevel Programming for Hyperparameter Optimization and Meta-Learning


Title	Bilevel Programming for Hyperparameter Optimization and Meta-Learning
Authors	Luca Franceschi, Paolo Frasconi, Saverio Salzo, Riccardo Grazzi, Massimiliano Pontil
Abstract	We introduce a framework based on bilevel programming that unifies gradient-based hyperparameter optimization and meta-learning. We show that an approximate version of the bilevel problem can be solved by taking into explicit account the optimization dynamics for the inner objective. Depending on the specific setting, the outer variables take either the meaning of hyperparameters in a supervised learning problem or parameters of a meta-learner. We provide sufficient conditions under which solutions of the approximate problem converge to those of the exact problem. We instantiate our approach for meta-learning in the case of deep learning where representation layers are treated as hyperparameters shared across a set of training episodes. In experiments, we confirm our theoretical findings, present encouraging results for few-shot learning and contrast the bilevel approach against classical approaches for learning-to-learn.
Tasks	Few-Shot Learning, Hyperparameter Optimization, Meta-Learning
Published	2018-06-13
URL	http://arxiv.org/abs/1806.04910v2
PDF	http://arxiv.org/pdf/1806.04910v2.pdf
PWC	https://paperswithcode.com/paper/bilevel-programming-for-hyperparameter
Repo
Framework
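
A toy scalar problem shows what "taking into explicit account the optimization dynamics for the inner objective" can look like. In this sketch the inner problem is ridge-regularised least squares on one weight, solved approximately by a fixed number of gradient steps, and the hypergradient with respect to the regularisation strength `lam` is obtained by differentiating each update (forward mode). The objectives, constants, and function name are assumptions for illustration, not the paper's experimental setup.

```python
def unrolled_hypergradient(lam, a=2.0, b=1.0, lr=0.1, steps=200):
    """Return (outer loss, d outer / d lam) by differentiating through inner GD.

    Inner objective: 0.5 * (w - a)**2 + 0.5 * lam * w**2   (training loss)
    Outer objective: 0.5 * (w_T - b)**2                     (validation loss)
    """
    w, z = 0.0, 0.0  # z tracks dw/dlam along the inner trajectory
    for _ in range(steps):
        grad_w = (w - a) + lam * w
        # Derivative of the update w <- w - lr * grad_w with respect to lam.
        z = z - lr * ((1.0 + lam) * z + w)
        w = w - lr * grad_w
    outer = 0.5 * (w - b) ** 2
    return outer, (w - b) * z

# Outer loop: plain gradient descent on the hyperparameter.
lam = 0.0
for _ in range(500):
    _, hypergrad = unrolled_hypergradient(lam)
    lam -= 0.2 * hypergrad
```

For this toy problem the exact inner solution is `a / (1 + lam)`, so the outer loss is minimised at `lam = 1`, which the unrolled procedure approaches.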

Adversarial Risk and the Dangers of Evaluating Against Weak Attacks


Title	Adversarial Risk and the Dangers of Evaluating Against Weak Attacks
Authors	Jonathan Uesato, Brendan O’Donoghue, Aaron van den Oord, Pushmeet Kohli
Abstract	This paper investigates recently proposed approaches for defending against adversarial examples and evaluating adversarial robustness. We motivate ‘adversarial risk’ as an objective for achieving models robust to worst-case inputs. We then frame commonly used attacks and evaluation metrics as defining a tractable surrogate objective to the true adversarial risk. This suggests that models may optimize this surrogate rather than the true adversarial risk. We formalize this notion as ‘obscurity to an adversary,’ and develop tools and heuristics for identifying obscured models and designing transparent models. We demonstrate that this is a significant problem in practice by repurposing gradient-free optimization techniques into adversarial attacks, which we use to decrease the accuracy of several recently proposed defenses to near zero. Our hope is that our formulations and results will help researchers to develop more powerful defenses.
Tasks
Published	2018-02-15
URL	http://arxiv.org/abs/1802.05666v2
PDF	http://arxiv.org/pdf/1802.05666v2.pdf
PWC	https://paperswithcode.com/paper/adversarial-risk-and-the-dangers-of
Repo
Framework
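
The idea of attacking a model without using its gradients can be sketched with plain random search inside an L-infinity ball. This is a deliberately simpler stand-in, not the specific gradient-free optimizers used in the paper, and the toy linear model, its weights, and the function names are all hypothetical. The attacker only queries the model's score, so defenses that merely hide or distort gradients offer no protection against it.

```python
import random

def margin(x, weights, bias):
    """Black-box score: positive means the toy linear model classifies x correctly."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def random_search_attack(x, weights, bias, eps=0.6, queries=200, seed=0):
    """Query-only attack: keep the perturbation that lowers the margin the most."""
    rng = random.Random(seed)
    best, best_margin = list(x), margin(x, weights, bias)
    for _ in range(queries):
        # Propose a corner of the L-infinity ball around the clean input.
        candidate = [xi + rng.choice((-eps, eps)) for xi in x]
        m = margin(candidate, weights, bias)
        if m < best_margin:  # keep only proposals that hurt the model
            best, best_margin = candidate, m
    return best, best_margin

# Assumed toy model and input.
weights, bias = [0.5, -1.0, 0.25], -0.2
clean = [1.0, -0.5, 0.2]
clean_margin = margin(clean, weights, bias)
adversarial, adv_margin = random_search_attack(clean, weights, bias)
```

A negative `adv_margin` would mean the perturbed input is misclassified; evaluating only against weaker attacks than this would overstate robustness.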

Domain and Geometry Agnostic CNNs for Left Atrium Segmentation in 3D Ultrasound


Title	Domain and Geometry Agnostic CNNs for Left Atrium Segmentation in 3D Ultrasound
Authors	Markus A. Degel, Nassir Navab, Shadi Albarqouni
Abstract	Segmentation of the left atrium and deriving its size can help to predict and detect various cardiovascular conditions. Automation of this process in 3D Ultrasound image data is desirable, since manual delineations are time-consuming, challenging and observer-dependent. Convolutional neural networks have made improvements in computer vision and in medical image analysis. They have successfully been applied to segmentation tasks and were extended to work on volumetric data. In this paper we introduce a combined deep-learning based approach for volumetric segmentation in Ultrasound acquisitions that incorporates prior knowledge about left atrial shape and imaging device. The results show that including a shape prior helps domain adaptation, and that segmentation accuracy is further increased with adversarial learning.
Tasks	Domain Adaptation
Published	2018-04-20
URL	http://arxiv.org/abs/1805.00357v1
PDF	http://arxiv.org/pdf/1805.00357v1.pdf
PWC	https://paperswithcode.com/paper/domain-and-geometry-agnostic-cnns-for-left
Repo
Framework

Camera Pose Estimation from Sequence of Calibrated Images


Title	Camera Pose Estimation from Sequence of Calibrated Images
Authors	Jacek Komorowski, Przemyslaw Rokita
Abstract	In this paper, a method for camera pose estimation from a sequence of images is presented. The method assumes the camera is calibrated (intrinsic parameters are known), which reduces the number of required pairs of corresponding points compared to the uncalibrated case. Our algorithm can be used as a first stage in a structure-from-motion stereo reconstruction system.
Tasks	Pose Estimation
Published	2018-09-28
URL	http://arxiv.org/abs/1809.11066v1
PDF	http://arxiv.org/pdf/1809.11066v1.pdf
PWC	https://paperswithcode.com/paper/camera-pose-estimation-from-sequence-of
Repo
Framework

A Novel Parallel Ray-Casting Algorithm


Title	A Novel Parallel Ray-Casting Algorithm
Authors	Yan Zhang, Peng Gao, Xiao-Qing Li
Abstract	The Ray-Casting algorithm is an important method for fast real-time surface display from 3D medical images. Based on the Ray-Casting algorithm, a novel parallel Ray-Casting algorithm is proposed in this paper. A novel operation is introduced and defined as a star operation, and star operations can be computed in parallel in the proposed algorithm, in contrast to the serial chain of star operations in the Ray-Casting algorithm. The computation complexity of the proposed algorithm is reduced from $O(n)$ to $O(\log_2 n)$.
Tasks
Published	2018-04-16
URL	http://arxiv.org/abs/1804.05541v2
PDF	http://arxiv.org/pdf/1804.05541v2.pdf
PWC	https://paperswithcode.com/paper/a-novel-parallel-ray-casting-algorithm
Repo
Framework
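
The abstract does not define the star operation, so the following sketch rests on an assumption: that it behaves like the standard associative "over" compositing operator on (premultiplied colour, opacity) samples along a ray. Under that assumption, a serial chain of n - 1 compositions can be regrouped into a pairwise tree with about log2(n) rounds, and every pair within a round is independent and could run in parallel. The function names are hypothetical.

```python
def over(front, back):
    """Composite two (premultiplied colour, opacity) samples, front over back."""
    c_f, a_f = front
    c_b, a_b = back
    return (c_f + (1.0 - a_f) * c_b, a_f + (1.0 - a_f) * a_b)

def composite_serial(samples):
    """Front-to-back chain: n - 1 dependent steps."""
    result = samples[0]
    for s in samples[1:]:
        result = over(result, s)
    return result

def composite_tree(samples):
    """Pairwise reduction; returns (result, number of rounds)."""
    level, rounds = list(samples), 0
    while len(level) > 1:
        # All pairs in this round are independent of each other.
        nxt = [over(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd sample carries over unchanged
        level, rounds = nxt, rounds + 1
    return level[0], rounds

# Assumed toy ray with 8 samples, ordered front to back.
samples = [(0.05 * (i + 1), 0.1 * (i + 1)) for i in range(8)]
serial = composite_serial(samples)
tree, rounds = composite_tree(samples)
```

The regrouping is valid only because the operator is associative; the order of samples along the ray is preserved in both versions.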

VTrails: Inferring Vessels with Geodesic Connectivity Trees


Title	VTrails: Inferring Vessels with Geodesic Connectivity Trees
Authors	Stefano Moriconi, Maria A. Zuluaga, H. Rolf Jäger, Parashkev Nachev, Sébastien Ourselin, M. Jorge Cardoso
Abstract	The analysis of vessel morphology and connectivity has an impact on a number of cardiovascular and neurovascular applications by providing patient-specific high-level quantitative features such as spatial location, direction and scale. In this paper we present an end-to-end approach to extract an acyclic vascular tree from angiographic data by solving a connectivity-enforcing anisotropic fast marching over a voxel-wise tensor field representing the orientation of the underlying vascular tree. The method is validated using synthetic and real vascular images. We compare VTrails against classical and state-of-the-art ridge detectors for tubular structures by assessing the connectedness of the vesselness map and inspecting the synthesized tensor field as proof of concept. VTrails performance is evaluated on images with different levels of degradation: we verify that the extracted vascular network is an acyclic graph (i.e. a tree), and we report the extraction accuracy, precision and recall.
Tasks
Published	2018-06-08
URL	http://arxiv.org/abs/1806.03111v1
PDF	http://arxiv.org/pdf/1806.03111v1.pdf
PWC	https://paperswithcode.com/paper/vtrails-inferring-vessels-with-geodesic
Repo
Framework