Paper Group ANR 1044
Learning Product Codebooks using Vector Quantized Autoencoders for Image Retrieval. MACNet: Multi-scale Atrous Convolution Networks for Food Places Classification in Egocentric Photo-streams. Mitigating Unwanted Biases with Adversarial Learning. Automatic Graphics Program Generation using Attention-Based Hierarchical Decoder. Faster and More Robust …
Learning Product Codebooks using Vector Quantized Autoencoders for Image Retrieval
Title | Learning Product Codebooks using Vector Quantized Autoencoders for Image Retrieval |
Authors | Hanwei Wu, Markus Flierl |
Abstract | Vector-Quantized Variational Autoencoders (VQ-VAE)[1] provide an unsupervised model for learning discrete representations by combining vector quantization and autoencoders. In this paper, we study the use of VQ-VAE for representation learning for downstream tasks, such as image retrieval. We first describe the VQ-VAE in the context of an information-theoretic framework. We show that the regularization term on the learned representation is determined by the size of the embedded codebook before training, and that it affects the generalization ability of the model. As a result, we introduce a hyperparameter to balance the strength of the vector quantizer and the reconstruction error. By tuning the hyperparameter, the embedded bottleneck quantizer is used as a regularizer that forces the output of the encoder to share a constrained coding space, so that the learned latent features preserve the similarity relations of the data space. In addition, we provide a search range for finding the best hyperparameter. Finally, we incorporate product quantization into the bottleneck stage of VQ-VAE and propose an end-to-end unsupervised learning model for the image retrieval task. The product quantizer has the advantage of generating large codebooks. Fast retrieval can be achieved by using lookup tables that store the distance between any pair of sub-codewords. The learned codebooks achieve state-of-the-art retrieval results. |
Tasks | Image Retrieval, Quantization, Representation Learning |
Published | 2018-07-12 |
URL | http://arxiv.org/abs/1807.04629v4 |
http://arxiv.org/pdf/1807.04629v4.pdf | |
PWC | https://paperswithcode.com/paper/learning-product-codebooks-using-vector |
Repo | |
Framework | |
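The abstract's last claims (large codebooks from product quantization, fast retrieval through lookup tables of sub-codeword distances) can be illustrated with a minimal NumPy sketch. It uses plain per-subspace k-means codebooks rather than the end-to-end VQ-VAE bottleneck the paper trains, and the function names (`train_codebooks`, `encode`, `pairwise_tables`, `symmetric_distance`) are hypothetical. With $M$ subspaces of $K$ codewords each, the implied codebook has $K^M$ entries, yet the lookup tables only store $M K^2$ distances.

```python
import numpy as np

def train_codebooks(x, num_subspaces, num_codewords, iters=10, seed=0):
    """Hypothetical helper: one k-means codebook per subspace (plain PQ, not the VQ-VAE bottleneck)."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    assert d % num_subspaces == 0, "dimension must split evenly into subspaces"
    sub = d // num_subspaces
    codebooks = []
    for m in range(num_subspaces):
        xs = x[:, m * sub:(m + 1) * sub]
        centers = xs[rng.choice(n, num_codewords, replace=False)].copy()
        for _ in range(iters):
            # Assign each sub-vector to its nearest center, then recompute centers.
            assign = ((xs[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
            for k in range(num_codewords):
                members = xs[assign == k]
                if len(members):
                    centers[k] = members.mean(0)
        codebooks.append(centers)
    return np.stack(codebooks)  # shape (M, K, sub)

def encode(x, codebooks):
    """Replace each sub-vector by the index of its nearest sub-codeword."""
    M, K, sub = codebooks.shape
    codes = np.empty((x.shape[0], M), dtype=np.int64)
    for m in range(M):
        xs = x[:, m * sub:(m + 1) * sub]
        codes[:, m] = ((xs[:, None, :] - codebooks[m][None]) ** 2).sum(-1).argmin(1)
    return codes

def pairwise_tables(codebooks):
    """tables[m, i, j] = squared distance between sub-codewords i and j of subspace m."""
    diff = codebooks[:, :, None, :] - codebooks[:, None, :, :]
    return (diff ** 2).sum(-1)

def symmetric_distance(codes_q, codes_db, tables):
    """Distances between coded queries and coded database items using table lookups only."""
    M = tables.shape[0]
    return sum(tables[m][codes_q[:, m][:, None], codes_db[:, m][None, :]] for m in range(M))
```

Because each stored distance is exact for the sub-codewords, the table lookup reproduces the squared distance between the quantized vectors exactly; the only approximation is the quantization itself.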
MACNet: Multi-scale Atrous Convolution Networks for Food Places Classification in Egocentric Photo-streams
Title | MACNet: Multi-scale Atrous Convolution Networks for Food Places Classification in Egocentric Photo-streams |
Authors | Md. Mostafa Kamal Sarker, Hatem A. Rashwan, Estefania Talavera, Syeda Furruka Banu, Petia Radeva, Domenec Puig |
Abstract | A first-person (wearable) camera continually captures unscripted interactions of the camera user with objects, people, and scenes, reflecting the user's personal and relational tendencies. One such tendency is how people interact with food events. Regulating food intake and its duration is important for protecting against disease. Consequently, this work aims to develop a smart model that can determine how often a person visits food places during a day. The approach is a deep end-to-end model for automatic food-place recognition from egocentric photo-streams. In this paper, we apply multi-scale atrous convolution networks to extract the key features related to food places from the input images. The proposed model is evaluated on an in-house private dataset called “EgoFoodPlaces”. Experimental results show promising food-place classification performance on egocentric photo-streams. |
Tasks | |
Published | 2018-08-29 |
URL | http://arxiv.org/abs/1808.09829v1 |
http://arxiv.org/pdf/1808.09829v1.pdf | |
PWC | https://paperswithcode.com/paper/macnet-multi-scale-atrous-convolution |
Repo | |
Framework | |
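Atrous (dilated) convolution spaces the kernel taps `rate` samples apart, so one output sees $(k-1)\cdot\text{rate}+1$ inputs without any extra parameters; applying filters at several rates gives a multi-scale view of the same input. The sketch below is 1-D for brevity (the paper works on 2-D images), uses zero "same" padding, and its rates and the `multi_scale` helper are illustrative assumptions, not MACNet's actual configuration.

```python
import numpy as np

def atrous_conv1d(x, kernel, rate):
    """1-D atrous (dilated) cross-correlation with zero 'same' padding."""
    k = len(kernel)
    assert k % 2 == 1, "an odd kernel keeps the 'same' padding symmetric"
    span = (k - 1) * rate + 1          # receptive field of one output sample
    pad = (span - 1) // 2
    xp = np.pad(x, pad)
    # Output i combines inputs i, i+rate, ..., i+(k-1)*rate of the padded signal.
    return np.array([sum(kernel[j] * xp[i + j * rate] for j in range(k))
                     for i in range(len(x))])

def multi_scale(x, kernel, rates=(1, 2, 4)):
    """Hypothetical multi-scale block: same kernel at several rates, stacked as channels."""
    return np.stack([atrous_conv1d(x, kernel, r) for r in rates])
```

With rate 1 this reduces to an ordinary convolution layer; larger rates widen the context seen by each output, which is the property a multi-scale design exploits.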
Mitigating Unwanted Biases with Adversarial Learning
Title | Mitigating Unwanted Biases with Adversarial Learning |
Authors | Brian Hu Zhang, Blake Lemoine, Margaret Mitchell |
Abstract | Machine learning is a tool for building models that accurately represent input training data. When undesired biases concerning demographic groups are in the training data, well-trained models will reflect those biases. We present a framework for mitigating such biases by including a variable for the group of interest and simultaneously learning a predictor and an adversary. The input to the network X, here text or census data, produces a prediction Y, such as an analogy completion or income bracket, while the adversary tries to model a protected variable Z, here gender or zip code. The objective is to maximize the predictor’s ability to predict Y while minimizing the adversary’s ability to predict Z. Applied to analogy completion, this method results in accurate predictions that exhibit less evidence of stereotyping Z. When applied to a classification task using the UCI Adult (Census) Dataset, it results in a predictive model that does not lose much accuracy while achieving very close to equality of odds (Hardt et al., 2016). The method is flexible and applicable to multiple definitions of fairness as well as a wide range of gradient-based learning models, including both regression and classification tasks. |
Tasks | |
Published | 2018-01-22 |
URL | http://arxiv.org/abs/1801.07593v1 |
http://arxiv.org/pdf/1801.07593v1.pdf | |
PWC | https://paperswithcode.com/paper/mitigating-unwanted-biases-with-adversarial |
Repo | |
Framework | |
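The abstract describes a predictor that should predict $Y$ well while making the adversary's prediction of $Z$ hard. One way to realize this on the predictor's weights, and the rule we understand the full paper to use, is to follow the predictor-loss gradient with its component along the adversary-loss gradient removed, plus a term that pushes the adversary's loss up. Treat the exact form, the names `project` and `debiased_step`, and the default `alpha` as assumptions; `grad_adv` here is the gradient of the adversary's loss with respect to the predictor's weights.

```python
import numpy as np

def project(u, v):
    """Projection of vector u onto vector v (zero vector if v is zero)."""
    denom = v @ v
    return np.zeros_like(u) if denom == 0 else (u @ v) / denom * v

def debiased_step(w, grad_pred, grad_adv, lr=0.1, alpha=1.0):
    """One predictor update (hypothetical helper).

    grad_pred: gradient of the predictor loss w.r.t. w
    grad_adv:  gradient of the adversary loss w.r.t. w
    """
    # Drop the part of the predictor gradient that would also help the adversary,
    # and subtract alpha * grad_adv so that descending increases the adversary's loss.
    direction = grad_pred - project(grad_pred, grad_adv) - alpha * grad_adv
    return w - lr * direction
```

Removing the projection keeps a predictor step from incidentally helping the adversary, while the `alpha` term actively works against it; the adversary itself would be trained separately by ordinary gradient descent on its own loss.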
Automatic Graphics Program Generation using Attention-Based Hierarchical Decoder
Title | Automatic Graphics Program Generation using Attention-Based Hierarchical Decoder |
Authors | Zhihao Zhu, Zhan Xue, Zejian Yuan |
Abstract | Recent progress in deep learning has made it possible to automatically transform a screenshot of a Graphical User Interface (GUI) into code using the encoder-decoder framework. While the commonly adopted image encoder (e.g., a CNN) may be capable of extracting image features to the desired level, interpreting these abstract image features as hundreds of code tokens poses a particular challenge to the decoding power of the RNN-based code generator. Since the code that describes a GUI is usually hierarchically structured, we propose a new attention-based hierarchical code generation model, which can describe GUI images at a finer level of detail while also generating hierarchically structured code that is consistent with the hierarchical layout of the graphic elements in the GUI. Our model follows the encoder-decoder framework, and all of its components can be trained jointly in an end-to-end manner. The experimental results show that our method outperforms other current state-of-the-art methods on both a publicly available GUI-code dataset and a dataset we built ourselves. |
Tasks | Code Generation |
Published | 2018-10-26 |
URL | http://arxiv.org/abs/1810.11536v1 |
http://arxiv.org/pdf/1810.11536v1.pdf | |
PWC | https://paperswithcode.com/paper/automatic-graphics-program-generation-using |
Repo | |
Framework | |
Faster and More Robust Mesh-based Algorithms for Obstacle k-Nearest Neighbour
Title | Faster and More Robust Mesh-based Algorithms for Obstacle k-Nearest Neighbour |
Authors | Shizhe Zhao, Daniel D. Harabor, David Taniar |
Abstract | We are interested in the problem of finding $k$ nearest neighbours in the plane in the presence of polygonal obstacles ($\textit{OkNN}$). Widely used algorithms for OkNN are based on incremental visibility graphs, which means they require costly online visibility checking and have worst-case quadratic running time. Recently, $\mathbf{Polyanya}$, a fast point-to-point pathfinding algorithm, was proposed; it avoids the disadvantages of visibility graphs by searching over an alternative data structure known as a navigation mesh. Previously, we adapted $\mathbf{Polyanya}$ to multi-target scenarios by developing two specialised heuristic functions: the $\mathbf{Interval\ heuristic}$ $h_v$ and the $\mathbf{Target\ heuristic}$ $h_t$. Though these methods outperform visibility graph algorithms by orders of magnitude in all our experiments, they are not robust: $h_v$ expands many redundant nodes when the set of neighbours is small, while $h_t$ performs poorly when the set of neighbours is large. In this paper, we propose new algorithms and heuristics for OkNN that perform well regardless of neighbour density. |
Tasks | |
Published | 2018-08-13 |
URL | http://arxiv.org/abs/1808.04043v1 |
http://arxiv.org/pdf/1808.04043v1.pdf | |
PWC | https://paperswithcode.com/paper/faster-and-more-robust-mesh-based-algorithms |
Repo | |
Framework | |
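What distinguishes OkNN from ordinary kNN is that neighbours are ranked by obstacle-avoiding path length, not straight-line distance. The sketch below is only a brute-force illustration of that problem statement: it assumes a 4-connected occupancy grid (the paper works in the continuous plane with navigation meshes) and runs Dijkstra from the query point until $k$ targets are settled. It is not Polyanya or either heuristic, and `grid_oknn` is a hypothetical name.

```python
import heapq

def grid_oknn(grid, source, targets, k):
    """Return up to k (target, path_length) pairs nearest to source.

    grid: list of equal-length strings; '#' marks an obstacle cell.
    Distances are shortest 4-connected path lengths around obstacles.
    """
    rows, cols = len(grid), len(grid[0])
    targets = set(targets)
    dist = {source: 0}
    heap = [(0, source)]
    found = []
    while heap and len(found) < k:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[(r, c)]:
            continue  # stale heap entry; a shorter path was already settled
        if (r, c) in targets:
            found.append(((r, c), d))  # Dijkstra settles targets in distance order
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != '#':
                nd = d + 1
                if nd < dist.get((nr, nc), float('inf')):
                    dist[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return found
```

For example, with a wall separating the source from a target that is close in Euclidean terms, that target is reported after one that is farther away but reachable directly, which is exactly the ordering OkNN algorithms must reproduce.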
Pay attention! - Robustifying a Deep Visuomotor Policy through Task-Focused Attention
Title | Pay attention! - Robustifying a Deep Visuomotor Policy through Task-Focused Attention |
Authors | Pooya Abolghasemi, Amir Mazaheri, Mubarak Shah, Ladislau Bölöni |
Abstract | Several recent studies have demonstrated the promise of deep visuomotor policies for robot manipulator control. Despite impressive progress, these systems are known to be vulnerable to physical disturbances, such as accidental or adversarial bumps that make them drop the manipulated object. They also tend to be distracted by visual disturbances such as objects moving in the robot’s field of view, even if the disturbance does not physically prevent the execution of the task. In this paper, we propose an approach for augmenting a deep visuomotor policy trained through demonstrations with Task Focused visual Attention (TFA). The manipulation task is specified by a natural-language command such as ‘move the red bowl to the left’. This allows the visual attention component to concentrate on the current object that the robot needs to manipulate. We show that even in benign environments, the TFA allows the policy to consistently outperform a variant with no attention mechanism. More importantly, the new policy is significantly more robust: it regularly recovers from severe physical disturbances (such as bumps causing it to drop the object) from which the baseline policy, i.e., the one with no visual attention, almost never recovers. In addition, we show that the proposed policy performs correctly in the presence of a wide class of visual disturbances, exhibiting behavior reminiscent of human selective visual attention experiments. Our proposed approach consists of a VAE-GAN network which encodes the visual input and feeds it to a Motor network that moves the robot joints. Our approach also benefits from a teacher network for the TFA that leverages the textual input command to robustify the visual encoder against various types of disturbances. |
Tasks | |
Published | 2018-09-26 |
URL | http://arxiv.org/abs/1809.10093v2 |
http://arxiv.org/pdf/1809.10093v2.pdf | |
PWC | https://paperswithcode.com/paper/pay-attention-robustifying-a-deep-visuomotor |
Repo | |
Framework | |
On Estimating Multi-Attribute Choice Preferences using Private Signals and Matrix Factorization
Title | On Estimating Multi-Attribute Choice Preferences using Private Signals and Matrix Factorization |
Authors | Venkata Sriram Siddhardh Nadendla, Cedric Langbort |
Abstract | Revealed preference theory studies the possibility of modeling an agent’s revealed preferences and the construction of a consistent utility function. However, modeling an agent’s choices over preference orderings is not always practical and demands strong assumptions about human rationality and data-acquisition abilities. Therefore, we propose a simple generative choice model where agents are assumed to generate the choice probabilities based on latent factor matrices that capture their choice evaluation across multiple attributes. Since the multi-attribute evaluation is typically hidden within the agent’s psyche, we consider a signaling mechanism where agents are provided with choice information through private signals, so that the agent’s choices provide more insight into his/her latent evaluation across multiple attributes. We estimate the choice model via a novel multi-stage matrix factorization algorithm that minimizes the average deviation of the factor estimates from choice data. Simulation results are presented to validate the estimation performance of our proposed algorithm. |
Tasks | |
Published | 2018-02-19 |
URL | http://arxiv.org/abs/1802.07126v1 |
http://arxiv.org/pdf/1802.07126v1.pdf | |
PWC | https://paperswithcode.com/paper/on-estimating-multi-attribute-choice |
Repo | |
Framework | |
A Comparative Study of Distributional and Symbolic Paradigms for Relational Learning
Title | A Comparative Study of Distributional and Symbolic Paradigms for Relational Learning |
Authors | Sebastijan Dumancic, Alberto Garcia-Duran, Mathias Niepert |
Abstract | Many real-world domains can be expressed as graphs and, more generally, as multi-relational knowledge graphs. Though reasoning and learning with knowledge graphs have traditionally been addressed by symbolic approaches, recent methods in (deep) representation learning have shown promising results for specialized tasks such as knowledge base completion. These approaches abandon the traditional symbolic paradigm by replacing symbols with vectors in Euclidean space. With few exceptions, symbolic and distributional approaches are explored in different communities, and little is known about their respective strengths and weaknesses. In this work, we compare representation learning and relational learning on various relational classification and clustering tasks and analyse the complexity of the rules used implicitly by these approaches. Preliminary results reveal possible indicators that could help in choosing one approach over the other for particular knowledge graphs. |
Tasks | Knowledge Base Completion, Knowledge Graphs, Relational Reasoning, Representation Learning |
Published | 2018-06-29 |
URL | https://arxiv.org/abs/1806.11391v4 |
https://arxiv.org/pdf/1806.11391v4.pdf | |
PWC | https://paperswithcode.com/paper/on-embeddings-as-an-alternative-paradigm-for |
Repo | |
Framework | |
Detecting and Correcting for Label Shift with Black Box Predictors
Title | Detecting and Correcting for Label Shift with Black Box Predictors |
Authors | Zachary C. Lipton, Yu-Xiang Wang, Alex Smola |
Abstract | Faced with distribution shift between training and test sets, we wish to detect and quantify the shift, and to correct our classifiers without test set labels. Motivated by medical diagnosis, where diseases (targets) cause symptoms (observations), we focus on label shift, where the label marginal $p(y)$ changes but the conditional $p(x \mid y)$ does not. We propose Black Box Shift Estimation (BBSE) to estimate the test distribution $p(y)$. BBSE exploits arbitrary black box predictors to reduce dimensionality prior to shift correction. While better predictors give tighter estimates, BBSE works even when predictors are biased, inaccurate, or uncalibrated, so long as their confusion matrices are invertible. We prove BBSE’s consistency, bound its error, and introduce a statistical test that uses BBSE to detect shift. We also leverage BBSE to correct classifiers. Experiments demonstrate accurate estimates and improved prediction, even on high-dimensional datasets of natural images. |
Tasks | Medical Diagnosis |
Published | 2018-02-12 |
URL | http://arxiv.org/abs/1802.03916v3 |
http://arxiv.org/pdf/1802.03916v3.pdf | |
PWC | https://paperswithcode.com/paper/detecting-and-correcting-for-label-shift-with |
Repo | |
Framework | |
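Under label shift, the predicted-label distribution on the target satisfies $q(\hat y = i) = \sum_j p(\hat y = i, y = j)\, w_j$ with $w_j = q(y = j)/p(y = j)$, so $w$ can be recovered by solving a linear system built from a held-out source confusion matrix and the black-box predictions on unlabeled target data. Below is a minimal NumPy sketch of that estimator; the function names are hypothetical, and it omits the paper's error bounds, shift test and classifier correction.

```python
import numpy as np

def bbse_weights(y_true_src, y_pred_src, y_pred_tgt, num_classes):
    """Estimate importance weights w[y] = q(y) / p(y) from black-box predictions.

    C[i, j] = p(yhat = i, y = j), estimated on held-out labeled source data.
    mu[i]   = q(yhat = i), estimated from predictions on unlabeled target data.
    """
    C = np.zeros((num_classes, num_classes))
    for yh, y in zip(y_pred_src, y_true_src):
        C[yh, y] += 1
    C /= len(y_true_src)
    mu = np.bincount(y_pred_tgt, minlength=num_classes) / len(y_pred_tgt)
    return np.linalg.solve(C, mu)  # requires an invertible confusion matrix

def target_label_marginal(w, y_true_src, num_classes):
    """Recover q(y) = w * p(y) using the source label marginal."""
    p = np.bincount(y_true_src, minlength=num_classes) / len(y_true_src)
    return w * p
```

With finite samples the solved weights can come out slightly negative; clipping them at zero before reweighting is one option, but this sketch leaves them unmodified.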
Bilevel Programming for Hyperparameter Optimization and Meta-Learning
Title | Bilevel Programming for Hyperparameter Optimization and Meta-Learning |
Authors | Luca Franceschi, Paolo Frasconi, Saverio Salzo, Riccardo Grazzi, Massimiliano Pontil |
Abstract | We introduce a framework based on bilevel programming that unifies gradient-based hyperparameter optimization and meta-learning. We show that an approximate version of the bilevel problem can be solved by taking into explicit account the optimization dynamics for the inner objective. Depending on the specific setting, the outer variables take either the meaning of hyperparameters in a supervised learning problem or parameters of a meta-learner. We provide sufficient conditions under which solutions of the approximate problem converge to those of the exact problem. We instantiate our approach for meta-learning in the case of deep learning where representation layers are treated as hyperparameters shared across a set of training episodes. In experiments, we confirm our theoretical findings, present encouraging results for few-shot learning and contrast the bilevel approach against classical approaches for learning-to-learn. |
Tasks | Few-Shot Learning, Hyperparameter Optimization, Meta-Learning |
Published | 2018-06-13 |
URL | http://arxiv.org/abs/1806.04910v2 |
http://arxiv.org/pdf/1806.04910v2.pdf | |
PWC | https://paperswithcode.com/paper/bilevel-programming-for-hyperparameter |
Repo | |
Framework | |
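Taking the inner optimization dynamics into explicit account means treating the last iterate $w_T$ of $T$ gradient steps as a differentiable function of the outer variable. Here is a minimal forward-mode sketch for a single outer variable, assuming a ridge-regression inner problem with the regularization strength $\lambda$ as the hyperparameter and validation squared error as the outer objective; the setup and names are illustrative, not the paper's experiments. Alongside $w_{t+1} = w_t - \eta\,(A w_t - b + \lambda w_t)$ it propagates $z_t = \partial w_t / \partial \lambda$ via $z_{t+1} = z_t - \eta\,((A + \lambda I) z_t + w_t)$.

```python
import numpy as np

def inner_gd(lam, X, y, steps=100, lr=0.1):
    """Gradient descent on the ridge loss 0.5*||Xw - y||^2/n + 0.5*lam*||w||^2,
    propagating z = dw/dlam alongside w (forward-mode differentiation)."""
    n, d = X.shape
    A, b = X.T @ X / n, X.T @ y / n
    w, z = np.zeros(d), np.zeros(d)
    for _ in range(steps):
        # Update z first: its recursion uses the current (pre-update) w.
        z = z - lr * (A @ z + lam * z + w)
        w = w - lr * (A @ w - b + lam * w)
    return w, z

def hypergradient(lam, X_tr, y_tr, X_val, y_val, steps=100, lr=0.1):
    """Validation loss of w_T and its derivative with respect to lam."""
    w, z = inner_gd(lam, X_tr, y_tr, steps, lr)
    resid = X_val @ w - y_val
    val_loss = 0.5 * resid @ resid / len(y_val)
    grad_w = X_val.T @ resid / len(y_val)
    return val_loss, grad_w @ z  # chain rule: dE/dlam = grad_w(E) . dw_T/dlam
```

A central finite difference of the validation loss in $\lambda$ should match the returned hypergradient closely, which makes a cheap sanity check; when the outer variables are many (for example, whole representation layers), reverse-mode differentiation through the iterates is usually the more economical choice.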
Adversarial Risk and the Dangers of Evaluating Against Weak Attacks
Title | Adversarial Risk and the Dangers of Evaluating Against Weak Attacks |
Authors | Jonathan Uesato, Brendan O’Donoghue, Aaron van den Oord, Pushmeet Kohli |
Abstract | This paper investigates recently proposed approaches for defending against adversarial examples and evaluating adversarial robustness. We motivate ‘adversarial risk’ as an objective for achieving models robust to worst-case inputs. We then frame commonly used attacks and evaluation metrics as defining a tractable surrogate objective to the true adversarial risk. This suggests that models may optimize this surrogate rather than the true adversarial risk. We formalize this notion as ‘obscurity to an adversary,’ and develop tools and heuristics for identifying obscured models and designing transparent models. We demonstrate that this is a significant problem in practice by repurposing gradient-free optimization techniques into adversarial attacks, which we use to decrease the accuracy of several recently proposed defenses to near zero. Our hope is that our formulations and results will help researchers to develop more powerful defenses. |
Tasks | |
Published | 2018-02-15 |
URL | http://arxiv.org/abs/1802.05666v2 |
http://arxiv.org/pdf/1802.05666v2.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-risk-and-the-dangers-of |
Repo | |
Framework | |
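The attacks mentioned in the abstract need only loss evaluations, not gradients, which is why they are not fooled by defenses that merely obscure gradients. Below is a minimal sketch of one such gradient-free technique, SPSA (simultaneous perturbation stochastic approximation), driving a projected sign-ascent attack inside an $L_\infty$ ball. The step sizes, sample counts and function names are assumptions, and `loss_fn` stands for any black-box loss the attacker wants to increase (for example, a negative classification margin).

```python
import numpy as np

def spsa_gradient(loss_fn, x, delta=0.01, num_samples=32, rng=None):
    """Estimate the gradient of loss_fn at x from 2*num_samples evaluations (no backprop)."""
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(x)
    for _ in range(num_samples):
        v = rng.choice([-1.0, 1.0], size=x.shape)  # Rademacher perturbation direction
        diff = loss_fn(x + delta * v) - loss_fn(x - delta * v)
        grad += diff / (2 * delta) * v  # 1/v_i equals v_i for +-1 entries
    return grad / num_samples

def linf_attack(loss_fn, x0, eps=0.1, step=0.02, iters=20, seed=0):
    """Sign ascent on the SPSA estimate, projected back into the eps-ball around x0."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(iters):
        g = spsa_gradient(loss_fn, x, rng=rng)
        x = np.clip(x + step * np.sign(g), x0 - eps, x0 + eps)
    return x
```

Averaging over random $\pm 1$ directions makes the estimate unbiased for smooth losses (up to $O(\delta^2)$ terms), so a defense that only masks the true gradient does not prevent the attack from making progress.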
Domain and Geometry Agnostic CNNs for Left Atrium Segmentation in 3D Ultrasound
Title | Domain and Geometry Agnostic CNNs for Left Atrium Segmentation in 3D Ultrasound |
Authors | Markus A. Degel, Nassir Navab, Shadi Albarqouni |
Abstract | Segmentation of the left atrium and deriving its size can help to predict and detect various cardiovascular conditions. Automating this process in 3D ultrasound image data is desirable, since manual delineations are time-consuming, challenging and observer-dependent. Convolutional neural networks have driven improvements in computer vision and in medical image analysis. They have successfully been applied to segmentation tasks and were extended to work on volumetric data. In this paper, we introduce a combined deep-learning-based approach to volumetric segmentation of ultrasound acquisitions that incorporates prior knowledge about left atrial shape and the imaging device. The results show that including a shape prior helps domain adaptation, and that segmentation accuracy is further increased with adversarial learning. |
Tasks | Domain Adaptation |
Published | 2018-04-20 |
URL | http://arxiv.org/abs/1805.00357v1 |
http://arxiv.org/pdf/1805.00357v1.pdf | |
PWC | https://paperswithcode.com/paper/domain-and-geometry-agnostic-cnns-for-left |
Repo | |
Framework | |
Camera Pose Estimation from Sequence of Calibrated Images
Title | Camera Pose Estimation from Sequence of Calibrated Images |
Authors | Jacek Komorowski, Przemyslaw Rokita |
Abstract | In this paper, a method for camera pose estimation from a sequence of images is presented. The method assumes the camera is calibrated (intrinsic parameters are known), which reduces the number of required pairs of corresponding points compared to the uncalibrated case. Our algorithm can be used as the first stage of a structure-from-motion stereo reconstruction system. |
Tasks | Pose Estimation |
Published | 2018-09-28 |
URL | http://arxiv.org/abs/1809.11066v1 |
http://arxiv.org/pdf/1809.11066v1.pdf | |
PWC | https://paperswithcode.com/paper/camera-pose-estimation-from-sequence-of |
Repo | |
Framework | |
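With known intrinsics $K$, image points can be mapped to normalized coordinates $K^{-1}x$, and correspondences then satisfy $\hat x_2^\top E \hat x_1 = 0$ with the essential matrix $E = [t]_\times R$. Because $E$ has fewer degrees of freedom than the fundamental matrix of the uncalibrated case, fewer correspondences are needed (five at minimum, versus seven). As a simple illustration, the sketch below uses the linear eight-point algorithm on normalized coordinates, not the minimal five-point solver and not necessarily the paper's method; recovering $R$ and $t$ from $E$ (decomposition plus a cheirality check) is left out, and `essential_eight_point` is a hypothetical name.

```python
import numpy as np

def essential_eight_point(x1, x2):
    """Estimate E with x2_h^T E x1_h = 0 from N >= 8 correspondences.

    x1, x2: (N, 2) arrays of *normalized* image coordinates (K^-1 already applied).
    """
    h1 = np.hstack([x1, np.ones((len(x1), 1))])
    h2 = np.hstack([x2, np.ones((len(x2), 1))])
    # Row n is kron(x2_n, x1_n), so A @ E.ravel() evaluates every epipolar constraint.
    A = np.stack([np.kron(b, a) for a, b in zip(h1, h2)])
    _, _, vt = np.linalg.svd(A)
    E = vt[-1].reshape(3, 3)  # least-squares null vector of A
    # Project onto the essential manifold: two equal singular values and one zero.
    u, s, vt = np.linalg.svd(E)
    sigma = (s[0] + s[1]) / 2
    return u @ np.diag([sigma, sigma, 0.0]) @ vt
```

The estimate is defined only up to scale and sign, so any comparison against a known $E$ should normalize both first; with noisy real data, a robust wrapper such as RANSAC around the solver is the usual next step.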
A Novel Parallel Ray-Casting Algorithm
Title | A Novel Parallel Ray-Casting Algorithm |
Authors | Yan Zhang, Peng Gao, Xiao-Qing Li |
Abstract | The Ray-Casting algorithm is an important method for fast, real-time surface display from 3D medical images. Based on the Ray-Casting algorithm, a novel parallel Ray-Casting algorithm is proposed in this paper. A novel operation, defined as the star operation, is introduced; in the proposed algorithm, star operations can be computed in parallel, whereas the Ray-Casting algorithm evaluates them as a serial chain. The computational complexity of the proposed algorithm is reduced from $O(n)$ to $O(\log_2 n)$. |
Tasks | |
Published | 2018-04-16 |
URL | http://arxiv.org/abs/1804.05541v2 |
http://arxiv.org/pdf/1804.05541v2.pdf | |
PWC | https://paperswithcode.com/paper/a-novel-parallel-ray-casting-algorithm |
Repo | |
Framework | |
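The reduction from $O(n)$ to $O(\log_2 n)$ is what one expects when an associative operation, normally applied as a serial chain along the ray, is instead combined pairwise in a balanced tree. The sketch below assumes the star operation behaves like front-to-back "over" compositing of premultiplied (color, alpha) samples, which is associative; that is our assumption, and the paper's definition may differ. The function names are hypothetical.

```python
def over(front, back):
    """Front-to-back 'over' compositing of premultiplied (color, alpha) pairs; associative."""
    cf, af = front
    cb, ab = back
    return (cf + (1 - af) * cb, af + (1 - af) * ab)

def composite_serial(samples):
    """Reference: the serial chain along the ray, n combine steps in sequence."""
    acc = (0.0, 0.0)  # identity element: fully transparent
    for s in samples:
        acc = over(acc, s)
    return acc

def composite_tree(samples):
    """Pairwise reduction over a non-empty list of samples: ceil(log2 n) rounds.

    Combinations within one round are independent, so they could run in parallel.
    Returns (result, number_of_rounds).
    """
    level = list(samples)
    rounds = 0
    while len(level) > 1:
        nxt = [over(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element carries over unchanged, order preserved
        level = nxt
        rounds += 1
    return level[0], rounds
```

Note that the total number of combine operations is still $n - 1$; only the depth (the number of sequential rounds, given enough processors) drops to $\lceil \log_2 n \rceil$.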
VTrails: Inferring Vessels with Geodesic Connectivity Trees
Title | VTrails: Inferring Vessels with Geodesic Connectivity Trees |
Authors | Stefano Moriconi, Maria A. Zuluaga, H. Rolf Jäger, Parashkev Nachev, Sébastien Ourselin, M. Jorge Cardoso |
Abstract | The analysis of vessel morphology and connectivity has an impact on a number of cardiovascular and neurovascular applications by providing patient-specific high-level quantitative features such as spatial location, direction and scale. In this paper we present an end-to-end approach to extract an acyclic vascular tree from angiographic data by solving a connectivity-enforcing anisotropic fast marching over a voxel-wise tensor field representing the orientation of the underlying vascular tree. The method is validated using synthetic and real vascular images. We compare VTrails against classical and state-of-the-art ridge detectors for tubular structures by assessing the connectedness of the vesselness map and inspecting the synthesized tensor field as proof of concept. VTrails performance is evaluated on images with different levels of degradation: we verify that the extracted vascular network is an acyclic graph (i.e. a tree), and we report the extraction accuracy, precision and recall. |
Tasks | |
Published | 2018-06-08 |
URL | http://arxiv.org/abs/1806.03111v1 |
http://arxiv.org/pdf/1806.03111v1.pdf | |
PWC | https://paperswithcode.com/paper/vtrails-inferring-vessels-with-geodesic |
Repo | |
Framework | |