Paper Group ANR 1040
Supporting stylists by recommending fashion style. Efficient Computation of Expected Hypervolume Improvement Using Box Decomposition Algorithms. Towards Task and Architecture-Independent Generalization Gap Predictors. Deep Mangoes: from fruit detection to cultivar identification in colour images of mango trees. Deep Task-Based Quantization. A note on the empirical comparison of RBG and Ludii. CLEVRER: CoLlision Events for Video REpresentation and Reasoning. Robust superpixels using color and contour features along linear path. Transferable Clean-Label Poisoning Attacks on Deep Neural Nets. Enhancing the Privacy of Federated Learning with Sketching. Wearable Travel Aid for Environment Perception and Navigation of Visually Impaired People. Coupled VAE: Improved Accuracy and Robustness of a Variational Autoencoder. DeepCABAC: Context-adaptive binary arithmetic coding for deep neural network compression. Dynamic Fusion: Attentional Language Model for Neural Machine Translation. LatentGNN: Learning Efficient Non-local Relations for Visual Recognition.
Supporting stylists by recommending fashion style
Title | Supporting stylists by recommending fashion style |
Authors | Tobias Kuhn, Steven Bourke, Levin Brinkmann, Tobias Buchwald, Conor Digan, Hendrik Hache, Sebastian Jaeger, Patrick Lehmann, Oskar Maier, Stefan Matting, Yura Okulovsky |
Abstract | Outfittery is an online personalized styling service targeted at men. We have hundreds of stylists who create thousands of bespoke outfits for our customers every day. A critical challenge faced by our stylists when creating these outfits is selecting an appropriate item of clothing that makes sense in the context of the outfit being created, otherwise known as style fit. Another significant challenge is knowing if the item is relevant to the customer based on their tastes, physical attributes and price sensitivity. At Outfittery we leverage machine learning extensively and combine it with human domain expertise to tackle these challenges. We do this by surfacing relevant items of clothing during the outfit building process based on what our stylist is doing and what the preferences of our customer are. In this paper we describe one way in which we help our stylists to tackle style fit for a particular item of clothing and its relevance to an outfit. A thorough qualitative and quantitative evaluation highlights the method’s ability to recommend fashion items by style fit. |
Tasks | |
Published | 2019-08-26 |
URL | https://arxiv.org/abs/1908.09493v1 |
https://arxiv.org/pdf/1908.09493v1.pdf | |
PWC | https://paperswithcode.com/paper/supporting-stylists-by-recommending-fashion |
Repo | |
Framework | |
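The item-surfacing approach above is proprietary, but the general idea (ranking candidate items by how well they fit the style context of an outfit under construction, while respecting customer constraints such as price sensitivity) can be illustrated with a minimal sketch. The embedding source, the centroid scoring rule, and the price filter below are all hypothetical assumptions, not Outfittery's method.

```python
import numpy as np

def recommend_for_outfit(candidate_vecs, candidate_prices, outfit_vecs,
                         price_limit, top_k=5):
    """Rank candidate items by cosine similarity to the centroid of the
    style embeddings of items already placed in the outfit.

    candidate_vecs: (n, d) item style embeddings (hypothetical, e.g. learned
    from item co-occurrence in past outfits); candidate_prices: (n,) array;
    outfit_vecs: (m, d) embeddings of items already in the outfit.
    """
    centroid = outfit_vecs.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    norms = np.maximum(np.linalg.norm(candidate_vecs, axis=1), 1e-12)
    scores = candidate_vecs @ centroid / norms
    scores[candidate_prices > price_limit] = -np.inf  # crude price-sensitivity filter
    return np.argsort(-scores)[:top_k]                # indices of best style fits
```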
Efficient Computation of Expected Hypervolume Improvement Using Box Decomposition Algorithms
Title | Efficient Computation of Expected Hypervolume Improvement Using Box Decomposition Algorithms |
Authors | Kaifeng Yang, Michael Emmerich, André Deutz, Thomas Bäck |
Abstract | In the field of multi-objective optimization algorithms, multi-objective Bayesian Global Optimization (MOBGO) is an important branch, in addition to evolutionary multi-objective optimization algorithms (EMOAs). MOBGO utilizes Gaussian Process models learned from previous objective function evaluations to decide the next evaluation site by maximizing or minimizing an infill criterion. A common criterion in MOBGO is the Expected Hypervolume Improvement (EHVI), which shows good performance on a wide range of problems with respect to exploration and exploitation. However, so far it has been a challenge to calculate exact EHVI values efficiently. In this paper, an efficient algorithm for the computation of the exact EHVI for a generic case is proposed. This efficient algorithm is based on partitioning the integration volume into a set of axis-parallel slices. Theoretically, the upper-bound time complexities are improved from the previous $O(n^2)$ and $O(n^3)$, for two- and three-objective problems respectively, to $\Theta(n\log n)$, which is asymptotically optimal. This article generalizes the scheme to the higher-dimensional case by utilizing a new hyperbox decomposition technique, which was proposed by Dächert et al., EJOR, 2017. It also utilizes a generalization of the multilayered integration scheme that scales linearly in the number of hyperboxes of the decomposition. The speed comparison shows that the proposed algorithm significantly reduces computation time. Finally, this decomposition technique is applied in the calculation of the Probability of Improvement (PoI). |
Tasks | |
Published | 2019-04-26 |
URL | https://arxiv.org/abs/1904.12672v2 |
https://arxiv.org/pdf/1904.12672v2.pdf | |
PWC | https://paperswithcode.com/paper/efficient-computation-of-expected-hypervolume |
Repo | |
Framework | |
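For reference, the quantity the paper computes exactly via box decomposition is $\mathrm{EHVI}(\mu,\sigma,\mathcal{P},r)=\int_{\mathbb{R}^m}\big(\mathrm{HV}(\mathcal{P}\cup\{y\},r)-\mathrm{HV}(\mathcal{P},r)\big)\,\xi_{\mu,\sigma}(y)\,dy$, the expected gain in dominated hypervolume under the GP predictive distribution. A naive Monte Carlo estimator for the bi-objective (minimization) case, assuming independent Gaussian marginals, is sketched below as a reference for what is being computed; it is a baseline, not the paper's algorithm.

```python
import numpy as np

def hypervolume_2d(points, ref):
    """Hypervolume dominated by a set of 2-D points w.r.t. a reference point
    (both objectives minimized)."""
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    hv, y_prev = 0.0, ref[1]
    for x, y in pts:                     # sweep left to right over the front
        if y < y_prev:
            hv += (ref[0] - x) * (y_prev - y)
            y_prev = y
    return hv

def ehvi_mc(mu, sigma, pareto, ref, n_samples=20000, seed=0):
    """Naive Monte Carlo EHVI estimate for a candidate point whose two
    objectives have independent Gaussian predictive marginals (mu, sigma)."""
    rng = np.random.default_rng(seed)
    base = hypervolume_2d(pareto, ref)
    samples = rng.normal(mu, sigma, size=(n_samples, 2))
    gains = [hypervolume_2d(pareto + [tuple(y)], ref) - base for y in samples]
    return float(np.mean(gains))

# e.g. ehvi_mc([1.0, 1.0], [0.5, 0.5], [(2.0, 1.0), (1.0, 2.0)], (4.0, 4.0))
```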
Towards Task and Architecture-Independent Generalization Gap Predictors
Title | Towards Task and Architecture-Independent Generalization Gap Predictors |
Authors | Scott Yak, Javier Gonzalvo, Hanna Mazzawi |
Abstract | Can we use deep learning to predict when deep learning works? Our results suggest the affirmative. We created a dataset by training 13,500 neural networks with different architectures, on different variations of spiral datasets, and using different optimization parameters. We used this dataset to train task-independent and architecture-independent generalization gap predictors for those neural networks. We extend Jiang et al. (2018) to also use DNNs and RNNs and show that they outperform the linear model, obtaining $R^2=0.965$. We also show results for architecture-independent, task-independent, and out-of-distribution generalization gap prediction tasks. Both DNNs and RNNs consistently and significantly outperform linear models, with RNNs obtaining $R^2=0.584$. |
Tasks | |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01550v1 |
https://arxiv.org/pdf/1906.01550v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-task-and-architecture-independent |
Repo | |
Framework | |
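To make the prediction task concrete: each trained network is summarized by a feature vector, and a regressor is fit to its measured generalization gap, with $R^2$ as the score. The sketch below uses synthetic stand-in data; the feature set (margin-distribution statistics, in the spirit of Jiang et al.) and the targets are hypothetical placeholders, not the paper's dataset of 13,500 networks.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Hypothetical stand-in: rows = trained networks, columns = summary statistics
# of their margin distributions; the target is the train/test accuracy gap.
rng = np.random.default_rng(0)
X = rng.normal(size=(13500, 16))
gap = X[:, :4].sum(axis=1) + 0.1 * rng.normal(size=13500)  # synthetic target

X_tr, X_te, y_tr, y_te = train_test_split(X, gap, random_state=0)
for model in (LinearRegression(),
              MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, r2_score(y_te, model.predict(X_te)))
```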
Deep Mangoes: from fruit detection to cultivar identification in colour images of mango trees
Title | Deep Mangoes: from fruit detection to cultivar identification in colour images of mango trees |
Authors | Philippe Borianne, Frederic Borne, Julien Sarron, Emile Faye |
Abstract | This paper presents results on the detection and identification of mango fruits from colour images of trees. We evaluate the behaviour and the performance of the Faster R-CNN network to determine whether it is robust enough to “detect and classify” fruits under particularly heterogeneous conditions in terms of plant cultivars, plantation scheme, and visual information acquisition contexts. The network is trained to distinguish the ‘Kent’, ‘Keitt’, and ‘Boucodiekhal’ mango cultivars from 3,000 representative labelled fruit annotations. The validation set, composed of about 7,000 annotations, was then tested with a confidence threshold of 0.7 and a Non-Maximum Suppression threshold of 0.25. With an F1-score of 0.90, the Faster R-CNN is well suited to simple fruit detection in tiles of 500x500 pixels. We then combine a multi-tiling approach with a Jaccard matrix to merge the different parts of objects detected several times, and thus transfer the detections made at the tile scale back to the native 6,000x4,000-pixel images. Nonetheless, with an F1-score of 0.56, the cultivar-identification Faster R-CNN network presents some limitations for simultaneously detecting the mango fruits and identifying their respective cultivars. Despite the errors in fruit detection, the cultivar identification rates of the detected mango fruits are on the order of 80%. The ideal solution could combine a Mask R-CNN for the image pre-segmentation of trees and a double-stream Faster R-CNN for detecting the mango fruits and identifying their respective cultivar to provide predictions more relevant to users’ expectations. |
Tasks | |
Published | 2019-09-24 |
URL | https://arxiv.org/abs/1909.10939v1 |
https://arxiv.org/pdf/1909.10939v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-mangoes-from-fruit-detection-to-cultivar |
Repo | |
Framework | |
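One plausible reading of the multi-tiling merge step: detections from overlapping 500x500 tiles are mapped to global image coordinates, and boxes whose Jaccard index (IoU) exceeds a threshold are fused. A minimal sketch follows; the greedy fusion rule is an assumption, not the paper's exact procedure.

```python
def iou(a, b):
    """Jaccard index of two boxes given as (x1, y1, x2, y2) in global coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def merge_tile_detections(boxes, merge_thr=0.25):
    """Greedily fuse boxes detected several times across overlapping tiles:
    any pair whose IoU exceeds merge_thr is replaced by its bounding box."""
    boxes = [list(b) for b in boxes]
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if iou(boxes[i], boxes[j]) > merge_thr:
                    a, b = boxes[i], boxes.pop(j)
                    boxes[i] = [min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3])]
                    merged = True
                    break
            if merged:
                break
    return boxes
```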
Deep Task-Based Quantization
Title | Deep Task-Based Quantization |
Authors | Nir Shlezinger, Yonina C. Eldar |
Abstract | Quantizers play a critical role in digital signal processing systems. Recent works have shown that the performance of quantization systems acquiring multiple analog signals using scalar analog-to-digital converters (ADCs) can be significantly improved by properly processing the analog signals prior to quantization. However, the design of such hybrid quantizers is quite complex, and their implementation requires complete knowledge of the statistical model of the analog signal, which may not be available in practice. In this work we design data-driven task-oriented quantization systems with scalar ADCs, which determine how to map an analog signal into its digital representation using deep learning tools. These representations are designed to facilitate the task of recovering underlying information from the quantized signals, which can be a set of parameters to estimate, or alternatively, a classification task. By utilizing deep learning, we circumvent the need to explicitly recover the system model and to find the proper quantization rule for it. Our main target application is multiple-input multiple-output (MIMO) communication receivers, which simultaneously acquire a set of analog signals, and are commonly subject to constraints on the number of bits. Our results indicate that, in a MIMO channel estimation setup, the proposed deep task-based quantizer is capable of approaching the optimal performance limits dictated by indirect rate-distortion theory, achievable using vector quantizers and requiring complete knowledge of the underlying statistical model. Furthermore, for a symbol detection scenario, it is demonstrated that the proposed approach can realize reliable bit-efficient hybrid MIMO receivers capable of setting their quantization rule in light of the task, e.g., to minimize the bit error rate. |
Tasks | Quantization |
Published | 2019-08-01 |
URL | https://arxiv.org/abs/1908.06845v1 |
https://arxiv.org/pdf/1908.06845v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-task-based-quantization |
Repo | |
Framework | |
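A generic sketch of the architecture described above: a learned analog combining stage, scalar uniform quantizers trained end-to-end with a straight-through estimator, and a digital network that recovers the task variable. The layer sizes and the tanh range mapping are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class STEQuantize(torch.autograd.Function):
    """Uniform scalar quantizer on [-1, 1]; gradients pass straight through."""
    @staticmethod
    def forward(ctx, x, half_levels):
        return torch.round(x.clamp(-1, 1) * half_levels) / half_levels
    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None

class TaskBasedQuantizer(nn.Module):
    """Analog combining -> scalar ADCs (few bits each) -> digital task network."""
    def __init__(self, n_analog, n_adc, half_levels, task_dim):
        super().__init__()
        self.analog = nn.Linear(n_analog, n_adc, bias=False)   # pre-quantization combining
        self.digital = nn.Sequential(nn.Linear(n_adc, 64), nn.ReLU(),
                                     nn.Linear(64, task_dim))  # recovers the task variable
        self.half_levels = half_levels

    def forward(self, x):
        z = torch.tanh(self.analog(x))              # keep within the quantizer range
        q = STEQuantize.apply(z, self.half_levels)  # each output is one scalar ADC
        return self.digital(q)
```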
A note on the empirical comparison of RBG and Ludii
Title | A note on the empirical comparison of RBG and Ludii |
Authors | Jakub Kowalski, Maksymilian Mika, Jakub Sutowicz, Marek Szykuła |
Abstract | We present an experimental comparison of the efficiency of three General Game Playing systems in their current versions: Regular Boardgames (RBG 1.0), Ludii 0.3.0, and a Game Description Language (GDL) propnet. We show that in general, RBG is currently the fastest GGP system. For example, for chess, we demonstrate that RBG is about 37 times faster than Ludii, and Ludii is about 3 times slower than a GDL propnet. Referring to the recent comparison [An Empirical Evaluation of Two General Game Systems: Ludii and RBG, CoG 2019], we show evidence that the benchmark presented there contains a number of significant flaws that lead to wrong conclusions. |
Tasks | |
Published | 2019-10-01 |
URL | https://arxiv.org/abs/1910.00309v2 |
https://arxiv.org/pdf/1910.00309v2.pdf | |
PWC | https://paperswithcode.com/paper/a-note-on-the-empirical-comparison-of-rbg-and |
Repo | |
Framework | |
CLEVRER: CoLlision Events for Video REpresentation and Reasoning
Title | CLEVRER: CoLlision Events for Video REpresentation and Reasoning |
Authors | Kexin Yi, Chuang Gan, Yunzhu Li, Pushmeet Kohli, Jiajun Wu, Antonio Torralba, Joshua B. Tenenbaum |
Abstract | The ability to reason about temporal and causal events from videos lies at the core of human intelligence. Most video reasoning benchmarks, however, focus on pattern recognition from complex visual and language input, instead of on causal structure. We study the complementary problem, exploring the temporal and causal structures behind videos of objects with simple visual appearance. To this end, we introduce the CoLlision Events for Video REpresentation and Reasoning (CLEVRER), a diagnostic video dataset for systematic evaluation of computational models on a wide range of reasoning tasks. Motivated by the theory of human causal judgment, CLEVRER includes four types of questions: descriptive (e.g., “what color”), explanatory (“what is responsible for”), predictive (“what will happen next”), and counterfactual (“what if”). We evaluate various state-of-the-art models for visual reasoning on our benchmark. While these models thrive on the perception-based task (descriptive), they perform poorly on the causal tasks (explanatory, predictive and counterfactual), suggesting that a principled approach for causal reasoning should incorporate the capability of both perceiving complex visual and language inputs, and understanding the underlying dynamics and causal relations. We also study an oracle model that explicitly combines these components via symbolic representations. |
Tasks | Visual Reasoning |
Published | 2019-10-03 |
URL | https://arxiv.org/abs/1910.01442v2 |
https://arxiv.org/pdf/1910.01442v2.pdf | |
PWC | https://paperswithcode.com/paper/clevrer-collision-events-for-video |
Repo | |
Framework | |
Robust superpixels using color and contour features along linear path
Title | Robust superpixels using color and contour features along linear path |
Authors | Rémi Giraud, Vinh-Thong Ta, Nicolas Papadakis |
Abstract | Superpixel decomposition methods are widely used in computer vision and image processing applications. By grouping homogeneous pixels, accuracy can be increased, and reducing the number of elements to process can drastically lower the computational burden. For most superpixel methods, a trade-off is computed between 1) color homogeneity, 2) adherence to the image contours and 3) shape regularity of the decomposition. In this paper, we propose a framework that jointly enforces all these aspects and provides accurate and regular Superpixels with Contour Adherence using Linear Path (SCALP). During the decomposition, we propose to consider color features along the linear path between the pixel and the corresponding superpixel barycenter. A contour prior is also used to prevent the crossing of image boundaries when associating a pixel to a superpixel. Finally, in order to improve the decomposition accuracy and the robustness to noise, we propose to integrate the pixel neighborhood information, while preserving the same computational complexity. SCALP is extensively evaluated on a standard segmentation dataset, and the obtained results outperform those of the state-of-the-art methods. SCALP is also extended for supervoxel decomposition on MRI images. |
Tasks | |
Published | 2019-03-17 |
URL | http://arxiv.org/abs/1903.07193v1 |
http://arxiv.org/pdf/1903.07193v1.pdf | |
PWC | https://paperswithcode.com/paper/robust-superpixels-using-color-and-contour |
Repo | |
Framework | |
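A simplified sketch of the core SCALP feature: accumulating colour distances along the linear path from a pixel to its candidate superpixel barycenter. The contour-prior term and the neighborhood integration are omitted here, and the line sampling is a simple assumption rather than the paper's exact scheme.

```python
import numpy as np

def line_points(p0, p1):
    """Integer pixel coordinates sampled along the segment from p0 to p1,
    both given as (x, y)."""
    n = int(max(abs(p1[0] - p0[0]), abs(p1[1] - p0[1]))) + 1
    xs = np.linspace(p0[0], p1[0], n).round().astype(int)
    ys = np.linspace(p0[1], p1[1], n).round().astype(int)
    return xs, ys

def path_color_distance(image, pixel, barycenter, superpixel_color):
    """Mean colour distance to the superpixel's mean colour, accumulated along
    the linear path from the pixel to the superpixel barycenter.
    image is an (H, W, 3) array."""
    xs, ys = line_points(pixel, barycenter)
    colors = image[ys, xs].astype(float)
    return np.linalg.norm(colors - superpixel_color, axis=1).mean()
```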
Transferable Clean-Label Poisoning Attacks on Deep Neural Nets
Title | Transferable Clean-Label Poisoning Attacks on Deep Neural Nets |
Authors | Chen Zhu, W. Ronny Huang, Ali Shafahi, Hengduo Li, Gavin Taylor, Christoph Studer, Tom Goldstein |
Abstract | Clean-label poisoning attacks inject innocuous-looking (and “correctly” labeled) poison images into training data, causing a model to misclassify a targeted image after being trained on this data. We consider transferable poisoning attacks that succeed without access to the victim network’s outputs, architecture, or (in some cases) training data. To achieve this, we propose a new “polytope attack” in which poison images are designed to surround the targeted image in feature space. We also demonstrate that using Dropout during poison creation helps to enhance transferability of this attack. We achieve transferable attack success rates of over 50% while poisoning only 1% of the training set. |
Tasks | Transfer Learning |
Published | 2019-05-15 |
URL | https://arxiv.org/abs/1905.05897v2 |
https://arxiv.org/pdf/1905.05897v2.pdf | |
PWC | https://paperswithcode.com/paper/transferable-clean-label-poisoning-attacks-on |
Repo | |
Framework | |
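The polytope attack jointly optimizes a set of poisons so that their features surround the target. As a simplified single-poison illustration of the underlying feature-space objective (closer to the earlier feature-collision formulation than to the full polytope attack), one gradient step might look like the sketch below, where `feat_net` is an assumed feature extractor.

```python
import torch

def poison_step(poison, base, target_feat, feat_net, lr=0.01, beta=0.1):
    """One gradient step pushing the poison's features toward the target's,
    while keeping the poison image close to its correctly labeled base image."""
    poison = poison.clone().requires_grad_(True)
    loss = ((feat_net(poison) - target_feat) ** 2).sum() \
           + beta * ((poison - base) ** 2).sum()
    loss.backward()
    with torch.no_grad():
        return (poison - lr * poison.grad).clamp(0, 1)  # stay a valid image
```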
Enhancing the Privacy of Federated Learning with Sketching
Title | Enhancing the Privacy of Federated Learning with Sketching |
Authors | Zaoxing Liu, Tian Li, Virginia Smith, Vyas Sekar |
Abstract | In response to growing concerns about user privacy, federated learning has emerged as a promising tool to train statistical models over networks of devices while keeping data localized. Federated learning methods run training tasks directly on user devices and do not share the raw user data with third parties. However, current methods still share model updates, which may contain private information (e.g., one’s weight and height), during the training process. Existing efforts that aim to improve the privacy of federated learning make compromises in one or more of the following key areas: performance (particularly communication cost), accuracy, or privacy. To better optimize these trade-offs, we propose that \textit{sketching algorithms} have a unique advantage in that they can provide both privacy and performance benefits while maintaining accuracy. We evaluate the feasibility of sketching-based federated learning with a prototype on three representative learning models. Our initial findings show that it is possible to provide strong privacy guarantees for federated learning without sacrificing performance or accuracy. Our work highlights that there exists a fundamental connection between privacy and communication in distributed settings, and suggests important open problems surrounding the theoretical understanding, methodology, and system design of practical, private federated learning. |
Tasks | |
Published | 2019-11-05 |
URL | https://arxiv.org/abs/1911.01812v1 |
https://arxiv.org/pdf/1911.01812v1.pdf | |
PWC | https://paperswithcode.com/paper/enhancing-the-privacy-of-federated-learning |
Repo | |
Framework | |
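As an illustration of the kind of sketching algorithm the paper proposes to apply to model updates, here is a minimal Count Sketch compress/decompress pair. The choice of sketch, its dimensions, and the shared-seed convention are assumptions for the sketch, not the paper's protocol.

```python
import numpy as np

def count_sketch(update, width, depth, seed=0):
    """Compress a flat model update into a depth x width Count Sketch."""
    rng = np.random.default_rng(seed)   # hash seed shared by clients and server
    idx = rng.integers(0, width, size=(depth, update.size))
    sign = rng.choice((-1.0, 1.0), size=(depth, update.size))
    table = np.zeros((depth, width))
    for r in range(depth):
        np.add.at(table[r], idx[r], sign[r] * update)
    return table, idx, sign

def estimate(table, idx, sign):
    """Unsketch: per coordinate, take the median of the unbiased per-row estimates."""
    rows = np.arange(table.shape[0])[:, None]
    return np.median(sign * table[rows, idx], axis=0)
```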
Wearable Travel Aid for Environment Perception and Navigation of Visually Impaired People
Title | Wearable Travel Aid for Environment Perception and Navigation of Visually Impaired People |
Authors | Jinqiang Bai, Zhaoxiang Liu, Yimin Lin, Ye Li, Shiguo Lian, Dijun Liu |
Abstract | This paper presents a wearable assistive device with the shape of a pair of eyeglasses that allows visually impaired people to navigate safely and quickly in unfamiliar environments, as well as perceive the complicated environment and automatically make decisions on the direction to move. The device uses a consumer Red, Green, Blue and Depth (RGB-D) camera and an Inertial Measurement Unit (IMU) to detect obstacles. As the device leverages the ground height continuity among adjacent image frames, it is able to segment the ground from obstacles accurately and rapidly. Based on the detected ground, the optimal walkable direction is computed and the user is then informed via a converted beep sound. Moreover, by utilizing deep learning techniques, the device can semantically categorize the detected obstacles to improve the users’ perception of surroundings. It combines a Convolutional Neural Network (CNN) deployed on a smartphone with a depth-image-based object detection to decide what the object type is and where the object is located, and then notifies the user of such information via speech. We evaluated the device’s performance with different experiments in which 20 visually impaired people were asked to wear the device and move in an office, and found that they were able to avoid obstacle collisions and find their way in complicated scenarios. |
Tasks | Object Detection |
Published | 2019-04-30 |
URL | http://arxiv.org/abs/1904.13037v1 |
http://arxiv.org/pdf/1904.13037v1.pdf | |
PWC | https://paperswithcode.com/paper/wearable-travel-aid-for-environment |
Repo | |
Framework | |
Coupled VAE: Improved Accuracy and Robustness of a Variational Autoencoder
Title | Coupled VAE: Improved Accuracy and Robustness of a Variational Autoencoder |
Authors | Shichen Cao, Jingjing Li, Kenric P. Nelson, Mark A. Kon |
Abstract | We present a coupled Variational Auto-Encoder (VAE) method that improves the accuracy and robustness of the probabilistic inferences on represented data. The new method models the dependency between input feature vectors (images) and weighs the outliers with a higher penalty by generalizing the original loss function to the coupled entropy function, using the principles of nonlinear statistical coupling. We evaluate the performance of the coupled VAE model using the MNIST dataset. Compared with the traditional VAE algorithm, the output images generated by the coupled VAE method are clearer and less blurry. The visualization of the input images embedded in the 2D latent variable space provides a deeper insight into the structure of the new model with the coupled loss function: the latent variables have a smaller deviation, and a more compact latent space generates the output values. We analyze the histogram of the likelihoods of the input images using the generalized mean, which measures the model’s accuracy as a function of the relative risk. The neutral accuracy, which is the geometric mean and is consistent with a measure of the Shannon cross-entropy, is improved. The robust accuracy, measured by the -2/3 generalized mean, is also improved. |
Tasks | |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00536v3 |
https://arxiv.org/pdf/1906.00536v3.pdf | |
PWC | https://paperswithcode.com/paper/190600536 |
Repo | |
Framework | |
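The generalized-mean evaluation mentioned in the abstract has a standard form. Writing $p_i$ for the model likelihood of the $i$-th of $n$ input images, the power mean and its geometric-mean limit are

```latex
M_p \;=\; \Big( \tfrac{1}{n} \sum_{i=1}^{n} p_i^{\,p} \Big)^{1/p},
\qquad
M_0 \;=\; \lim_{p \to 0} M_p \;=\; \Big( \prod_{i=1}^{n} p_i \Big)^{1/n}.
```

Here $M_0$, the geometric mean, gives the neutral accuracy (its logarithm is the average log-likelihood, i.e. the negative Shannon cross-entropy), while $M_{-2/3}$, with its negative exponent emphasizing the worst-explained inputs, gives the robust accuracy.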
DeepCABAC: Context-adaptive binary arithmetic coding for deep neural network compression
Title | DeepCABAC: Context-adaptive binary arithmetic coding for deep neural network compression |
Authors | Simon Wiedemann, Heiner Kirchhoffer, Stefan Matlage, Paul Haase, Arturo Marban, Talmaj Marinc, David Neumann, Ahmed Osman, Detlev Marpe, Heiko Schwarz, Thomas Wiegand, Wojciech Samek |
Abstract | We present DeepCABAC, a novel context-adaptive binary arithmetic coder for compressing deep neural networks. It quantizes each weight parameter by minimizing a weighted rate-distortion function, which implicitly takes the impact of quantization on the accuracy of the network into account. Subsequently, it compresses the quantized values into a bitstream representation with minimal redundancies. We show that DeepCABAC is able to reach very high compression ratios across a wide set of different network architectures and datasets. For instance, we are able to compress the VGG16 ImageNet model by a factor of 63.6 with no loss of accuracy, thus being able to represent the entire network with merely 8.7MB. |
Tasks | Neural Network Compression, Quantization |
Published | 2019-05-15 |
URL | https://arxiv.org/abs/1905.08318v1 |
https://arxiv.org/pdf/1905.08318v1.pdf | |
PWC | https://paperswithcode.com/paper/190508318 |
Repo | |
Framework | |
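A toy sketch of the weighted rate-distortion assignment at the heart of this kind of quantization: each weight is mapped to the grid point minimizing distortion plus lambda times an estimated coding cost. The plain squared-error distortion here is a simplification of the paper's accuracy-aware weighting, and the cost model is an assumption.

```python
import numpy as np

def rd_quantize(weights, grid, bit_costs, lam):
    """Assign each weight the grid point minimizing distortion + lam * rate.

    grid: candidate quantization points; bit_costs: estimated coding cost of
    each point (e.g., from the arithmetic coder's probability model); lam
    trades off reconstruction error against bitstream size.
    """
    w = weights.reshape(-1, 1)                                   # (n, 1)
    cost = (w - grid.reshape(1, -1)) ** 2 + lam * bit_costs.reshape(1, -1)
    return grid[np.argmin(cost, axis=1)].reshape(weights.shape)
```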
Dynamic Fusion: Attentional Language Model for Neural Machine Translation
Title | Dynamic Fusion: Attentional Language Model for Neural Machine Translation |
Authors | Michiki Kurosawa, Mamoru Komachi |
Abstract | Neural Machine Translation (NMT) can be used to generate fluent output. As such, language models have been investigated for incorporation with NMT. In prior investigations, two models have been used: a translation model and a language model. The translation model’s predictions are weighted by the language model with a hand-crafted ratio fixed in advance. However, these approaches fail to adapt the language model weighting to the translation history. In another line of approach, language model prediction is incorporated into the translation model by jointly considering source and target information. However, this line of approach is limited because it largely ignores the adequacy of the translation output. Accordingly, this work employs two mechanisms, the translation model and the language model, attending to the language model as an auxiliary element of the translation model. Compared with previous work in English–Japanese machine translation using a language model, the experimental results obtained with the proposed Dynamic Fusion mechanism improve BLEU and Rank-based Intuitive Bilingual Evaluation Score (RIBES) scores. Additionally, in the analyses of the attention and predictivity of the language model, the Dynamic Fusion mechanism allows predictive language modeling that conforms to the appropriate grammatical structure. |
Tasks | Language Modelling, Machine Translation |
Published | 2019-09-11 |
URL | https://arxiv.org/abs/1909.04879v1 |
https://arxiv.org/pdf/1909.04879v1.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-fusion-attentional-language-model-for |
Repo | |
Framework | |
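A generic sketch of gated translation-model/language-model fusion, in which the language-model weight is predicted per time step from the decoder state. This illustrates the general mechanism of adaptive (rather than hand-crafted) weighting, not the paper's exact attentional architecture.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse translation-model and language-model predictions with a gate
    computed from the decoder state, so the LM weight varies per time step."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.gate = nn.Linear(hidden_dim, 1)

    def forward(self, dec_state, tm_logprobs, lm_logprobs):
        g = torch.sigmoid(self.gate(dec_state))      # (batch, 1), LM weight in [0, 1]
        probs = (1 - g) * tm_logprobs.exp() + g * lm_logprobs.exp()
        return probs.clamp_min(1e-12).log()          # back to log-probabilities
```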
LatentGNN: Learning Efficient Non-local Relations for Visual Recognition
Title | LatentGNN: Learning Efficient Non-local Relations for Visual Recognition |
Authors | Songyang Zhang, Shipeng Yan, Xuming He |
Abstract | Capturing long-range dependencies in feature representations is crucial for many visual recognition tasks. Despite recent successes of deep convolutional networks, it remains challenging to model non-local context relations between visual features. A promising strategy is to model the feature context by a fully-connected graph neural network (GNN), which augments traditional convolutional features with an estimated non-local context representation. However, most GNN-based approaches require computing a dense graph affinity matrix and hence have difficulty in scaling up to tackle complex real-world visual problems. In this work, we propose an efficient yet flexible non-local relation representation based on a novel class of graph neural networks. Our key idea is to introduce a latent space to reduce the complexity of the graph, which allows us to use a low-rank representation for the graph affinity matrix and to achieve a linear complexity in computation. Extensive experimental evaluations on three major visual recognition tasks show that our method outperforms prior works by a large margin while maintaining a low computation cost. |
Tasks | |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11634v1 |
https://arxiv.org/pdf/1905.11634v1.pdf | |
PWC | https://paperswithcode.com/paper/latentgnn-learning-efficient-non-local |
Repo | |
Framework | |
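The low-rank idea described in the abstract can be sketched directly: project the $n$ node features onto $k \ll n$ latent nodes, mix there, and project back, so the implied affinity matrix is low-rank and the cost is linear in $n$ rather than quadratic. The soft-assignment parameterization below is an assumption, not the paper's exact layer.

```python
import torch
import torch.nn as nn

class LatentPropagation(nn.Module):
    """Low-rank non-local layer: aggregate n node features into k << n latent
    nodes and broadcast back, at O(n*k) cost instead of the O(n^2) cost of a
    dense affinity matrix."""
    def __init__(self, dim, n_latent):
        super().__init__()
        self.to_latent = nn.Linear(dim, n_latent)        # soft assignment logits

    def forward(self, x):                                # x: (batch, n, dim)
        psi = torch.softmax(self.to_latent(x), dim=1)    # (batch, n, k); each latent
                                                         # node's weights sum to 1 over nodes
        latent = psi.transpose(1, 2) @ x                 # aggregate: (batch, k, dim)
        return x + psi @ latent                          # broadcast back, with residual
```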