Paper Group ANR 1040
Supporting stylists by recommending fashion style. Efficient Computation of Expected Hypervolume Improvement Using Box Decomposition Algorithms. Towards Task and Architecture-Independent Generalization Gap Predictors. Deep Mangoes: from fruit detection to cultivar identification in colour images of mango trees. Deep Task-Based Quantization. A note on the empirical comparison of RBG and Ludii. CLEVRER: CoLlision Events for Video REpresentation and Reasoning. Robust superpixels using color and contour features along linear path. Transferable Clean-Label Poisoning Attacks on Deep Neural Nets. Enhancing the Privacy of Federated Learning with Sketching. Wearable Travel Aid for Environment Perception and Navigation of Visually Impaired People. Coupled VAE: Improved Accuracy and Robustness of a Variational Autoencoder. DeepCABAC: Context-adaptive binary arithmetic coding for deep neural network compression. Dynamic Fusion: Attentional Language Model for Neural Machine Translation. LatentGNN: Learning Efficient Non-local Relations for Visual Recognition.
Supporting stylists by recommending fashion style
Title | Supporting stylists by recommending fashion style |
Authors | Tobias Kuhn, Steven Bourke, Levin Brinkmann, Tobias Buchwald, Conor Digan, Hendrik Hache, Sebastian Jaeger, Patrick Lehmann, Oskar Maier, Stefan Matting, Yura Okulovsky |
Abstract | Outfittery is an online personalized styling service targeted at men. We have hundreds of stylists who create thousands of bespoke outfits for our customers every day. A critical challenge faced by our stylists when creating these outfits is selecting an appropriate item of clothing that makes sense in the context of the outfit being created, otherwise known as style fit. Another significant challenge is knowing if the item is relevant to the customer based on their tastes, physical attributes and price sensitivity. At Outfittery we leverage machine learning extensively and combine it with human domain expertise to tackle these challenges. We do this by surfacing relevant items of clothing during the outfit building process based on what our stylist is doing and what the preferences of our customer are. In this paper we describe one way in which we help our stylists to tackle style fit for a particular item of clothing and its relevance to an outfit. A thorough qualitative and quantitative evaluation highlights the method’s ability to recommend fashion items by style fit. |
Tasks | |
Published | 2019-08-26 |
URL | https://arxiv.org/abs/1908.09493v1 |
https://arxiv.org/pdf/1908.09493v1.pdf | |
PWC | https://paperswithcode.com/paper/supporting-stylists-by-recommending-fashion |
Repo | |
Framework | |
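The item-surfacing approach above is proprietary, but the general idea (ranking candidate items by how well they fit the style context of an outfit under construction, while respecting customer constraints such as price sensitivity) can be illustrated with a minimal sketch. The embedding source, the centroid scoring rule, and the price filter below are all hypothetical assumptions, not Outfittery's method.

```python
import numpy as np

def recommend_for_outfit(candidate_vecs, candidate_prices, outfit_vecs,
                         price_limit, top_k=5):
    """Rank candidate items by cosine similarity to the centroid of the
    style embeddings of items already placed in the outfit.

    candidate_vecs: (n, d) item style embeddings (hypothetical, e.g. learned
    from item co-occurrence in past outfits); candidate_prices: (n,) array;
    outfit_vecs: (m, d) embeddings of items already in the outfit.
    """
    centroid = outfit_vecs.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    norms = np.maximum(np.linalg.norm(candidate_vecs, axis=1), 1e-12)
    scores = candidate_vecs @ centroid / norms
    scores[candidate_prices > price_limit] = -np.inf  # crude price-sensitivity filter
    return np.argsort(-scores)[:top_k]                # indices of best style fits
```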
Efficient Computation of Expected Hypervolume Improvement Using Box Decomposition Algorithms
Title | Efficient Computation of Expected Hypervolume Improvement Using Box Decomposition Algorithms |
Authors | Kaifeng Yang, Michael Emmerich, André Deutz, Thomas Bäck |
Abstract | In the field of multi-objective optimization algorithms, multi-objective Bayesian Global Optimization (MOBGO) is an important branch, in addition to evolutionary multi-objective optimization algorithms (EMOAs). MOBGO utilizes Gaussian Process models learned from previous objective function evaluations to decide the next evaluation site by maximizing or minimizing an infill criterion. A common criterion in MOBGO is the Expected Hypervolume Improvement (EHVI), which shows good performance on a wide range of problems with respect to exploration and exploitation. However, so far it has been a challenge to calculate exact EHVI values efficiently. In this paper, an efficient algorithm for the computation of the exact EHVI for a generic case is proposed. This efficient algorithm is based on partitioning the integration volume into a set of axis-parallel slices. Theoretically, the upper-bound time complexities are improved from the previous $O(n^2)$ and $O(n^3)$, for two- and three-objective problems respectively, to $\Theta(n\log n)$, which is asymptotically optimal. This article generalizes the scheme to the higher-dimensional case by utilizing a new hyperbox decomposition technique, which was proposed by Dächert et al., EJOR, 2017. It also utilizes a generalization of the multilayered integration scheme that scales linearly in the number of hyperboxes of the decomposition. The speed comparison shows that the proposed algorithm significantly reduces computation time. Finally, this decomposition technique is applied in the calculation of the Probability of Improvement (PoI). |
Tasks | |
Published | 2019-04-26 |
URL | https://arxiv.org/abs/1904.12672v2 |
https://arxiv.org/pdf/1904.12672v2.pdf | |
PWC | https://paperswithcode.com/paper/efficient-computation-of-expected-hypervolume |
Repo | |
Framework | |
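For reference, the quantity the paper computes exactly via box decomposition is $\mathrm{EHVI}(\mu,\sigma,\mathcal{P},r)=\int_{\mathbb{R}^m}\big(\mathrm{HV}(\mathcal{P}\cup\{y\},r)-\mathrm{HV}(\mathcal{P},r)\big)\,\xi_{\mu,\sigma}(y)\,dy$, the expected gain in dominated hypervolume under the GP predictive distribution. A naive Monte Carlo estimator for the bi-objective (minimization) case, assuming independent Gaussian marginals, is sketched below as a reference for what is being computed; it is a baseline, not the paper's algorithm.

```python
import numpy as np

def hypervolume_2d(points, ref):
    """Hypervolume dominated by a set of 2-D points w.r.t. a reference point
    (both objectives minimized)."""
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    hv, y_prev = 0.0, ref[1]
    for x, y in pts:                     # sweep left to right over the front
        if y < y_prev:
            hv += (ref[0] - x) * (y_prev - y)
            y_prev = y
    return hv

def ehvi_mc(mu, sigma, pareto, ref, n_samples=20000, seed=0):
    """Naive Monte Carlo EHVI estimate for a candidate point whose two
    objectives have independent Gaussian predictive marginals (mu, sigma)."""
    rng = np.random.default_rng(seed)
    base = hypervolume_2d(pareto, ref)
    samples = rng.normal(mu, sigma, size=(n_samples, 2))
    gains = [hypervolume_2d(pareto + [tuple(y)], ref) - base for y in samples]
    return float(np.mean(gains))

# e.g. ehvi_mc([1.0, 1.0], [0.5, 0.5], [(2.0, 1.0), (1.0, 2.0)], (4.0, 4.0))
```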
Towards Task and Architecture-Independent Generalization Gap Predictors
Title | Towards Task and Architecture-Independent Generalization Gap Predictors |
Authors | Scott Yak, Javier Gonzalvo, Hanna Mazzawi |
Abstract | Can we use deep learning to predict when deep learning works? Our results suggest the affirmative. We created a dataset by training 13,500 neural networks with different architectures, on different variations of spiral datasets, and using different optimization parameters. We used this dataset to train task-independent and architecture-independent generalization gap predictors for those neural networks. We extend Jiang et al. (2018) to also use DNNs and RNNs and show that they outperform the linear model, obtaining $R^2=0.965$. We also show results for architecture-independent, task-independent, and out-of-distribution generalization gap prediction tasks. Both DNNs and RNNs consistently and significantly outperform linear models, with RNNs obtaining $R^2=0.584$. |
Tasks | |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01550v1 |
https://arxiv.org/pdf/1906.01550v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-task-and-architecture-independent |
Repo | |
Framework | |
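To make the prediction task concrete: each trained network is summarized by a feature vector, and a regressor is fit to its measured generalization gap, with $R^2$ as the score. The sketch below uses synthetic stand-in data; the feature set (margin-distribution statistics, in the spirit of Jiang et al.) and the targets are hypothetical placeholders, not the paper's dataset of 13,500 networks.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Hypothetical stand-in: rows = trained networks, columns = summary statistics
# of their margin distributions; the target is the train/test accuracy gap.
rng = np.random.default_rng(0)
X = rng.normal(size=(13500, 16))
gap = X[:, :4].sum(axis=1) + 0.1 * rng.normal(size=13500)  # synthetic target

X_tr, X_te, y_tr, y_te = train_test_split(X, gap, random_state=0)
for model in (LinearRegression(),
              MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, r2_score(y_te, model.predict(X_te)))
```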
Deep Mangoes: from fruit detection to cultivar identification in colour images of mango trees
Title | Deep Mangoes: from fruit detection to cultivar identification in colour images of mango trees |
Authors | Philippe Borianne, Frederic Borne, Julien Sarron, Emile Faye |
Abstract | This paper presents results on the detection and identification of mango fruits from colour images of trees. We evaluate the behaviour and the performance of the Faster R-CNN network to determine whether it is robust enough to “detect and classify” fruits under particularly heterogeneous conditions in terms of plant cultivars, plantation scheme, and visual information acquisition contexts. The network is trained to distinguish the ‘Kent’, ‘Keitt’, and ‘Boucodiekhal’ mango cultivars from 3,000 representative labelled fruit annotations. The validation set, composed of about 7,000 annotations, was then tested with a confidence threshold of 0.7 and a Non-Maximum Suppression threshold of 0.25. With an F1-score of 0.90, the Faster R-CNN is well suited to simple fruit detection in tiles of 500x500 pixels. We then combine a multi-tiling approach with a Jaccard matrix to merge the different parts of objects detected several times, and thus transfer the detections made at the tile scale back to the native 6,000x4,000-pixel images. Nonetheless, with an F1-score of 0.56, the cultivar-identification Faster R-CNN network presents some limitations for simultaneously detecting the mango fruits and identifying their respective cultivars. Despite the errors in fruit detection, the cultivar identification rates of the detected mango fruits are on the order of 80%. The ideal solution could combine a Mask R-CNN for the image pre-segmentation of trees and a double-stream Faster R-CNN for detecting the mango fruits and identifying their respective cultivar to provide predictions more relevant to users’ expectations. |
Tasks | |
Published | 2019-09-24 |
URL | https://arxiv.org/abs/1909.10939v1 |
https://arxiv.org/pdf/1909.10939v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-mangoes-from-fruit-detection-to-cultivar |
Repo | |
Framework | |
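One plausible reading of the multi-tiling merge step: detections from overlapping 500x500 tiles are mapped to global image coordinates, and boxes whose Jaccard index (IoU) exceeds a threshold are fused. A minimal sketch follows; the greedy fusion rule is an assumption, not the paper's exact procedure.

```python
def iou(a, b):
    """Jaccard index of two boxes given as (x1, y1, x2, y2) in global coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def merge_tile_detections(boxes, merge_thr=0.25):
    """Greedily fuse boxes detected several times across overlapping tiles:
    any pair whose IoU exceeds merge_thr is replaced by its bounding box."""
    boxes = [list(b) for b in boxes]
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if iou(boxes[i], boxes[j]) > merge_thr:
                    a, b = boxes[i], boxes.pop(j)
                    boxes[i] = [min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3])]
                    merged = True
                    break
            if merged:
                break
    return boxes
```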
Deep Task-Based Quantization
Title | Deep Task-Based Quantization |
Authors | Nir Shlezinger, Yonina C. Eldar |
Abstract | Quantizers play a critical role in digital signal processing systems. Recent works have shown that the performance of quantization systems acquiring multiple analog signals using scalar analog-to-digital converters (ADCs) can be significantly improved by properly processing the analog signals prior to quantization. However, the design of such hybrid quantizers is quite complex, and their implementation requires complete knowledge of the statistical model of the analog signal, which may not be available in practice. In this work we design data-driven task-oriented quantization systems with scalar ADCs, which determine how to map an analog signal into its digital representation using deep learning tools. These representations are designed to facilitate the task of recovering underlying information from the quantized signals, which can be a set of parameters to estimate, or alternatively, a classification task. By utilizing deep learning, we circumvent the need to explicitly recover the system model and to find the proper quantization rule for it. Our main target application is multiple-input multiple-output (MIMO) communication receivers, which simultaneously acquire a set of analog signals, and are commonly subject to constraints on the number of bits. Our results indicate that, in a MIMO channel estimation setup, the proposed deep task-based quantizer is capable of approaching the optimal performance limits dictated by indirect rate-distortion theory, achievable using vector quantizers and requiring complete knowledge of the underlying statistical model. Furthermore, for a symbol detection scenario, it is demonstrated that the proposed approach can realize reliable bit-efficient hybrid MIMO receivers capable of setting their quantization rule in light of the task, e.g., to minimize the bit error rate. |
Tasks | Quantization |
Published | 2019-08-01 |
URL | https://arxiv.org/abs/1908.06845v1 |
https://arxiv.org/pdf/1908.06845v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-task-based-quantization |
Repo | |
Framework | |
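A generic sketch of the architecture described above: a learned analog combining stage, scalar uniform quantizers trained end-to-end with a straight-through estimator, and a digital network that recovers the task variable. The layer sizes and the tanh range mapping are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class STEQuantize(torch.autograd.Function):
    """Uniform scalar quantizer on [-1, 1]; gradients pass straight through."""
    @staticmethod
    def forward(ctx, x, half_levels):
        return torch.round(x.clamp(-1, 1) * half_levels) / half_levels
    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None

class TaskBasedQuantizer(nn.Module):
    """Analog combining -> scalar ADCs (few bits each) -> digital task network."""
    def __init__(self, n_analog, n_adc, half_levels, task_dim):
        super().__init__()
        self.analog = nn.Linear(n_analog, n_adc, bias=False)   # pre-quantization combining
        self.digital = nn.Sequential(nn.Linear(n_adc, 64), nn.ReLU(),
                                     nn.Linear(64, task_dim))  # recovers the task variable
        self.half_levels = half_levels

    def forward(self, x):
        z = torch.tanh(self.analog(x))              # keep within the quantizer range
        q = STEQuantize.apply(z, self.half_levels)  # each output is one scalar ADC
        return self.digital(q)
```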
A note on the empirical comparison of RBG and Ludii
Title | A note on the empirical comparison of RBG and Ludii |
Authors | Jakub Kowalski, Maksymilian Mika, Jakub Sutowicz, Marek Szykuła |
Abstract | We present an experimental comparison of the efficiency of three General Game Playing systems in their current versions: Regular Boardgames (RBG 1.0), Ludii 0.3.0, and a Game Description Language (GDL) propnet. We show that in general, RBG is currently the fastest GGP system. For example, for chess, we demonstrate that RBG is about 37 times faster than Ludii, and Ludii is about 3 times slower than a GDL propnet. Referring to the recent comparison [An Empirical Evaluation of Two General Game Systems: Ludii and RBG, CoG 2019], we show evidence that the benchmark presented there contains a number of significant flaws that lead to wrong conclusions. |
Tasks | |
Published | 2019-10-01 |
URL | https://arxiv.org/abs/1910.00309v2 |
https://arxiv.org/pdf/1910.00309v2.pdf | |
PWC | https://paperswithcode.com/paper/a-note-on-the-empirical-comparison-of-rbg-and |
Repo | |
Framework | |
CLEVRER: CoLlision Events for Video REpresentation and Reasoning
Title | CLEVRER: CoLlision Events for Video REpresentation and Reasoning |
Authors | Kexin Yi, Chuang Gan, Yunzhu Li, Pushmeet Kohli, Jiajun Wu, Antonio Torralba, Joshua B. Tenenbaum |
Abstract | The ability to reason about temporal and causal events from videos lies at the core of human intelligence. Most video reasoning benchmarks, however, focus on pattern recognition from complex visual and language input, instead of on causal structure. We study the complementary problem, exploring the temporal and causal structures behind videos of objects with simple visual appearance. To this end, we introduce the CoLlision Events for Video REpresentation and Reasoning (CLEVRER), a diagnostic video dataset for systematic evaluation of computational models on a wide range of reasoning tasks. Motivated by the theory of human causal judgment, CLEVRER includes four types of questions: descriptive (e.g., “what color”), explanatory (“what is responsible for”), predictive (“what will happen next”), and counterfactual (“what if”). We evaluate various state-of-the-art models for visual reasoning on our benchmark. While these models thrive on the perception-based task (descriptive), they perform poorly on the causal tasks (explanatory, predictive and counterfactual), suggesting that a principled approach for causal reasoning should incorporate the capability of both perceiving complex visual and language inputs, and understanding the underlying dynamics and causal relations. We also study an oracle model that explicitly combines these components via symbolic representations. |
Tasks | Visual Reasoning |
Published | 2019-10-03 |
URL | https://arxiv.org/abs/1910.01442v2 |
https://arxiv.org/pdf/1910.01442v2.pdf | |
PWC | https://paperswithcode.com/paper/clevrer-collision-events-for-video |
Repo | |
Framework | |
Robust superpixels using color and contour features along linear path
Title | Robust superpixels using color and contour features along linear path |
Authors | Rémi Giraud, Vinh-Thong Ta, Nicolas Papadakis |
Abstract | Superpixel decomposition methods are widely used in computer vision and image processing applications. By grouping homogeneous pixels, accuracy can be increased, and reducing the number of elements to process can drastically lower the computational burden. For most superpixel methods, a trade-off is computed between 1) color homogeneity, 2) adherence to the image contours and 3) shape regularity of the decomposition. In this paper, we propose a framework that jointly enforces all these aspects and provides accurate and regular Superpixels with Contour Adherence using Linear Path (SCALP). During the decomposition, we propose to consider color features along the linear path between the pixel and the corresponding superpixel barycenter. A contour prior is also used to prevent the crossing of image boundaries when associating a pixel to a superpixel. Finally, in order to improve the decomposition accuracy and the robustness to noise, we propose to integrate the pixel neighborhood information, while preserving the same computational complexity. SCALP is extensively evaluated on a standard segmentation dataset, and the obtained results outperform those of the state-of-the-art methods. SCALP is also extended for supervoxel decomposition on MRI images. |
Tasks | |
Published | 2019-03-17 |
URL | http://arxiv.org/abs/1903.07193v1 |
http://arxiv.org/pdf/1903.07193v1.pdf | |
PWC | https://paperswithcode.com/paper/robust-superpixels-using-color-and-contour |
Repo | |
Framework | |
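A simplified sketch of the core SCALP feature: accumulating colour distances along the linear path from a pixel to its candidate superpixel barycenter. The contour-prior term and the neighborhood integration are omitted here, and the line sampling is a simple assumption rather than the paper's exact scheme.

```python
import numpy as np

def line_points(p0, p1):
    """Integer pixel coordinates sampled along the segment from p0 to p1,
    both given as (x, y)."""
    n = int(max(abs(p1[0] - p0[0]), abs(p1[1] - p0[1]))) + 1
    xs = np.linspace(p0[0], p1[0], n).round().astype(int)
    ys = np.linspace(p0[1], p1[1], n).round().astype(int)
    return xs, ys

def path_color_distance(image, pixel, barycenter, superpixel_color):
    """Mean colour distance to the superpixel's mean colour, accumulated along
    the linear path from the pixel to the superpixel barycenter.
    image is an (H, W, 3) array."""
    xs, ys = line_points(pixel, barycenter)
    colors = image[ys, xs].astype(float)
    return np.linalg.norm(colors - superpixel_color, axis=1).mean()
```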
Transferable Clean-Label Poisoning Attacks on Deep Neural Nets
Title | Transferable Clean-Label Poisoning Attacks on Deep Neural Nets |
Authors | Chen Zhu, W. Ronny Huang, Ali Shafahi, Hengduo Li, Gavin Taylor, Christoph Studer, Tom Goldstein |
Abstract | Clean-label poisoning attacks inject innocuous-looking (and “correctly” labeled) poison images into training data, causing a model to misclassify a targeted image after being trained on this data. We consider transferable poisoning attacks that succeed without access to the victim network’s outputs, architecture, or (in some cases) training data. To achieve this, we propose a new “polytope attack” in which poison images are designed to surround the targeted image in feature space. We also demonstrate that using Dropout during poison creation helps to enhance transferability of this attack. We achieve transferable attack success rates of over 50% while poisoning only 1% of the training set. |
Tasks | Transfer Learning |
Published | 2019-05-15 |
URL | https://arxiv.org/abs/1905.05897v2 |
https://arxiv.org/pdf/1905.05897v2.pdf | |
PWC | https://paperswithcode.com/paper/transferable-clean-label-poisoning-attacks-on |
Repo | |
Framework | |
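The polytope attack jointly optimizes a set of poisons so that their features surround the target. As a simplified single-poison illustration of the underlying feature-space objective (closer to the earlier feature-collision formulation than to the full polytope attack), one gradient step might look like the sketch below, where `feat_net` is an assumed feature extractor.

```python
import torch

def poison_step(poison, base, target_feat, feat_net, lr=0.01, beta=0.1):
    """One gradient step pushing the poison's features toward the target's,
    while keeping the poison image close to its correctly labeled base image."""
    poison = poison.clone().requires_grad_(True)
    loss = ((feat_net(poison) - target_feat) ** 2).sum() \
           + beta * ((poison - base) ** 2).sum()
    loss.backward()
    with torch.no_grad():
        return (poison - lr * poison.grad).clamp(0, 1)  # stay a valid image
```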
Enhancing the Privacy of Federated Learning with Sketching
Title | Enhancing the Privacy of Federated Learning with Sketching |
Authors | Zaoxing Liu, Tian Li, Virginia Smith, Vyas Sekar |
Abstract | In response to growing concerns about user privacy, federated learning has emerged as a promising tool to train statistical models over networks of devices while keeping data localized. Federated learning methods run training tasks directly on user devices and do not share the raw user data with third parties. However, current methods still share model updates, which may contain private information (e.g., one’s weight and height), during the training process. Existing efforts that aim to improve the privacy of federated learning make compromises in one or more of the following key areas: performance (particularly communication cost), accuracy, or privacy. To better optimize these trade-offs, we propose that \textit{sketching algorithms} have a unique advantage in that they can provide both privacy and performance benefits while maintaining accuracy. We evaluate the feasibility of sketching-based federated learning with a prototype on three representative learning models. Our initial findings show that it is possible to provide strong privacy guarantees for federated learning without sacrificing performance or accuracy. Our work highlights that there exists a fundamental connection between privacy and communication in distributed settings, and suggests important open problems surrounding the theoretical understanding, methodology, and system design of practical, private federated learning. |
Tasks | |
Published | 2019-11-05 |
URL | https://arxiv.org/abs/1911.01812v1 |
https://arxiv.org/pdf/1911.01812v1.pdf | |
PWC | https://paperswithcode.com/paper/enhancing-the-privacy-of-federated-learning |
Repo | |
Framework | |
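As an illustration of the kind of sketching algorithm the paper proposes to apply to model updates, here is a minimal Count Sketch compress/decompress pair. The choice of sketch, its dimensions, and the shared-seed convention are assumptions for the sketch, not the paper's protocol.

```python
import numpy as np

def count_sketch(update, width, depth, seed=0):
    """Compress a flat model update into a depth x width Count Sketch."""
    rng = np.random.default_rng(seed)   # hash seed shared by clients and server
    idx = rng.integers(0, width, size=(depth, update.size))
    sign = rng.choice((-1.0, 1.0), size=(depth, update.size))
    table = np.zeros((depth, width))
    for r in range(depth):
        np.add.at(table[r], idx[r], sign[r] * update)
    return table, idx, sign

def estimate(table, idx, sign):
    """Unsketch: per coordinate, take the median of the unbiased per-row estimates."""
    rows = np.arange(table.shape[0])[:, None]
    return np.median(sign * table[rows, idx], axis=0)
```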
Wearable Travel Aid for Environment Perception and Navigation of Visually Impaired People
Title | Wearable Travel Aid for Environment Perception and Navigation of Visually Impaired People |
Authors | Jinqiang Bai, Zhaoxiang Liu, Yimin Lin, Ye Li, Shiguo Lian, Dijun Liu |
Abstract | This paper presents a wearable assistive device with the shape of a pair of eyeglasses that allows visually impaired people to navigate safely and quickly in unfamiliar environments, as well as perceive the complicated environment and automatically make decisions on the direction to move. The device uses a consumer Red, Green, Blue and Depth (RGB-D) camera and an Inertial Measurement Unit (IMU) to detect obstacles. As the device leverages the ground height continuity among adjacent image frames, it is able to segment the ground from obstacles accurately and rapidly. Based on the detected ground, the optimal walkable direction is computed and the user is then informed via a converted beep sound. Moreover, by utilizing deep learning techniques, the device can semantically categorize the detected obstacles to improve the users’ perception of surroundings. It combines a Convolutional Neural Network (CNN) deployed on a smartphone with a depth-image-based object detection to decide what the object type is and where the object is located, and then notifies the user of such information via speech. We evaluated the device’s performance with different experiments in which 20 visually impaired people were asked to wear the device and move in an office, and found that they were able to avoid obstacle collisions and find their way in complicated scenarios. |
Tasks | Object Detection |
Published | 2019-04-30 |
URL | http://arxiv.org/abs/1904.13037v1 |
http://arxiv.org/pdf/1904.13037v1.pdf | |
PWC | https://paperswithcode.com/paper/wearable-travel-aid-for-environment |
Repo | |
Framework | |
Coupled VAE: Improved Accuracy and Robustness of a Variational Autoencoder
Title | Coupled VAE: Improved Accuracy and Robustness of a Variational Autoencoder |
Authors | Shichen Cao, Jingjing Li, Kenric P. Nelson, Mark A. Kon |
Abstract | We present a coupled Variational Auto-Encoder (VAE) method that improves the accuracy and robustness of the probabilistic inferences on represented data. The new method models the dependency between input feature vectors (images) and weighs the outliers with a higher penalty by generalizing the original loss function to the coupled entropy function, using the principles of nonlinear statistical coupling. We evaluate the performance of the coupled VAE model using the MNIST dataset. Compared with the traditional VAE algorithm, the output images generated by the coupled VAE method are clearer and less blurry. The visualization of the input images embedded in the 2D latent variable space provides a deeper insight into the structure of the new model with the coupled loss function: the latent variables have a smaller deviation, and a more compact latent space generates the output values. We analyze the histogram of the likelihoods of the input images using the generalized mean, which measures the model’s accuracy as a function of the relative risk. The neutral accuracy, which is the geometric mean and is consistent with a measure of the Shannon cross-entropy, is improved. The robust accuracy, measured by the -2/3 generalized mean, is also improved. |
Tasks | |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00536v3 |
https://arxiv.org/pdf/1906.00536v3.pdf | |
PWC | https://paperswithcode.com/paper/190600536 |
Repo | |
Framework | |
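The generalized-mean evaluation mentioned in the abstract has a standard form. Writing $p_i$ for the model likelihood of the $i$-th of $n$ input images, the power mean and its geometric-mean limit are

```latex
M_p \;=\; \Big( \tfrac{1}{n} \sum_{i=1}^{n} p_i^{\,p} \Big)^{1/p},
\qquad
M_0 \;=\; \lim_{p \to 0} M_p \;=\; \Big( \prod_{i=1}^{n} p_i \Big)^{1/n}.
```

Here $M_0$, the geometric mean, gives the neutral accuracy (its logarithm is the average log-likelihood, i.e. the negative Shannon cross-entropy), while $M_{-2/3}$, with its negative exponent emphasizing the worst-explained inputs, gives the robust accuracy.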
DeepCABAC: Context-adaptive binary arithmetic coding for deep neural network compression
Title | DeepCABAC: Context-adaptive binary arithmetic coding for deep neural network compression |
Authors | Simon Wiedemann, Heiner Kirchhoffer, Stefan Matlage, Paul Haase, Arturo Marban, Talmaj Marinc, David Neumann, Ahmed Osman, Detlev Marpe, Heiko Schwarz, Thomas Wiegand, Wojciech Samek |
Abstract | We present DeepCABAC, a novel context-adaptive binary arithmetic coder for compressing deep neural networks. It quantizes each weight parameter by minimizing a weighted rate-distortion function, which implicitly takes the impact of quantization on the accuracy of the network into account. Subsequently, it compresses the quantized values into a bitstream representation with minimal redundancies. We show that DeepCABAC is able to reach very high compression ratios across a wide set of different network architectures and datasets. For instance, we are able to compress the VGG16 ImageNet model by a factor of 63.6 with no loss of accuracy, thus being able to represent the entire network with merely 8.7MB. |
Tasks | Neural Network Compression, Quantization |
Published | 2019-05-15 |
URL | https://arxiv.org/abs/1905.08318v1 |
https://arxiv.org/pdf/1905.08318v1.pdf | |
PWC | https://paperswithcode.com/paper/190508318 |
Repo | |
Framework | |
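A toy sketch of the weighted rate-distortion assignment at the heart of this kind of quantization: each weight is mapped to the grid point minimizing distortion plus lambda times an estimated coding cost. The plain squared-error distortion here is a simplification of the paper's accuracy-aware weighting, and the cost model is an assumption.

```python
import numpy as np

def rd_quantize(weights, grid, bit_costs, lam):
    """Assign each weight the grid point minimizing distortion + lam * rate.

    grid: candidate quantization points; bit_costs: estimated coding cost of
    each point (e.g., from the arithmetic coder's probability model); lam
    trades off reconstruction error against bitstream size.
    """
    w = weights.reshape(-1, 1)                                   # (n, 1)
    cost = (w - grid.reshape(1, -1)) ** 2 + lam * bit_costs.reshape(1, -1)
    return grid[np.argmin(cost, axis=1)].reshape(weights.shape)
```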
Dynamic Fusion: Attentional Language Model for Neural Machine Translation
Title | Dynamic Fusion: Attentional Language Model for Neural Machine Translation |
Authors | Michiki Kurosawa, Mamoru Komachi |
Abstract | Neural Machine Translation (NMT) can be used to generate fluent output. As such, language models have been investigated for incorporation with NMT. In prior investigations, two models have been used: a translation model and a language model. The translation model’s predictions are weighted by the language model with a hand-crafted ratio fixed in advance. However, these approaches fail to adapt the language model weighting to the translation history. In another line of approach, language model prediction is incorporated into the translation model by jointly considering source and target information. However, this line of approach is limited because it largely ignores the adequacy of the translation output. Accordingly, this work employs two mechanisms, the translation model and the language model, attending to the language model as an auxiliary element of the translation model. Compared with previous work in English–Japanese machine translation using a language model, the experimental results obtained with the proposed Dynamic Fusion mechanism improve BLEU and Rank-based Intuitive Bilingual Evaluation Score (RIBES) scores. Additionally, in the analyses of the attention and predictivity of the language model, the Dynamic Fusion mechanism allows predictive language modeling that conforms to the appropriate grammatical structure. |
Tasks | Language Modelling, Machine Translation |
Published | 2019-09-11 |
URL | https://arxiv.org/abs/1909.04879v1 |
https://arxiv.org/pdf/1909.04879v1.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-fusion-attentional-language-model-for |
Repo | |
Framework | |
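A generic sketch of gated translation-model/language-model fusion, in which the language-model weight is predicted per time step from the decoder state. This illustrates the general mechanism of adaptive (rather than hand-crafted) weighting, not the paper's exact attentional architecture.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse translation-model and language-model predictions with a gate
    computed from the decoder state, so the LM weight varies per time step."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.gate = nn.Linear(hidden_dim, 1)

    def forward(self, dec_state, tm_logprobs, lm_logprobs):
        g = torch.sigmoid(self.gate(dec_state))      # (batch, 1), LM weight in [0, 1]
        probs = (1 - g) * tm_logprobs.exp() + g * lm_logprobs.exp()
        return probs.clamp_min(1e-12).log()          # back to log-probabilities
```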
LatentGNN: Learning Efficient Non-local Relations for Visual Recognition
Title | LatentGNN: Learning Efficient Non-local Relations for Visual Recognition |
Authors | Songyang Zhang, Shipeng Yan, Xuming He |
Abstract | Capturing long-range dependencies in feature representations is crucial for many visual recognition tasks. Despite recent successes of deep convolutional networks, it remains challenging to model non-local context relations between visual features. A promising strategy is to model the feature context by a fully-connected graph neural network (GNN), which augments traditional convolutional features with an estimated non-local context representation. However, most GNN-based approaches require computing a dense graph affinity matrix and hence have difficulty in scaling up to tackle complex real-world visual problems. In this work, we propose an efficient yet flexible non-local relation representation based on a novel class of graph neural networks. Our key idea is to introduce a latent space to reduce the complexity of the graph, which allows us to use a low-rank representation for the graph affinity matrix and to achieve a linear complexity in computation. Extensive experimental evaluations on three major visual recognition tasks show that our method outperforms prior works by a large margin while maintaining a low computation cost. |
Tasks | |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11634v1 |
https://arxiv.org/pdf/1905.11634v1.pdf | |
PWC | https://paperswithcode.com/paper/latentgnn-learning-efficient-non-local |
Repo | |
Framework | |
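The low-rank idea described in the abstract can be sketched directly: project the $n$ node features onto $k \ll n$ latent nodes, mix there, and project back, so the implied affinity matrix is low-rank and the cost is linear in $n$ rather than quadratic. The soft-assignment parameterization below is an assumption, not the paper's exact layer.

```python
import torch
import torch.nn as nn

class LatentPropagation(nn.Module):
    """Low-rank non-local layer: aggregate n node features into k << n latent
    nodes and broadcast back, at O(n*k) cost instead of the O(n^2) cost of a
    dense affinity matrix."""
    def __init__(self, dim, n_latent):
        super().__init__()
        self.to_latent = nn.Linear(dim, n_latent)        # soft assignment logits

    def forward(self, x):                                # x: (batch, n, dim)
        psi = torch.softmax(self.to_latent(x), dim=1)    # (batch, n, k); each latent
                                                         # node's weights sum to 1 over nodes
        latent = psi.transpose(1, 2) @ x                 # aggregate: (batch, k, dim)
        return x + psi @ latent                          # broadcast back, with residual
```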