January 29, 2020

Paper Group ANR 611

A Statistically Principled and Computationally Efficient Approach to Speech Enhancement using Variational Autoencoders. The impact of patient clinical information on automated skin cancer detection. Curiosity-Driven Recommendation Strategy for Adaptive Learning via Deep Reinforcement Learning. GAMMA: A General Agent Motion Prediction Model for Auto …

A Statistically Principled and Computationally Efficient Approach to Speech Enhancement using Variational Autoencoders


Title	A Statistically Principled and Computationally Efficient Approach to Speech Enhancement using Variational Autoencoders
Authors	Manuel Pariente, Antoine Deleforge, Emmanuel Vincent
Abstract	Recent studies have explored the use of deep generative models of speech spectra based on variational autoencoders (VAEs), combined with unsupervised noise models, to perform speech enhancement. These studies developed iterative algorithms involving either Gibbs sampling or gradient descent at each step, making them computationally expensive. This paper proposes a variational inference method to iteratively estimate the power spectrogram of the clean speech. Our main contribution is the analytical derivation of the variational steps in which the encoder of the pre-learned VAE can be used to estimate the variational approximation of the true posterior distribution, using the very same assumption made to train VAEs. Experiments show that the proposed method produces results on par with the aforementioned iterative methods using sampling, while decreasing the computational cost by a factor of 36 to reach a given performance.
Tasks	Speech Enhancement
Published	2019-05-03
URL	https://arxiv.org/abs/1905.01209v2
PDF	https://arxiv.org/pdf/1905.01209v2.pdf
PWC	https://paperswithcode.com/paper/a-statistically-principled-and
Repo
Framework

The impact of patient clinical information on automated skin cancer detection


Title	The impact of patient clinical information on automated skin cancer detection
Authors	Andre G. C. Pacheco, Renato A. Krohling
Abstract	Skin cancer is one of the most common types of cancer around the world. For this reason, over the past years, different approaches have been proposed to help detect it. Nonetheless, most of them are based only on dermoscopy images and do not take into account the patient clinical information. In this work, first, we present a new dataset that contains clinical images, acquired from smartphones, and patient clinical information of the skin lesions. Next, we introduce a straightforward approach to combine the clinical data and the images using different well-known deep learning models. These models are applied to the presented dataset using only the images and combining them with the patient clinical information. We present a comprehensive study to show the impact of the clinical data on the final predictions. The results obtained by combining both sets of information show a general improvement of around 7% in the balanced accuracy for all models. In addition, the statistical test indicates significant differences between the models that use both types of data and those that use images only. The improvement achieved shows the potential of using patient clinical information in skin cancer detection and indicates that this piece of information is important to leverage skin cancer detection systems.
Tasks
Published	2019-09-16
URL	https://arxiv.org/abs/1909.12912v1
PDF	https://arxiv.org/pdf/1909.12912v1.pdf
PWC	https://paperswithcode.com/paper/the-impact-of-patient-clinical-information-on
Repo
Framework
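
The abstract above reports its gains in balanced accuracy. As a reminder of what that metric measures, here is a minimal sketch that computes it as the mean of per-class recall. The labels and predictions are made up for illustration and are not taken from the paper's dataset.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes that occur in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced labels: 8 benign lesions and 2 malignant ones.
y_true = ["benign"] * 8 + ["malignant"] * 2
# The classifier misses one of the two malignant lesions.
y_pred = ["benign"] * 8 + ["benign", "malignant"]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.75
```

Plain accuracy looks high because the majority class dominates, while balanced accuracy weights each class equally, which is why it is a common choice for imbalanced medical datasets.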

Curiosity-Driven Recommendation Strategy for Adaptive Learning via Deep Reinforcement Learning


Title	Curiosity-Driven Recommendation Strategy for Adaptive Learning via Deep Reinforcement Learning
Authors	Ruijian Han, Kani Chen, Chunxi Tan
Abstract	The design of recommendation strategies in the adaptive learning system focuses on utilizing currently available information to provide individual-specific learning instructions for learners. As a critical motivation for human behavior, curiosity is essentially the drive to explore knowledge and seek information. In a psychologically inspired view, we aim to incorporate the element of curiosity for guiding learners to study spontaneously. In this paper, a curiosity-driven recommendation policy is proposed under the reinforcement learning framework, allowing for an efficient and enjoyable personalized learning mode. Given intrinsic rewards from a well-designed predictive model, we apply the actor-critic method to approximate the policy directly through neural networks. Numeric analyses with a large continuous knowledge state space and concrete learning scenarios are used to further demonstrate the power of the proposed method.
Tasks
Published	2019-10-12
URL	https://arxiv.org/abs/1910.12577v1
PDF	https://arxiv.org/pdf/1910.12577v1.pdf
PWC	https://paperswithcode.com/paper/curiosity-driven-recommendation-strategy-for
Repo
Framework

GAMMA: A General Agent Motion Prediction Model for Autonomous Driving


Title	GAMMA: A General Agent Motion Prediction Model for Autonomous Driving
Authors	Yuanfu Luo, Panpan Cai, David Hsu, Wee Sun Lee
Abstract	Autonomous driving in mixed traffic requires reliable motion prediction of nearby traffic agents such as pedestrians, bicycles, cars, buses, etc. This prediction problem is extremely challenging because of the diverse dynamics and geometry of traffic agents, complex road conditions, and intensive interactions among the agents. In this paper, we propose GAMMA, a general agent motion prediction model for autonomous driving, that can predict the motion of heterogeneous traffic agents with different kinematics, geometry, human agents’ inner states, etc. GAMMA formalizes motion prediction as geometric optimization in the velocity space, and integrates physical constraints and human inner states into this unified framework. Our results show that GAMMA outperforms state-of-the-art approaches significantly on diverse real-world datasets.
Tasks	Autonomous Driving, motion prediction
Published	2019-06-04
URL	https://arxiv.org/abs/1906.01566v3
PDF	https://arxiv.org/pdf/1906.01566v3.pdf
PWC	https://paperswithcode.com/paper/gamma-a-general-agent-motion-prediction-model
Repo
Framework

Visual Tactile Fusion Object Clustering


Title	Visual Tactile Fusion Object Clustering
Authors	Tao Zhang, Yang Cong, Gan Sun, Qianqian Wang, Zhenming Ding
Abstract	Object clustering, aiming at grouping similar objects into one cluster with an unsupervised strategy, has been extensively studied among various data-driven applications. However, most existing state-of-the-art object clustering methods (e.g., single-view or multi-view clustering methods) only explore visual information, while ignoring one of the most important sensing modalities, i.e., tactile information, which can help capture different object properties and further boost the performance of the object clustering task. To effectively benefit from both visual and tactile modalities for object clustering, in this paper, we propose a deep Auto-Encoder-like Non-negative Matrix Factorization framework for visual-tactile fusion clustering. Specifically, deep matrix factorization constrained by an under-complete Auto-Encoder-like architecture is employed to jointly learn hierarchical expression of visual-tactile fusion data, and preserve the local structure of data generating distribution of visual and tactile modalities. Meanwhile, a graph regularizer is introduced to capture the intrinsic relations of data samples within each modality. Furthermore, we propose a modality-level consensus regularizer to effectively align the visual and tactile data in a common subspace in which the gap between visual and tactile data is mitigated. For the model optimization, we present an efficient alternating minimization strategy to solve our proposed model. Finally, we conduct extensive experiments on public datasets to verify the effectiveness of our framework.
Tasks
Published	2019-11-21
URL	https://arxiv.org/abs/1911.09430v1
PDF	https://arxiv.org/pdf/1911.09430v1.pdf
PWC	https://paperswithcode.com/paper/visual-tactile-fusion-object-clustering
Repo
Framework

Proportionally Fair Clustering


Title	Proportionally Fair Clustering
Authors	Xingyu Chen, Brandon Fain, Charles Lyu, Kamesh Munagala
Abstract	We extend the fair machine learning literature by considering the problem of proportional centroid clustering in a metric context. For clustering $n$ points with $k$ centers, we define fairness as proportionality to mean that any $n/k$ points are entitled to form their own cluster if there is another center that is closer in distance for all $n/k$ points. We seek clustering solutions to which there are no such justified complaints from any subsets of agents, without assuming any a priori notion of protected subsets. We present and analyze algorithms to efficiently compute, optimize, and audit proportional solutions. We conclude with an empirical examination of the tradeoff between proportional solutions and the $k$-means objective.
Tasks
Published	2019-05-09
URL	https://arxiv.org/abs/1905.03674v2
PDF	https://arxiv.org/pdf/1905.03674v2.pdf
PWC	https://paperswithcode.com/paper/190503674
Repo
Framework
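
The proportionality condition in the abstract above can be checked directly for a given solution. The following is a minimal auditing sketch, not the authors' algorithm. It assumes that a "justified complaint" means at least ceil(n/k) points that are all strictly closer to some alternative center than to their nearest chosen center, and that alternative centers come from a finite candidate list. The function name, the 1-D data, and the candidate set are hypothetical.

```python
import math


def find_blocking_coalition(points, centers, candidates, dist):
    """Return (candidate, member_indices) for the first justified complaint, else None."""
    n, k = len(points), len(centers)
    quota = math.ceil(n / k)  # assumed minimum coalition size
    # Distance from each point to its nearest chosen center.
    current = [min(dist(p, c) for c in centers) for p in points]
    for y in candidates:
        # Points that would be strictly better off if y were opened as a center.
        coalition = [i for i, p in enumerate(points) if dist(p, y) < current[i]]
        if len(coalition) >= quota:
            return y, coalition
    return None


def abs_dist(a, b):
    return abs(a - b)


# Hypothetical 1-D instance: two well-separated groups of three points, k = 2.
points = [0, 1, 2, 10, 11, 12]

# One center per group: no group of 3 points can complain.
print(find_blocking_coalition(points, [1, 11], points, abs_dist))  # None

# Both centers in the left group: opening a center at 2 helps points 2, 10, 11, 12.
print(find_blocking_coalition(points, [0, 1], points, abs_dist))   # (2, [2, 3, 4, 5])
```

The check costs one pass over the points per candidate center. Restricting candidates to the data points themselves, as done here, is a simplification chosen for the example.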

Hotel2vec: Learning Attribute-Aware Hotel Embeddings with Self-Supervision


Title	Hotel2vec: Learning Attribute-Aware Hotel Embeddings with Self-Supervision
Authors	Ali Sadeghian, Shervin Minaee, Ioannis Partalas, Xinxin Li, Daisy Zhe Wang, Brooke Cowan
Abstract	We propose a neural network architecture for learning vector representations of hotels. Unlike previous works, which typically only use user click information for learning item embeddings, we propose a framework that combines several sources of data, including user clicks, hotel attributes (e.g., property type, star rating, average user rating), amenity information (e.g., the hotel has free Wi-Fi or free breakfast), and geographic information. During model training, a joint embedding is learned from all of the above information. We show that including structured attributes about hotels enables us to make better predictions in a downstream task than when we rely exclusively on click data. We train our embedding model on more than 40 million user click sessions from a leading online travel platform and learn embeddings for more than one million hotels. Our final learned embeddings integrate distinct sub-embeddings for user clicks, hotel attributes, and geographic information, providing an interpretable representation that can be used flexibly depending on the application. We show empirically that our model generates high-quality representations that boost the performance of a hotel recommendation system in addition to other applications. An important advantage of the proposed neural model is that it addresses the cold-start problem for hotels with insufficient historical click information by incorporating additional hotel attributes which are available for all hotels.
Tasks
Published	2019-09-30
URL	https://arxiv.org/abs/1910.03943v1
PDF	https://arxiv.org/pdf/1910.03943v1.pdf
PWC	https://paperswithcode.com/paper/hotel2vec-learning-attribute-aware-hotel
Repo
Framework

Dividing and Conquering Cross-Modal Recipe Retrieval: from Nearest Neighbours Baselines to SoTA


Title	Dividing and Conquering Cross-Modal Recipe Retrieval: from Nearest Neighbours Baselines to SoTA
Authors	Mikhail Fain, Andrey Ponikar, Ryan Fox, Danushka Bollegala
Abstract	We propose a novel non-parametric method for cross-modal retrieval which is applied on top of precomputed image and text embeddings. By combining our method with standard approaches for building image and text encoders, trained independently with a self-supervised classification objective, we create a baseline model which outperforms most existing methods on a challenging image-to-recipe task. We also use our method for comparing image and text encoders trained using different modern approaches, thus addressing the issues hindering the developments of novel methods for cross-modal recipe retrieval. We demonstrate how to use the insights from model comparison and extend our baseline model with standard triplet loss that improves SoTA on the Recipe1M dataset by a large margin, while using only precomputed features and with much less complexity than existing methods.
Tasks	Cross-Modal Retrieval
Published	2019-11-28
URL	https://arxiv.org/abs/1911.12763v1
PDF	https://arxiv.org/pdf/1911.12763v1.pdf
PWC	https://paperswithcode.com/paper/dividing-and-conquering-cross-modal-recipe
Repo
Framework
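
To make the retrieval setting in the abstract above concrete, here is a minimal sketch of plain cosine-similarity nearest-neighbour ranking over precomputed embeddings, scored by median rank. This is a generic baseline for illustration, not the non-parametric method the paper proposes. The tiny 2-D embeddings are invented, and item i of the image list is assumed to match item i of the recipe list.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_of_match(query, gallery, true_index):
    """1-based rank of the true match among gallery items sorted by similarity."""
    sims = [cosine(query, g) for g in gallery]
    # Rank is one plus the number of items scoring strictly higher than the match.
    return 1 + sum(s > sims[true_index] for s in sims)


def median_rank(image_embs, recipe_embs):
    """Median rank of the matching recipe over all image queries."""
    ranks = sorted(rank_of_match(img, recipe_embs, i) for i, img in enumerate(image_embs))
    mid = len(ranks) // 2
    return ranks[mid] if len(ranks) % 2 else (ranks[mid - 1] + ranks[mid]) / 2


# Hypothetical precomputed embeddings (real ones would be high-dimensional).
image_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
recipe_embs = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.1]]

print(median_rank(image_embs, recipe_embs))  # 2
```

Because the encoders are frozen and only the ranking step runs at query time, a baseline like this is cheap to evaluate when comparing different image and text encoders.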

Dynamic Control of Stochastic Evolution: A Deep Reinforcement Learning Approach to Adaptively Targeting Emergent Drug Resistance


Title	Dynamic Control of Stochastic Evolution: A Deep Reinforcement Learning Approach to Adaptively Targeting Emergent Drug Resistance
Authors	Dalit Engelhardt
Abstract	The challenge in controlling stochastic systems in which random events can set the system on catastrophic trajectories is to develop a robust ability to respond to such events without significantly compromising the optimality of the baseline control policy. Drug resistance can emerge from random and variable mutations in targeted cell populations; in the absence of an appropriate dosing policy, emergent resistant subpopulations can proliferate and lead to treatment failure. Dynamic feedback dosage control holds promise in combatting this phenomenon, but cell population evolutionary dynamics are complex, stochastic, and often high-dimensional, posing significant challenges to system control. This paper presents CelluDose, a deep reinforcement learning closed-loop dynamic control prototype for automated precision drug dosing targeting stochastic and heterogeneous cell proliferation. Developing optimal dosing schedules for preventing therapy-induced drug resistance involves a tradeoff between the effective suppression of emergent resistant cell subpopulations and the use of conservative dosages and a preference for first-line drugs. CelluDose is trained on model simulations of cell population evolutionary dynamics that combine a system of stochastic differential equations and the additional occurrence of random perturbing events. Both the single-drug and combination therapy policies obtained in training exhibit a 100% success rate at suppressing simulated heterogeneous harmful cell growth and responding to diverse system fluctuations and perturbations within the allotted time and using conservative dosing. The policies obtained were found to be highly robust to model parameter changes and fluctuations not introduced during training.
Tasks
Published	2019-03-27
URL	http://arxiv.org/abs/1903.11373v1
PDF	http://arxiv.org/pdf/1903.11373v1.pdf
PWC	https://paperswithcode.com/paper/dynamic-control-of-stochastic-evolution-a
Repo
Framework

Atlas Based Segmentations via Semi-Supervised Diffeomorphic Registrations


Title	Atlas Based Segmentations via Semi-Supervised Diffeomorphic Registrations
Authors	Charles Huang, Masoud Badiei, Hyunseok Seo, Ming Ma, Xiaokun Liang, Dante Capaldi, Michael Gensheimer, Lei Xing
Abstract	Purpose: Segmentation of organs-at-risk (OARs) is a bottleneck in current radiation oncology pipelines and is often time consuming and labor intensive. In this paper, we propose an atlas-based semi-supervised registration algorithm to generate accurate segmentations of OARs for which there are ground truth contours and rough segmentations of all other OARs in the atlas. To the best of our knowledge, this is the first study to use learning-based registration methods for the segmentation of head and neck patients and demonstrate its utility in clinical applications. Methods: Our algorithm cascades rigid and deformable deformation blocks, and takes an atlas image (M), a set of atlas-space segmentations (S_A), and a patient image (F) as inputs, while outputting patient-space segmentations of all OARs defined on the atlas. We train our model on 475 CT images taken from public archives and Stanford RadOnc Clinic (SROC), validate on 5 CT images from SROC, and test our model on 20 CT images from SROC. Results: Our method outperforms current state-of-the-art learning-based registration algorithms and achieves an overall Dice score of 0.789 on our test set. Moreover, our method yields a performance comparable to manual segmentation and supervised segmentation, while solving a much more complex registration problem. Whereas supervised segmentation methods only automate the segmentation process for a select few OARs, we demonstrate that our methods can achieve similar performance for OARs of interest, while also providing segmentations for every other OAR on the provided atlas. Conclusions: Our proposed algorithm has significant clinical applications and could help reduce the bottleneck for segmentation of head and neck OARs. Further, our results demonstrate that semi-supervised diffeomorphic registration can be accurately applied to both registration and segmentation problems.
Tasks
Published	2019-11-23
URL	https://arxiv.org/abs/1911.10417v1
PDF	https://arxiv.org/pdf/1911.10417v1.pdf
PWC	https://paperswithcode.com/paper/atlas-based-segmentations-via-semi-supervised
Repo
Framework

Minimal Images in Deep Neural Networks: Fragile Object Recognition in Natural Images


Title	Minimal Images in Deep Neural Networks: Fragile Object Recognition in Natural Images
Authors	Sanjana Srivastava, Guy Ben-Yosef, Xavier Boix
Abstract	The human ability to recognize objects is impaired when the object is not shown in full. “Minimal images” are the smallest regions of an image that remain recognizable for humans. Ullman et al. (2016) show that a slight modification of the location and size of the visible region of the minimal image produces a sharp drop in human recognition accuracy. In this paper, we demonstrate that such drops in accuracy due to changes of the visible region are a common phenomenon between humans and existing state-of-the-art deep neural networks (DNNs), and are much more prominent in DNNs. We found many cases where DNNs classified one region correctly and the other incorrectly, though they only differed by one row or column of pixels, and were often bigger than the average human minimal image size. We show that this phenomenon is independent of previous works that have reported lack of invariance to minor modifications in object location in DNNs. Our results thus reveal a new failure mode of DNNs that also affects humans to a much lesser degree. They expose how fragile DNN recognition ability is for natural images even without adversarial patterns being introduced. Bringing the robustness of DNNs in natural images to the human level remains an open challenge for the community.
Tasks	Object Recognition
Published	2019-02-08
URL	http://arxiv.org/abs/1902.03227v1
PDF	http://arxiv.org/pdf/1902.03227v1.pdf
PWC	https://paperswithcode.com/paper/minimal-images-in-deep-neural-networks
Repo
Framework

Fair in the Eyes of Others


Title	Fair in the Eyes of Others
Authors	Parham Shams, Aurélie Beynier, Sylvain Bouveret, Nicolas Maudet
Abstract	Envy-freeness is a widely studied notion in resource allocation, capturing some aspects of fairness. The notion of envy being inherently subjective though, it might be the case that an agent envies another agent, but that she objectively has no reason to do so. The difficulty here is to define the notion of objectivity, since no ground-truth can properly serve as a basis of this definition. A natural approach is to consider the judgement of the other agents as a proxy for objectivity. Building on previous work by Parijs (who introduced “unanimous envy”), we propose the notion of approval envy: an agent $a_i$ experiences approval envy towards $a_j$ if she is envious of $a_j$, and sufficiently many agents agree that this should be the case, from their own perspectives. Some interesting properties of this notion are put forward. Computing the minimal threshold guaranteeing approval envy clearly inherits well-known intractability results from envy-freeness, but (i) we identify some tractable cases such as house allocation; and (ii) we provide a general method based on a mixed integer programming encoding of the problem, which proves to be efficient in practice. This allows us in particular to show experimentally that existence of such allocations, with a rather small threshold, is very often observed.
Tasks
Published	2019-11-25
URL	https://arxiv.org/abs/1911.11053v1
PDF	https://arxiv.org/pdf/1911.11053v1.pdf
PWC	https://paperswithcode.com/paper/fair-in-the-eyes-of-others
Repo
Framework
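
The definition of approval envy in the abstract above lends itself to a direct check. The following is a minimal sketch under stated assumptions: utilities are additive, "agreeing" means an agent values $a_j$'s bundle strictly more than $a_i$'s bundle under her own utilities, and the envious agent counts herself among the approvers. The paper's exact counting convention may differ, and all names and numbers here are hypothetical.

```python
def bundle_value(utilities, agent, bundle):
    """Additive value that `agent` assigns to a bundle of object indices."""
    return sum(utilities[agent][o] for o in bundle)


def approval_count(utilities, allocation, i, j):
    """Number of agents who, by their own utilities, prefer j's bundle to i's."""
    return sum(
        1
        for k in range(len(utilities))
        if bundle_value(utilities, k, allocation[j]) > bundle_value(utilities, k, allocation[i])
    )


def approval_envies(utilities, allocation, i, j, threshold):
    """True if i envies j and at least `threshold` agents approve of that envy."""
    envious = bundle_value(utilities, i, allocation[j]) > bundle_value(utilities, i, allocation[i])
    return envious and approval_count(utilities, allocation, i, j) >= threshold


# Hypothetical house-allocation instance: 3 agents, 3 objects, one object each.
utilities = [
    [1, 5, 3],  # agent 0's value for objects 0, 1, 2
    [2, 6, 1],  # agent 1
    [4, 3, 2],  # agent 2
]
allocation = [[0], [1], [2]]

print(approval_envies(utilities, allocation, 0, 1, threshold=2))  # True
print(approval_envies(utilities, allocation, 0, 1, threshold=3))  # False
```

Agent 0 envies agent 1, and agents 0 and 1 both rank object 1 above object 0, while agent 2 does not. The envy is therefore approved at threshold 2 but not at threshold 3, which illustrates how raising the threshold makes approval-envy-freeness easier to satisfy.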

Can Bio-Inspired Swarm Algorithms Scale to Modern Societal Problems


Title	Can Bio-Inspired Swarm Algorithms Scale to Modern Societal Problems
Authors	Darren M. Chitty, Elizabeth Wanner, Rakhi Parmar, Peter R. Lewis
Abstract	Taking inspiration from nature for meta-heuristics has proven popular and relatively successful. Many are inspired by the collective intelligence exhibited by insects, fish and birds. However, there is a question over their scalability to the types of complex problems experienced in the modern world. Natural systems evolved to solve simpler problems effectively; replicating these processes for complex problems may suffer from inefficiencies. Several causal factors can impact scalability: computational complexity, memory requirements or pure problem intractability. Supporting evidence is provided using a case study in Ant Colony Optimisation (ACO) regarding tackling increasingly complex real-world fleet optimisation problems. This paper hypothesizes that, contrary to common intuition, bio-inspired collective intelligence techniques by their very nature exhibit poor scalability in cases of high dimensionality when large degrees of decision making are required. Facilitating scaling of bio-inspired algorithms necessitates reducing this decision making. To support this hypothesis, an enhanced Partial-ACO technique is presented which effectively reduces ant decision making. Reducing the decision making required by ants by up to 90% results in markedly improved effectiveness and reduced runtimes for increasingly complex fleet optimisation problems. Reductions in traversal timings of 40-50% are achieved for problems with up to 45 vehicles and 437 jobs.
Tasks	Decision Making
Published	2019-05-20
URL	https://arxiv.org/abs/1905.08126v1
PDF	https://arxiv.org/pdf/1905.08126v1.pdf
PWC	https://paperswithcode.com/paper/can-bio-inspired-swarm-algorithms-scale-to
Repo
Framework

Beyond Personalization: Research Directions in Multistakeholder Recommendation


Title	Beyond Personalization: Research Directions in Multistakeholder Recommendation
Authors	Himan Abdollahpouri, Gediminas Adomavicius, Robin Burke, Ido Guy, Dietmar Jannach, Toshihiro Kamishima, Jan Krasnodebski, Luiz Pizzato
Abstract	Recommender systems are personalized information access applications; they are ubiquitous in today’s online environment, and effective at finding items that meet user needs and tastes. As the reach of recommender systems has extended, it has become apparent that the single-minded focus on the user common to academic research has obscured other important aspects of recommendation outcomes. Properties such as fairness, balance, profitability, and reciprocity are not captured by typical metrics for recommender system evaluation. The concept of multistakeholder recommendation has emerged as a unifying framework for describing and understanding recommendation settings where the end user is not the sole focus. This article describes the origins of multistakeholder recommendation, and the landscape of system designs. It provides illustrative examples of current research and outlines open questions and research directions for the field.
Tasks	Recommendation Systems
Published	2019-05-01
URL	https://arxiv.org/abs/1905.01986v2
PDF	https://arxiv.org/pdf/1905.01986v2.pdf
PWC	https://paperswithcode.com/paper/beyond-personalization-research-directions-in
Repo
Framework

Deep Reinforcement Learning with Modulated Hebbian plus Q Network Architecture


Title	Deep Reinforcement Learning with Modulated Hebbian plus Q Network Architecture
Authors	Pawel Ladosz, Eseoghene Ben-Iwhiwhu, Jeffrey Dick, Yang Hu, Nicholas Ketz, Soheil Kolouri, Jeffrey L. Krichmar, Praveen Pilly, Andrea Soltoggio
Abstract	This paper presents a new neural architecture that combines a modulated Hebbian network (MOHN) with DQN, which we call modulated Hebbian plus Q network architecture (MOHQA). The hypothesis is that such a combination allows MOHQA to solve difficult partially observable Markov decision process (POMDP) problems which impair temporal difference (TD)-based RL algorithms such as DQN, as the TD error cannot be easily derived from observations. The key idea is to use a Hebbian network with bio-inspired neural traces in order to bridge temporal delays between actions and rewards when confounding observations and sparse rewards result in inaccurate TD errors. In MOHQA, DQN learns low level features and control, while the MOHN contributes to the high-level decisions by associating rewards with past states and actions. Thus the proposed architecture combines two modules with significantly different learning algorithms, a Hebbian associative network and a classical DQN pipeline, exploiting the advantages of both. Simulations on a set of POMDPs and on the MALMO environment show that the proposed algorithm improved DQN’s results and even outperformed control tests with A2C, QRDQN+LSTM and REINFORCE algorithms on some POMDPs with confounding stimuli and sparse rewards.
Tasks	Decision Making
Published	2019-09-21
URL	https://arxiv.org/abs/1909.09902v2
PDF	https://arxiv.org/pdf/1909.09902v2.pdf
PWC	https://paperswithcode.com/paper/190909902
Repo
Framework