January 26, 2020

3147 words 15 mins read

Paper Group ANR 1535



Tuned Inception V3 for Recognizing States of Cooking Ingredients

Title Tuned Inception V3 for Recognizing States of Cooking Ingredients
Authors Kin Ng
Abstract Cooking is a task that must be performed on a daily basis, and thus it is an activity that many people take for granted. For humans, preparing a meal comes naturally, but for robots even preparing a simple sandwich is an extremely difficult task. In robotics, designing kitchen robots is complicated, since cooking relies on a variety of physical interactions that depend on different conditions, such as changes in the environment, proper execution of sequential instructions and motions, and detection of the different states that cooking ingredients can be in for their correct grasping and manipulation. In this paper, we focus on the challenge of state recognition and propose a fine-tuned convolutional neural network that makes use of transfer learning by reusing the Inception V3 pre-trained model. The model is trained and validated on a cooking dataset consisting of eleven states (e.g. peeled, diced, whole). The work presented in this paper could provide insight into finding a potential solution to the problem.
Tasks Transfer Learning
Published 2019-05-05
URL https://arxiv.org/abs/1905.03715v1
PDF https://arxiv.org/pdf/1905.03715v1.pdf
PWC https://paperswithcode.com/paper/190503715
Repo
Framework

Parallel Total Variation Distance Estimation with Neural Networks for Merging Over-Clusterings

Title Parallel Total Variation Distance Estimation with Neural Networks for Merging Over-Clusterings
Authors Christian Reiser, Jörg Schlötterer, Michael Granitzer
Abstract We consider the initial situation where a dataset has been over-partitioned into $k$ clusters and seek a domain-independent way to merge those initial clusters. We identify the total variation distance (TVD) as suitable for this goal. By exploiting the relation of the TVD to the Bayes accuracy, we show how neural networks can be used to estimate TVDs between all pairs of clusters in parallel. Crucially, the needed memory space is decreased by reducing the required number of output neurons from $k^2$ to $k$. On realistically obtained over-clusterings of ImageNet subsets, we demonstrate that our TVD estimates lead to better merge decisions than those obtained by relying on state-of-the-art unsupervised representations. Further, the generality of the approach is verified by evaluating it on a point cloud dataset.
Tasks
Published 2019-12-09
URL https://arxiv.org/abs/1912.04022v1
PDF https://arxiv.org/pdf/1912.04022v1.pdf
PWC https://paperswithcode.com/paper/parallel-total-variation-distance-estimation
Repo
Framework
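The merging criterion above rests on a classical identity between the total variation distance and the accuracy of a Bayes-optimal classifier on a balanced mixture. A minimal sketch (illustrative distributions, not from the paper; the paper estimates this with neural networks rather than in closed form):

```python
# Sketch of the identity the method exploits, on two discrete distributions:
# for a balanced mixture of P and Q, the Bayes-optimal classifier's accuracy
# satisfies acc* = (1 + TVD(P, Q)) / 2, so training a classifier to separate
# two clusters yields an estimate of the TVD between them.

def tvd(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def bayes_accuracy(p, q):
    """Accuracy of the Bayes-optimal classifier on a balanced P/Q mixture:
    for each outcome, predict the distribution with the higher mass."""
    return 0.5 * sum(max(pi, qi) for pi, qi in zip(p, q))

p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]

# The identity holds exactly for the Bayes-optimal classifier.
assert abs(bayes_accuracy(p, q) - (1 + tvd(p, q)) / 2) < 1e-12
```

A low estimated TVD between two clusters then signals that they likely came from the same underlying class and can be merged.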

Purifying Real Images with an Attention-guided Style Transfer Network for Gaze Estimation

Title Purifying Real Images with an Attention-guided Style Transfer Network for Gaze Estimation
Authors Yuxiao Yan, Yang Yan, Jinjia Peng, Huibing Wang, Xianping Fu
Abstract Recently, progress in learning-by-synthesis has provided training models based on synthetic images, which can effectively reduce the cost of human and material resources. However, due to the different distribution of synthetic images compared to real images, the desired performance cannot be achieved. Real images contain multiple light orientations, while synthetic images contain a uniform light orientation; these features are characteristic of outdoor and indoor scenes, respectively. To solve this problem, previous methods learned a model to improve the realism of synthetic images. In contrast, this paper tries to purify real images by extracting discriminative and robust features to convert outdoor real images into indoor synthetic images. We first introduce segmentation masks to construct RGB-mask pairs as inputs, then design an attention-guided style transfer network that learns style features separately from the attention and background regions, and learns content features from the full and attention regions. Moreover, we propose a novel region-level task-guided loss to constrain the features learnt from style and content. Experiments were performed using mixed (qualitative and quantitative) methods to demonstrate the possibility of purifying real images in complex directions. We evaluate the proposed method on three public datasets, including LPW, COCO and MPIIGaze. Extensive experimental results show that the proposed method is effective and achieves state-of-the-art results.
Tasks Gaze Estimation, Style Transfer
Published 2019-07-10
URL https://arxiv.org/abs/2002.06145v1
PDF https://arxiv.org/pdf/2002.06145v1.pdf
PWC https://paperswithcode.com/paper/purifying-real-images-with-an-attention
Repo
Framework

Feature Detection and Attenuation in Embeddings

Title Feature Detection and Attenuation in Embeddings
Authors Yuwei Wang, Yan Zheng, Yanqing Peng, Wei Zhang, Feifei Li
Abstract Embedding is one of the fundamental building blocks for data analysis tasks. Although most embedding schemes are designed to be domain-specific, they have recently been extended to represent various other research domains. However, there are relatively few discussions on analyzing these generated embeddings and removing undesired features from them. In this paper, we first propose an innovative embedding analysis method that quantitatively measures the features in the embedding data. We then propose an unsupervised method to remove or alleviate undesired features in the embedding by applying a Domain Adversarial Network (DAN). Our empirical results demonstrate that the proposed algorithm performs well on both industry and natural language processing benchmark datasets.
Tasks
Published 2019-10-13
URL https://arxiv.org/abs/1910.05862v1
PDF https://arxiv.org/pdf/1910.05862v1.pdf
PWC https://paperswithcode.com/paper/feature-detection-and-attenuation-in
Repo
Framework
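The paper's adversarial approach trains a DAN so the embedding stops predicting the undesired attribute. A much simpler linear analogue of that idea (not the paper's method; direction and vectors below are made up) is to project each embedding onto the orthogonal complement of a known undesired-feature direction:

```python
# Simplified linear analogue of feature attenuation (not the paper's DAN):
# if the undesired feature corresponds to a direction d in embedding space,
# it can be removed by projecting every embedding orthogonally to d.

def project_out(v, d):
    """Remove the component of v along direction d."""
    dd = sum(di * di for di in d)
    coef = sum(vi * di for vi, di in zip(v, d)) / dd
    return [vi - coef * di for vi, di in zip(v, d)]

d = [1.0, 0.0, 0.0]       # undesired feature direction (illustrative)
v = [3.0, 2.0, -1.0]      # an embedding vector (illustrative)

cleaned = project_out(v, d)
# The cleaned vector carries no component along d.
assert abs(sum(c * di for c, di in zip(cleaned, d))) < 1e-12
```

The adversarial formulation generalizes this to nonlinear, unknown feature directions by letting a discriminator discover them during training.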

Deep Reference Generation with Multi-Domain Hierarchical Constraints for Inter Prediction

Title Deep Reference Generation with Multi-Domain Hierarchical Constraints for Inter Prediction
Authors Jiaying Liu, Sifeng Xia, Wenhan Yang
Abstract Inter prediction is an important module in video coding for temporal redundancy removal, where similar reference blocks are searched from previously coded frames and employed to predict the block to be coded. Although traditional video codecs can estimate and compensate for block-level motions, their inter prediction performance is still heavily affected by the remaining inconsistent pixel-wise displacement caused by irregular rotation and deformation. In this paper, we address the problem by proposing a deep frame interpolation network to generate additional reference frames in coding scenarios. First, we summarize the previous adaptive convolutions used for frame interpolation and propose a factorized kernel convolutional network to improve the modeling capacity while keeping its compact form. Second, to better train this network, multi-domain hierarchical constraints are introduced to regularize the training of our factorized kernel convolutional network. For the spatial domain, we use a gradually down-sampled and up-sampled auto-encoder to generate the factorized kernels for frame interpolation at different scales. For the quality domain, considering the inconsistent quality of the input frames, the factorized kernel convolution is modulated with quality-related features to learn to exploit more information from high-quality frames. For the frequency domain, a sum of absolute transformed difference loss that performs frequency transformation is utilized to facilitate network optimization from the view of coding performance. With the well-designed frame interpolation network regularized by multi-domain hierarchical constraints, our method surpasses HEVC with an average 6.1% BD-rate saving, and up to 11.0% BD-rate saving for the luma component, under the random access configuration.
Tasks
Published 2019-05-16
URL https://arxiv.org/abs/1905.06567v1
PDF https://arxiv.org/pdf/1905.06567v1.pdf
PWC https://paperswithcode.com/paper/deep-reference-generation-with-multi-domain
Repo
Framework
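The frequency-domain constraint mentioned above is a sum of absolute transformed differences (SATD). A minimal 1-D sketch of what such a loss computes (toy values; video codecs typically apply this to 2-D pixel blocks):

```python
# Illustrative 1-D SATD: apply a Hadamard transform to the residual between
# prediction and target, then sum absolute transform coefficients, so the
# error is penalized in the frequency domain as in codec cost functions.

def hadamard4(x):
    """4-point Hadamard transform via a butterfly (unnormalized)."""
    a, b, c, d = x
    s0, s1, d0, d1 = a + b, c + d, a - b, c - d
    return [s0 + s1, s0 - s1, d0 + d1, d0 - d1]

def satd(pred, target):
    residual = [p - t for p, t in zip(pred, target)]
    return sum(abs(coef) for coef in hadamard4(residual))

pred, target = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]
assert satd(pred, target) == 0.0              # identical blocks: zero loss
assert satd([2.0, 2.0, 3.0, 4.0], target) > 0.0
```

Using SATD rather than a plain pixel-wise loss aligns the training objective with the transform-based rate-distortion costs used inside the codec.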

Mobility restores the mechanism which supports cooperation in the voluntary prisoner’s dilemma game

Title Mobility restores the mechanism which supports cooperation in the voluntary prisoner’s dilemma game
Authors Marcos Cardinot, Colm O’Riordan, Josephine Griffith, Attila Szolnoki
Abstract It is generally believed that in a situation where individual and collective interests are in conflict, the availability of optional participation is a key mechanism to maintain cooperation. Surprisingly, this effect is sensitive to the choice of microscopic dynamics and can easily break down when agents make a fully rational decision during their strategy updates. In the framework of the celebrated prisoner's dilemma game, we show that this discrepancy can be fixed automatically if we relax the strict, and frequently artificial, condition of a fully occupied interaction graph, and allow agents to change not just their strategies but also their positions according to their success. In this way, a diluted graph where agents may move offers a natural alternative way to handle artifacts arising from the application of specific and sometimes awkward microscopic rules.
Tasks
Published 2019-07-11
URL https://arxiv.org/abs/1907.05482v1
PDF https://arxiv.org/pdf/1907.05482v1.pdf
PWC https://paperswithcode.com/paper/mobility-restores-the-mechanism-which
Repo
Framework

Privacy-Preserving Generalized Linear Models using Distributed Block Coordinate Descent

Title Privacy-Preserving Generalized Linear Models using Distributed Block Coordinate Descent
Authors Erik-Jan van Kesteren, Chang Sun, Daniel L. Oberski, Michel Dumontier, Lianne Ippel
Abstract Combining data from varied sources has considerable potential for knowledge discovery: collaborating data parties can mine data in an expanded feature space, allowing them to explore a larger range of scientific questions. However, data sharing among different parties is highly restricted by legal conditions, ethical concerns, and/or data volume. Fueled by these concerns, the fields of cryptography and distributed learning have made great progress towards privacy-preserving and distributed data mining. However, practical implementations have been hampered by the limited scope or computational complexity of these methods. In this paper, we greatly extend the range of analyses available for vertically partitioned data, i.e., data collected by separate parties with different features on the same subjects. To this end, we present a novel approach for privacy-preserving generalized linear models, a fundamental and powerful framework underlying many prediction and classification procedures. We base our method on a distributed block coordinate descent algorithm to obtain parameter estimates, and we develop an extension to compute accurate standard errors without additional communication cost. We critically evaluate the information transfer for semi-honest collaborators and show that our protocol is secure against data reconstruction. Through both simulated and real-world examples we illustrate the functionality of our proposed algorithm. Without leaking information, our method performs as well on vertically partitioned data as existing methods on combined data – all within mere minutes of computation time. We conclude that our method is a viable approach for vertically partitioned data analysis with a wide range of real-world applications.
Tasks
Published 2019-11-08
URL https://arxiv.org/abs/1911.03183v1
PDF https://arxiv.org/pdf/1911.03183v1.pdf
PWC https://paperswithcode.com/paper/privacy-preserving-generalized-linear-models
Repo
Framework
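The block coordinate descent skeleton underlying such protocols can be sketched in a few lines. This is a minimal illustration under strong simplifying assumptions (two parties, one feature each, squared-error loss, toy data), not the paper's secure protocol: no encryption, generalized link functions, or standard-error computation is shown.

```python
# Vertically partitioned block coordinate descent sketch: each party updates
# only its own coefficient and shares only its partial linear predictor
# x_j * beta_j, never its raw feature column.

def partial_update(x, residual):
    """Closed-form least-squares update for a single-feature block."""
    return sum(xi * ri for xi, ri in zip(x, residual)) / sum(xi * xi for xi in x)

x1 = [1.0, 2.0, 3.0, 4.0]   # party A's feature (toy data)
x2 = [1.0, 0.0, 1.0, 0.0]   # party B's feature (toy data)
y  = [3.0, 4.0, 7.0, 8.0]   # outcome: y = 2*x1 + 1*x2 by construction

b1 = b2 = 0.0
for _ in range(50):
    # Party A sees only B's partial predictor b2*x2, not x2 itself.
    r1 = [yi - b2 * x2i for yi, x2i in zip(y, x2)]
    b1 = partial_update(x1, r1)
    r2 = [yi - b1 * x1i for yi, x1i in zip(y, x1)]
    b2 = partial_update(x2, r2)

assert abs(b1 - 2.0) < 1e-6 and abs(b2 - 1.0) < 1e-6
```

The alternating updates converge to the joint least-squares solution even though neither party ever observes the other's feature column.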

Offset Calibration for Appearance-Based Gaze Estimation via Gaze Decomposition

Title Offset Calibration for Appearance-Based Gaze Estimation via Gaze Decomposition
Authors Zhaokang Chen, Bertram E. Shi
Abstract Appearance-based gaze estimation provides relatively unconstrained gaze tracking. However, subject-independent models achieve limited accuracy partly due to individual variations. To improve estimation, we propose a novel gaze decomposition method and a single gaze point calibration method, motivated by our finding that the inter-subject squared bias exceeds the intra-subject variance for a subject-independent estimator. We decompose the gaze angle into a subject-dependent bias term and a subject-independent term between the gaze angle and the bias. The subject-independent term is estimated by a deep convolutional network. For calibration-free tracking, we set the subject-dependent bias term to zero. For single gaze point calibration, we estimate the bias from a few images taken as the subject gazes at a point. Experiments on three datasets indicate that as a calibration-free estimator, the proposed method outperforms the state-of-the-art methods by up to 10.0%. The proposed calibration method is robust and reduces estimation error significantly (up to 35.6%), achieving state-of-the-art performance for appearance-based eye trackers with calibration.
Tasks Calibration, Gaze Estimation
Published 2019-05-11
URL https://arxiv.org/abs/1905.04451v2
PDF https://arxiv.org/pdf/1905.04451v2.pdf
PWC https://paperswithcode.com/paper/appearance-based-gaze-estimation-via-gaze
Repo
Framework
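The single-point calibration step described above reduces to estimating a constant per-subject offset. A minimal sketch with made-up numbers (the paper's subject-independent term comes from a deep CNN, elided here):

```python
# Sketch of single-gaze-point calibration: the subject-independent
# estimator's outputs carry a per-subject bias, estimated as the mean error
# over a few calibration frames and subtracted from later predictions.

def estimate_bias(predictions, ground_truth):
    """Per-subject bias: mean (prediction - truth) over calibration frames."""
    n = len(predictions)
    return [sum(p[k] - g[k] for p, g in zip(predictions, ground_truth)) / n
            for k in range(2)]                     # (yaw, pitch) in degrees

# A few frames of the subject gazing at one known calibration point.
preds = [[5.2, -3.1], [5.0, -2.9], [4.8, -3.0]]   # estimator outputs (toy)
truth = [[3.0, -1.0], [3.0, -1.0], [3.0, -1.0]]   # known gaze target

bias = estimate_bias(preds, truth)                 # -> [2.0, -2.0]
corrected = [p - b for p, b in zip([5.1, -3.0], bias)]
assert abs(corrected[0] - 3.1) < 1e-9 and abs(corrected[1] + 1.0) < 1e-9
```

Because only a constant offset is fitted, a handful of frames at a single gaze target suffices, which is what makes one-point calibration practical.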

FastSpeech: Fast, Robust and Controllable Text to Speech

Title FastSpeech: Fast, Robust and Controllable Text to Speech
Authors Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu
Abstract Neural network based end-to-end text to speech (TTS) has significantly improved the quality of synthesized speech. Prominent methods (e.g., Tacotron 2) usually first generate a mel-spectrogram from text, and then synthesize speech from the mel-spectrogram using a vocoder such as WaveNet. Compared with traditional concatenative and statistical parametric approaches, neural network based end-to-end models suffer from slow inference speed, and the synthesized speech is usually not robust (i.e., some words are skipped or repeated) and lacks controllability (voice speed or prosody control). In this work, we propose a novel feed-forward network based on Transformer to generate mel-spectrograms in parallel for TTS. Specifically, we extract attention alignments from an encoder-decoder based teacher model for phoneme duration prediction, which is used by a length regulator to expand the source phoneme sequence to match the length of the target mel-spectrogram sequence for parallel mel-spectrogram generation. Experiments on the LJSpeech dataset show that our parallel model matches autoregressive models in terms of speech quality, nearly eliminates the problem of word skipping and repeating in particularly hard cases, and can adjust voice speed smoothly. Most importantly, compared with autoregressive Transformer TTS, our model speeds up mel-spectrogram generation by 270x and end-to-end speech synthesis by 38x. Therefore, we call our model FastSpeech.
Tasks Speech Synthesis, Text-To-Speech Synthesis
Published 2019-05-22
URL https://arxiv.org/abs/1905.09263v5
PDF https://arxiv.org/pdf/1905.09263v5.pdf
PWC https://paperswithcode.com/paper/fastspeech-fast-robust-and-controllable-text
Repo
Framework
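The length regulator described in the abstract can be sketched in a few lines. This is a toy illustration (phoneme states are strings here rather than the model's hidden vectors; durations and phonemes are made up):

```python
# Sketch of FastSpeech's length regulator: each phoneme's hidden state is
# repeated according to its predicted duration, and a speed factor alpha
# rescales the durations to control voice speed.

def length_regulate(states, durations, alpha=1.0):
    """Expand phoneme states to mel-spectrogram length by repetition."""
    out = []
    for state, dur in zip(states, durations):
        out.extend([state] * max(1, round(dur * alpha)))
    return out

states = ["HH", "AH", "L", "OW"]
durations = [2, 3, 1, 4]              # predicted mel frames per phoneme

assert length_regulate(states, durations) == \
    ["HH", "HH", "AH", "AH", "AH", "L", "OW", "OW", "OW", "OW"]
# alpha < 1 shrinks the durations, producing faster speech.
assert len(length_regulate(states, durations, alpha=0.5)) < 10
```

Because the expanded sequence length is known up front, the decoder can generate all mel frames in parallel instead of autoregressively.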

Transfer Learning Across Simulated Robots With Different Sensors

Title Transfer Learning Across Simulated Robots With Different Sensors
Authors Hélène Plisnier, Denis Steckelmacher, Diederik Roijers, Ann Nowé
Abstract For a robot to learn a good policy, it often requires expensive equipment (such as sophisticated sensors) and a prepared training environment conducive to learning. However, it is seldom possible to perfectly equip robots for economic reasons, nor to guarantee ideal learning conditions, when deployed in real-life environments. A solution would be to prepare the robot in the lab environment, when all necessary material is available to learn a good policy. After training in the lab, the robot should be able to get by without the expensive equipment that used to be available to it, and yet still be guaranteed to perform well in the field. The transition between the lab (source) and the real-world environment (target) is related to transfer learning, where the state-spaces of the source and target tasks differ. We tackle a simulated task with continuous states and discrete actions presenting this challenge, using Bootstrapped Dual Policy Iteration (BDPI), a model-free actor-critic reinforcement learning algorithm, and Policy Shaping. Specifically, we train a BDPI agent, embodied by a virtual robot performing a task in the V-Rep simulator, sensing its environment through several proximity sensors. The resulting policy is then used by a second agent learning the same task in the same environment, but with camera images as input. The goal is to obtain a policy able to perform the task relying merely on camera images.
Tasks Transfer Learning
Published 2019-07-18
URL https://arxiv.org/abs/1907.07958v1
PDF https://arxiv.org/pdf/1907.07958v1.pdf
PWC https://paperswithcode.com/paper/transfer-learning-across-simulated-robots
Repo
Framework

Social Credibility Incorporating Semantic Analysis and Machine Learning: A Survey of the State-of-the-Art and Future Research Directions

Title Social Credibility Incorporating Semantic Analysis and Machine Learning: A Survey of the State-of-the-Art and Future Research Directions
Authors Bilal Abu-Salih, Bushra Bremie, Pornpit Wongthongtham, Kevin Duan, Tomayess Issa, Kit Yan Chan, Mohammad Alhabashneh, Teshreen Albtoush, Sulaiman Alqahtani, Abdullah Alqahtani, Muteeb Alahmari, Naser Alshareef, Abdulaziz Albahlal
Abstract The wealth of Social Big Data (SBD) represents a unique opportunity for organisations to exploit such data abundance to increase their revenues. Hence, there is an imperative need to capture, load, store, process, analyse, transform, interpret, and visualise such manifold social datasets to develop meaningful insights that are specific to an application domain. This paper lays the theoretical background by introducing a state-of-the-art literature review of the research topic. This is associated with a critical evaluation of the current approaches, and fortified with recommendations intended to bridge the research gap.
Tasks
Published 2019-02-27
URL http://arxiv.org/abs/1902.10402v1
PDF http://arxiv.org/pdf/1902.10402v1.pdf
PWC https://paperswithcode.com/paper/social-credibility-incorporating-semantic
Repo
Framework

The Steep Road to Happily Ever After: An Analysis of Current Visual Storytelling Models

Title The Steep Road to Happily Ever After: An Analysis of Current Visual Storytelling Models
Authors Yatri Modi, Natalie Parde
Abstract Visual storytelling is an intriguing and complex task that only recently entered the research arena. In this work, we survey relevant work to date, and conduct a thorough error analysis of three very recent approaches to visual storytelling. We categorize and provide examples of common types of errors, and identify key shortcomings in current work. Finally, we make recommendations for addressing these limitations in the future.
Tasks Visual Storytelling
Published 2019-04-06
URL http://arxiv.org/abs/1904.03366v1
PDF http://arxiv.org/pdf/1904.03366v1.pdf
PWC https://paperswithcode.com/paper/the-steep-road-to-happily-ever-after-an
Repo
Framework

Online Heterogeneous Mixture Learning for Big Data

Title Online Heterogeneous Mixture Learning for Big Data
Authors Kazuki Seshimo, Ota Akira, Nishio Daichi, Yamane Satoshi
Abstract We propose online machine learning for big data analysis with heterogeneity. We performed an experiment to compare the accuracy at each iteration between the batch and online versions. The online version converges quickly with the same accuracy as the batch one.
Tasks
Published 2019-06-15
URL https://arxiv.org/abs/1906.08068v1
PDF https://arxiv.org/pdf/1906.08068v1.pdf
PWC https://paperswithcode.com/paper/online-heterogeneous-mixture-learning-for-big
Repo
Framework

Embedding models for recommendation under contextual constraints

Title Embedding models for recommendation under contextual constraints
Authors Syrine Krichene, Mike Gartrell, Clement Calauzenes
Abstract Embedding models, which learn latent representations of users and items based on user-item interaction patterns, are a key component of recommendation systems. In many applications, contextual constraints need to be applied to refine recommendations, e.g. when a user specifies a price range or product category filter. The conventional approach, for both context-aware and standard models, is to retrieve items and apply the constraints as independent operations. The order in which these two steps are executed can induce significant problems. For example, applying constraints a posteriori can result in incomplete recommendations or low-quality results for the tail of the distribution (i.e., less popular items). As a result, the additional information that the constraint brings about user intent may not be accurately captured. In this paper we propose integrating the information provided by the contextual constraint into the similarity computation, by merging constraint application and retrieval into one operation in the embedding space. This technique allows us to generate high-quality recommendations for the specified constraint. Our approach learns constraint representations jointly with the user and item embeddings. We incorporate our methods into a matrix factorization model, and perform an experimental evaluation on one internal and two real-world datasets. Our results show significant improvements in predictive performance compared to context-aware and standard models.
Tasks Recommendation Systems
Published 2019-06-21
URL https://arxiv.org/abs/1907.01637v1
PDF https://arxiv.org/pdf/1907.01637v1.pdf
PWC https://paperswithcode.com/paper/embedding-models-for-recommendation-under
Repo
Framework
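The core idea of merging constraint application and retrieval into one scoring pass can be sketched with toy vectors (these two-dimensional embeddings, item names, and the "low price" constraint are all made up for illustration; the paper learns such representations jointly via matrix factorization):

```python
# Sketch of constraint-aware retrieval: instead of retrieving by user-item
# similarity and filtering afterwards, a learned constraint embedding is
# merged into the query so that constraint application and retrieval happen
# in a single scoring pass over the item embeddings.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

user = [1.0, 0.0]
items = {"cheap_book": [0.4, 0.9], "pricey_book": [0.9, -0.8]}
constraint = [0.0, 1.0]      # e.g. a learned "low price" embedding (toy)

# Merge user intent and constraint before retrieval, not after.
query = [u + c for u, c in zip(user, constraint)]
best = max(items, key=lambda name: dot(query, items[name]))
assert best == "cheap_book"
```

Without the constraint term the plain similarity would rank `pricey_book` first, which is exactly the a-posteriori-filtering failure mode the abstract describes.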

MLR (Memory, Learning and Recognition): A General Cognitive Model – applied to Intelligent Robots and Systems Control

Title MLR (Memory, Learning and Recognition): A General Cognitive Model – applied to Intelligent Robots and Systems Control
Authors Aras R. Dargazany
Abstract This paper introduces a new perspective on intelligent robot and system control. The presented cognitive model, Memory, Learning and Recognition (MLR), is an effort to bridge the gap between Robotics, AI, Cognitive Science, and Neuroscience. The currently existing gap prevents us from integrating the advancements and achievements of these four research fields, which are actively trying to define intelligence either in an application-based way or generically. The proposed cognitive model defines intelligence more specifically, parametrically, and in detail. The MLR model helps us create a general control model for robots and systems independent of their application domains and platforms, since it is mainly based on the dataset provided for robot and system control. This paper mainly proposes and introduces this concept and tries to prove it at a small scale, first through experimentation. The proposed concept is also applicable to other platforms, in real time as well as in simulation.
Tasks
Published 2019-07-12
URL https://arxiv.org/abs/1907.05553v1
PDF https://arxiv.org/pdf/1907.05553v1.pdf
PWC https://paperswithcode.com/paper/mlr-memory-learning-and-recognition-a-general
Repo
Framework