January 27, 2020

Paper Group ANR 1168

On the Separability of Classes with the Cross-Entropy Loss Function. Deep Workpiece Region Segmentation for Bin Picking. Anchor Loss: Modulating Loss Scale based on Prediction Difficulty. Research Directions in Democratizing Innovation through Design Automation, One-Click Manufacturing Services and Intelligent Machines. How Old Are You? Face Age Tr …

On the Separability of Classes with the Cross-Entropy Loss Function


Title	On the Separability of Classes with the Cross-Entropy Loss Function
Authors	Rudrajit Das, Subhasis Chaudhuri
Abstract	In this paper, we focus on the separability of classes with the cross-entropy loss function for classification problems by theoretically analyzing the intra-class distance and inter-class distance (i.e. the distance between any two points belonging to the same class and different classes, respectively) in the feature space, i.e. the space of representations learnt by neural networks. Specifically, we consider an arbitrary network architecture having a fully connected final layer with Softmax activation and trained using the cross-entropy loss. We derive expressions for the value and the distribution of the squared L2 norm of the product of a network dependent matrix and a random intra-class and inter-class distance vector (i.e. the vector between any two points belonging to the same class and different classes), respectively, in the learnt feature space (or the transformation of the original data) just before Softmax activation, as a function of the cross-entropy loss value. The main result of our analysis is the derivation of a lower bound for the probability with which the inter-class distance is more than the intra-class distance in this feature space, as a function of the loss value. We do so by leveraging some empirical statistical observations with mild assumptions and sound theoretical analysis. As per intuition, the probability with which the inter-class distance is more than the intra-class distance decreases as the loss value increases, i.e. the classes are better separated when the loss value is low. To the best of our knowledge, this is the first work of theoretical nature trying to explain the separability of classes in the feature space learnt by neural networks trained with the cross-entropy loss function.
Tasks
Published	2019-09-16
URL	https://arxiv.org/abs/1909.06930v1
PDF	https://arxiv.org/pdf/1909.06930v1.pdf
PWC	https://paperswithcode.com/paper/on-the-separability-of-classes-with-the-cross
Repo
Framework
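
The quantity the abstract bounds, the probability that an inter-class distance exceeds an intra-class distance, can also be estimated empirically from a set of feature vectors. The following is a minimal sketch under stated assumptions: the function name `pairwise_separation_rate` and the toy Gaussian clusters are hypothetical, and the plain Euclidean distance used here stands in for the paper's network-dependent matrix product, which the abstract does not specify.

```python
import itertools
import math
import random


def pairwise_separation_rate(features, labels):
    """Fraction of (intra-class pair, inter-class pair) combinations in which
    the inter-class distance is strictly larger than the intra-class one."""
    intra, inter = [], []
    for (x, lx), (y, ly) in itertools.combinations(zip(features, labels), 2):
        d = math.dist(x, y)  # plain Euclidean distance (assumption)
        (intra if lx == ly else inter).append(d)
    wins = sum(1 for a in intra for b in inter if b > a)
    return wins / (len(intra) * len(inter))


# Hypothetical toy data: two tight, well-separated 2-D clusters.
rng = random.Random(0)
features = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(10)]
features += [(rng.gauss(5, 0.1), rng.gauss(5, 0.1)) for _ in range(10)]
labels = [0] * 10 + [1] * 10

rate = pairwise_separation_rate(features, labels)
print(f"estimated P(inter > intra) = {rate:.3f}")
```

A rate close to 1 indicates well-separated classes; the abstract's claim is that a lower bound on this probability grows as the cross-entropy loss shrinks.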

Deep Workpiece Region Segmentation for Bin Picking


Title	Deep Workpiece Region Segmentation for Bin Picking
Authors	Muhammad Usman Khalid, Janik M. Hager, Werner Kraus, Marco F. Huber, Marc Toussaint
Abstract	In most industrial bin picking solutions, the pose of a workpiece is localized by matching a CAD model to a point cloud obtained from a 3D sensor. Distinguishing flat workpieces from the bottom of the bin in the point cloud makes localization difficult and leads to wrong or phantom detections. In this paper, we propose a framework that solves this problem by automatically segmenting workpiece regions from non-workpiece regions in point cloud data. Segmentation runs in real time using a fully convolutional neural network trained on both simulated and real data. The real data has been labelled by our novel technique, which automatically generates ground truth labels for real point clouds. Along with real-time workpiece segmentation, our framework also increases the number of detected workpieces and helps estimate the correct object poses. Moreover, it decreases the computation time by approximately 1 s because the search space for object pose estimation is reduced.
Tasks	Pose Estimation
Published	2019-09-08
URL	https://arxiv.org/abs/1909.03462v1
PDF	https://arxiv.org/pdf/1909.03462v1.pdf
PWC	https://paperswithcode.com/paper/deep-workpiece-region-segmentation-for-bin
Repo
Framework

Anchor Loss: Modulating Loss Scale based on Prediction Difficulty


Title	Anchor Loss: Modulating Loss Scale based on Prediction Difficulty
Authors	Serim Ryou, Seong-Gyun Jeong, Pietro Perona
Abstract	We propose a novel loss function that dynamically rescales the cross entropy based on the prediction difficulty of a sample. Deep neural network architectures in image classification tasks struggle to disambiguate visually similar objects. Likewise, in human pose estimation, symmetric body parts often confuse the network, which assigns indiscriminative scores to them. This is due to the output prediction, in which only the highest confidence label is selected without taking into consideration a measure of uncertainty. In this work, we define the prediction difficulty as a relative property derived from the confidence score gap between positive and negative labels. More precisely, the proposed loss function penalizes the network so that the score of a false prediction does not become significant. To demonstrate the efficacy of our loss function, we evaluate it on two different domains: image classification and human pose estimation. We find improvements in both applications, achieving higher accuracy than the baseline methods.
Tasks	Image Classification, Pose Estimation
Published	2019-09-24
URL	https://arxiv.org/abs/1909.11155v1
PDF	https://arxiv.org/pdf/1909.11155v1.pdf
PWC	https://paperswithcode.com/paper/anchor-loss-modulating-loss-scale-based-on
Repo
Framework
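
To make the idea of rescaling cross entropy by prediction difficulty concrete, here is a small sketch of a difficulty-modulated binary cross-entropy. The abstract does not state the loss formula, so everything specific here is an assumption for illustration: the function name `modulated_bce`, the modulator `(1 + q_k - q_target) ** gamma` applied to negative classes, and the default `gamma` are hypothetical and should not be read as the paper's exact definition.

```python
import math


def modulated_bce(probs, target, gamma=0.5):
    """Binary cross-entropy summed over classes, where each negative-class
    term is scaled by (1 + q_k - q_target) ** gamma.

    The modulator is an illustrative choice: it exceeds 1 when a wrong class
    scores higher than the true class (a hard sample) and falls below 1 when
    the true class clearly wins (an easy sample).
    """
    q_target = probs[target]
    loss = -math.log(q_target)  # positive-class term, unmodified
    for k, q in enumerate(probs):
        if k == target:
            continue
        weight = (1.0 + q - q_target) ** gamma  # score gap sets the scale
        loss -= weight * math.log(1.0 - q)
    return loss


# Hard sample: the wrong class (index 1) outscores the true class (index 0).
hard = modulated_bce([0.2, 0.7], target=0, gamma=1.0)
# Easy sample: the true class clearly dominates.
easy = modulated_bce([0.9, 0.05], target=0, gamma=1.0)
print(hard, easy)
```

With `gamma=0` the weight is always 1 and the function reduces to ordinary per-class binary cross-entropy, which gives a convenient baseline for comparison.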

Research Directions in Democratizing Innovation through Design Automation, One-Click Manufacturing Services and Intelligent Machines


Title	Research Directions in Democratizing Innovation through Design Automation, One-Click Manufacturing Services and Intelligent Machines
Authors	Binil Starly, Atin Angrish, Paul Cohen
Abstract	The digitalization of manufacturing has created opportunities for consumers to customize products to fit their individual needs, which in turn would drive demand for manufacturing services. However, such a pull-based manufacturing system, producing extremely low quantities of a limitless variety of products, is expensive to implement. Emerging technology in design automation driven by data-driven computational design, manufacturing-as-a-service marketplaces, and digitally enabled micro-factories holds promise for the democratization of innovation. This paper identifies the scientific, technological, and infrastructure challenges and discusses the impact that these emerging technologies, if the challenges are solved, would have on product innovation and future factory organization.
Tasks
Published	2019-09-23
URL	https://arxiv.org/abs/1909.10476v1
PDF	https://arxiv.org/pdf/1909.10476v1.pdf
PWC	https://paperswithcode.com/paper/190910476
Repo
Framework

How Old Are You? Face Age Translation with Identity Preservation Using GANs


Title	How Old Are You? Face Age Translation with Identity Preservation Using GANs
Authors	Zipeng Wang, Zhaoxiang Liu, Jianfeng Huang, Shiguo Lian, Yimin Lin
Abstract	We present a novel framework to generate face images of different ages while preserving identity information, a task known as face aging. Unlike most recent popular face aging networks that use Generative Adversarial Networks (GANs), our approach does not simply transfer a young face to an old one. Instead, we employ edge maps as an intermediate representation: first, edge maps of young faces are extracted; then a CycleGAN-based network transfers them into edge maps of old faces; finally, a pix2pixHD-based network transfers the synthesized edge maps, concatenated with identity information, into old faces. In this way, our method can generate more realistic transferred images while ensuring that face identity information is well preserved and that the apparent age of the generated image is accurate. Experimental results demonstrate that our method is feasible for face age translation.
Tasks
Published	2019-09-11
URL	https://arxiv.org/abs/1909.04988v1
PDF	https://arxiv.org/pdf/1909.04988v1.pdf
PWC	https://paperswithcode.com/paper/how-old-are-you-face-age-translation-with
Repo
Framework

Inferring Dynamical Systems with Long-Range Dependencies through Line Attractor Regularization


Title	Inferring Dynamical Systems with Long-Range Dependencies through Line Attractor Regularization
Authors	Dominik Schmidt, Georgia Koppe, Max Beutelspacher, Daniel Durstewitz
Abstract	Vanilla RNNs with ReLU activation have a simple structure that is amenable to systematic dynamical systems analysis and interpretation, but they suffer from the exploding vs. vanishing gradients problem. Recent attempts to retain this simplicity while alleviating the gradient problem are based on proper initialization schemes or orthogonality/unitary constraints on the RNN’s recurrence matrix, which, however, come with limitations to its expressive power with regard to dynamical systems phenomena like chaos or multi-stability. Here, we instead suggest a regularization scheme that pushes part of the RNN’s latent subspace toward a line attractor configuration that enables long short-term memory and arbitrarily slow time scales. We show that our approach excels on a number of benchmarks like the sequential MNIST or multiplication problems, and enables reconstruction of dynamical systems which harbor widely different time scales.
Tasks
Published	2019-10-08
URL	https://arxiv.org/abs/1910.03471v1
PDF	https://arxiv.org/pdf/1910.03471v1.pdf
PWC	https://paperswithcode.com/paper/inferring-dynamical-systems-with-long-range
Repo
Framework

Who is Real Bob? Adversarial Attacks on Speaker Recognition Systems


Title	Who is Real Bob? Adversarial Attacks on Speaker Recognition Systems
Authors	Guangke Chen, Sen Chen, Lingling Fan, Xiaoning Du, Zhe Zhao, Fu Song, Yang Liu
Abstract	Speaker recognition (SR) is widely used in our daily life as a biometric authentication mechanism. The popularity of SR brings serious security concerns, as demonstrated by recent adversarial attacks. However, the impact of such threats in the practical black-box setting is still open, since current attacks consider the white-box setting only. In this paper, we conduct the first comprehensive and systematic study of adversarial attacks on SR systems (SRSs) to understand their security weaknesses in the practical black-box setting. For this purpose, we propose an adversarial attack, named FakeBob, to craft adversarial samples. Specifically, we formulate adversarial sample generation as an optimization problem that incorporates the confidence of adversarial samples and the maximal distortion to balance the strength and imperceptibility of adversarial voices. One key contribution is a novel algorithm to estimate the score threshold, a feature of SRSs, which is then used in solving the optimization problem. We demonstrate that FakeBob achieves close to 100% targeted attack success rate on both open-source and commercial systems. We further demonstrate that FakeBob is also effective (at least 65% untargeted success rate) on both open-source and commercial systems when played over the air in the physical world. Moreover, we have conducted a human study which reveals that it is hard for humans to differentiate the speakers of the original and adversarial voices. Last but not least, we show that three promising defense methods against adversarial attacks from the speech recognition domain become ineffective on SRSs against FakeBob, which calls for more effective defense methods. We highlight that our study sheds light on the security implications of adversarial attacks on SRSs and realistically helps to improve the security robustness of SRSs.
Tasks	Adversarial Attack, Speaker Recognition, Speech Recognition
Published	2019-11-03
URL	https://arxiv.org/abs/1911.01840v1
PDF	https://arxiv.org/pdf/1911.01840v1.pdf
PWC	https://paperswithcode.com/paper/who-is-real-bob-adversarial-attacks-on
Repo
Framework

Frequency and temporal convolutional attention for text-independent speaker recognition


Title	Frequency and temporal convolutional attention for text-independent speaker recognition
Authors	Sarthak Yadav, Atul Rai
Abstract	The majority of recent approaches for text-independent speaker recognition apply attention or similar techniques for aggregation of frame-level feature descriptors generated by a deep neural network (DNN) front-end. In this paper, we propose methods of convolutional attention for independently modelling temporal and frequency information in a convolutional neural network (CNN) based front-end. Our system utilizes convolutional block attention modules (CBAMs) [1] appropriately modified to accommodate spectrogram inputs. The proposed CNN front-end fitted with the proposed convolutional attention modules outperforms the no-attention and spatial-CBAM baselines by a significant margin on the VoxCeleb [2, 3] speaker verification benchmark, and our best model achieves an equal error rate of 2.031% on the VoxCeleb1 test set, improving the existing state-of-the-art result by a significant margin. For a more thorough assessment of the effects of frequency and temporal attention in real-world conditions, we conduct ablation experiments by randomly dropping frequency bins and temporal frames from the input spectrograms, concluding that simultaneously modelling temporal and frequency attention, rather than modelling only one of them, translates to better real-world performance.
Tasks	Speaker Recognition, Speaker Verification, Text-Independent Speaker Recognition
Published	2019-10-16
URL	https://arxiv.org/abs/1910.07364v2
PDF	https://arxiv.org/pdf/1910.07364v2.pdf
PWC	https://paperswithcode.com/paper/frequency-and-temporal-convolutional
Repo
Framework
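
The equal error rate reported above is the operating point at which the false acceptance rate (impostor trials accepted) equals the false rejection rate (genuine trials rejected). The sketch below approximates it by scanning observed scores as thresholds; the function name `equal_error_rate`, the acceptance rule `score >= threshold`, and the toy scores are assumptions for illustration and are unrelated to the paper's evaluation code.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by trying every observed score as a threshold.

    A trial is accepted when its score is >= threshold. The function returns
    the mean of FAR and FRR at the threshold where the two are closest.
    """
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine_scores) | set(impostor_scores)):
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical verification scores (higher means "same speaker").
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.1, 0.2, 0.3, 0.6]
print(equal_error_rate(genuine, impostor))
```

On large trial lists, sorting the scores once and sweeping cumulative counts would avoid the quadratic cost of this direct scan.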

Online learning-based Model Predictive Control with Gaussian Process Models and Stability Guarantees


Title	Online learning-based Model Predictive Control with Gaussian Process Models and Stability Guarantees
Authors	Michael Maiworm, Daniel Limon, Rolf Findeisen
Abstract	Model predictive control can provide high performance and safety guarantees in the form of constraint satisfaction. These properties, however, can be satisfied only if the underlying model used for prediction of the controlled process is sufficiently accurate. One way to address this challenge is through data-driven and machine learning approaches, such as Gaussian processes, that make it possible to refine the model online during operation. We present a combination of an output feedback model predictive control scheme and a Gaussian process based prediction model that is capable of efficient online learning. To this end, the concept of evolving Gaussian processes is combined with recursive posterior prediction updates. The presented approach guarantees recursive constraint satisfaction and input-to-state stability with respect to the model-plant mismatch. Simulation studies underline that the Gaussian process prediction model can be successfully and efficiently learned online. The resulting computational load is significantly reduced by combining the recursive update procedure with a limit on the number of training data points, while maintaining good performance.
Tasks	Gaussian Processes
Published	2019-11-08
URL	https://arxiv.org/abs/1911.03315v2
PDF	https://arxiv.org/pdf/1911.03315v2.pdf
PWC	https://paperswithcode.com/paper/online-gaussian-process-learning-based-model
Repo
Framework

KG-GAN: Knowledge-Guided Generative Adversarial Networks


Title	KG-GAN: Knowledge-Guided Generative Adversarial Networks
Authors	Che-Han Chang, Chun-Hsien Yu, Szu-Ying Chen, Edward Y. Chang
Abstract	Can generative adversarial networks (GANs) generate roses of various colors given only roses of red petals as input? The answer is negative, since GANs’ discriminator would reject all roses of unseen petal colors. In this study, we propose knowledge-guided GAN (KG-GAN) to fuse domain knowledge with the GAN framework. KG-GAN trains two generators; one learns from data whereas the other learns from knowledge with a constraint function. Experimental results demonstrate the effectiveness of KG-GAN in generating unseen flower categories from seen categories given textual descriptions of the unseen ones.
Tasks	Image Generation
Published	2019-05-29
URL	https://arxiv.org/abs/1905.12261v2
PDF	https://arxiv.org/pdf/1905.12261v2.pdf
PWC	https://paperswithcode.com/paper/kg-gan-knowledge-guided-generative
Repo
Framework

Neural Semantic Parsing with Anonymization for Command Understanding in General-Purpose Service Robots


Title	Neural Semantic Parsing with Anonymization for Command Understanding in General-Purpose Service Robots
Authors	Nick Walker, Yu-Tang Peng, Maya Cakmak
Abstract	Service robots are envisioned to undertake a wide range of tasks at the request of users. Semantic parsing is one way to convert natural language commands given to these robots into executable representations. Methods for creating semantic parsers, however, rely either on large amounts of data or on engineered lexical features and parsing rules, which has limited their application in robotics. To address this challenge, we propose an approach that leverages neural semantic parsing methods in combination with contextual word embeddings to enable the training of a semantic parser with little data and without domain specific parser engineering. Key to our approach is the use of an anonymized target representation which is more easily learned by the parser. In most cases, this simplified representation can trivially be transformed into an executable format, and in others the parse can be completed through further interaction with the user. We evaluate this approach in the context of the RoboCup@Home General Purpose Service Robot task, where we have collected a corpus of paraphrased versions of commands from the standardized command generator. Our results show that neural semantic parsers can predict the logical form of unseen commands with 89% accuracy. We release our data and the details of our models to encourage further development from the RoboCup and service robotics communities.
Tasks	Semantic Parsing, Word Embeddings
Published	2019-07-02
URL	https://arxiv.org/abs/1907.01115v1
PDF	https://arxiv.org/pdf/1907.01115v1.pdf
PWC	https://paperswithcode.com/paper/neural-semantic-parsing-with-anonymization
Repo
Framework

Maximal adversarial perturbations for obfuscation: Hiding certain attributes while preserving rest


Title	Maximal adversarial perturbations for obfuscation: Hiding certain attributes while preserving rest
Authors	Indu Ilanchezian, Praneeth Vepakomma, Abhishek Singh, Otkrist Gupta, G. N. Srinivasa Prasanna, Ramesh Raskar
Abstract	In this paper we investigate the use of adversarial perturbations for the purpose of privacy from human perception and model (machine) based detection. We employ adversarial perturbations to obfuscate certain variables in raw data while preserving the rest. Current adversarial perturbation methods are used for data poisoning with minimal perturbations of the raw data, such that the machine learning model’s performance is adversely impacted while human vision cannot perceive the difference in the poisoned dataset due to the minimal nature of the perturbations. We instead apply relatively maximal perturbations of raw data to conditionally damage a model’s classification of one attribute while preserving the model’s performance on another attribute. In addition, the maximal nature of the perturbation helps adversely impact human perception in classifying the hidden attribute, apart from impacting model performance. We validate our result qualitatively by showing the obfuscated dataset and quantitatively by showing the inability of models trained on clean data to predict the hidden attribute from the perturbed dataset while still being able to predict the rest of the attributes.
Tasks	data poisoning
Published	2019-09-27
URL	https://arxiv.org/abs/1909.12734v1
PDF	https://arxiv.org/pdf/1909.12734v1.pdf
PWC	https://paperswithcode.com/paper/maximal-adversarial-perturbations-for
Repo
Framework

DwNet: Dense warp-based network for pose-guided human video generation


Title	DwNet: Dense warp-based network for pose-guided human video generation
Authors	Polina Zablotskaia, Aliaksandr Siarohin, Bo Zhao, Leonid Sigal
Abstract	Generation of realistic high-resolution videos of human subjects is a challenging and important task in computer vision. In this paper, we focus on human motion transfer: the generation of a video depicting a particular subject, observed in a single image, performing a series of motions exemplified by an auxiliary (driving) video. Our GAN-based architecture, DwNet, leverages a dense intermediate pose-guided representation and a refinement process to warp the required subject appearance, in the form of the texture, from a source image into a desired pose. Temporal consistency is maintained by further conditioning the decoding process within a GAN on the previously generated frame. In this way a video is generated in an iterative and recurrent fashion. We illustrate the efficacy of our approach by showing state-of-the-art quantitative and qualitative performance on two benchmark datasets: TaiChi and Fashion Modeling. The latter is collected by us and will be made publicly available to the community.
Tasks	Video Generation
Published	2019-10-21
URL	https://arxiv.org/abs/1910.09139v1
PDF	https://arxiv.org/pdf/1910.09139v1.pdf
PWC	https://paperswithcode.com/paper/dwnet-dense-warp-based-network-for-pose
Repo
Framework

Using Supervised Learning to Classify Metadata of Research Data by Discipline of Research


Title	Using Supervised Learning to Classify Metadata of Research Data by Discipline of Research
Authors	Tobias Weber, Dieter Kranzlmüller, Michael Fromm, Nelson Tavares de Sousa
Abstract	Automated classification of metadata of research data by their discipline(s) of research can be used in scientometric research, by repository service providers, and in the context of research data aggregation services. Openly available metadata of the DataCite index for research data were used to compile a large training and evaluation set comprising 609,524 records, which is published alongside this paper. These data make it possible to reproducibly assess classification approaches, such as tree-based models and neural networks. According to our experiments with 20 base classes (multi-label classification), multi-layer perceptron models perform best with an F1-macro score of 0.760, closely followed by Long Short-Term Memory models (F1-macro score of 0.755). A possible application of the trained classification models is the quantitative analysis of trends towards interdisciplinarity of digital scholarly output or the characterization of growth patterns of research data, stratified by discipline of research. Both applications perform at scale with the proposed models, which are available for re-use.
Tasks	Multi-Label Classification
Published	2019-10-16
URL	https://arxiv.org/abs/1910.09313v1
PDF	https://arxiv.org/pdf/1910.09313v1.pdf
PWC	https://paperswithcode.com/paper/using-supervised-learning-to-classify
Repo
Framework
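
The F1-macro score quoted in the abstract is the unweighted mean of per-class F1 scores, so rare disciplines count as much as frequent ones. A minimal sketch for the multi-label case follows; the function name `f1_macro`, the representation of labels as Python sets, and the convention of scoring an entirely absent class as 0 are assumptions made for this example, not details taken from the paper.

```python
def f1_macro(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 over multi-label predictions.

    y_true and y_pred are lists of label sets, one set per record.
    A class with no true and no predicted labels contributes F1 = 0
    (a convention chosen for this sketch).
    """
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if c in t and c in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if c not in t and c in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if c in t and c not in p)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


# Hypothetical records with three classes.
y_true = [{0, 1}, {1}, {2}]
y_pred = [{0}, {1, 2}, {2}]
print(round(f1_macro(y_true, y_pred, num_classes=3), 4))
```

In this toy case the per-class F1 scores are 1, 2/3, and 2/3, giving a macro average of 7/9.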

On stochastic gradient Langevin dynamics with dependent data streams: the fully non-convex case


Title	On stochastic gradient Langevin dynamics with dependent data streams: the fully non-convex case
Authors	Ngoc Huy Chau, Éric Moulines, Miklos Rásonyi, Sotirios Sabanis, Ying Zhang
Abstract	We consider the problem of sampling from a target distribution, which is not necessarily logconcave, in the context of empirical risk minimization and stochastic optimization as presented in Raginsky et al. (2017). Non-asymptotic analysis results are established in a suitable Wasserstein-type distance for the behaviour of Stochastic Gradient Langevin Dynamics (SGLD) algorithms. We allow the estimation of gradients to be performed even in the presence of dependent data streams. Our convergence estimates are sharper and uniform in the number of iterations, in contrast to those in previous studies.
Tasks	Stochastic Optimization
Published	2019-05-30
URL	https://arxiv.org/abs/1905.13142v2
PDF	https://arxiv.org/pdf/1905.13142v2.pdf
PWC	https://paperswithcode.com/paper/on-stochastic-gradient-langevin-dynamics-with-1
Repo
Framework
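
For readers unfamiliar with SGLD, the recursion it refers to is a stochastic gradient step plus injected Gaussian noise: theta <- theta - step * g(theta) + sqrt(2 * step / beta) * xi, with xi drawn from a standard normal. The sketch below runs that recursion on a one-dimensional double-well potential. The function names, the potential, the step size, and the inverse temperature `beta` are all hypothetical choices, and the gradient noise here is independent across iterations, which is simpler than the dependent data streams the paper analyzes.

```python
import math
import random


def sgld(grad_estimate, theta0, step, beta, n_iters, rng):
    """Run the SGLD recursion
        theta <- theta - step * g(theta) + sqrt(2 * step / beta) * xi,
    where g is a (possibly noisy) gradient estimate and xi ~ N(0, 1)."""
    theta = theta0
    trace = []
    noise_scale = math.sqrt(2.0 * step / beta)
    for _ in range(n_iters):
        xi = rng.gauss(0.0, 1.0)
        theta = theta - step * grad_estimate(theta, rng) + noise_scale * xi
        trace.append(theta)
    return trace


# Hypothetical non-convex potential U(x) = (x^2 - 1)^2, whose gradient is
# 4x(x^2 - 1), observed through additive noise to mimic a stochastic gradient.
def noisy_grad(theta, rng):
    return 4.0 * theta * (theta * theta - 1.0) + rng.gauss(0.0, 0.5)


rng = random.Random(0)
trace = sgld(noisy_grad, theta0=0.0, step=0.01, beta=5.0, n_iters=5000, rng=rng)
print(len(trace), min(trace), max(trace))
```

Larger `beta` concentrates the iterates near the minima of the potential, while smaller `beta` makes transitions between the two wells more frequent.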