January 27, 2020


Paper Group ANR 1125


Shape-Aware Human Pose and Shape Reconstruction Using Multi-View Images

Title Shape-Aware Human Pose and Shape Reconstruction Using Multi-View Images
Authors Junbang Liang, Ming C. Lin
Abstract We propose a scalable neural network framework to reconstruct the 3D mesh of a human body from multi-view images, in the subspace of the SMPL model. Use of multi-view images can significantly reduce the projection ambiguity of the problem, increasing the reconstruction accuracy of the 3D human body under clothing. Our experiments show that this method benefits from the synthetic dataset generated from our pipeline since it has good flexibility of variable control and can provide ground-truth for validation. Our method outperforms existing methods on real-world images, especially on shape estimations.
Tasks 3D Human Pose Estimation
Published 2019-08-26
URL https://arxiv.org/abs/1908.09464v1
PDF https://arxiv.org/pdf/1908.09464v1.pdf
PWC https://paperswithcode.com/paper/shape-aware-human-pose-and-shape
Repo
Framework

On the notion of number in humans and machines

Title On the notion of number in humans and machines
Authors Norbert Bátfai, Dávid Papp, Gergő Bogacsovics, Máté Szabó, Viktor Szilárd Simkó, Márió Bersenszki, Gergely Szabó, Lajos Kovács, Ferencz Kovács, Erik Szilveszter Varga
Abstract In this paper, we performed two types of software experiments to study numerosity classification (subitizing) in humans and machines. The experiments focus on a particular kind of task, referred to as Semantic MNIST or simply SMNIST, in which the numerosity of objects placed in an image must be determined. The experiments called SMNIST for Humans are intended to measure the capacity of the Object File System (OFS) in humans. In this type of experiment the measurement result is in good agreement with the value known from the cognitive psychology literature. The experiments called SMNIST for Machines serve similar purposes, but they investigate deep learning computer programs, both existing, well-known ones (originally developed for other purposes) and ones under development. These measurement results can be interpreted similarly to the results from SMNIST for Humans. The main thesis of this paper can be formulated as follows: in machines, image classification artificial neural networks can learn to distinguish numerosities with better accuracy when these numerosities are smaller than the capacity of the OFS in humans. Finally, we outline a conceptual framework to investigate the notion of number in humans and machines.
Tasks Image Classification
Published 2019-06-27
URL https://arxiv.org/abs/1906.12213v1
PDF https://arxiv.org/pdf/1906.12213v1.pdf
PWC https://paperswithcode.com/paper/on-the-notion-of-number-in-humans-and
Repo
Framework

STM: SpatioTemporal and Motion Encoding for Action Recognition

Title STM: SpatioTemporal and Motion Encoding for Action Recognition
Authors Boyuan Jiang, Mengmeng Wang, Weihao Gan, Wei Wu, Junjie Yan
Abstract Spatiotemporal and motion features are two complementary and crucial kinds of information for video action recognition. Recent state-of-the-art methods adopt a 3D CNN stream to learn spatiotemporal features and another flow stream to learn motion features. In this work, we aim to efficiently encode these two features in a unified 2D framework. To this end, we first propose an STM block, which contains a Channel-wise SpatioTemporal Module (CSTM) to present the spatiotemporal features and a Channel-wise Motion Module (CMM) to efficiently encode motion features. We then replace the original residual blocks in the ResNet architecture with STM blocks to form a simple yet effective STM network while introducing very limited extra computation cost. Extensive experiments demonstrate that the proposed STM network outperforms the state-of-the-art methods on both temporal-related datasets (i.e., Something-Something v1 & v2 and Jester) and scene-related datasets (i.e., Kinetics-400, UCF-101, and HMDB-51) with the help of encoding spatiotemporal and motion features together.
Tasks Action Classification, Action Recognition In Videos, Temporal Action Localization
Published 2019-08-07
URL https://arxiv.org/abs/1908.02486v2
PDF https://arxiv.org/pdf/1908.02486v2.pdf
PWC https://paperswithcode.com/paper/stm-spatiotemporal-and-motion-encoding-for
Repo
Framework

A Novel Self-Intersection Penalty Term for Statistical Body Shape Models and Its Applications in 3D Pose Estimation

Title A Novel Self-Intersection Penalty Term for Statistical Body Shape Models and Its Applications in 3D Pose Estimation
Authors Zaiqiang Wu, Wei Jiang, Hao Luo, Lin Cheng
Abstract Statistical body shape models are widely used in 3D pose estimation due to their low-dimensional parameter representation. However, it is difficult to avoid self-intersection between body parts accurately. Motivated by this fact, we propose a novel self-intersection penalty term for statistical body shape models applied in 3D pose estimation. To avoid the trouble of computing self-intersection for complex surfaces like body meshes, the gradient of our proposed self-intersection penalty term is manually derived from the perspective of geometry. First, the self-intersection penalty term is defined as the volume of the self-intersection region. To calculate the partial derivatives with respect to the coordinates of the vertices, we employ detection rays to divide the vertices of statistical body shape models into different groups depending on whether the vertex is in the region of self-intersection. Second, the partial derivatives can be easily derived from the normal vectors of the neighboring triangles of the vertices. Finally, this penalty term can be applied in gradient-based optimization algorithms to remove the self-intersection of triangular meshes without using any approximation. Qualitative and quantitative evaluations were conducted to demonstrate the effectiveness and generality of our proposed method compared with previous approaches. The experimental results show that our proposed penalty term can avoid self-intersection to exclude unreasonable predictions and indirectly improves the accuracy of 3D pose estimation. Furthermore, the proposed method could be employed universally in triangular-mesh-based 3D reconstruction.
Tasks 3D Pose Estimation, 3D Reconstruction, Pose Estimation
Published 2019-01-24
URL http://arxiv.org/abs/1901.08274v1
PDF http://arxiv.org/pdf/1901.08274v1.pdf
PWC https://paperswithcode.com/paper/a-novel-self-intersection-penalty-term-for
Repo
Framework

Keyword Spotting for Hearing Assistive Devices Robust to External Speakers

Title Keyword Spotting for Hearing Assistive Devices Robust to External Speakers
Authors Iván López-Espejo, Zheng-Hua Tan, Jesper Jensen
Abstract Keyword spotting (KWS) is experiencing an upswing due to the pervasiveness of small electronic devices that allow interaction with them via speech. Often, KWS systems are speaker-independent, which means that any person –user or not– might trigger them. For applications like KWS for hearing assistive devices this is unacceptable, as only the user must be allowed to handle them. In this paper we propose KWS for hearing assistive devices that is robust to external speakers. A state-of-the-art deep residual network for small-footprint KWS is regarded as a basis to build upon. By following a multi-task learning scheme, this system is extended to jointly perform KWS and users’ own-voice/external speaker detection with a negligible increase in the number of parameters. For experiments, we generate from the Google Speech Commands Dataset a speech corpus emulating hearing aids as a capturing device. Our results show that this multi-task deep residual network is able to achieve a KWS accuracy relative improvement of around 32% with respect to a system that does not deal with external speakers.
Tasks Keyword Spotting, Multi-Task Learning
Published 2019-06-22
URL https://arxiv.org/abs/1906.09417v2
PDF https://arxiv.org/pdf/1906.09417v2.pdf
PWC https://paperswithcode.com/paper/keyword-spotting-for-hearing-assistive
Repo
Framework
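
The multi-task learning scheme described in the keyword spotting abstract above (one network trained jointly for KWS and own-voice/external speaker detection) is commonly implemented as a weighted sum of two classification losses. The following is a minimal sketch under that assumption; the function names, the logits, and the `weight` parameter are hypothetical and are not taken from the paper.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one example, computed in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def multitask_loss(kws_logits, kws_target, spk_logits, spk_target, weight=1.0):
    """Joint loss: keyword loss plus a weighted speaker-detection loss.

    `weight` is an assumed hyperparameter balancing the two tasks.
    """
    kws_loss = cross_entropy(kws_logits, kws_target)
    spk_loss = cross_entropy(spk_logits, spk_target)
    return kws_loss + weight * spk_loss

# Hypothetical outputs of the two heads of a shared network for one utterance:
# three keyword classes, and two speaker classes (0 = user, 1 = external speaker).
loss = multitask_loss([2.0, 0.1, -1.0], 0, [0.3, 1.2], 1, weight=0.5)
print(round(loss, 4))
```

Because both heads share the same backbone in such a setup, only the small output layer for the second task adds parameters, which is consistent with the "negligible increase" the abstract mentions.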

Traffic Sign Detection and Classification around the World

Title Traffic Sign Detection and Classification around the World
Authors Christian Ertler, Jerneja Mislej, Tobias Ollmann, Lorenzo Porzi, Yubin Kuang
Abstract Traffic signs are essential map features globally in the era of autonomous driving and smart cities. To develop accurate and robust algorithms for traffic sign detection and classification, a large-scale and diverse benchmark dataset is required. In this paper, we introduce a traffic sign benchmark dataset of 100K street-level images from around the world that encapsulates diverse scenes, wide coverage of geographical locations, and varying weather and lighting conditions, and covers more than 300 manually annotated traffic sign classes. The dataset includes 52K images that are fully annotated and 48K images that are partially annotated. This is the largest and the most diverse traffic sign dataset consisting of images from all over the world with fine-grained annotations of traffic sign classes. We have run extensive experiments to establish strong baselines for both the detection and the classification tasks. In addition, we have verified that the diversity of this dataset enables effective transfer learning for existing large-scale benchmark datasets on traffic sign detection and classification. The dataset is freely available for academic research: https://www.mapillary.com/dataset/trafficsign.
Tasks Autonomous Driving, Transfer Learning
Published 2019-09-10
URL https://arxiv.org/abs/1909.04422v1
PDF https://arxiv.org/pdf/1909.04422v1.pdf
PWC https://paperswithcode.com/paper/traffic-sign-detection-and-classification
Repo
Framework

Supervised Machine Learning Techniques for Trojan Detection with Ring Oscillator Network

Title Supervised Machine Learning Techniques for Trojan Detection with Ring Oscillator Network
Authors Kyle Worley, Md Tauhidur Rahman
Abstract With the globalization of the semiconductor manufacturing process, electronic devices are powerless against malicious modification of hardware in the supply chain. The ever-increasing threat of hardware Trojan attacks against integrated circuits has spurred a need for accurate and efficient detection methods. A ring oscillator network (RON) is used to detect a Trojan by capturing the difference in power consumption; the power consumption of a Trojan-free circuit differs from that of a Trojan-inserted circuit. However, process variation and measurement noise are the major obstacles to detecting hardware Trojans with high accuracy. In this paper, we quantitatively compare four supervised machine learning algorithms and classifier optimization strategies for maximizing accuracy and minimizing the false positive rate (FPR). These supervised learning techniques improve the false positive rate by nearly 40% compared to principal component analysis (PCA) and convex hull classification, while maintaining > 90% binary classification accuracy.
Tasks
Published 2019-03-12
URL http://arxiv.org/abs/1903.04677v1
PDF http://arxiv.org/pdf/1903.04677v1.pdf
PWC https://paperswithcode.com/paper/supervised-machine-learning-techniques-for
Repo
Framework
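
The two metrics named in the Trojan detection abstract above, binary classification accuracy and false positive rate (FPR), follow directly from the confusion-matrix counts. The sketch below shows the standard definitions; the labels are made-up illustration data (1 = Trojan-inserted, 0 = Trojan-free), not results from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def accuracy(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)

def false_positive_rate(y_true, y_pred):
    """Fraction of Trojan-free chips that are wrongly flagged: FP / (FP + TN)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return fp / (fp + tn)

# Hypothetical ground truth and classifier output for ten chips.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]
print(accuracy(y_true, y_pred))             # 0.8
print(false_positive_rate(y_true, y_pred))  # 0.25
```

A classifier can keep accuracy high while FPR is poor when Trojan-free samples are rare, which is why the two numbers are reported separately.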

The Sound of Motions

Title The Sound of Motions
Authors Hang Zhao, Chuang Gan, Wei-Chiu Ma, Antonio Torralba
Abstract Sounds originate from object motions and vibrations of surrounding air. Inspired by the fact that humans are capable of interpreting sound sources from how objects move visually, we propose a novel system that explicitly captures such motion cues for the task of sound localization and separation. Our system is composed of an end-to-end learnable model called Deep Dense Trajectory (DDT), and a curriculum learning scheme. It exploits the inherent coherence of audio-visual signals from large quantities of unlabeled videos. Quantitative and qualitative evaluations show that, compared to previous models that rely on visual appearance cues, our motion-based system improves performance in separating musical instrument sounds. Furthermore, it separates sound components from duets of the same category of instruments, a challenging problem that has not been addressed before.
Tasks
Published 2019-04-11
URL http://arxiv.org/abs/1904.05979v1
PDF http://arxiv.org/pdf/1904.05979v1.pdf
PWC https://paperswithcode.com/paper/the-sound-of-motions
Repo
Framework

Optimistic robust linear quadratic dual control

Title Optimistic robust linear quadratic dual control
Authors Jack Umenberger, Thomas B. Schon
Abstract Recent work by Mania et al. has proved that certainty equivalent control achieves nearly optimal regret for linear systems with quadratic costs. However, when parameter uncertainty is large, certainty equivalence cannot be relied upon to stabilize the true, unknown system. In this paper, we present a dual control strategy that attempts to combine the performance of certainty equivalence with the practical utility of robustness. The formulation preserves structure in the representation of parametric uncertainty, which allows the controller to target reduction of uncertainty in the parameters that ‘matter most’ for the control task, while robustly stabilizing the uncertain system. Control synthesis proceeds via convex optimization, and the method is illustrated on a numerical example.
Tasks
Published 2019-12-31
URL https://arxiv.org/abs/1912.13143v1
PDF https://arxiv.org/pdf/1912.13143v1.pdf
PWC https://paperswithcode.com/paper/optimistic-robust-linear-quadratic-dual
Repo
Framework
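
The claim in the dual control abstract above, that certainty equivalence can fail to stabilize the true system when parameter uncertainty is large, can be seen in a scalar example. The sketch below is not the paper's method: it only designs a standard LQR gain for an estimated scalar system x[t+1] = a*x[t] + b*u[t] and then applies it to a different "true" system. All numbers are hypothetical.

```python
def scalar_lqr_gain(a, b, q=1.0, r=1.0, iterations=500):
    """Iterate the scalar discrete-time Riccati recursion and return the
    feedback gain k for the control law u = -k * x."""
    p = q
    for _ in range(iterations):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)

# Certainty equivalence: design the controller as if the estimate were exact.
a_hat, b_hat = 1.0, 1.0
k = scalar_lqr_gain(a_hat, b_hat)

# Closed-loop pole a - b*k; the loop is stable when its magnitude is below 1.
nominal_pole = a_hat - b_hat * k   # pole if the estimate is correct
a_true, b_true = 1.7, 1.0          # assumed true system, far from the estimate
true_pole = a_true - b_true * k    # pole actually experienced

print(abs(nominal_pole) < 1.0)  # True: stable on the estimated model
print(abs(true_pole) < 1.0)     # False: unstable on the true system
```

A robust design would instead choose a gain that keeps the pole inside the unit circle for every parameter value in the uncertainty set, typically at some cost in nominal performance.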

The Effect of Downstream Classification Tasks for Evaluating Sentence Embeddings

Title The Effect of Downstream Classification Tasks for Evaluating Sentence Embeddings
Authors Peter Potash
Abstract One popular method for quantitatively evaluating the utility of sentence embeddings involves using them in downstream language processing tasks that require sentence representations as input. One simple such task is classification, where the sentence representations are used to train and test models on several classification datasets. We argue that by evaluating sentence representations in such a manner, the goal of the representations becomes learning a low-dimensional factorization of a sentence-task label matrix. We show how characteristics of this matrix can affect the ability of a low-dimensional factorization to perform as sentence representations in a suite of classification tasks. Primarily, sentences that have more labels across all possible classification tasks have a higher reconstruction loss; however, the general nature of this effect is ultimately dependent on the overall distribution of labels across all possible sentences.
Tasks Sentence Embeddings
Published 2019-04-03
URL https://arxiv.org/abs/1904.02228v2
PDF https://arxiv.org/pdf/1904.02228v2.pdf
PWC https://paperswithcode.com/paper/the-effect-of-downstream-classification-tasks
Repo
Framework

Decentralized Markov Chain Gradient Descent

Title Decentralized Markov Chain Gradient Descent
Authors Tao Sun, Tianyi Chen, Yuejiao Sun, Qing Liao, Dongsheng Li
Abstract The decentralized stochastic gradient method has emerged as a promising solution for solving large-scale machine learning problems. This paper studies the decentralized Markov chain gradient descent (DMGD) algorithm, a variant of the decentralized stochastic gradient methods where the random samples are taken along the trajectory of a Markov chain. This setting is well-motivated when obtaining independent samples is costly or impossible, which excludes the use of the traditional stochastic gradient algorithms. Specifically, we consider the first- and zeroth-order versions of decentralized Markov chain gradient descent over a connected network, where each node only communicates with its neighbors about intermediate results. The nonergodic convergence and the ergodic convergence rate of the proposed algorithms have been rigorously established, and their critical dependences on the network topology and the mixing time of the Markov chain have been highlighted. The numerical tests further validate the sample efficiency of our algorithm.
Tasks
Published 2019-09-23
URL https://arxiv.org/abs/1909.10238v1
PDF https://arxiv.org/pdf/1909.10238v1.pdf
PWC https://paperswithcode.com/paper/decentralized-markov-chain-gradient-descent
Repo
Framework
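
The setting in the DMGD abstract above combines two ingredients: each node averages its iterate with its neighbors, and each node's gradient sample comes from a Markov chain trajectory rather than an independent draw. The sketch below illustrates that general idea on a toy scalar least-squares problem. It is an assumption-laden illustration, not the paper's algorithm: the ring topology, mixing weights, lazy random walk, step size schedule, and data are all hypothetical choices.

```python
import random

def dmgd(num_nodes=4, samples_per_node=5, steps=4000, seed=0):
    """Toy decentralized Markov chain gradient descent on a ring.

    Node i holds samples data[i][j]; its local loss is the average of
    0.5 * (x - data[i][j]) ** 2, so the global minimizer is the mean of all data.
    """
    rng = random.Random(seed)
    data = [[float(i + j) for j in range(samples_per_node)] for i in range(num_nodes)]
    x = [0.0] * num_nodes     # one scalar iterate per node
    state = [0] * num_nodes   # current Markov chain state (sample index) per node
    for k in range(1, steps + 1):
        step = 1.0 / k ** 0.75  # diminishing step size
        # Gradient of the local loss at the sample the chain currently sits on.
        grads = [x[i] - data[i][state[i]] for i in range(num_nodes)]
        # Communication: average with the two ring neighbors (doubly stochastic weights).
        mixed = [
            0.5 * x[i] + 0.25 * x[(i - 1) % num_nodes] + 0.25 * x[(i + 1) % num_nodes]
            for i in range(num_nodes)
        ]
        x = [mixed[i] - step * grads[i] for i in range(num_nodes)]
        # Advance each chain: lazy random walk over the sample indices, so
        # consecutive samples are correlated instead of independent.
        state = [(s + rng.choice((-1, 0, 1))) % samples_per_node for s in state]
    return x

estimates = dmgd()
print([round(v, 2) for v in estimates])  # each node should end up near 3.5, the mean of all data
```

The random walk has a uniform stationary distribution, so in the long run every sample is visited equally often even though successive samples are dependent; how quickly that happens is the mixing time the abstract refers to.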

Handwritten Chinese Font Generation with Collaborative Stroke Refinement

Title Handwritten Chinese Font Generation with Collaborative Stroke Refinement
Authors Chuan Wen, Jie Chang, Ya Zhang, Siheng Chen, Yanfeng Wang, Mei Han, Qi Tian
Abstract Automatic character generation is an appealing solution for new typeface design, especially for Chinese typefaces, which include over 3700 commonly used characters. This task has two main pain points: (i) handwritten characters are usually associated with thin strokes that carry little information and with complex structures that are error-prone during deformation; (ii) thousands of characters with various shapes need to be synthesized based on a few manually designed characters. To solve those issues, we propose a novel convolutional-neural-network-based model with three main techniques: collaborative stroke refinement, using a collaborative training strategy to recover the missing or broken strokes; online zoom-augmentation, taking advantage of the content-reuse phenomenon to reduce the size of the training set; and adaptive pre-deformation, standardizing and aligning the characters. The proposed model needs only 750 paired training samples; no pre-trained network, extra dataset resource, or labels are needed. Experimental results show that the proposed method significantly outperforms the state-of-the-art methods under the practical restriction on handwritten font synthesis.
Tasks
Published 2019-04-30
URL https://arxiv.org/abs/1904.13268v3
PDF https://arxiv.org/pdf/1904.13268v3.pdf
PWC https://paperswithcode.com/paper/handwritten-chinese-font-generation-with
Repo
Framework

C. H. Robinson Uses Heuristics to Solve Rich Vehicle Routing Problems

Title C. H. Robinson Uses Heuristics to Solve Rich Vehicle Routing Problems
Authors Ehsan Khodabandeh, Lawrence V. Snyder, John Dennis, Joshua Hammond, Cody Wanless
Abstract We consider a wide family of vehicle routing problem variants with many complex and practical constraints, known as rich vehicle routing problems, which are faced on a daily basis by C.H. Robinson (CHR). Since CHR has many customers, each with distinct requirements, various routing problems with different objectives and constraints must be solved. We propose a set partitioning framework with a number of route generation algorithms, which have been shown to be effective in solving a variety of different problems. The proposed algorithms have outperformed the existing technologies at CHR on 10 benchmark instances and have since been embedded into the company’s transportation planning and execution technology platform.
Tasks
Published 2019-12-31
URL https://arxiv.org/abs/1912.13157v1
PDF https://arxiv.org/pdf/1912.13157v1.pdf
PWC https://paperswithcode.com/paper/c-h-robinson-uses-heuristics-to-solve-rich
Repo
Framework
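
In a set partitioning formulation of vehicle routing such as the one named in the abstract above, route generation produces a pool of candidate routes with costs, and the master problem selects a subset so that every customer is served exactly once at minimum total cost. The sketch below solves a tiny instance by brute force to show the structure of the selection step. The routes, costs, and function name are hypothetical, and a real system would use an integer programming solver or heuristics rather than enumeration.

```python
from itertools import combinations

def best_partition(customers, routes):
    """Pick the cheapest set of routes covering every customer exactly once.

    `routes` maps a route name to (set of customers served, cost).
    Returns (total_cost, tuple_of_route_names), or (None, None) if infeasible.
    """
    best_cost, best_choice = None, None
    names = sorted(routes)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            covered = [c for name in combo for c in routes[name][0]]
            # Exactly-once coverage: no customer repeated and none missing.
            if len(covered) == len(customers) and set(covered) == set(customers):
                cost = sum(routes[name][1] for name in combo)
                if best_cost is None or cost < best_cost:
                    best_cost, best_choice = cost, combo
    return best_cost, best_choice

# Hypothetical candidate routes, as if produced by a route generation heuristic.
routes = {
    "R1": ({"A", "B"}, 10),
    "R2": ({"C"}, 4),
    "R3": ({"A"}, 6),
    "R4": ({"B", "C"}, 7),
    "R5": ({"A", "B", "C"}, 15),
}
print(best_partition({"A", "B", "C"}, routes))  # (13, ('R3', 'R4'))
```

Separating route generation from route selection is what lets one framework handle many problem variants: customer-specific constraints only affect which routes enter the pool, not the selection model.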

Toward Understanding The Effect Of Loss function On Then Performance Of Knowledge Graph Embedding

Title Toward Understanding The Effect Of Loss function On Then Performance Of Knowledge Graph Embedding
Authors Mojtaba Nayyeri, Chengjin Xu, Yadollah Yaghoobzadeh, Hamed Shariat Yazdi, Jens Lehmann
Abstract Knowledge graphs (KGs) represent the world’s facts in structured forms. KG completion exploits the existing facts in a KG to discover new ones. The translation-based embedding model (TransE) is a prominent formulation for KG completion. Despite the efficiency of TransE in memory and time, it suffers from several limitations in encoding relation patterns such as symmetric and reflexive relations. To resolve this problem, most attempts have revolved around revising the score function of TransE, i.e., proposing a more complicated score function such as Trans(A, D, G, H, R, etc.) to mitigate the limitations. In this paper, we tackle this problem from a different perspective. We show that existing theories corresponding to the limitations of TransE are inaccurate because they ignore the effect of the loss function. Accordingly, we pose theoretical investigations of the main limitations of TransE in light of the loss function. To the best of our knowledge, this has not been investigated comprehensively so far. We show that by a proper selection of the loss function for training the TransE model, the main limitations of the model are mitigated. This is explained by setting an upper bound for the scores of positive samples, showing the region of truth (i.e., the region in which a triple is considered positive by the model). Our theoretical proofs with experimental results fill the gap between the capability of the translation-based class of embedding models and the loss function. The theories emphasise the importance of the selection of the loss functions for training the models. Our experimental evaluations on different loss functions used for training the models justify our theoretical proofs and confirm the importance of the loss functions for the performance.
Tasks Graph Embedding, Knowledge Graph Completion, Knowledge Graph Embedding, Knowledge Graphs
Published 2019-09-02
URL https://arxiv.org/abs/1909.00519v2
PDF https://arxiv.org/pdf/1909.00519v2.pdf
PWC https://paperswithcode.com/paper/on-the-knowledge-graph-completion-using
Repo
Framework
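
For context on the loss function abstract above: TransE scores a triple (h, r, t) by the distance between h + r and t, with lower scores meaning more plausible. The sketch below shows that score together with two loss shapes. The margin ranking loss only constrains the gap between positive and negative scores, while the second loss places an explicit upper bound on positive scores, which is the kind of constraint the abstract describes. The exact form and the threshold values of the second loss are assumptions for illustration, not taken from the paper.

```python
import math

def transe_score(h, r, t):
    """TransE score: Euclidean distance between (h + r) and t."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def margin_ranking_loss(pos_score, neg_score, margin=1.0):
    """Penalize a negative triple that is not at least `margin` worse than the positive."""
    return max(0.0, margin + pos_score - neg_score)

def bounded_loss(pos_score, neg_score, upper_pos=0.5, lower_neg=1.5):
    """Assumed illustrative loss: positives must score below `upper_pos`,
    negatives above `lower_neg` (both thresholds are hypothetical)."""
    return max(0.0, pos_score - upper_pos) + max(0.0, lower_neg - neg_score)

# Toy 2-D embeddings: the true tail equals h + r, the corrupted tail does not.
h, r = [0.0, 0.0], [1.0, 0.0]
t_true, t_corrupt = [1.0, 0.0], [0.0, 1.0]

pos = transe_score(h, r, t_true)     # 0.0
neg = transe_score(h, r, t_corrupt)  # sqrt(2)
print(margin_ranking_loss(pos, neg))  # 0.0
print(round(bounded_loss(pos, neg), 4))
```

With the bounded form, a positive triple does not need a score of exactly zero to be accepted; any score under the threshold lies in the region treated as true, which is what relaxes the constraints on relation vectors.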

Composing Knowledge Graph Embeddings via Word Embeddings

Title Composing Knowledge Graph Embeddings via Word Embeddings
Authors Lianbo Ma, Peng Sun, Zhiwei Lin, Hui Wang
Abstract Learning knowledge graph embeddings from an existing knowledge graph is very important to knowledge graph completion. For a fact $(h,r,t)$ with the head entity $h$ having a relation $r$ with the tail entity $t$, the current approaches aim to learn low-dimensional representations $(\mathbf{h},\mathbf{r},\mathbf{t})$, each of which corresponds to the respective element of $(h, r, t)$. As $(\mathbf{h},\mathbf{r},\mathbf{t})$ is learned from the existing facts within a knowledge graph, these representations cannot be used to detect unknown facts (if the entities or relations never occur in the knowledge graph). This paper proposes a new approach called TransW, aiming to go beyond the current work by composing knowledge graph embeddings using word embeddings. Given that an entity or a relation quite often contains one or more words, it is sensible to learn a mapping function from word embedding spaces to knowledge embedding spaces, which shows how entities are constructed using human words. More importantly, composing knowledge embeddings using word embeddings makes it possible to deal with emerging new facts (either new entities or relations). Experimental results using three public datasets show the consistency and superior performance of the proposed TransW.
Tasks Graph Embedding, Knowledge Graph Completion, Knowledge Graph Embedding, Knowledge Graph Embeddings, Word Embeddings
Published 2019-09-09
URL https://arxiv.org/abs/1909.03794v1
PDF https://arxiv.org/pdf/1909.03794v1.pdf
PWC https://paperswithcode.com/paper/composing-knowledge-graph-embeddings-via-word
Repo
Framework