January 28, 2020

Paper Group ANR 911

On the Use/Misuse of the Term ‘Phoneme’


Title	On the Use/Misuse of the Term ‘Phoneme’
Authors	Roger K. Moore, Lucy Skidmore
Abstract	The term ‘phoneme’ lies at the heart of speech science and technology, and yet it is not clear that the research community fully appreciates its meaning and implications. In particular, it is suspected that many researchers use the term in a casual sense to refer to the sounds of speech, rather than as a well-defined abstract concept. If true, this means that some sections of the community may be missing an opportunity to understand and exploit the implications of this important psychological phenomenon. Here we review the correct meaning of the term ‘phoneme’ and report the results of an investigation into its use/misuse in the accepted papers at INTERSPEECH-2018. It is confirmed that a significant proportion of the community (i) may not be aware of the critical difference between ‘phonetic’ and ‘phonemic’ levels of description, (ii) may not fully understand the significance of ‘phonemic contrast’, and as a consequence, (iii) consistently misuse the term ‘phoneme’. These findings are discussed, and recommendations are made as to how this situation might be mitigated.
Tasks
Published	2019-07-26
URL	https://arxiv.org/abs/1907.11640v1
PDF	https://arxiv.org/pdf/1907.11640v1.pdf
PWC	https://paperswithcode.com/paper/on-the-usemisuse-of-the-term-phoneme
Repo
Framework

COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis


Title	COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis
Authors	Yansong Tang, Dajun Ding, Yongming Rao, Yu Zheng, Danyang Zhang, Lili Zhao, Jiwen Lu, Jie Zhou
Abstract	There is a substantial number of instructional videos on the Internet, which enables us to acquire knowledge for completing various tasks. However, most existing datasets for instructional video analysis have limitations in diversity and scale, which makes them far from many real-world applications where more diverse activities occur. Moreover, it still remains a great challenge to organize and harness such data. To address these problems, we introduce a large-scale dataset called “COIN” for COmprehensive INstructional video analysis. Organized with a hierarchical structure, the COIN dataset contains 11,827 videos of 180 tasks in 12 domains (e.g., vehicles, gadgets, etc.) related to our daily life. With a newly developed toolbox, all the videos are annotated effectively with a series of step descriptions and the corresponding temporal boundaries. Furthermore, we propose a simple yet effective method to capture the dependencies among different steps, which can be easily plugged into conventional proposal-based action detection methods for localizing important steps in instructional videos. In order to provide a benchmark for instructional video analysis, we evaluate plenty of approaches on the COIN dataset under different evaluation criteria. We expect the introduction of the COIN dataset will promote future in-depth research on instructional video analysis for the community.
Tasks	Action Detection
Published	2019-03-07
URL	http://arxiv.org/abs/1903.02874v1
PDF	http://arxiv.org/pdf/1903.02874v1.pdf
PWC	https://paperswithcode.com/paper/coin-a-large-scale-dataset-for-comprehensive
Repo
Framework

Nonlinear Discovery of Slow Molecular Modes using State-Free Reversible VAMPnets


Title	Nonlinear Discovery of Slow Molecular Modes using State-Free Reversible VAMPnets
Authors	Wei Chen, Hythem Sidky, Andrew L Ferguson
Abstract	The success of enhanced sampling molecular simulations that accelerate along collective variables (CVs) is predicated on the availability of variables coincident with the slow collective motions governing the long-time conformational dynamics of a system. It is challenging to intuit these slow CVs for all but the simplest molecular systems, and their data-driven discovery directly from molecular simulation trajectories has been a central focus of the molecular simulation community to both unveil the important physical mechanisms and to drive enhanced sampling. In this work, we introduce state-free reversible VAMPnets (SRV) as a deep learning architecture that learns nonlinear CV approximants to the leading slow eigenfunctions of the spectral decomposition of the transfer operator that evolves equilibrium-scaled probability distributions through time. Orthogonality of the learned CVs is naturally imposed within network training without added regularization. The CVs are inherently explicit and differentiable functions of the input coordinates making them well-suited to use in enhanced sampling calculations. We demonstrate the utility of SRVs in capturing parsimonious nonlinear representations of complex system dynamics in applications to 1D and 2D toy systems where the true eigenfunctions are exactly calculable and to molecular dynamics simulations of alanine dipeptide and the WW domain protein.
Tasks
Published	2019-02-09
URL	https://arxiv.org/abs/1902.03336v2
PDF	https://arxiv.org/pdf/1902.03336v2.pdf
PWC	https://paperswithcode.com/paper/nonlinear-discovery-of-slow-molecular-modes
Repo
Framework

Evolutionary reinforcement learning of dynamical large deviations


Title	Evolutionary reinforcement learning of dynamical large deviations
Authors	Stephen Whitelam, Daniel Jacobson, Isaac Tamblyn
Abstract	We show how to calculate the likelihood of dynamical large deviations using evolutionary reinforcement learning. An agent, a stochastic model, propagates a continuous-time Monte Carlo trajectory and receives a reward conditioned upon the values of certain path-extensive quantities. Evolution produces progressively fitter agents, eventually allowing the calculation of a piece of a large-deviation rate function for a particular model and path-extensive quantity. For models with small state spaces the evolutionary process acts directly on rates, and for models with large state spaces the process acts on the weights of a neural network that parameterizes the model’s rates. This approach shows how path-extensive physics problems can be considered within a framework widely used in machine learning.
Tasks
Published	2019-09-02
URL	https://arxiv.org/abs/1909.00835v4
PDF	https://arxiv.org/pdf/1909.00835v4.pdf
PWC	https://paperswithcode.com/paper/evolutionary-reinforcement-learning-of
Repo
Framework

Visual Agreement Regularized Training for Multi-Modal Machine Translation


Title	Visual Agreement Regularized Training for Multi-Modal Machine Translation
Authors	Pengcheng Yang, Boxing Chen, Pei Zhang, Xu Sun
Abstract	Multi-modal machine translation aims at translating the source sentence into a different language in the presence of the paired image. Previous work suggests that additional visual information only provides dispensable help to translation, which is needed in several very special cases such as translating ambiguous words. To make better use of visual information, this work presents visual agreement regularized training. The proposed approach jointly trains the source-to-target and target-to-source translation models and encourages them to share the same focus on the visual information when generating semantically equivalent visual words (e.g. “ball” in English and “ballon” in French). Besides, a simple yet effective multi-head co-attention model is also introduced to capture interactions between visual and textual features. The results show that our approaches can outperform competitive baselines by a large margin on the Multi30k dataset. Further analysis demonstrates that the proposed regularized training can effectively improve the agreement of attention on the image, leading to better use of visual information.
Tasks	Machine Translation
Published	2019-12-27
URL	https://arxiv.org/abs/1912.12014v1
PDF	https://arxiv.org/pdf/1912.12014v1.pdf
PWC	https://paperswithcode.com/paper/visual-agreement-regularized-training-for
Repo
Framework

Simple 1-D Convolutional Networks for Resting-State fMRI Based Classification in Autism


Title	Simple 1-D Convolutional Networks for Resting-State fMRI Based Classification in Autism
Authors	Ahmed El Gazzar, Leonardo Cerliani, Guido van Wingen, Rajat Mani Thomas
Abstract	Deep learning methods are increasingly being used with neuroimaging data like structural and functional magnetic resonance imaging (MRI) to predict the diagnosis of neuropsychiatric and neurological disorders. For psychiatric disorders in particular, it is believed that one of the most promising modalities is resting-state functional MRI (rsfMRI), which captures the intrinsic connectivity between regions in the brain. Because rsfMRI data points are inherently high-dimensional (~1M), it is impossible to process the entire input in its raw form. In this paper, we propose a very simple transformation of the rsfMRI images that captures all of the temporal dynamics of the signal but sub-samples its spatial extent. As a result, we use a very simple 1-D convolutional network which is fast to train, requires minimal preprocessing and performs on par with the state-of-the-art on the classification of Autism spectrum disorders.
Tasks
Published	2019-07-02
URL	https://arxiv.org/abs/1907.01288v1
PDF	https://arxiv.org/pdf/1907.01288v1.pdf
PWC	https://paperswithcode.com/paper/simple-1-d-convolutional-networks-for-resting
Repo
Framework

Group Re-Identification with Multi-grained Matching and Integration


Title	Group Re-Identification with Multi-grained Matching and Integration
Authors	Weiyao Lin, Yuxi Li, Hao Xiao, John See, Junni Zou, Hongkai Xiong, Jingdong Wang, Tao Mei
Abstract	The task of re-identifying groups of people under different camera views is an important yet less-studied problem. Group re-identification (Re-ID) is a very challenging task since it is not only adversely affected by common issues in traditional single object Re-ID problems such as viewpoint and human pose variations, but it also suffers from changes in group layout and group membership. In this paper, we propose a novel concept of group granularity by characterizing a group image by multi-grained objects: individual persons and sub-groups of two and three people within a group. To achieve robust group Re-ID, we first introduce multi-grained representations which can be extracted via the development of two separate schemes, i.e. one with hand-crafted descriptors and another with deep neural networks. The proposed representation seeks to characterize both appearance and spatial relations of multi-grained objects, and is further equipped with importance weights which capture variations in intra-group dynamics. Optimal group-wise matching is facilitated by a multi-order matching process which, in turn, dynamically updates the importance weights in iterative fashion. We evaluated on three multi-camera group datasets containing complex scenarios and large dynamics, with experimental results demonstrating the effectiveness of our approach. The published dataset can be found at http://min.sjtu.edu.cn/lwydemo/GroupReID.html
Tasks
Published	2019-05-17
URL	https://arxiv.org/abs/1905.07108v2
PDF	https://arxiv.org/pdf/1905.07108v2.pdf
PWC	https://paperswithcode.com/paper/group-re-identification-with-multi-grained
Repo
Framework

Signal-to-Noise Ratio: A Robust Distance Metric for Deep Metric Learning


Title	Signal-to-Noise Ratio: A Robust Distance Metric for Deep Metric Learning
Authors	Tongtong Yuan, Weihong Deng, Jian Tang, Yinan Tang, Binghui Chen
Abstract	Deep metric learning, which learns discriminative features to process image clustering and retrieval tasks, has attracted extensive attention in recent years. A number of deep metric learning methods, which ensure that similar examples are mapped close to each other and dissimilar examples are mapped farther apart, have been proposed to construct effective structures for loss functions and have shown promising results. In this paper, different from the approaches on learning the loss structures, we propose a robust SNR distance metric based on Signal-to-Noise Ratio (SNR) for measuring the similarity of image pairs for deep metric learning. By exploring the properties of our SNR distance metric from the view of geometry space and statistical theory, we show that it can preserve the semantic similarity between image pairs, which justifies its suitability for deep metric learning. Compared with the Euclidean distance metric, our SNR distance metric can further jointly reduce the intra-class distances and enlarge the inter-class distances for learned features. Leveraging our SNR distance metric, we propose Deep SNR-based Metric Learning (DSML) to generate discriminative feature embeddings. By extensive experiments on three widely adopted benchmarks, including CARS196, CUB200-2011 and CIFAR10, our DSML has shown its superiority over other state-of-the-art methods. Additionally, we extend our SNR distance metric to deep hashing learning, and conduct experiments on two benchmarks, including CIFAR10 and NUS-WIDE, to demonstrate the effectiveness and generality of our SNR distance metric.
Tasks	Image Clustering, Metric Learning, Semantic Similarity, Semantic Textual Similarity
Published	2019-04-04
URL	http://arxiv.org/abs/1904.02616v1
PDF	http://arxiv.org/pdf/1904.02616v1.pdf
PWC	https://paperswithcode.com/paper/signal-to-noise-ratio-a-robust-distance
Repo
Framework
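
The abstract above does not spell out the SNR distance itself. A minimal sketch, assuming the distance is the ratio of the variance of the "noise" (the element-wise difference between two embeddings) to the variance of the "signal" (the anchor embedding); the function name `snr_distance` is hypothetical and this is an illustration, not the authors' code:

```python
from statistics import pvariance


def snr_distance(anchor, other):
    """Noise-to-signal variance ratio between two embedding vectors.

    Assumption (not stated in the abstract): the anchor is treated as the
    signal and (other - anchor) as the noise.
    """
    anchor = [float(a) for a in anchor]
    other = [float(o) for o in other]
    if len(anchor) != len(other):
        raise ValueError("embeddings must have the same length")
    signal_var = pvariance(anchor)
    if signal_var == 0.0:
        raise ValueError("anchor embedding has zero variance")
    noise = [o - a for a, o in zip(anchor, other)]
    return pvariance(noise) / signal_var


same = snr_distance([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
forward = snr_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
backward = snr_distance([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

Under this assumed definition the measure is not symmetric: swapping anchor and comparison changes the denominator, so `forward` and `backward` differ.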

Challenges and Pitfalls of Machine Learning Evaluation and Benchmarking


Title	Challenges and Pitfalls of Machine Learning Evaluation and Benchmarking
Authors	Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-mei Hwu
Abstract	An increasingly complex and diverse collection of Machine Learning (ML) models as well as hardware/software stacks, collectively referred to as “ML artifacts”, are being proposed - leading to a diverse landscape of ML. These ML innovations proposed have outpaced researchers’ ability to analyze, study and adapt them. This is exacerbated by the complicated and sometimes non-reproducible procedures for ML evaluation. A common practice of sharing ML artifacts is through repositories where artifact authors post ad-hoc code and some documentation, but often fail to reveal critical information for others to reproduce their results. This results in users’ inability to compare with artifact authors’ claims or adapt the model to his/her own use. This paper discusses common challenges and pitfalls of ML evaluation and benchmarking, which can be used as a guideline for ML model authors when sharing ML artifacts, and for system developers when benchmarking or designing ML systems.
Tasks
Published	2019-04-29
URL	https://arxiv.org/abs/1904.12437v2
PDF	https://arxiv.org/pdf/1904.12437v2.pdf
PWC	https://paperswithcode.com/paper/challenges-and-pitfalls-of-reproducing
Repo
Framework

Characterizing the Decision Boundary of Deep Neural Networks


Title	Characterizing the Decision Boundary of Deep Neural Networks
Authors	Hamid Karimi, Tyler Derr, Jiliang Tang
Abstract	Deep neural networks, and in particular deep neural classifiers, have become an integral part of many modern applications. Despite their practical success, we still have limited knowledge of how they work and the demand for such an understanding is ever-growing. In this regard, one crucial aspect of deep neural network classifiers that can help us deepen our knowledge about their decision-making behavior is to investigate their decision boundaries. Nevertheless, this is contingent upon having access to samples populating the areas near the decision boundary. To achieve this, we propose a novel approach we call Deep Decision boundary Instance Generation (DeepDIG). DeepDIG utilizes a method based on adversarial example generation as an effective way of generating samples near the decision boundary of any deep neural network model. Then, we introduce a set of important principled characteristics that take advantage of the generated instances near the decision boundary to provide multifaceted understandings of deep neural networks. We have performed extensive experiments on multiple representative datasets across various deep neural network models and characterized their decision boundaries.
Tasks	Decision Making
Published	2019-12-24
URL	https://arxiv.org/abs/1912.11460v2
PDF	https://arxiv.org/pdf/1912.11460v2.pdf
PWC	https://paperswithcode.com/paper/characterizing-the-decision-boundary-of-deep
Repo
Framework

An artificial life approach to studying niche differentiation in soundscape ecology


Title	An artificial life approach to studying niche differentiation in soundscape ecology
Authors	David Kadish, Sebastian Risi, Laura Beloff
Abstract	Artificial life simulations are an important tool in the study of ecological phenomena that can be difficult to examine directly in natural environments. Recent work has established the soundscape as an ecologically important resource and it has been proposed that the differentiation of animal vocalizations within a soundscape is driven by the imperative of intraspecies communication. The experiments in this paper test that hypothesis in a simulated soundscape in order to verify the feasibility of intraspecies communication as a driver of acoustic niche differentiation. The impact of intraspecies communication is found to be a significant factor in the division of a soundscape’s frequency spectrum when compared to simulations where the need to identify signals from conspecifics does not drive the evolution of signalling. The method of simulating the effects of interspecies interactions on the soundscape is positioned as a tool for developing artificial life agents that can inhabit and interact with physical ecosystems and soundscapes.
Tasks	Artificial Life
Published	2019-07-30
URL	https://arxiv.org/abs/1907.12812v1
PDF	https://arxiv.org/pdf/1907.12812v1.pdf
PWC	https://paperswithcode.com/paper/an-artifcial-life-approach-to-studying-niche
Repo
Framework

Innovating HR Using an Expert System for Recruiting IT Specialists – ESRIT


Title	Innovating HR Using an Expert System for Recruiting IT Specialists – ESRIT
Authors	Ciprian-Octavian Truică, Adriana Barnoschi
Abstract	One of the most rapidly evolving and dynamic business sectors is the IT domain, where there is a problem finding experienced, skilled and qualified employees. Specialists are essential for developing and implementing new ideas into products. The human resources (HR) department plays a major role in the recruitment of qualified employees by assessing their skills, using different HR metrics, and selecting the best candidates for a specific job. Most recruiters are not qualified to evaluate IT specialists. In order to decrease the gap between the HR department and IT specialists, we designed, implemented and tested an Expert System for Recruiting IT Specialists (ESRIT). The expert system uses text mining, natural language processing, and classification algorithms to extract relevant information from resumes by using a knowledge base that stores the relevant key skills and phrases. The recruiter is looking for the same abilities and certificates, trying to place the best applicant into a specific position. The article presents a developing picture of the top major IT skills that will be required in 2014 and it argues for the choice of the IT abilities domain.
Tasks
Published	2019-06-11
URL	https://arxiv.org/abs/1906.04915v1
PDF	https://arxiv.org/pdf/1906.04915v1.pdf
PWC	https://paperswithcode.com/paper/innovating-hr-using-an-expert-system-for
Repo
Framework

Minimum Description Length Revisited


Title	Minimum Description Length Revisited
Authors	Peter Grünwald, Teemu Roos
Abstract	This is an up-to-date introduction to and overview of the Minimum Description Length (MDL) Principle, a theory of inductive inference that can be applied to general problems in statistics, machine learning and pattern recognition. While MDL was originally based on data compression ideas, this introduction can be read without any knowledge thereof. It takes into account all major developments since 2007, the last time an extensive overview was written. These include new methods for model selection and averaging and hypothesis testing, as well as the first completely general definition of MDL estimators. Incorporating these developments, MDL can be seen as a powerful extension of both penalized likelihood and Bayesian approaches, in which penalization functions and prior distributions are replaced by more general luckiness functions, average-case methodology is replaced by a more robust worst-case approach, and in which methods classically viewed as highly distinct, such as AIC vs BIC and cross-validation vs Bayes can, to a large extent, be viewed from a unified perspective.
Tasks	Model Selection
Published	2019-08-21
URL	https://arxiv.org/abs/1908.08484v2
PDF	https://arxiv.org/pdf/1908.08484v2.pdf
PWC	https://paperswithcode.com/paper/minimum-description-length-revisited
Repo
Framework

Neural Turtle Graphics for Modeling City Road Layouts


Title	Neural Turtle Graphics for Modeling City Road Layouts
Authors	Hang Chu, Daiqing Li, David Acuna, Amlan Kar, Maria Shugrina, Xinkai Wei, Ming-Yu Liu, Antonio Torralba, Sanja Fidler
Abstract	We propose Neural Turtle Graphics (NTG), a novel generative model for spatial graphs, and demonstrate its applications in modeling city road layouts. Specifically, we represent the road layout using a graph where nodes in the graph represent control points and edges in the graph represent road segments. NTG is a sequential generative model parameterized by a neural network. It iteratively generates a new node and an edge connecting to an existing node conditioned on the current graph. We train NTG on Open Street Map data and show that it outperforms existing approaches using a set of diverse performance metrics. Moreover, our method allows users to control styles of generated road layouts mimicking existing cities as well as to sketch parts of the city road layout to be synthesized. In addition to synthesis, the proposed NTG finds uses in an analytical task of aerial road parsing. Experimental results show that it achieves state-of-the-art performance on the SpaceNet dataset.
Tasks
Published	2019-10-04
URL	https://arxiv.org/abs/1910.02055v1
PDF	https://arxiv.org/pdf/1910.02055v1.pdf
PWC	https://paperswithcode.com/paper/neural-turtle-graphics-for-modeling-city-road
Repo
Framework
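
The graph representation and the iterative generation step described in the NTG abstract can be sketched as follows. This is a toy illustration only: `RoadGraph`, `grow`, and `toy_proposal` are hypothetical names, and the fixed proposal rule stands in for the neural network that the paper trains; nothing here reproduces the authors' model.

```python
from dataclasses import dataclass, field


@dataclass
class RoadGraph:
    # Nodes are (x, y) control points; edges are (i, j) index pairs,
    # each representing one road segment between two control points.
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def add_node(self, xy):
        self.nodes.append(xy)
        return len(self.nodes) - 1

    def grow(self, parent, offset):
        """Add one new node and one edge connecting it to an existing node."""
        px, py = self.nodes[parent]
        child = self.add_node((px + offset[0], py + offset[1]))
        self.edges.append((parent, child))
        return child


def toy_proposal(graph):
    """Hypothetical stand-in for the learned model: conditioned on the
    current graph, pick the newest node and step one unit east."""
    return len(graph.nodes) - 1, (1.0, 0.0)


graph = RoadGraph()
graph.add_node((0.0, 0.0))          # seed control point
for _ in range(3):                  # three sequential generation steps
    parent, offset = toy_proposal(graph)
    graph.grow(parent, offset)
```

Each loop iteration mirrors the step the abstract describes: a proposal conditioned on the current graph yields a new node plus an edge to an existing node.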

DocParser: Hierarchical Structure Parsing of Document Renderings


Title	DocParser: Hierarchical Structure Parsing of Document Renderings
Authors	Johannes Rausch, Octavio Martinez, Fabian Bissig, Ce Zhang, Stefan Feuerriegel
Abstract	Translating document renderings (e.g. PDFs, scans) into hierarchical structures is extensively demanded in the daily routines of many real-world applications, and is often a prerequisite step of many downstream NLP tasks. Earlier attempts focused on different but simpler tasks such as the detection of table or cell locations within documents; however, a holistic, principled approach to inferring the complete hierarchical structure in documents is missing. As a remedy, we developed “DocParser”: an end-to-end system for parsing the complete document structure - including all text elements, figures, tables, and table cell structures. To the best of our knowledge, DocParser is the first system that derives the full hierarchical document compositions. Given the complexity of the task, annotating appropriate datasets is costly. Therefore, our second contribution is to provide a dataset for evaluating hierarchical document structure parsing. Our third contribution is to propose a scalable learning framework for settings where domain-specific data is scarce, which we address by a novel approach to weak supervision. Our computational experiments confirm the effectiveness of our proposed weak supervision: Compared to the baseline without weak supervision, it improves the mean average precision for detecting document entities by 37.1%. When classifying hierarchical relations between entity pairs, it improves the F1 score by 27.6%.
Tasks
Published	2019-11-05
URL	https://arxiv.org/abs/1911.01702v1
PDF	https://arxiv.org/pdf/1911.01702v1.pdf
PWC	https://paperswithcode.com/paper/docparser-hierarchical-structure-parsing-of
Repo
Framework