January 27, 2020

Paper Group ANR 1243

Slim LSTM networks: LSTM_6 and LSTM_C6

Title Slim LSTM networks: LSTM_6 and LSTM_C6
Authors Atra Akandeh, Fathi M. Salem
Abstract We have shown previously that our parameter-reduced variants of Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNN) are comparable in performance to the standard LSTM RNN on the MNIST dataset. In this study, we show that this is also the case for two diverse benchmark datasets, namely, the review sentiment IMDB and the 20 Newsgroup datasets. Specifically, we focus on two of the simplest variants, namely LSTM_6 (i.e., standard LSTM with three constant fixed gates) and LSTM_C6 (i.e., LSTM_6 with further reduced cell body input block). We demonstrate that these two aggressively reduced-parameter variants are competitive with the standard LSTM when hyper-parameters, e.g., learning parameter, number of hidden units and gate constants are set properly. These architectures enable speeding up training computations and hence, these networks would be more suitable for online training and inference onto portable devices with relatively limited computational resources.
Tasks
Published 2019-01-18
URL http://arxiv.org/abs/1901.06401v1
PDF http://arxiv.org/pdf/1901.06401v1.pdf
PWC https://paperswithcode.com/paper/slim-lstm-networks-lstm_6-and-lstm_c6
Repo
Framework
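
As a rough illustration of the parameter saving described in the abstract above, the sketch below assumes that a standard LSTM has four weight blocks (three gates plus the cell input block) and that LSTM_6 keeps only the cell input block while replacing the three gates with fixed scalars. The function names, gate constants, and shapes are illustrative assumptions, not the authors' code, and the further reduction used in LSTM_C6 is not modelled.

```python
import numpy as np

def lstm_param_count(input_dim, hidden_dim):
    # Standard LSTM: 4 blocks (input, forget, output gates + cell input),
    # each with input weights, recurrent weights, and a bias vector.
    return 4 * (hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim)

def lstm6_param_count(input_dim, hidden_dim):
    # Assumed LSTM_6: the gates are constants, so only the cell input block
    # carries trainable parameters.
    return hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim

def lstm6_step(x, h_prev, c_prev, W, U, b, i_gate=1.0, f_gate=0.5, o_gate=1.0):
    # One recurrent step with constant scalar gates (the values are hypothetical).
    c_tilde = np.tanh(W @ x + U @ h_prev + b)   # candidate cell input
    c = f_gate * c_prev + i_gate * c_tilde      # fixed forget/input gating
    h = o_gate * np.tanh(c)                     # fixed output gating
    return h, c

if __name__ == "__main__":
    full = lstm_param_count(28, 100)
    slim = lstm6_param_count(28, 100)
    print(f"standard: {full}, slim: {slim}, ratio: {full / slim:.1f}")
```

Under these assumptions the slim variant has a quarter of the recurrent-layer parameters, which is the source of the training speed-up the abstract refers to.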

Sound Event Detection in Multichannel Audio using Convolutional Time-Frequency-Channel Squeeze and Excitation

Title Sound Event Detection in Multichannel Audio using Convolutional Time-Frequency-Channel Squeeze and Excitation
Authors Wei Xia, Kazuhito Koishida
Abstract In this study, we introduce a convolutional time-frequency-channel “Squeeze and Excitation” (tfc-SE) module to explicitly model inter-dependencies between the time-frequency domain and multiple channels. The tfc-SE module consists of two parts: a tf-SE block and a c-SE block, which are designed to provide attention on the time-frequency and channel domains, respectively, for adaptively recalibrating the input feature map. The proposed tfc-SE module, together with a popular Convolutional Recurrent Neural Network (CRNN) model, is evaluated on a multi-channel sound event detection task with overlapping audio sources: the training and test data are synthesized TUT Sound Events 2018 datasets, recorded with microphone arrays. We show that the tfc-SE module can be incorporated into the CRNN model at a small additional computational cost and brings significant improvements in sound event detection accuracy. We also perform detailed ablation studies by analyzing various factors that may influence the performance of the SE blocks. We show that with the best tfc-SE block, the error rate (ER) decreases from 0.2538 to 0.2026, a relative ER reduction of 20.17%, with a 5.72% improvement in F1 score. The results indicate that the learned acoustic embeddings with the tfc-SE module efficiently strengthen time-frequency and channel-wise feature representations to improve the discriminative performance.
Tasks Sound Event Detection
Published 2019-08-04
URL https://arxiv.org/abs/1908.01399v1
PDF https://arxiv.org/pdf/1908.01399v1.pdf
PWC https://paperswithcode.com/paper/sound-event-detection-in-multichannel-audio-1
Repo
Framework
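
The relative error-rate reduction quoted in the abstract above can be checked with a few lines of arithmetic. The helper name `relative_reduction` is an illustrative assumption; the two ER values are taken from the abstract.

```python
def relative_reduction(before, after):
    # Fraction by which a metric dropped, relative to its starting value.
    return (before - after) / before

er_before, er_after = 0.2538, 0.2026  # values quoted in the abstract
print(f"{100 * relative_reduction(er_before, er_after):.2f}%")  # prints 20.17%
```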

Neural ODEs with stochastic vector field mixtures

Title Neural ODEs with stochastic vector field mixtures
Authors Niall Twomey, Michał Kozłowski, Raúl Santos-Rodríguez
Abstract It was recently shown that neural ordinary differential equation models cannot solve fundamental and seemingly straightforward tasks even with high-capacity vector field representations. This paper introduces two other fundamental tasks to the set that baseline methods cannot solve, and proposes mixtures of stochastic vector fields as a model class that is capable of solving these essential problems. Dynamic vector field selection is of critical importance for our model, and our approach is to propagate component uncertainty over the integration interval with a technique based on forward filtering. We also formalise several loss functions that encourage desirable properties on the trajectory paths, and of particular interest are those that directly encourage fewer expected function evaluations. Experimentally, we demonstrate that our model class is capable of capturing the natural dynamics of human behaviour; a notoriously volatile application area. Baseline approaches cannot adequately model this problem.
Tasks
Published 2019-05-23
URL https://arxiv.org/abs/1905.09905v1
PDF https://arxiv.org/pdf/1905.09905v1.pdf
PWC https://paperswithcode.com/paper/neural-odes-with-stochastic-vector-field
Repo
Framework

Estimation of Body Mass Index from Photographs using Deep Convolutional Neural Networks

Title Estimation of Body Mass Index from Photographs using Deep Convolutional Neural Networks
Authors Adam Pantanowitz, Emmanuel Cohen, Philippe Gradidge, Nigel Crowther, Vered Aharonson, Benjamin Rosman, David M Rubin
Abstract Obesity is an important public health concern, and Body Mass Index is one of the most useful and widely used measures of it. We use Convolutional Neural Networks to determine Body Mass Index from photographs in a study with 161 participants. The scarcity of data, a common problem in medicine, is addressed by reducing the information in the photographs to silhouette images. The results show high correlation when tested on unseen data.
Tasks
Published 2019-08-29
URL https://arxiv.org/abs/1908.11694v1
PDF https://arxiv.org/pdf/1908.11694v1.pdf
PWC https://paperswithcode.com/paper/estimation-of-body-mass-index-from
Repo
Framework

Framework for Inferring Following Strategies from Time Series of Movement Data

Title Framework for Inferring Following Strategies from Time Series of Movement Data
Authors Chainarong Amornbunchornvej, Tanya Berger-Wolf
Abstract How do groups of individuals achieve consensus in movement decisions? Do individuals follow their friends, the one predetermined leader, or whomever just happens to be nearby? To address these questions computationally, we formalize “Coordination Strategy Inference Problem”. In this setting, a group of multiple individuals moves in a coordinated manner towards a target path. Each individual uses a specific strategy to follow others (e.g. nearest neighbors, pre-defined leaders, preferred friends). Given a set of time series that includes coordinated movement and a set of candidate strategies as inputs, we provide the first methodology (to the best of our knowledge) to infer whether each individual uses local-agreement-system or dictatorship-like strategy to achieve movement coordination at the group level. We evaluate and demonstrate the performance of the proposed framework by predicting the direction of movement of an individual in a group in both simulated datasets as well as two real-world datasets: a school of fish and a troop of baboons. Moreover, since there is no prior methodology for inferring individual-level strategies, we compare our framework with the state-of-the-art approach for the task of classification of group-level-coordination models. The results show that our approach is highly accurate in inferring the correct strategy in simulated datasets even in complicated mixed strategy settings, which no existing method can infer. In the task of classification of group-level-coordination models, our framework performs better than the state-of-the-art approach in all datasets. Animal data experiments show that fish, as expected, follow their neighbors, while baboons have a preference to follow specific individuals. Our methodology generalizes to arbitrary time series data of real numbers, beyond movement data.
Tasks Time Series
Published 2019-11-04
URL https://arxiv.org/abs/1911.01366v2
PDF https://arxiv.org/pdf/1911.01366v2.pdf
PWC https://paperswithcode.com/paper/inferring-coordination-strategies-from-time
Repo
Framework

A Hierarchical Approach for Visual Storytelling Using Image Description

Title A Hierarchical Approach for Visual Storytelling Using Image Description
Authors Md Sultan Al Nahian, Tasmia Tasrin, Sagar Gandhi, Ryan Gaines, Brent Harrison
Abstract One of the primary challenges of visual storytelling is developing techniques that can maintain the context of the story over long event sequences to generate human-like stories. In this paper, we propose a hierarchical deep learning architecture based on encoder-decoder networks to address this problem. To better help our network maintain this context while also generating long and diverse sentences, we incorporate natural language image descriptions along with the images themselves to generate each story sentence. We evaluate our system on the Visual Storytelling (VIST) dataset and show that our method outperforms state-of-the-art techniques on a suite of different automatic evaluation metrics. The empirical results from this evaluation demonstrate the necessity of the different components of our proposed architecture and show the effectiveness of the architecture for visual storytelling.
Tasks Visual Storytelling
Published 2019-09-26
URL https://arxiv.org/abs/1909.12401v1
PDF https://arxiv.org/pdf/1909.12401v1.pdf
PWC https://paperswithcode.com/paper/a-hierarchical-approach-for-visual
Repo
Framework

Explicit Cross-lingual Pre-training for Unsupervised Machine Translation

Title Explicit Cross-lingual Pre-training for Unsupervised Machine Translation
Authors Shuo Ren, Yu Wu, Shujie Liu, Ming Zhou, Shuai Ma
Abstract Pre-training has proven to be effective in unsupervised machine translation due to its ability to model deep context information in cross-lingual scenarios. However, the cross-lingual information obtained from shared BPE spaces is inexplicit and limited. In this paper, we propose a novel cross-lingual pre-training method for unsupervised machine translation by incorporating explicit cross-lingual training signals. Specifically, we first calculate cross-lingual n-gram embeddings and infer an n-gram translation table from them. With those n-gram translation pairs, we propose a new pre-training model called Cross-lingual Masked Language Model (CMLM), which randomly chooses source n-grams in the input text stream and predicts their translation candidates at each time step. Experiments show that our method can incorporate beneficial cross-lingual information into pre-trained models. Taking pre-trained CMLM models as the encoder and decoder, we significantly improve the performance of unsupervised machine translation.
Tasks Language Modelling, Machine Translation, Unsupervised Machine Translation
Published 2019-08-31
URL https://arxiv.org/abs/1909.00180v1
PDF https://arxiv.org/pdf/1909.00180v1.pdf
PWC https://paperswithcode.com/paper/explicit-cross-lingual-pre-training-for
Repo
Framework

Investigating Robustness and Interpretability of Link Prediction via Adversarial Modifications

Title Investigating Robustness and Interpretability of Link Prediction via Adversarial Modifications
Authors Pouya Pezeshkpour, Yifan Tian, Sameer Singh
Abstract Representing entities and relations in an embedding space is a well-studied approach for machine learning on relational data. Existing approaches, however, primarily focus on improving accuracy and overlook other aspects such as robustness and interpretability. In this paper, we propose adversarial modifications for link prediction models: identifying the fact to add into or remove from the knowledge graph that changes the prediction for a target fact after the model is retrained. Using these single modifications of the graph, we identify the most influential fact for a predicted link and evaluate the sensitivity of the model to the addition of fake facts. We introduce an efficient approach to estimate the effect of such modifications by approximating the change in the embeddings when the knowledge graph changes. To avoid the combinatorial search over all possible facts, we train a network to decode embeddings to their corresponding graph components, allowing the use of gradient-based optimization to identify the adversarial modification. We use these techniques to evaluate the robustness of link prediction models (by measuring sensitivity to additional facts), study interpretability through the facts most responsible for predictions (by identifying the most influential neighbors), and detect incorrect facts in the knowledge base.
Tasks Link Prediction
Published 2019-05-02
URL http://arxiv.org/abs/1905.00563v1
PDF http://arxiv.org/pdf/1905.00563v1.pdf
PWC https://paperswithcode.com/paper/investigating-robustness-and-interpretability
Repo
Framework

Informative Visual Storytelling with Cross-modal Rules

Title Informative Visual Storytelling with Cross-modal Rules
Authors Jiacheng Li, Haizhou Shi, Siliang Tang, Fei Wu, Yueting Zhuang
Abstract Existing methods in the Visual Storytelling field often suffer from the problem of generating general descriptions, while much meaningful content in the images goes unnoticed. The failure of informative story generation can be attributed to the model’s inability to capture enough meaningful concepts. The categories of these concepts include entities, attributes, actions, and events, which are in some cases crucial to grounded storytelling. To solve this problem, we propose a method to mine the cross-modal rules to help the model infer these informative concepts given certain visual input. We first build the multimodal transactions by concatenating the CNN activations and the word indices. Then we use the association rule mining algorithm to mine the cross-modal rules, which will be used for the concept inference. With the help of the cross-modal rules, the generated stories are more grounded and informative. Besides, our proposed method holds the advantages of interpretation, expandability, and transferability, indicating potential for wider application. Finally, we leverage these concepts in our encoder-decoder framework with the attention mechanism. We conduct several experiments on the VIsual StoryTelling (VIST) dataset, the results of which demonstrate the effectiveness of our approach in terms of both automatic metrics and human evaluation. Additional experiments are also conducted showing that our mined cross-modal rules as additional knowledge help the model gain better performance when trained on a small dataset.
Tasks Visual Storytelling
Published 2019-07-07
URL https://arxiv.org/abs/1907.03240v2
PDF https://arxiv.org/pdf/1907.03240v2.pdf
PWC https://paperswithcode.com/paper/informative-visual-storytelling-with-cross
Repo
Framework
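
The abstract above describes building multimodal transactions and mining cross-modal rules from them. The following is a minimal sketch of that idea, assuming each transaction is a set of string items where a `v:` prefix marks a (hypothetical) discretised visual activation and a `w:` prefix marks a word. It only mines one-to-one rules by support and confidence; a full association rule miner such as Apriori or FP-growth would also handle larger itemsets, and the paper's exact procedure is not specified here.

```python
def mine_rules(transactions, min_support=0.5, min_confidence=0.8):
    # Returns (visual_item, word_item, support, confidence) tuples for rules
    # of the assumed form "visual item -> word item".
    n = len(transactions)
    visual = {i for t in transactions for i in t if i.startswith("v:")}
    words = {i for t in transactions for i in t if i.startswith("w:")}
    rules = []
    for v in sorted(visual):
        v_count = sum(1 for t in transactions if v in t)
        for w in sorted(words):
            both = sum(1 for t in transactions if v in t and w in t)
            support = both / n           # fraction of transactions with both items
            confidence = both / v_count  # estimate of P(word | visual item)
            if support >= min_support and confidence >= min_confidence:
                rules.append((v, w, support, confidence))
    return rules

if __name__ == "__main__":
    # Toy transactions; the item names are made up for illustration.
    toy = [
        {"v:3", "w:beach"},
        {"v:3", "w:beach", "w:sun"},
        {"v:7", "w:dog"},
        {"v:3", "w:sun"},
    ]
    for rule in mine_rules(toy, min_support=0.25, min_confidence=0.9):
        print(rule)
```

At inference time, rules of this kind could be looked up by the visual items present in an image to propose candidate concepts for the decoder.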

Network Representation Learning: Consolidation and Renewed Bearing

Title Network Representation Learning: Consolidation and Renewed Bearing
Authors Saket Gurukar, Priyesh Vijayan, Aakash Srinivasan, Goonmeet Bajaj, Chen Cai, Moniba Keymanesh, Saravana Kumar, Pranav Maneriker, Anasua Mitra, Vedang Patel, Balaraman Ravindran, Srinivasan Parthasarathy
Abstract Graphs are a natural abstraction for many problems where nodes represent entities and edges represent a relationship across entities. An important area of research that has emerged over the last decade is the use of graphs as a vehicle for non-linear dimensionality reduction in a manner akin to previous efforts based on manifold learning with uses for downstream database processing, machine learning and visualization. In this systematic yet comprehensive experimental survey, we benchmark several popular network representation learning methods operating on two key tasks: link prediction and node classification. We examine the performance of 12 unsupervised embedding methods on 15 datasets. To the best of our knowledge, the scale of our study – both in terms of the number of methods and number of datasets – is the largest to date. Our results reveal several key insights about work-to-date in this space. First, we find that certain baseline methods (task-specific heuristics, as well as classic manifold methods) that have often been dismissed or are not considered by previous efforts can compete on certain types of datasets if they are tuned appropriately. Second, we find that recent methods based on matrix factorization offer a small but relatively consistent advantage over alternative methods (e.g., random-walk based methods) from a qualitative standpoint. Specifically, we find that MNMF, a community preserving embedding method, is the most competitive method for the link prediction task, while NetMF is the most competitive baseline for node classification. Third, no single method completely outperforms other embedding methods on both node classification and link prediction tasks. We also present several drill-down analyses that reveal settings under which certain algorithms perform well (e.g., the role of neighborhood context on performance) – guiding the end-user.
Tasks Dimensionality Reduction, Link Prediction, Node Classification, Representation Learning
Published 2019-05-02
URL https://arxiv.org/abs/1905.00987v2
PDF https://arxiv.org/pdf/1905.00987v2.pdf
PWC https://paperswithcode.com/paper/network-representation-learning-consolidation
Repo
Framework

Crime Rate Prediction with Region Risk and Movement Patterns

Title Crime Rate Prediction with Region Risk and Movement Patterns
Authors Shakila Khan Rumi, Phillip Luong, Flora D. Salim
Abstract The location-based social network, FourSquare, helps us to understand a city’s mass human mobility. It provides data that characterises the volume of movements across regions and Places of Interest (POIs) to explore the crime dynamics of a city. To fully exploit human movement in crime analysis, we propose the region risk factor which combines monthly aggregated crime and human movement of a region across different time intervals. We then derive a number of features using the region risk factor and conduct extensive experiments with real world data in multiple cities that verify the effectiveness of these features.
Tasks
Published 2019-07-25
URL https://arxiv.org/abs/1908.02570v1
PDF https://arxiv.org/pdf/1908.02570v1.pdf
PWC https://paperswithcode.com/paper/crime-rate-prediction-with-region-risk-and
Repo
Framework

Batch weight for domain adaptation with mass shift

Title Batch weight for domain adaptation with mass shift
Authors Mikołaj Bińkowski, R Devon Hjelm, Aaron Courville
Abstract Unsupervised domain transfer is the task of transferring or translating samples from a source distribution to a different target distribution. Current solutions to unsupervised domain transfer often operate on data on which the modes of the distribution are well-matched, for instance data with the same frequencies of classes between source and target distributions. However, these models do not perform well when the modes are not well-matched, as would be the case when samples are drawn independently from two different, but related, domains. This mode imbalance is problematic as generative adversarial networks (GANs), a successful approach in this setting, are sensitive to mode frequency, which results in a mismatch of semantics between source samples and generated samples of the target distribution. We propose a principled method of re-weighting training samples to correct for such mass shift between the transferred distributions, which we call batch-weight. We also provide a rigorous probabilistic setting for domain transfer and a new simplified objective for training transfer networks, an alternative to the complex, multi-component loss functions used in the current state-of-the-art image-to-image translation models. The new objective stems from the discrimination of joint distributions and enforces cycle-consistency in an abstract, high-level, rather than pixel-wise, sense. Lastly, we experimentally show the effectiveness of the proposed methods in several image-to-image translation tasks.
Tasks Domain Adaptation, Image-to-Image Translation
Published 2019-05-29
URL https://arxiv.org/abs/1905.12760v1
PDF https://arxiv.org/pdf/1905.12760v1.pdf
PWC https://paperswithcode.com/paper/batch-weight-for-domain-adaptation-with-mass
Repo
Framework

Inexact Online Proximal-gradient Method for Time-varying Convex Optimization

Title Inexact Online Proximal-gradient Method for Time-varying Convex Optimization
Authors Amirhossein Ajalloeian, Andrea Simonetto, Emiliano Dall’Anese
Abstract This paper considers an online proximal-gradient method to track the minimizers of a composite convex function that may continuously evolve over time. The online proximal-gradient method is inexact, in the sense that: (i) it relies on an approximate first-order information of the smooth component of the cost; and, (ii) the proximal operator (with respect to the non-smooth term) may be computed only up to a certain precision. Under suitable assumptions, convergence of the error iterates is established for strongly convex cost functions. On the other hand, the dynamic regret is investigated when the cost is not strongly convex, under the additional assumption that the problem includes feasibility sets that are compact. Bounds are expressed in terms of the cumulative error and the path length of the optimal solutions. This suggests how to allocate resources to strike a balance between performance and precision in the gradient computation and in the proximal operator.
Tasks
Published 2019-10-04
URL https://arxiv.org/abs/1910.02018v3
PDF https://arxiv.org/pdf/1910.02018v3.pdf
PWC https://paperswithcode.com/paper/inexact-online-proximal-gradient-method-for
Repo
Framework
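
To make the setting in the abstract above concrete, here is a minimal sketch of an inexact online proximal-gradient loop. It assumes a simple time-varying cost 0.5 * ||x - b_t||^2 + lam * ||x||_1, one iteration per time step, additive Gaussian noise as the model of the gradient error, and a perturbed soft-thresholding step as the model of the inexact proximal operator. All names, step sizes, and noise levels are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

def soft_threshold(z, tau):
    # Exact proximal operator of tau * ||x||_1.
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def inexact_online_prox_grad(targets, step=0.5, lam=0.1,
                             grad_noise=0.01, prox_noise=0.01, seed=0):
    # Tracks the minimiser of 0.5 * ||x - b_t||^2 + lam * ||x||_1 as b_t changes.
    rng = np.random.default_rng(seed)
    x = np.zeros_like(targets[0])
    errors = []
    for b_t in targets:
        # (i) approximate first-order information of the smooth term
        grad = (x - b_t) + grad_noise * rng.standard_normal(x.shape)
        # (ii) proximal step computed only up to some precision
        x = soft_threshold(x - step * grad, step * lam)
        x = x + prox_noise * rng.standard_normal(x.shape)
        # The exact minimiser of this particular cost is soft_threshold(b_t, lam).
        x_star = soft_threshold(b_t, lam)
        errors.append(np.linalg.norm(x - x_star))
    return np.array(errors)

if __name__ == "__main__":
    # A slowly drifting target; the tracking error settles to a noise-dependent floor.
    ts = np.linspace(0.0, 1.0, 200)
    targets = [np.array([np.sin(t), np.cos(t), 0.05]) for t in ts]
    errs = inexact_online_prox_grad(targets)
    print(f"first error: {errs[0]:.3f}, last error: {errs[-1]:.3f}")
```

In this toy problem the tracking error is driven by three things: how fast the target moves, the gradient noise, and the proximal error, which mirrors the abstract's bounds in terms of cumulative error and path length.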

SIMCO: SIMilarity-based object COunting

Title SIMCO: SIMilarity-based object COunting
Authors Marco Godi, Christian Joppi, Andrea Giachetti, Marco Cristani
Abstract We present SIMCO, the first agnostic multi-class object counting approach. SIMCO starts by detecting foreground objects through a novel Mask RCNN-based architecture trained beforehand (just once) on a brand-new synthetic 2D shape dataset, InShape; the idea is to highlight every object resembling a primitive 2D shape (circle, square, rectangle, etc.). Each object detected is described by a low-dimensional embedding, obtained from a novel similarity-based head branch; this latter implements a triplet loss, encouraging similar objects (same 2D shape + color and scale) to map close. Subsequently, SIMCO uses this embedding for clustering, so that different types of objects can emerge and be counted, making SIMCO the very first multi-class unsupervised counter. Experiments show that SIMCO provides state-of-the-art scores on counting benchmarks and that it can also help in many challenging image understanding tasks.
Tasks Object Counting
Published 2019-04-15
URL http://arxiv.org/abs/1904.07092v1
PDF http://arxiv.org/pdf/1904.07092v1.pdf
PWC https://paperswithcode.com/paper/simco-similarity-based-object-counting
Repo
Framework
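
The abstract above mentions a triplet loss that encourages similar objects to map close together in the embedding space. A minimal sketch of a standard triplet loss follows, assuming squared Euclidean distance and a fixed margin; the distance, the margin value, and the function name are assumptions, since the abstract does not specify them.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # anchor, positive, negative: arrays of shape (batch, dim).
    # Penalises triplets where the positive is not closer to the anchor
    # than the negative by at least `margin`.
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

if __name__ == "__main__":
    a = np.array([[0.0, 0.0]])
    p = np.array([[0.1, 0.0]])   # same (hypothetical) shape class as the anchor
    n = np.array([[1.0, 0.0]])   # different class
    print(triplet_loss(a, p, n))
```

Once embeddings are trained this way, objects of the same type should form compact clusters, which is what makes the subsequent clustering-and-counting step feasible.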

Towards Locally Consistent Object Counting with Constrained Multi-stage Convolutional Neural Networks

Title Towards Locally Consistent Object Counting with Constrained Multi-stage Convolutional Neural Networks
Authors Muming Zhao, Jian Zhang, Chongyang Zhang, Wenjun Zhang
Abstract High-density object counting in surveillance scenes is challenging mainly due to the drastic variation of object scales. The prevalence of deep learning has largely boosted the object counting accuracy on several benchmark datasets. However, do the global counts really count? Armed with this question we dive into the predicted density map, whose summation over the whole region reports the global count, for more in-depth analysis. We observe that the object density map generated by most existing methods usually lacks local consistency, i.e., counting errors in local regions exist unexpectedly even though the global count seems to match the ground truth well. To address this problem, in this paper we propose constrained multi-stage Convolutional Neural Networks (CNNs) to jointly pursue a locally consistent density map from two aspects. Different from most existing methods that mainly rely on the multi-column architectures of plain CNNs, we exploit a stacking formulation of plain CNNs. Benefiting from the internal multi-stage learning process, the feature map can be repeatedly refined, allowing the density map to approach the ground-truth density distribution. For further refinement of the density map, we also propose a grid loss function. With finer local-region-based supervision, the underlying model is constrained to generate locally consistent density values to minimize the training errors considering both the global and local count accuracy. Experiments on two widely tested object counting benchmarks yield overall significant results compared with state-of-the-art methods, demonstrating the effectiveness of our approach.
Tasks Object Counting
Published 2019-04-06
URL http://arxiv.org/abs/1904.03373v1
PDF http://arxiv.org/pdf/1904.03373v1.pdf
PWC https://paperswithcode.com/paper/towards-locally-consistent-object-counting
Repo
Framework
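
The abstract above argues that a density map can match the global count while being wrong locally, and proposes a grid loss to penalise this. The sketch below illustrates the idea only: it assumes the loss is the absolute global count error plus the summed absolute count errors over a regular grid of cells, and that the map dimensions are divisible by the grid size. The paper's actual formulation is not given in the abstract, so the function names and weighting are hypothetical.

```python
import numpy as np

def grid_counts(density, grid):
    # Sum the density map inside each cell of a grid x grid partition.
    # Assumes both map dimensions are divisible by `grid`.
    h, w = density.shape
    gh, gw = h // grid, w // grid
    cells = density[:gh * grid, :gw * grid].reshape(grid, gh, grid, gw)
    return cells.sum(axis=(1, 3))

def grid_loss(pred, truth, grid=2, weight=1.0):
    # Global count error plus (assumed) weighted local count errors.
    global_err = abs(pred.sum() - truth.sum())
    local_err = np.abs(grid_counts(pred, grid) - grid_counts(truth, grid)).sum()
    return global_err + weight * local_err

if __name__ == "__main__":
    truth = np.zeros((4, 4)); truth[0, 0] = 2.0  # two objects in the top-left cell
    pred = np.zeros((4, 4)); pred[3, 3] = 2.0    # same total, wrong location
    print("global error:", abs(pred.sum() - truth.sum()))
    print("grid loss:", grid_loss(pred, truth, grid=2))
```

In the toy case the global count is exact, yet the grid term is non-zero because the predicted mass sits in the wrong cell, which is the kind of local inconsistency the abstract describes.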